LECTURE NOTES
Introduction to Probability Theory and
Stochastic Processes (STATS)
Helmut Strasser
Department of Statistics and Mathematics
Vienna University of Economics and Business
Administration
Helmut.Strasser@wu-wien.ac.at
January 25, 2006
Copyright © 2005 by Helmut Strasser
All rights reserved. No part of this text may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the author.
Contents

Preliminaries . . . i
0.1 Introduction . . . i
0.2 Literature . . . ii
0.3 Nature of these notes . . . iii
0.4 Time table . . . iv

I Measure theory . . . 1

1 Measure and probability . . . 3
1.1 Fields and contents . . . 3
  Contents on the real line . . . 4
  Contents on R^d . . . 6
  Finite fields . . . 7
1.2 Sigma-fields and measures . . . 7
1.3 The extension theorem . . . 11

2 Measurable functions and random variables . . . 13
2.1 The idea of measurability . . . 13
2.2 The basic abstract assertions . . . 14
2.3 The structure of real-valued measurable functions . . . 15
2.4 Probability models . . . 17

3 Integral and expectation . . . 19
3.1 The integral of simple functions . . . 19
3.2 The extension to nonnegative functions . . . 20
3.3 Integrable functions . . . 23
3.4 Convergence . . . 24

4 Selected topics . . . 27
4.1 Spaces of integrable functions . . . 27
4.2 Measures with densities . . . 29
4.3 Iterated integration . . . 31

II Probability theory . . . 33

5 Measure theoretic language of probability . . . 35
5.1 Basics . . . 35
5.2 The information set of random variables . . . 35
5.3 Independence . . . 36

6 Conditional expectation . . . 39
6.1 The concept . . . 39
6.2 Properties . . . 40
  Inequalities . . . 40
  Projection properties . . . 41
  Convergence . . . 41
6.3 Calculation . . . 41

7 Stochastic sequences . . . 43
7.1 The ruin problem . . . 43
  One player . . . 43
  Two players . . . 45
7.2 Stopping times . . . 46
  Optional stopping . . . 46
  Filtrations and stopping times . . . 46
  Wald's equation . . . 47
7.3 Gambling systems . . . 49
7.4 Martingales . . . 50
7.5 Convergence . . . 52

8 The Wiener process . . . 53
8.1 Basic concepts . . . 53
8.2 Quadratic variation . . . 55
8.3 Martingales . . . 56
8.4 Stopping times . . . 58
  First passage times of the Wiener process . . . 59
  The reflection principle . . . 62
8.5 Augmentation . . . 64
8.6 More on stopping times . . . 66
8.7 The Markov property . . . 70

9 The financial market picture . . . 73
9.1 Assets and trading strategies . . . 73
9.2 Financial markets and arbitrage . . . 75
9.3 Martingale measures . . . 75
9.4 Change of numeraire . . . 77

10 Stochastic calculus . . . 79
10.1 Elementary integration . . . 79
  Bounded variation . . . 79
  The Cauchy-Stieltjes integral . . . 80
  Differential calculus . . . 81
10.2 The stochastic integral . . . 83
  The integral of step functions . . . 83
  Semimartingales . . . 85
  Extending the stochastic integral . . . 86
  Path properties . . . 89
  The Wiener integral . . . 89
10.3 Calculus for the stochastic integral . . . 89
  The associativity rule . . . 90
  The integration-by-parts formula . . . 90
  Ito's formula . . . 93

11 Applications to financial markets . . . 95
11.1 Self-financing trading strategies . . . 95
11.2 Markovian wealth processes . . . 96
11.3 The Black-Scholes market model . . . 97
  The Black-Scholes equation . . . 98
  The market price of risk . . . 99

12 Stochastic differential equations . . . 101
12.1 Introduction . . . 101
12.2 The abstract linear equation . . . 102
12.3 Wiener driven models . . . 104

13 Martingales and stochastic calculus . . . 107
13.1 Martingale properties of the stochastic integral . . . 107
  Facts . . . 107
  Proofs . . . 108
13.2 Martingale representation . . . 110
13.3 Levy's theorem . . . 112
13.4 Exponential martingale and Girsanov's theorem . . . 112

14 Pricing of claims . . . 113

III Appendix . . . 115

15 Foundations of modern analysis . . . 117
15.1 Sets and functions . . . 117
15.2 Sequences of real numbers . . . 118
15.3 Real-valued functions . . . 119
15.4 Banach spaces . . . 119
15.5 Hilbert spaces . . . 121
Preliminaries
0.1 Introduction
The goal of this course is to give an introduction to some mathematical concepts and tools which are indispensable for understanding the modern mathematical theory of finance. Let us give an overview of the historic origins of some of these mathematical tools.
The central topic will be those probabilistic concepts and results which play an
important role in mathematical ﬁnance. Therefore we have to deal with mathematical
probability theory. Mathematical probability theory is formulated in a language that
comes from measure theory and integration. This language differs considerably from
the language of classical analysis, known under the label of calculus. Therefore, our
ﬁrst step will be to get an impression of basic measure theory and integration.
We will not go into the advanced problems of measure theory where this theory becomes exciting. Such topics are closely related to advanced set theory and topology, which differ fundamentally from the set-theoretic language and topological slang that is convenient for talking about mathematics but nothing more. Similarly, our usage of measure theory and integration is a sort of convenient language which on this level is of little interest in itself. For us its worth arises from its power to give insight into exciting applications like probability and mathematical finance.
Therefore, our presentation of measure theory and integration will be an overview
rather than a specialized training program. We will become more and more familiar
with the language and its typical kind of reasoning as we go into those applications
for which we are highly motivated. These will be probability theory and stochastic
calculus.
In the field of probability theory we are interested in probability models having a dynamic structure, i.e. a time evolution governed by endogenous correlation properties. Such probability models are called stochastic processes.
Probability theory is a young theory compared with the classical cornerstones of
mathematics. It is illuminating to have a look at the evolution of some fundamental
ideas of deﬁning a dynamic structure of stochastic processes.
One important line of thought is looking at stationarity. Models which are themselves stationary, or are cumulatives of stationary models, have determined the econometric literature for decades. For Gaussian models one need not distinguish between strict and weak (covariance) stationarity. As for weak stationarity, it turns out that typical processes follow difference or differential equations driven by some noise process. The concept of a noise process is motivated by the idea that it does not transport any information.
From the beginning of the serious investigation of stochastic processes (about 1900) another idea was leading in the scientific literature, namely the Markov property. This is not the place to go into details of the overwhelming progress in Markov chains and processes achieved in the first half of the 20th century. However, for a long time this theory failed to describe the dynamic behaviour of continuous time Markov processes in terms of equations between single states at different times. Such equations had been the common tools for deterministic dynamics (ordinary difference and differential equations) and for discrete time stationary stochastic sequences. In contrast, continuous time Markov processes were defined in terms of the dynamic behaviour of their distributions rather than of their states, using partial difference and differential equations.
The situation changed dramatically about the middle of the 20th century. There were two ingenious concepts at the beginning of this disruption. The first is the concept of a martingale, introduced by Doob. The martingale turned out to be the final mathematical formalization of the idea of noise. The notion of a martingale is located between a process with uncorrelated increments and a process with independent increments, both of which were the competing noise concepts up to that time. The second concept is that of a stochastic integral, due to K. Ito. This notion makes it possible to apply differential reasoning to stochastic dynamics.
At the beginning of the stochastic part of this lecture we will present an introduction to the ideas of martingales and stopping times by means of stochastic sequences (discrete time processes). The main subject of the second half of the lecture will be continuous time processes with a strong focus on the Wiener process. However, the notions of martingales, semimartingales and stochastic integrals are introduced in a way which lays the foundation for the study of more general process theory. The choice of examples is governed by the needs of financial applications (covering the notion of gambling, of course).
0.2 Literature
Let us give some comments on the bibliography.
The popular monograph by Bauer, [1], has for a long time been the standard textbook in Germany on measure theoretic probability. However, probability theory has many different faces. The book by Shiryaev, [21], is much closer to the modern concepts we are heading for. Both texts are mathematically oriented, i.e. they aim at giving complete and general proofs of fundamental facts, preferably in abstract terms. A modern introduction to probability models containing plenty of fascinating phenomena is given by Bremaud, [6] and [7]. The older monograph by Bremaud, [5], is not located at the focus of this lecture but contains as an appendix an excellent primer on probability theory.
Our topic in stochastic processes will be the Wiener process and the stochastic analysis of Wiener driven systems. A standard monograph on this subject is Karatzas and Shreve, [15]. The Wiener systems part of the probability primer by Bremaud gives a very compact overview of the main facts. Today, Wiener driven systems are a very special framework for modelling financial markets. Meanwhile, general stochastic analysis is in a more or less final state, called semimartingale theory. Present and future research applies this theory in order to obtain a much more flexible modelling of financial markets. Our introduction to semimartingale theory follows the outline by Protter, [20] (see also [19]).
Let us mention some basic literature on mathematical ﬁnance.
There is a standard source by Hull, [11]. Although this book tries hard to present itself as undemanding, the contrary is true. The reason is that the combination of financial intuition and the apparently informal utilization of advanced mathematical tools requires a lot of mathematical knowledge on the reader's side in order to catch the intrinsics. Paul Wilmott, [22] and [23], tries to cover all topics in financial mathematics together with the corresponding intuition, and to make the analytical framework a bit more explicit and detailed than Hull does. I consider these books by Hull and Wilmott a must for any beginner in mathematical finance.
The books by Hull and Wilmott do not pretend to talk about mathematics. Let us mention some references which have a goal similar to that of this lecture, i.e. to present the mathematical theory of stochastic analysis aiming at applications in finance.
A very popular book which may serve as a bridge from mathematical probability to financial mathematics is by Björk, [4]. Another book, giving an introduction both to the mathematical theory and to financial mathematics, is by Hunt and Kennedy, [12]. Standard monographs on mathematical finance which could be considered as cornerstones marking the state of the art at the time of their publication are Karatzas and Shreve, [16], Musiela and Rutkowski, [17], and Bielecki and Rutkowski, [3]. The present lecture should lay some foundations for reading books of that type.
0.3 Nature of these notes
These lecture notes are not intended to be a self-contained text for self-study. The notes are rather an outline of the main concepts and facts.
The classroom lecture will be a selection of the notes but will in parts present more explanation and motivation. Diagrams will be drawn on the blackboard and are not copied into the notes. Many facts are formulated in the notes as exercises, with or without hints. Some of the exercises will be solved during the lecture; some are home exercises or classroom exercises to be presented by students.
The style of the text is very formal. Together with informal explanations during the lecture, the text should train the students for studying more advanced literature. In order to meet the different skills of the audience (applied, theoretic, formal or informal) the exercises are classified with respect to difficulty and required mathematical skills.
The written exams will be open book exams, meaning that the lecture notes, without any additional comments, may be used during the exam. The problems of the exams will be exercises from the notes or additional exercises posed in the classroom. The solutions should include extensive references from the concepts used in the solution to the notions contained in the notes.
The final collection of problems from which the exams are sampled will be fixed during the lecture.
The notes are not yet finished. Some chapters and sections are missing ("under construction") and a lot of review questions still have to be formulated for use in the exams. Filling the gaps will be performed in the light of student reactions and feedback during the classes.
0.4 Time table
The following concerns the course from January to March 2006.
Week Unit Subject
2 1 measure and probability
2 measure and probability
3 measurable functions
3 4 exercises
5 integrals and expectation
6 integrals and expectation
7 integrals and expectation
4 8 exercises
9 conditional expectation
10 stochastic sequences (gambling)
11 stochastic sequences (martingales)
5 12 exercises
13 Wiener process
14 Wiener process (ﬁrst passage times)
15 Wiener process (stopping times)
6 skiing, midterm test
7 16 exercises
17 the ﬁnancial market picture (discrete time trading)
18 stochastic calculus (Stieltjes integrals, differential notation)
19 stochastic calculus (semimartingales, stochastic integrals)
8 20 exercises
21 stochastic calculus (Ito calculus)
22 applications to ﬁnancial markets (Black Scholes model)
9 23 exercises
24 linear stochastic differential equations
25 martingales and stochastic integrals
10 26 martingales (Levy’s theorem, martingale representation)
27 martingales (Girsanov theorem)
28 exercises
Part I
Measure theory
Chapter 1
Measure and probability
1.1 Fields and contents
We start with the notion of a field. Roughly speaking, a field is a system of subsets where the basic set operations (union, intersection, complementation) can be performed without leaving the system.
1.1 Definition. Let Ω ≠ ∅ be a set. A field on Ω is a system 𝒜 of subsets A ⊆ Ω satisfying the following conditions:
(1) Ω ∈ 𝒜, ∅ ∈ 𝒜
(2) If A_1, A_2 ∈ 𝒜 then A_1 ∪ A_2 ∈ 𝒜 and A_1 ∩ A_2 ∈ 𝒜
(3) If A ∈ 𝒜 then A^c ∈ 𝒜
1.2 Problem. (easy)
Discuss minimal sets of conditions such that a system is a ﬁeld.
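For a finite Ω the field axioms can be checked mechanically, which may help with Problem 1.2. The following is a minimal sketch (the function name and the example system are illustrative, not from the notes) that tests conditions (1)-(3) of Definition 1.1:

```python
def is_field(omega, system):
    """Check the field axioms for a finite system of subsets of omega.

    omega: a frozenset; system: a collection of frozensets.
    """
    sets = set(system)
    if omega not in sets or frozenset() not in sets:
        return False  # axiom (1): Omega and the empty set must belong
    for a in sets:
        if omega - a not in sets:
            return False  # axiom (3): closed under complementation
    for a in sets:
        for b in sets:
            if a | b not in sets or a & b not in sets:
                return False  # axiom (2): closed under union and intersection
    return True

omega = frozenset({1, 2, 3, 4})
# the field generated by the partition {1,2}, {3,4}
field = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), omega}
print(is_field(omega, field))                                 # True
print(is_field(omega, {frozenset(), omega, frozenset({1})}))  # False: {1}^c is missing
```

Experimenting with such checks is one way to discover which subsets of the conditions already imply the rest.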
The second basic notion is that of a content. A content is an additive set function
on a ﬁeld.
1.3 Definition. A content is a set function µ defined on a field 𝒜 such that
(1) µ(A) ∈ [0, ∞] whenever A ∈ 𝒜
(2) µ(∅) = 0
(3) µ(A_1 ∪ A_2) = µ(A_1) + µ(A_2) whenever A_1, A_2 ∈ 𝒜 and A_1 ∩ A_2 = ∅
1.4 Problem. (easy)
Let µ|𝒜 be a content. Then A_1 ⊆ A_2 implies µ(A_1) ≤ µ(A_2).
1.5 Problem. (intermediate)
(a) Show that every content satisfies the inclusion-exclusion law:

µ(A_1) + µ(A_2) = µ(A_1 ∪ A_2) + µ(A_1 ∩ A_2)
(b) The preceding formula determines µ(A_1 ∪ A_2) provided that all sets have finite content. Extend this formula to the union of three sets.
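The two-set law and its three-set extension can be verified numerically. As an illustration (my choice of content, not from the notes), take the counting content µ(A) = |A| on a finite Ω; any finite content would do:

```python
def mu(a):
    """Counting content: mu(A) = |A|."""
    return len(a)

A1, A2, A3 = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

# two sets: mu(A1) + mu(A2) == mu(A1 ∪ A2) + mu(A1 ∩ A2)
assert mu(A1) + mu(A2) == mu(A1 | A2) + mu(A1 & A2)

# three sets (the extension asked for in part (b)):
lhs = mu(A1 | A2 | A3)
rhs = (mu(A1) + mu(A2) + mu(A3)
       - mu(A1 & A2) - mu(A1 & A3) - mu(A2 & A3)
       + mu(A1 & A2 & A3))
assert lhs == rhs
print(lhs, rhs)  # 5 5
```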
1.6 Definition. Let 𝒜 be a field and let µ|𝒜 be a content. The content µ is called σ-additive if

µ(⋃_{i∈N} A_i) = ∑_{i∈N} µ(A_i)

for every pairwise disjoint sequence (A_i)_{i∈N} ⊆ 𝒜 such that ⋃_{i∈N} A_i ∈ 𝒜.
If a content is σ-additive then it has several continuity properties which facilitate calculations.
1.7 Lemma. Let µ|𝒜 be a content on a field. Consider the following properties:
(a) µ|𝒜 is σ-additive.
(b) For every sequence (A_i)_{i∈N} ⊆ 𝒜 such that A_i ↑ A ∈ 𝒜 we have µ(A_i) ↑ µ(A).
(c) For every sequence (A_i)_{i∈N} ⊆ 𝒜 such that A_i ↓ A ∈ 𝒜 with µ(A_1) < ∞ we have µ(A_i) ↓ µ(A).
(d) For every sequence (A_i)_{i∈N} ⊆ 𝒜 such that A_i ↓ ∅ with µ(A_1) < ∞ we have µ(A_i) ↓ 0.
Then: (a) ⇔ (b) ⇒ (c) ⇔ (d).
If µ(Ω) < ∞ then all assertions are equivalent.
Proof: See Bauer, [1]. □
Further reading: Shiryaev [21], chapter II, paragraph 1.
As we shall see later when we are dealing with measures, it is very easy to construct contents. But it is not easy to construct contents with given properties, e.g. with special geometric properties. The next paragraphs deal with the most important examples of contents, which later will be extended to those measures that are most common in applications.
Contents on the real line

Let Ω = (−∞, ∞] and let ℛ be the system of subsets arising as unions of finitely many intervals of the form (a, b] where −∞ ≤ a < b ≤ ∞ (left-open and right-closed intervals).
1.8 Problem. (intermediate)
Explain why ℛ is a field. (Include ∅ as the union of nothing.)
1.9 Problem. (advanced)
Show that each element B ∈ ℛ can be written as a union of disjoint intervals

B = ⋃_{i=1}^n (a_i, b_i]     (1)

where −∞ ≤ a_1 < b_1 ≤ a_2 < b_2 ≤ a_3 < ... < b_{n−1} ≤ a_n < b_n ≤ ∞.
Hint: Let ℋ be the system of disjoint unions of intervals. First, show that ℋ is closed under intersections. Be careful when applying the distributive law. Second, show that any finite union of intervals can be written as

⋃_{i=1}^n I_i = I_1 ∪ (I_2 \ I_1) ∪ (I_3 \ (I_1 ∪ I_2)) ∪ ...

where

I_k \ (I_1 ∪ ... ∪ I_{k−1})

is in ℋ. For the latter apply the first part of the proof.
Let α : R → R be an increasing function. Define α(−∞) = inf α and α(∞) = sup α. It will be shown in the following exercises that

λ_α((a, b]) := α(b) − α(a)     (2)

determines a content on ℛ. (Note that in probability theory this is the usual way to define probability distributions by distribution functions!)
1.10 Problem. (advanced)
(a) Show that any (hypothetical) content λ_α satisfying (2) necessarily satisfies

A = ⋃_{i=1}^n I_i, where (I_i) are pw. dj. intervals ⇒ λ_α(A) = ∑_{i=1}^n λ_α(I_i)     (3)

(b) Show that using (3) as a definition is unambiguous.
(c) Show that (3) defines a content on ℛ which is finite on bounded sets.
1.11 Definition. The content λ_α|ℛ is called a Lebesgue-Stieltjes content.
1.12 Example. Lebesgue content
Let α(x) = x. Then λ_α((a, b]) = b − a is the length of the interval (a, b]. Therefore, in this special case the content λ_α is simply the geometric volume function. It is called the Lebesgue content and is denoted by λ.
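A Lebesgue-Stieltjes content is straightforward to compute once α is fixed. The sketch below (function names are illustrative, not from the notes) implements formula (2) for two increasing choices of α, the identity giving the Lebesgue content:

```python
def lebesgue_stieltjes(alpha):
    """Return the content (a, b] -> alpha(b) - alpha(a) for an increasing alpha."""
    def content(a, b):
        assert a < b, "need a left-open right-closed interval (a, b]"
        return alpha(b) - alpha(a)
    return content

lam = lebesgue_stieltjes(lambda x: x)        # alpha(x) = x: Lebesgue content (length)
cube = lebesgue_stieltjes(lambda x: x ** 3)  # another increasing alpha

print(lam(0.0, 2.5))   # 2.5, the length of (0, 2.5]
print(cube(1.0, 2.0))  # 7.0 = 2^3 - 1^3

# finite additivity on adjacent intervals: (0,1] and (1,2] are disjoint with union (0,2]
assert lam(0, 1) + lam(1, 2) == lam(0, 2)
assert cube(0, 1) + cube(1, 2) == cube(0, 2)
```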
We have seen that any increasing function α : R → R defines a content λ_α|ℛ. In order to extend such contents to larger families of sets we have to check whether λ_α|ℛ is σ-additive.
1.13 Lemma. The content λ_α|ℛ is σ-additive iff α is right continuous.

Proof: Assume that λ_α|ℛ is σ-additive. Then from

(a, b] = ⋂_{n∈N} (a, b + 1/n]

it follows that

α(b) − α(a) = λ_α((a, b]) = lim_{n→∞} λ_α((a, b + 1/n]) = lim_{n→∞} α(b + 1/n) − α(a).

This means that α is right continuous at b.

This was the easy part. The proof of the converse is a bit more tricky. We show the converse for bounded α only (λ_α(Ω) < ∞). Let us prove that 1.7(d) is satisfied.

Let (A_n) ⊆ ℛ be such that A_n ↓ ∅. Choose ε > 0. For every A_n we may find a compact set K_n and a set B_n ∈ ℛ such that B_n ⊆ K_n ⊆ A_n and λ_α(A_n \ B_n) < ε. (At this point right-continuity goes in!) Since A_n ↓ ∅ it follows that K_n ↓ ∅. Since the sets K_n are compact there is some N such that K_N = ∅, hence B_N = ∅. (This is the so-called finite intersection property of compact sets.) It follows that λ_α(A_N) < ε. Since ε is arbitrarily small, monotonicity gives lim_{n→∞} λ_α(A_n) = 0. This proves 1.7(d) and the assertion for finite contents. □
Contents on R^d

The following is a summary of facts. Proofs are similar to, but sometimes a bit more complicated than, those in the one-dimensional case.

Denote R := (−∞, ∞] and let 𝒬^d be the collection of all subsets Q ⊆ R^d of the form

Q = ∏_{i=1}^d (a_i, b_i], −∞ ≤ a_i < b_i ≤ ∞,

(so-called left-open right-closed parallelotopes). Denote by ℛ^d the set of all finite unions of sets in 𝒬^d. The sets in ℛ^d are called figures.
1.14 Theorem. (a) The set ℛ^d is a field.
(b) Each set Q ∈ ℛ^d is a union of pairwise disjoint sets of 𝒬^d.
For the proof see Bauer [1].
In order to define a content on ℛ^d we first have to define the content on 𝒬^d and then try to extend it to ℛ^d. This is exactly the procedure that we performed for d = 1. For d > 1 it is natural to consider the geometric volume

λ^d(∏_{i=1}^d (a_i, b_i]) := ∏_{i=1}^d (b_i − a_i)     (4)

This can actually be extended to ℛ^d, resulting in a content called the Lebesgue content.

1.15 Theorem. There is a uniquely determined content λ^d on ℛ^d such that (4) is satisfied. The content λ^d is σ-additive.
For a proof see Bauer [1].
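Formula (4) is easy to evaluate for concrete parallelotopes. A minimal sketch (illustrative names, not from the notes):

```python
from math import prod

def volume(box):
    """Lebesgue content of a left-open right-closed parallelotope.

    box is a list of (a_i, b_i) pairs; the volume is prod(b_i - a_i), formula (4).
    """
    assert all(a < b for a, b in box)
    return prod(b - a for a, b in box)

print(volume([(0, 2), (0, 3)]))          # 6: a 2 x 3 rectangle in R^2
print(volume([(1, 2), (1, 2), (1, 2)]))  # 1: a unit cube in R^3
```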
Finite ﬁelds
Since many probabilistic applications are concerned with ﬁnite ﬁelds it is illuminating
to discuss the structure of ﬁnite ﬁelds in more detail. Let us collect the main facts in
terms of exercises.
1.16 Problem. (easy)
Let 𝒞 = {C_1, C_2, ..., C_m} be a finite partition of Ω. Show that

ℛ := { ⋃_{i∈α} C_i : α ⊆ {1, ..., m} }

is a field on Ω and that it is the smallest field containing 𝒞.
In the situation of 1.16 we say that the partition 𝒞 generates the field ℛ.
1.17 Problem. (intermediate)
Show that every finite field is generated by a partition.
Hint: A set A ∈ ℛ is called an atom if

A ≠ ∅, and ∅ ≠ B ⊆ A, B ∈ ℛ ⇒ B = A.

Show that the collection 𝒞 of all atoms of ℛ is a partition generating ℛ. (Show that for x ∈ Ω the set A_x := ⋂{A ∈ ℛ : x ∈ A} is the unique atom containing x.)
1.18 Problem. (easy)
Let ℛ be a finite field and let 𝒞 = {C_1, C_2, ..., C_m} be the generating partition. Show that for every choice of numbers a_i ≥ 0 there exists exactly one content µ|ℛ such that µ(C_i) = a_i.
The preceding assertions are the basis of the elementary theory of probability. The so-called Laplacian definition of a probability content results in the uniform content, i.e. µ(C_i) = 1/m.
Further reading: Shiryaev [21], chapter I.
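The structure described in Problems 1.16-1.18 can be made concrete for a small Ω. The following sketch (illustrative names and numbers, not part of the notes) builds the field generated by a partition and a content determined by values a_i on the blocks:

```python
from itertools import combinations

def field_from_partition(blocks):
    """All unions of blocks of a finite partition: the smallest field containing it."""
    field = []
    for r in range(len(blocks) + 1):
        for alpha in combinations(range(len(blocks)), r):
            field.append(frozenset().union(*(blocks[i] for i in alpha)))
    return field

blocks = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]
field = field_from_partition(blocks)
print(len(field))  # 2**3 = 8 sets, one per subset of blocks

# a content determined by the values a_i on the blocks (Problem 1.18)
a = {blocks[0]: 0.5, blocks[1]: 0.2, blocks[2]: 0.3}

def mu(A):
    """mu(A) sums a_i over the blocks contained in A (A must be a union of blocks)."""
    return sum(weight for block, weight in a.items() if block <= A)

print(mu(frozenset({1, 2, 3})))  # 0.7 = a_1 + a_2
```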
1.19 Review questions. Explain the structure and generation of finite fields. How can contents be defined on finite fields?
1.2 Sigma-fields and measures

1.20 Definition. A field ℱ on Ω is a σ-field if

(F_i)_{i∈N} ⊆ ℱ ⇒ ⋃_{i∈N} F_i ∈ ℱ

A pair (Ω, ℱ), where ℱ is a σ-field on Ω, is called a measurable space.
1.21 Problem. (intermediate)
(a) A field ℱ is a σ-field iff

(F_i)_{i∈N} ⊆ ℱ ⇒ ⋂_{i∈N} F_i ∈ ℱ

(b) A field ℱ is a σ-field iff the union of every increasing (decreasing) sequence of sets in ℱ is in ℱ, too.
(c) A field ℱ is a σ-field iff the union of every pairwise disjoint sequence of sets in ℱ is in ℱ, too.
1.22 Definition. A σ-additive content which is defined on a σ-field is called a measure.
A measure (resp. content) is called finite if µ(Ω) < ∞. A measure P (resp. content) is called a probability measure (resp. content) if P(Ω) = 1. If µ|ℱ is a measure then (Ω, ℱ, µ) is a measure space. If P|ℱ is a probability measure then (Ω, ℱ, P) is called a probability space.
As for the existence of measures, things are easy with finite fields. Actually any finite field is a σ-field, and any content on a finite field is σ-additive (in a trivial sense) and is therefore a measure. Therefore the concept of a measure differs from the concept of a content only on infinite σ-fields.
In the following we warm up by discussing some very simple examples of measures.
1.23 Problem. (easy)
Let (Ω, ℱ) be any measurable space. Let x ∈ Ω be some point and keep it fixed. For every A ∈ ℱ define

δ_x(A) = 1 whenever x ∈ A, and δ_x(A) = 0 whenever x ∉ A.

Show that δ_x : A ↦ δ_x(A) is a measure (the one-point measure at the point x). (Note that the case ℱ = 2^Ω is covered by this definition.)
1.24 Problem. (easy)
(a) Show that every finite linear combination of measures with nonnegative coefficients is a measure.
(b) Show that every countable linear combination of measures with nonnegative coefficients is a measure.
Any such linear combination of one-point measures is called a discrete measure.
1.25 Problem. (easy)
Describe the values of finite linear combinations of one-point measures:
Let a_1, a_2, ..., a_m be any pairwise different points in Ω and keep them fixed. Let p_1, p_2, ..., p_m be any nonnegative numbers. Then

µ := ∑_{i=1}^m p_i δ_{a_i} ⇒ µ(A) = ∑_{i: a_i ∈ A} p_i, A ⊆ Ω.
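The evaluation rule of Problem 1.25 can be illustrated directly. In the sketch below (names and weights are mine, not from the notes) a discrete measure is built as a linear combination of one-point measures:

```python
def delta(x):
    """One-point measure at x: delta_x(A) = 1 if x in A, else 0."""
    return lambda A: 1 if x in A else 0

def discrete_measure(points, weights):
    """mu = sum_i p_i * delta_{a_i}; evaluates as mu(A) = sum of p_i over a_i in A."""
    deltas = [delta(a) for a in points]
    return lambda A: sum(p * d(A) for p, d in zip(weights, deltas))

mu = discrete_measure([0, 1, 2], [0.25, 0.5, 0.25])
print(mu({1, 2}))     # 0.75
print(mu({5}))        # 0.0
print(mu({0, 1, 2}))  # 1.0, so mu is a probability measure
```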
1.26 Problem. (easy)
(a) Write the binomial distribution as a linear combination of point measures.
(b) Write the geometric distribution as a linear combination of point measures.
1.27 Problem. (easy)
Let x = (x_1, x_2, ..., x_n) be any finite sequence of elements of Ω (e.g. an empirical sample). Let {a_1, a_2, ..., a_m} be the set of different components of x and denote by f_j the relative frequency of a_j in x.
Show that the empirical measure satisfies

(1/n) ∑_{i=1}^n δ_{x_i} = ∑_{j=1}^m f_j δ_{a_j}

(This is the "frequency table" of an empirical distribution.)
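The identity of Problem 1.27 is just a regrouping of the sample into its frequency table, which can be checked numerically. A sketch (sample and names are illustrative):

```python
from collections import Counter

def empirical_measure(sample):
    """(1/n) * sum of delta_{x_i}, returned as the frequency-table weights f_j."""
    n = len(sample)
    counts = Counter(sample)
    return {a: c / n for a, c in counts.items()}

sample = ["H", "T", "T", "H", "H", "H", "T", "H"]
freq = empirical_measure(sample)
print(freq)  # {'H': 0.625, 'T': 0.375}

# evaluating the measure on a set A sums the frequencies of the a_j in A
mu_A = sum(f for a, f in freq.items() if a in {"H"})
print(mu_A)  # 0.625
```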
When we are dealing with point measures or linear combinations of point measures we need not worry about σ-fields, since such measures are well-defined on ℱ = 2^Ω. In general, however, it is not possible to define measures with given properties on ℱ = 2^Ω. We have to be more modest and be satisfied if we find measures that are defined on σ-fields containing at least the reasonable sets indispensable for applications.
Usually it is not very difficult to find a field which contains sufficiently many reasonable sets, e.g. the field of figures in R^d containing all rectangles. But how do we proceed from fields to σ-fields?
Let 𝒜 be a field which is not a σ-field. We would like to enlarge 𝒜 in such a way that the result is a σ-field. The following questions arise:
(1) Are there any σ-fields ℱ containing 𝒜?
(2) If yes, is there a smallest σ-field containing 𝒜?
The answer to both questions is yes.
1.28 Problem. (advanced)
(a) The system 2^Ω (the system of all subsets of Ω) is a σ-field.
(b) The intersection of any family of σ-fields is a σ-field.
(c) Let 𝒞 be any system of subsets of Ω and denote by σ(𝒞) the intersection of all σ-fields ℱ containing 𝒞:

σ(𝒞) = ⋂_{𝒞 ⊆ ℱ} ℱ

Then σ(𝒞) is the smallest σ-field containing 𝒞:

𝒞 ⊆ ℱ, ℱ is a σ-field ⇒ 𝒞 ⊆ σ(𝒞) ⊆ ℱ

1.29 Definition. For any system 𝒞 of sets in Ω the smallest σ-field ℱ that contains 𝒞 is called the σ-field generated by 𝒞 and is denoted by ℱ = σ(𝒞). The system 𝒞 is called a generator of ℱ.
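For a finite Ω the generated σ-field coincides with the generated field and can be computed by closing the generator under complement and pairwise union until nothing new appears. A sketch under this finiteness assumption (names are mine, not from the notes):

```python
def generated_field(omega, generator):
    """Smallest field on a finite omega containing the generator, computed by
    closing under complement and pairwise union until stable.
    (For a finite omega this is also the generated sigma-field.)"""
    sets = {frozenset(), frozenset(omega)} | {frozenset(g) for g in generator}
    while True:
        new = {frozenset(omega) - a for a in sets}        # complements
        new |= {a | b for a in sets for b in sets}        # pairwise unions
        if new <= sets:
            return sets  # closed: nothing new appeared
        sets |= new

omega = {1, 2, 3, 4}
sigma = generated_field(omega, [{1}, {1, 2}])
print(len(sigma))  # 8: the field generated by the partition {1}, {2}, {3, 4}
```

Closure under complement and union also yields closure under intersection, so the result satisfies all conditions of Definition 1.1.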
It turns out that the situation is simple as long as Ω is a countable set.
1.30 Problem. (easy)
Show that the σ-field on N which is generated by the one-point sets of N is ℱ = 2^N.
The preceding exercise shows: if we want all one-point sets to be in the σ-field, then for countable Ω every subset of Ω has to be in the σ-field.
1.31 Problem. (intermediate)
(a) Let Ω = N and ℱ = 2^N. Define µ(A) := |A|, A ⊆ N (the counting measure). Show that (N, 2^N, µ) is a measure space.
(b) Show that for every sequence of numbers a_n ≥ 0 there is exactly one measure µ|ℱ such that µ({n}) := a_n.
(c) Discuss how to define probability measures on (N, 2^N).
The following exercise shows that for Ω = R the system of one-point sets is not sufficient to generate a reasonable σ-field.

1.32 Problem. (intermediate for mathematicians)
What is the σ-field on R that is generated by the one-point sets of R?
Answer: The system of sets which are either countable or the complement of a countable set.
The preceding exercise shows that one-point sets do not generate a σ-field on R which contains intervals! Therefore we have to include intervals in our generating system. The starting point is the algebra ℛ of figures.

1.33 Definition. The σ-field on R (on R^d) which is generated by the algebra ℛ (ℛ^d) is called the Borel σ-field and is denoted by B = B(R) (B(R^d)).
The sets in the Borel σﬁeld are called Borel sets.
1.34 Problem. (easy)
(a) Show that B contains all intervals (including one-point sets).
(b) Is Q a Borel set?
1.35 Problem. (easy for mathematicians) Show that B(R^d) contains all open sets and all closed sets.
1.36 Review questions. What is a field and what is a σ-field? What is the difference between a content and a measure?
1.37 Review questions. Explain the idea of generating a σ-field by a system of sets. Explain how the measurable spaces (N, 2^N), (R, B(R)), (R^d, B(R^d)) are generated.
1.3 The extension theorem
The fundamental problem of measure theory is the extension problem. The extension problem deals with the question whether a given content on a field A can be extended to a measure on the σ-field σ(A).
It is clear that for the existence of an extension the content must be σ-additive. This is a necessary condition. For finite contents it is even sufficient.
1.38 Theorem. Every finite σ-additive content µ|A defined on a field A has a uniquely determined measure extension to F = σ(A).
Proof: (Outline. Further reading: Bauer [1])
Let us indicate some ideas of the proof. W.l.o.g. we assume that µ(Ω) = 1.
The first step of the proof is to try an extension of the content to all subsets M ⊆ Ω. This is done by
µ*(M) = inf { ∑_{i∈N} µ(A_i) : (A_i)_{i∈N} ⊆ A, M ⊆ ∪_{i∈N} A_i }
Unfortunately, this definition does not result in a measure, not even in a content. The set function µ* is a so-called outer measure. It is clear that µ*|A = µ|A.
Next, define
M := {M ⊆ Ω : µ*(M) + µ*(M^c) = 1}
It is clear that A ⊆ M. It turns out that M is a σ-field and therefore σ(A) ⊆ M. Moreover, it is shown that the restriction µ*|M is a measure. Therefore it is a measure extension of µ|A to some σ-field containing A, thus at least a measure extension to σ(A).
The uniqueness of the extension is shown in the following way. Let µ_1|σ(A) and µ_2|σ(A) be two extensions of µ|A. Let
M_1 := {M ∈ σ(A) : µ_1(M) = µ_2(M)}
By assumption we have A ⊆ M_1. It can be shown that M_1 is a σ-field. Then it follows that σ(A) ⊆ M_1. □
1.39 Remarks.
The following remarks can be understood ("proved") with the information provided by the outline of the proof of the measure extension theorem. Assume that µ(Ω) < ∞.
(1) Although the sets in σ(A) or M can be rather complicated, they do not differ very much from sets in A: for every M ∈ M and every (arbitrarily small) ε > 0 there is a set A ∈ A such that µ(M \ A) < ε and µ(A \ M) < ε.
(2) The proof of the measure extension theorem results in an extension to M, which actually is a larger σ-field than σ(A). However, from (1) it follows that for every M ∈ M there is some A ∈ σ(A) such that µ(M \ A) = 0 and µ(A \ M) = 0.
1.40 Problem. (advanced for mathematicians)
(1) Prove assertion (1) of Remark 1.39.
(2) Prove assertion (2) of Remark 1.39.
What about the extension of non-finite contents?
1.41 Definition. A content µ|A is called σ-finite if there is a sequence (A_i)_{i∈N} ⊆ A such that ∪_{i∈N} A_i = Ω and µ(A_i) < ∞ for every i ∈ N.
1.42 Theorem. Every σ-finite σ-additive content µ|A defined on a field A has a uniquely determined measure extension to F = σ(A).
The proof is similar to, but a bit more complicated than, the finite case.
We may apply the measure extension theorem since every λ_α|R is obviously σ-finite.
1.43 Corollary. For every increasing and right-continuous function α : R → R there is a uniquely determined measure λ_α|B such that λ_α((a, b]) = α(b) − α(a).
For α(x) = x the measure λ_α = λ is called the Lebesgue measure.
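The identity λ_α((a, b]) = α(b) − α(a) can be tried out numerically. The sketch below (the helper name `stieltjes_content` is ad hoc, and only single half-open intervals are handled) evaluates interval contents for two choices of α: the identity, which gives Lebesgue measure, and a right-continuous step function, which places a point mass at its jump.

```python
def stieltjes_content(alpha):
    """Content of a half-open interval (a, b] under the Lebesgue-Stieltjes
    measure lambda_alpha: lambda_alpha((a, b]) = alpha(b) - alpha(a)."""
    def lam(a, b):
        assert a <= b
        return alpha(b) - alpha(a)
    return lam

# alpha(x) = x gives Lebesgue measure: the length of the interval.
lebesgue = stieltjes_content(lambda x: x)
print(lebesgue(0.0, 2.5))  # 2.5

# A right-continuous step alpha = 1_[0, inf) puts mass 1 at the point 0:
step = stieltjes_content(lambda x: 1.0 if x >= 0 else 0.0)
print(step(-1.0, 0.0))  # 1.0  (0 lies in (-1, 0], the jump of alpha)
print(step(0.0, 1.0))   # 0.0
```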
It should be noted that there are subsets of R (resp. R^d) that are not Borel sets. However, the construction of such sets can be very complicated.
However, the construction of such sets can be very complicated.
1.44 Review questions. State the measure extension theorem. Show how to apply
this theorem for deﬁning Borel measures on R.
Chapter 2
Measurable functions and random
variables
2.1 The idea of measurability
Let (Ω, A, µ) be a measure space and let (Y, B) be a measurable space. Moreover, let f : Ω → Y be a function. We are going to consider the problem of mapping the measure µ to the set Y by means of the function f.
The following example serves as a first motivation. More details concerning measure theoretic probability concepts are given in section 2.4.
2.1 Example. Distribution of a random variable
The concept of the distribution of a random variable is an important special case of mapping a measure from one set to another.
Let X be a random variable. This is a function from a probability space (Ω, A, P) to R. Since the probability space is a rather abstract object it is convenient to put the essentials of the random variable X into analytically tractable terms. Usually we are only interested in the probabilities P(X ∈ B), the collection of which is the distribution P^X, i.e. P^X(B) = P(X ∈ B), B ∈ B. The distribution is a set function on (R, B) and it is defined by mapping the probability measure P to (R, B) via the function X.
However, for defining P^X(B) it is essential that the expression P(X ∈ B) makes sense. This is the case iff the inverse image (X ∈ B) = X^{-1}(B) is in A. Therefore a random variable cannot be an arbitrary function X : Ω → R but must satisfy (X ∈ B) ∈ A for all B ∈ B. This property is called measurability.
2.2 Definition. A function f : (Ω, A) → (Y, B) is called (A, B)-measurable if f^{-1}(B) ∈ A for all B ∈ B.
If f : (Ω, A, µ) → (Y, B) is (A, B)-measurable then we may define
µ^f(B) := µ(f ∈ B) = µ(f^{-1}(B)), B ∈ B.
This is the image of µ under f or the distribution of f under µ.
2.3 Problem. (easy)
Show that µ^f is indeed a measure on B.
Let us agree upon some terminology.
(1) When we consider real-valued functions then we always use the Borel σ-field in the range of f. E.g.: if f : (Ω, F) → (R, B) then we simply say that f is F-measurable if we mean that it is (F, B)-measurable.
(2) When we consider functions f : R → R then (B, B)-measurability is called Borel measurability. The term "Borel" is thus concerned with the σ-field in the domain of f.
To get an idea what measurability means let us consider some simple examples.
2.4 Problem. (easy)
Let (Ω, F, µ) be a measure space and let f = 1_A where A ⊆ Ω.
(a) Show that f is F-measurable iff A ∈ F.
(b) Find µ^f.
It follows that even very complicated functions are Borel-measurable, e.g. f = 1_Q.
Recall that a simple function is a real-valued function which has only finitely many values. Any simple function f can be written as
f = ∑_{i=1}^n a_i 1_{F_i}
with F_i = (f = a_i), where {a_1, a_2, ..., a_n} denotes the set of different function values of f. This is the canonical representation of f.
Any linear combination of indicator functions is simple but need not be canonical. It is canonical iff both the sets supporting the indicators are pairwise disjoint and the coefficients are pairwise different. There is exactly one canonical representation.
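The canonical representation is easy to compute on a finite Ω: collect the distinct values and their level sets. A small Python sketch (function names are ad hoc, not part of the theory):

```python
def canonical_representation(f, omega):
    """Canonical representation of a simple function f on a finite set omega:
    the distinct values a_i together with the level sets F_i = (f = a_i).
    The F_i are pairwise disjoint and the a_i pairwise different."""
    values = sorted(set(f(w) for w in omega))
    return [(a, frozenset(w for w in omega if f(w) == a)) for a in values]

# 1_{1,2} + 2 * 1_{2,3} is simple but not canonical; its canonical form
# has the values 0, 1, 2, 3 on disjoint level sets.
omega = {0, 1, 2, 3}
f = lambda w: (1 if w in {1, 2} else 0) + (2 if w in {2, 3} else 0)
for a, F in canonical_representation(f, omega):
    print(a, sorted(F))
# 0 [0]
# 1 [1]
# 2 [3]
# 3 [2]
```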
2.5 Problem. (easy)
Let (Ω, F, µ) be a measure space and let f : Ω → R be a simple function.
(a) Show that f is F-measurable iff all sets of the canonical representation are in F.
(b) Find µ^f.
2.2 The basic abstract assertions
There are two fundamental principles for dealing with measurability. The first principle says that measurability is a property which is preserved under composition of functions.
2.6 Theorem. Let f : (Ω, A) → (Y, B) be (A, B)-measurable, and let g : (Y, B) → (Z, C) be (B, C)-measurable. Then g ◦ f is (A, C)-measurable.
2.7 Problem. (easy) Prove 2.6.
The second principle is concerned with checking measurability. For checking measurability it is sufficient to consider the sets in a generating system of the range σ-field.
2.8 Theorem. Let f : (Ω, A) → (Y, B) and let C be a generating system of B, i.e. B = σ(C). Then f is (A, B)-measurable iff f^{-1}(C) ∈ A for all C ∈ C.
Proof: Let D := {D ⊆ Y : f^{-1}(D) ∈ A}. It can be shown that D is a σ-field. If f^{-1}(C) ∈ A for all C ∈ C then C ⊆ D. This implies σ(C) ⊆ D. □
2.9 Problem. (intermediate) Fill in the details of the proof of 2.8.
2.10 Review questions. Explain the abstract concept of a measurable function.
State the basic abstract properties of measurable functions.
2.3 The structure of realvalued measurable functions
Let (Ω, F) be a measurable space. Let L(F) be the set of all F-measurable real-valued functions. We start with the most common and most simple criterion for checking measurability of a real-valued function.
2.11 Problem. (intermediate)
Show that a function f : Ω → R is F-measurable iff (f ≤ α) ∈ F for every α ∈ R.
Hint: Apply 2.8.
This provides us with a lot of examples of Borel-measurable functions.
2.12 Problem. (easy)
(a) Show that every monotone function f : R → R is Borel-measurable.
(b) Show that every continuous function f : R^n → R is B^n-measurable.
Hint: Note that (f ≤ α) is a closed set.
(c) Let f : (Ω, F) → R be F-measurable. Show that f^+, f^−, |f|, and every polynomial a_0 + a_1 f + ... + a_n f^n are F-measurable.
The next exercise is a ﬁrst step towards the measurability of expressions involving
several measurable functions.
2.13 Problem. (intermediate) Let f_1, f_2, ..., f_n be measurable functions. Then
f = (f_1, f_2, ..., f_n) : Ω → R^n
is (F, B^n)-measurable.
2.14 Corollary. Let f_1, f_2, ..., f_n be measurable functions. Then for every continuous function φ : R^n → R the composition φ(f_1, f_2, ..., f_n) is measurable.
Proof: Apply 2.6. □
2.15 Corollary. Let f_1, f_2 be measurable functions. Then f_1 + f_2, f_1 f_2, f_1 ∧ f_2 (the pointwise minimum), and f_1 ∨ f_2 (the pointwise maximum) are measurable functions.
2.16 Problem. Prove 2.15.
As a result we see that L(F) is a space of functions in which we may perform any algebraic operations without leaving the space. Thus it is a very convenient space for formal manipulations. Moreover, we may even perform all of those operations involving a countable set (e.g. a sequence) of measurable functions!
2.17 Theorem. Let (f_n)_{n∈N} be a sequence of measurable functions. Then sup_n f_n and inf_n f_n are measurable functions. Let A := (∃ lim_n f_n). Then A ∈ F and (lim_n f_n) 1_A is measurable.
Proof: Since
(sup_n f_n ≤ α) = ∩_n (f_n ≤ α)
it follows from 2.11 that sup_n f_n and inf_n f_n = −sup_n (−f_n) are measurable. We have
A := (∃ lim_n f_n) = ( sup_k inf_{n≥k} f_n = inf_k sup_{n≥k} f_n )
This implies A ∈ F. The last statement follows from
lim_n f_n = sup_k inf_{n≥k} f_n on A. □
Note that the preceding corollaries are only very special examples of the power of theorem 2.6. Roughly speaking, any function which can be written as an expression involving countably many operations on countably many measurable functions is measurable. It is rather difficult to construct non-measurable functions.
Next we turn to the question of what typical measurable functions look like. Let us denote the set of all F-measurable simple functions by S(F). Clearly, all limits of simple measurable functions are measurable. The remarkable fact, fundamental for almost everything in integration theory, is the converse of this statement.
2.18 Theorem. (a) Every measurable function f is the limit of some sequence of simple measurable functions.
(b) If f is bounded then the approximating sequence can be chosen to be uniformly convergent.
(c) If f ≥ 0 then the approximating sequence can be chosen to be increasing.
Proof: The fundamental statement is (c).
Let f ≥ 0. For every n ∈ N define
f_n := (k − 1)/2^n whenever (k − 1)/2^n ≤ f < k/2^n, k = 1, 2, ..., n2^n,
f_n := n whenever f ≥ n.
Then f_n ↑ f. If f is bounded then (f_n) converges uniformly to f. Parts (a) and (b) follow from f = f^+ − f^−. □
2.19 Problem. (easy)
Draw a diagram illustrating the construction of the proof of 2.18.
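In place of a diagram, the construction can also be carried out in code. The sketch below (names are ad hoc) implements the f_n from the proof of 2.18 and shows both the truncation at n and the pointwise monotone approximation:

```python
def dyadic_approx(f, n):
    """The n-th simple approximation from the proof of 2.18:
    f_n = (k-1)/2^n on ((k-1)/2^n <= f < k/2^n), k = 1, ..., n*2^n,
    and f_n = n where f >= n.  For f >= 0, f_n increases to f pointwise."""
    def fn(w):
        y = f(w)
        if y >= n:
            return float(n)
        # int(y * 2^n) / 2^n is exactly (k-1)/2^n for the right k
        return int(y * 2**n) / 2**n
    return fn

f = lambda w: w * w          # a nonnegative function on the reals
f3 = dyadic_approx(f, 3)
print(f3(0.5))   # 0.25   (f = 0.25 falls into [2/8, 3/8))
print(f3(10.0))  # 3.0    (f = 100 >= n = 3 is truncated)
# monotonicity in n at a fixed point:
print([dyadic_approx(f, n)(0.7) for n in (1, 2, 3, 4)])  # increasing to 0.49
```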
2.20 Review questions. Describe the structure of the set of real-valued measurable functions. Explain the role of simple functions.
2.4 Probability models
The term random variable is simply the probabilistic name for a measurable function. Let (Ω, F, P) be a probability space.
2.21 Definition. Any F-measurable real-valued function X : Ω → R is called a random variable.
Let X be a random variable. Then the function F_X : R → [0, 1] defined by
F_X(x) := P(X ≤ x), x ∈ R
is the distribution function of X. The distribution of X is P^X, i.e. the image of P under X defined by
P^X(B) := P(X^{-1}(B)) = P(X ∈ B), B ∈ B.
Thus, the distribution function F_X determines the values of the distribution P^X on intervals by
P^X((a, b]) = F_X(b) − F_X(a).
2.22 Problem. (easy)
(a) Show that any distribution function is right-continuous.
(b) Show that the distribution P^X = λ_{F_X}.
A major problem of probability theory is the converse problem: given a function F, does there exist a probability space (Ω, F, P) and a random variable X such that P^X = λ_F?
2.23 Problem. (easy)
Let F : R → [0, 1] be increasing. Show that λ_F is a σ-additive probability content iff F is right-continuous and satisfies F(−∞) = 0 and F(∞) = 1.
2.24 Definition. A distribution function is a function F : R → [0, 1] which is increasing, right-continuous and satisfies F(−∞) = 0 and F(∞) = 1.
2.25 Problem. (intermediate)
Let F be a distribution function. Show that there is a probability space (Ω, F, P) and a random variable X such that P(X ≤ x) = F(x).
Hint: Let Ω = R with the σ-field B(R), P = λ_F and X(ω) = ω.
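A closely related construction, not spelled out in the text, is the quantile transform: if U is uniform on (0, 1) then X := F^{-1}(U) has distribution function F, where F^{-1}(u) = inf{x : F(x) ≥ u}. The following sketch (helper names are ad hoc, and the support is assumed to lie in a bounded interval) checks this empirically:

```python
import random

def quantile(F, u, lo=-100.0, hi=100.0, iters=80):
    """Generalized inverse F^{-1}(u) = inf{x : F(x) >= u} by bisection.
    Assumes the distribution is supported in [lo, hi] (illustration only)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

# F of the uniform distribution on [0, 4]
F = lambda x: min(max(x / 4.0, 0.0), 1.0)

random.seed(0)
sample = [quantile(F, random.random()) for _ in range(10000)]
# Empirically, P(X <= x) should approximate F(x):
frac = sum(1 for x in sample if x <= 1.0) / len(sample)
print(frac)  # close to F(1) = 0.25
```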
2.26 Example. Joint distribution functions
Let X := (X_1, X_2, ..., X_d) be a random vector. Then the distribution
P^X(Q) := P(X ∈ Q), Q ∈ R_d,
is a σ-additive content on the algebra R_d of figures.
The joint distribution function of X is defined to be
F(x_1, x_2, ..., x_d) := P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_d ≤ x_d)
For defining joint distributions one usually goes the other way round and starts with a function F : R^d → R to define
P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_d ≤ x_d) := F(x_1, x_2, ..., x_d)
But this only makes sense if this definition can be extended to a σ-additive content P^X(Q) := P(X ∈ Q), Q ∈ R_d. There are conditions on F which guarantee the possibility of such an extension.
Further reading: Shiryaev [21], chapter II, paragraph 3, sections 13.
2.27 Review questions. Explain how the measure extension theorem is applied to
construct probability spaces and random variables with given distributions.
Chapter 3
Integral and expectation
3.1 The integral of simple functions
Let (Ω, F, µ) be a measure space. We start with defining the µ-integral of a measurable simple function.
3.1 Definition. Let f = ∑_{i=1}^n a_i 1_{F_i} be a nonnegative simple F-measurable function with its canonical representation. Then
∫ f dµ := ∑_{i=1}^n a_i µ(F_i)
is called the µ-integral of f.
We had to restrict the preceding definition to nonnegative functions since we admit the case µ(F) = ∞. If we were dealing with a finite measure µ the definition would work for all F-measurable simple functions.
3.2 Example.
Let (Ω, F, P) be a probability space and let X = ∑_{i=1}^n a_i 1_{F_i} be a simple random variable. Then we have E(X) = ∫ X dP.
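Definition 3.1 and Example 3.2 are directly computable. A sketch (ad-hoc helper names, exact rational arithmetic, a fair die as the measure space):

```python
from fractions import Fraction

def integral_simple(rep, mu):
    """Integral of a nonnegative simple function given by its canonical
    representation rep = [(a_i, F_i), ...]: sum of a_i * mu(F_i) (Def. 3.1)."""
    return sum(a * mu(F) for a, F in rep)

# A fair die: P(A) = |A|/6 on Omega = {1, ..., 6}.
P = lambda A: Fraction(len(A), 6)
# X = number shown has the canonical representation with F_i = {i}.
X = [(Fraction(i), frozenset({i})) for i in range(1, 7)]
print(integral_simple(X, P))  # 7/2, the expectation of a die roll
```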
3.3 Theorem. The µ-integral on S(F)^+ has the following properties:
(1) ∫ 1_F dµ = µ(F),
(2) ∫ (sf + tg) dµ = s ∫ f dµ + t ∫ g dµ if s, t ∈ R^+ and f, g ∈ S(F)^+,
(3) ∫ f dµ ≤ ∫ g dµ if f ≤ g and f, g ∈ S(F)^+.
Proof: The only nontrivial part is to prove that ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ. □
3.4 Problem. (intermediate)
Show that ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ for f, g ∈ S(F)^+.
Hint: Try to find the canonical representation of f + g in terms of the canonical representations of f and g.
It follows that the defining formula of the µ-integral can be applied to any (nonnegative) linear combination of indicators, not only to canonical representations!
3.5 Theorem. (Transformation theorem)
Let (Ω, F, µ) be a measure space and let g ∈ L(F). Then for every f ∈ S^+(B)
∫ f ◦ g dµ = ∫ f dµ^g
3.6 Problem. (easy)
Prove 3.5.
3.7 Problem. (easy)
Let (Ω, F, P) be a probability space and X a random variable with distribution function F. Explain the formula
E(f ◦ X) = ∫ f dλ_F
3.2 The extension to nonnegative functions
We know that every nonnegative measurable function f is the limit of an increasing sequence (f_n) of measurable simple functions: f_n ↑ f. It is a natural idea to think of the integral of f as something like
∫ f dµ := lim_n ∫ f_n dµ   (5)
This is actually the way we will succeed. But there are some points to worry about. The reader who does not like to worry should grasp Beppo Levi's theorem and proceed to the next section.
First of all, we should ask whether the limit on the right hand side exists. The integrals ∫ f_n dµ form an increasing sequence in [0, ∞]. This sequence either has a finite limit or it increases to ∞. Both cases are covered by our definition.
The second and far more subtle question is whether the definition is compatible with the definition of the integral on S(F). This is the only nontrivial part of the extension process of the integral and it is the point where σ-additivity of µ is required.
3.8 Theorem. Let f ∈ S(F)^+ and (f_n) ⊆ S(F)^+. Then
f_n ↑ f ⇒ lim_n ∫ f_n dµ = ∫ f dµ
Proof: Note that "≤" is clear. For an arbitrary ε > 0 let B_n := (f ≤ f_n (1 + ε)). It is clear that
∫ 1_{B_n} f dµ ≤ ∫ 1_{B_n} f_n (1 + ε) dµ ≤ (1 + ε) ∫ f_n dµ
From B_n ↑ Ω it follows that A ∩ B_n ↑ A and µ(A ∩ B_n) ↑ µ(A) by σ-additivity. Writing f = ∑_{j=1}^m α_j 1_{A_j} in canonical representation, we get
∫ f dµ = ∑_{j=1}^m α_j µ(A_j) = lim_n ∑_{j=1}^m α_j µ(A_j ∩ B_n) = lim_n ∫ 1_{B_n} f dµ
which implies
∫ f dµ ≤ (1 + ε) lim_n ∫ f_n dµ
Since ε is arbitrarily small the assertion follows. □
The third question is whether the value of the limit is independent of the approximating sequence. This is straightforward using 3.8.
3.9 Theorem. Let (f_n) and (g_n) be increasing sequences of nonnegative measurable simple functions. Then
lim_n f_n = lim_n g_n ⇒ lim_n ∫ f_n dµ = lim_n ∫ g_n dµ.
Proof: It is sufficient to prove the assertion with "≤" replacing "=". Since
lim_k (f_n ∧ g_k) = f_n ∧ lim_k g_k = f_n
we obtain by 3.8
∫ f_n dµ = lim_k ∫ f_n ∧ g_k dµ ≤ lim_k ∫ g_k dµ □
Thus, we have a valid definition (5) of the integral on L(F)^+. It is now straightforward that 3.3 carries over to L(F)^+.
3.10 Theorem. The µ-integral on L(F)^+ has the following properties:
(1) ∫ 1_F dµ = µ(F),
(2) ∫ (sf + tg) dµ = s ∫ f dµ + t ∫ g dµ if s, t ∈ R^+ and f, g ∈ L(F)^+,
(3) ∫ f dµ ≤ ∫ g dµ if f ≤ g and f, g ∈ L(F)^+.
The extension process is complete if we succeed in extending 3.8 to L(F)^+.
3.11 Theorem. (Theorem of Beppo Levi)
Let f ∈ L(F)^+ and (f_n) ⊆ L(F)^+. Then
f_n ↑ f ⇒ lim_n ∫ f_n dµ = ∫ f dµ
Proof: We have to show "≥".
For every n ∈ N let (f_{nk})_{k∈N} be an increasing sequence in S(F)^+ such that lim_k f_{nk} = f_n. Define
g_k := f_{1k} ∨ f_{2k} ∨ ... ∨ f_{kk}
Then
f_{nk} ≤ g_k ≤ f_k ≤ f whenever n ≤ k.
It follows that g_k ↑ f and
∫ f dµ = lim_k ∫ g_k dµ ≤ lim_k ∫ f_k dµ □
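Beppo Levi's theorem can be watched in action in the simplest measure space: the counting measure on N, where the integral of a nonnegative function is just the sum of its values (Problem 1.31). The sketch below (truncating N, names ad hoc) takes f_n := f · 1_{{1,...,n}} ↑ f:

```python
from fractions import Fraction

# Counting measure on N: the integral of f >= 0 is sum_k f(k).
def integral_counting(f, support):
    return sum(f(k) for k in support)

# Take f(k) = 2^(-k) and f_n := f * 1_{k <= n}.  Then f_n increases to f, and
# Beppo Levi says the integrals (the partial sums) increase to the full sum.
f = lambda k: Fraction(1, 2**k)
partials = [integral_counting(f, range(1, n + 1)) for n in (1, 2, 3, 10)]
print(partials)  # 1/2, 3/4, 7/8, 1023/1024, increasing towards 1
total = integral_counting(f, range(1, 61))   # the integral of f_60
print(1 - total)  # only 1/2^60 of the full integral (= 1) is missing
```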
3.12 Problem. (intermediate for mathematicians) Prove Fatou's lemma: For every sequence (f_n) of nonnegative measurable functions
liminf_n ∫ f_n dµ ≥ ∫ liminf_n f_n dµ
Hint: Recall that liminf_n x_n = lim_k inf_{n≥k} x_n. Consider g_k := inf_{n≥k} f_n and apply Levi's theorem to (g_k).
3.13 Problem. (intermediate for mathematicians)
For every sequence (f_n) of nonnegative measurable functions we have
∫ ∑_{n=1}^∞ f_n dµ = ∑_{n=1}^∞ ∫ f_n dµ
3.14 Problem. (intermediate)
Let f ∈ L(F)^+. Prove Markov's inequality:
µ(f > a) ≤ (1/a) ∫ f dµ, a > 0.
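Markov's inequality can be checked empirically for a probability measure, where it reads P(X > a) ≤ E(X)/a for X ≥ 0. A sketch with simulated exponential data (the data and seed are my own illustration, not part of the theory):

```python
import random

# Empirical check of Markov's inequality P(X > a) <= E(X)/a for X >= 0.
random.seed(1)
xs = [random.expovariate(1.0) for _ in range(100000)]  # X >= 0, E(X) = 1
mean = sum(xs) / len(xs)
for a in (1.0, 2.0, 5.0):
    tail = sum(1 for x in xs if x > a) / len(xs)
    print(a, tail <= mean / a)  # True for each a
```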
3.15 Problem. (intermediate)
Let f ∈ L(F)^+. Show that ∫ f dµ = 0 implies µ(f ≠ 0) = 0.
Hint: Show that µ(f > 1/n) = 0 for every n ∈ N.
An assertion A about a measurable function f is said to hold µ-almost everywhere (µ-a.e.) if µ(A^c) = 0. Using this terminology the assertion of the preceding exercise can be phrased as:
∫ f dµ = 0, f ≥ 0 ⇒ f = 0 µ-a.e.
If we are talking about probability measures and random variables, "almost everywhere" is sometimes replaced by "almost surely".
3.16 Problem. (easy)
Let f ∈ L(F)^+. Show that ∫ f dµ < ∞ implies µ(f > a) < ∞ for every a > 0.
3.3 Integrable functions
Now the integral is defined for every nonnegative measurable function. The value of the integral may be ∞. In order to define the integral for measurable functions which may take both positive and negative values we have to exclude infinite integrals.
3.17 Definition. A measurable function f is µ-integrable if ∫ f^+ dµ < ∞ and ∫ f^− dµ < ∞. If f is µ-integrable then
∫ f dµ := ∫ f^+ dµ − ∫ f^− dµ
The set of all µ-integrable functions is denoted by ℒ^1(µ) = ℒ^1(Ω, F, µ).
Proving the basic properties of the integral of integrable functions is an easy matter. We collect these facts in a couple of problems.
3.18 Problem. (easy)
Show that f ∈ L(F) is µ-integrable iff ∫ |f| dµ < ∞.
3.19 Theorem. The set ℒ^1(µ) is a linear space and the µ-integral is a linear functional on ℒ^1(µ).
3.20 Problem. (intermediate)
Prove 3.19.
3.21 Theorem. The µ-integral is an isotonic functional on ℒ^1(µ).
3.22 Problem. (easy)
Prove 3.21.
3.23 Problem. (easy)
Let f ∈ ℒ^1(µ). Show that |∫ f dµ| ≤ ∫ |f| dµ.
3.24 Problem. (easy)
Let f be a measurable function and assume that there is an integrable function g such that |f| ≤ g. Then f is integrable.
3.25 Problem. (easy)
(a) Discuss the question whether bounded measurable functions are integrable.
(b) Characterize those measurable simple functions which are integrable.
For notational convenience we denote
∫_A f dµ := ∫ 1_A f dµ, A ∈ F.
3.26 Problem. (easy)
(a) Let f be a measurable function such that f = 0 µ-a.e. Then f is integrable and ∫ f dµ = 0.
(b) Let f be an integrable function. Then
f = 0 µ-a.e. ⇔ ∫_A f dµ = 0 for all A ∈ F
3.27 Problem. (easy)
(a) Let f and g be measurable functions such that f = g µ-a.e. Then f is integrable iff g is integrable.
(b) Let f and g be integrable functions. Then
f = g µ-a.e. ⇔ ∫_A f dµ = ∫_A g dµ for all A ∈ F
Many assertions in measure theory concerning measurable functions are stable under linear combinations and under convergence. Assertions of such a type need only be proved for indicators. The procedure of proving (understanding) an assertion for indicators and extending it to nonnegative and then to integrable functions is called measure theoretic induction.
3.28 Problem. (easy)
Extend the transformation theorem by measure theoretic induction.
3.29 Problem. (easy)
Show that integrals are linear with respect to the integrating measure.
3.4 Convergence
One of the reasons for the great success of abstract integration theory is its convergence theorems for integrals. The problem is the following. Assume that (f_n) is a sequence of integrable functions converging to some function f. When can we conclude that f is integrable and
lim_n ∫ f_n dµ = ∫ f dµ ?
The most popular result concerning this issue is Lebesgue's theorem on dominated convergence.
3.30 Theorem. (Dominated convergence theorem)
Let (f_n) be a sequence of measurable functions which is dominated by an integrable function g, i.e. |f_n| ≤ g, n ∈ N. Then
f_n → f µ-a.e. ⇒ f ∈ ℒ^1(µ) and lim_n ∫ f_n dµ = ∫ f dµ
The dominated convergence theorem can be used and applied like a black box without being aware of its proof. However, the proof is very easy and follows straightforwardly from Levi's theorem 3.11 and Fatou's lemma 3.12.
Proof: Integrability of f is obvious. Moreover, the sequences (g − f_n) and (g + f_n) consist of nonnegative measurable functions. Therefore we may apply Fatou's lemma:
∫ (g − f) dµ ≤ liminf_n ∫ (g − f_n) dµ = ∫ g dµ − limsup_n ∫ f_n dµ
and
∫ (g + f) dµ ≤ liminf_n ∫ (g + f_n) dµ = ∫ g dµ + liminf_n ∫ f_n dµ
This implies
∫ f dµ ≤ liminf_n ∫ f_n dµ ≤ limsup_n ∫ f_n dµ ≤ ∫ f dµ □
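The standard textbook illustration (my own example, not from these notes) is f_n(x) = x^n on [0, 1] with Lebesgue measure: the sequence is dominated by g = 1, converges to 0 a.e. (everywhere except x = 1), and indeed ∫ f_n dλ = 1/(n+1) → 0. A numerical sketch using a midpoint rule as a stand-in for the Lebesgue integral:

```python
# Dominated convergence on ([0,1], Lebesgue): f_n(x) = x^n, |f_n| <= 1.
def riemann(f, n_steps=100000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n_steps
    return sum(f((i + 0.5) * h) for i in range(n_steps)) * h

for n in (1, 2, 10, 100):
    print(n, riemann(lambda x, n=n: x**n))  # about 1/(n+1), tending to 0
```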
3.31 Problem. (easy)
Show that under the assumptions of the dominated convergence theorem we even have
lim_n ∫ |f_n − f| dµ = 0
(This type of convergence is called mean convergence.)
3.32 Problem. (easy)
Discuss the question whether a uniformly bounded sequence of measurable functions is dominated in the sense of the dominated convergence theorem.
Chapter 4
Selected topics
4.1 Spaces of integrable functions
We know that the space ℒ^1 = ℒ^1(Ω, F, µ) is a vector space. We would like to define a norm on ℒ^1.
A natural idea is to define
||f||_1 := ∫ |f| dµ, f ∈ ℒ^1.
It is easy to see that this definition has the following properties:
(1) ||f||_1 ≥ 0, and f = 0 ⇒ ||f||_1 = 0,
(2) ||f + g||_1 ≤ ||f||_1 + ||g||_1, f, g ∈ ℒ^1,
(3) ||λf||_1 = |λ| ||f||_1, λ ∈ R, f ∈ ℒ^1.
However, we only have
||f||_1 = 0 ⇒ f = 0 µ-a.e.
A function with zero norm need not be identically zero! Therefore ||·||_1 is not a norm on ℒ^1 but only a pseudonorm.
In order to get a normed space one has to change the space ℒ^1 in such a way that all functions f = g µ-a.e. are considered as equal. Then f = 0 µ-a.e. can be considered as the null element of the vector space. The space of integrable functions modified in this way is denoted by L^1 = L^1(Ω, F, µ).
4.1 Discussion.
For those readers who want to have hard facts instead of soft wellness we provide
some details.
For any f ∈ L(F) let
f̃ := {g ∈ L(F) : f = g µ-a.e.}
denote the equivalence class of f. Then integrability is a class property and the space
L^1 := {f̃ : f ∈ ℒ^1}
is a vector space. The value of the integral depends only on the class and therefore it defines a linear functional on L^1 having the usual properties. In particular,
||f̃||_1 := ||f||_1
defines a norm on L^1.
It is common practice to work with L^1 instead of ℒ^1 but to write f instead of f̃. This is a typical example of what mathematicians call abuse of language.
4.2 Theorem. The space L^1(Ω, F, µ) is a Banach space.
Proof: Let (f_n) be a Cauchy sequence in L^1, i.e.
∀ε > 0 ∃N(ε) such that ∫ |f_n − f_m| dµ < ε whenever n, m ≥ N(ε).
Let n_i := N(1/2^i). Then
∫ |f_{n_{i+1}} − f_{n_i}| dµ < 1/2^i
It follows that for all k ∈ N
∫ ( |f_{n_1}| + |f_{n_2} − f_{n_1}| + ... + |f_{n_{k+1}} − f_{n_k}| ) dµ ≤ C < ∞
Hence the corresponding infinite series converges, which implies that
|f_{n_1}| + ∑_{i=1}^∞ |f_{n_{i+1}} − f_{n_i}| < ∞ µ-a.e.
Since absolute convergence of series in R implies convergence (here the completeness of R comes in), the partial sums
f_{n_1} + (f_{n_2} − f_{n_1}) + ... + (f_{n_k} − f_{n_{k−1}}) = f_{n_k}
converge to some limit f µ-a.e. Mean convergence of (f_n) follows from Fatou's lemma by
∫ |f_n − f| dµ = ∫ lim_{k→∞} |f_n − f_{n_k}| dµ ≤ liminf_{k→∞} ∫ |f_n − f_{n_k}| dµ < ε whenever n ≥ N(ε). □
Let
ℒ^2 = ℒ^2(Ω, F, µ) := {f ∈ L(F) : ∫ f^2 dµ < ∞}
This is another important function space.
4.3 Problem. (easy)
(a) Show that ℒ^2 is a vector space.
(b) Show that ∫ f^2 dµ < ∞ is a property of the µ-equivalence class of f ∈ L(F).
By L^2 = L^2(Ω, F, µ) we again denote the corresponding space of equivalence classes. On this space there is an inner product
⟨f, g⟩ := ∫ f g dµ, f, g ∈ L^2.
The corresponding norm is
||f||_2 = ⟨f, f⟩^{1/2} = ( ∫ f^2 dµ )^{1/2}
4.4 Theorem. The space L^2(Ω, F, µ) is a Hilbert space.
4.2 Measures with densities
Let (Ω, F, µ) be a measure space and let f ∈ L^+(F).
4.5 Problem. (intermediate)
Show that ν : A ↦ ∫_A f dµ, A ∈ F, is a measure.
We would like to say that f is the density of ν with respect to µ, but for doing so we have to be sure that f is uniquely determined by ν. This is not true in general.
4.6 Lemma. Let f, g ∈ L^+(F). Then
∫_A f dµ = ∫_A g dµ ∀A ∈ F ⇒ µ((f ≠ g) ∩ A) = 0 whenever µ(A) < ∞.
In other words: f = g µ-a.e. on every set of finite µ-measure.
Proof: Let µ(M) < ∞ and define M_n := M ∩ (f ≤ n) ∩ (g ≤ n). Since f 1_{M_n} and g 1_{M_n} are µ-integrable it follows that f 1_{M_n} = g 1_{M_n} µ-a.e. For n → ∞ we have M_n ↑ M, which implies f 1_M = g 1_M µ-a.e. □
Now the basic uniqueness theorem follows immediately.
4.7 Theorem. If µ is finite or σ-finite then
∫_A f dµ = ∫_A g dµ ∀A ∈ F ⇒ f = g µ-a.e.
4.8 Definition. Let µ be σ-finite and define
ν : A ↦ ∫_A f dµ, A ∈ F.
Then ν = fµ, and f =: dν/dµ is called the density or the Radon-Nikodym derivative of ν with respect to µ.
Which measures have densities w.r.t. other measures?
4.9 Problem. (easy)
Let ν = fµ. Show that µ(A) = 0 implies ν(A) = 0, A ∈ F.
4.10 Definition. Let µ|F and ν|F be measures. The measure ν is said to be absolutely continuous w.r.t. the measure µ|F (written ν ≪ µ) if
µ(A) = 0 ⇒ ν(A) = 0, A ∈ F.
We saw that absolute continuity is necessary for having a density. It is even sufficient.
4.11 Theorem. (Radon-Nikodym theorem)
Assume that µ is σ-finite. If ν ≪ µ then ν = fµ for some f ∈ L^+(F).
Proof: See Bauer, [2]. □
We will meet measures with densities frequently when we explore stochastic analysis. Therefore at this point we present the two most important special cases.
The first example deals with Lebesgue-Stieltjes measures. Let α : R → R be an increasing function which is continuous on R and differentiable except at finitely many points. We will show that λ_α = α'λ.
4.12 Problem. (intermediate)
Let α : R → R be an increasing function which is continuous on R and not differentiable at at most finitely many points. Show that λ_α = α'λ.
A density w.r.t. the Lebesgue measure is called a Lebesgue density.
4.13 Problem. (intermediate)
Let P and Q be probability measures on a finite field F.
(1) State Q ≪ P in terms of the generating partition of F.
(2) If Q ≪ P find dQ/dP.
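On the finite field generated by a partition, a density must be constant on each atom A, and absolute continuity forces Q(A) = 0 wherever P(A) = 0, so dQ/dP = Q(A)/P(A) on the atoms with P(A) > 0. A computational sketch (ad-hoc names, exact arithmetic):

```python
from fractions import Fraction

def density(Q, P, partition):
    """dQ/dP for two probability measures on the finite field generated by
    `partition`: on an atom A the density is constant, Q(A)/P(A).
    Requires Q << P, i.e. P(A) = 0 implies Q(A) = 0."""
    f = {}
    for A in partition:
        if P(A) == 0:
            assert Q(A) == 0, "Q is not absolutely continuous w.r.t. P"
            f[A] = Fraction(0)  # the value on a P-null atom is irrelevant
        else:
            f[A] = Q(A) / P(A)
    return f

atoms = (frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6}))
P = lambda A: Fraction(len(A), 6)                  # uniform on {1,...,6}
weights = {1: 2, 2: 2, 3: 0, 4: 1, 5: 1, 6: 0}    # Q gives no mass to 3
Q = lambda A: Fraction(sum(weights[w] for w in A), 6)
f = density(Q, P, atoms)
for A in atoms:
    print(sorted(A), f[A])
# [1, 2] 2
# [3] 0
# [4, 5, 6] 2/3
```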
Finally, we have to ask how µ-integrals can be transformed into ν-integrals.
4.14 Problem. (intermediate)
Let ν = fµ. Discuss the validity of
∫ g dν = ∫ g (dν/dµ) dµ
Hint: Prove it for g ∈ S^+(F) and extend it by measure theoretic induction.
4.15 Problem.
Let (Ω, F, P) be a probability space and X a random variable with differentiable distribution function F. Explain the formulas
P(X ∈ B) = ∫_B F'(t) dt and E(g ◦ X) = ∫ g(t) F'(t) dt
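Both formulas can be sanity-checked numerically. For X exponentially distributed, F(t) = 1 − e^{−t} and F'(t) = e^{−t}; with g(t) = t², both E(g ◦ X) and ∫ g(t)F'(t) dt equal 2. A sketch comparing a quadrature of the right-hand side with a Monte Carlo estimate of the left-hand side (all names and numerical choices are my own):

```python
import math, random

def quad(h, lo, hi, n=200000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    step = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * step) for i in range(n)) * step

# integral of g(t) F'(t) dt with g(t) = t^2, F'(t) = e^(-t), truncated at 50:
rhs = quad(lambda t: t * t * math.exp(-t), 0.0, 50.0)
# Monte Carlo estimate of E(g(X)) for X ~ exponential(1):
random.seed(2)
lhs = sum(random.expovariate(1.0) ** 2 for _ in range(200000)) / 200000
print(round(rhs, 3))       # 2.0
print(abs(lhs - 2.0) < 0.1)  # True (up to Monte Carlo noise)
```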
4.3 Iterated integration
UNDER CONSTRUCTION
Part II
Probability theory
Chapter 5
Measure theoretic language of
probability
5.1 Basics
Let (Ω, F, P) be a probability space. The probability space serves as a model of a random experiment. The σ-field F is the field of observable events. Observability of A means that after performing the random experiment it can be decided whether A is realized or not. In this sense the σ-field can be identified with the information which is obtained after having performed the random experiment.
The probability measure P gives to each event A ∈ F a probability P(A). The intuitive nature of the probability will become clear later.
A function X : Ω → R is a random variable if assertions about the function value (e.g. (X ∈ B), B ∈ B) are observable, i.e. are in F. Therefore, a random variable is simply an F-measurable function.
Let X be a nonnegative or integrable random variable. Then the integral of X is called the expectation of X and is denoted by
E(X) = ∫ X dP
5.2 The information set of random variables
The information set of a random variable X is the family of events which can be expressed in terms of X, i.e. the family of events (X ∈ B), B ∈ B. This is a sub-σ-field of F and is denoted by σ(X). It is called the σ-field generated by X.
5.1 Problem. (easy)
Show that σ(X) is a σﬁeld.
5.2 Problem. (intermediate)
(a) Let X be an indicator random variable. Find σ(X).
(b) Let X be a simple random variable. Find σ(X).
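For a simple random variable, part (b) has a concrete answer: the events (X = a), for the distinct values a of X, form a partition, and σ(X) consists of all unions of these atoms. On a finite Ω this σ-field can be enumerated (sketch with ad-hoc names):

```python
from itertools import combinations

def sigma_of_simple(X, omega):
    """sigma(X) for a simple random variable X on a finite omega:
    all unions of the atoms (X = a), where a runs through the values of X."""
    atoms = [frozenset(w for w in omega if X(w) == a)
             for a in set(X(w) for w in omega)]
    events = set()
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            events.add(frozenset().union(*combo))
    return events

omega = {1, 2, 3, 4}
X = lambda w: w % 2          # the indicator of the odd numbers
sx = sigma_of_simple(X, omega)
print(len(sx))                  # 4: the events are {}, {1,3}, {2,4}, omega
print(frozenset({1, 3}) in sx)  # True
```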
Let X and Y be random variables such that Y = f ◦ X where f is some Borel-measurable function. Since (Y ∈ B) = (X ∈ f^{-1}(B)) it follows that σ(Y) ⊆ σ(X). In other words: if Y is causally dependent on X then the information set of Y is contained in the information set of X. This is intuitively very plausible: any assertion about Y can be stated as an assertion about X.
It is a remarkable fact that even the converse is true.
5.3 Theorem. (Causality theorem)
Let X and Y be random variables such that σ(Y) ⊆ σ(X). Then there exists a Borel-measurable function f such that Y = f ◦ X.
Proof: By measure theoretic induction it is sufficient to prove the assertion for Y = 1_A, A ∈ F.
Recall that σ(Y) = {∅, Ω, A, A^c}. From σ(Y) ⊆ σ(X) it follows that A ∈ σ(X), i.e. A = (X ∈ B) for some B ∈ B. This means 1_A = 1_B ◦ X. □
5.3 Independence
The notion of independence marks the point where probability theory goes beyond
abstract measure theory.
Recall that two events A, B ∈ ℱ are independent if the product formula P(A ∩ B) = P(A)P(B) holds. This is easily extended to families of events.
5.4 Definition. Let 𝒞 and 𝒟 be subfamilies of ℱ. The families 𝒞 and 𝒟 are said to be independent (with respect to P) if P(A ∩ B) = P(A)P(B) for every choice A ∈ 𝒞 and B ∈ 𝒟.
It is natural to call random variables X and Y independent if the corresponding
information sets are independent.
5.5 Definition. Two random variables X and Y are independent if σ(X) and σ(Y) are independent.
How can independence of random variables be checked? Is it sufficient to check the independence of generators of the information sets? This is not true in general, but with a minor modification it is.
5.6 Theorem. Let X and Y be random variables and let 𝒞 and 𝒟 be generators of the corresponding information sets. If 𝒞 and 𝒟 are independent and closed under intersection then X and Y are independent.
5.7 Problem. (intermediate)
Let F(x, y) be the joint distribution function of (X, Y ). Show that X and Y are
independent iff F(x, y) = h(x)k(y) for some functions h and k.
For independent random variables there is a product formula for expectations.
5.8 Theorem. (1) Let X ≥ 0 and Y ≥ 0 be independent random variables. Then

E(XY) = E(X)E(Y)

(2) Let X ∈ L¹ and Y ∈ L¹ be independent random variables. Then XY ∈ L¹ and

E(XY) = E(X)E(Y)

Proof: Apply measure theoretic induction. □
Chapter 6
Conditional expectation
6.1 The concept
Let us explore the relation between a random variable and a σ-field. Let (Ω, ℱ, P) be a probability space and let 𝒜 ⊆ ℱ be a sub-σ-field.
If a random variable X is 𝒜-measurable then the information available in 𝒜 tells us everything about X. If the random variable X is not 𝒜-measurable we could be interested in finding an optimal 𝒜-measurable approximation of X in a sense to be specified. This program leads to the concept of conditional expectation.
A successful way consists in decomposing X into a sum Y + R where Y is 𝒜-measurable and R is uncorrelated to 𝒜. If we require that E(X) = E(Y) then E(R) = 0. The condition on R of being uncorrelated to 𝒜 means

∫_A R dP = 0 for all A ∈ 𝒜.

In other words the approximating variable Y should satisfy

∫_A X dP = ∫_A Y dP for all A ∈ 𝒜

For these integrals to be defined we need nonnegative or integrable random variables.
6.1 Definition. Let (Ω, ℱ, P) be a probability space and let 𝒜 ⊆ ℱ be a sub-σ-field. Let X be a nonnegative or integrable random variable. The conditional expectation E(X|𝒜) of X given 𝒜 is an 𝒜-measurable random variable Y satisfying

∫_A X dP = ∫_A Y dP for all A ∈ 𝒜
6.2 Theorem. The conditional expectation E(X|𝒜) exists if X is integrable or X ≥ 0.
Proof: This is a consequence of the Radon-Nikodym theorem. If X ≥ 0 then µ(A) := ∫_A X dP defines a σ-finite measure on 𝒜 such that µ ≪ P. Define E(X|𝒜) := dµ/dP. If X is integrable apply the preceding to X⁺ and X⁻. □
6.3 Problem.
(a) The conditional expectation E(X|𝒜) is uniquely determined P-a.e.
(b) If X ≥ 0 then E(X|𝒜) ≥ 0 P-a.e.
(c) If X is integrable then E(X|𝒜) is integrable, too.
6.4 Problem. E(E(X|𝒜)) = E(X)
6.5 Problem. Find the conditional expectation given a finite field.
6.6 Problem. If X is 𝒜-measurable then E(X|𝒜) = X.
6.7 Problem. If X is independent of 𝒜 then E(X|𝒜) = E(X).
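As a concrete illustration of problem 6.5: when 𝒜 is generated by a finite partition, E(X|𝒜) is constant on each partition cell and equals the cell average ∫_{A_i} X dP / P(A_i). The following sketch checks the defining property on a small finite sample space; the sample points, weights and partition are invented for illustration.

```python
# Conditional expectation w.r.t. the sigma-field generated by a finite
# partition: on each atom A_i the value is (integral of X over A_i) / P(A_i).
# Omega, P, X and the partition are arbitrary illustrative choices.
omega = [0, 1, 2, 3, 4, 5]
prob = {0: 0.1, 1: 0.2, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.1}   # P({w})
X = {0: 1.0, 1: 3.0, 2: 2.0, 3: 5.0, 4: 4.0, 5: 0.0}
partition = [[0, 1], [2, 3], [4, 5]]   # atoms generating the sub-sigma-field

def cond_exp(X, prob, partition):
    """Return E(X | A) as a dict w -> value, constant on every atom."""
    Y = {}
    for atom in partition:
        p_atom = sum(prob[w] for w in atom)                   # P(A_i)
        avg = sum(X[w] * prob[w] for w in atom) / p_atom      # cell average
        for w in atom:
            Y[w] = avg
    return Y

Y = cond_exp(X, prob, partition)

# Defining property: integrals of X and Y agree on every atom,
# and summing over the atoms gives E(E(X|A)) = E(X) (problem 6.4).
for atom in partition:
    lhs = sum(X[w] * prob[w] for w in atom)
    rhs = sum(Y[w] * prob[w] for w in atom)
    assert abs(lhs - rhs) < 1e-12
```

The same averaging formula is the answer to problem 6.5 for any finite field, since a finite field is generated by its atoms.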
6.2 Properties
Since the defining property of the conditional expectation is linear in X, the following two assertions are almost obvious:
(1) Assume that X and Y are nonnegative random variables. Then
E(αX + βY | 𝒜) = αE(X|𝒜) + βE(Y|𝒜) whenever α, β ≥ 0.
(2) Assume that X and Y are integrable random variables. Then
E(αX + βY | 𝒜) = αE(X|𝒜) + βE(Y|𝒜) whenever α, β ∈ R.
6.8 Theorem. Iterated conditioning
UNDER CONSTRUCTION
6.9 Theorem. Redundant conditioning
UNDER CONSTRUCTION
Inequalities
6.10 Problem. Show that X ≤ Y implies E(X|𝒜) ≤ E(Y|𝒜). Distinguish between the nonnegative and the integrable case.
6.11 Problem. Show that |E(X|𝒜)| ≤ E(|X| | 𝒜) if X is integrable.
6.12 Theorem. Jensen’s inequality
UNDER CONSTRUCTION
Further topics:
• L² hereditary property
• Cauchy-Schwarz inequality
Projection properties
UNDER CONSTRUCTION
Convergence
6.13 Problem. Show that X_n → X in L¹ implies E(X_n|𝒜) → E(X|𝒜) in L¹.
6.14 Problem. Prove a Fatou's lemma for conditional expectations.
6.15 Problem. Prove a Lebesgue's dominated convergence theorem for conditional expectations.
6.3 Calculation
• given a random variable, causality theorem
• given a dominating product measure
• given several random variables
• the Gaussian case
Chapter 7
Stochastic sequences
7.1 The ruin problem
One player
Let us start with a very simple gambling system.
A gambler bets a stake of one unit at subsequent games. The games are independent and p denotes the probability of winning. In case of winning the gambler's return is the double stake, otherwise the stake is lost.
A stochastic model of such a gambling system consists of a probability space (Ω, ℱ, P) and a sequence of random variables (X_i)_{i≥1}. The random variables are independent with values +1 and −1 representing the gambler's gain or loss at time i ≥ 1. Thus, we have P(X_i = 1) = p. The sequence of partial sums, i.e. the accumulated gains,

S_n = X_1 + X_2 + ··· + X_n

is called a random walk on Z starting at zero. If p = 1/2 then it is a symmetric random walk.
A major foundation problem of probability theory is the question whether there exists a stochastic model for a random walk. We do not discuss such questions but refer to the literature.
Assume that the gambler starts at i = 0 with capital V_0 = a. Then her wealth after n games is

V_n = a + X_1 + X_2 + ··· + X_n = a + S_n

The sequence (V_n)_{n≥0} of partial sums is called a random walk starting at a.
We assume that the gambler plans to continue gambling until her wealth is c > a or 0. Let

T_x := min{n : V_n = x}

Then q_0(a) := P(T_0 < T_c | V_0 = a) is called the probability of ruin. Similarly, q_c(a) := P(T_c < T_0 | V_0 = a) is the probability of winning.
How can the probability of ruin be evaluated? It will turn out that the probability can be obtained by studying the dynamic behaviour of the gambling situation. Thus, this is a basic example of a situation which is typical for stochastic analysis: probabilities are not only obtained by combinatorial methods but also, and often much more easily, by a dynamic argument resulting in a difference or differential equation.
The starting point is the following assertion.
7.1 Lemma. The ruin probabilities satisfy the difference equation

q_c(a) = p q_c(a + 1) + (1 − p) q_c(a − 1) whenever 0 < a < c

with boundary conditions q_c(0) = 0 and q_c(c) = 1.
It is illuminating to understand the assertion with the help of a heuristic argument: if the random walk starts at V_0 = a, 0 < a < c, then we have V_1 = a + 1 with probability p and V_1 = a − 1 with probability 1 − p. This gives

P(T_c < T_0 | V_0 = a) = p P(T_c < T_0 | V_1 = a + 1) + (1 − p) P(T_c < T_0 | V_1 = a − 1)

However, the random walk starting at time i = 1 has the same ruin probabilities as the random walk starting at i = 0. This proves the assertion. In this argument we utilized the intuitively obvious fact that the starting time of the random walk does not affect its ruin probabilities.
In order to calculate the ruin probabilities we have to solve the difference equation.
7.2 Discussion. Solving the ruin equation. The difference equation

x_a = p x_{a+1} + (1 − p) x_{a−1} whenever a = 1, ..., c − 1

has the general solution

x_a = A + B ((1 − p)/p)^a   if p ≠ 1/2,
x_a = A + B a               if p = 1/2.

The constants A and B are determined by the boundary conditions. This gives

q_c(a) = (((1 − p)/p)^a − 1) / (((1 − p)/p)^c − 1)   if p ≠ 1/2,
q_c(a) = a/c                                          if p = 1/2.

In order to calculate q_0(a) we note that q_0(a) = q̄_c(c − a) where q̄ denotes the ruin probabilities of a random walk with interchanged transition probabilities. This implies

q_0(a) = ((p/(1 − p))^{c−a} − 1) / ((p/(1 − p))^c − 1)   if p ≠ 1/2,
q_0(a) = (c − a)/c                                        if p = 1/2.

Easy calculations show that

q_c(a) + q_0(a) = 1

which means that gambling ends with probability 1.
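The closed form of 7.2 is easy to probe numerically. The following sketch (function names are ours) evaluates q_c(a) and compares it with a Monte Carlo estimate of P(T_c < T_0 | V_0 = a); the parameters are arbitrary illustrative choices.

```python
import random

def win_prob(a, c, p):
    """q_c(a): probability of reaching c before 0, starting from a."""
    if p == 0.5:
        return a / c
    r = (1 - p) / p
    return (r ** a - 1) / (r ** c - 1)

def simulate_win(a, c, p, runs=20000, rng=None):
    """Monte Carlo estimate of q_c(a) by running the random walk."""
    rng = rng or random.Random(1)
    wins = 0
    for _ in range(runs):
        v = a
        while 0 < v < c:
            v += 1 if rng.random() < p else -1
        wins += (v == c)
    return wins / runs

# symmetric case: q_c(a) = a/c
assert abs(win_prob(3, 10, 0.5) - 0.3) < 1e-12
est = simulate_win(3, 10, 0.5)
assert abs(est - 0.3) < 0.02   # agrees with the formula up to Monte Carlo error
```

Note that every simulated game terminates, in line with q_c(a) + q_0(a) = 1.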
7.3 Problem. (intermediate)
(a) Fill in the details of solving the difference equation of the ruin problem.
(b) Show that the random walk hits the boundaries almost surely (with probability
one).
Two players
Now we assume that two players with initial capitals a and b are playing against each
other. The stake of each player is 1 at each game. The game ends when one player is
ruined.
This is obviously equivalent to the situation of the preceding section, leading to

P(player 1 wins) = q_{a+b}(a)
P(player 2 wins) = q_0(a).

We know that the game ends with probability one.
Let us turn to the situation where player 1 has unlimited initial capital. Then the game can only end by the ruin of player 2, i.e. if

sup_n S_n ≥ b

where S_n denotes the accumulated gain of player 1.
7.4 Theorem. Let (S_n) be a random walk on Z. Then

P(sup_n S_n ≥ b) = 1                 whenever p ≥ 1/2,
P(sup_n S_n ≥ b) = (p/(1 − p))^b     whenever p < 1/2.
7.5 Problem. (advanced)
Prove 7.4.
Hint: Show that P(sup_n S_n ≥ b) = lim_{a→∞} q_{a+b}(a).
Note that P(sup_n S_n ≥ 1) is the probability that a gambler with unlimited initial capital gains 1 at some time. If p ≥ 1/2 this happens with probability 1 if we wait sufficiently long. Later we will see that in a fair game (p = 1/2) the expected waiting time is infinite.
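Theorem 7.4 can also be probed by simulation. For p < 1/2 the walk drifts to −∞, so truncating the time horizon introduces only a negligible bias. A rough sketch; the parameters and function names are our choices.

```python
import random

def hit_prob(p, b):
    """P(sup_n S_n >= b) according to Theorem 7.4."""
    return 1.0 if p >= 0.5 else (p / (1 - p)) ** b

def simulate_hit(p, b, runs=5000, horizon=500, rng=None):
    """Fraction of time-truncated walks that ever reach level b."""
    rng = rng or random.Random(2)
    hits = 0
    for _ in range(runs):
        s = 0
        for _ in range(horizon):
            s += 1 if rng.random() < p else -1
            if s >= b:
                hits += 1
                break
    return hits / runs

p, b = 0.4, 3
exact = hit_prob(p, b)          # (2/3)^3 = 8/27, roughly 0.296
est = simulate_hit(p, b)
assert abs(est - exact) < 0.03  # Monte Carlo + truncation error
```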
7.2 Stopping times
Optional stopping
Let us consider the question whether gambling chances can be improved by a gambling
system.
Let us start with a particularly simple gambling system, called an optional stopping system. The idea is as follows: the gambler waits up to a random time σ and then starts gambling. (The game at period σ + 1 is the first game to play.) Gambling is continued until a further random time τ ≥ σ and then stops. (The game at period τ is the last game to play.) Random times are random variables σ, τ : Ω → N_0 ∪ {∞}.
It is an important point that the choice of the starting time σ and the stopping time τ depends only on the information available up to those times, since the gambler does not know the future.
Filtrations and stopping times
Let X_1, X_2, ..., X_k, ... be a sequence of random variables representing the outcomes of a game at times k = 1, 2, ....
7.6 Definition. The σ-field ℱ_k := σ(X_1, X_2, ..., X_k), which is generated by the events (X_1 ∈ B_1, X_2 ∈ B_2, ..., X_k ∈ B_k), B_i ∈ ℬ, is called the past of the sequence (X_i) at time k.
7.7 Problem. (advanced)
State and prove a causality theorem for σ(X_1, X_2, ..., X_k)-measurable random variables.
Hint: Let 𝒞 be the generating system of σ(X_1, X_2, ..., X_k) and let 𝒟 be the family of events whose indicators are functions of (X_1, X_2, ..., X_k). Show that 𝒟 is a σ-field and that 𝒞 ⊆ 𝒟. This implies that any indicator of a set in σ(X_1, X_2, ..., X_k) is a function of (X_1, X_2, ..., X_k). Extend this result by measure theoretic induction.
The past at time k is the information set of the beginning (X_1, X_2, ..., X_k) of the sequence (X_i). The history of the game is the family of σ-fields (ℱ_k)_{k≥0} where ℱ_0 = {∅, Ω}. The history is an increasing sequence of σ-fields representing the increasing information in the course of time.
7.8 Definition. Any increasing sequence of σ-fields is called a filtration.
7.9 Definition. A sequence (X_k) of random variables is adapted to a filtration (ℱ_k)_{k≥0} if X_k is ℱ_k-measurable for every k ≥ 0.
Clearly, every sequence of random variables is adapted to its own history.
Now we are in the position to give a formal definition of a stopping time.
7.10 Definition. Let (ℱ_k) be a filtration. A random variable τ : Ω → N_0 ∪ {∞} is a stopping time (relative to the filtration (ℱ_k)) if

(τ = k) ∈ ℱ_k for every k ∈ N.
7.11 Problem. Let (ℱ_k) be a filtration and let τ : Ω → N_0 ∪ {∞} be a random variable. Show that the following assertions are equivalent:
(a) (τ = k) ∈ ℱ_k for every k ∈ N
(b) (τ ≤ k) ∈ ℱ_k for every k ∈ N
(c) (τ < k) ∈ ℱ_{k−1} for every k ∈ N
(d) (τ ≥ k) ∈ ℱ_{k−1} for every k ∈ N
(e) (τ > k) ∈ ℱ_k for every k ∈ N
7.12 Problem. Let (X_n)_{n≥0} be adapted. Show that the hitting time

τ = min{k ≥ 0 : X_k ∈ B}

is a stopping time for any B ∈ ℬ. (Note that τ = ∞ if X_k ∉ B for all k ∈ N.)
In view of the causality theorem the realisation of the events (τ = k) is determined by the values of the random variables X_1, X_2, ..., X_k, i.e.

1_{(τ=k)} = f_k(X_1, X_2, ..., X_k)

for suitable functions f_k.
Wald’s equation
If our gambler applies a stopping system (σ, τ) with finite stopping times then her gain is S_τ − S_σ.
7.13 Problem. (easy)
Let (X_k) be a sequence adapted to (ℱ_k) and let τ be a finite stopping time. Then X_τ is a random variable.
Does the stopping system improve the gambler's chances? We require some preparations.
7.14 Problem. (easy)
Let Z be a random variable with values in N_0. Show that E(Z) = Σ_{k=1}^∞ P(Z ≥ k).
7.15 Theorem. Wald's equation
Let (X_k) be an independent sequence of integrable random variables with a common expectation E(X_k) = µ. If τ is an integrable stopping time then S_τ is integrable and

E(S_τ) = µ E(τ)

Proof: We will show that the equation is true both for the positive parts and the negative parts of X_k. Let X_k ≥ 0. Then

E(S_τ) = Σ_{k=1}^∞ ∫_{(τ=k)} S_k dP = Σ_{i=1}^∞ ∫_{(τ≥i)} X_i dP = Σ_{i=1}^∞ E(X_i) P(τ ≥ i) = µ E(τ)

The middle step uses that (τ ≥ i) is determined by X_1, ..., X_{i−1} and hence is independent of X_i. Note that all terms are ≥ 0, which allows interchanging summation and integration. □
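Wald's equation can be illustrated numerically with the two-sided stopping time of problem 7.16 below; the boundaries, the value of p and the function names are our choices. The sample averages of S_τ and of µτ should agree up to Monte Carlo error.

```python
import random

def sample_stopped_walk(p, lower, upper, rng):
    """Run S_n until it first hits lower or upper; return (S_tau, tau)."""
    s, n = 0, 0
    while lower < s < upper:
        s += 1 if rng.random() < p else -1
        n += 1
    return s, n

rng = random.Random(3)
p = 0.6
mu = 2 * p - 1                       # E(X_k) = p - (1 - p)
runs = 20000
tot_s = tot_tau = 0
for _ in range(runs):
    s, tau = sample_stopped_walk(p, -5, 10, rng)
    tot_s += s
    tot_tau += tau

mean_s, mean_tau = tot_s / runs, tot_tau / runs
# Wald's equation: E(S_tau) = mu * E(tau)
assert abs(mean_s - mu * mean_tau) < 0.3
```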
7.16 Problem. (advanced)
Let τ := min{k ≥ 0 : S_k = −a or S_k = b}.
(a) Show that Wald's equation is valid.
Hint: Let τ_n := τ ∧ n. Show that τ_n satisfies Wald's equation. Let n → ∞ to obtain integrability of τ.
(b) Calculate E(τ).
7.17 Problem. (intermediate)
Let (S_k) be a symmetric random walk and let τ := min{k ≥ 0 : S_k = 1}. Show that E(τ) = ∞.
Hint: Assume E(τ) < ∞ and derive a contradiction.
The following theorem answers our question for improving chances by stopping
strategies. It shows that unfavourable games cannot be turned into fair games and
fair games cannot be turned into favourable games. The result is a consequence of
Wald’s equation and it is the prototype of the fundamental optional stopping theorem
of stochastic analysis.
7.18 Theorem. Optional stopping of random walks
Let (X_k) be an independent sequence of integrable random variables with a common expectation E(X_k) = µ. Let σ ≤ τ be integrable stopping times. Then:
(a) µ < 0 ⇒ E(S_τ − S_σ) < 0.
(b) µ = 0 ⇒ E(S_τ − S_σ) = 0.
(c) µ > 0 ⇒ E(S_τ − S_σ) > 0.
7.3 Gambling systems
Next we generalize our gambling system. We are going to admit that the gambler may
vary the stakes. The stopping system is a special case where only 0 and 1 are admitted
as stakes.
The stake for game n is denoted by H_n and has to be nonnegative. It is fixed before period n and therefore must be ℱ_{n−1}-measurable, since it is determined by the outcomes at times k = 1, 2, ..., n − 1. The sequence of stakes (H_n) is thus not only adapted but even predictable.
7.19 Problem.
Determine the stakes H_n corresponding to a stopping system and check predictability.
Hint: Show that H_n = 1_{(σ,τ]}(n).
The gain at game k is H_k X_k = H_k(S_k − S_{k−1}) = H_k ∆S_k. For the wealth of the gambler after n games we obtain

V_n = V_0 + Σ_{k=1}^n H_k(S_k − S_{k−1}) = V_0 + Σ_{k=1}^n H_k ∆S_k    (6)

If the stakes are integrable then we have

E(V_n) = E(V_{n−1}) + E(H_n)E(X_n).

In particular, if p = 1/2 and V_0 = 0 we have E(V_n) = 0 for all n ∈ N.
However, if the total gambling time is unbounded (but finite!) then this is no longer true.
7.20 Example. Doubling strategy
Let τ be the waiting time for the first success, i.e.

τ = min{k ≥ 1 : X_k = 1}

and define

H_n := 2^{n−1} 1_{(τ≥n)}

Obviously, the stakes are integrable. However, we have

P(V_τ = 1) = 1    (7)

for any p ∈ (0, 1). Therefore, a fair game can be transformed into a favourable game by such a strategy. And this is true although the stopping time τ is integrable, actually E(τ) = 1/p!
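The doubling strategy can be played out in code: every run ends with wealth exactly 1, whatever p, because the win of 2^{τ−1} exactly covers the accumulated losses 1 + 2 + ··· + 2^{τ−2} plus one unit. A minimal sketch, with our own function name:

```python
import random

def play_doubling(p, rng):
    """Bet 2^{n-1} at game n until the first win; return (wealth, tau)."""
    wealth, stake, n = 0, 1, 0
    while True:
        n += 1
        if rng.random() < p:        # win: gain the current stake and stop
            return wealth + stake, n
        wealth -= stake             # loss: the stake is gone
        stake *= 2                  # double the stake for the next game

rng = random.Random(4)
for p in (0.2, 0.5, 0.8):
    for _ in range(1000):
        v, tau = play_doubling(p, rng)
        assert v == 1               # P(V_tau = 1) = 1, equation (7)
```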
7.21 Problem.
Prove (7).
The assertion of the optional stopping theorem remains valid for gambling systems
if the stopping times are bounded.
7.22 Theorem. Optional stopping for gambling systems
Let (X_k) be an independent sequence of integrable random variables with a common expectation E(X_k) = µ. Let (V_n) be the sequence of wealths generated by a gambling system with integrable stakes. If σ ≤ τ are bounded stopping times then
(a) µ < 0 ⇒ E(V_τ − V_σ) < 0,
(b) µ = 0 ⇒ E(V_τ − V_σ) = 0,
(c) µ > 0 ⇒ E(V_τ − V_σ) > 0.

Proof: Let N := max τ. Since

V_τ − V_σ = Σ_{k=1}^N H_k X_k 1_{(σ<k≤τ)}

and since (σ < k ≤ τ) is independent of X_k, it follows that

E(V_τ − V_σ) = µ Σ_{k=1}^N E(H_k 1_{(σ<k≤τ)}). □
7.23 Problem. (easy)
Let p = 1/2. Show that for the doubling strategy we have E(V_{τ∧n}) = 0.
7.24 Problem. (advanced)
Explain for the doubling strategy why Lebesgue's theorem on dominated convergence does not imply E(V_{τ∧n}) → E(V_τ), although V_{τ∧n} → V_τ.
Hint: Show that the sequence (V_{τ∧n}) is not dominated from below by an integrable random variable.
7.4 Martingales
Let (X_n)_{n≥0} be a sequence of integrable random variables adapted to a filtration (ℱ_n)_{n≥0} with ℱ_0 = {∅, Ω}.
In gambler's speech gambling systems are called martingales. This might be the reason for the following mathematical terminology.
7.25 Definition. Let (ℱ_k) be a filtration and let (Y_k) be an adapted sequence of integrable random variables.
(1) The sequence (Y_k) is called a martingale if E(Y_σ) = E(Y_τ) for all bounded stopping times σ ≤ τ.
(2) The sequence (Y_k) is called a submartingale if E(Y_σ) ≤ E(Y_τ) for all bounded stopping times σ ≤ τ.
(3) The sequence (Y_k) is called a supermartingale if E(Y_σ) ≥ E(Y_τ) for all bounded stopping times σ ≤ τ.
We defined martingales by the property that τ → E(X_τ) is constant for bounded stopping times τ. This property can be rewritten in terms of conditional expectations. We start with a fundamental identity.
7.26 Lemma. If σ ≤ τ ≤ n are bounded stopping times then for any A ∈ ℱ_σ

∫_A (X_τ − X_σ) dP = Σ_{j=1}^n ∫_{A∩(σ<j≤τ)} (E(X_j|ℱ_{j−1}) − X_{j−1}) dP
Proof: Let τ ≤ n. It is obvious that

X_τ = X_0 + Σ_{j≤τ} (X_j − X_{j−1}) = X_0 + Σ_{j=1}^n 1_{(τ≥j)} (X_j − X_{j−1})

This gives

∫_A (X_τ − X_0) dP = Σ_{j=1}^n ∫_{A∩(τ≥j)} (X_j − X_{j−1}) dP

We may replace X_j by E(X_j|ℱ_{j−1}). □
7.27 Problem. Fill in the details of the proof of 7.26.
7.28 Theorem. The sequence (X_n)_{n≥0} is a martingale iff

E(X_j|ℱ_{j−1}) = X_{j−1}, j ≥ 1.    (8)

Proof: The "if" part of the assertion is clear from 7.26.
Let F ∈ ℱ_{j−1} and define

τ := j − 1 whenever ω ∈ F,   τ := j whenever ω ∉ F.

Then τ is a stopping time. From E(X_j) = E(X_τ) the "only if" part follows. □
Equation (8) is the common definition of a martingale.
7.29 Problem. Extend 7.28 to submartingales and supermartingales.
We conclude this section with the elementary version of the celebrated Doob-Meyer decomposition.
7.30 Theorem. Each adapted sequence (X_n)_{n≥0} of integrable random variables can be written as

X_n = M_n + A_n, n ≥ 0,

where (M_n) is a martingale and (A_n) is a predictable sequence, i.e. A_n is ℱ_{n−1}-measurable for every n ≥ 0.
The decomposition is unique up to constants.
The sequence (X_n)_{n≥0} is a submartingale iff (A_n) is increasing, it is a supermartingale iff (A_n) is decreasing, and it is a martingale iff (A_n) is constant.
Proof: Let

M_n = X_0 + Σ_{j=1}^n (X_j − E(X_j|ℱ_{j−1}))

and

A_n = Σ_{j=1}^n (E(X_j|ℱ_{j−1}) − X_{j−1})

This proves existence of the decomposition. Uniqueness follows from the fact that a predictable martingale is constant. The rest is obvious. □
7.31 Problem. Show that a predictable martingale is constant.
Equation (8) extends easily to stopping times after having defined the past of a stopping time.
7.32 Problem. Let τ be a stopping time. Show that

ℱ_τ := {A ∈ ℱ : A ∩ (τ ≤ j) ∈ ℱ_j, j ≥ 0}

is a σ-field (the past of the stopping time τ).
7.33 Theorem. Optional stopping for martingales
Let (X_n)_{n≥0} be a martingale. Then for any pair σ ≤ τ of bounded stopping times

E(X_τ|ℱ_σ) = X_σ

Proof: Applying 7.26 to A ∈ ℱ_σ proves the assertion. □
7.5 Convergence
UNDER CONSTRUCTION
Chapter 8
The Wiener process
8.1 Basic concepts
In this section we introduce the Wiener process or Brownian motion process.
A stochastic process (random process) on a probability space (Ω, ℱ, P) is a family (X_t)_{t≥0} of random variables. The parameter t is usually interpreted as time. Therefore, the intuitive notion of a stochastic process is that of a random system whose state at time t is X_t.
There are some notions related to a stochastic process (X_t)_{t≥0} which are important from the very beginning: the starting value X_0, the increments X_t − X_s for s < t, and the paths t → X_t(ω), ω ∈ Ω.
The Wiener process is defined in terms of these notions.
The Wiener process is deﬁned in terms of these notions.
8.1 Definition. A stochastic process (W_t)_{t≥0} is called a Wiener process if
(1) the starting value is W_0 = 0,
(2) the increments W_t − W_s are N(0, t − s)-distributed and mutually independent for nonoverlapping intervals,
(3) the paths are continuous for P-almost all ω ∈ Ω.
8.2 Remark. As is the case with every probability model one has to ask whether there exist a probability space (Ω, ℱ, P) and a family of random variables (W_t) satisfying the properties of Definition 8.1. The mathematical construction of such models is a complicated matter and is one of the great achievements of probability theory in the first half of the 20th century. Accepting the existence of the Wiener process as a valid mathematical model we may forget the details of the construction (there are several of them) and start with the axioms stated in 8.1. (Further reading: Karatzas-Shreve [15], section 2.2.)
8.3 Discussion. Wiener process as random walk
Later we will show:
Any process (X_t)_{t≥0} with continuous paths and independent increments such that E(X_t − X_s) = 0 and V(X_t − X_s) = t − s is necessarily a Wiener process.
This means that there are three structural properties which are essential for the concept of a Wiener process:
(1) The process has independent increments.
(2) The expectation of the increments is zero.
(3) The variance of the increments is proportional to the length of the time interval.
Let us motivate these properties by means of specific discrete time models.
Let X_1, X_2, ... be independent and such that P(X_i = 1) = P(X_i = −1) = 1/2. Then

S_n = X_1 + X_2 + ··· + X_n whenever n = 1, 2, ...

is called a symmetric random walk. It is easy to see that the increments S_n − S_m are independent. Moreover, we have

E(X_i) = 0 ⇒ E(S_n − S_m) = 0 and V(X_i) = 1 ⇒ V(S_n − S_m) = n − m

Thus, the Wiener process can be interpreted as a continuous time version of a symmetric random walk.
8.4 Problem. (easy) Let (W_t)_{t≥0} be a Wiener process. Show that X_t := −W_t, t ≥ 0, is a Wiener process, too.
8.5 Problem. (intermediate) Show that W_t/t → 0 in probability as t → ∞.
8.6 Definition. The past of a process (X_t)_{t≥0} at time t is the σ-field of events ℱ_t^X = σ(X_s : s ≤ t) generated by the variables X_s of the process prior to t, i.e. s ≤ t. The internal history of (X_t)_{t≥0} is the family (ℱ_t^X)_{t≥0} of pasts of the process.
The intuitive idea behind the concept of past is the following: ℱ_t^X consists of all events which are observable if one observes the process up to time t. It represents the information about the process available at time t. It is obvious that t_1 < t_2 ⇒ ℱ_{t_1}^X ⊆ ℱ_{t_2}^X. If X_0 is a constant then ℱ_0^X = {∅, Ω}.
Independence of increments is actually a much stronger property than it sounds.
8.7 Theorem. The increments W_t − W_s of a Wiener process (W_t)_{t≥0} are independent of the past ℱ_s^W.
Proof: Let s_1 ≤ s_2 ≤ ... ≤ s_n ≤ s < t. Then the random variables

W_{s_1}, W_{s_2} − W_{s_1}, ..., W_{s_n} − W_{s_{n−1}}, W_t − W_s

are independent. It follows that even the random variables W_{s_1}, W_{s_2}, ..., W_{s_n} are independent of W_t − W_s. Since this is valid for any choice of time points s_i ≤ s, the independence assertion carries over to the whole past ℱ_s^W. □
8.2 Quadratic variation
For beginners the most surprising properties are the path properties of a Wiener process.
The paths of a Wiener process are continuous (which is part of our definition). In this respect the paths seem to be not complicated since they have no jumps or other singularities. It will turn out, however, that in spite of their continuity, the paths of a Wiener process are of a very peculiar nature.
8.8 Remark. (This remark is based on section 10.1.)
Recall that for a function f : [0, ∞) → R with bounded variation on compacts we have

lim_n Σ_{i=1}^n |f(t_i^n) − f(t_{i−1}^n)| = V_0^t(f) < ∞

for each Riemannian sequence of subdivisions 0 = t_0^n < t_1^n < ... < t_n^n = t and every fixed t > 0. Recall that all smooth (continuously differentiable) functions are of bounded variation on compacts. For such functions it follows that their quadratic variation, defined as

lim_n Σ_{i=1}^n |f(t_i^n) − f(t_{i−1}^n)|²

along Riemannian sequences, is necessarily zero for every t > 0.
8.9 Problem. (easy for mathematicians) Show that the quadratic variation of a continuous BV-function is zero on every compact interval.
We will now show that (almost) all paths of a Wiener process have nonvanishing quadratic variation. This implies that the paths cannot be smooth. Actually, it can be shown that they are nowhere differentiable. (Further reading: Karatzas-Shreve [15], section 2.9.)
8.10 Theorem. Let (W_t)_{t≥0} be a Wiener process. For every t > 0 and every Riemannian sequence of subdivisions 0 = t_0^n < t_1^n < ... < t_n^n = t

Σ_{i=1}^n |W(t_i^n) − W(t_{i−1}^n)|² → t in probability.

8.11 Problem. (intermediate) Prove 8.10.
Hint: Let Q_n := Σ_{i=1}^n |W(t_i^n) − W(t_{i−1}^n)|² for a particular Riemannian sequence of subdivisions. Show that E(Q_n) = t and V(Q_n) → 0. Then the assertion follows from Chebyshev's inequality.
The assertion of 8.10 can be improved to P-almost sure convergence, which implies that the quadratic variation on [0, t] of almost all paths is actually t. It is remarkable that the quadratic variation of the Wiener process is a deterministic function of a very simple nature.
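The concentration asserted in 8.10 is easy to see in simulation: for a discretized Brownian path on [0, t], the sum of squared increments has expectation t and variance 2t²/n, so it clusters tightly around t for fine meshes. A sketch; the discretization parameters are our choices.

```python
import random

def quadratic_variation(t, n, rng):
    """Simulate W on [0, t] with n equal steps; return sum of squared increments."""
    dt = t / n
    qv = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, dt ** 0.5)   # increment ~ N(0, dt)
        qv += dw * dw
    return qv

rng = random.Random(5)
t = 1.0
qv = quadratic_variation(t, 100000, rng)
# E(Q_n) = t and V(Q_n) = 2 t^2 / n -> 0, so Q_n is close to t
assert abs(qv - t) < 0.05
```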
8.3 Martingales
We start with some general definitions.
8.12 Definition. Any increasing family of σ-fields (ℱ_t)_{t≥0} is called a filtration. A process (Y_t)_{t≥0} is adapted to the filtration (ℱ_t)_{t≥0} if Y_t is ℱ_t-measurable for every t ≥ 0.
The internal history (ℱ_t^X)_{t≥0} of a process (X_t)_{t≥0} is a filtration and the process (X_t)_{t≥0} is adapted to its internal history. But also Y_t := φ(X_t) for any measurable function φ is adapted to the internal history of (X_t)_{t≥0}. Adaptedness simply means that the past of the process (Y_t)_{t≥0} at time t is contained in ℱ_t. Having the information contained in ℱ_t we know everything about the process up to time t.
8.13 Definition. A martingale relative to the filtration (ℱ_t)_{t≥0} is an adapted and integrable stochastic process (X_t)_{t≥0} such that

E(X_t|ℱ_s) = X_s whenever s < t

It is a square integrable martingale if E(X_t²) < ∞, t ≥ 0.
8.14 Problem. (easy)
Show that the martingale property remains valid if the filtration is replaced by another filtration consisting of smaller σ-fields, provided that the process is still adapted.
Let us consider some important martingales related to the Wiener process. Let (ℱ_t)_{t≥0} be the internal history of the Wiener process (W_t)_{t≥0}.
8.15 Theorem. A Wiener process is a square integrable martingale with respect to its internal history.
Proof: Since W_t − W_s is independent of ℱ_s it follows that E(W_t − W_s|ℱ_s) = E(W_t − W_s) = 0. Hence E(W_t|ℱ_s) = E(W_s|ℱ_s) = W_s. □
A nonlinear function of a martingale typically is not a martingale. But the next theorem is a first special case of a very general fact: it is sometimes possible to correct a process by a bounded variation process in such a way that the result is a martingale.
8.16 Theorem. The process W_t² − t is a square integrable martingale with respect to the internal history of the driving Wiener process (W_t)_{t≥0}.
Proof: Note that

W_t² − W_s² = (W_t − W_s)² + 2W_s(W_t − W_s)

This gives

E(W_t² − W_s²|ℱ_s) = E((W_t − W_s)²|ℱ_s) + 2E(W_s(W_t − W_s)|ℱ_s) = t − s □
The assertion of 8.16 can be written in the following way: for X_t = W_t² there is a decomposition X_t = M_t + A_t where (M_t)_{t≥0} is a martingale and (A_t)_{t≥0} is a bounded variation process. Such a decomposition is a mathematical form of the idea that a process X_t is the sum of a (rapidly varying) noise component and a (slowly varying) trend component.
8.17 Problem. (easy) Let (X_t)_{t≥0} be any process with independent increments such that E(X_t) = 0 and E(X_t²) = t. Show that assertions 8.15 and 8.16 are valid for (X_t)_{t≥0}.
8.18 Theorem. The process exp(aW_t − a²t/2) is a martingale with respect to the internal history of the driving Wiener process (W_t)_{t≥0}.
Proof: Use e^{aW_t} = e^{a(W_t − W_s)} e^{aW_s} to obtain

E(e^{aW_t}|ℱ_s) = E(e^{a(W_t − W_s)}) e^{aW_s} = e^{a²(t−s)/2} e^{aW_s} □

The process

ℰ(W)_t := exp(W_t − t/2)

is called the exponential martingale of (W_t)_{t≥0}.
UNDER CONSTRUCTION: MAXIMAL INEQUALITY
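The normalization in 8.18 can be checked by simulation: E(exp(aW_t − a²t/2)) = 1 for every t, whereas E(exp(aW_t)) alone grows like e^{a²t/2}. A minimal sketch, with a and t chosen arbitrarily:

```python
import math
import random

def mean_exponential_martingale(a, t, runs, rng):
    """Monte Carlo estimate of E[exp(a*W_t - a^2*t/2)], where W_t ~ N(0, t)."""
    total = 0.0
    for _ in range(runs):
        w = rng.gauss(0.0, math.sqrt(t))
        total += math.exp(a * w - a * a * t / 2.0)
    return total / runs

rng = random.Random(6)
est = mean_exponential_martingale(0.5, 1.0, 200000, rng)
assert abs(est - 1.0) < 0.02   # the martingale has constant expectation 1
```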
8.19 Problem. (intermediate) Let (Ω, ℱ, P) be a probability space and let (ℱ_t)_{t≥0} be a filtration. Let Q ∼ P be an equivalent probability measure. Denote P_t := P|ℱ_t and Q_t := Q|ℱ_t.
(a) Show that P_t ∼ Q_t for every t ≥ 0.
(b) Show that dQ_t/dP_t = E(dQ/dP | ℱ_t) for every t ≥ 0.
(c) Show that the process dQ_t/dP_t is a positive martingale such that dQ_t/dP_t > 0 P-a.e. for every t ≥ 0.
(d) Prove the so-called "Bayes formula":

E_Q(X|ℱ_t) = E_P(X dQ/dP | ℱ_t) / E_P(dQ/dP | ℱ_t)

whenever X ≥ 0 or X ∈ L¹(Q).
8.20 Problem. (intermediate) Let (W_t)_{t≥0} be a Wiener process on a probability space (Ω, ℱ, P) and (ℱ_t)_{t≥0} its internal history. Define

dQ_t/dP_t := e^{aW_t − a²t/2}, t ≥ 0.

(a) Show that there is a uniquely determined probability measure Q|ℱ_∞ such that Q|ℱ_t = Q_t, t ≥ 0.
(b) Show that for a ≠ 0 the measure Q|ℱ_∞ is not equivalent to P|ℱ_∞; in fact the two measures are mutually singular (use 8.5), although Q_t ∼ P_t for every t ≥ 0.
(c) Show that W̃_t := W_t − at is a Wiener process under Q.
Hint: Prove that for s < t

E_Q(e^{λ(W̃_t − W̃_s)} | ℱ_s) = e^{λ²(t−s)/2}.
8.4 Stopping times
Let (X_t)_{t≥0} be a right continuous adapted process such that X_0 = 0 and for some a > 0 let

τ = inf{t ≥ 0 : X_t ≥ a}

The random variable τ is called a first passage time: it is the time when the process hits the level a for the first time. By right continuity of the paths we have

τ ≤ t ⇔ max_{s≤t} X_s ≥ a    (9)

Thus, we have (τ ≤ t) ∈ ℱ_t for all t ≥ 0.
8.21 Problem. (easy for mathematicians) Prove (9).
8.22 Definition. A random variable τ : Ω → [0, ∞] is called a stopping time if (τ ≤ t) ∈ ℱ_t for all t ≥ 0.
8.23 Problem. (intermediate for mathematicians) Show that every bounded stopping time τ is the limit of a decreasing sequence of bounded stopping times each of which has only finitely many values.
Hint: Let T = max τ. Define τ_n = k/2^n whenever (k − 1)/2^n < τ ≤ k/2^n, k = 0, ..., T·2^n.
Let (M_t)_{t≥0} be a martingale. Then we have E(M_t) = E(M_0) for every t ≥ 0. This can be extended to stopping times. The following is the simplest form of the famous optional stopping theorem.
8.24 Theorem. (Optional stopping theorem) Let (M_t)_{t≥0} be a martingale with right-continuous paths and let τ be a bounded stopping time. Then E(M_τ) = E(M_0).
Proof: Assume that τ ≤ T. Let τ_n ↓ τ where the τ_n are bounded stopping times with finitely many values. Then it follows from the discrete version of the optional stopping theorem that E(M_{τ_n}) = E(M_0). Clearly we have M_{τ_n} → M_τ. Since M_{τ_n} = E(M_T|ℱ_{τ_n}) the sequence (M_{τ_n}) is uniformly integrable and the assertion follows. □
8.25 Problem. (intermediate) Let (X_t)_{t≥0} be an integrable process adapted to a filtration (ℱ_t)_{t≥0}. Show that (a) implies (b):
(a) E(X_σ) = E(X_0) for all bounded stopping times σ.
(b) (X_t)_{t≥0} is a martingale.
Hint: For s < t and F ∈ ℱ_s define

τ := s whenever ω ∈ F,   τ := t whenever ω ∉ F.

Then τ is a stopping time. From E(X_t) = E(X_τ) the assertion follows.
First passage times of the Wiener process

As an application of the optional stopping theorem we derive the distribution of first passage times of the Wiener process.

8.26 Theorem. Let (W_t)_{t≥0} be a Wiener process and for a > 0 and b ∈ R define

τ_{a,b} := inf{t : W_t ≥ a + bt}

Then we have

E(e^{−λτ_{a,b}} 1_{(τ_{a,b}<∞)}) = e^{−a(b+√(b²+2λ))},  λ ≥ 0
Proof: Applying the optional stopping theorem to the exponential martingale of the Wiener process we get

E(e^{θW_τ − θ²τ/2}) = 1

for every θ ∈ R and every bounded stopping time τ. Therefore this equation is true for τ_n := τ_{a,b} ∧ n for every n ∈ N. We note that (use 8.5)

e^{θW_{τ_n} − θ²τ_n/2} → e^{θW_{τ_{a,b}} − θ²τ_{a,b}/2} 1_{(τ_{a,b}<∞)}  in P-probability.

Applying the dominated convergence theorem it follows (at least for sufficiently large θ) that

E(e^{θW_{τ_{a,b}} − θ²τ_{a,b}/2} 1_{(τ_{a,b}<∞)}) = 1

The rest are easy computations. Since W_{τ_{a,b}} = a + bτ_{a,b} on (τ_{a,b} < ∞) we get

E(e^{(θb − θ²/2)τ_{a,b}} 1_{(τ_{a,b}<∞)}) = e^{−aθ}

Putting λ := −θb + θ²/2 proves the assertion. □
8.27 Problem. (advanced) Fill in the details of the proof of 8.26.

8.28 Problem. (easy) In the following problems treat the cases b > 0, b = 0 and b < 0 separately.
(a) Find P(τ_{a,b} < ∞).
(b) Find E(τ_{a,b}).
8.29 Problem. (easy)
(a) Does the assertion of the optional sampling theorem hold for the martingale (W_t)_{t≥0} and τ_{a,b}?
(b) Does the assertion of the optional sampling theorem hold for the martingale (W_t² − t)_{t≥0} and τ_{a,b}?
8.30 Problem. (intermediate)
(a) Show that P(τ_{0,b} = 0) = 1 for every b > 0. (Consider E(e^{−λτ_{a_n,b}}) for a_n ↓ 0.) Give a verbal interpretation of this result.
(b) Show that P(max_t W_t = ∞, min_t W_t = −∞) = 1.
(c) Conclude from (a) that almost all paths of (W_t)_{t≥0} infinitely often cross every horizontal line.
From 8.26 we obtain the distribution of the first passage times.

8.31 Corollary. Let τ_{a,b} be defined as in 8.26. Then

P(τ_{a,b} ≤ t) = 1 − Φ((a + bt)/√t) + e^{−2ab} Φ((−a + bt)/√t),  t ≥ 0
Proof: Let G(t) := P(τ_{a,b} ≤ t) and let F_{a,b}(t) denote the right hand side of the asserted equation. We want to show that F_{a,b}(t) = G(t), t ≥ 0. For this we will apply the uniqueness of the Laplace transform. Note that 8.26 says that

∫₀^∞ e^{−λt} dG(t) = e^{−a(b+√(b²+2λ))},  λ ≥ 0

Therefore, we have to show that

∫₀^∞ e^{−λt} dF_{a,b}(t) = e^{−a(b+√(b²+2λ))},  λ ≥ 0

This is done by the following simple calculations. First, it is shown that

F_{a,b}(t) = (1/√(2π)) ∫₀^t (a/s^{3/2}) exp(−a²/(2s) − b²s/2 − ab) ds

(This is done by calculating the derivatives of both sides.) Then it follows that

e^{ab} ∫₀^t e^{−λs} dF_{a,b}(s) = e^{a√(b²+2λ)} F_{a,√(b²+2λ)}(t)

Putting t = ∞ the assertion follows. □
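The Laplace-transform identity at the heart of this proof can be checked numerically: the integrand above shows that dF_{a,b}/dt = a t^{−3/2} exp(−(a + bt)²/(2t))/√(2π). A rough midpoint-rule quadrature (an illustrative sketch, not part of the notes) reproduces the closed form e^{−a(b+√(b²+2λ))}:

```python
import math

def density(t, a, b):
    """dF_{a,b}/dt = a * t^(-3/2) * exp(-(a + b*t)^2 / (2t)) / sqrt(2*pi)."""
    return a * t ** -1.5 * math.exp(-(a + b * t) ** 2 / (2 * t)) / math.sqrt(2 * math.pi)

def laplace_numeric(lam, a, b, t_max=80.0, n=200_000):
    """Midpoint rule for the Laplace transform of F_{a,b}."""
    h = t_max / n
    return sum(math.exp(-lam * (i + 0.5) * h) * density((i + 0.5) * h, a, b)
               for i in range(n)) * h

a, b, lam = 1.0, 0.5, 0.3
closed_form = math.exp(-a * (b + math.sqrt(b * b + 2 * lam)))
assert abs(laplace_numeric(lam, a, b) - closed_form) < 1e-3
```

Note that with λ = 0 the same quadrature recovers P(τ_{a,b} < ∞), since the density integrates to e^{−a(b+|b|)}.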
8.32 Problem. (requires calculation skills) Fill in the details of the proof of 8.31.

8.33 Problem. (easy) Find the distribution of max_{s≤t} W_s.
The following problems are concerned with first passage times for two horizontal boundaries. Let c, d > 0 and define

σ_{c,d} = inf{t : W_t ∉ (−c, d)}
8.34 Problem. (easy)
(a) Show that σ_{c,d} is a stopping time.
(b) Show that P(σ_{c,d} < ∞) = 1.

For σ_{c,d} the application of the optional sampling theorem is straightforward since |W_t| ≤ max{c, d} for t ≤ σ_{c,d}.
8.35 Problem. (easy) Find the distribution of W_{σ_{c,d}}.
Hint: Note that E(W_{σ_{c,d}}) = 0 (why?) and remember that W_{σ_{c,d}} has only two different values.
Solution: P(W_{σ_{c,d}} = −c) = d/(c+d),  P(W_{σ_{c,d}} = d) = c/(c+d).

8.36 Problem. (easy) Find E(σ_{c,d}).
Hint: Note that E(W²_{σ_{c,d}}) = E(σ_{c,d}) (why?).
Solution: E(σ_{c,d}) = cd.
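Both solutions are easy to check by simulation. The sketch below (toy code with invented parameters) approximates the Wiener process by a symmetric random walk with space step ±h on a time grid of mesh h²; for this walk the gambler's-ruin identities P(hit −c before d) = d/(c+d) and E(σ) = cd hold exactly, so the Monte Carlo estimates should be close to the formulas.

```python
import random

random.seed(2)
h = 0.05                      # space step; the matching time step is h*h
c, d = 1.0, 2.0               # boundaries -c and d, both multiples of h

def exit_sample():
    """Return (exit point, exit time) of the walk from (-c, d)."""
    w, steps = 0.0, 0
    while -c < w < d:
        w += h if random.random() < 0.5 else -h
        steps += 1
    return w, steps * h * h

n = 2000
samples = [exit_sample() for _ in range(n)]
p_low = sum(1 for w, _ in samples if w <= -c) / n
mean_time = sum(t for _, t in samples) / n
print(p_low, mean_time)       # should be near d/(c+d) = 2/3 and cd = 2
```

Choosing c and d as exact multiples of h matters: then the walk hits the boundaries exactly and the discrete identities carry no overshoot bias.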
8.37 Discussion. Distribution of σ_{c,d}

The distribution of the stopping time σ_{c,d} is a more complicated story. It is easy to obtain the Laplace transforms. Obtaining probabilistic information requires much more analytical effort.

For reasons of symmetry we have

A := ∫_{(W_{σ_{c,d}} = −c)} e^{−θ²σ_{c,d}/2} dP = ∫_{(W_{σ_{d,c}} = c)} e^{−θ²σ_{d,c}/2} dP

and

B := ∫_{(W_{σ_{c,d}} = d)} e^{−θ²σ_{c,d}/2} dP = ∫_{(W_{σ_{d,c}} = −d)} e^{−θ²σ_{d,c}/2} dP
From

1 = E(e^{θW_{σ_{c,d}} − θ²σ_{c,d}/2})  and  1 = E(e^{θW_{σ_{d,c}} − θ²σ_{d,c}/2})

we obtain a system of equations for A and B leading to

A = (e^{θd} − e^{−θd}) / (e^{θ(c+d)} − e^{−θ(c+d)})  and  B = (e^{θc} − e^{−θc}) / (e^{θ(c+d)} − e^{−θ(c+d)})

This implies

E(e^{−λσ_{c,d}}) = (e^{−c√(2λ)} + e^{−d√(2λ)}) / (1 + e^{−(c+d)√(2λ)})

Expanding this into an infinite geometric series and applying

∫₀^∞ e^{−λt} dF_{a,0}(t) = e^{−a√(2λ)},  λ ≥ 0

we could obtain an infinite series expansion of the distribution of σ_{c,d}.
(Further reading: Karatzas-Shreve [15], section 2.8.)
The reflection principle

Let (W_t)_{t≥0} be a Wiener process and let (F_t)_{t≥0} be its internal history.

Let s > 0 and consider the process X_t := W_{s+t} − W_s, t ≥ 0. Since the Wiener process has independent increments the process (X_t)_{t≥0} is independent of F_s. Moreover, it is easy to see that (X_t)_{t≥0} is a Wiener process. Let us give an intuitive interpretation of these facts.

Assume that we observe the Wiener process up to time s. Then we know the past F_s and the value W_s at time s. What about the future? How will the process behave for t > s? The future variation of the process after time s is given by (X_t)_{t≥0}. From the remarks above it follows that the future variation is that of a Wiener process which is independent of the past. The common formulation of this fact is: at every time s > 0 the Wiener process starts afresh.
8.38 Problem. (easy) Show that the process X_t := W_{s+t} − W_s, t ≥ 0, is a Wiener process for every s ≥ 0.
There is a simple consequence of the property of starting afresh at every time s. Note that

W_t = W_t whenever t ≤ s,  W_t = W_s + (W_t − W_s) whenever t > s.

Define the corresponding process reflected at time s by

W̃_t = W_t whenever t ≤ s,  W̃_t = W_s − (W_t − W_s) whenever t > s.

Then it is clear that (W_t)_{t≥0} and (W̃_t)_{t≥0} have the same distribution. This assertion looks rather harmless and self-evident. However, it becomes a powerful tool when it is extended to stopping times.
8.39 Theorem. (Reflection principle) Let τ be any stopping time and define

W̃_t = W_t whenever t ≤ τ,  W̃_t = W_τ − (W_t − W_τ) whenever t > τ.

Then the distributions of (W_t)_{t≥0} and (W̃_t)_{t≥0} are equal.
Proof: Let us show that the single random variables W_t and W̃_t have equal distributions. Equality of the finite dimensional marginal distributions is shown in a similar manner.

We have to show that for any bounded continuous function f we have E(f(W_t)) = E(f(W̃_t)). For obvious reasons we need only show

∫_{(τ<t)} f(W_t) dP = ∫_{(τ<t)} f(W̃_t) dP

which is equivalent to

∫_{(τ<t)} f(W_τ + (W_t − W_τ)) dP = ∫_{(τ<t)} f(W_τ − (W_t − W_τ)) dP

The last equation is obviously true for stopping times with finitely many values. The common approximation argument then proves the assertion. □
8.40 Problem. (advanced) To get an idea of how the full proof of the reflection principle works show that E(f(W_{t₁}, W_{t₂})) = E(f(W̃_{t₁}, W̃_{t₂})) for t₁ < t₂ and bounded continuous f.
Hint: Distinguish between τ < t₁, t₁ ≤ τ < t₂ and t₂ ≤ τ.
The reflection principle offers an easy way of obtaining information on first passage times.

8.41 Theorem. Let M_t := max_{s≤t} W_s. Then

P(M_t ≥ y, W_t < y − x) = P(W_t > y + x),  t > 0, y > 0, x ≥ 0
Proof: Let τ := inf{t : W_t ≥ y} and τ̃ := inf{t : W̃_t ≥ y}, where (W̃_t) denotes the process reflected at τ. Then

P(M_t ≥ y, W_t < y − x) = P(τ ≤ t, W_t < y − x)
  = P(τ̃ ≤ t, W̃_t < y − x)
  = P(τ ≤ t, W_t > y + x)
  = P(W_t > y + x)  □
8.42 Problem. (easy) Use 8.41 to find the distribution of M_t.

8.43 Problem. (intermediate) Find P(W_t < z, M_t < y) when z < y, y > 0.
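For the symmetric random walk, which is the discrete counterpart of the Wiener process, the identity of 8.41 holds exactly: reflecting a path at its first visit to level y is a bijection on the set of ±1 paths. The exhaustive check below (an illustration only, not part of the notes) verifies the discrete identity by enumerating all paths of a given length.

```python
from itertools import product

def check(n, y, x):
    """Count paths with max >= y and S_n < y - x, and paths with S_n > y + x."""
    lhs = rhs = 0
    for steps in product((-1, 1), repeat=n):
        s, m = 0, 0
        for step in steps:
            s += step
            m = max(m, s)
        if m >= y and s < y - x:
            lhs += 1
        if s > y + x:
            rhs += 1
    return lhs, rhs

# discrete reflection principle: the two counts agree for integer y > 0, x >= 0
for y, x in [(2, 0), (2, 1), (3, 2)]:
    lhs, rhs = check(12, y, x)
    assert lhs == rhs
```

The bijection behind the count is exactly the one in 8.39: paths with max ≥ y and endpoint k < y correspond one-to-one to paths with endpoint 2y − k > y.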
8.5 Augmentation

For technical reasons which will become clear later the internal history of the Wiener process is slightly too small. It is convenient to enlarge the σ-fields of the internal history in a way that does not destroy the basic properties of the underlying process. This procedure is called augmentation.

8.44 Definition. Let F_{t+} := ∩_{s>t} F_s and define

F̄_t := {F ∈ F_∞ : P(F △ G) = 0 for some G ∈ F_{t+}}

Then (F̄_t)_{t≥0} is the augmented filtration.
8.45 Problem. (intermediate) Show that the augmented filtration is really a filtration.
8.46 Corollary. Let (W_t)_{t≥0} be a Wiener process. Then the increments W_t − W_s are independent of the augmented internal history F̄^W_s, s ≥ 0.
Proof: (Outline) It is easy to see that

E(e^{a(W_t−W_s)} | F^W_{s+}) = e^{a²(t−s)/2}

This implies

E(1_F e^{a(W_t−W_s)}) = P(F) E(e^{a(W_t−W_s)})

for every F ∈ F^W_{s+}. From the totality of exponentials property (see the proof of 13.11) it follows that W_t − W_s is independent of F^W_{s+}. It is clear that this carries over to F̄^W_s. □
8.47 Problem. (intermediate for mathematicians) Fill in the details of the proof of 8.46.
8.48 Theorem. Let (F_t)_{t≥0} be a filtration. Then the augmented filtration is right-continuous, i.e.

F̄_t = ∩_{s>t} F̄_s

Proof: It is clear that ⊆ holds. In order to prove ⊇ let F ∈ ∩_{s>t} F̄_s. We have to show that F ∈ F̄_t.

For every n ∈ N there is G_n ∈ F_{t+1/n} such that P(F △ G_n) = 0. Define

G := ∩_{m=1}^∞ ∪_{n=m}^∞ G_n = ∩_{m=K}^∞ ∪_{n=m}^∞ G_n ∈ F_{t+1/K}  for all K ∈ N.

Then G ∈ F_{t+} and P(G △ F) = 0. □
One says that a filtration satisfies the "usual conditions" if it is right-continuous and contains all negligible sets of F_∞. The internal history of the Wiener process does not satisfy the usual conditions. However, every augmented filtration satisfies the usual conditions. Thus, 8.46 and 8.48 show that every Wiener process has independent increments with respect to a filtration that satisfies the usual conditions. When we are dealing with a Wiener process we may suppose that the underlying filtration satisfies the usual conditions.

8.49 Problem. (easy) Show that the assertions of 8.15, 8.16 and 8.18 are valid for the augmented internal history of the Wiener process.
Let us illustrate the convenience of filtrations satisfying the usual conditions by a further result. For some results on stochastic integrals it will be an important point that martingales are cadlag. A general martingale need not be cadlag. We will show that a martingale has a cadlag modification if the filtration satisfies the usual conditions.

8.50 Theorem. Let (X_t)_{t≥0} be a martingale w.r.t. a filtration satisfying the usual conditions. Then there is a cadlag modification of (X_t)_{t≥0}.

Proof: (Outline. Further reading: Karatzas-Shreve, [15], Chapter 1, Theorem 3.13.)
We begin with path properties which are readily at hand: there is a set A ∈ F_∞ satisfying P(A) = 1 and such that the restricted process (X_t)_{t∈Q} has paths with right and left limits for every ω ∈ A. This is a consequence of the upcrossings inequality by Doob. See Karatzas-Shreve, [15], Chapter 1, Proposition 3.14, (i).

It is now our goal to modify the martingale in such a way that it becomes cadlag. The idea is to define

X⁺_t := lim_{s↓t, s∈Q} X_s,  t ≥ 0,

on A and X⁺_t := 0 elsewhere. It is easy to see that the paths of (X⁺_t)_{t≥0} are cadlag. Since (F_t)_{t≥0} satisfies the usual conditions it follows that (X⁺_t)_{t≥0} is adapted. We have to show that (X⁺_t)_{t≥0} is a modification of (X_t)_{t≥0}, i.e. X_t = X⁺_t P-a.e. for all t ≥ 0.

Let s_n ↓ t, (s_n) ⊆ Q. Then X_{s_n} = E(X_{s_1} | F_{s_n}) is uniformly integrable, which implies X_{s_n} → X⁺_t in L¹. From X_t = E(X_{s_n} | F_t) we obtain X_t = E(X⁺_t | F_t) = X⁺_t P-a.e. □
8.6 More on stopping times

The interplay between stopping times and adapted processes is at the core of stochastic analysis. In this section we try to provide a lot of information for reasons of later reference. We will state most of the assertions as exercises with hints if necessary. Throughout the section we assume tacitly that the filtration satisfies the usual conditions.

Further reading: Karatzas-Shreve, [15], Chapter 1, section 1.2.

Let τ be a stopping time. The intuitive meaning of (τ ≤ t) ∈ F_t is as follows: at every time t it can be decided whether τ ≤ t or not.
8.51 Problem. (intermediate) Show that τ is a stopping time iff (τ < t) ∈ F_t for every t ≥ 0.

8.52 Problem. (easy) Let σ, τ and τ_n be stopping times.
(a) Then σ ∧ τ, σ ∨ τ and σ + τ are stopping times.
(b) τ + α for α ≥ 0 and λτ for λ ≥ 1 are stopping times.
(c) sup_n τ_n and inf_n τ_n are stopping times.
Let (X_t)_{t≥0} be a process adapted to a filtration (F_t)_{t≥0} and let A ⊆ R. Define

τ_A = inf{t : X_t ∈ A}

Then τ_A is called the hitting time of the set A.
8.53 Remark. The question for which sets A a hitting time τ_A is a stopping time is completely solved. The solution is as follows.

We may assume that P|F_∞ is complete, i.e. that all subsets of negligible sets are added to F_∞. The whole theory developed so far is not affected by such a completion. We could assume from the beginning that our probability space is complete. The reason why we did not mention this issue is simple: we did not need completeness so far.

However, the most general solution of the hitting time problem needs completeness. The following is true:

If P|F_∞ is complete and if the filtration satisfies the usual conditions then the hitting time of every Borel set is a stopping time.

For further comments see Jacod-Shiryaev, [14], Chapter I, 1.27 ff.
For particular cases the stopping time property of hitting times is easy to prove.

8.54 Theorem. Assume that (X_t)_{t≥0} has right-continuous paths and is adapted to a filtration which satisfies the usual conditions.
(a) Then τ_A is a stopping time for every open set A.
(b) If (X_t)_{t≥0} has continuous paths then τ_A is a stopping time for every closed set A.

Proof: (a) Note that

τ_A < t ⇔ X_s ∈ A for some s < t

Since A is open and (X_t)_{t≥0} has right-continuous paths it follows that

τ_A < t ⇔ X_s ∈ A for some s < t, s ∈ Q

(b) Let A be closed and let (A_n) be open neighbourhoods of A such that Ā_n ↓ A. Define τ := lim_{n→∞} τ_{A_n} ≤ τ_A, which exists since τ_{A_n} ↑. We will show that τ = τ_A.

Since τ_{A_n} ≤ τ_A we have τ ≤ τ_A. By continuity of paths we have X_{τ_{A_n}} → X_τ whenever τ < ∞. Since X_{τ_{A_n}} ∈ Ā_n it follows that X_τ ∈ A whenever τ < ∞. This implies τ_A ≤ τ. □
We need a notion of the past of a stopping time.

8.55 Problem. (intermediate) A stochastic interval is an interval whose boundaries are stopping times.
(a) Show that the indicators of stochastic intervals are adapted processes.
Hint: Consider 1_{(τ,∞)} and 1_{[τ,∞)}.
(b) Let τ be a stopping time and let F ⊆ Ω. Show that the process 1_F 1_{[0,τ)} is adapted iff F ∩ (τ ≤ t) ∈ F_t for all t ≥ 0.
(c) Let F_τ := {F : F ∩ (τ ≤ t) ∈ F_t, t ≥ 0}. Show that F_τ is a σ-field.
8.56 Definition. Let τ be a stopping time. The σ-field F_τ is called the past of τ.

The intuitive meaning of the past of a stopping time is as follows: an event F is in the past of τ if at every time t the occurrence of F can be decided provided that τ ≤ t. Many of the subsequent assertions can be understood intuitively if this interpretation is kept in mind.
8.57 Problem. (advanced) Let σ and τ be stopping times.
(a) If σ ≤ τ then F_σ ⊆ F_τ.
(b) F_{σ∧τ} = F_σ ∩ F_τ.
(c) The sets (σ < τ), (σ ≤ τ) and (σ = τ) are in F_σ ∩ F_τ.
Hint: Start with proving (σ < τ) ∈ F_τ and (σ ≤ τ) ∈ F_τ.
(d) Show that every stopping time σ is F_σ-measurable.
(e) Let τ_n ↓ τ. Show that F_τ = ∩_{n=1}^∞ F_{τ_n}.
There is a fundamental rule for iterated conditional expectations with respect to pasts of stopping times.

8.58 Theorem. Let Z be an integrable or nonnegative random variable and let σ and τ be stopping times. Then

E(E(Z|F_σ)|F_τ) = E(Z|F_{σ∧τ})

Proof: The proof is a bit tedious and therefore many textbooks pose it as an exercise problem (see Karatzas-Shreve, [15], Chapter 1, 2.17). Let us give more detailed hints.

We have to start with showing that

F ∩ (σ < τ) ∈ F_{σ∧τ} and F ∩ (σ ≤ τ) ∈ F_{σ∧τ} whenever F ∈ F_σ

Note that the nontrivial part is to show membership in F_τ. The trick is to observe that on (σ ≤ τ) we have (τ ≤ t) = (τ ≤ t) ∩ (σ ≤ t).

The second step is based on the first step and consists in showing that

1_{(σ≤τ)} E(Z|F_σ) = 1_{(σ≤τ)} E(Z|F_{σ∧τ})   (10)

Finally, we prove the assertion separately on (σ ≤ τ) and (σ ≥ τ). For case 1 we apply (10) to the inner conditional expectation. For case 2 we apply (10) to the outer conditional expectation (interchanging the roles of σ and τ). □
We arrive at the most important result on stopping times and martingales. A preliminary technical problem is whether an adapted process stopped at σ is F_σ-measurable. Intuitively, this should be true.

8.59 Discussion. Measurability of stopped processes

Let (X_t)_{t≥0} be an adapted process and σ a stopping time. We ask whether X_σ 1_{(σ<∞)} is F_σ-measurable.

It is easy to prove the assertion for right-continuous processes with the help of 8.23. This would be sufficient for the optional stopping theorem below. However, for stochastic integration we want to be sure that the assertion is also valid for left-continuous processes. This can be shown in the following way.

Define

X^n_t := n ∫₀^t X_s e^{n(s−t)} ds

Then the (X^n_t)_{t≥0} are continuous adapted processes such that X^n_t → X_t provided that (X_t)_{t≥0} has left-continuous paths. Since the assertion is true for (X^n_t) it carries over to (X_t).
8.60 Theorem. (Optional stopping theorem) Let (M_t)_{t≥0} be a right-continuous martingale. If σ is a bounded stopping time and τ is any stopping time then

E(M_σ | F_τ) = M_{σ∧τ}

Proof: The proof is based on the following auxiliary assertion: let τ be a bounded stopping time and let M_t := E(Z|F_t) for some integrable random variable Z. Then M_τ = E(Z|F_τ).

Let τ be a stopping time with finitely many values t₁ < t₂ < ... < t_n. Then

M_{t_n} − M_τ = Σ_{k=1}^n (M_{t_k} − M_{t_{k−1}}) 1_{(τ≤t_{k−1})}

(Prove it on (τ = t_{j−1}).) It follows that E((M_{t_n} − M_τ)1_F) = 0 for every F ∈ F_τ. This proves the auxiliary assertion for stopping times with finitely many values. The extension to arbitrary bounded stopping times is done by 8.23.

Let T = sup σ. The assertion of the theorem follows from

E(M_σ|F_τ) = E(E(M_T|F_σ)|F_τ) = E(M_T|F_{σ∧τ}) = M_{σ∧τ}. □
We finish this section with two consequences of the optional stopping theorem which are fundamental for stochastic integration.

8.61 Corollary. Let τ be any stopping time. If (M_t)_{t≥0} is a martingale then (M_{τ∧t})_{t≥0} is a martingale, too.

8.62 Problem. (easy) Prove 8.61.

8.63 Corollary. Let (M_t)_{t≥0} be a martingale. Let σ ≤ τ be stopping times and let Z be F_σ-measurable and bounded. Then Z(M_{τ∧t} − M_{σ∧t}) is a martingale, too.
8.64 Problem. (intermediate) Prove 8.63.
Hint: Apply 8.25.

UNDER CONSTRUCTION: EXTENSION TO SUB- AND SUPERMARTINGALES
8.7 The Markov property

We explain and discuss the Markov property at hand of the Wiener process.

When we calculate conditional expectations given the past F_s of a stochastic process (X_t)_{t≥0} then from the general point of view conditional expectations E(X_t|F_s) are F_s-measurable, i.e. they may depend on all of the variables X_u, u ≤ s. But when we were dealing with special conditional expectations given the past of a Wiener process then we got formulas of the type

E(W_t|F_s) = W_s,  E(W_t²|F_s) = W_s² + (t−s),  E(e^{aW_t}|F_s) = e^{aW_s + a²(t−s)/2}

These conditional expectations do not use the whole information available in F_s but only the value W_s of the Wiener process at time s.
8.65 Theorem. Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal history. Then for every P-integrable random variable Z which is measurable with respect to the future σ-field σ(W_u : u ≥ s) we have

E(Z|F_s) = φ(W_s)

where φ is some measurable function.

Proof: For the proof we only have to note that the system of functions

e^{a₁W_{s+h₁} + a₂W_{s+h₂} + ··· + a_nW_{s+h_n}},  h_i ≥ 0, n ∈ N,

is total in L²(σ(W_u : u ≥ s)). □
8.66 Problem. (intermediate) Under the assumptions of 8.65 show that E(Z|F_s) = E(Z|W_s).

8.65 is the simplest and most basic formulation of the Markov property. It is, however, illuminating to discuss more sophisticated versions of the Markov property. We need some preliminaries.
8.67 Remark. Redundant conditioning

We have to be aware of an important property of conditional expectations.

Let X be 𝒜-measurable and let Y be P-integrable and independent of 𝒜. Then we know that

E(XY|𝒜) = X E(Y|𝒜) = X E(Y)

The conditional expectation depends on 𝒜 only through X. This can be understood intuitively in the following way: since X is 𝒜-measurable the information in 𝒜 gives the whole information on X. The rest (i.e. Y) is independent of 𝒜. Note that the equation can be written as follows:

E(XY|𝒜) = φ ∘ X where φ(ξ) = E(ξY)

This view can be extended to much more general cases:

E(f(X,Y)|𝒜) = φ ∘ X where φ(ξ) = E(f(ξ,Y))

(provided that f is sufficiently integrable).

Let us calculate E(f(W_{s+t})|F_s) where f is bounded and measurable. We have

E(f(W_{s+t})|F_s) = E(f(W_s + (W_{s+t} − W_s))|F_s)

Since W_s is F_s-measurable and W_{s+t} − W_s is independent of F_s we have

E(f(W_{s+t})|F_s) = φ ∘ W_s where φ(ξ) = E(f(ξ + (W_{s+t} − W_s)))   (11)

Roughly speaking, conditional expectations simply are expectations depending on a parameter slot where the present value of the process has to be plugged in.
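The "parameter slot" in (11) can be made concrete with f(x) = e^{ax}, where φ is known in closed form: φ(ξ) = E(e^{a(ξ+Z)}) = e^{aξ + a²t/2} with Z = W_{s+t} − W_s ~ N(0, t). A Monte Carlo sketch (illustration only, with made-up parameter values):

```python
import math
import random

random.seed(3)
a, t = 0.5, 1.0          # exponent and time lag (s + t) - s

def phi_mc(xi, n=200_000):
    """phi(xi) = E(f(xi + Z)) for f(x) = exp(a*x), Z ~ N(0, t)."""
    return sum(math.exp(a * (xi + random.gauss(0.0, math.sqrt(t))))
               for _ in range(n)) / n

for xi in (-1.0, 0.0, 2.0):  # possible observed values of W_s
    exact = math.exp(a * xi + a * a * t / 2)
    assert abs(phi_mc(xi) / exact - 1) < 0.02
```

The point of the loop is that the same function φ works for every observed value ξ of W_s, which is exactly the content of the redundant-conditioning rule.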
8.68 Theorem. Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal history. Then the conditional distribution of (W_{s+t})_{t≥0} given F_s is the same as the distribution of a process ξ + W̃_t where ξ = W_s and (W̃_t)_{t≥0} is any (other) Wiener process.

Proof: Extend (11) to functions of several variables. □

8.68 contains the formulation which is known as the ordinary Markov property of the Wiener process. It says that at every time point s the Wiener process starts afresh at the state ξ = W_s as a new Wiener process, forgetting everything that happened before time s.
It is a remarkable fact with far-reaching consequences that the Markov property still holds if the time s is replaced by a stopping time. The essential preliminary step is the following.

8.69 Theorem. Let τ be any stopping time and define Q(F) = P(F|τ < ∞), F ∈ F_∞. Then the process

X_t := W_{τ+t} − W_τ,  t ≥ 0,

is a Wiener process under Q which is independent of F_τ.

Proof: (Outline) Let us show that

∫_F f(W_{τ+t} − W_τ) dP = P(F) E(f(W_t))

when F ⊆ (τ < ∞), F ∈ F_τ, and f is any bounded continuous function. But this is certainly true for stopping times with finitely many values. The common approximation argument proves the equation. Noting that the equation holds for τ + s, s > 0, replacing τ, proves the assertion. □
8.70 Problem. (advanced) Fill in the details of the proof of 8.69.

8.71 Theorem. (Strong Markov property) Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal history. Let σ be any stopping time. Then on (σ < ∞) the conditional distribution of (W_{σ+t})_{t≥0} given F_σ is the same as the distribution of a process ξ + W̃_t where ξ = W_σ and (W̃_t)_{t≥0} is some (other) Wiener process.

Further reading: Karatzas-Shreve [15], sections 2.5 and 2.6.
Chapter 9

The financial market picture

This chapter gives an overview of the basic concepts of pricing in financial markets. We restrict our view to trading strategies with a finite number of trading times. However, all concepts are formulated such that they are valid also in the general case, which will be tractable by stochastic analysis.

During the presentation of stochastic analysis we will refer to the concepts and ideas discussed in this chapter. Thus, the present chapter is the basic motivation for going into the troubles of stochastic analysis.
9.1 Assets and trading strategies

Let S = (S⁰, S¹, ..., S^m) be a finite set of right-continuous processes modelling the values of the tradable assets of a financial market. We consider a finite time horizon [0, T], T < ∞. A trading strategy in S is a process which determines how many units of each asset are held during a time interval:

H^k_t = Σ_{j=1}^n a^k_{j−1} 1_{(σ_{j−1}, σ_j]}(t),  k = 0, 1, ..., m

where 0 = σ₀ ≤ σ₁ ≤ ... ≤ σ_n = T are stopping times and a^k_{j−1} is F_{σ_{j−1}}-measurable. Thus, the processes (H^k_t) are adapted and left-continuous. Left-continuity is essential because trading strategies must be predictable.
The market value of the portfolio (H^k_t)_k at time t is given by

V_t = Σ_k H^k_t S^k_t

The process (V_t) is the wealth process corresponding to the trading strategy.

The trading strategy is called self-financing if the changes in the portfolio at σ_j are financed by nothing else than the value of the portfolio:

Σ_k H^k_{σ_{j−1}} S^k_{σ_j} = Σ_k H^k_{σ_j} S^k_{σ_j}   (12)
The self-financing property has the consequence that the wealth process can be written as a gambling system like (6). Later we will see that for continuous trading the corresponding representation is that of an integral.

9.1 Theorem. A trading strategy (H^k)_k is self-financing iff

V_t = V_0 + Σ_k Σ_j H^k_{σ_{j−1}} (S^k_{σ_j∧t} − S^k_{σ_{j−1}∧t})   (13)
Proof: It is easy to see that (13) implies (12).

Conversely, if the trading strategy is self-financing then for t ∈ (σ_{j−1}, σ_j] we have

V_t − V_{σ_{j−1}} = Σ_k (H^k_{σ_{j−1}} S^k_t − H^k_{σ_{j−2}} S^k_{σ_{j−1}})   (14)
  = Σ_k H^k_{σ_{j−1}} (S^k_t − S^k_{σ_{j−1}})   (15)

This formula can be extended to t ∈ [0, T] by writing it as

V_{σ_j∧t} − V_{σ_{j−1}∧t} = Σ_k H^k_{σ_{j−1}} (S^k_{σ_j∧t} − S^k_{σ_{j−1}∧t})

The assertion follows from V_t − V_0 = Σ_j (V_{σ_j∧t} − V_{σ_{j−1}∧t}). □

9.2 Problem. Fill in the details of the proof of 9.1.
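A small deterministic example may help with Problem 9.2. The code below (a toy with invented prices, not from the notes) builds a self-financing rebalancing of two assets on the trading times 0, 1, 2, 3 and confirms that the directly computed portfolio value matches the gambling-system representation (13).

```python
# two assets, prices observed at trading times 0, 1, 2, 3 (sigma_j = j here)
S = [
    [1.0, 1.0, 1.0, 1.0],     # asset 0: bank account with zero interest
    [10.0, 12.0, 9.0, 11.0],  # asset 1: risky asset
]
T = 3
V0 = 100.0

# choose holdings of asset 1 on each interval (sigma_{j-1}, sigma_j];
# the bank-account holding is then forced by the self-financing condition
a1 = [4.0, 7.0, 2.0]
a0 = []
wealth = V0
for j in range(T):
    a0.append((wealth - a1[j] * S[1][j]) / S[0][j])  # rebalance at time j
    wealth = a0[j] * S[0][j + 1] + a1[j] * S[1][j + 1]

V_T_direct = wealth
V_T_formula = V0 + sum(a0[j] * (S[0][j + 1] - S[0][j]) +
                       a1[j] * (S[1][j + 1] - S[1][j]) for j in range(T))
assert abs(V_T_direct - V_T_formula) < 1e-9
```

The equality is a pure telescoping identity: at each rebalancing time the portfolio value is preserved by construction, so only the holdings-times-price-increment terms of (13) remain.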
The value of assets is measured in terms of a unit of money. The unit of money can be a currency or some other positive value process. It is important to know how the properties of trading strategies behave under a change of the unit of money.

It should be noted that formula (16) only holds for wealth processes of self-financing trading strategies.
9.3 Theorem. Let N be any positive right-continuous process. A trading strategy (H^k)_k is self-financing for S = (S⁰, S¹, ..., S^m) iff it is self-financing for S̄ where S̄^k = S^k/N. If V and V̄ are the corresponding wealth processes then

V̄_t = V_t/N_t,  t ≥ 0.   (16)

Proof: The first part follows easily by dividing (12) by N_{σ_j}.

For proving (16) we get from (14) that

V̄_t − V̄_{σ_{j−1}} = V_t/N_t − V_{σ_{j−1}}/N_{σ_{j−1}}  whenever σ_{j−1} < t ≤ σ_j

With t = σ_j an induction argument implies V̄_{σ_j} = V_{σ_j}/N_{σ_j}. This proves the assertion. □
9.2 Financial markets and arbitrage

A claim at time t = T is any F_T-measurable random variable C. The fundamental problem of mathematical finance is to find a reasonable price x₀ at time t = 0 for the claim C.

There are two methods to find a price x₀ for the claim. The insurance method is to define x₀ as the expectation under P of the discounted claim. The risk of this kind of pricing is controlled by selling a large number of claims. Then by the LLN the average cost of that set of claims equals x₀. But this works only if the claims are independent. That might be true for insurance but not for financial markets.
The more recent and most important method of pricing is risk neutral pricing using hedge strategies. This leads to the concept of a market model.

9.4 Definition. A financial market is a set ℳ of wealth processes (for the moment right-continuous processes) with the following properties:
(1) ℳ is a vector space, i.e. every linear combination of wealth processes in ℳ is contained in ℳ.
(2) Every self-financing trading strategy based on finitely many wealth processes in ℳ leads to a wealth process in ℳ.
9.5 Problem. Show that (2) implies (1) in 9.4.

A claim C has a hedge in the market ℳ (is attainable) if there is a wealth process V ∈ ℳ satisfying V_T = C. In this case it looks reasonable to define x₀ = V₀. This is called risk neutral pricing. However, the risk neutral pricing method is only reasonable if the price does not allow arbitrage.

The idea is the following: a price x₀ for a claim C is arbitrage-free if there does not exist a wealth process V ∈ ℳ such that V₀ = x₀ and V_T ≥ C, V_T ≠ C. Alas, such a concept does not work since there are very plausible market models where arbitrage can be achieved with highly risky wealth processes. Such wealth processes have to be excluded from competition.
9.6 Definition. A wealth process V ∈ ℳ is called admissible if it is bounded from below.

9.7 Definition. A price x₀ is an arbitrage-free price for a claim C if there does not exist an admissible wealth process V ∈ ℳ such that V₀ = x₀ and V_T ≥ C, V_T ≠ C.
How can we be sure that risk neutral pricing leads to arbitrage-free prices?

9.3 Martingale measures

Let ℳ be a market model. The common answer to the question posed is the existence of a so-called martingale measure.

A generating system of ℳ is a subset ℳ₀ ⊆ ℳ such that every wealth process V ∈ ℳ is generated by a trading strategy based on finitely many elements of ℳ₀.

9.8 Definition. A martingale measure is a probability measure Q ∼ P such that all wealth processes of some generating system are martingales.
9.9 Lemma. If there exists a martingale measure Q then all admissible wealth processes V ∈ ℳ satisfy

E_Q(V_t | F_s) ≤ V_s whenever s < t.

(Admissible wealth processes are "supermartingales".)

The proof of this lemma is postponed.
The following theorem is fundamental for the modern theory of pricing in financial markets.

9.10 Theorem. Let ℳ be a market model and Q some martingale measure. Let C be an attainable claim with a hedge whose wealth process is a Q-martingale (a martingale hedge). Then x₀ := E_Q(C) is an arbitrage-free price.
Proof: Let C be a claim and let V be the wealth process of a martingale hedge of the claim. Clearly, we have x₀ = V₀.

Let V¹ ∈ ℳ be an admissible wealth process such that V¹₀ = V₀ = x₀ and V¹_T ≥ C = V_T. Then we have

E_Q(V_T) = V₀ and E_Q(V¹_T) ≤ V¹₀ = V₀

Since V¹_T − V_T ≥ 0 and E_Q(V¹_T − V_T) ≤ 0 it follows that V¹_T = V_T = C Q-a.e. and hence P-a.e. □
Theorem 9.10 shows that the existence of a martingale measure makes risk neutral pricing an easy exercise (at least in theory). Therefore it is important to ask whether we may expect to have martingale measures for arbitrage-free markets. For common financial market models martingale measures exist, as a rule. However, for a mathematician it is a challenge to ask whether the existence of a martingale measure is only a sufficient condition for a market to be arbitrage-free, or whether it is even necessary. This turns out to be a rather delicate question.

9.11 Definition. A market model ℳ is arbitrage-free if x₀ = 0 is an arbitrage-free price of the claim C = 0.

9.12 Problem. Show that the existence of a martingale measure implies that the market is arbitrage-free.

It is not true that every arbitrage-free market admits martingale measures.
9.4 Change of numeraire

There is an easy aspect of the existence of martingale measures which is important for practical purposes.

Assume that one of the assets of the market is a riskless (for the moment: non-stochastic) positive asset, e.g. a bank account N_t = e^{rt}.

9.13 Problem. Show: In an arbitrage-free market all riskless wealth processes are proportional.

9.14 Problem. Show: If a martingale measure exists then all riskless assets of the market are constant.

If a market contains a bank account with positive interest then there cannot exist a martingale measure! This message sounds a bit disappointing. Fortunately, there is an easy solution to that problem.

Assume that there exists some positive tradable asset N (a numeraire). Then we may define this asset as our unit of money. For the market model this amounts to dividing all value processes by N, resulting in a so-called normalized market ℳ̄ consisting of the wealth processes given in 9.3. The normalized market has only constant riskless assets proportional to 1 = N/N and therefore does not exclude the existence of a martingale measure. If we try to find martingale measures, first we have to look for numeraires in order to normalize the market.
Summing up, the risk neutral pricing machinery runs as follows:
(1) Find a numeraire N and turn to the normalized market.
(2) Find a martingale measure Q for the normalized market.
(3) If C/N_T has a martingale hedge in the normalized market then define the price to be x_0 = N_0 E_Q(C/N_T).
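The three steps can be illustrated by a hypothetical Monte Carlo sketch for the Black-Scholes setting treated later in these notes: the numeraire is the bank account N_t = e^{rt} (so N_0 = 1), under Q the stock drifts at rate r, and the claim is a call C = (S_T - K)^+. The function name and all parameter values are illustrative, not from the text.

```python
import math
import random

def mc_price(s0=100.0, strike=100.0, r=0.05, sigma=0.2, T=1.0, n=200_000, seed=1):
    """Steps (1)-(3) by Monte Carlo: numeraire N_t = exp(r t) (so N_0 = 1),
    stock drift r under Q, claim C = (S_T - K)^+."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        sT = s0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
        total += max(sT - strike, 0.0)
    return math.exp(-r * T) * total / n     # x_0 = N_0 E_Q(C / N_T)
```

With these illustrative parameters the result is close to the classical closed-form call value.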
Chapter 10
Stochastic calculus
10.1 Elementary Integration
Bounded variation
Let f : [0, T] \to \mathbb{R} be any function.
10.1 Definition. The variation of f on the interval [s, t] \subseteq [0, T] is
V_s^t(f) := \sup \sum_{j=1}^n |f(t_j) - f(t_{j-1})|
where the supremum is taken over all subdivisions s = t_0 < t_1 < \ldots < t_n = t and all n \in \mathbb{N}.
A function f is of bounded variation on [0, T] if V_0^T(f) < \infty. The set of all functions of bounded variation is denoted by BV([0, T]).
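Numerically, the variation can be approximated by summing increments over a fine subdivision. A minimal Python sketch (function name and parameters are illustrative; restricting to equidistant subdivisions is an assumption, they approach the supremum only as the mesh shrinks):

```python
import math

def variation(f, s, t, n=10_000):
    """Sum |f(t_j) - f(t_{j-1})| over the equidistant subdivision
    s = t_0 < ... < t_n = t; approaches V_s^t(f) as the mesh shrinks."""
    h = (t - s) / n
    return sum(abs(f(s + (j + 1) * h) - f(s + j * h)) for j in range(n))

approx = variation(math.sin, 0.0, math.pi)   # V_0^pi(sin) = int_0^pi |cos u| du = 2
```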
10.2 Problem. (intermediate)
Let f be differentiable on [s, t] with continuous derivative. Then f \in BV and
V_s^t(f) = \int_s^t |f'(u)| \, du
10.3 Problem. (easy)
Show that BV is a vector space.
10.4 Problem. (very easy)
Show that monotone functions are BV and calculate their variation.
10.5 Problem. (intermediate)
Show that any function f \in BV can be written as f = g - h where g, h are increasing and satisfy V_0^t(f) = g(t) + h(t).
Hint: Let g(t) := (V_0^t(f) + f(t))/2 and h(t) := (V_0^t(f) - f(t))/2.
10.6 Problem. Which BV-functions are Borel-measurable?
There are continuous functions on compact intervals which are not of bounded
variation.
The Cauchy-Stieltjes integral
Let \mathcal{T}([0, T]) be the set of all left-continuous step functions on [0, T], i.e. functions of the form
f(t) = \sum_{k=1}^n a_k 1_{(t_{k-1}, t_k]}(t)
where 0 = t_0 < t_1 < \ldots < t_n = T is some subdivision. If f \in \mathcal{T} and if g is right-continuous then we define
\int_0^T f \, dg := \sum_{k=1}^n a_k (g(t_k) - g(t_{k-1}))     (17)
If g is increasing then this definition coincides with \int f \, d\lambda_g. In a similar way as for the abstract integral we conclude that (17) is a valid definition, being linear both in f and in g.
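Since (17) is a finite sum, it can be computed directly. A small Python sketch (the names are illustrative):

```python
def cs_step_integral(a, t, g):
    """Formula (17): integrate the step function f = sum_k a_k 1_{(t_{k-1}, t_k]}
    against g, giving sum_k a_k (g(t_k) - g(t_{k-1}))."""
    return sum(a_k * (g(t[k + 1]) - g(t[k])) for k, a_k in enumerate(a))

# with g(t) = t this is just the Riemann integral of the step function
val = cs_step_integral([1.0, 2.0], [0.0, 0.5, 1.0], lambda u: u)
```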
We want to extend the integral to a more general class of functions f. Of course, this could be done along the lines of general integration theory. But we will describe the older and more elementary approach of Cauchy, for its formal similarity to Protter's ([19]) definition of the stochastic integral.
Recall that a sequence of subdivisions 0 = t_0 < t_1 < \ldots < t_n = T is called Riemannian if \max_k |t_k - t_{k-1}| \to 0.
10.7 Lemma. Let f be left-continuous with limits from the right (caglad). Then for any Riemannian sequence of subdivisions the sequence of step functions
f_n := \sum_{k=1}^n f(t_{k-1}) 1_{(t_{k-1}, t_k]}
converges uniformly to f, i.e. \|f_n - f\|_u \to 0.
Proof: This is a beginner's lemma if f is continuous on [0, T]. If f has infinitely many jumps one has to work a little harder. □
Such step functions along Riemannian sequences of subdivisions can be used to extend the integral to arbitrary caglad functions, due to the following inequality.
10.8 Lemma. If f \in \mathcal{T} and if g is right-continuous then
| \int_0^T f \, dg | \le \|f\|_u \, V_0^T(g)
This lemma implies that the integral is continuous under uniform convergence of
the integrands provided that g is of bounded variation. In particular, if (f
n
) converges
uniformly then the sequence of integrals is also convergent. This leads to the deﬁnition
of the integral.
10.9 Definition. Let f be caglad and g be cadlag and of bounded variation. Then the Cauchy-Stieltjes integral is
\int_0^T f \, dg := \lim_{n \to \infty} \int_0^T f_n \, dg
where (f_n) is any sequence in \mathcal{T} converging uniformly to f.
10.10 Problem. Show that for increasing g the CS-integral coincides with the abstract integral for \lambda_g.
For g(t) = t this is the notion of an integral that is taught in schools.
10.11 Remark. Later the notion of the stochastic integral will be defined in the following way (due to Protter [19]). The integrators g are extended to adapted cadlag processes for which the integrals of adapted caglad step processes have a continuity property similar to the CS-integral. It turns out that not only processes with BV-paths have such a continuity property, but also martingales. For such processes (so-called semimartingales) the definition of the integral works for adapted caglad processes. In this way all processes which can be written as a sum of a cadlag martingale and an adapted cadlag BV-process can be used as integrators.
Differential calculus
Let f be caglad on [0, T] and g \in BV([0, T]) be cadlag. For notational convenience we define
\int_0^t f \, dg := \int_0^T 1_{(0,t]} f \, dg,  0 \le t \le T,
and
f \bullet g : t \mapsto \int_0^t f \, dg,  0 \le t \le T.
10.12 Theorem. Let f be caglad on [0, T] and g \in BV([0, T]) be cadlag. Then the following assertions are true:
(a) f \bullet g is cadlag.
(b) f \bullet g is of bounded variation, with V(f \bullet g) = |f| \bullet V(g).
(c) If g is continuous then f \bullet g is continuous.
Proof: (a) and (c) are due to the fact that the cadlag property and continuity are inherited under uniform convergence. The equation under (b) is a consequence of the corresponding equation for step functions. □
Now we turn to the three basic rules of differential calculus. These are the prototypes for the corresponding rules of stochastic calculus. The rules are concerned with the evaluation of
\int_0^T f \, d(g \bullet h),  \int_0^T f \, d(gh),  \int_0^T f \, d(g \circ h)
We assume tacitly that all functions involved fulfil those conditions which are required for making the expressions well-defined.
The first rule is associativity. Let f and g be caglad and h cadlag and BV. Then
\int_0^T f \, d(g \bullet h) = \int_0^T f g \, dh,  in short: d(g \bullet h) = g \, dh
This is true by definition for f = 1_{(0,t]} and extends to general f by a straightforward induction argument.
There is an important special case. Let h(t) = t and let G be the primitive of g, i.e. G' = g. Since G = g \bullet h we obtain
\int_0^T f \, dG = \int_0^T f(s) G'(s) \, ds,  in short: dG(s) = G'(s) \, ds
The second rule is the product rule, which in integral notation is called integration by parts. For this let g and h be continuous and BV. Then
\int_0^T f \, d(gh) = \int_0^T f g \, dh + \int_0^T f h \, dg,  in short: d(gh) = g \, dh + h \, dg
For f = 1_{(0,t]} this means
g(t)h(t) = g(0)h(0) + \int_0^t g \, dh + \int_0^t h \, dg
This gives well-known formulas if g and h are differentiable.
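For smooth g and h the formula can be checked numerically with left-point Cauchy-Stieltjes sums; the choice g = sin, h = exp on [0, 1] is purely illustrative:

```python
import math

def stieltjes(f, g, T, n=100_000):
    """Left-point Cauchy-Stieltjes sums on an equidistant subdivision of [0, T]."""
    h = T / n
    return sum(f(j * h) * (g((j + 1) * h) - g(j * h)) for j in range(n))

T = 1.0
# integration by parts: g(T)h(T) - g(0)h(0) = int g dh + int h dg
lhs = math.sin(T) * math.exp(T) - math.sin(0.0) * math.exp(0.0)
rhs = stieltjes(math.sin, math.exp, T) + stieltjes(math.exp, math.sin, T)
```

The gap between the two sides is exactly the sum of products of increments, which vanishes here because both functions are BV.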
The proof runs over approximation by step functions. Let 0 = t_0 < t_1 < \ldots < t_n = t be some subdivision. Define \Delta g(t_j) := g(t_j) - g(t_{j-1}) and \Delta h(t_j) similarly. Then
g(t)h(t) = g(0)h(0) + \sum_{j=1}^n g(t_{j-1}) \Delta h(t_j) + \sum_{j=1}^n h(t_{j-1}) \Delta g(t_j) + \sum_{j=1}^n \Delta g(t_j) \Delta h(t_j)
The assertion follows since, by the BV-property, the last term tends to zero for a Riemannian sequence of subdivisions.
At this point we can already see some of the frictions that arise when extending such formulas to the stochastic case. The notorious last term vanishes if at least one of the functions g or h is BV. But if both functions are Wiener paths then the last term tends to the quadratic variation!
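This friction is easy to see in simulation: along a fine subdivision, the squared increments of a simulated Wiener path sum to roughly t, while the absolute increments blow up, so the path is not BV. A sketch with illustrative parameters:

```python
import math
import random

rng = random.Random(0)
n, t = 100_000, 1.0
sd = math.sqrt(t / n)                       # W_{t_j} - W_{t_{j-1}} ~ N(0, t/n)
dW = [rng.gauss(0.0, sd) for _ in range(n)]
qv = sum(dw * dw for dw in dW)              # squared increments: close to t = 1
bv = sum(abs(dw) for dw in dW)              # absolute increments: grows like sqrt(n)
```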
The last rule is the chain rule or substitution rule. Let g be continuously differentiable. Then
\int_0^T f \, d(g \circ h) = \int_0^T f \, (g' \circ h) \, dh,  in short: d(g \circ h) = (g' \circ h) \, dh
The special case dg(t) = g'(t) \, dt is the chain rule of ordinary calculus.
There is an elegant proof verifying the formula for arbitrary power functions g(t) = t^k by the product rule, passing to polynomials, and finally applying the Weierstrass approximation theorem. A different approach is based on Taylor polynomials, where the terms of higher order vanish by the BV-property of h. Both proofs indicate that in the stochastic case the formula will have to be changed by including quadratic variation terms. The result will be the Ito formula.
10.2 The stochastic integral
Let (Z_t)_{t \ge 0} be any right-continuous adapted process. It is our goal to define an integral
\int_0^t H \, dZ,  t \ge 0,
for left-continuous adapted processes (H_t)_{t \ge 0}. Let L^0 be the set of all left-continuous adapted processes. There are subsets of L^0 where the definition of the integral is easy.
The integral of step functions
Let \mathcal{E}_0 be the set of processes of the form
H_t(\omega) = \sum_{j=1}^n a_{j-1}(\omega) 1_{(s_{j-1}, s_j]}(t)
where 0 = s_0 < s_1 < \ldots < s_n = T is a subdivision and a_j is \mathcal{F}_{s_j}-measurable for every j. It is easy to see that (H_t)_{t \ge 0} is left-continuous and adapted.
A bit more general is the set \mathcal{E} of processes
H_t(\omega) = \sum_{j=1}^n a_{j-1}(\omega) 1_{(\sigma_{j-1}, \sigma_j]}(t)     (18)
where 0 = \sigma_0 < \sigma_1 < \ldots < \sigma_n = T is a subdivision of stopping times and a_j is \mathcal{F}_{\sigma_j}-measurable for every j. Again it is obvious that the paths are left-continuous, and from 8.55(b) we know that the processes in \mathcal{E} are adapted.
10.13 Problem. Let H \in \mathcal{E} be defined by (18). Show that
1_{(0,t]} H = \sum_{j=1}^n a_{j-1} 1_{(\sigma_{j-1} \wedge t, \sigma_j \wedge t]}.
For processes in \mathcal{E} it is obvious how to define the integral. This can be done pathwise and leads to the following definition:
\int_0^t H \, dZ := \int_0^T 1_{(0,t]} H \, dZ = \sum_{j=1}^n a_{j-1} (Z_{\sigma_j \wedge t} - Z_{\sigma_{j-1} \wedge t})
if H is deﬁned by (18). Since for each single path this is an ordinary Stieltjes integral
we have immediately the properties:
\int_0^t (\alpha H_1 + \beta H_2) \, dZ = \alpha \int_0^t H_1 \, dZ + \beta \int_0^t H_2 \, dZ     (19)
\int_0^t H \, d(\alpha Z_1 + \beta Z_2) = \alpha \int_0^t H \, dZ_1 + \beta \int_0^t H \, dZ_2     (20)
For notational convenience denote H \bullet Z : t \mapsto \int_0^t H \, dZ.
10.14 Theorem. Let (M_t)_{t \ge 0} be a martingale and let H \in \mathcal{E} be bounded. Then H \bullet M is a martingale.
Proof: Apply 8.63. □
10.15 Discussion. Financial markets
Let us continue the ﬁnancial market framework of chapter 9.
First we observe that the representation of self-financing trading strategies in 9.1 can be written as an integral:
V_t = V_0 + \sum_k \sum_j H^k_{\sigma_{j-1}} (S^k_{\sigma_j \wedge t} - S^k_{\sigma_{j-1} \wedge t}) = V_0 + \sum_k \int_0^t H^k_s \, dS^k_s
Thus, the self-financing property is characterized by the equation
V_t = \sum_k H^k_t S^k_t = \sum_k H^k_0 S^k_0 + \sum_k \int_0^t H^k_s \, dS^k_s     (21)
If there is a martingale measure Q and if the trading strategy is bounded then the
corresponding wealth process is a martingale under Q.
It follows that every claim C which can be hedged by a bounded self-financing trading strategy in \mathcal{E} has a martingale hedge, and the pricing formula x_0 = E_Q(C) can be applied.
However, many claims cannot be (exactly) hedged using only finitely many trading times. Therefore, for dealing with general claims, we have to consider continuous trading strategies. The self-financing property of continuous trading strategies will be defined by a formula like (21), and for this we have to extend our notion of the integral to continuous integrands. If the integrators are processes with paths of bounded variation then the integral extension could be done pathwise, like a CS-integral. But if our assets behave like random walks, e.g. driven by a Wiener process, there is no hope of having paths of bounded variation.
Semimartingales
10.16 Definition. A right-continuous process (X_t)_{t \ge 0} is a semimartingale if for every sequence (H^n) of processes in \mathcal{E} the following condition holds:
\sup_{s \le T} |H^n_s| \to 0  \Rightarrow  \sup_{t \le T} | \int_0^t H^n_s \, dX_s | \to 0 in probability
The set of all semimartingales is denoted by \mathcal{S}.
It will turn out that a reasonable extension process of the stochastic integral can be carried out for integrator processes which are semimartingales. It is therefore important to get an overview of typical processes that are semimartingales. Before we turn to such examples let us study the structure of the set \mathcal{S} of semimartingales.
10.17 Problem. (intermediate for mathematicians)
Show that:
(a) The set of semimartingales is a vector space.
(b) If X \in \mathcal{S} then for every stopping time \tau the stopped process X^\tau := (X_{\tau \wedge t})_{t \ge 0} is a semimartingale.
(c) Let \tau_n \uparrow \infty be a sequence of stopping times such that X^{\tau_n} \in \mathcal{S} for every n \in \mathbb{N}. Then X \in \mathcal{S}.
Hint: Note that \{X_t \ne X^{\tau_n}_t\} \subseteq \{\tau_n < t\}.
The concept of semimartingales is only reasonable if it covers adapted cadlag processes with paths of bounded variation. From 10.8 it follows that this is actually the case.
The following result opens the door to stochastic processes like the Wiener process.
10.18 Theorem. Every square integrable cadlag martingale (M_t)_{t \ge 0} is a semimartingale.
Proof: Let (H^n) be a sequence in \mathcal{E} such that \|H^n\|_u \to 0. Since \int_0^t H^n \, dM is a martingale, we have by the maximal inequality
P( \sup_{s \le t} | \int_0^s H^n \, dM | > a ) \le \frac{1}{a^2} E( ( \int_0^t H^n \, dM )^2 )
For convenience let M_j := M_{\sigma^n_j \wedge t}. We have
E( ( \int_0^t H^n \, dM )^2 ) = E( ( \sum_{j=1}^n a_{j-1} (M_j - M_{j-1}) )^2 ) = E( \sum_{j=1}^n a^2_{j-1} (M_j - M_{j-1})^2 )
\le \|H^n\|^2_u E( \sum_{j=1}^n (M_j - M_{j-1})^2 ) = \|H^n\|^2_u E( \sum_{j=1}^n (M^2_j - M^2_{j-1}) ) \le \|H^n\|^2_u E(M^2_t) \to 0  □
In particular, since the Wiener process is a square integrable martingale, (W_t)_{t \ge 0} is a semimartingale.
10.19 Problem. (easy)
Show that (W^2_t)_{t \ge 0} is a semimartingale.
10.20 Problem. (intermediate)
Show that every cadlag martingale (M_t)_{t \ge 0} with continuous paths is a semimartingale.
Hint: Let \tau_n = \inf\{t : |M_t| \ge n\} and show that M^{\tau_n} is a square integrable martingale for every n \in \mathbb{N}.
Summing up, we have shown that every cadlag process which is a sum of a continuous martingale and an adapted process with paths of bounded variation is a semimartingale.
Actually, every cadlag martingale is a semimartingale. See Jacod-Shiryaev, [14], Chapter I, 4.17.
Extending the stochastic integral
The extension of the stochastic integral from \mathcal{E} to L^0 is based on the fact that every process in L^0 can be approximated by processes in \mathcal{E}. In short, the procedure is as follows. Let X be a semimartingale and let H \in L^0. Consider some sequence (H^n) in \mathcal{E} such that H^n \to H and define
\int_0^T H \, dX := \lim_{n \to \infty} \int_0^T H^n \, dX     (22)
However, in order to make sure that such a definition makes sense one has to consider several mathematical issues.
However, in order to make sure that such a deﬁnition makes sense one has to consider
several mathematical issues.
10.21 Discussion. Foundations of the extension process
The main points of definition (22) are existence and uniqueness of the limit. Let X \in \mathcal{S} and H \in L^0. We follow Protter, [20].
(1) One can always find a sequence (H^n) \subseteq \mathcal{E} such that
\sup_{s \le T} |H^n_s - H_s| \to 0 in probability
(2) Semimartingales satisfy
(H^n) \subseteq \mathcal{E},  \sup_{s \le T} |H^n_s| \to 0 in probability  \Rightarrow  \sup_{t \le T} | \int_0^t H^n \, dX | \to 0 in probability.
(This is slightly stronger than the defining property of semimartingales.)
(3) From (2) it follows that for every sequence (H^n) \subseteq \mathcal{E} satisfying (1) the corresponding sequence of stochastic integrals
\int_0^T H^n \, dX
is a Cauchy sequence with respect to convergence in probability, uniformly on [0, T]. Therefore there exists a process Y such that
\sup_{t \le T} | \int_0^t H^n \, dX - Y_t | \to 0 in probability.
(4) From (2) it follows that the limiting process Y does not depend on the sequence (H^n).
The preceding discussion shows that there is a well-defined stochastic integral \int_0^T H \, dX whenever H \in L^0 and X \in \mathcal{S}. The stochastic integral has a strong continuity property.
10.22 Theorem. Let X be a semimartingale. Then for every sequence (H^n) of processes in L^0
\sup_{s \le T} |H^n_s| \to 0 in probability  \Rightarrow  \sup_{t \le T} | \int_0^t H^n_s \, dX_s | \to 0 in probability
For deriving (and understanding) the basic properties and rules of this stochastic integral we will apply the following approximation result.
10.23 Theorem. Let X be a semimartingale and H \in L^0. Assume that 0 = t^n_0 < t^n_1 < \ldots < t^n_{k_n} = t is any Riemannian sequence of subdivisions of [0, t]. Then
\sup_{s \le t} | \sum_{j=1}^{k_n} H_{t^n_{j-1}} (X_{t^n_j \wedge s} - X_{t^n_{j-1} \wedge s}) - \int_0^s H \, dX | \to 0 in probability
Proof: This is not a proof but a comment. The assertion can be proved by our means if the paths of H are continuous. In this case the Riemannian step processes
\sum_{j=1}^{k_n} H_{t_{j-1}} 1_{(t_{j-1}, t_j]}
converge to H in probability, uniformly on compacts (actually they converge everywhere, uniformly on compacts). If H is only left-continuous then the Riemannian step functions converge to H pointwise, but not necessarily uniformly on compacts. In order to achieve uniform convergence one has to replace the arbitrary deterministic sequence of subdivisions by a particular sequence of subdivisions based on stopping times instead of fixed interval boundaries.
However, the result is true anyway. For BV-processes X it is a consequence of Lebesgue's theorem on dominated convergence. It can also be proved for square integrable martingales X. The universal validity follows from a very deep representation theorem for semimartingales which is not available to us at this stage. Confer Protter, [20], or Jacod-Shiryaev, [14]. □
Let us apply 10.23 for the evaluation of a fundamental special case.
10.24 Theorem. Let (W_t)_{t \ge 0} be a Wiener process. Then
\int_0^t W_s \, dW_s = \frac{1}{2} (W^2_t - t)     (23)
Proof: Let 0 = t_0 \le t_1 \le \ldots \le t_n = t be an interval partition such that \max_j |t_j - t_{j-1}| \to 0 as n \to \infty. This implies
\sum_{j=1}^n W_{t_{j-1}} (W_{t_j} - W_{t_{j-1}}) \to \int_0^t W_s \, dW_s in probability
On the other hand we have
W^2_t = \sum_{j=1}^n (W^2_{t_j} - W^2_{t_{j-1}}) = \sum_{j=1}^n (W_{t_j} - W_{t_{j-1}})^2 + 2 \sum_{j=1}^n W_{t_{j-1}} (W_{t_j} - W_{t_{j-1}})
We know that
\sum_{j=1}^n (W_{t_j} - W_{t_{j-1}})^2 \to t  (P)
This proves the assertion. □
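Formula (23) can be checked on a simulated path using the left-point Riemann sums of 10.23; the parameters below are illustrative:

```python
import math
import random

rng = random.Random(3)
n, t = 100_000, 1.0
sd = math.sqrt(t / n)
W = [0.0]
for _ in range(n):
    W.append(W[-1] + rng.gauss(0.0, sd))
# left-point Riemann sums with integrand H = W
integral = sum(W[j] * (W[j + 1] - W[j]) for j in range(n))
target = 0.5 * (W[-1] ** 2 - t)
```

The gap between the two quantities is exactly one half of (sum of squared increments minus t), which shrinks as the subdivision is refined.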
10.25 Problem. How does (23) have to be modified if (W_t)_{t \ge 0} is replaced by some BV-process?
It is clear that the linearity properties (19) remain valid for the stochastic integral with H \in L^0.
10.26 Problem. Define
\int_s^t H \, dX := \int_0^t 1_{(s,\infty)} H \, dX
(1) Prove a concatenation property of the stochastic integral.
(2) Show that \int_s^t 1_F H \, dX = 1_F \int_s^t H \, dX whenever F \in \mathcal{F}_s.
Path properties
UNDER CONSTRUCTION
The Wiener integral
UNDER CONSTRUCTION
10.3 Calculus for the stochastic integral
There are three fundamental rules for calculations with the stochastic integral which correspond to the three rules considered in section 10.1:
(1) the associativity rule,
(2) the integration-by-parts formula,
(3) the chain rule (Ito's formula)
The associativity rule
This rule can be formulated briefly as follows. Let H, G \in L^0 and X \in \mathcal{S}. Then
H \bullet (G \bullet X) = (HG) \bullet X,  in short: d(G \bullet X) = G \, dX     (24)
Details are as follows.
10.27 Theorem.
(1) Let X \in \mathcal{S} and G \in L^0. Then G \bullet X is in \mathcal{S}.
(2) Let H \in L^0. Then
\int_0^T H \, d(G \bullet X) = \int_0^T H G \, dX.
Proof: For H^n \in \mathcal{E}_0 we have
\int_0^T H^n \, d(G \bullet X) = \int_0^T H^n G \, dX
If H^n \to 0 in an appropriate sense this implies the semimartingale property of G \bullet X. If H^n \to H in an appropriate sense the asserted equation follows. □
There is an important consequence of rule (24) which should be isolated.
10.28 Theorem. Truncation rule
Let H \in L^0 and X \in \mathcal{S}. Then for any stopping time \tau
\int_0^T 1_{(0,\tau]} H \, dX = \int_0^{T \wedge \tau} H \, dX = \int_0^T H \, dX^\tau
10.29 Problem. (intermediate)
Prove 10.28.
Hint: The first equation follows from the definition of the integral on \mathcal{E}. For the second equation note that 1_{(0,\tau]} \bullet X = X^\tau.
The integration-by-parts formula
We restrict our presentation of the integration-by-parts formula to processes with continuous paths.
Recall the deterministic integration-by-parts formula for continuous BV-functions:
f(t)g(t) - f(0)g(0) = \int_0^t f \, dg + \int_0^t g \, df
This formula is not true for arbitrary semimartingales. The following is a definition rather than a theorem, but it is called the integration by parts formula.
10.30 Definition. Let X and Y be semimartingales with continuous paths. Define
[X, Y]_t := X_t Y_t - X_0 Y_0 - \int_0^t X \, dY - \int_0^t Y \, dX,  t \ge 0.
This process is called the quadratic covariation of X and Y.
It is clear that [X, Y] is well-defined and is a continuous adapted process. The integration by parts formula can be written as
\int_0^T H \, d(XY) = \int_0^T H X \, dY + \int_0^T H Y \, dX + \int_0^T H \, d[X, Y],  H \in L^0
or in short
d(XY) = X \, dY + Y \, dX + d[X, Y]
However, this only makes sense if [X, Y] is a semimartingale. So let us have a closer look at [X, Y].
10.31 Problem. (easy)
Show that [X, Y ] is linear in both arguments.
10.32 Theorem. Let X and Y be semimartingales with continuous paths. For every Riemannian sequence of subdivisions
\sum_{j=1}^n (X_{t_j} - X_{t_{j-1}})(Y_{t_j} - Y_{t_{j-1}}) \to [X, Y]_t in probability,  t \ge 0.
Proof: This follows easily from the definition of [X, Y] when it is approximated by a Riemannian sequence of step processes. □
10.33 Problem. (intermediate)
Fill in the details of the proof of 10.32.
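Theorem 10.32 can be illustrated numerically. Below X = W and Y = 2W plus a drift; since BV parts contribute nothing to the covariation (cf. 10.35), [X, Y]_1 should be close to 2[W]_1 = 2. A sketch with illustrative parameters:

```python
import math
import random

rng = random.Random(7)
n, t = 100_000, 1.0
sd = math.sqrt(t / n)
dW = [rng.gauss(0.0, sd) for _ in range(n)]
dX = dW                                         # X = W
dY = [2.0 * dw + 3.0 * (t / n) for dw in dW]    # dY = 2 dW + 3 dt (BV drift)
cov = sum(dx * dy for dx, dy in zip(dX, dY))    # Riemannian sums -> [X, Y]_1 = 2
```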
From 10.32 it follows that [X, X] =: [X] is the quadratic variation of X. This is an increasing process, hence a BV-process and a semimartingale. Moreover, since
[X, Y] = \frac{1}{4} ([X + Y] - [X - Y])
the quadratic covariation is also a BV-process and a semimartingale.
10.34 Problem. (easy)
Show: If X and Y are continuous semimartingales then XY is a (continuous)
semimartingale, too.
10.35 Problem. (intermediate)
Let X be a continuous BV-process and Y, Z continuous semimartingales. Show that:
(a) [X] = 0,
(b) [X, Y] = 0,
(c) [X + Y, Z] = [Y, Z].
10.36 Problem. (intermediate)
Let X \in \mathcal{S} be continuous.
(a) Show that
d X^2 = 2X \, dX + d[X]
(b) Find a formula for dX^k, k \in \mathbb{N}.
Hint: Use induction on k.
10.37 Problem. (advanced)
Show that [X^\tau, Y] = [X, Y]^\tau.
Hint: This is intuitively clear from 10.32 and could be made precise by approximating \tau by a sequence of stopping times with finitely many values. However, it can be obtained from the definition without any approximation argument. For this note that
\int_0^t X^\tau \, dY = \int_0^{\tau \wedge t} X \, dY + X_{\tau \wedge t} (Y_t - Y_{\tau \wedge t})
10.38 Problem. (intermediate)
Show that [H \bullet X, Y] = H \bullet [X, Y].
Hint: Prove it for H = 1_{(\sigma, \tau]} where \sigma \le \tau are stopping times.
10.39 Problem. (easy)
Let (W_t) be a Wiener process. Calculate [H \bullet W].
10.40 Problem. (intermediate)
Show that H \bullet W is a BV-process iff H \equiv 0.
10.41 Problem. (intermediate)
A process of the form
X_t = x_0 + \int_0^t a_s \, ds + \int_0^t b_s \, dW_s
is called an Ito process. Show that a and b are uniquely determined by X.
Ito's formula
Now we turn to the most important and most powerful rule of stochastic analysis.
10.42 Theorem. Ito's formula
Let X \in \mathcal{S} be continuous and let \phi : \mathbb{R} \to \mathbb{R} be twice differentiable with continuous derivatives. Then
\phi(X_t) = \phi(X_0) + \int_0^t \phi'(X_s) \, dX_s + \frac{1}{2} \int_0^t \phi''(X_s) \, d[X]_s
Proof: The assertion is true for polynomials. Since smooth functions can be approximated uniformly by polynomials in such a way that also the corresponding derivatives are approximated, the assertion follows. □
10.43 Problem. (easy)
Show that Ito's formula is true if \phi is a polynomial.
Hint: Start with powers \phi(x) = x^k.
10.44 Problem. (very easy)
State 10.42 in terms of differentials.
10.45 Problem. (easy)
Calculate d(W_t^a), a > 0.
10.46 Problem. (advanced)
Use Ito's formula to find a recursion formula for E(W^k_t), k \in \mathbb{N}.
10.47 Problem. (easy)
Calculate d e^{\alpha W_t}.
10.48 Definition. Let X \in \mathcal{S} be continuous. Then
\mathcal{E}(X) = e^{X - [X]/2}
is called the stochastic exponential of X.
10.49 Problem. (intermediate)
Let X \in \mathcal{S} be continuous and Y := \mathcal{E}(X). Show that
Y_t = Y_0 + \int_0^t Y_s \, dX_s,  in short: dY = Y \, dX
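For X = \sigma W the stochastic exponential is explicit, exp(\sigma W_t - \sigma^2 t / 2), and the relation dY = Y dX of 10.49 can be illustrated by an Euler iteration along a simulated path, which should stay close to the closed form. A sketch with illustrative parameters:

```python
import math
import random

rng = random.Random(5)
n, t, sigma = 50_000, 1.0, 0.3
sd = math.sqrt(t / n)
W, Y = 0.0, 1.0
for _ in range(n):
    dW = rng.gauss(0.0, sd)
    Y += Y * sigma * dW                  # Euler step for dY = Y dX with X = sigma W
    W += dW
exact = math.exp(sigma * W - 0.5 * sigma**2 * t)   # exp(X_t - [X]_t / 2)
```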
There is a subtle point to discuss. Consider some positive continuous semimartingale X and a function like \phi(x) = \log(x) or \phi(x) = 1/x. Then we may consider \phi(X) since it is well-defined and real-valued. But Ito's formula cannot be applied in the version in which we have proved it. The reason for this difficulty is that the range of X need not be contained in a compact interval where \phi can be approximated uniformly by polynomials.
10.50 Problem. (advanced)
Let X be a positive continuous semimartingale.
(a) Show that Ito's formula holds for \phi(x) = \log(x) and for \phi(x) = 1/x.
Hint: Let \tau_n = \inf\{t \ge 0 : X_t \le 1/n\}. Apply Ito's formula to X^{\tau_n} and let n \to \infty.
(b) Show that \phi(X) is a semimartingale.
10.51 Problem. (intermediate)
Let X be a continuous positive semimartingale. Find
\int_0^t \frac{1}{X^k_s} \, dX_s,  k \in \mathbb{N}.
10.52 Problem. (intermediate)
Show that every positive continuous semimartingale X can be written as a stochastic exponential \mathcal{E}(L).
Hint: Note that dX = X \, dL implies dL = \frac{1}{X} \, dX.
10.53 Theorem. Ito's formula
Let X, Y \in \mathcal{S} be continuous and let \phi : \mathbb{R}^2 \to \mathbb{R} be twice differentiable with continuous derivatives. Then
\phi(X_t, Y_t) = \phi(X_0, Y_0) + \int_0^t \phi_1(X_s, Y_s) \, dX_s + \int_0^t \phi_2(X_s, Y_s) \, dY_s
+ \frac{1}{2} \int_0^t \phi_{11}(X_s, Y_s) \, d[X]_s + \int_0^t \phi_{12}(X_s, Y_s) \, d[X, Y]_s + \frac{1}{2} \int_0^t \phi_{22}(X_s, Y_s) \, d[Y]_s
Proof: The assertion is true for polynomials. Since smooth functions can be approximated uniformly by polynomials in such a way that also the corresponding derivatives are approximated, the assertion follows. □
10.54 Problem. (advanced)
Show that Ito's formula is true if \phi : \mathbb{R}^2 \to \mathbb{R} is a polynomial.
Hint: Start with powers \phi(x, y) = x^k y^l.
10.55 Problem. (very easy)
State 10.53 in terms of differentials.
10.56 Problem. (intermediate)
State Ito's formula for \phi(x, t).
Hint: Apply 10.53 to Y_t = t.
10.57 Problem. (easy)
Use 10.53 to derive the differential equation for the stochastic exponential.
Chapter 11
Applications to ﬁnancial markets
11.1 Self-financing trading strategies
Consider a financial market model \mathcal{M} = (X, Y) which is generated by two semimartingales X and Y. Continuing the discussion 10.15 leads to the following definition.
11.1 Definition. A trading strategy (H^X, H^Y) (consisting of left-continuous adapted processes) is self-financing if
H^X_t X_t + H^Y_t Y_t = H^X_0 X_0 + H^Y_0 Y_0 + \int_0^t H^X \, dX + \int_0^t H^Y \, dY
or in other words
d(H^X X) + d(H^Y Y) = H^X \, dX + H^Y \, dY
The property of being self-financing is a very strong property which narrows the set of available wealth processes considerably. Let us illustrate this fact for continuous models.
Assume that the market model and the trading strategy are continuous. Then from the integration by parts formula we have
d(H^X X) + d(H^Y Y) = H^X \, dX + H^Y \, dY + X \, dH^X + Y \, dH^Y + d[X, H^X] + d[Y, H^Y]
If the trading strategy is self-financing then the sum of the last four terms vanishes.
In financial calculations it is often convenient to change the unit of money. The simplest example is discounting by a fixed interest rate. But there are also important applications where the "numeraire" is a stochastic process. It seems to be intuitively clear that such a change of numeraire should have no influence on the trading strategy and should not destroy the self-financing property. The following theorem shows that this is actually true.
11.2 Theorem. Assume that the market model and the trading strategy are continuous and
dV := d(H^X X) + d(H^Y Y) = H^X \, dX + H^Y \, dY
Let Z be a continuous semimartingale. Then
d(VZ) = d(H^X XZ) + d(H^Y YZ) = H^X \, d(XZ) + H^Y \, d(YZ)
Proof: The first equality is obvious. The second follows from
d(VZ) = Z \, dV + V \, dZ + d[Z, V]
= Z H^X \, dX + Z H^Y \, dY + H^X X \, dZ + H^Y Y \, dZ + d[Z, V]
= H^X (Z \, dX + X \, dZ) + H^Y (Z \, dY + Y \, dZ) + d[Z, V]
= H^X (d(XZ) - d[X, Z]) + H^Y (d(YZ) - d[Y, Z]) + d[Z, V]
= H^X \, d(XZ) + H^Y \, d(YZ) + d[Z, V] - H^X \, d[X, Z] - H^Y \, d[Y, Z]
where the last three terms cancel, since dV = H^X \, dX + H^Y \, dY implies d[Z, V] = H^X \, d[X, Z] + H^Y \, d[Y, Z] (use 10.38). □
Assume now that X is a positive continuous semimartingale. Applying the preceding result to Z = 1/X we obtain
V_t / X_t = H^X_t + H^Y_t (Y_t / X_t) = V_0 / X_0 + \int_0^t H^Y \, d(Y/X)
11.3 Problem.
Show that any wealth process V satisfying
V_t / X_t = H^X_t + H^Y_t (Y_t / X_t) = V_0 / X_0 + \int_0^t H \, d(Y/X)
for some continuous adapted process H is a self-financing wealth process. Find the corresponding trading strategy.
11.2 Markovian wealth processes
Let \mathcal{M} = (X^1, X^2, \ldots, X^n) be a financial market model consisting of Ito processes
dX^i_t = \mu_{it} \, dt + \sigma_{it} \, dW_t.
This is a so-called one-factor model since only one Wiener process is responsible for random fluctuations. We assume that the processes (\sigma_{it}) are positive.
Let V be a wealth process generated by a self-financing trading strategy. The wealth process is called Markovian if there exists a function f(x, t), x \in \mathbb{R}^n, t \ge 0, such that V_t = f(X_t, t) where X_t = (X^1_t, X^2_t, \ldots, X^n_t).
We will show that if the function f(x, t) is smooth then it necessarily satisfies partial differential equations which for special cases are known as Black-Scholes equations.
To begin with, we note that the self-financing property implies the existence of a trading strategy (\phi^1, \phi^2, \ldots, \phi^n) such that
f(X_t, t) = f(X_0, 0) + \sum_{i=1}^n \int_0^t \phi^i_s \, dX^i_s = \sum_{i=1}^n \phi^i_t X^i_t
On the other hand the Ito formula gives
f(X_t, t) = f(X_0, 0) + \sum_{i=1}^n \int_0^t f_{x_i}(X_s, s) \, dX^i_s + \int_0^t f_t(X_s, s) \, ds + \frac{1}{2} \sum_{i,j} \int_0^t f_{x_i x_j}(X_s, s) \sigma_{is} \sigma_{js} \, ds
Both representations are Ito processes, which are equal if both the dW_t-part and the dt-part coincide. The equality of the dW_t-part gives \phi^i_t = f_{x_i}(X_t, t) and thus the first partial differential equation:
f(X_t, t) = \sum_{i=1}^n f_{x_i}(X_t, t) X^i_t
Comparing the dt-part gives the second partial differential equation:
f_t(X_t, t) + \frac{1}{2} \sum_{i,j} f_{x_i x_j}(X_t, t) \sigma_{it} \sigma_{jt} = 0
In former times wealth processes were calculated by solving these partial differential equations by analytical or numerical methods.
11.3 The Black-Scholes market model
The simplest mathematical model of a financial asset is the model of a bank account (B_t) with fixed interest rate r > 0:
B_t = B_0 e^{rt}  \Leftrightarrow  dB_t = r B_t \, dt     (25)
Denoting R_t := rt, the bank account follows the differential equation
dB_t = B_t \, dR_t
Stochastic models for financial assets are often based on a stochastic model for the return process (R_t).
Assume that R_t = \mu t + \sigma W_t where (W_t) is a Wiener process. If this ("generalized Wiener process") is a model for the return of an asset (S_t) then it follows that
dS_t = S_t \, dR_t = \mu S_t \, dt + \sigma S_t \, dW_t     (26)
This is a stochastic differential equation. The number \sigma > 0 is called the volatility of the asset.
11.4 Problem. Show that S_t = S_0 e^{(\mu - \sigma^2/2) t + \sigma W_t} is a solution of (26).
11.5 Definition. A Black-Scholes model is a market model which is generated by two assets (B_t, S_t) following equations (25) and (26).
Let us give an overview of the available wealth processes in the Black-Scholes model. We begin with smooth Markovian wealth processes.
The Black-Scholes equation
We are going to apply 11.2. A Black-Scholes model consists of the assets X^1_t = e^{rt} and X^2_t = S_t, where \sigma_{1t} = 0 and \sigma_{2t} = \sigma S_t. Let f(X^1_t, X^2_t, t) be a self-financing wealth process and define
g(x, t) = f(e^{rt}, x, t)
The Black-Scholes equation is the partial differential equation for the function g(x, t).
Note that
g_t = f_t + f_{x_1} r e^{rt}  and  g_x = f_{x_2}
Since
g = f_{x_1} e^{rt} + f_{x_2} x = f_{x_1} e^{rt} + g_x x
we obtain
g_t = f_t + r(g - g_x x)
From 11.2 we know that
f_t = -\frac{1}{2} f_{x_2 x_2} \sigma^2_{2t} = -\frac{1}{2} g_{xx} \sigma^2 x^2
This leads to the famous Black-Scholes equation
g_t + \frac{1}{2} g_{xx} \sigma^2 x^2 + r g_x x = r g     (27)
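Equation (27) can be verified numerically against the classical Black-Scholes call value, quoted here as a known closed form (it is not derived in the text above); note that g(x, t) depends on t through the time to maturity \tau = T - t, so g_t = -\partial/\partial\tau. All numerical values are illustrative:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(x, tau, K, r, sigma):
    """Classical Black-Scholes call value with time to maturity tau."""
    d1 = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# central finite differences; g_t = -d/dtau since tau = T - t
x, tau, K, r, sigma, h = 100.0, 0.5, 100.0, 0.05, 0.2, 1e-3
g = call_price(x, tau, K, r, sigma)
g_t = -(call_price(x, tau + h, K, r, sigma) - call_price(x, tau - h, K, r, sigma)) / (2 * h)
g_x = (call_price(x + h, tau, K, r, sigma) - call_price(x - h, tau, K, r, sigma)) / (2 * h)
g_xx = (call_price(x + h, tau, K, r, sigma) - 2 * g + call_price(x - h, tau, K, r, sigma)) / h**2
residual = g_t + 0.5 * sigma**2 * x**2 * g_xx + r * x * g_x - r * g
```

The residual of (27) is numerically close to zero.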
The market price of risk
Much more insight into the structure of wealth processes is obtained in a different way. We are now going to apply 11.1.
Let (V_t) be a positive wealth process. Then it can be written as
dV_t = \mu^V_t V_t \, dt + \sigma^V_t V_t \, dW_t
Let \overline{S} := S/B and \overline{V} := V/B. From the integration by parts formula it follows that
d\overline{S}_t = (\mu - r) \overline{S}_t \, dt + \sigma \overline{S}_t \, dW_t
and
d\overline{V}_t = (\mu^V_t - r) \overline{V}_t \, dt + \sigma^V_t \overline{V}_t \, dW_t
On the other hand we know from 11.1 that for some process \tau_t
d\overline{V}_t = \tau_t \, d\overline{S}_t = \tau_t ((\mu - r) \overline{S}_t \, dt + \sigma \overline{S}_t \, dW_t)
This implies
\tau_t \overline{S}_t = \frac{\sigma^V_t}{\sigma} \overline{V}_t  and  \frac{\mu - r}{\sigma} = \frac{\mu^V_t - r}{\sigma^V_t}
11.6 Theorem. Let \lambda := (\mu - r)/\sigma (the "market price of risk"). Then a wealth process
dV_t = \mu^V_t V_t \, dt + \sigma^V_t V_t \, dW_t
is self-financing iff
\mu^V_t - r = \lambda \sigma^V_t,  t \ge 0.     (28)
11.7 Problem. Prove 11.6.
11.8 Problem. Show that for a Markovian wealth process equations (27) and (28) are equivalent.
Chapter 12
Stochastic differential equations
12.1 Introduction
A (Wiener driven) stochastic differential equation is an equation of the form
dX_t = b(t, X_t) \, dt + \sigma(t, X_t) \, dW_t
where (W_t)_{t \ge 0} is a Wiener process and b(t, x) and \sigma(t, x) are given functions. The problem is to find a process (X_t)_{t \ge 0} that satisfies the equation. Such a process is then called a solution of the differential equation.
Note that the differential notation is only an abbreviation for the integral equation
X_t = x_0 + \int_0^t b(s, X_s) \, ds + \int_0^t \sigma(s, X_s) \, dW_s
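The integral form suggests the simplest numerical scheme, the Euler-Maruyama method, which advances both integrals by left-point increments over small time steps. A generic sketch; the Ornstein-Uhlenbeck coefficients in the example are just an illustrative choice:

```python
import math
import random

def euler_maruyama(b, sigma, x0, t, n, seed=0):
    """One path of dX = b(t, X) dt + sigma(t, X) dW on [0, t] with n steps."""
    rng = random.Random(seed)
    h = t / n
    x, s = x0, 0.0
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(h))
        x += b(s, x) * h + sigma(s, x) * dW
        s += h
    return x

# Ornstein-Uhlenbeck example: dX = -X dt + 0.5 dW, X_0 = 1
xT = euler_maruyama(lambda s, x: -x, lambda s, x: 0.5, 1.0, 1.0, 20_000)
```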
There are three issues to be discussed for differential equations:
(1) Theoretical answers for existence and uniqueness of solutions.
(2) Finding analytical expressions for solutions.
(3) Calculating solutions by numerical methods.
We will focus on analytical expressions for important but easy special cases. How
ever, let us indicate some issues which are important from the theoretical point of view.
For stochastic differential equations even the concept of a solution is a subtle question. We have to distinguish between weak and strong solutions, and even between weak and strong uniqueness. It is not within the scope of this text to give precise definitions of these notions. But the idea can be described in an intuitive way.

A strong solution is a solution where the driving Wiener process (and the underlying probability space) is fixed in advance and the solution (X_t)_{t≥0} is a function of this given driving Wiener process. A weak solution is an answer to the question: Does there exist a probability space on which a process (X_t)_{t≥0} and a Wiener process (W_t)_{t≥0} exist such that the differential equation holds?
When we derive analytical expressions for solutions we will derive strong solutions. In particular, for linear differential equations (to be defined below) complete formulas for strong solutions are available.

There is a general theory giving sufficient conditions for existence and uniqueness of non-exploding strong solutions. Both the proofs and the assertions of this theory are quite similar to the classical theory of ordinary differential equations. We refer to Hunt-Kennedy [12] and Karatzas-Shreve [15].
Let us introduce some terminology.
A stochastic differential equation is called time homogeneous if b(t, x) = b(x) and σ(t, x) = σ(x).
A linear differential equation is of the form

    dX_t = (a_0(t) + a_1(t)X_t)dt + (σ_0(t) + σ_1(t)X_t)dW_t

It is a homogeneous linear differential equation if a_0(t) = σ_0(t) = 0.
The simplest homogeneous case is

    dX_t = μX_t dt + σX_t dW_t

which corresponds to the Black-Scholes model. The constant σ is called the volatility of the model. If the volatility is time dependent then it is a local volatility model.
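The Black-Scholes equation has the well known strong solution X_t = x_0 exp((μ − σ²/2)t + σW_t). As a numerical illustration (not part of the original notes; all names are illustrative), the following Python/NumPy sketch compares an Euler discretization of the equation with the exact solution evaluated along the same simulated Wiener path:

```python
import numpy as np

def simulate_gbm(x0, mu, sigma, t, n, rng):
    """One path of dX = mu X dt + sigma X dW on [0, t] by the Euler scheme,
    together with the exact solution X_t = x0 exp((mu - sigma^2/2) t + sigma W_t)
    evaluated on the same Wiener path."""
    dt = t / n
    dw = rng.normal(0.0, np.sqrt(dt), n)              # Wiener increments
    w = np.concatenate(([0.0], np.cumsum(dw)))        # Wiener path
    times = np.linspace(0.0, t, n + 1)
    # Euler scheme: X_{k+1} = X_k (1 + mu dt + sigma dW_k)
    x_euler = np.empty(n + 1)
    x_euler[0] = x0
    for k in range(n):
        x_euler[k + 1] = x_euler[k] * (1.0 + mu * dt + sigma * dw[k])
    x_exact = x0 * np.exp((mu - 0.5 * sigma**2) * times + sigma * w)
    return x_euler, x_exact

rng = np.random.default_rng(0)
x_euler, x_exact = simulate_gbm(1.0, 0.05, 0.2, 1.0, 10_000, rng)
err = np.max(np.abs(x_euler - x_exact))
```

With 10000 steps the two paths agree closely, which illustrates both the exact formula and the (strong) convergence of the Euler scheme.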
There are plenty of linear differential equations used in the theory of stochastic interest rates. If (B_t) denotes a process that is a model for a bank account with stochastic interest rate then

    r_t := Ḃ_t/B_t  ⇔  B_t = B_0 exp(∫_0^t r_s ds)
is called the short rate. Popular short rate models are the Vasicek model

    dr_t = a(b − r_t)dt + σdW_t

and the Hull-White model

    dr_t = (θ(t) − a(t)r_t)dt + σ(t)dW_t
12.2 The abstract linear equation
Let Y and Z be any continuous semimartingales. The abstract homogeneous linear equation is

    dX_t = X_t dY_t

and its solution is known to us as

    X_t = x_0 exp(Y_t − [Y]_t/2) = x_0 c(Y_t)

This is the recipe to solve any homogeneous linear stochastic differential equation. There is nothing more to say about it at the moment.
12.1 Problem. (easy)
Solve dX_t = a(t)X_t dt + σ(t)X_t dW_t.
Things become more interesting when we turn to the general inhomogeneous equation

    dX_t = X_t dY_t + dZ_t

There is an explicit expression for the solution, but it is much more illuminating to memorize the approach by which one arrives at it.

The idea is to write the equation as

    dX_t − X_t dY_t = dZ_t

and to find an integrating factor that transforms the left hand side into a total differential.
Let dA_t = A_t dY_t and multiply the equation by 1/A_t, giving

    (1/A_t)dX_t − (X_t/A_t)dY_t = (1/A_t)dZ_t    (29)
Note that

    d(1/A_t) = −(1/A_t)dY_t + (1/A_t)d[Y]_t
Then

    d((1/A_t)X_t) = (1/A_t)dX_t + X_t d(1/A_t) + d[1/A, X]_t
    = (1/A_t)dX_t − (X_t/A_t)dY_t + (X_t/A_t)d[Y]_t − (1/A_t)d[Y, X]_t
    = (1/A_t)dX_t − (X_t/A_t)dY_t − (1/A_t)d[Y, Z]_t
Thus, the left hand side of (29) differs from a total differential by a known BV function.
We obtain

    d((1/A_t)X_t) = (1/A_t)dZ_t − (1/A_t)d[Y, Z]_t

leading to

    X_t = A_t (x_0 − ∫_0^t (1/A_s) d[Y, Z]_s + ∫_0^t (1/A_s) dZ_s)    (30)
Note that the solution is particularly simple if either Y or Z is a BV process.
12.2 Problem. (intermediate)
Fill in and explain all details of the derivation of (30).
12.3 Wiener driven models
The Vasicek model is

    dX_t = (ν − μX_t)dt + σdW_t

For ν = 0 the solution is called the Ornstein-Uhlenbeck process.

The Vasicek model is a special case of the inhomogeneous linear equation for

    dY_t = −μdt and dZ_t = νdt + σdW_t

Therefore the integrating factor is A_t = e^{−μt} and the solution is obtained as in the case of an ordinary linear differential equation.
12.3 Problem. (advanced)
Show that the solution of the Vasicek equation is

    X_t = e^{−μt}x_0 + (ν/μ)(1 − e^{−μt}) + σ∫_0^t e^{−μ(t−s)} dW_s
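The solution formula of Problem 12.3 can be checked numerically. The following sketch (Python with NumPy; not part of the original notes) simulates one Euler path of the Vasicek equation and evaluates the closed form on the same Wiener increments, discretizing the stochastic integral by a left-point Riemann sum:

```python
import numpy as np

def vasicek_paths(x0, nu, mu, sigma, t, n, rng):
    """One path of dX = (nu - mu X) dt + sigma dW by the Euler scheme, and the
    closed-form solution X_t = e^{-mu t} x0 + (nu/mu)(1 - e^{-mu t})
    + sigma ∫_0^t e^{-mu(t-s)} dW_s on the same Wiener increments."""
    dt = t / n
    dw = rng.normal(0.0, np.sqrt(dt), n)
    x_euler = np.empty(n + 1)
    x_euler[0] = x0
    for k in range(n):
        x_euler[k + 1] = x_euler[k] + (nu - mu * x_euler[k]) * dt + sigma * dw[k]
    times = np.linspace(0.0, t, n + 1)
    x_exact = np.empty(n + 1)
    x_exact[0] = x0
    for k in range(1, n + 1):
        tk = times[k]
        # left-point discretization of the stochastic integral
        stoch = np.sum(np.exp(-mu * (tk - times[:k])) * dw[:k])
        x_exact[k] = (np.exp(-mu * tk) * x0
                      + (nu / mu) * (1.0 - np.exp(-mu * tk))
                      + sigma * stoch)
    return x_euler, x_exact

rng = np.random.default_rng(1)
x_euler, x_exact = vasicek_paths(0.05, 0.02, 0.5, 0.01, 1.0, 2_000, rng)
err = np.max(np.abs(x_euler - x_exact))
```

Since the noise is additive, the Euler scheme converges with strong order one here, and the two discretizations agree very closely.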
12.4 Problem. (advanced)
Derive the following properties of the Vasicek model:
(a) The process (X_t)_{t≥0} is a Gaussian process (i.e. all joint distributions are normal distributions).
(b) Find E(X_t) and lim_{t→∞} E(X_t).
(c) Find V(X_t) and lim_{t→∞} V(X_t).
(d) Find Cov(X_t, X_{t+h}) and lim_{t→∞} Cov(X_t, X_{t+h}).
12.5 Problem. (advanced)
Let X_0 ∼ N(ν/μ, σ²/(2μ)). Explore the mean and covariance structure of a Vasicek model starting with X_0.

Let us turn to models that are not time homogeneous.
12.6 Problem. (intermediate)
The Brownian bridge:
(a) Find the solution of

    dX_t = −(1/(1 − t))X_t dt + dW_t, 0 ≤ t < 1.

(b) Show that (X_t)_{0≤t<1} is a Gaussian process. Find the mean and the covariance structure.
(c) Show that X_t → 0 if t → 1.
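A conjectured answer to (b) can at least be checked by simulation. For the bridge started at 0, the expected covariance structure is Cov(X_s, X_t) = s(1 − t) for s ≤ t (stated here as an assumption; deriving it is the content of the problem). A Monte Carlo sketch (Python/NumPy, not part of the original notes):

```python
import numpy as np

def bridge_values(n_paths, n_steps, s, t, rng):
    """Euler paths of dX = -X/(1-u) du + dW started at 0 on [0, t];
    returns the simulated values (X_s, X_t) for all paths."""
    dt = 1.0 / n_steps
    ks, kt = int(s * n_steps), int(t * n_steps)
    x = np.zeros(n_paths)
    xs = None
    for k in range(kt):
        u = k * dt
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        x = x - x / (1.0 - u) * dt + dw
        if k + 1 == ks:
            xs = x.copy()
    return xs, x

rng = np.random.default_rng(2)
xs, xt = bridge_values(20_000, 1_000, 0.3, 0.6, rng)
cov = np.mean(xs * xt)  # the mean is zero, so this estimates Cov(X_0.3, X_0.6)
```

For s = 0.3 and t = 0.6 the conjectured value is 0.3 · (1 − 0.6) = 0.12, which the estimate reproduces up to Monte Carlo error.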
12.7 Problem. (intermediate)
Find the solution of the Hull-White model:

    dX_t = (θ(t) − a(t)X_t)dt + σ(t)dW_t
Finally, let us consider a nonlinear model.
12.8 Problem. (advanced)
Let Z_t = c(μt + σW_t).
(a) For a > 0 find the differential equation of

    X_t := Z_t / (1 + a∫_0^t Z_s ds)

(b) What about a < 0?
Chapter 13
Martingales and stochastic calculus
13.1 Martingale properties of the stochastic integral
Facts
Let (M_t)_{t≥0} be a continuous square integrable martingale, i.e. E(M_t²) < ∞, t ≥ 0. We would like to know for which H ∈ L_0 the process

    H • M : t ↦ ∫_0^t H dM

is a square integrable martingale.
There are two main results in this section.
13.1 Theorem. For any continuous square integrable martingale (M_t)_{t≥0} the process M_t² − [M]_t is a martingale.

13.2 Theorem. Let (M_t)_{t≥0} be a continuous square integrable martingale and H ∈ L_0. Then H • M is a square integrable martingale for t ∈ [0, T] iff E([H • M]_T) < ∞.
We will outline the proofs at the end of the section. At this point we attempt to
understand the assertions and their consequences.
First we note that 13.1 is known to us for the Wiener process. Thus, it is a generalization of a familiar structure.
For a better understanding of 13.2 we note that

    [H • M]_T = ∫_0^T H_s² d[M]_s

Therefore ∫_0^t H_s dM_s, t ≤ T, is a square integrable martingale iff

    E(∫_0^T H_s² d[M]_s) < ∞
Thus, we have to check the P-integrability of a Stieltjes integral. For Wiener driven martingales this is even an ordinary Lebesgue integral.

If the condition is satisfied then by 13.1 it follows that

    (∫_0^t H_s dM_s)² − ∫_0^t H_s² d[M]_s, t ≤ T,

is a martingale, which means that

    E((∫_0^t H_s dM_s)²) = E(∫_0^t H_s² d[M]_s)
This is one of the most important identities of stochastic analysis. It was the original starting point of the construction of the stochastic integral and it is still the starting point of further extensions of the stochastic integral to larger entities than L_0.

By the way, what we did (Protter's [19] approach "without tears") is the stochastic counterpart of the Cauchy-Stieltjes integral. The most general version of the stochastic integral (not being the subject of this text) could be considered as the stochastic counterpart of abstract (Lebesgue) integration theory.
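For the Wiener process this identity can be illustrated numerically. A Monte Carlo sketch (Python/NumPy, not part of the original notes) with H_s = W_s and t = 1, where both sides equal E(∫_0^1 W_s² ds) = ∫_0^1 s ds = 1/2:

```python
import numpy as np

# Monte Carlo check of E[(∫_0^1 H dW)^2] = E[∫_0^1 H^2 d[W]] for H_s = W_s,
# using [W]_s = s.  Both sides equal 1/2.
rng = np.random.default_rng(3)
n_paths, n_steps = 20_000, 200
dt = 1.0 / n_steps
dw = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
w = np.cumsum(dw, axis=1)
w_left = np.hstack([np.zeros((n_paths, 1)), w[:, :-1]])  # left endpoints W_{t_{j-1}}
ito_integral = np.sum(w_left * dw, axis=1)               # ∫ W dW as left-point sums
lhs = np.mean(ito_integral**2)                           # E[(∫ W dW)^2]
rhs = np.mean(np.sum(w_left**2 * dt, axis=1))            # E[∫ W^2 ds]
```

The left-point (non-anticipating) sums are essential; a right-point discretization would not reproduce the identity.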
Let us mention that the assertion of 13.1 is related to 13.2 by

    M_t² − [M]_t = M_0² + 2∫_0^t M_s dM_s

This implies that M • M is a martingale. However, it is not necessarily a square integrable martingale!
13.3 Problem. (intermediate)
Show that every continuous square integrable martingale of bounded variation is necessarily constant.

13.4 Problem. (intermediate)
Let (M_t)_{t≥0} be a continuous square integrable martingale. If (A_t) is a continuous adapted process of bounded variation such that M_t² − A_t is a martingale, then A_t = [M]_t.
Proofs
Now, let us turn to the proofs of 13.1 and 13.2. For warming up we provide some
straightforward facts.
13.5 Problem. (advanced)
For every continuous martingale (M_t)_{t≥0} we have E(M_t²) ≥ E([M]_t).
Hint: Show that for any subdivision 0 = t_0 < t_1 < … < t_n = t

    E(M_t²) = E(Σ_{j=1}^n (M_{t_j} − M_{t_{j−1}})²)
and apply Fatou's lemma to an appropriate subsequence of a Riemannian sequence.

13.6 Problem. (easy)
Prove the "only if" part in 13.2.

13.7 Problem. (intermediate)
Suppose you know that for every continuous square integrable martingale (M_t)_{t≥0} the equation E(M_t²) = E([M]_t), t ≥ 0, is true. Show that this implies that M_t² − [M]_t is even a martingale.
Hint: Apply 8.25.
Next we prove a preliminary assertion. The proof isolates some arguments which
are related to the martingale structure.
13.8 Lemma. Let (M_t)_{t≥0} be a continuous square integrable martingale. Then H • M is a square integrable martingale for every bounded H ∈ L_0.
Proof: It is sufficient to show that E(∫_0^t H dM) = 0.
Let 0 = t_0 < t_1 < … < t_n = t be the n-th element of a Riemannian sequence of subdivisions and define

    H_n = Σ_{j=1}^n H_{t_{j−1}} 1_{(t_{j−1}, t_j]}

Then E(∫_0^t H_n dM) = 0 and ∫_0^t H_n dM → ∫_0^t H dM in probability. It remains to show that E((∫_0^t H_n dM)²) is bounded. For this, note that

    E((∫_0^t H_n dM)²) = E((Σ_{j=1}^n H_{t_{j−1}}(M_{t_j} − M_{t_{j−1}}))²)
    = Σ_{j=1}^n E(H_{t_{j−1}}²(M_{t_j} − M_{t_{j−1}})²)
    ≤ C Σ_{j=1}^n E((M_{t_j} − M_{t_{j−1}})²) = C(E(M_t²) − E(M_0²))  □
Now we are in the position to prove 13.1.

Proof: (of Theorem 13.1) For a bounded martingale the assertion follows from the integration by parts formula and 13.8. For proving the general case it is sufficient to show that E(M_t²) = E([M]_t).
Recall that for any stopping time τ the identity [M^τ]_t = [M]_{t∧τ} holds. Let

    τ_n = inf{t : |M_t| ≥ n}
Then it follows that

    E(M_{t∧τ_n}²) = E((M_t^{τ_n})²) = E([M^{τ_n}]_t) = E([M]_{t∧τ_n})

Letting n → ∞ it is clear that E([M]_{t∧τ_n}) → E([M]_t). The corresponding convergence of the left hand side follows from M_{t∧τ}² = E(M_t | F_τ)² ≤ E(M_t² | F_τ). □
The following assertion is a continuation of 13.8.

13.9 Lemma. Let (M_t)_{t≥0} be a continuous square integrable martingale. Then E((H • M)_t²) = E([H • M]_t) for every bounded H ∈ L_0.
Proof: With the aid of 13.1 it follows that

    E((M_t − M_s)² | F_s) = E([M]_t − [M]_s | F_s)

Then the equation array of the proof of 13.8 can be improved to

    E((∫_0^t H_n dM)²) = E((Σ_{j=1}^n H_{t_{j−1}}(M_{t_j} − M_{t_{j−1}}))²)
    = Σ_{j=1}^n E(H_{t_{j−1}}²(M_{t_j} − M_{t_{j−1}})²)
    = Σ_{j=1}^n E(H_{t_{j−1}}²([M]_{t_j} − [M]_{t_{j−1}})) = E(∫_0^t H_n² d[M])

This is extended to bounded H ∈ L_0 by routine arguments. □
Proof: (of Theorem 13.2) We need only prove the "if" part and for this it is sufficient to prove that 13.9 extends to arbitrary H ∈ L_0.
Let τ_n := inf{t : |H_t| ≥ n}. Then (by left-continuity!) H^{τ_n} is bounded and tends to H. Lemma 13.9 can be applied to H^{τ_n} and the assertion is proved again by routine arguments. □
13.2 Martingale representation
Let (W_t)_{t≥0} be a Wiener process. We know that

    ∫_0^t H_s dW_s, t ≥ 0,

is a square integrable martingale iff

    E(∫_0^t H_s² ds) < ∞, t ≥ 0.
Now, in this special case there is a remarkable converse: Each square integrable martingale arises in this way!

We have to be a bit more modest: If we confine ourselves (as we have done so far) to H ∈ L_0 (left-continuous adapted processes) then all square integrable martingales can only be approximated with arbitrary precision by stochastic integrals. We will comment on this point later.

The martingale representation fact is an easy consequence of the following seemingly simpler assertion:
Each random variable C ∈ L²(F_t) (each "claim") can be (approximately) written as a stochastic integral ("hedged" by a self-financing strategy).
Let us introduce some simplifying terminology.
13.10 Definition. A set 𝒞 of random variables in L²(F_t) is called dense if for every C ∈ L²(F_t) there is a sequence (C_n) ⊆ 𝒞 such that E((C_n − C)²) → 0.
A set 𝒞 of random variables in L²(F_t) is called total if the linear hull of 𝒞 is dense.
Thus, we want to prove

13.11 Theorem. The set of all integrals ∫_0^t H dW with H ∈ L_0 and E(∫_0^t H_s² ds) < ∞ is dense in L²(F_t).
Proof: The starting point is that F_t is generated by (W_s)_{s≤t} and therefore also by (e^{W_s})_{s≤t}. Therefore an obvious dense set consists of the functions

    φ(e^{W_{s_1}}, e^{W_{s_2}}, …, e^{W_{s_n}}),

where φ is some continuous function with compact support and s_1, s_2, …, s_n is some finite subset of [0, t]. Every continuous function can be approximated uniformly by polynomials (Weierstrass' theorem) and polynomials are linear combinations of powers. Thus, we arrive at a total set consisting of

    exp(Σ_{j=1}^n k_j W_{s_j})

which after reshuffling can be written as

    exp(Σ_{j=1}^n a_{j−1}(W_{s_j} − W_{s_{j−1}})) = exp(∫_0^t f(s) dW_s)    (31)

for some bounded left-continuous (step) function f : [0, t] → R. It follows that the set of functions (differing from (31) by constant factors)

    G_t = exp(∫_0^t f(s) dW_s − (1/2)∫_0^t f²(s) ds)
is total when f varies in the set of all bounded left-continuous (step) functions f : [0, t] → R.

Recall that (G_s)_{s≤t} is a square integrable martingale and satisfies

    G_t = 1 + ∫_0^t G d(f • W) = 1 + ∫_0^t G_s f(s) dW_s

From 13.1 it follows that

    E(∫_0^t G_s² f²(s) ds) < ∞.

Therefore, the set of integrals ∫_0^t H_s dW_s where H ∈ L_0 and E(∫_0^t H_s² ds) < ∞ is total and by linearity of the integral even dense. □
UNDER CONSTRUCTION:
Extension to predictable processes.
Representation of martingales.
13.3 Lévy's theorem
UNDER CONSTRUCTION
13.4 Exponential martingale and Girsanov’s theorem
UNDER CONSTRUCTION
Chapter 14
Pricing of claims
UNDER CONSTRUCTION
Part III
Appendix
Chapter 15
Foundations of modern analysis
Further reading: Dieudonné [8].
15.1 Sets and functions
15.1 Problem. (easy) Prove de Morgan's laws:

    (A ∪ B)^c = A^c ∩ B^c,  (A ∩ B)^c = A^c ∪ B^c
15.2 Problem. (intermediate) Prove de Morgan's laws:

    (⋃_{i∈N} A_i)^c = ⋂_{i∈N} A_i^c,  (⋂_{i∈N} A_i)^c = ⋃_{i∈N} A_i^c
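De Morgan's laws are easy to check on finite examples; a small Python sketch (not part of the original notes), with complements taken relative to a finite universe X:

```python
# Finite-set check of de Morgan's laws, with complements relative to X = {0,...,9}.
X = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(S):
    return X - S

law1 = complement(A | B) == complement(A) & complement(B)
law2 = complement(A & B) == complement(A) | complement(B)
```

Such a check is of course no substitute for the proof asked for in the problems, but it makes the symmetry between the two laws visible.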
cartesian products, rectangles
Let X and Y be nonempty sets.
A function f : X → Y is a set of pairs (x, f(x)) ∈ X × Y such that for every x ∈ X there is exactly one f(x) ∈ Y. X is the domain of f and Y is the range of f.
A function f : X → Y is injective if f(x_1) = f(x_2) implies x_1 = x_2. It is surjective if for every y ∈ Y there is x ∈ X such that f(x) = y. If a function is injective and surjective then it is bijective.
If A ⊆ X then f(A) := {f(x) : x ∈ A} is the image of A under f. If B ⊆ Y then f^{−1}(B) := {x : f(x) ∈ B} is the inverse image of B under f.
15.3 Problem. (easy)
Show that:
(a) f^{−1}(B_1 ∪ B_2) = f^{−1}(B_1) ∪ f^{−1}(B_2).
(b) f^{−1}(B_1 ∩ B_2) = f^{−1}(B_1) ∩ f^{−1}(B_2).
(c) f^{−1}(B^c) = (f^{−1}(B))^c
(d) Extend (a) and (b) to families of sets.
15.4 Problem. (easy)
Show that:
(a) f(A_1 ∪ A_2) = f(A_1) ∪ f(A_2).
(b) f(A_1 ∩ A_2) ⊆ f(A_1) ∩ f(A_2).
(c) Give an example where the inclusion in (b) is strict.
(d) Show that for injective functions equality holds in (b).
(e) Extend (a) and (b) to families of sets.
15.5 Problem. (easy)
Show that:
(a) f(f^{−1}(B)) = f(X) ∩ B
(b) f^{−1}(f(A)) ⊇ A
Let f : X → Y and g : Y → Z. Then the composition g ◦ f is the function from
X to Z such that (g ◦ f)(x) = g(f(x)).
15.6 Problem. (easy)
Let f : X → Y and g : Y → Z. Show that (g ◦ f)^{−1}(C) = f^{−1}(g^{−1}(C)), C ⊆ Z.
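The rule (g ◦ f)^{−1}(C) = f^{−1}(g^{−1}(C)) can be illustrated with finite sets, representing the functions as Python dictionaries (an illustrative sketch, not part of the original notes):

```python
# Finite-set check of (g ∘ f)^{-1}(C) = f^{-1}(g^{-1}(C)).
X = {0, 1, 2, 3}
f = {x: x % 2 for x in X}      # f : X -> Y = {0, 1}
g = {0: 'a', 1: 'b'}           # g : Y -> Z = {'a', 'b'}

def preimage(h, B):
    """Inverse image {x : h(x) in B} of B under the function h (a dict)."""
    return {x for x in h if h[x] in B}

C = {'a'}
lhs = {x for x in X if g[f[x]] in C}   # (g ∘ f)^{-1}(C)
rhs = preimage(f, preimage(g, C))      # f^{-1}(g^{-1}(C))
```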
15.2 Sequences of real numbers
The set R of real numbers is well-known, at least regarding its basic algebraic operations. Let us talk about topological properties of R.
The following is not intended to be an introduction to the subject, but a checklist which should be well understood; otherwise an introductory textbook has to be consulted.
An (open and connected) neighborhood of x ∈ R is an open interval (a, b) which contains x. Note that neighborhoods can be very small, i.e. can have any length > 0.
Let us start with sequences. An (infinite) sequence is a function from N to R, denoted by n ↦ x_n, for short (x_n), where n = 1, 2, …. When we say that an assertion holds for almost all x_n then we mean that it is true for all x_n beginning with some index N, i.e. for x_n with n ≥ N for some N.
A number x ∈ R is called a limit of (x_n) if every neighborhood of x contains almost all x_n. In other words: The sequence (x_n) converges to x: lim_{n→∞} x_n = x or x_n → x. A sequence can have at most one limit since two different limits could be put into disjoint neighborhoods.
A fundamental property of R is the fact that any bounded increasing sequence has a limit, which implies that every bounded monotone sequence has a limit. This is not a theorem but the completeness axiom. It is an advanced mathematical construction to show that R exists, i.e. a set having the familiar properties of real numbers including completeness.
An increasing sequence (x_n) which is not bounded is said to diverge to ∞ (x_n ↑ ∞), i.e. for any a we have x_n > a for almost all x_n. Thus, we can summarize: An increasing sequence either converges to some real number (iff it is bounded) or diverges to ∞ (iff it is unbounded). A similar assertion holds for decreasing sequences.
A simple fact which is an elementary consequence of the order structure says that every sequence has a monotone subsequence.
Putting terms together we arrive at a very important assertion: Every bounded sequence (x_n) has a convergent subsequence. The limit of a subsequence is called an accumulation point of the original sequence (x_n). In other words: Every bounded sequence has at least one accumulation point. An accumulation point x can also be explained in the following way: Every neighborhood of x contains infinitely many x_n, but not necessarily almost all x_n. A sequence can have many accumulation points, and it need not be bounded to have accumulation points. A sequence has a limit iff it is bounded and has only one accumulation point, which then is necessarily the limit.
There is a popular criterion for convergence of a sequence which is related to the assertion just stated. Call a sequence (x_n) a Cauchy sequence if there exist arbitrarily small intervals containing almost all x_n. Clearly every convergent sequence is a Cauchy sequence. But the converse is also true in view of completeness. Indeed, every Cauchy sequence is bounded and can have at most one accumulation point. By completeness it has at least one accumulation point, and is therefore convergent.
15.3 Real-valued functions
UNDER CONSTRUCTION
15.4 Banach spaces
Let V be a vector space.
15.7 Definition. A norm on V is a function v ↦ ||v||, v ∈ V, satisfying the following conditions:
(1) ||v|| ≥ 0, and ||v|| = 0 ⇔ v = o,
(2) ||v + w|| ≤ ||v|| + ||w||, v, w ∈ V,
(3) ||λv|| = |λ| ||v||, λ ∈ R, v ∈ V.
A pair (V, ||·||) consisting of a vector space V and a norm ||·|| is a normed space.
15.8 Example.
(1) V = R is a normed space with ||v|| = |v|.
(2) V = R^d is a normed space under several norms. E.g.

    ||v||_1 = Σ_{i=1}^d |v_i|,  ||v||_2 = (Σ_{i=1}^d v_i²)^{1/2} (Euclidean norm),  ||v||_∞ = max_{1≤i≤d} |v_i|
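For a concrete vector such as v = (3, −4) the three norms can be computed directly (a NumPy sketch, not part of the original notes):

```python
import numpy as np

# The three norms on R^d from Example 15.8 (2), computed for v = (3, -4).
v = np.array([3.0, -4.0])
norm_1 = np.sum(np.abs(v))        # ||v||_1   = |3| + |-4|        = 7
norm_2 = np.sqrt(np.sum(v**2))    # ||v||_2   = (9 + 16)^(1/2)    = 5
norm_inf = np.max(np.abs(v))      # ||v||_inf = max(|3|, |-4|)    = 4
```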
(3) Let V = C([0, 1]) be the set of all continuous functions f : [0, 1] → R. This is a vector space. Popular norms on this vector space are

    ||f||_∞ = max_{0≤s≤1} |f(s)|

and

    ||f||_1 = ∫_0^1 |f(s)| ds
The distance of two elements of V is defined to be

    d(v, w) := ||v − w||

This function has the usual properties of a distance, in particular it satisfies the triangle inequality. A set of the form

    B(v, r) := {w ∈ V : ||w − v|| < r}

is called an open ball around v with radius r. A sequence (v_n) ⊆ V is convergent with limit v if ||v_n − v|| → 0.
A sequence (v_n) is a Cauchy sequence if there exist arbitrarily small balls containing almost all members of the sequence, i.e.

    ∀ε > 0 ∃N(ε) ∈ N such that ||v_n − v_m|| < ε whenever n, m ≥ N(ε)
15.9 Deﬁnition. A normed space is a Banach space if it is complete, i.e. if every
Cauchy sequence is convergent.
It is clear that R and R^d are complete under the usual norms. Actually they are complete under any norm. The situation is completely different with infinite dimensional normed spaces.

15.10 Problem. (easy for mathematicians)
Show that C([0, 1]) is complete under ||·||_∞.

15.11 Problem. (easy for mathematicians)
Show that C([0, 1]) is not complete under ||·||_1.
The latter fact is one of the reasons for extending the notion and the range of the
elementary integral.
15.5 Hilbert spaces
A special class of normed spaces are inner product spaces. Let V be a vector space.

15.12 Definition. An inner product on V is a function (v, w) ↦ ⟨v, w⟩, v, w ∈ V, satisfying the following conditions:
(1) (v, w) ↦ ⟨v, w⟩ is linear in both variables,
(2) ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 ⇔ v = o.
A pair (V, ⟨·, ·⟩) consisting of a vector space V and an inner product ⟨·, ·⟩ is an inner product space.
An inner product gives rise to a norm according to

    ||v|| := ⟨v, v⟩^{1/2}, v ∈ V.

15.13 Problem. (easy)
Show that ||v|| := ⟨v, v⟩^{1/2} is a norm.
15.14 Example.
(1) V = R is an inner product space with ⟨v, w⟩ = vw. The corresponding norm is ||v|| = |v|.
(2) V = R^d is an inner product space with

    ⟨v, w⟩ = Σ_{i=1}^d v_i w_i

The corresponding norm is ||v||_2.
(3) Let V = C([0, 1]) be the set of all continuous functions f : [0, 1] → R. This is an inner product space with

    ⟨f, g⟩ = ∫_0^1 f(s)g(s) ds

The corresponding norm is

    ||f||_2 = (∫_0^1 f(s)² ds)^{1/2}
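The inner product on C([0, 1]) can be approximated numerically, e.g. with a trapezoidal rule on a fine grid (an illustrative sketch, not part of the original notes). For f(s) = s and g(s) = 1 one has ⟨f, g⟩ = ∫_0^1 s ds = 1/2 and ||f||_2 = (1/3)^{1/2}:

```python
import numpy as np

# Numerical sketch of <f, g> = ∫_0^1 f(s) g(s) ds from Example 15.14 (3),
# approximated by the trapezoidal rule on a fine grid.
s = np.linspace(0.0, 1.0, 100_001)

def inner(f, g):
    y = f(s) * g(s)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(s)) / 2.0)

ip = inner(lambda u: u, lambda u: np.ones_like(u))   # <f, g> for f(s)=s, g=1
norm_f = np.sqrt(inner(lambda u: u, lambda u: u))    # ||f||_2 for f(s)=s
```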
15.15 Deﬁnition. An inner product space is a Hilbert space if it is complete under
the norm deﬁned by the inner product.
15.16 Problem. (easy for mathematicians)
Show that C([0, 1]) is not complete under [[.[[
2
.
Inner product spaces have a geometric structure which is very similar to that of R^d endowed with the Euclidean inner product. In particular, the notions of orthogonality and of projections are available on inner product spaces. The existence of orthogonal projections depends on completeness, and therefore requires Hilbert spaces.
15.17 Problem. (intermediate)
Let C be a closed convex subset of a Hilbert space (V, ⟨·, ·⟩) and let v ∈ V. Show that there exists v_0 ∈ C such that

    ||v − v_0|| = min{||v − w|| : w ∈ C}

Hint: Let α := inf{||v − w|| : w ∈ C} and choose a sequence (w_n) ⊆ C such that ||v − w_n|| → α. Apply the parallelogram equality to show that (w_n) is a Cauchy sequence.
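The statement of Problem 15.17 can be illustrated in the Hilbert space R² with C the closed unit ball: for v with ||v|| > 1 the nearest point of C is v/||v||. A brute-force check over a grid (a numerical sketch, not a proof; not part of the original notes):

```python
import numpy as np

# Projection onto a closed convex set in the Hilbert space R^2:
# for C the closed unit ball and v = (3, 4), the nearest point of C
# is v/||v|| = (0.6, 0.8), at distance ||v|| - 1 = 4.
v = np.array([3.0, 4.0])
proj = v / np.linalg.norm(v)

# brute force over a grid of points of the unit ball
xs = np.linspace(-1.0, 1.0, 401)
gx, gy = np.meshgrid(xs, xs)
inside = gx**2 + gy**2 <= 1.0
dists = np.sqrt((gx - v[0])**2 + (gy - v[1])**2)
best = dists[inside].min()   # smallest distance found on the grid
```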
Bibliography
[1] Heinz Bauer. Probability theory. Translated from the German by Robert B. Burckel. de Gruyter Studies in Mathematics 23. Berlin: Walter de Gruyter. xv, 523 p., 1996.
[2] Heinz Bauer. Measure and integration theory. Translated from the German by Robert B. Burckel. de Gruyter Studies in Mathematics 26. Berlin: de Gruyter. xvi, 230 p., 2001.
[3] Tomasz R. Bielecki and Marek Rutkowski. Credit risk: Modelling, valuation and hedging. Springer Finance. Berlin: Springer. xviii, 500 p., 2002.
[4] Tomas Bjoerk. Arbitrage Theory in Continuous Time. Oxford University Press, 2004.
[5] Pierre Bremaud. Point processes and queues. Martingale dynamics. Springer Series in Statistics. New York-Heidelberg-Berlin: Springer-Verlag. XIX, 354 p., 1981.
[6] Pierre Brémaud. An introduction to probabilistic modeling. Undergraduate Texts in Mathematics. New York etc.: Springer-Verlag. xvi, 207 p., 1988.
[7] Pierre Brémaud. Markov chains. Gibbs fields, Monte Carlo simulation, and queues. Texts in Applied Mathematics. New York, NY: Springer. xviii, 444 p., 1999.
[8] Jean Dieudonné. Foundations of modern analysis. Enlarged and corrected printing. New York-London: Academic Press. XV, 387 p., 1969.
[9] Michael U. Dothan. Prices in financial markets. New York etc.: Oxford University Press. xv, 342 p., 1990.
[10] Edwin Hewitt and Karl Stromberg. Real and abstract analysis. A modern treatment of the theory of functions of a real variable. 3rd printing. Graduate Texts in Mathematics 25. New York-Heidelberg-Berlin: Springer-Verlag. X, 476 p., 1975.
[11] John C. Hull. Options, futures, and other derivatives. 5th ed. Prentice-Hall International Editions. Upper Saddle River, NJ: Prentice Hall. xxi, 744 p., 2003.
[12] P.J. Hunt and J.E. Kennedy. Financial derivatives in theory and practice. Revised ed. Wiley Series in Probability and Statistics. Chichester: John Wiley & Sons. xxi, 437 p., 2004.
[13] Albrecht Irle. Financial mathematics. The evaluation of derivatives. (Finanzmathematik. Die Bewertung von Derivaten.) 2nd, revised and extended edition. Teubner Studienbücher Mathematik. Stuttgart: Teubner. 302 p., 2003.
[14] Jean Jacod and Albert N. Shiryaev. Limit theorems for stochastic processes. 2nd ed. Grundlehren der Mathematischen Wissenschaften 288. Berlin: Springer, 2003.
[15] Ioannis Karatzas and Steven E. Shreve. Brownian motion and stochastic calculus. 2nd ed. Graduate Texts in Mathematics 113. New York etc.: Springer-Verlag. xxiii, 470 p., 1991.
[16] Ioannis Karatzas and Steven E. Shreve. Methods of mathematical finance. Applications of Mathematics. Berlin: Springer. xv, 407 p., 1998.
[17] Marek Musiela and Marek Rutkowski. Martingale methods in financial modelling. 2nd ed. Stochastic Modelling and Applied Probability 36. Berlin: Springer. xvi, 636 p., 2005.
[18] Salih N. Neftci. Introduction to the mathematics of financial derivatives. 2nd ed. Orlando, FL: Academic Press. xxvii, 527 p., 2000.
[19] Philip Protter. Stochastic integration without tears (with apology to P. A. Meyer). Stochastics, 16:295–325, 1986.
[20] Philip E. Protter. Stochastic integration and differential equations. 2nd ed. Applications of Mathematics 21. Berlin: Springer. xiii, 2004.
[21] A.N. Shiryaev. Probability. Translated from the Russian by R. P. Boas. 2nd ed. Graduate Texts in Mathematics 95. New York, NY: Springer-Verlag. xiv, 609 p., 1995.
[22] Paul Wilmott. Paul Wilmott on Quantitative Finance, Volume One. John Wiley and Sons, 2000.
[23] Paul Wilmott. Paul Wilmott on Quantitative Finance, Volume Two. John Wiley and Sons, 2000.
Preliminaries

0.1 Introduction

The goal of this course is to give an introduction into some mathematical concepts and tools which are indispensable for understanding the modern mathematical theory of finance. These will be probability theory and stochastic calculus. The central topic will be those probabilistic concepts and results which play an important role in mathematical finance. Therefore we have to deal with mathematical probability theory.

Mathematical probability theory is formulated in a language that comes from measure theory and integration. This language differs considerably from the language of classical analysis, known under the label of calculus. Therefore, our first step will be to get an impression of basic measure theory and integration. However, our usage of measure theory and integration is sort of a convenient language which on this level is of little interest in itself. For us its worth arises with its power to give insight into exciting applications like probability and mathematical finance. Therefore, our presentation of measure theory and integration will be an overview rather than a specialized training program. We will not go into the advanced problems of measure theory where this theory becomes exciting. Such topics would be closely related to advanced set theory and topology, which also differ basically from set theoretic language and topologically driven slang, which is convenient for talking about mathematics but nothing more. We will become more and more familiar with the language and its typical kind of reasoning as we go into those applications for which we are highly motivated.

In the field of probability theory we are interested in probability models having a dynamic structure, i.e. a time evolution governed by endogenous correlation properties. Such probability models are called stochastic processes. Probability theory is a young theory compared with the classical cornerstones of mathematics. It is illuminating to have a look at the evolution of some fundamental ideas of defining a dynamic structure of stochastic processes. Let us give an overview of the historic origins of some of the mathematical tools.

One important line of thought is looking at stationarity. For Gaussian models one need not distinguish between strict and weak (covariance) stationarity. Models which are themselves stationary or are cumulatives of stationary models have determined the econometric literature for decades. As for weak stationarity it turns out that typical processes follow difference or differential equations driven by some noise process. The concept of a noise process is motivated by the idea that it does not transport any information.

From the beginning of serious investigation of stochastic processes (about 1900) another idea was leading in the scientific literature: the Markov property. In contrast, continuous time Markov processes were defined in terms of the dynamic behaviour of their distributions rather than of their states, i.e. using partial difference and differential equations. Such equations have been the common tools for deterministic dynamics (ordinary difference and differential equations) and for discrete time stationary stochastic sequences. This is not the place to go into details of the overwhelming progress in Markov chains and processes achieved in the first half of the 20th century. However, for a long time this theory failed to describe the dynamic behaviour of continuous time Markov processes in terms of equations between single states at different times.

The situation changed dramatically about the middle of the 20th century. There were two ingenious concepts at the beginning of this disruption. The first is the concept of a martingale introduced by Doob. The notion of a martingale is located between a process with uncorrelated increments and a process with independent increments, both of which were the competing noise concepts up to that time. The martingale turned out to be the final mathematical fixation of the idea of noise. The second concept is that of a stochastic integral due to K. Ito. This notion makes it possible to apply differential reasoning to stochastic dynamics.

At the beginning of the stochastic part of this lecture we will present an introduction to the ideas of martingales and stopping times at hand of stochastic sequences (discrete time processes). However, the main subject of the second half of the lecture will be continuous time processes with a strong focus on the Wiener process.

0.2 Literature

Let us give some comments on the bibliography. Probability theory has many different faces. The popular monograph by Bauer, [1], has been for a long time the standard textbook in Germany on measure theoretic probability. The book by Shiryaev, [21], is much closer to those modern concepts we are heading to. Both texts are mathematically oriented, i.e. they aim at giving complete and general proofs of fundamental facts, preferably in abstract terms.

A modern introduction into probability models containing plenty of fascinating phenomena is given by Bremaud, [6] and [7]. The choice of examples is governed by the needs of financial applications (covering the notion of gambling, of course). The notions of martingales, semimartingales and stochastic integrals are introduced in a way which lays the foundation for the study of more general process theory. The older monograph by Bremaud, [5], is not located at the focus of this lecture but contains as appendix an excellent primer on probability theory.
The Wiener systems part of the probability primer by Bremaud gives a very compact overview of the main facts.

Let us mention some basic literature on mathematical finance. There is a standard source by Hull, [12]. Paul Wilmott, [22] and [23], tries to cover all topics in financial mathematics together with the corresponding intuition. The books by Hull and Wilmott do not pretend to talk about mathematics. I consider these books by Hull and Wilmott as a must for any beginner in mathematical finance. A very popular book which may serve as a bridge from mathematical probability to financial mathematics is by Björk, [3]. Although this book heavily tries to present itself as not demanding, nevertheless the contrary is true. The reason is that the combination of financial intuition and the apparently informal utilization of advanced mathematical tools requires on the reader's side a lot of mathematical knowledge in order to catch the intrinsics. Another book, giving an introduction both to the mathematical theory and financial mathematics, is by Hunt and Kennedy, [16]. The present lecture should lay some foundations for reading books of that type.

Let us mention some references which have a similar goal as this lecture, i.e. to present the mathematical theory of stochastic analysis aiming at applications in finance, and to make the analytical framework a bit more explicit and detailed than Hull does. Our topic in stochastic processes will be the Wiener process and the stochastic analysis of Wiener driven systems. A standard monograph on this subject is Karatzas and Shreve, [11]. Wiener driven systems are a very special framework for modelling financial markets. In the meanwhile, general stochastic analysis is in a more or less final state, called semimartingale theory. Our introduction to semimartingale theory follows the outline by Protter, [20] (see also [19]). Present and future research applies this theory in order to get a much more flexible modelling of financial markets. Standard monographs on mathematical finance which could be considered as cornerstones marking the state of the art at the time of their publication are Karatzas and Shreve, [15], Musiela and Rutkowski, [17], and Bielecki and Rutkowski, [4].

0.3 Nature of these notes

These lecture notes are not intended to be a self-contained text to be used for self study. The notes are rather an outline of the main concepts and facts. The style of the text is very formal. The classroom lecture will be a selection of the notes but will in parts present more explanation and motivation. Diagrams will be drawn on the blackboard and are not copied to the notes. Together with informal explanations during the lecture the text should train the students for studying more advanced literature.

Many facts are formulated in the notes as exercises with or without hints. Some of the exercises will be solved during the lecture, some are home exercises or classroom exercises to be presented by students. In order to meet the different skills of the audience (applied, theoretic, formal or informal) the exercises are classified with respect to difficulty and required mathematical skills.
The written exams will be open book exams, meaning that the lecture notes without any additional comments may be used during the exam. The problems of the exams will be exercises of the notes or additional exercises posed in the classroom. The final collection of those problems from which the exams are sampled will be fixed during the lecture. The solutions should include extensive references of the concepts used in the solution to the notions contained in the notes.

The notes are not yet finished. There are some chapters and sections missing ("under construction") and a lot of review questions have still to be formulated to be used for the exams. Filling of the gaps will be performed in the light of student reactions and feedback during the classes.

0.4 Time table

The following concerns the course from January to March 2006.

Unit Subject
1  measure and probability
2  measure and probability
3  measurable functions
4  exercises
5  integrals and expectation
6  integrals and expectation
7  integrals and expectation
8  exercises
9  conditional expectation
10 stochastic sequences (gambling)
11 stochastic sequences (martingales)
12 exercises
13 Wiener process
14 Wiener process (first passage times)
15 Wiener process (stopping times)
   skiing, midterm test
16 exercises
17 the financial market picture (discrete time trading)
18 stochastic calculus (Stieltjes integrals, differential notation)
19 stochastic calculus (semimartingales, stochastic integrals)
20 exercises
21 stochastic calculus (Ito calculus)
22 applications to financial markets (Black-Scholes model)
23 exercises
24 linear stochastic differential equations
25 martingales and stochastic integrals
26 martingales (Levy's theorem, martingale representation)
27 martingales (Girsanov theorem)
28 exercises
Part I

Measure theory
Chapter 1

Measure and probability

1.1 Fields and contents

We start with the notion of a field. Roughly speaking, a field is a system of subsets where the basic set operations (union, intersection, complementation) can be performed without leaving the system.

1.1 Definition. Let Ω ≠ ∅ be a set. A field on Ω is a system A of subsets A ⊆ Ω satisfying the following conditions:
(1) Ω ∈ A, ∅ ∈ A
(2) If A1, A2 ∈ A then A1 ∪ A2 ∈ A and A1 ∩ A2 ∈ A
(3) If A ∈ A then Ac ∈ A

1.2 Problem. (easy) Discuss minimal sets of conditions such that a system is a field.

The second basic notion is that of a content. A content is an additive set function on a field.

1.3 Definition. A content is a set function µ defined on a field A such that
(1) µ(A) ∈ [0, ∞] whenever A ∈ A
(2) µ(∅) = 0
(3) µ(A1 ∪ A2) = µ(A1) + µ(A2) whenever A1, A2 ∈ A and A1 ∩ A2 = ∅

1.4 Problem. (easy) Let µ|A be a content. Then A1 ⊆ A2 implies µ(A1) ≤ µ(A2).

1.5 Problem. (intermediate)
(a) Show that every content satisfies the inclusion-exclusion law:
µ(A1) + µ(A2) = µ(A1 ∪ A2) + µ(A1 ∩ A2)
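For a finite Ω the defining conditions of 1.1 and 1.3 can be checked mechanically. The following Python sketch (not part of the original notes; all names are illustrative) verifies that the power set of a small Ω is a field and that the counting set function A ↦ |A| is additive on disjoint sets.

```python
from itertools import chain, combinations

def powerset(omega):
    """All subsets of a finite set, as frozensets."""
    s = list(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def is_field(field, omega):
    """Check the three field axioms of Definition 1.1."""
    field = set(field)
    omega = frozenset(omega)
    if omega not in field or frozenset() not in field:
        return False
    for a in field:
        if omega - a not in field:              # closed under complement
            return False
        for b in field:
            if a | b not in field or a & b not in field:
                return False                    # closed under union/intersection
    return True

omega = {1, 2, 3}
assert is_field(powerset(omega), omega)
# The counting content mu(A) = |A| satisfies 1.3(3) on disjoint sets:
a, b = frozenset({1}), frozenset({2, 3})
assert len(a | b) == len(a) + len(b)
```

The same checker also detects non-fields, e.g. a system missing a complement.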
1. 2 Further reading: Shiryaev [21].g. e. As we shall see later when we are dealing with measures it is very easy to construct contents. Let µA be a content on a ﬁeld. But it is not easy to construct contents with given properties. 1.9 Problem. paragraph 1.6 Deﬁnition. (advanced) Show that each element B ∈ R can be written as a union of disjoint intervals n B= i=1 (ai .7 Lemma.4 CHAPTER 1. chapter II. The content µ is called σadditive if µ Ai = µ(Ai ) i∈N i∈N i∈N for every pairwise disjoint sequence (Ai )k∈N ⊆ A such that Ai ∈ A. b] where −∞ ≤ a < b ≤ ∞ (leftopen and rightclosed intervals). Contents on the real line Let Ω = (−∞. MEASURE AND PROBABILITY (b) The preceding problem gives a formula for µ(A1 ∪ A2 ) provided that all sets have ﬁnite content. Then: (a) ⇔ (b) ⇒ (c) ⇔ (d) If µ(Ω) < ∞ then all assertions are equivalent. ∞] and let R be the system of subsets arising as unions of ﬁnitely many intervals of the form (a. Extend this forumla to the union of three sets. bi ] (1) .8 Problem. (Include ∅ as the union of nothing). 1. Proof: See Bauer. The next paragraphs deal with the most important examples of contents which later will be extended to those measures that are most common in applications. [1]. 1. (d) For every sequence (Ai )i∈N ⊆ A such the Ai ↓ ∅ with µ(A1 ) < ∞ we have µ(Ai ) ↓ 0. with special geometric properties. Let A be a ﬁeld and let µA be a content. (b) For every sequence (Ai )i∈N ⊆ A such the Ai ↑ A ∈ A we have µ(Ai ) ↑ µ(A). Consider the following properties: (a) µA is σadditive. If a content is σadditive then the content has several continuity properties which facilitate caculations. (intermediate) Explain why R is a ﬁeld. (c) For every sequence (Ai )i∈N ⊆ A such the Ai ↓ A ∈ A with µ(A1 ) < ∞ we have µ(Ai ) ↓ µ(A).
where −∞ ≤ a1 < b1 ≤ a2 < b2 ≤ a3 < ... < bn−1 ≤ an < bn ≤ ∞.

Hint: Let H be the system of disjoint unions of intervals. First, show that H is closed under intersections. Second, show that any finite union of intervals can be written as
∪i=1..n Ii = I1 ∪ (I2 \ I1) ∪ (I3 \ (I1 ∪ I2)) ∪ ...
where Ik \ (I1 ∪ ... ∪ Ik−1) is in H. Be careful when applying the distributive law.

1.10 Problem. (advanced)
(a) Show that any (hypothetical) content λα satisfying (2) necessarily satisfies
A = ∪i=1..n Ii, where (Ii) are pw. dj. intervals ⇒ λα(A) = Σi=1..n λα(Ii)   (3)
(b) Show that using (3) as a definition is unambiguous.
(c) Show that (3) defines a content on R which is finite on bounded sets.

1.11 Definition. Let α : R → R be an increasing function. Define α(−∞) = inf α and α(∞) = sup α. It will be shown in the following exercises that
λα((a, b]) := α(b) − α(a)   (2)
determines a content on R. The content λα|R is called a Lebesgue-Stieltjes content.

(Note that in probability theory this is the usual way to define probability distributions by distribution functions!)

1.12 Example. (Lebesgue content) Let α(x) = x. Then λα((a, b]) = b − a is the length of the interval (a, b]. Therefore, in this special case the content λα is simply the geometric volume function. It is called the Lebesgue content and is denoted by λ.

We have seen that any increasing function α : R → R defines a content λα|R. In order to extend such contents to greater families of sets we have to check whether λα|R is σ-additive.

1.13 Lemma. The content λα|R is σ-additive iff α is right continuous.

Proof: Assume that λα|R is σ-additive. Then from
(a, b] = ∩n∈N (a, b + 1/n]
and 1.7 it follows
α(b) − α(a) = λα((a, b]) = lim_n λα((a, b + 1/n]) = lim_n α(b + 1/n) − α(a).
This means that α is right continuous at b. This was the easy part.

The proof of the converse is a bit more tricky. We show the converse for bounded α only (λα(Ω) < ∞). Let us prove that 1.7(d) is satisfied. Let (An) ⊆ R such that An ↓ ∅. Choose ε > 0. For every An we may find a compact set Kn and a set Bn ∈ R such that Bn ⊆ Kn ⊆ An and λα(An \ Bn) < ε. (At this point right-continuity goes in!) Since An ↓ ∅ it follows that Kn ↓ ∅. Since the sets Kn are compact there is some N such that KN = ∅. (This is the so-called finite intersection property of compact sets.) Hence BN = ∅. It follows that λα(AN) < ε. Since ε is arbitrarily small we have lim_n λα(An) = 0. This proves 1.7(d) and the assertion for finite contents.

Contents on Rd

The following is a summary of facts. Proofs are similar but sometimes a bit more complicated than in the one-dimensional case.

Denote R := (−∞, ∞] and let Qd be the collection of all subsets Q ⊆ R^d of the form
Q = ∏i=1..d (ai, bi], −∞ ≤ ai < bi ≤ ∞
(so-called left-open right-closed parallelotops). Denote by Rd the set of all finite unions of sets in Qd. The sets in Rd are called figures.

1.14 Theorem.
(a) The set Rd is a field.
(b) Each set Q ∈ Rd is a union of pairwise disjoint sets of Qd.
For a proof see Bauer [1].

In order to define a content on Rd we first have to define the content on Qd and then try to extend it to Rd. This was exactly the procedure that we performed for d = 1. For d > 1 it is natural to consider the geometric volume
λd(∏i=1..d (ai, bi]) := ∏i=1..d (bi − ai)   (4)
This can actually be extended to Rd, resulting in a content called the Lebesgue content.

1.15 Theorem. There is a uniquely determined content λd on Rd such that (4) is satisfied. The content λd is σ-additive. For the proof see Bauer [1].
Finite fields

Since many probabilistic applications are concerned with finite fields it is illuminating to discuss the structure of finite fields in more detail. Let us collect the main facts in terms of exercises.

1.16 Problem. (easy) Let C = (C1, C2, ..., Cm) be a finite partition of Ω. Show that
R := { ∪i∈α Ci : α ⊆ {1, 2, ..., m} }
is a field on Ω and that it is the smallest field containing C.

In the situation of 1.16 we say that the partition C generates the field R.

1.17 Problem. (intermediate) Show that every finite field is generated by a partition. Hint: A set A ∈ R is called an atom if A ≠ ∅ and
∅ ≠ B ⊆ A, B ∈ R ⇒ B = A.
Show that the collection C of all atoms of R is a partition generating R. (Show that for x ∈ Ω the set Ax := ∩{A ∈ R : x ∈ A} is the unique atom containing x.)

1.18 Problem. (easy) Let R be a finite field and let C = {C1, C2, ..., Cm} be the generating partition. Show that for every choice of numbers ai ≥ 0 there exists exactly one content µ|R such that µ(Ci) = ai. The so-called Laplacian definition of a probability content results in the uniform content, i.e. µ(Ci) = 1/m.

The preceding assertions are the basis of the elementary theory of probability. Further reading: Shiryaev [21], chapter I.

1.19 Review questions. Explain the structure and generation of finite fields. How to define contents on finite fields?

1.2 Sigma-fields and measures

1.20 Definition. A field F on Ω is a σ-field if
(Ai)i∈N ⊆ F ⇒ ∪i∈N Ai ∈ F
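The construction of Problem 1.16 is easy to carry out by machine: form all unions of cells of a partition and check that the result is a field with 2^m members. The Python sketch below is an added illustration (not from the notes).

```python
from itertools import chain, combinations

def field_from_partition(partition):
    """The smallest field containing a finite partition (Problem 1.16):
    all unions of cells, giving 2**m sets for m cells."""
    cells = [frozenset(c) for c in partition]
    return {frozenset().union(*combo)
            for combo in chain.from_iterable(
                combinations(cells, r) for r in range(len(cells) + 1))}

R = field_from_partition([{1, 2}, {3}, {4, 5}])
omega = frozenset({1, 2, 3, 4, 5})
assert len(R) == 2 ** 3
assert omega in R and frozenset() in R
assert all(omega - A in R for A in R)                      # complements
assert all(A | B in R and A & B in R for A in R for B in R)  # unions/meets
```

The cells of the partition are exactly the atoms of the resulting field, matching Problem 1.17.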
A pair (Ω, F) where F is a σ-field on Ω is called a measurable space.

1.21 Problem. (intermediate)
(a) A field F is a σ-field iff (Ai)i∈N ⊆ F ⇒ ∩i∈N Ai ∈ F.
(b) A field F is a σ-field iff the union of every increasing (decreasing) sequence of sets in F is in F.
(c) A field F is a σ-field iff the union of every pairwise disjoint sequence of sets in F is in F.

1.22 Definition. A σ-additive content which is defined on a σ-field is called a measure. A measure P (resp. content) is called a probability measure (resp. probability content) if P(Ω) = 1. A measure (resp. content) is called finite if µ(Ω) < ∞. If µ|F is a measure then (Ω, F, µ) is called a measure space. If P|F is a probability measure then (Ω, F, P) is called a probability space.

Therefore the concept of a measure differs from the concept of a content only on infinite σ-fields. Actually any finite field is a σ-field and any content on a finite field is σ-additive (in a trivial sense) and is therefore a measure. As for the existence of measures things are easy with finite fields. In the following we perform some warming up by discussing some very simple examples of measures.

1.23 Problem. (easy)
(a) Show that every finite linear combination of measures with nonnegative coefficients is a measure.
(b) Show that every countable linear combination of measures with nonnegative coefficients is a measure.

1.24 Problem. (easy) Let (Ω, F) be any measurable space. Let x ∈ Ω be some point and keep it fixed. For every A ∈ F define
δx(A) = 1 whenever x ∈ A, δx(A) = 0 whenever x ∉ A.
Show that δx : A → δx(A) is a measure (the one-point measure at the point x). (Note that the case F = 2^Ω is covered by this definition.)

Any linear combination of point measures is called a discrete measure.
1.25 Problem. (easy) Describe the values of finite linear combinations of one-point measures: Let a1, a2, ..., am be any pairwise different points in Ω and keep them fixed. Let p1, p2, ..., pm be any nonnegative numbers. Then
µ := Σi=1..m pi δai ⇒ µ(A) = Σ{i: ai∈A} pi, A ⊆ Ω.

1.26 Problem. (easy)
(a) Write the binomial distribution as a linear combination of point measures.
(b) Write the geometric distribution as a linear combination of point measures.

1.27 Problem. (easy) Let x = (x1, x2, ..., xn) be any finite sequence of elements in Ω (e.g. an empirical sample). Let {a1, a2, ..., am} be the set of different components of x and denote by fj the relative frequency of aj in x. Show that the empirical measure satisfies
(1/n) Σi=1..n δxi = Σj=1..m fj δaj
(This is the "frequency table" of an empirical distribution.)

When we are dealing with point measures or linear combinations of point measures we need not worry about σ-fields since such measures are well-defined on F = 2^Ω.

1.28 Problem. (advanced)
(a) The system 2^Ω (the system of all subsets of Ω) is a σ-field.
(b) The intersection of any family of σ-fields is a σ-field.

In general, however, it is not possible to define measures with given properties on F = 2^Ω. We have to be more modest and be satisfied if we find measures that are defined on σ-fields containing at least reasonable sets indispensable for applications. Usually it is not very difficult to find a field which contains sufficiently many reasonable sets, e.g. the field of figures in Rd containing all rectangles. But how to proceed from fields to σ-fields? Let A be a field which is not a σ-field. We would like to enlarge A in such a way that the result is a σ-field. The following questions arise:
(1) Are there any σ-fields F containing A?
(2) If yes, is there a smallest σ-field containing A?
The answer to both questions is yes.
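The frequency-table identity of Problem 1.27 can be checked directly in Python (an added illustration; the sample below is arbitrary).

```python
from collections import Counter

def delta(x):
    """One-point measure δ_x (Problem 1.24)."""
    return lambda A: 1.0 if x in A else 0.0

def empirical(sample):
    """Empirical measure (1/n) Σ δ_{x_i}."""
    n = len(sample)
    return lambda A: sum(delta(x)(A) for x in sample) / n

sample = [1, 1, 2, 3, 3, 3]
mu = empirical(sample)
A = {1, 3}
# Frequency-table form: Σ_j f_j δ_{a_j}(A) over the distinct values a_j.
freq = Counter(sample)
rhs = sum((c / len(sample)) * delta(a)(A) for a, c in freq.items())
assert abs(mu(A) - rhs) < 1e-12
assert abs(mu(A) - 5 / 6) < 1e-12   # five of the six sample points lie in A
```

Both sides are the same discrete measure, written once point by point and once grouped by distinct values.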
(c) Let C be any system of subsets of Ω and denote by σ(C) the intersection of all σ-fields containing C:
σ(C) = ∩{F : C ⊆ F}
Then σ(C) is the smallest σ-field containing C:
C ⊆ F, F is a σ-field ⇒ C ⊆ σ(C) ⊆ F

1.29 Definition. For any system C of sets in Ω the smallest σ-field F that contains C is called the σ-field generated by C and is denoted by F = σ(C). The system C is called a generator of F.

It turns out that the situation is simple as long as Ω is a countable set.

1.30 Problem. (easy) Show that the σ-field on N which is generated by the one-point sets of N is F = 2^N.

1.31 Problem. (intermediate)
(a) Let Ω = N and F = 2^N. Define µ(A) := |A|, A ⊆ N (the counting measure). Show that (N, 2^N, µ) is a measure space.
(b) Show that for every sequence of numbers an ≥ 0 there is exactly one measure µ|F such that µ({n}) := an.
(c) Discuss how to define probability measures on (N, 2^N).

The preceding exercise shows: If we want to have all one-point sets in the σ-field then for countable Ω every subset of Ω has to be in the σ-field. The following exercise shows that for Ω = R the system of one-point sets is not sufficient to generate a reasonable σ-field.

1.32 Problem. (intermediate for mathematicians) What is the σ-field on R that is generated by the one-point sets of R? Answer: The system of sets which are either countable or the complement of a countable set.

The preceding exercise shows that one-point sets do not generate a σ-field on R which contains intervals! Therefore we have to include intervals in our generating system. The starting point is the algebra R of figures.

1.33 Definition. The σ-field on R (Rd) which is generated by the algebra R (Rd) is called the Borel σ-field and is denoted by B = B(R) (B(Rd)).
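For a finite Ω the generated σ-field σ(C) of Definition 1.29 can be computed by brute-force closure under complements, unions and intersections (for finite Ω every field is a σ-field). The following Python sketch is an added illustration, not part of the notes.

```python
def generate_field(omega, gen):
    """Smallest field on a finite omega containing the sets in gen;
    for finite omega this is also the generated sigma-field (Def. 1.29)."""
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(g) for g in gen}
    changed = True
    while changed:                      # close under the field operations
        changed = False
        for a in list(F):
            for b in list(F):
                for c in (omega - a, a | b, a & b):
                    if c not in F:
                        F.add(c)
                        changed = True
    return F

F = generate_field({1, 2, 3, 4}, [{1}, {1, 2}])
assert frozenset({2}) in F        # obtained as {1,2} minus {1}
assert frozenset({3, 4}) in F     # complement of {1,2}
assert len(F) == 8                # atoms {1}, {2}, {3,4} give 2**3 sets
```

The atoms produced by the closure ({1}, {2}, {3, 4} here) recover the generating partition of Problem 1.17.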
The sets in the Borel σ-field are called Borel sets.

1.34 Problem. (easy)
(a) Show that B contains all intervals (including one-point sets).
(b) Is Q a Borel set?

1.35 Problem. (easy for mathematicians) Show that B(Rd) contains all open sets and all closed sets.

1.36 Review questions. What is a field and what is a σ-field? What is the difference between a content and a measure? Explain the ideas of generating a σ-field by a system of sets.

1.37 Review questions. Explain how the measurable spaces (N, 2^N), (R, B(R)), (Rd, B(Rd)) are generated.

1.3 The extension theorem

The fundamental problem of measure theory is the extension problem. The extension problem deals with the question whether a given content on a field A can be extended to a measure on the σ-field σ(A). It is clear that for the existence of an extension the content must be σ-additive. This is a necessary condition. For finite contents it is even sufficient.

1.38 Theorem. Every finite σ-additive content µ|A defined on a field has a uniquely determined measure extension to F = σ(A).

Proof: (Outline. Further reading: Bauer [1].) Let us indicate some ideas of the proof. W.l.o.g. we assume that µ(Ω) = 1. The first step of the proof is to try an extension of the content to all subsets M ⊆ Ω. This is done by
µ*(M) = inf { Σi∈N µ(Ai) : (Ai)i∈N ⊆ A, M ⊆ ∪i∈N Ai }
Unfortunately, this definition does not result in a measure, not even in a content. The set function µ* is a so-called outer measure. It is clear that µ*|A = µ|A. Next, define
M := { M ⊆ Ω : µ*(M) + µ*(M^c) = 1 }
It is clear that A ⊆ M. It turns out that M is a σ-field and therefore σ(A) ⊆ M. Moreover, it is shown that the restriction µ*|M is a measure. Therefore it is a measure extension of µ|A to some σ-field containing A, thus at least a measure extension to σ(A).
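The outer measure µ* and the measurability criterion defining M can be made concrete on a toy finite field, where finite covers suffice. The Python sketch below (an added illustration with hypothetical numbers, not from the notes) uses the field generated by the partition {1, 2}, {3, 4}.

```python
from itertools import combinations

def outer(mu, M):
    """Outer measure mu*(M): infimum of sums mu(A_i) over covers of M
    by sets of the field (finite covers suffice for a finite field)."""
    M = frozenset(M)
    sets = list(mu)
    best = float("inf")
    for r in range(1, len(sets) + 1):
        for cover in combinations(sets, r):
            if M <= frozenset().union(*cover):
                best = min(best, sum(mu[A] for A in cover))
    return best

# A probability content on the field generated by {1,2} and {3,4}.
mu = {frozenset(): 0.0, frozenset({1, 2}): 0.3,
      frozenset({3, 4}): 0.7, frozenset({1, 2, 3, 4}): 1.0}
assert abs(outer(mu, {1, 2}) - 0.3) < 1e-9    # mu* extends mu on the field
# {1} is not in the field: mu*({1}) + mu*({2,3,4}) = 0.3 + 1.0 != 1,
# so {1} fails the criterion defining the system M in the proof.
assert abs(outer(mu, {1}) + outer(mu, {2, 3, 4}) - 1.3) < 1e-9
```

Sets of the field itself always pass the criterion µ*(M) + µ*(M^c) = 1, while genuinely "new" sets like {1} need not.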
The uniqueness of the extension is shown in the following way. Assume that µ(Ω) < ∞. Let µ1|σ(A) and µ2|σ(A) be two extensions of µ|A. Let
M1 := { M ∈ σ(A) : µ1(M) = µ2(M) }
By assumption we have A ⊆ M1. It can be shown that M1 is a σ-field. Then it follows that σ(A) ⊆ M1.

1.39 Remarks. The following remarks can be understood ("proved") with the information provided by the outline of the proof of the measure extension theorem.
(1) Although the sets in σ(A) or M can be rather complicated they don't differ very much from sets in A: For every M ∈ M and every (arbitrarily small) ε > 0 there is a set A ∈ A such that µ(M \ A) < ε and µ(A \ M) < ε.
(2) The proof of the measure extension theorem results in an extension to M which actually is a larger σ-field than σ(A). However, from (1) it follows that for every M ∈ M there is some A ∈ σ(A) such that µ(M \ A) = 0 and µ(A \ M) = 0.

1.40 Problem. (advanced for mathematicians)
(1) Prove assertion (1) of Remark 1.39.
(2) Prove assertion (2) of Remark 1.39.

What about the extension of non-finite contents?

1.41 Definition. A content µ|A is called σ-finite if there is a sequence (Ai)i∈N ⊆ A such that ∪i∈N Ai = Ω and µ(Ai) < ∞ for every i ∈ N.

1.42 Theorem. Every σ-finite σ-additive content µ|A defined on a field has a uniquely determined measure extension to F = σ(A). The proof is similar, but a bit more complicated than in the finite case.

1.43 Corollary. For every increasing and right continuous function α : R → R there is a uniquely determined measure λα|B such that λα((a, b]) = α(b) − α(a). We may apply the measure extension theorem since every λα|R is obviously σ-finite. For α(x) = x the measure λα = λ is called the Lebesgue measure.

It should be noted that there are subsets of R (resp. Rd) that are not Borel sets. However, the construction of such sets can be very complicated.

1.44 Review questions. State the measure extension theorem. Show how to apply this theorem for defining Borel measures on R.
Chapter 2

Measurable functions and random variables

2.1 The idea of measurability

Let (Ω, A, µ) be a measure space and let (Y, B) be a measurable space. Moreover, let f : Ω → Y be a function. We are going to consider the problem of mapping the measure µ to the set Y by means of the function f. The following example serves as a first motivation.

2.1 Example. (Distribution of a random variable) The concept of the distribution of a random variable is an important special case of mapping a measure from one set to another. Let X be a random variable. This is a function from a probability space (Ω, A, P) to R. Since the probability space is a rather abstract object it is convenient to put the essentials of the random variable X into analytically tractable terms. Usually we are only interested in the probabilities P(X ∈ B), B ∈ B, the collection of which is the distribution P^X. The distribution is a set function on (R, B) and it is defined by mapping the probability measure P to (R, B) via the function X:
P^X(B) = P(X ∈ B), B ∈ B.
However, for defining P^X(B) it is essential that the expression P(X ∈ B) makes sense. This is the case iff the inverse image (X ∈ B) = X^{-1}(B) is in A. Therefore a random variable cannot be an arbitrary function X : Ω → R but must satisfy (X ∈ B) ∈ A for all B ∈ B. This property is called measurability. More details concerning measure theoretic probability concepts are given in section 2.4.

2.2 Definition. A function f : (Ω, A) → (Y, B) is called (A, B)-measurable if f^{-1}(B) ∈ A for all B ∈ B.

If f : (Ω, A, µ) → (Y, B) is (A, B)-measurable then we may define
µ^f(B) := µ(f ∈ B) = µ(f^{-1}(B)), B ∈ B.
This is the image of µ under f or the distribution of f under µ.

2.3 Problem. (easy) Show that µ^f is indeed a measure on B.

Let us agree upon some terminology.
(1) When we consider real-valued functions then we always use the Borel σ-field in the range of f. E.g.: If f : (Ω, F) → (R, B) then we simply say that f is F-measurable if we mean that it is (F, B)-measurable.
(2) When we consider functions f : R → R then (B, B)-measurability is called Borel measurability. The term "Borel" is thus concerned with the σ-field in the domain of f.

To get an idea what measurability means let us consider some simple examples.

2.4 Problem. (easy) Let (Ω, F, µ) be a measure space and let f = 1A where A ⊆ Ω.
(a) Show that f is F-measurable iff A ∈ F.
(b) Find µ^f.

Recall that a simple function is a real-valued function which has only finitely many values. Any simple function f can be written as
f = Σi=1..n ai 1Fi if Fi = (f = ai)
where {a1, a2, ..., an} denotes the set of different function values of f. This is the canonical representation of f. Any linear combination of indicator functions is simple but need not be canonical. It is canonical iff both the sets supporting the indicators are pairwise disjoint and the coefficients are pairwise different. There is exactly one canonical representation.

2.5 Problem. (easy) Let (Ω, F, µ) be a measure space and let f : Ω → R be a simple function.
(a) Show that f is F-measurable iff all sets of the canonical representation are in F.
(b) Find µ^f.

2.2 The basic abstract assertions

There are two fundamental principles for dealing with measurability. The first principle says that measurability is a property which is preserved under composition of functions.
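On a finite probability space the image measure µ^f of Definition 2.2 reduces to summing point masses over the inverse image, which makes the definition easy to see in code. The Python sketch below is an added illustration (two fair coin tosses, X = number of heads), not part of the notes.

```python
def image_measure(mu, f, B):
    """mu^f(B) = mu(f in B) = mu(f^{-1}(B)) on a finite space (cf. 2.3),
    with mu given as a dict of point masses."""
    return sum(p for omega, p in mu.items() if f(omega) in B)

# Two fair coin tosses; X counts the heads.
mu = {('H', 'H'): 0.25, ('H', 'T'): 0.25,
      ('T', 'H'): 0.25, ('T', 'T'): 0.25}
X = lambda w: w.count('H')
assert image_measure(mu, X, {1}) == 0.5       # P(X = 1)
assert image_measure(mu, X, {0, 1, 2}) == 1.0  # the distribution is a probability measure
```

The resulting set function B ↦ µ^f(B) is exactly the distribution P^X of Example 2.1.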
2.6 Theorem. Let f : (Ω, A) → (Y, B) be (A, B)-measurable and let g : (Y, B) → (Z, C) be (B, C)-measurable. Then g ◦ f is (A, C)-measurable.

2.7 Problem. (easy) Prove 2.6.

The second principle is concerned with checking measurability. For checking measurability it is sufficient to consider the sets in a generating system of the range σ-field.

2.8 Theorem. Let f : (Ω, A) → (Y, B) and let C be a generating system of B, i.e. B = σ(C). Then f is (A, B)-measurable iff f^{-1}(C) ∈ A for all C ∈ C.

Proof: Let D := {D ⊆ Y : f^{-1}(D) ∈ A}. It can be shown that D is a σ-field. If f^{-1}(C) ∈ A for all C ∈ C then C ⊆ D. This implies σ(C) ⊆ D. □

2.9 Problem. (intermediate) Fill in the details of the proof of 2.8.

2.10 Review questions. Explain the abstract concept of a measurable function. State the basic abstract properties of measurable functions.

2.3 The structure of real-valued measurable functions

Let (Ω, F) be a measurable space. Let L(F) be the set of all F-measurable real-valued functions. We start with the most common and most simple criterion for checking measurability of a real-valued function.

2.11 Problem. (intermediate) Show that a function f : Ω → R is F-measurable iff (f ≤ α) ∈ F for every α ∈ R. Hint: Apply 2.8.

2.12 Problem. (easy) (a) Show that every monotone function f : R → R is Borel-measurable. (b) Show that every continuous function f : R^n → R is B^n-measurable. Hint: Note that (f ≤ α) is a closed set. (c) Let f : (Ω, F) → R be F-measurable. Show that f^+, f^-, |f|, and every polynomial a_0 + a_1 f + ... + a_n f^n are F-measurable. Hint: Apply 2.6.

This provides us with a lot of examples of Borel-measurable functions. It follows that very complicated functions are Borel-measurable. The next exercise is a first step towards the measurability of expressions involving several measurable functions.

2.13 Problem. (intermediate) Let f_1, f_2, ..., f_n be measurable functions. Show that f = (f_1, f_2, ..., f_n) : Ω → R^n is (F, B^n)-measurable.

2.14 Corollary. Let f_1, f_2, ..., f_n be measurable functions. Then for every continuous function φ : R^n → R the composition φ(f_1, ..., f_n) is measurable.

Proof: Apply 2.6. □

2.15 Corollary. Let f_1, f_2 be measurable functions. Then f_1 + f_2, f_1 · f_2, f_1 ∩ f_2, f_1 ∪ f_2 are measurable functions.

2.16 Problem. Prove 2.15.

2.17 Theorem. Let (f_n)_{n∈N} be a sequence of measurable functions. Then sup_n f_n and inf_n f_n are measurable functions. Moreover, A := (∃ lim_n f_n) ∈ F and lim_n f_n · 1_A is measurable.

Proof: Since (sup_n f_n ≤ α) = ∩_n (f_n ≤ α) it follows from 2.11 that sup_n f_n and inf_n f_n = − sup_n(−f_n) are measurable. We have

A := (∃ lim_n f_n) = ( sup_k inf_{n≥k} f_n = inf_k sup_{n≥k} f_n ).

This implies A ∈ F. The last statement follows from lim_n f_n = sup_k inf_{n≥k} f_n on A. □

As a result we see that L(F) is a space of functions where we may perform any algebraic operations without leaving the space. Moreover, we may even perform all of those operations involving a countable set (e.g. a sequence) of measurable functions! Roughly speaking, any function which can be written as an expression involving countably many operations with countably many measurable functions is measurable. Thus it is a very convenient space for formal manipulations. It is rather difficult to construct non-measurable functions. Note that the preceding corollaries are only very special examples of the power of the preceding theorems.

Next we turn to the question of how typical measurable functions look. Let us denote the set of all F-measurable simple functions by S(F). Clearly, all limits of simple measurable functions are measurable. The remarkable fact, being fundamental for almost everything in integration theory, is the converse of this statement.

2.18 Theorem. (a) Every measurable function f is the limit of some sequence of simple measurable functions. (b) If f is bounded then the approximating sequence can be chosen to be uniformly convergent. (c) If f ≥ 0 then the approximating sequence can be chosen to be increasing.

Proof: The fundamental statement is (c). Let f ≥ 0. For every n ∈ N define

f_n := (k − 1)/2^n whenever (k − 1)/2^n ≤ f < k/2^n, k = 1, ..., n2^n,
f_n := n whenever f ≥ n.

Then f_n ↑ f. If f is bounded then (f_n) converges uniformly to f. Parts (a) and (b) follow from f = f^+ − f^−. □

2.19 Problem. (easy) Draw a diagram illustrating the construction of the proof of 2.18.

2.20 Review questions. Explain the role of simple functions. Describe the structure of the set of real-valued measurable functions.

2.4 Probability models

The term random variable is simply the probabilistic name of a measurable function. Let (Ω, F, P) be a probability space. Any F-measurable real-valued function X : Ω → R is called a random variable.

2.21 Definition. Let X be a random variable. The distribution of X is P^X, i.e. the image of P under X defined by

P^X(B) := P(X^{-1}(B)) = P(X ∈ B), B ∈ B.

The function F_X : R → [0, 1] defined by F_X(x) := P(X ≤ x), x ∈ R, is the distribution function of X. Thus, the distribution function F_X determines the values of the distribution P^X on intervals by P^X((a, b]) = F_X(b) − F_X(a).

2.22 Problem. (easy) (a) Show that any distribution function is right-continuous. (b) Show that the distribution P^X = λ_{F_X}.

A major problem of probability theory is the converse problem: Given a function F, does there exist a probability space (Ω, F, P) and a random variable X such that P^X = λ_F?
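The dyadic construction used in the approximation proof above is easy to make concrete. The following sketch (illustrative code, not part of the notes; the function names are my own) builds the simple functions f_n for f(x) = x² on [0, 1] and checks that they increase to f with error at most 1/2^n below the truncation level.

```python
import math

def dyadic_approx(f, n):
    """n-th dyadic simple-function approximation of a nonnegative f:
    f_n = (k-1)/2^n on ((k-1)/2^n <= f < k/2^n), and f_n = n where f >= n,
    exactly as in the proof of the approximation theorem."""
    def fn(x):
        v = f(x)
        if v >= n:
            return n
        # floor(v * 2^n) / 2^n picks the left endpoint of the dyadic cell
        return math.floor(v * 2**n) / 2**n
    return fn

f = lambda x: x * x
xs = [i / 100 for i in range(101)]          # grid on [0, 1], where f <= 1
f1 = dyadic_approx(f, 1)
f2 = dyadic_approx(f, 2)
f3 = dyadic_approx(f, 3)

# increasing approximation from below ...
assert all(f1(x) <= f2(x) <= f3(x) <= f(x) for x in xs)
# ... with error at most 1/2^n wherever f < n
assert max(f(x) - f3(x) for x in xs) <= 2**-3
```

The same two lines of checks would fail if the floor were replaced by rounding, since rounding can overshoot f and destroys monotonicity in n.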
2.23 Problem. (easy) Let F : R → [0, 1] be increasing. Show that λ_F is a σ-additive probability content iff F is right-continuous and satisfies F(−∞) = 0 and F(∞) = 1.

2.24 Definition. A distribution function is a function F : R → [0, 1] which is increasing, right-continuous and satisfies F(−∞) = 0 and F(∞) = 1.

2.25 Problem. (intermediate) Let F be a distribution function. Show that there is a probability space (Ω, F, P) and a random variable X such that P(X ≤ x) = F(x), x ∈ R. Hint: Let Ω = R, F = B(R), P = λ_F and X(ω) = ω.

2.26 Example. (Joint distribution functions) Let X := (X_1, X_2, ..., X_d) be a random vector. The joint distribution function of X is defined to be

F(x_1, x_2, ..., x_d) := P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_d ≤ x_d).

For defining joint distributions one usually goes the other way round and starts with a function F : R^d → R to define

P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_d ≤ x_d) := F(x_1, x_2, ..., x_d).

But this only makes sense if this definition can be extended to a σ-additive content P^X(Q) := P(X ∈ Q), Q ∈ B^d. There are conditions on F which guarantee the possibility of such an extension. Then the distribution P^X(Q) := P(X ∈ Q), Q ∈ B^d, is a σ-additive content on R^d.

2.27 Review questions. Explain how the measure extension theorem is applied to construct probability spaces and random variables with given distributions.

Further reading: Shiryaev [21], chapter II, sections 1–3.
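The hint for constructing a random variable with a given distribution function (take Ω = R, P = λ_F, X(ω) = ω) has a well-known computational counterpart, the quantile transform: if U is uniform on (0, 1), then X := F⁻¹(U) has distribution function F. The sketch below (illustrative, not from the notes) checks this for the exponential distribution function F(x) = 1 − e^{−x} by comparing the empirical distribution function of the simulated sample with F.

```python
import math
import random

def quantile_exp(u):
    """Generalized inverse of F(x) = 1 - exp(-x): F^{-1}(u) = -log(1 - u)."""
    return -math.log(1.0 - u)

random.seed(0)
sample = [quantile_exp(random.random()) for _ in range(100_000)]

def ecdf(x):
    """Empirical distribution function of the simulated sample."""
    return sum(1 for s in sample if s <= x) / len(sample)

# the empirical distribution function should be close to F
for x in (0.5, 1.0, 2.0):
    assert abs(ecdf(x) - (1 - math.exp(-x))) < 0.01
```

The closeness tolerance is generous: with 10^5 samples the Monte Carlo error of an empirical probability is of order 10^{-3}.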
Chapter 3

Integral and expectation

3.1 The integral of simple functions

Let (Ω, F, µ) be a measure space. We start with defining the µ-integral of a measurable simple function.

3.1 Definition. Let f = Σ_{i=1}^n a_i 1_{F_i} be a nonnegative simple F-measurable function with its canonical representation. Then

∫ f dµ := Σ_{i=1}^n a_i µ(F_i)

is called the µ-integral of f.

We had to restrict the preceding definition to nonnegative functions since we admit the case µ(F) = ∞. If we were dealing with a finite measure µ the definition would work for all F-measurable simple functions.

3.2 Example. Let (Ω, F, P) be a probability space and let X = Σ_{i=1}^n a_i 1_{F_i} be a simple random variable. Then we have E(X) = ∫ X dP.

3.3 Theorem. The µ-integral on S(F)^+ has the following properties:
(1) ∫ 1_F dµ = µ(F).
(2) ∫ (sf + tg) dµ = s ∫ f dµ + t ∫ g dµ if s, t ∈ R^+ and f, g ∈ S(F)^+.
(3) ∫ f dµ ≤ ∫ g dµ if f ≤ g and f, g ∈ S(F)^+.

Proof: The only nontrivial part is to prove that ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ. □

3.4 Problem. (intermediate) Show that ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ for f, g ∈ S(F)^+. Hint: Try to find the canonical representation of f + g in terms of the canonical representations of f and g.
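The definition of the µ-integral of a simple function is directly computable when Ω is finite. The sketch below (illustrative helper names, not from the notes) groups points by function value to obtain the canonical representation and then evaluates ∫ f dµ = Σ a_i µ(F_i) for a measure given by point weights.

```python
def canonical_representation(f, omega):
    """Group the points of a finite set omega by function value:
    returns {a_i: F_i} with F_i = (f = a_i), the canonical representation."""
    rep = {}
    for w in omega:
        rep.setdefault(f(w), set()).add(w)
    return rep

def integral(f, omega, mu):
    """mu-integral of a simple function f on a finite omega.
    mu is a dict of point weights, so mu(F) = sum of weights over F."""
    rep = canonical_representation(f, omega)
    return sum(a * sum(mu[w] for w in F) for a, F in rep.items())

omega = {1, 2, 3, 4}
mu = {1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}     # a probability measure on omega
f = lambda w: 1 if w <= 2 else 5          # simple function with values {1, 5}

# canonical representation: F_1 = {1, 2} with mu = 0.5, F_5 = {3, 4} with mu = 0.5
assert abs(integral(f, omega, mu) - (1 * 0.5 + 5 * 0.5)) < 1e-12
```

For a probability measure this is exactly the elementary expectation E(X) = Σ a_i P(X = a_i) of Example 3.2.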
It follows that the defining formula of the µ-integral can be applied to any (nonnegative) linear combination of indicators, not only to canonical representations!

3.5 Theorem. (Transformation theorem) Let (Ω, F, µ) be a measure space and let g ∈ L(F). Then for every f ∈ S^+(B)

∫ f ◦ g dµ = ∫ f dµ^g.

3.6 Problem. (easy) Prove 3.5.

3.7 Problem. (easy) Let (Ω, F, P) be a probability space and X a random variable with distribution function F. Explain the formula E(f ◦ X) = ∫ f dλ_F.

3.2 The extension to nonnegative functions

We know that every nonnegative measurable function f is the limit of an increasing sequence (f_n) of measurable simple functions: f_n ↑ f. It is a natural idea to think of the integral of f as something like

∫ f dµ := lim_n ∫ f_n dµ.    (5)

This is actually the way we will succeed. But there are some points to worry about. First of all, we should ask whether the limit on the right-hand side exists. The integrals ∫ f_n dµ form an increasing sequence in [0, ∞]. This sequence either has a finite limit or it increases to ∞. Both cases are covered by our definition. The second and far more subtle question is whether the definition is compatible with the definition of the integral on S(F). This is the only nontrivial part of the extension process of the integral, and it is the point where σ-additivity of µ is required. The reader who does not like to worry should grasp Beppo Levi's theorem and proceed to the next section.

3.8 Theorem. Let f ∈ S(F)^+ and (f_n) ⊆ S(F)^+. Then

f_n ↑ f ⇒ lim_n ∫ f_n dµ = ∫ f dµ.

Proof: Note that "≤" is clear. It is sufficient to prove the assertion with "≤" replacing "=". Let f = Σ_{j=1}^n α_j 1_{A_j} and, for an arbitrary ε > 0, let B_n := (f ≤ f_n · (1 + ε)). From B_n ↑ Ω it follows that A_j ∩ B_n ↑ A_j and µ(A_j ∩ B_n) ↑ µ(A_j) by σ-additivity. We get

∫ f dµ = Σ_j α_j µ(A_j) = lim_n Σ_j α_j µ(A_j ∩ B_n) = lim_n ∫ 1_{B_n} f dµ.

It is clear that ∫ 1_{B_n} f dµ ≤ ∫ 1_{B_n} f_n · (1 + ε) dµ ≤ ∫ f_n dµ · (1 + ε), which implies

∫ f dµ ≤ lim_n ∫ f_n dµ · (1 + ε).

Since ε is arbitrarily small the assertion follows. □

The third question is whether the value of the limit is independent of the approximating sequence.

3.9 Theorem. Let (f_n) and (g_n) be increasing sequences of nonnegative measurable simple functions. Then

lim_n f_n = lim_n g_n ⇒ lim_n ∫ f_n dµ = lim_n ∫ g_n dµ.

Proof: Since lim_k f_n ∩ g_k = f_n ∩ lim_k g_k = f_n we obtain by 3.8

∫ f_n dµ = lim_k ∫ f_n ∩ g_k dµ ≤ lim_k ∫ g_k dµ. □

Thus, we have a valid definition (5) of the integral on L(F)^+. It is now straightforward that 3.3 carries over to L(F)^+.

3.10 Theorem. The µ-integral on L(F)^+ has the following properties:
(1) ∫ 1_F dµ = µ(F).
(2) ∫ (sf + tg) dµ = s ∫ f dµ + t ∫ g dµ if s, t ∈ R^+ and f, g ∈ L(F)^+.
(3) ∫ f dµ ≤ ∫ g dµ if f ≤ g and f, g ∈ L(F)^+.

The extension process is complete if we succeed in extending 3.8 to L(F)^+. This is straightforward using 3.8.

3.11 Theorem. (Theorem of Beppo Levi) Let f ∈ L(F)^+ and (f_n) ⊆ L(F)^+. Then

f_n ↑ f ⇒ lim_n ∫ f_n dµ = ∫ f dµ.

Proof: We have to show "≥". For every n ∈ N let (f_{nk})_{k∈N} be an increasing sequence in S(F)^+ such that lim_k f_{nk} = f_n. Define

g_k := f_{1k} ∪ f_{2k} ∪ ... ∪ f_{kk}.

Then f_{nk} ≤ g_k ≤ f_k ≤ f whenever n ≤ k. It follows that g_k ↑ f and

∫ f dµ = lim_k ∫ g_k dµ ≤ lim_k ∫ f_k dµ. □

3.12 Problem. (intermediate for mathematicians) Prove Fatou's lemma: For every sequence (f_n) of nonnegative measurable functions

∫ lim inf_n f_n dµ ≤ lim inf_n ∫ f_n dµ.

Hint: Recall that lim inf_n x_n = lim_k inf_{n≥k} x_n. Consider g_k := inf_{n≥k} f_n and apply Levi's theorem to (g_k).

3.13 Problem. (intermediate for mathematicians) For every sequence (f_n) of nonnegative measurable functions we have

∫ Σ_{n=1}^∞ f_n dµ = Σ_{n=1}^∞ ∫ f_n dµ.

3.14 Problem. (intermediate) Let f ∈ L(F)^+. Prove Markov's inequality

µ(f > a) ≤ (1/a) ∫ f dµ, a > 0.

3.15 Problem. (intermediate) Let f ∈ L(F)^+. Show that ∫ f dµ = 0 implies µ(f ≠ 0) = 0. Hint: Show that µ(f > 1/n) = 0 for every n ∈ N.

An assertion A about a measurable function f is said to hold µ-almost everywhere (µ-a.e.) if µ(A^c) = 0. Using this terminology the assertion of the preceding exercise can be phrased as: ∫ f dµ = 0, f ≥ 0 ⇒ f = 0 µ-a.e.
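Markov's inequality can be checked numerically: applied to the empirical measure of a nonnegative sample, it holds exactly, since every sample point above a contributes more than a to the sum. The sketch below (illustrative, not from the notes) verifies this for exponential data.

```python
import math
import random

random.seed(1)
# Exp(1) sample via the quantile transform; all values are nonnegative
sample = [-math.log(1.0 - random.random()) for _ in range(50_000)]
mean = sum(sample) / len(sample)           # empirical integral of f

for a in (0.5, 1.0, 2.0, 4.0):
    tail = sum(1 for s in sample if s > a) / len(sample)   # empirical mu(f > a)
    # Markov: mu(f > a) <= (1/a) * integral of f; exact for the empirical measure
    assert tail <= mean / a + 1e-9
```

Note that the assertion is not a Monte Carlo approximation: for each fixed sample the inequality is a deterministic fact, which is exactly the content of Markov's inequality for the (random) empirical measure.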
If we are talking about probability measures and random variables, "almost everywhere" is sometimes replaced by "almost surely".

3.3 Integrable functions

Now the integral is defined for every nonnegative measurable function. The value of the integral may be ∞. In order to define the integral for measurable functions which may take both positive and negative values we have to exclude infinite integrals.

3.16 Problem. (easy) Let f ∈ L(F)^+. Show that ∫ f dµ < ∞ implies µ(f > a) < ∞ for every a > 0.

3.17 Definition. A measurable function f is µ-integrable if ∫ f^+ dµ < ∞ and ∫ f^- dµ < ∞. If f is µ-integrable then

∫ f dµ := ∫ f^+ dµ − ∫ f^- dµ.

The set of all µ-integrable functions is denoted by L^1(µ) = L^1(Ω, F, µ).

3.18 Problem. (easy) Show that f ∈ L(F) is µ-integrable iff ∫ |f| dµ < ∞.

Proving the basic properties of the integral of integrable functions is an easy matter. We collect these facts in a couple of problems.

3.19 Theorem. The set L^1(µ) is a linear space and the µ-integral is a linear functional on L^1(µ).

3.20 Problem. (intermediate) Prove 3.19.

3.21 Theorem. The µ-integral is an isotonic functional on L^1(µ).

3.22 Problem. (easy) Prove 3.21.

3.23 Problem. (easy) Let f be a measurable function and assume that there is an integrable function g such that |f| ≤ g. Show that f is integrable.

3.24 Problem. (easy) Let f ∈ L^1(µ). Show that |∫ f dµ| ≤ ∫ |f| dµ.

3.25 Problem. (easy) (a) Discuss the question whether bounded measurable functions are integrable. (b) Characterize those measurable simple functions which are integrable.

3.26 Problem. (easy) (a) Let f be a measurable function such that f = 0 µ-a.e. Show that f is integrable and ∫ f dµ = 0. (b) Let f and g be measurable functions such that f = g µ-a.e. Show that f is integrable iff g is integrable.

3.27 Problem. (easy) Show that integrals are linear with respect to the integrating measure.

For notational convenience we denote ∫_A f dµ := ∫ 1_A f dµ, A ∈ F.

3.28 Problem. (easy) (a) Let f be an integrable function. Then f = 0 µ-a.e. ⇔ ∫_A f dµ = 0 for all A ∈ F. (b) Let f and g be integrable functions. Then f = g µ-a.e. ⇔ ∫_A f dµ = ∫_A g dµ for all A ∈ F.

Many assertions in measure theory concerning measurable functions are stable under linear combinations and under convergence. Assertions of such a type need only be proved for indicators. The procedure of proving (understanding) an assertion for indicators and extending it to nonnegative and to integrable functions is called measure theoretic induction.

3.29 Problem. (easy) Extend the transformation theorem by measure theoretic induction.

3.4 Convergence

One of the reasons for the great success of abstract integration theory is the convergence theorems for integrals. The problem is the following: Assume that (f_n) is a sequence of integrable functions converging to some function f. When can we conclude that f is integrable and lim_n ∫ f_n dµ = ∫ f dµ? The most popular result concerning this issue is Lebesgue's theorem on dominated convergence.

3.30 Theorem. (Dominated convergence theorem) Let (f_n) be a sequence of measurable functions which is dominated by an integrable function g, i.e. |f_n| ≤ g, n ∈ N. Then

f_n → f µ-a.e. ⇒ f ∈ L^1(µ) and lim_n ∫ f_n dµ = ∫ f dµ.

The dominated convergence theorem can be used and applied like a black box without being aware of its proof. However, the proof is very easy and follows straightforwardly from Levi's theorem 3.11 and Fatou's lemma 3.12.

Proof: Integrability of f is obvious. Moreover, the sequences g − f_n and g + f_n consist of nonnegative measurable functions. Therefore we may apply Fatou's lemma:

∫ (g − f) dµ ≤ lim inf_n ∫ (g − f_n) dµ = ∫ g dµ − lim sup_n ∫ f_n dµ

and

∫ (g + f) dµ ≤ lim inf_n ∫ (g + f_n) dµ = ∫ g dµ + lim inf_n ∫ f_n dµ.

This implies

∫ f dµ ≤ lim inf_n ∫ f_n dµ ≤ lim sup_n ∫ f_n dµ ≤ ∫ f dµ. □

3.31 Problem. (easy) Show that under the assumptions of the dominated convergence theorem we even have lim_n ∫ |f_n − f| dµ = 0. (This type of convergence is called mean convergence.)

3.32 Problem. (easy) Discuss the question whether a uniformly bounded sequence of measurable functions is dominated in the sense of the dominated convergence theorem.
Chapter 4

Selected topics

4.1 Spaces of integrable functions

We know that the space L^1 = L^1(Ω, F, µ) is a vector space. We would like to define a norm on L^1. A natural idea is to define

‖f‖_1 := ∫ |f| dµ, f ∈ L^1.

It is easy to see that this definition has the following properties:
(1) ‖f‖_1 ≥ 0, and f = 0 ⇒ ‖f‖_1 = 0, f ∈ L^1.
(2) ‖f + g‖_1 ≤ ‖f‖_1 + ‖g‖_1, f, g ∈ L^1.
(3) ‖λf‖_1 = |λ| ‖f‖_1, λ ∈ R, f ∈ L^1.

However, we only have ‖f‖_1 = 0 ⇒ f = 0 µ-a.e. A function with zero norm need not be identically zero! Therefore ‖·‖_1 is not a norm on L^1 but only a pseudonorm.

4.1 Discussion. In order to get a normed space one has to modify the space L^1 in such a way that all functions f = g µ-a.e. are considered as equal; every function f = 0 µ-a.e. can then be considered as the null element of the vector space. The space of integrable functions modified in this way is again denoted by L^1 = L^1(Ω, F, µ). For those readers who want to have hard facts instead of soft wellness we provide some details. For any f ∈ L(F) let f~ := {g ∈ L(F) : f = g µ-a.e.} denote the equivalence class of f. Then integrability is a class property and the space L~^1 := {f~ : f ∈ L^1} is a vector space. The value of the integral depends only on the class and therefore it defines a linear function on L~^1 having the usual properties. In particular, ‖f~‖_1 := ‖f‖_1 defines a norm on L~^1. It is common practice to work with L~^1 instead of L^1 but to write f instead of f~. This is a typical example of what mathematicians call abuse of language.

4.2 Theorem. The space L^1(Ω, F, P) is a Banach space.

Proof: Let (f_n) be a Cauchy sequence in L^1, i.e. ∀ ε > 0 ∃ N(ε) such that

∫ |f_n − f_m| dµ < ε whenever n, m ≥ N(ε).

Let n_i := N(1/2^i). Then

∫ |f_{n_{i+1}} − f_{n_i}| dµ < 1/2^i.

It follows that for all k ∈ N

∫ ( |f_{n_1}| + |f_{n_2} − f_{n_1}| + ... + |f_{n_{k+1}} − f_{n_k}| ) dµ ≤ C < ∞.

Hence the corresponding infinite series converges, which implies that

|f_{n_1}| + Σ_{i=1}^∞ |f_{n_{i+1}} − f_{n_i}| < ∞ µ-a.e.

Since absolute convergence of series in R implies convergence (here completeness of R goes in) the partial sums

f_{n_1} + (f_{n_2} − f_{n_1}) + ... + (f_{n_k} − f_{n_{k-1}}) = f_{n_k}

converge to some limit f µ-a.e. Mean convergence of (f_n) follows from Fatou's lemma by

∫ |f_n − f| dµ = ∫ lim_{k→∞} |f_n − f_{n_k}| dµ ≤ lim inf_{k→∞} ∫ |f_n − f_{n_k}| dµ < ε whenever n ≥ N(ε). □

Let L^2 = L^2(Ω, F, µ) := {f ∈ L(F) : ∫ f^2 dµ < ∞}. This is another important space of integrable functions.

4.3 Problem. (easy) (a) Show that L^2 is a vector space. (b) Show that ∫ f^2 dµ < ∞ is a property of the µ-equivalence class of f ∈ L(F).

By L^2 = L^2(Ω, F, µ) we again denote the corresponding space of equivalence classes. On this space there is an inner product

⟨f, g⟩ := ∫ fg dµ, f, g ∈ L^2.

The corresponding norm is ‖f‖_2 = ⟨f, f⟩^{1/2} = (∫ f^2 dµ)^{1/2}.

4.4 Theorem. The space L^2(Ω, F, µ) is a Hilbert space.

4.2 Measures with densities

Let (Ω, F, µ) be a measure space and let f ∈ L^+(F).

4.5 Problem. (intermediate) Show that ν : A ↦ ∫_A f dµ, A ∈ F, is a measure.

We would like to say that f is the density of ν with respect to µ, but for doing so we have to be sure that f is uniquely determined by ν. This is not true in general.

4.6 Lemma. Let f, g ∈ L^+(F). Then

∫_A f dµ = ∫_A g dµ ∀ A ∈ F ⇒ µ((f ≠ g) ∩ A) = 0 whenever µ(A) < ∞.

In other words: f = g µ-a.e. on every set of finite µ-measure.

Proof: Let µ(M) < ∞ and define M_n := M ∩ (f ≤ n) ∩ (g ≤ n). Since f 1_{M_n} and g 1_{M_n} are µ-integrable it follows that f 1_{M_n} = g 1_{M_n} µ-a.e. For n → ∞ we have M_n ↑ M which implies f 1_M = g 1_M µ-a.e. □

Now the basic uniqueness theorem follows immediately.

4.7 Theorem. If µ is finite or σ-finite then

∫_A f dµ = ∫_A g dµ ∀ A ∈ F ⇒ f = g µ-a.e.

4.8 Definition. Let µ be σ-finite and define ν : A ↦ ∫_A f dµ, A ∈ F.
Then ν = fµ and f =: dν/dµ is called the density or the Radon-Nikodym derivative of ν with respect to µ. A density w.r.t. the Lebesgue measure is called a Lebesgue density.

Which measures have densities w.r.t. other measures?

4.9 Problem. (easy) Let ν = fµ. Show that µ(A) = 0 implies ν(A) = 0.

4.10 Definition. Let µ|F and ν|F be measures. The measure ν is said to be absolutely continuous w.r.t. the measure µ|F (ν ≪ µ) if µ(A) = 0 ⇒ ν(A) = 0, A ∈ F.

We saw that absolute continuity is necessary for having a density. It is even sufficient.

4.11 Theorem. (Radon-Nikodym theorem) Assume that µ is σ-finite. If ν ≪ µ then ν = fµ for some f ∈ L^+(F).

Proof: See Bauer [2]. □

We will meet measures with densities frequently when we explore stochastic analysis. Therefore at this point we present two most important special cases.

4.12 Problem. (intermediate) Let P and Q be probability measures on a finite field F. (1) State Q ≪ P in terms of the generating partition of F. (2) If Q ≪ P find dQ/dP.

The next example deals with Lebesgue-Stieltjes measures.

4.13 Problem. (intermediate) Let α : R → R be an increasing function which is supposed to be continuous on R and differentiable except at at most finitely many points. Show that λ_α = α′λ.

Finally, we have to ask how µ-integrals can be transformed into ν-integrals.

4.14 Problem. (intermediate) Let ν = fµ. Discuss the validity of

∫ g dν = ∫ g (dν/dµ) dµ.

Hint: Prove it for g ∈ S^+(F) and extend it by measure theoretic induction.

4.15 Problem. Let (Ω, F, P) be a probability space and X a random variable with differentiable distribution function F. Explain the formulas

P(X ∈ B) = ∫_B F′(t) dt and E(g ◦ X) = ∫ g(t) F′(t) dt.

4.3 Iterated integration

UNDER CONSTRUCTION
Part II

Probability theory
Chapter 5

Measure theoretic language of probability

5.1 Basics

Let (Ω, F, P) be a probability space. The probability space serves as a model of a random experiment. The σ-field F is the field of observable events. Observability of an event A means that after performing the random experiment it can be decided whether A is realized or not. The probability measure P gives to each event A ∈ F a probability P(A). The intuitive nature of the probability will become clear later.

A function X : Ω → R is a random variable if assertions about the function value (e.g. (X ∈ B), B ∈ B) are observable, i.e. are in F. Therefore, a random variable is simply an F-measurable function.

Let X be a nonnegative or integrable random variable. Then the integral of X is called the expectation of X and is denoted by

E(X) = ∫ X dP.

5.2 The information set of random variables

The information set of a random variable X is the family of events which can be expressed in terms of X, i.e. the family of events (X ∈ B), B ∈ B. This is a sub-σ-field of F and is denoted by σ(X). It is called the σ-field generated by X. In this sense the σ-field can be identified with the information which is obtained after having performed the random experiment.

5.1 Problem. (easy) Show that σ(X) is a σ-field.

5.2 Problem. (intermediate) (a) Let X be an indicator random variable. Find σ(X). (b) Let X be a simple random variable. Find σ(X).

Let X and Y be random variables such that Y = f ◦ X where f is some Borel-measurable function. Since (Y ∈ B) = (X ∈ f^{-1}(B)) it follows that σ(Y) ⊆ σ(X). This is intuitively very plausible: Any assertion about Y can be stated as an assertion about X. In other words: If Y is causally dependent on X then the information set of Y is contained in the information set of X. It is a remarkable fact that even the converse is true.

5.3 Theorem. (Causality theorem) Let X and Y be random variables such that σ(Y) ⊆ σ(X). Then there exists a measurable function f such that Y = f ◦ X.

Proof: By measure theoretic induction it is sufficient to prove the assertion for Y = 1_A, A ∈ F. Recall that σ(Y) = {∅, A, A^c, Ω}. From σ(Y) ⊆ σ(X) it follows that A ∈ σ(X), i.e. A = (X ∈ B) for some B ∈ B. This means 1_A = 1_B ◦ X. □

5.3 Independence

The notion of independence marks the point where probability theory goes beyond abstract measure theory. Recall that two events A, B ∈ F are independent if the product formula P(A ∩ B) = P(A)P(B) is true. This is easily extended to families of events.

5.4 Definition. Let C and D be subfamilies of F. The families C and D are said to be independent (with respect to P) if P(A ∩ B) = P(A)P(B) for every choice A ∈ C and B ∈ D.

It is natural to call random variables X and Y independent if the corresponding information sets are independent.

5.5 Definition. Two random variables X and Y are independent if σ(X) and σ(Y) are independent.

How can we check independence of random variables? Is it sufficient to check the independence of generators of the information sets? This is not true in general, but with a minor modification it is.

5.6 Theorem. Let X and Y be random variables and let C and D be generators of the corresponding information sets. If C and D are independent and closed under intersection then X and Y are independent.

5.7 Problem. (intermediate) Let F(x, y) be the joint distribution function of (X, Y). Show that X and Y are independent iff F(x, y) = h(x)k(y) for some functions h and k.

For independent random variables there is a product formula for expectations.

5.8 Theorem. (1) Let X ≥ 0 and Y ≥ 0 be independent random variables. Then

E(XY) = E(X)E(Y).

(2) Let X ∈ L^1 and Y ∈ L^1 be independent random variables. Then XY ∈ L^1 and

E(XY) = E(X)E(Y).

Proof: Apply measure theoretic induction. □
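The product formula E(XY) = E(X)E(Y) for independent random variables can be illustrated by simulation (an illustrative sketch, not from the notes): for independent uniform variables the empirical mean of XY matches the product of the empirical means, while a dependent pair such as (X, X) generally breaks the formula since E(X²) ≠ E(X)².

```python
import random

random.seed(2)
n = 200_000
xs = [random.random() for _ in range(n)]   # X uniform on (0,1)
ys = [random.random() for _ in range(n)]   # Y uniform on (0,1), independent of X

mean = lambda v: sum(v) / len(v)

# independent case: E(XY) = E(X)E(Y) = 1/4
assert abs(mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)) < 0.005

# dependent case (Y = X): E(X*X) = 1/3, while E(X)^2 = 1/4
assert abs(mean([x * x for x in xs]) - 1/3) < 0.005
assert abs(mean(xs) ** 2 - 1/4) < 0.005
```

With 2·10^5 samples the Monte Carlo error of these averages is well below the stated tolerances.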
Chapter 6

Conditional expectation

6.1 The concept

Let us explore the relation between a random variable and a σ-field. Let (Ω, F, P) be a probability space and let A ⊆ F be a sub-σ-field. If a random variable X is A-measurable then the information available in A tells us everything about X. If the random variable X is not A-measurable we could be interested in finding an optimal A-measurable approximation of X, in a sense to be specified. A successful way consists in decomposing X into a sum Y + R where Y is A-measurable and R is uncorrelated to A. The condition on R of being uncorrelated to A means ∫_A R dP = 0 for all A ∈ A. If we require that E(X) = E(Y) then E(R) = 0, too. In other words, the approximating variable Y should satisfy

∫_A X dP = ∫_A Y dP for all A ∈ A.

For these integrals to be defined we need nonnegative or integrable random variables. This program leads to the concept of conditional expectation.

6.1 Definition. Let (Ω, F, P) be a probability space and let A ⊆ F be a sub-σ-field. Let X be a nonnegative or integrable random variable. The conditional expectation E(X|A) of X given A is an A-measurable random variable Y satisfying

∫_A X dP = ∫_A Y dP for all A ∈ A.

6.2 Theorem. The conditional expectation E(X|A) exists if X is integrable or X ≥ 0.

Proof: This is a consequence of the Radon-Nikodym theorem. If X ≥ 0 then µ(A) := ∫_A X dP defines a σ-finite measure on A such that µ ≪ P. Define E(X|A) := dµ/dP. If X is integrable apply the preceding to X^+ and X^−. □

6.3 Problem. Find the conditional expectation given a finite field.

6.2 Properties

(a) The conditional expectation E(X|A) is uniquely determined P-a.e.
(b) If X ≥ 0 then E(X|A) ≥ 0 P-a.e.
(c) If X is integrable then E(X|A) is integrable.

6.4 Problem. Show that E(E(X|A)) = E(X).

6.5 Problem. Show that if X is A-measurable then E(X|A) = X.

6.6 Problem. Show that if X is independent of A then E(X|A) = E(X).

Since the definition of the conditional expectation is linear in X the following two assertions are almost obvious.

6.7 Problem. Prove 6.8 and 6.9 below. Distinguish between the nonnegative and the integrable case.

6.8 Theorem. Assume that X and Y are nonnegative random variables. Then E(αX + βY | A) = αE(X|A) + βE(Y|A) whenever α, β ≥ 0.

6.9 Theorem. Assume that X and Y are integrable random variables. Then E(αX + βY | A) = αE(X|A) + βE(Y|A) whenever α, β ∈ R.

6.10 Problem. Show that X ≤ Y implies E(X|A) ≤ E(Y|A).

Iterated conditioning: UNDER CONSTRUCTION

Redundant conditioning: UNDER CONSTRUCTION

Inequalities

6.11 Problem. Show that |E(X|A)| ≤ E(|X| | A) if X is integrable.

Jensen's inequality: UNDER CONSTRUCTION

Further topics:
• L^2 hereditary property
• CS-inequality

Projection properties: UNDER CONSTRUCTION

Convergence

6.12 Theorem. X_n → X in L^1 implies E(X_n|A) → E(X|A) in L^1.

6.13 Problem. Prove 6.12.

6.14 Problem. Prove a Fatou lemma for conditional expectations.

6.15 Problem. Prove a Lebesgue dominated convergence theorem for conditional expectations.

6.3 Calculation

• given a random variable, causality theorem
• given a dominating product measure
• given several random variables
• the Gaussian case
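The problem of finding the conditional expectation given a finite field has an explicit answer: if A is generated by a finite partition {A_1, ..., A_k} with P(A_j) > 0, then E(X|A) is constant on each cell and equals the cell average ∫_{A_j} X dP / P(A_j). The sketch below (illustrative helper names, not from the notes) computes this on a finite Ω and verifies the defining property ∫_A X dP = ∫_A E(X|A) dP on the cells.

```python
def cond_exp_finite(X, P, partition):
    """E(X | sigma-field generated by a finite partition).
    X, P: dicts point -> value / probability; partition: list of sets of points.
    Returns a dict point -> value of E(X|A), constant on each cell."""
    Y = {}
    for cell in partition:
        pc = sum(P[w] for w in cell)                    # P(A_j), assumed > 0
        avg = sum(X[w] * P[w] for w in cell) / pc       # cell average of X
        for w in cell:
            Y[w] = avg
    return Y

omega = [1, 2, 3, 4]
P = {1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}
X = {1: 10.0, 2: 0.0, 3: 4.0, 4: 4.0}
partition = [{1, 2}, {3, 4}]
Y = cond_exp_finite(X, P, partition)

# Y is A-measurable (constant on cells) ...
assert Y[1] == Y[2] and Y[3] == Y[4]
# ... and satisfies the defining integral identity on each cell
for cell in partition:
    lhs = sum(X[w] * P[w] for w in cell)
    rhs = sum(Y[w] * P[w] for w in cell)
    assert abs(lhs - rhs) < 1e-12
```

Since the cells generate the σ-field, checking the identity on the cells is enough by additivity.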
A stochastic model of such a gambling system consists of a probability space (Ω. the accumulated gains. i. In case of winning the gambler’s return is the double stake. Assume the the gambler starts at i = 0 with capital V0 = a. A gambler bets a stake of one unit at subsequent games. Sn = X1 + X2 + · · · + Xn is called a random walk on Z starting at zero. The sequence of partial sums. qc (a) := P (Tc < T0 V0 = a) is the probability of winning. The random variables are independent with values +1 and −1 representing the gambler’s gain or loss at time i ≥ 1. Let Tx := min{n : Vn = x} Then q0 (a) := P (T0 < Tc V0 = a) is called the probability of ruin. Then her wealth after n games is Vn = a + X1 + X2 + · · · + Xn = a + Sn The sequence (Vn )n≥0 of partial sums is called a random walk starting at a. The games are independent and p denotes the probability of winning. Thus. we have P (X = 1) = p. 43 . F. A major foundation problem of probability theory is the question whether there exists a stochastic model for a random walk. We assume that the gambler plans to continue gambling until her wealth is c > a or 0.e. We do not discuss such questions but refer to the literature. otherwise the stake is lost. If p = 1/2 then it is a symmetric random walk. P ) and a sequence of random variables (Xi )i≥1 .1 The ruin problem One player Let us start with a very simple gambling system. Similarly.Chapter 7 Stochastic sequences 7.
this is a basic example of a situation which is typical for stochastic analysis: Probabilities are not only obtained by combinatorial methods but also and often much more easily by a dynamic argument resulting in a difference or differential equation. 0 < a < c.2 Discussion. It is illuminating to understand the assertion with the help of an heuristic argument: If the random walk starts at V0 = a. 7. This gives P (Tc < T0 V0 = a) = pP (Tc < T0 V1 = a + 1) + (1 − p)P (Tc < T0 V1 = a − 1) However. In order to calculate the ruin probabilities we have to solve the difference equation. This proves the assertion. The ruin probabilities satisfy the difference equation qc (a) = p qc (a + 1) + (1 − p) qc (a − 1) whenever 0 < a < c with boundary conditions qc (0) = 0 and qc (c) = 1.1 Lemma. The starting point is the following assertion. In this argument we utilized the intuitively obvious fact that the starting time of the random walk does not affect its ruin probabilites. . Thus.44 CHAPTER 7. the random walk starting at time i = 1 has the same ruin probabilites as the random walk starting at i = 0. This gives 1−p a−1 p if p = 1/2 1−p c −1 qc (a) = p a if p = 1/2 c . . Solving the ruin equation The difference equation xa = pxa+1 + (1 − p)xa−1 whenever a = 1. . . then we have V1 = a + 1 with probability p and V1 = a − 1 with probability 1 − p. 7. c − 1 has the general solution A+B 1−p p a xa = if p = 1/2 if p = 1/2 A + Ba The constants A and B are determined by the boundary conditions. STOCHASTIC SEQUENCES How to evaluate the probability of ruin ? It will turn out that the probability can be obtained by studying the dynamic behaviour of the gambling situation.
In order to calculate q0(a) we note that q0(a) = q̄c(c − a), where q̄c denotes the corresponding probability for a random walk with interchanged transition probabilities. This implies

    q0(a) = ((p/(1 − p))^(c−a) − 1) / ((p/(1 − p))^c − 1) if p ≠ 1/2,    q0(a) = (c − a)/c if p = 1/2.

Easy calculations show that qc(a) + q0(a) = 1, which means that gambling ends with probability 1.

7.3 Problem. (intermediate)
(a) Fill in the details of solving the difference equation of the ruin problem.
(b) Show that the random walk hits the boundaries almost surely (with probability one).

Two players

Now we assume that two players with initial capitals a and b are playing against each other. The stake of each player is 1 at each game. The game ends when one player is ruined. This is obviously equivalent to the situation of the preceding section, leading to

    P(player 1 wins) = qa+b(a),    P(player 2 wins) = q0(a).

We know that the game ends with probability one.

Let us turn to the situation where player 1 has unlimited initial capital. Then the game can only end by the ruin of player 2, i.e. if supn Sn ≥ b, where Sn denotes the accumulated gain of player 1.

7.4 Theorem. Let (Sn) be a random walk on Z. Then

    P(supn Sn ≥ b) = 1 whenever p ≥ 1/2,    P(supn Sn ≥ b) = (p/(1 − p))^b whenever p < 1/2.
7.5 Problem. (advanced) Prove 7.4. Hint: Show that P (supn Sn ≥ b) = lima→∞ qa+b (a). Note that P (supn Sn ≥ 1) is the probability that a gambler with unlimited initial capital gains 1 at some time. If p ≥ 1/2 this happens with probability 1 if we wait sufﬁciently long. Later we will see that in a fair game (p = 1/2) the expected waiting time is inﬁnite.
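Theorem 7.4 can be made plausible numerically via the hinted limit P(supn Sn ≥ b) = lim_{a→∞} q_{a+b}(a). A small sketch, reusing the closed-form win probability (the function name is mine):

```python
def q_win(a, c, p):
    """Probability that the walk started at a hits c before 0 (ruin formula)."""
    if p == 0.5:
        return a / c
    r = (1 - p) / p
    return (r**a - 1) / (r**c - 1)

b = 3
# p >= 1/2: the level b is reached with probability 1 in the limit
assert abs(q_win(100_000, 100_000 + b, 0.5) - 1.0) < 1e-4
assert abs(q_win(500, 500 + b, 0.6) - 1.0) < 1e-12
# p < 1/2: the limit is (p/(1-p))^b
p = 0.4
assert abs(q_win(500, 500 + b, p) - (p / (1 - p))**b) < 1e-12
```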
7.2 Stopping times
Optional stopping
Let us consider the question whether gambling chances can be improved by a gambling system. Let us start with a particularly simple gambling system, called optional stopping system. The idea is as follows: The gambler waits up to a random time σ and then starts gambling. (The game at period σ + 1 is the ﬁrst game to play.) Gambling is continued until a further random time τ ≥ σ and then stops. (The game at period τ is the last game to play.) Random times are random variables σ, τ : Ω → N0 ∪ {∞}. Now it is an important point that the choice of the starting time σ and the stopping time τ depend only on the information available up to those times since the gambler does not know the future.
Filtrations and stopping times
Let X1, X2, . . . , Xk, . . . be a sequence of random variables representing the outcomes of a game at times k = 1, 2, . . .

7.6 Definition. The σ-field Fk := σ(X1, X2, . . . , Xk), which is generated by the events (X1 ∈ B1, X2 ∈ B2, . . . , Xk ∈ Bk), Bi ∈ B, is called the past of the sequence (Xi) at time k.

7.7 Problem. (advanced) State and prove a causality theorem for σ(X1, X2, . . . , Xk)-measurable random variables. Hint: Let C be the generating system of σ(X1, X2, . . . , Xk) and let D be the family of random variables that are functions of (X1, X2, . . . , Xk). Show that D is a σ-field and that C ⊆ D. This implies that any indicator of a set in σ(X1, X2, . . . , Xk) is a function of (X1, X2, . . . , Xk). Extend this result by measure theoretic induction.

The past at time k is the information set of the beginning (X1, X2, . . . , Xk) of the sequence (Xi). The history of the game is the family of σ-fields (Fk)k≥0 where F0 =
{∅, Ω}. The history is an increasing sequence of σﬁelds representing the increasing information in course of time. 7.8 Deﬁnition. Any increasing sequence of σﬁelds is called a ﬁltration. 7.9 Deﬁnition. A sequence (Xk ) of random variables is adapted to a ﬁltration (Fk )k≥0 if Xk is Fk measurable for every k ≥ 0. Clearly, every sequence of random variables is adapted to its own history. Now we are in the position to give a formal deﬁnition of a stopping time. 7.10 Deﬁnition. Let (Fk ) be a ﬁltration. A random variable τ : Ω → N0 ∪ {∞} is a stopping time (relative to the ﬁltration (Fk )) if (τ = k) ∈ Fk for every k ∈ N.
7.11 Problem. Let (Fk) be a filtration and let τ : Ω → N0 ∪ {∞} be a random variable. Show that the following assertions are equivalent:
(a) (τ = k) ∈ Fk for every k ∈ N
(b) (τ ≤ k) ∈ Fk for every k ∈ N
(c) (τ < k) ∈ Fk−1 for every k ∈ N
(d) (τ ≥ k) ∈ Fk−1 for every k ∈ N
(e) (τ > k) ∈ Fk for every k ∈ N

7.12 Problem. Let (Xn)n≥0 be adapted. Show that the hitting time τ = min{k ≥ 0 : Xk ∈ B} is a stopping time for any B ∈ B. (Note that τ = ∞ if Xk ∉ B for all k ∈ N.)

In view of the causality theorem the realisation of the events (τ = k) is determined by the values of the random variables X1, X2, . . . , Xk, i.e. 1(τ=k) = fk(X1, X2, . . . , Xk) for suitable functions fk.
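The hitting-time construction of Problem 7.12 and the causality remark can be illustrated in a few lines; hitting_time is a hypothetical helper, not part of the notes:

```python
def hitting_time(xs, B):
    """First index k >= 1 with xs[k-1] in B; None encodes tau = infinity."""
    for k, x in enumerate(xs, start=1):
        if x in B:
            return k
    return None

# partial sums of the outcomes +1, +1, -1, +1, +1
S = [1, 2, 1, 2, 3]
assert hitting_time(S, {2}) == 2
# the event (tau = 2) is decided by the first two values alone (causality):
assert hitting_time(S[:2], {2}) == 2
# a boundary that is never reached gives tau = infinity
assert hitting_time(S, {-1}) is None
```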
Wald’s equation
If our gambler applies a stopping system (σ, τ ) with ﬁnite stopping times then her gain is Sτ − Sσ . 7.13 Problem. (easy) Let (Xk ) be a sequence adapted to (Fk ) and let τ be a ﬁnite stopping time. Then Xτ is a random variable.
Does the stopping system improve the gambler's chances? We require some preparations.

7.14 Problem. (easy) Let Z be a random variable with values in N0. Show that

    E(Z) = Σ_{k=1}^∞ P(Z ≥ k).

7.15 Theorem. (Wald's equation) Let (Xk) be an independent sequence of integrable random variables with a common expectation E(Xk) = µ. If τ is an integrable stopping time then Sτ is integrable and

    E(Sτ) = µ E(τ).

Proof: We will show that the equation is true both for the positive parts and the negative parts of Xk. Let Xk ≥ 0. Then

    E(Sτ) = Σ_{k=1}^∞ ∫_(τ=k) Sk dP = Σ_{i=1}^∞ ∫_(τ≥i) Xi dP = Σ_{i=1}^∞ E(Xi) P(τ ≥ i) = µ E(τ).

Note that all terms are ≥ 0, which allows interchanging summation and integration. 2
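Wald's equation can be verified exactly for a bounded stopping time by enumerating all ±1 paths of a fixed length; the parameters below are illustrative choices of mine:

```python
from fractions import Fraction
from itertools import product

# tau = min(first hit of {-a, b}, N) is a bounded stopping time; check
# E(S_tau) = mu * E(tau) exactly with rational arithmetic.
p, a, b, N = Fraction(3, 5), 2, 2, 8
mu = p - (1 - p)
E_S = Fraction(0)
E_tau = Fraction(0)
for path in product((1, -1), repeat=N):
    prob = Fraction(1)
    s, tau, s_stop = 0, N, None
    for k, x in enumerate(path, start=1):
        prob *= p if x == 1 else 1 - p
        s += x
        if s_stop is None and s in (-a, b):
            tau, s_stop = k, s       # first hit: freeze the stopped value
    if s_stop is None:
        s_stop = s                   # not stopped before N: tau = N
    E_S += prob * s_stop
    E_tau += prob * tau
assert E_S == mu * E_tau             # Wald's identity holds exactly
```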
7.16 Problem. (advanced) Let τ := min{k ≥ 0 : Sk = −a or Sk = b}.
(a) Show that Wald's equation is valid. Hint: Let τn := τ ∧ n. Show that τn satisfies Wald's equation. Let n → ∞ to obtain integrability of τ.
(b) Calculate E(τ).

7.17 Problem. (intermediate) Let (Sk) be a symmetric random walk and let τ := min{k ≥ 0 : Sk = 1}. Show that E(τ) = ∞. Hint: Assume E(τ) < ∞ and derive a contradiction.

The following theorem answers our question concerning the improvement of chances by stopping strategies. It shows that unfavourable games cannot be turned into fair games and fair games cannot be turned into favourable games. The result is a consequence of Wald's equation and it is the prototype of the fundamental optional stopping theorem of stochastic analysis.

7.18 Theorem. (Optional stopping of random walks) Let (Xk) be an independent sequence of integrable random variables with a common expectation E(Xk) = µ. Let σ ≤ τ be integrable stopping times. Then:
(a) µ < 0 ⇒ E(Sτ − Sσ) < 0,
(b) µ = 0 ⇒ E(Sτ − Sσ) = 0,
(c) µ > 0 ⇒ E(Sτ − Sσ) > 0.

7.3 Gambling systems

Next we generalize our gambling system. We are going to admit that the gambler may vary the stakes. The stake for game n is denoted by Hn and has to be nonnegative. It is fixed before period n and therefore must be Fn−1-measurable, since it is determined by the outcomes at times k = 1, 2, . . . , n − 1. The sequence of stakes (Hn) is thus not only adapted but even predictable. The stopping system is a special case where only 0 and 1 are admitted as stakes.

7.19 Problem. Determine the stakes Hn corresponding to a stopping system and check predictability. Hint: Show that Hn = 1(σ,τ](n).

The gain at game k is Hk Xk = Hk(Sk − Sk−1) = Hk ∆Sk. For the wealth of the gambler after n games we obtain

    Vn = V0 + Σ_{k=1}^n Hk (Sk − Sk−1) = V0 + Σ_{k=1}^n Hk ∆Sk.    (6)

If the stakes are integrable then we have E(Vn) = E(Vn−1) + E(Hn)E(Xn). In particular, if p = 1/2 we have E(Vn) = 0 for all n ∈ N. However, if the total gambling time is unbounded (but finite!) then this is no longer true.

7.20 Example. (Doubling strategy) Let τ be the waiting time to the first success, i.e. τ = min{k ≥ 1 : Xk = 1}, and define Hn := 2^(n−1) 1(τ≥n). Obviously, the stakes are integrable. Then we have

    P(Vτ = 1) = 1.    (7)
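Equation (7) can be checked mechanically: whatever the number of losses before the first win, the doubling strategy leaves terminal wealth 1 (starting from V0 = 0). A minimal sketch (the function name is mine):

```python
def wealth_after_doubling(n_losses):
    """Terminal wealth V_tau (with V_0 = 0) when the first win comes after n_losses losses."""
    v, stake = 0, 1
    for _ in range(n_losses):
        v -= stake      # the stake is lost
        stake *= 2      # double the stake for the next game
    return v + stake    # the winning game pays the stake as gain

# on every possible outcome sequence the terminal wealth is 1
for n in range(25):
    assert wealth_after_doubling(n) == 1
```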
Equation (7) holds for any p ∈ (0, 1). Thus, a fair game can be transformed into a favourable game by such a strategy. And this is true although the stopping time τ is integrable; actually E(τ) = 1/p !

7.21 Problem. Prove (7).

The assertion of the optional stopping theorem remains valid for gambling systems if the stopping times are bounded.

7.22 Theorem. (Optional stopping for gambling systems) Let (Xk) be an independent sequence of integrable random variables with a common expectation E(Xk) = µ. Let (Vn) be the sequence of wealths generated by a gambling system with integrable stakes. If σ ≤ τ are bounded stopping times then:
(a) µ < 0 ⇒ E(Vτ − Vσ) < 0,
(b) µ = 0 ⇒ E(Vτ − Vσ) = 0,
(c) µ > 0 ⇒ E(Vτ − Vσ) > 0.

Proof: Let N := max τ. Since

    Vτ − Vσ = Σ_{k=1}^N Hk Xk 1(σ<k≤τ)

and since Hk 1(σ<k≤τ) is Fk−1-measurable and hence independent of Xk, it follows that

    E(Vτ − Vσ) = µ Σ_{k=1}^N E(Hk 1(σ<k≤τ)). 2

7.23 Problem. (easy) Let p = 1/2. Show that for the doubling strategy we have E(Vτ∧n) = 0.

7.24 Problem. (advanced) Explain for the doubling strategy why Lebesgue's theorem on dominated convergence does not imply E(Vτ∧n) → E(Vτ), although Vτ∧n → Vτ. Hint: Show that the sequence (Vτ∧n) is not dominated from below by an integrable random variable.

7.4 Martingales

In gambler's speech gambling systems are called martingales. This might be the reason for the following mathematical terminology.

Let (Xn)n≥0 be a sequence of integrable random variables adapted to a filtration (Fn)n≥0 with F0 = {∅, Ω}.
We start with a fundamental identity.

7.25 Definition. Let (Fk) be a filtration and let (Yk) be an adapted sequence of integrable random variables.
(1) The sequence (Yk) is called a martingale if E(Yσ) = E(Yτ) for all bounded stopping times σ ≤ τ.
(2) The sequence (Yk) is called a submartingale if E(Yσ) ≤ E(Yτ) for all bounded stopping times σ ≤ τ.
(3) The sequence (Yk) is called a supermartingale if E(Yσ) ≥ E(Yτ) for all bounded stopping times σ ≤ τ.

7.26 Lemma. If σ ≤ τ ≤ n are bounded stopping times then for any A ∈ Fσ

    ∫_A (Xτ − Xσ) dP = Σ_{j=1}^n ∫_(A∩(σ<j≤τ)) (E(Xj | Fj−1) − Xj−1) dP.

Proof: Let τ ≤ n. It is obvious that

    Xτ = X0 + Σ_{j≤τ} (Xj − Xj−1) = X0 + Σ_{j=1}^n 1(τ≥j) (Xj − Xj−1).

This gives

    ∫_A (Xτ − X0) dP = Σ_{j=1}^n ∫_(A∩(τ≥j)) (Xj − Xj−1) dP.

We may replace Xj by E(Xj | Fj−1) since A ∩ (τ ≥ j) ∈ Fj−1. 2

7.27 Problem. Fill in the details of the proof of 7.26.

We defined martingales by the property of τ → E(Xτ) being constant for bounded stopping times τ. This property can be rewritten in terms of conditional expectations. Equation (8) is the common definition of a martingale.

7.28 Theorem. The sequence (Xn)n≥0 is a martingale iff

    E(Xj | Fj−1) = Xj−1, j ≥ 1.    (8)

Proof: The "if"-part of the assertion is clear from 7.26. Let F ∈ Fj−1 and define

    τ := j − 1 whenever ω ∈ F,    τ := j whenever ω ∉ F.

Then τ is a stopping time. From E(Xj) = E(Xτ) the "only if"-part follows. 2
7.29 Problem. Extend 7.28 to submartingales and supermartingales.

Equation (8) extends easily to stopping times after having defined the past of a stopping time.

7.30 Theorem. (Optional stopping for martingales) Let (Xn)n≥0 be a martingale. Then for any pair σ ≤ τ of bounded stopping times

    E(Xτ | Fσ) = Xσ.

Proof: Applying 7.26 to A ∈ Fσ proves the assertion. 2

7.31 Problem. Let τ be a stopping time. Show that Fτ := {A ∈ F : A ∩ (τ ≤ j) ∈ Fj, j ≥ 0} is a σ-field (the past of the stopping time τ).

7.32 Problem. Show that a predictable martingale is constant.

We conclude this section by the elementary version of the celebrated Doob-Meyer decomposition.

7.33 Theorem. Each adapted sequence (Xn)n≥0 of integrable random variables can be written as Xn = Mn + An, n ≥ 0, where (Mn) is a martingale and (An) is a predictable sequence, i.e. An is Fn−1-measurable for every n ≥ 1. The decomposition is unique up to constants. The sequence (Xn)n≥0 is a submartingale iff (An) is increasing, it is a supermartingale iff (An) is decreasing, and it is a martingale iff (An) is constant.

Proof: Let

    Mn = X0 + Σ_{j=1}^n (Xj − E(Xj | Fj−1))  and  An = Σ_{j=1}^n (E(Xj | Fj−1) − Xj−1).

This proves existence of the decomposition. Uniqueness follows from the fact that a predictable martingale is constant. The rest is obvious. 2

7.5 Convergence

UNDER CONSTRUCTION
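For the symmetric random walk the decomposition of the submartingale Xn = Sn^2 can be computed explicitly: E(Sn^2 | Fn−1) = S_{n−1}^2 + 1, so An = n and Mn = Sn^2 − n is a martingale. A one-step check (the helper name is mine):

```python
def cond_exp_increment(s_prev):
    """E(S_n^2 | F_{n-1}) - S_{n-1}^2 for a symmetric +/-1 step from s_prev."""
    return 0.5 * ((s_prev + 1)**2 + (s_prev - 1)**2) - s_prev**2

for s in range(-6, 7):
    # predictable part grows by 1 each step, so A_n = n
    assert cond_exp_increment(s) == 1.0
    # hence M_n = S_n^2 - n has conditional increment zero (martingale)
    assert cond_exp_increment(s) - 1.0 == 0.0
```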
Chapter 8

The Wiener process

8.1 Basic concepts

In this section we introduce the Wiener process or Brownian motion process. A stochastic process (random process) on a probability space (Ω, F, P) is a family (Xt)t≥0 of random variables. The parameter t is usually interpreted as time. Thus, the intuitive notion of a stochastic process is that of a random system whose state at time t is Xt.

There are some notions related to a stochastic process (Xt)t≥0 which are important from the very beginning: the starting value X0, the increments Xt − Xs for s < t, and the paths t → Xt(ω), ω ∈ Ω. The Wiener process is defined in terms of these notions.

8.1 Definition. A stochastic process (Wt)t≥0 is called a Wiener process if
(1) the starting value is W0 = 0,
(2) the increments Wt − Ws are N(0, t − s)-distributed and mutually independent for nonoverlapping intervals,
(3) the paths are continuous for P-almost all ω ∈ Ω.

8.2 Remark. As it is the case with every probability model one has to ask whether there exist a probability space (Ω, F, P) and a family of random variables (Wt) satisfying the properties of Definition 8.1. The mathematical construction of such models is a complicated matter and is one of the great achievements of probability theory in the first half of the 20th century. Accepting the existence of the Wiener process as a valid mathematical model we may forget the details of the construction (there are several of them) and start with the axioms stated in 8.1. (Further reading: Karatzas-Shreve [15], section 2.)

8.3 Discussion. (Wiener process as random walk) Later we will show:
Any process (Xt)t≥0 with continuous paths and independent increments such that E(Xt − Xs) = 0 and V(Xt − Xs) = t − s is necessarily a Wiener process.

This means that there are three structural properties which are essential for the concept of a Wiener process:
(1) The process has independent increments.
(2) The expectation of the increments is zero.
(3) The variance of the increments is proportional to the length of the time interval.

Let us motivate these properties at hand of specific discrete time models. Let X1, X2, . . . be independent and such that P(Xi = 1) = P(Xi = −1) = 1/2. Then Sn = X1 + X2 + . . . + Xn, n = 1, 2, . . ., is called a symmetric random walk. It is easy to see that the increments Sn − Sm are independent. Moreover, we have

    E(Xi) = 0 ⇒ E(Sn − Sm) = 0  and  V(Xi) = 1 ⇒ V(Sn − Sm) = n − m.

Thus, the Wiener process can be interpreted as a continuous time version of a symmetric random walk.

8.4 Problem. (easy) Let (Wt)t≥0 be a Wiener process. Show that Xt := −Wt, t ≥ 0, is a Wiener process, too.

8.5 Problem. (intermediate) Show that Wt/t →P 0 as t → ∞.

8.6 Definition. The past of a process (Xt)t≥0 at time t is the σ-field of events FtX = σ(Xs : s ≤ t) generated by the variables Xs of the process prior to t. The internal history of (Xt)t≥0 is the family (FtX)t≥0 of pasts of the process. It is obvious that t1 < t2 ⇒ Ft1X ⊆ Ft2X. If X0 is a constant then F0 = {∅, Ω}.

The intuitive idea behind the concept of past is the following: FtX consists of all events which are observable if one observes the process up to time t. It represents the information about the process available at time t.

8.7 Theorem. The increments Wt − Ws of a Wiener process (Wt)t≥0 are independent of the past FsW, s ≤ t.

Proof: Let s1 ≤ s2 ≤ . . . ≤ sn ≤ s < t. Then the random variables Ws1, Ws2 − Ws1, . . . , Wsn − Wsn−1, Wt − Ws are independent. It follows that even the random variables Ws1, Ws2, . . . , Wsn are independent of Wt − Ws. Since this is valid for any choice of time points si ≤ s the independence assertion carries over to the whole past FsW. 2
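The random-walk interpretation can be illustrated by simulation: rescaling a symmetric random walk by 1/√n produces a value with mean 0 and variance close to the elapsed time. A rough Monte Carlo sketch (sample sizes, seed and tolerances are arbitrary choices of mine):

```python
import random

random.seed(0)
n = 400        # steps per unit of time
reps = 3000
samples = []
for _ in range(reps):
    # S_n / sqrt(n) approximates W_1
    s = sum(random.choice((1, -1)) for _ in range(n))
    samples.append(s / n**0.5)

mean = sum(samples) / reps
var = sum(x * x for x in samples) / reps - mean**2
assert abs(mean) < 0.06        # E(W_1) = 0
assert abs(var - 1.0) < 0.12   # V(W_1) = 1
```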
8.2 Quadratic variation

For beginners the most surprising properties of a Wiener process are its path properties. The paths of a Wiener process are continuous (which is part of our definition). In this respect the paths seem to be not complicated since they have no jumps or other singularities. It will turn out, however, that in spite of their continuity, the paths of a Wiener process are of a very peculiar nature.

8.8 Remark. Actually, it can be shown that the paths are nowhere differentiable. (This remark is based on section 10. Further reading: Karatzas-Shreve [15], section 2.)

Recall that for a function f : [0, ∞) → R with bounded variation on compacts we have

    lim Σ_{i=1}^n |f(t_i^n) − f(t_{i−1}^n)| = V_0^t(f) < ∞,  t > 0,

for each Riemannian sequence of subdivisions 0 = t_0^n < t_1^n < . . . < t_n^n = t. Recall that all smooth (continuously differentiable) functions are of bounded variation on compacts. For such functions it follows that their quadratic variation, defined as

    lim Σ_{i=1}^n (f(t_i^n) − f(t_{i−1}^n))^2

along Riemannian sequences, is necessarily zero for every t > 0.

8.9 Problem. (easy for mathematicians) Show that the quadratic variation of a continuous BV-function is zero on every compact interval.

We will now show that (almost) all paths of a Wiener process have nonvanishing quadratic variation. This implies that the paths cannot be smooth.

8.10 Theorem. Let (Wt)t≥0 be a Wiener process. For every t > 0 and every Riemannian sequence of subdivisions 0 = t_0^n < t_1^n < . . . < t_n^n = t we have

    Σ_{i=1}^n (W(t_i^n) − W(t_{i−1}^n))^2 →P t.

8.11 Problem. (intermediate) Prove 8.10. Hint: Let Qn := Σ_{i=1}^n (W(t_i^n) − W(t_{i−1}^n))^2 for a particular Riemannian sequence of subdivisions. Show that E(Qn) = t and V(Qn) → 0. Then the assertion follows from Chebyshev's inequality.
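Theorem 8.10 is easy to observe numerically: summing squared increments of a simulated Brownian path over a fine subdivision gives a value close to t. A sketch with an arbitrary seed and tolerance:

```python
import random

random.seed(1)
t, n = 2.0, 50_000
dt = t / n
qv = 0.0
for _ in range(n):
    # Gaussian increment with variance dt, as in Definition 8.1 (2)
    dw = random.gauss(0.0, dt**0.5)
    qv += dw * dw
# the quadratic variation concentrates near t (sd of qv is about sqrt(2 t^2 / n))
assert abs(qv - t) < 0.1
```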
The assertion of 8.10 can be improved to P-almost sure convergence, which implies that the quadratic variation on [0, t] of almost all paths is actually t. It is remarkable that the quadratic variation of the Wiener process is a deterministic function of a very simple nature.

8.3 Martingales

We start with some general definitions.

8.12 Definition. Any increasing family of σ-fields (Ft)t≥0 is called a filtration.

8.13 Definition. A process (Yt)t≥0 is adapted to the filtration (Ft)t≥0 if Yt is Ft-measurable for every t ≥ 0.

The internal history (FtX)t≥0 of a process (Xt)t≥0 is a filtration and the process (Xt)t≥0 is adapted to its internal history. But also Yt := φ(Xt) for any measurable function φ is adapted to the internal history of (Xt)t≥0. Adaption simply means that the past of the process (Yt)t≥0 at time t is contained in Ft: having the information contained in Ft we know everything about the process up to time t.

A martingale relative to the filtration (Ft)t≥0 is an adapted and integrable stochastic process (Xt)t≥0 such that

    E(Xt | Fs) = Xs whenever s < t.

It is a square integrable martingale if E(Xt^2) < ∞, t ≥ 0.

8.14 Problem. (easy) Show that the martingale property remains valid if the filtration is replaced by another filtration consisting of smaller σ-fields, provided that the process is still adapted.

8.15 Theorem. A Wiener process is a square integrable martingale with respect to its internal history.

Proof: Since Wt − Ws is independent of Fs it follows that E(Wt − Ws | Fs) = E(Wt − Ws) = 0. Hence E(Wt | Fs) = E(Ws | Fs) = Ws. 2

A nonlinear function of a martingale typically is not a martingale. But the next theorem is a first special case of a very general fact: it is sometimes possible to correct a process by a bounded variation process in such a way that the result is a martingale. Let us consider some important martingales related to the Wiener process.
8.16 Theorem. The process Wt^2 − t is a square integrable martingale with respect to the internal history of the driving Wiener process (Wt)t≥0.

Proof: Note that

    Wt^2 − Ws^2 = (Wt − Ws)^2 + 2Ws(Wt − Ws).

This gives

    E(Wt^2 − Ws^2 | Fs) = E((Wt − Ws)^2 | Fs) + 2E(Ws(Wt − Ws) | Fs) = t − s. 2

The assertion of 8.16 can be written in the following way: For Xt = Wt^2 there is a decomposition Xt = Mt + At where (Mt)t≥0 is a martingale and (At)t≥0 is a bounded variation process. Such a decomposition is a mathematical form of the idea that a process Xt is the sum of a (rapidly varying) noise component and a (slowly varying) trend component.

8.17 Problem. (easy) Let (Xt)t≥0 be any process with independent increments such that E(Xt) = 0 and E(Xt^2) = t. Show that assertions 8.15 and 8.16 are valid for (Xt)t≥0.

8.18 Theorem. The process exp(aWt − a^2 t/2) is a martingale with respect to the internal history of the driving Wiener process (Wt)t≥0.

Proof: Use e^(aWt) = e^(a(Wt−Ws)) e^(aWs) to obtain

    E(e^(aWt) | Fs) = E(e^(a(Wt−Ws))) e^(aWs) = e^(a^2 (t−s)/2) e^(aWs). 2

The process E(W)t := exp(Wt − t/2) is called the exponential martingale of (Wt)t≥0.

UNDER CONSTRUCTION: MAXIMAL INEQUALITY

8.19 Problem. (intermediate) Let (Ω, F, P) be a probability space and let (Ft)t≥0 be a filtration. Let Q ∼ P be an equivalent probability measure. Denote Pt := P|Ft and Qt := Q|Ft.
(a) Show that Pt ∼ Qt for every t ≥ 0.
(b) Show that dQt/dPt = E(dQ/dP | Ft) for every t ≥ 0.
(c) Show that the process dQt/dPt is a positive martingale such that dQt/dPt > 0 P-a.e.
(d) Prove the so-called "Bayes formula":

    EQ(X | Ft) = EP(X dQ/dP | Ft) / EP(dQ/dP | Ft)

whenever X ≥ 0 or X ∈ L1(Q).

8.20 Problem. (intermediate) Let (Wt)t≥0 be a Wiener process on a probability space (Ω, F, P) and (Ft)t≥0 its internal history. Define

    dQt/dPt := e^(aWt − a^2 t/2), t ≥ 0.

(a) Show that there is a uniquely determined probability measure Q|F∞ such that Q|Ft = Qt, t ≥ 0.
(b) Show that Q|F∞ is equivalent to P|F∞.
(c) Show that W̃t := Wt − at is a Wiener process under Q. Hint: Prove that for s < t

    EQ(e^(λ(W̃t − W̃s)) | Fs) = e^(λ^2 (t−s)/2).

8.4 Stopping times

Let (Xt)t≥0 be a right continuous adapted process such that X0 = 0 and for some a > 0 let

    τ = inf{t ≥ 0 : Xt ≥ a}.

The random variable τ is called a first passage time: it is the time when the process hits the level a for the first time. By right continuity of the paths we have

    τ ≤ t ⇔ max_{s≤t} Xs ≥ a.    (9)

Thus, we have (τ ≤ t) ∈ Ft for all t ≥ 0.

8.21 Problem. (easy for mathematicians) Prove (9).

8.22 Definition. A random variable τ : Ω → [0, ∞] is called a stopping time if (τ ≤ t) ∈ Ft for all t ≥ 0.

8.23 Problem. (intermediate for mathematicians) Show that every bounded stopping time τ is the limit of a decreasing sequence of bounded stopping times each of which has only finitely many values.
Hint: Let T = max τ. Define τn = k/2^n whenever (k − 1)/2^n < τ ≤ k/2^n, k = 0, 1, . . . , T 2^n.

8.24 Theorem. (Optional stopping theorem) Let (Mt)t≥0 be a martingale with right-continuous paths and let τ be a bounded stopping time. Then E(Mτ) = E(M0).

Proof: Assume that τ ≤ T. Let τn ↓ τ where τn are bounded stopping times with finitely many values. Then it follows from the discrete version of the optional stopping theorem that E(Mτn) = E(M0). Clearly we have Mτn → Mτ. Since E(Mτn) = E(MT | Fτn) the sequence (Mτn) is uniformly integrable and the assertion follows. 2

8.25 Problem. (intermediate) Let (Xt)t≥0 be an integrable process adapted to a filtration (Ft)t≥0. Show that (a) implies (b):
(a) E(Xσ) = E(X0) for all bounded stopping times σ.
(b) (Xt)t≥0 is a martingale.
Hint: For s < t and F ∈ Fs define τ := s whenever ω ∈ F and τ := t whenever ω ∉ F. Then τ is a stopping time. From E(Xt) = E(Xτ) the assertion follows.

First passage times of the Wiener process

As an application of the optional stopping theorem we derive the distribution of first passage times of the Wiener process.

8.26 Theorem. Let (Wt)t≥0 be a Wiener process and for a > 0 and b ∈ R define

    τa,b := inf{t : Wt ≥ a + bt}.

Then we have

    E(e^(−λτa,b) 1(τa,b<∞)) = e^(−a(b + √(b^2 + 2λ))),  λ ≥ 0.

Proof: Applying the optional stopping theorem to the exponential martingale of the Wiener process we get

    E(e^(θWτ − θ^2 τ/2)) = 1

for every θ ∈ R and every bounded stopping time τ. Therefore this equation is true for τn := τa,b ∧ n for every n ∈ N. We note that (use 8.5)

    e^(θWτn − θ^2 τn/2) → e^(θWτa,b − θ^2 τa,b/2) 1(τa,b<∞).
Applying the dominated convergence theorem it follows (at least for sufficiently large θ) that

    E(e^(θWτa,b − θ^2 τa,b/2) 1(τa,b<∞)) = 1.

Since Wτa,b = a + bτa,b we get

    E(e^((θb − θ^2/2) τa,b) 1(τa,b<∞)) = e^(−aθ).

Putting λ := −θb + θ^2/2 proves the assertion. 2

8.27 Problem. (advanced) Fill in the details of the proof of 8.26.

8.28 Problem. (easy) In the following problems treat the cases b > 0, b = 0 and b < 0 separately.
(a) Find P(τa,b < ∞).
(b) Find E(τa,b).

8.29 Problem. (intermediate)
(a) Show that P(τ0,b = 0) = 1 for every b > 0. (Consider E(e^(−λτan,b)) for an ↓ 0.) Give a verbal interpretation of this result.
(b) Show that P(maxt Wt = ∞, mint Wt = −∞) = 1.
(c) Conclude from (a) that almost all paths of (Wt)t≥0 infinitely often cross every horizontal line.

8.30 Problem. (easy)
(a) Does the assertion of the optional sampling theorem hold for the martingale (Wt)t≥0 and τa,b?
(b) Does the assertion of the optional sampling theorem hold for the martingale Wt^2 − t and τa,b?

From 8.26 we obtain the distribution of the first passage times.

8.31 Corollary. Let τa,b be defined as in 8.26. Then

    P(τa,b ≤ t) = 1 − Φ((a + bt)/√t) + e^(−2ab) Φ((−a + bt)/√t),  t > 0.

Proof: Let G(t) := P(τa,b ≤ t) and let Fa,b(t) denote the right hand side of the asserted equation. We want to show that Fa,b(t) = G(t), t ≥ 0. For this we will apply the uniqueness of the Laplace transform. Note that 8.26 says that

    ∫_0^∞ e^(−λt) dG(t) = e^(−a(b + √(b^2 + 2λ))).
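The formula of Corollary 8.31 can be sanity-checked numerically: for b = 0 it must reduce to the reflection-principle value 2P(Wt > a), and for t → ∞ it must recover P(τa,b < ∞) from Problem 8.28. A sketch (function names are mine):

```python
from math import erf, exp, sqrt

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def F(a, b, t):
    """Right hand side of Corollary 8.31: P(tau_{a,b} <= t)."""
    return 1.0 - Phi((a + b*t) / sqrt(t)) + exp(-2*a*b) * Phi((-a + b*t) / sqrt(t))

a, t = 1.0, 4.0
# b = 0 reduces to 2 P(W_t > a)
assert abs(F(a, 0.0, t) - 2.0 * (1.0 - Phi(a / sqrt(t)))) < 1e-12
# t -> infinity recovers P(tau_{a,b} < infinity): e^{-2ab} for b > 0, 1 for b <= 0
assert abs(F(1.0, 0.5, 1e8) - exp(-1.0)) < 1e-6
assert abs(F(1.0, -0.5, 1e8) - 1.0) < 1e-6
```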
Therefore, we have to show that

    ∫_0^∞ e^(−λt) dFa,b(t) = e^(−a(b + √(b^2 + 2λ))).

This is done by the following simple calculations. First, it is shown that

    Fa,b(t) = (1/√(2π)) ∫_0^t (a/s^(3/2)) exp(−a^2/(2s) − b^2 s/2 − ab) ds.

(This is done by calculating the derivatives of both sides.) Then it follows that

    e^(ab) ∫_0^t e^(−λs) dFa,b(s) = e^(a√(b^2+2λ)) Fa,√(b^2+2λ)(t).

Putting t = ∞ the assertion follows.

8.32 Problem. (easy) Find the distribution of max_{s≤t} Ws.

8.33 Problem. (requires calculation skills) Fill in the details of the proof of 8.31.

The following problems are concerned with first passage times for two horizontal boundaries. Let c, d > 0 and define

    σc,d := inf{t : Wt ∉ (−c, d)}.

8.34 Problem. (easy)
(a) Show that σc,d is a stopping time.
(b) Show that P(σc,d < ∞) = 1.

For σc,d the application of the optional sampling theorem is straightforward since |Wt| ≤ max{c, d} for t ≤ σc,d.

8.35 Problem. (easy) Find the distribution of Wσc,d. Hint: Note that E(Wσc,d) = 0 (why?) and remember that Wσc,d has only two different values.
Solution: P(Wσc,d = d) = c/(c + d), P(Wσc,d = −c) = d/(c + d).

8.36 Problem. (easy) Find E(σc,d). Hint: Note that E(Wσc,d^2) = E(σc,d) (why?).
Solution: E(σc,d) = cd.
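The values in Problems 8.35 and 8.36 have an exact discrete counterpart: for a symmetric random walk absorbed at −c and d, the functions h(x) = (x+c)/(c+d) and g(x) = (x+c)(d−x) solve the corresponding difference equations, giving exit probability c/(c+d) and expected exit time cd at x = 0. A sketch verifying this with exact rational arithmetic (the helper name is mine):

```python
from fractions import Fraction

def exit_quantities(c, d):
    """h = P(hit d before -c | start 0), g = E(exit time | start 0)."""
    pts = range(-c, d + 1)
    h = {x: Fraction(x + c, c + d) for x in pts}
    g = {x: Fraction((x + c) * (d - x)) for x in pts}
    for x in range(-c + 1, d):
        # harmonicity of h and the Poisson equation for g
        assert h[x] == (h[x - 1] + h[x + 1]) / 2
        assert g[x] == 1 + (g[x - 1] + g[x + 1]) / 2
    assert h[-c] == 0 and h[d] == 1 and g[-c] == 0 and g[d] == 0
    return h[0], g[0]

p_hit_d, e_sigma = exit_quantities(2, 3)
assert p_hit_d == Fraction(2, 5)   # c/(c+d)
assert e_sigma == 6                # c*d
```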
Distribution of σc,d

The distribution of the stopping time σc,d is a more complicated story. It is easy to obtain the Laplace transforms; obtaining probabilistic information requires much more analytical effort. Let

    A := ∫_(Wσc,d = d) e^(−θ^2 σc,d/2) dP  and  B := ∫_(Wσc,d = −c) e^(−θ^2 σc,d/2) dP.

For reasons of symmetry we have

    A = ∫_(Wσd,c = −d) e^(−θ^2 σd,c/2) dP  and  B = ∫_(Wσd,c = c) e^(−θ^2 σd,c/2) dP.

From

    1 = E(e^(θWσc,d − θ^2 σc,d/2))  and  1 = E(e^(θWσd,c − θ^2 σd,c/2))

we obtain a system of equations for A and B leading to

    A = (e^(θc) − e^(−θc)) / (e^(θ(c+d)) − e^(−θ(c+d)))  and  B = (e^(θd) − e^(−θd)) / (e^(θ(c+d)) − e^(−θ(c+d))).

This implies

    E(e^(−λσc,d)) = (e^(−c√(2λ)) + e^(−d√(2λ))) / (1 + e^(−(c+d)√(2λ))).

Expanding this into an infinite geometric series and applying

    ∫_0^∞ e^(−λt) dFa,0(t) = e^(−a√(2λ)),  t ≥ 0,

we could obtain an infinite series expansion of the distribution of σc,d. (Further reading: Karatzas-Shreve [15], section 2.)

8.37 Discussion. Let us give an intuitive interpretation of these facts. Assume that we observe the Wiener process up to time s. Then we know the past Fs and the value Ws at time s. What about the future? How will the process behave for t > s? The future variation of the process after time s is given by Xt := Ws+t − Ws, t ≥ 0. Since the Wiener process has independent increments the process (Xt)t≥0 is independent of Fs. Moreover, it is easy to see that (Xt)t≥0 is a Wiener process. From the remarks above it follows that the future variation is that of a Wiener process which is independent of the past. The common formulation of this fact is: At every time s > 0 the Wiener process starts afresh.

The reflection principle

Let (Wt)t≥0 be a Wiener process and let (Ft)t≥0 be its internal history.
There is a simple consequence of the property of starting afresh at every time s.

8.38 Problem. (easy) Show that the process Xt := Ws+t − Ws, t ≥ 0, is a Wiener process for every s ≥ 0.

Note that

    Wt = Wt whenever t ≤ s,    Wt = Ws + (Wt − Ws) whenever t > s.

Define the corresponding process reflected at time s by

    W̃t = Wt whenever t ≤ s,    W̃t = Ws − (Wt − Ws) whenever t > s.

Then it is clear that (Wt)t≥0 and (W̃t)t≥0 have the same distribution. This assertion looks rather harmless and self-evident. However, it becomes a powerful tool when it is extended to stopping times.

8.39 Theorem. (Reflection principle) Let τ be any stopping time and define

    W̃t = Wt whenever t ≤ τ,    W̃t = Wτ − (Wt − Wτ) whenever t > τ.

Then the distributions of (Wt)t≥0 and (W̃t)t≥0 are equal.

Proof: Let us show that the single random variables Wt and W̃t have equal distributions. We have to show that for any bounded continuous function f we have E(f(Wt)) = E(f(W̃t)). For obvious reasons we need only show

    ∫_(τ<t) f(Wt) dP = ∫_(τ<t) f(W̃t) dP,

which is equivalent to

    ∫_(τ<t) f(Wτ + (Wt − Wτ)) dP = ∫_(τ<t) f(Wτ − (Wt − Wτ)) dP.

The last equation is obviously true for stopping times with finitely many values. The common approximation argument then proves the assertion. Equality of the finite dimensional marginal distributions is shown in a similar manner. 2

8.40 Problem. (advanced) To get an idea of how the full proof of the reflection principle works show E(f(Wt1, Wt2)) = E(f(W̃t1, W̃t2)) for t1 < t2 and bounded continuous f. Hint: Distinguish between τ < t1, t1 ≤ τ < t2 and t2 ≤ τ.
The reflection principle offers an easy way for obtaining information on first passage times. Let Mt := max_{s≤t} Ws.

8.41 Theorem. Let (Wt)t≥0 be a Wiener process. Then

    P(Mt ≥ y, Wt < y − x) = P(Wt > y + x),  y > 0, x ≥ 0.

Proof: Let τ := inf{t : Wt ≥ y} and τ̃ := inf{t : W̃t ≥ y}. Then

    P(Mt ≥ y, Wt < y − x) = P(τ ≤ t, Wt < y − x) = P(τ̃ ≤ t, W̃t < y − x)
                          = P(τ ≤ t, Wt > y + x) = P(Wt > y + x). 2

8.42 Problem. (easy) Use 8.41 to find the distribution of Mt.

8.43 Problem. (intermediate) Find P(Wt < z, Mt < y) when z < y, y > 0, t > 0.

8.5 Augmentation

For technical reasons which will become clear later the internal history of the Wiener process is slightly too small. It is convenient to increase the σ-fields of the internal history in a way that does not destroy the basic properties of the underlying process. This procedure is called augmentation.

8.44 Definition. Let Ft+ := ∩_{s>t} Fs and define

    F̄t := {F ∈ F∞ : P(F △ G) = 0 for some G ∈ Ft+}.

Then (F̄t)t≥0 is the augmented filtration.

8.45 Problem. (intermediate) Show that the augmented filtration is really a filtration.

8.46 Corollary. Let (Wt)t≥0 be a Wiener process. Then the increments Wt − Ws are independent of F̄sW.

Proof: (Outline) It is easy to see that

    E(e^(a(Wt − Ws)) | FWs+) = e^(a^2 (t−s)/2).
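The reflection argument behind Theorem 8.41 has an exact discrete analogue for the symmetric random walk, which can be checked by brute-force enumeration: P(max_{k≤n} Sk ≥ y) = P(Sn ≥ y) + P(Sn > y) for integer y > 0. A small sketch:

```python
from itertools import product

n, y = 12, 3
lhs = rhs = 0
for path in product((1, -1), repeat=n):   # all 2^n equally likely paths
    s = m = 0
    for x in path:
        s += x
        m = max(m, s)
    lhs += (m >= y)
    rhs += (s >= y) + (s > y)
# counting paths is enough since each path has probability 2^(-n)
assert lhs == rhs
```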
This implies

    E(1F e^(a(Wt − Ws))) = P(F) E(e^(a(Wt − Ws)))
for every F ∈ FWs+. From the totality of exponentials property (see the proof of 13.11) it follows that Wt − Ws is independent of FWs+. It is clear that this carries over to F̄sW. 2
8.47 Problem. (intermediate for mathematicians) Fill in the details of the proof of 8.46.

8.48 Theorem. Let (F_t)_{t≥0} be a filtration. Then the augmented filtration is right-continuous, i.e.
    F̄_t = ∩_{s>t} F̄_s

Proof: It is clear that ⊆ holds. In order to prove ⊇ let F ∈ ∩_{s>t} F̄_s. We have to show that F ∈ F̄_t. For every n ∈ N there is G_n ∈ F_{t+1/n} such that P(F △ G_n) = 0. Define
    G := ∩_{m=1}^∞ ∪_{n=m}^∞ G_n = ∩_{m=K}^∞ ∪_{n=m}^∞ G_n ∈ F_{t+1/K} for all K ∈ N.
Then G ∈ F_{t+} and P(G △ F) = 0. □
One says that a filtration satisfies the "usual conditions" if it is right-continuous and contains all negligible sets of F_∞. The internal history of the Wiener process does not satisfy the usual conditions. However, every augmented filtration satisfies the usual conditions. Thus, 8.46 and 8.48 show that every Wiener process has independent increments with respect to a filtration that satisfies the usual conditions. When we are dealing with a Wiener process we may therefore suppose that the underlying filtration satisfies the usual conditions.

8.49 Problem. (easy) Show that the assertions of 8.15, 8.16 and 8.18 are valid for the augmented internal history of the Wiener process.

Let us illustrate the convenience of filtrations satisfying the usual conditions by a further result. For some results on stochastic integrals it will be an important point that martingales are cadlag. A general martingale need not be cadlag. We will show that a martingale has a cadlag modification if the filtration satisfies the usual conditions.

8.50 Theorem. Let (X_t)_{t≥0} be a martingale w.r.t. a filtration satisfying the usual conditions. Then there is a cadlag modification of (X_t)_{t≥0}.

Proof: (Outline. Further reading: Karatzas-Shreve, [15], Chapter 1, Theorem 3.13.)
We begin with path properties which are readily at hand: There is a set A ∈ F_∞ satisfying P(A) = 1 and such that the restricted process (X_t)_{t∈Q} has paths with right and left limits for every ω ∈ A. This is a consequence of the upcrossings inequality by Doob. See Karatzas-Shreve, [15], Chapter 1, Proposition 3.14, (i).

It is now our goal to modify the martingale in such a way that it becomes cadlag. The idea is to define
    X_{t+} := lim_{s↓t, s∈Q} X_s,  t ≥ 0,
on A and X_{t+} := 0 elsewhere. It is easy to see that the paths of (X_{t+})_{t≥0} are cadlag. Since (F_t)_{t≥0} satisfies the usual conditions it follows that (X_{t+})_{t≥0} is adapted.

We have to show that (X_{t+})_{t≥0} is a modification of (X_t)_{t≥0}, i.e. X_t = X_{t+} P-a.e. for all t ≥ 0. Let s_n ↓ t, (s_n) ⊆ Q. Then X_{s_n} = E(X_{s_1} | F_{s_n}) is uniformly integrable which implies X_{s_n} → X_{t+} in L¹. From X_t = E(X_{s_n} | F_t) we obtain X_t = E(X_{t+} | F_t) = X_{t+} P-a.e. □
8.6 More on stopping times
The interplay between stopping times and adapted processes is at the core of stochastic analysis. In this section we collect a good deal of information for later reference. We will state most of the assertions as exercises, with hints where necessary. Throughout the section we assume tacitly that the filtration satisfies the usual conditions. Further reading: Karatzas-Shreve, [15], Chapter 1, Section 1.2.

Let τ be a stopping time. The intuitive meaning of (τ ≤ t) ∈ F_t is as follows: at every time t it can be decided whether τ ≤ t or not.

8.51 Problem. (intermediate) Show that τ is a stopping time iff (τ < t) ∈ F_t for every t ≥ 0.

8.52 Problem. (easy) Let σ, τ and τ_n be stopping times.
(a) Then σ ∩ τ, σ ∪ τ and σ + τ are stopping times.
(b) τ + α for α ≥ 0 and λτ for λ ≥ 1 are stopping times.
(c) sup_n τ_n and inf_n τ_n are stopping times.

Let (X_t)_{t≥0} be a process adapted to a filtration (F_t)_{t≥0} and let A ⊆ R. Define
    τ_A = inf{t : X_t ∈ A}
Then τ_A is called the hitting time of the set A.

8.53 Remark. The question for which sets A a hitting time τ_A is a stopping time is completely solved. The solution is as follows.
We may assume that P | F_∞ is complete, i.e. that all subsets of negligible sets are added to F_∞. The whole theory developed so far is not affected by such a completion. We could have assumed from the beginning that our probability space is complete. The reason why we did not mention this issue is simple: we did not need completeness so far. However, the most general solution of the hitting time problem needs completeness. The following is true: If P | F_∞ is complete and if the filtration satisfies the usual conditions then the hitting time of every Borel set is a stopping time. For further comments see Jacod-Shiryaev, [14], Chapter I, 1.27 ff.

For particular cases the stopping time property of hitting times is easy to prove.

8.54 Theorem. Assume that (X_t)_{t≥0} has right-continuous paths and is adapted to a filtration which satisfies the usual conditions.
(a) Then τ_A is a stopping time for every open set A.
(b) If (X_t)_{t≥0} has continuous paths then τ_A is a stopping time for every closed set A.

Proof: (a) Note that
    τ_A < t ⇔ X_s ∈ A for some s < t
Since A is open and (X_t)_{t≥0} has right-continuous paths it follows that
    τ_A < t ⇔ X_s ∈ A for some s < t, s ∈ Q
(b) Let A be closed and let (A_n) be open neighbourhoods of A such that Ā_n ↓ A. Define τ := lim_{n→∞} τ_{A_n}, which exists since τ_{A_n} ↑. We will show that τ = τ_A. Since τ_{A_n} ≤ τ_A we have τ ≤ τ_A. By continuity of paths we have X_{τ_{A_n}} → X_τ whenever τ < ∞. Since X_{τ_{A_n}} ∈ Ā_n it follows that X_τ ∈ A whenever τ < ∞. This implies τ_A ≤ τ. □

We need a notion of the past of a stopping time.

8.55 Problem. (intermediate) A stochastic interval is an interval whose boundaries are stopping times.
(a) Show that the indicators of stochastic intervals are adapted processes. Hint: Consider 1_{(τ,∞)} and 1_{[τ,∞)}.
(b) Let τ be a stopping time and let F ⊆ Ω. Show that the process 1_F 1_{[0,τ)} is adapted iff F ∩ (τ ≤ t) ∈ F_t for all t ≥ 0.
(c) Let F_τ := {F : F ∩ (τ ≤ t) ∈ F_t, t ≥ 0}. Show that F_τ is a σ-field.

8.56 Definition. Let τ be a stopping time.
The σ-field F_τ is called the past of τ. The intuitive meaning of the past of a stopping time is as follows: An event F is in the past of τ if at every time t the occurrence of F can be decided provided that τ ≤ t.
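The defining property of a stopping time — that (τ ≤ t) can be decided from the information available at time t — and the notion of a hitting time can be illustrated in discrete time. A small self-contained sketch (the helper `hitting_time` and all parameters are our own, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

def hitting_time(path, level):
    """First index n with path[n] >= level; len(path) if the level is never hit."""
    idx = np.nonzero(np.asarray(path) >= level)[0]
    return int(idx[0]) if idx.size else len(path)

# A simple random walk plays the role of the adapted process (X_t).
path = np.cumsum(rng.choice([-1, 1], size=200))
tau = hitting_time(path, 3)

# tau is a stopping time: the event (tau <= n) is decided by path[:n+1] alone.
for n in range(len(path)):
    assert (hitting_time(path[:n + 1], 3) <= n) == (tau <= n)

# min and max of stopping times are again stopping times (cf. 8.52(a));
# for nested levels the hitting times are automatically ordered.
sigma = hitting_time(path, 5)
print(tau, sigma, min(tau, sigma), max(tau, sigma))
```

Since an integer walk must pass the level 3 before it can reach 5, here min(τ, σ) is always τ.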
Many of the subsequent assertions can be understood intuitively if this interpretation is kept in mind.

8.57 Problem. (advanced) Let σ and τ be stopping times.
(a) If σ ≤ τ then F_σ ⊆ F_τ.
(b) F_{σ∩τ} = F_σ ∩ F_τ.
(c) The sets (σ < τ), (σ ≤ τ) and (σ = τ) are in F_σ ∩ F_τ. Hint: Start with proving (σ < τ) ∈ F_τ and (σ ≤ τ) ∈ F_τ.
(d) Show that every stopping time σ is F_σ-measurable.
(e) Let τ_n ↓ τ. Show that F_τ = ∩_{n=1}^∞ F_{τ_n}.

There is a fundamental rule for iterated conditional expectations with respect to pasts of stopping times.

8.58 Theorem. Let Z be an integrable or nonnegative random variable and let σ and τ be stopping times. Then
    E(E(Z | F_σ) | F_τ) = E(Z | F_{σ∩τ})

Proof: The proof is a bit tedious and therefore many textbooks pose it as an exercise problem (see Karatzas-Shreve, [15], Chapter 1, 2.17). Let us give more detailed hints.
We have to start with showing that
    F ∩ (σ < τ) ∈ F_{σ∩τ} and F ∩ (σ ≤ τ) ∈ F_{σ∩τ} whenever F ∈ F_σ
Note that the nontrivial part is to show membership in F_τ. The trick is to observe that on (σ ≤ τ) we have (τ ≤ t) = (τ ≤ t) ∩ (σ ≤ t).
The second step is based on the first step and consists in showing that
    1_{(σ≤τ)} E(Z | F_σ) = 1_{(σ≤τ)} E(Z | F_{σ∩τ})    (10)
Finally, we prove the assertion separately on (σ ≤ τ) and (σ ≥ τ). For case 1 we apply (10) to the inner conditional expectation. For case 2 we apply (10) to the outer conditional expectation (interchanging the roles of σ and τ). □

Measurability of stopped processes

8.59 Discussion. Let (X_t)_{t≥0} be an adapted process and σ a stopping time. A preliminary technical problem is whether the process stopped at σ, i.e. X_σ 1_{(σ<∞)}, is F_σ-measurable. Intuitively, this should be true.
It is easy to prove the assertion for right-continuous processes with the help of 8.23. This would be sufficient for the optional stopping theorem below. However, for stochastic integration we want to be sure that the assertion is also valid for left-continuous processes. This can be shown in the following way. Define
    X^n_t := n ∫_0^t X_s e^{n(s−t)} ds
Then (X^n_t)_{t≥0} are continuous adapted processes such that X^n_t → X_t provided that (X_t)_{t≥0} has left-continuous paths. Since the assertion is true for (X^n_t) it carries over to (X_t).

We arrive at the most important result on stopping times and martingales.

8.60 Theorem. (Optional stopping theorem) Let (M_t)_{t≥0} be a right-continuous martingale. If σ is a bounded stopping time and τ is any stopping time then
    E(M_σ | F_τ) = M_{σ∩τ}

Proof: The proof is based on the following auxiliary assertion: Let τ be a bounded stopping time and let M_t := E(Z | F_t) for some integrable random variable Z. Then M_τ = E(Z | F_τ).
This can be shown in the following way. Let τ be a stopping time with finitely many values t_1 < t_2 < ... < t_n. Then
    M_{t_n} − M_τ = Σ_{k=1}^n (M_{t_k} − M_{t_{k−1}}) 1_{(τ ≤ t_{k−1})}
(Prove it on (τ = t_{j−1}).) It follows that E((M_{t_n} − M_τ) 1_F) = 0 for every F ∈ F_τ. This proves the auxiliary assertion for stopping times with finitely many values. The extension to arbitrary bounded stopping times is done by 8.23.
Let T = sup σ. The assertion of the theorem follows from
    E(M_σ | F_τ) = E(E(M_T | F_σ) | F_τ) = E(M_T | F_{σ∩τ}) = M_{σ∩τ} □

8.61 Corollary. Let τ be any stopping time. If (M_t)_{t≥0} is a martingale then (M_{τ∩t})_{t≥0} is a martingale.

8.62 Problem. (easy) Prove 8.61.

We finish this section with two consequences of the optional stopping theorem which are fundamental for stochastic integration.

8.63 Corollary. Let (M_t)_{t≥0} be a martingale. Let σ ≤ τ be stopping times and let Z be F_σ-measurable and bounded. Then Z(M_{τ∩t} − M_{σ∩t}) is a martingale.
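The optional stopping theorem can be probed by simulation. A hedged sketch (our own toy setup, not from the notes): for a symmetric random-walk martingale and a bounded stopping time τ, the expectation E(M_τ) must stay at E(M_0) = 0 even though the stopping rule uses asymmetric boundaries.

```python
import numpy as np

rng = np.random.default_rng(2)

# M: a symmetric random walk, the simplest martingale, with M_0 = 0.
n_paths, t_max = 20000, 200
steps = rng.choice([-1.0, 1.0], size=(n_paths, t_max))
M = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

# A bounded stopping time: exit time of the interval (-5, 10), capped at t_max.
exited = (M <= -5) | (M >= 10)
tau = np.where(exited.any(axis=1), np.argmax(exited, axis=1), t_max)
M_tau = M[np.arange(n_paths), tau]

# Optional stopping: E(M_tau) = E(M_0) = 0 despite the asymmetric boundaries.
print(round(M_tau.mean(), 3))
```

The printed mean is zero up to Monte Carlo error; a strategy "stop when ahead by 10 or behind by 5" does not create value out of a fair game.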
[UNDER CONSTRUCTION: EXTENSION TO SUB- AND SUPERMARTINGALES]

8.64 Problem. (intermediate) Prove 8.63.

8.7 The Markov property

We explain and discuss the Markov property at hand of the Wiener process. When we calculate conditional expectations given the past F_s of a stochastic process (X_t)_{t≥0} then from the general point of view the conditional expectations E(X_t | F_s) are F_s-measurable, i.e. they depend on any X_u, u ≤ s. But when we were dealing with special conditional expectations given the past of a Wiener process then we got formulas of the type
    E(W_t | F_s) = W_s,  E(W_t² | F_s) = W_s² + (t − s),  E(e^{aW_t} | F_s) = e^{aW_s + a²(t−s)/2}
These conditional expectations do not use the whole information available in F_s but only the value W_s of the Wiener process at time s. Theorem 8.65 is the simplest and basic formulation of the Markov property. It is, however, illuminating to discuss more sophisticated versions of the Markov property. We need some preliminaries.

8.65 Theorem. Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal history. Then for every P-integrable function Z which is measurable with respect to the future σ(W_u : u ≥ s) we have E(Z | F_s) = φ(W_s) where φ is some measurable function.

Proof: For the proof we only have to note that the system of functions
    e^{a_1 W_{s+h_1} + a_2 W_{s+h_2} + ··· + a_n W_{s+h_n}},  h_i ≥ 0, n ∈ N,
is total in L²(σ(W_u : u ≥ s)). □

8.66 Problem. (intermediate) Under the assumptions of 8.65 show that E(Z | F_s) = E(Z | W_s).

Redundant conditioning

8.67 Remark. We have to be aware of an important property of conditional expectations. Let X be A-measurable and let Y be P-integrable and independent of A. Then we know that
    E(XY | A) = X E(Y | A) = X E(Y)
The conditional expectation depends on A only through X. This can be understood intuitively in the following way: Since X is A-measurable the information in A gives the whole information on X. The rest (i.e. Y) is independent of A. Note that the equation can be written as follows:
    E(XY | A) = φ ∘ X where φ(ξ) = E(ξY)
This view can be extended to much more general cases:
    E(f(X, Y) | A) = φ ∘ X where φ(ξ) = E(f(ξ, Y))
(provided that f is sufficiently integrable).

Let us calculate E(f(W_{s+t}) | F_s) where f is bounded and measurable. We have
    E(f(W_{s+t}) | F_s) = E(f(W_s + (W_{s+t} − W_s)) | F_s)
Since W_s is F_s-measurable and W_{s+t} − W_s is independent of F_s we have
    E(f(W_{s+t}) | F_s) = φ ∘ W_s where φ(ξ) = E(f(ξ + (W_{s+t} − W_s)))    (11)
Roughly speaking, conditional expectations simply are expectations depending on a parameter slot where the present value of the process has to be plugged in.

8.68 Theorem. Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal history. Then the conditional distribution of (W_{s+t})_{t≥0} given F_s is the same as the distribution of a process ξ + W̄_t where ξ = W_s and (W̄_t)_{t≥0} is any (other) Wiener process.

Proof: Extend (11) to functions of several variables. □

Theorem 8.68 contains that formulation which is known as the ordinary Markov property of the Wiener process. It says that at every time point s the Wiener process starts afresh at the state ξ = W_s as a new Wiener process, forgetting everything that happened before time s.

It is a remarkable fact with far reaching consequences that the Markov property still holds if time s is replaced by a stopping time. The essential preliminary step is the following.

8.69 Theorem. Let τ be any stopping time and define Q(F) = P(F | τ < ∞), F ∈ F_∞. Then the process X_t := W_{τ+t} − W_τ, t ≥ 0, is a Wiener process under Q which is independent of F_τ.

Proof: (Outline) Let us show that
    ∫_F f(W_{τ+t} − W_τ) dP = P(F) E(f(W_t))
when F ⊆ (τ < ∞), F ∈ F_τ and f is any bounded continuous function. But this is certainly true for stopping times with finitely many values. The common approximation argument proves the equation. Noting that the equation holds for τ + s, s > 0, replacing τ, proves the assertion. □

8.70 Problem. (advanced) Fill in the details of the proof of 8.69.

8.71 Theorem. (Strong Markov property) Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal history. Let σ be any stopping time. Then on (σ < ∞) the conditional distribution of (W_{σ+t})_{t≥0} given F_σ is the same as the distribution of a process ξ + W̄_t where ξ = W_σ and (W̄_t)_{t≥0} is some (other) Wiener process.

Further reading: Karatzas-Shreve [15], sections 2.5 and 2.6.
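The strong Markov property can be checked numerically: after a first-passage time τ, the shifted increments W_{τ+t} − W_τ should behave like a fresh Wiener process, with no memory of τ. A minimal Monte Carlo sketch under illustrative parameters (not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(3)

# Discretized Wiener paths on [0, 3].
dt, n_steps, n_paths = 2e-3, 1500, 5000
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)

# tau: first passage of the level y = 0.5 (a stopping time); keep only
# paths where tau occurs early enough to watch the path one more time unit.
y, t = 0.5, 1.0
lag = int(t / dt)
hit = (W >= y).any(axis=1)
tau = np.argmax(W >= y, axis=1)
idx = np.where(hit & (tau + lag < n_steps))[0]

# Strong Markov property: X_t = W_{tau+t} - W_tau starts afresh, so its
# marginal at time t should be N(0, t) regardless of the random time tau.
X_t = W[idx, tau[idx] + lag] - W[idx, tau[idx]]
print(round(X_t.mean(), 2), round(X_t.var(), 2))   # ≈ 0.0 and ≈ t = 1.0
```

Note that the selection event (τ small enough) lies in F_τ, so it does not bias the post-τ increments.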
Chapter 9

The financial market picture

This chapter gives an overview over the basic concepts of pricing in financial markets. During the presentation of stochastic analysis we will refer to the concepts and ideas discussed in this chapter. Thus, the present chapter is the basic motivation for going into the troubles of stochastic analysis. However, all concepts are formulated such that they are valid also in the general case which will be tractable by stochastic analysis.

9.1 Assets and trading strategies

Let S = (S⁰, S¹, ..., S^m) be a finite set of right-continuous processes modelling the value of tradable assets of a financial market. We consider a finite time horizon [0, T], T < ∞.

A trading strategy in S is a process which determines how many units of each asset are held during a time interval. We restrict our view to trading strategies with a finite number of trading times:
    H^k_t = Σ_{j=1}^n a^k_{j−1} 1_{(σ_{j−1}, σ_j]},  k = 0, 1, ..., m,
where 0 = σ_0 ≤ σ_1 ≤ ... ≤ σ_n = T are stopping times and a^k_{j−1} is F_{σ_{j−1}}-measurable. Thus, the processes H^k are adapted and left-continuous. Left-continuity is essential because trading strategies must be predictable.

The market value of the portfolio (H^k_t)_k at time t is given by
    V_t = Σ_k H^k_t S^k_t
The process (V_t) is the wealth process corresponding to the trading strategy. The trading strategy is called self-financing if the changes in the portfolio at σ_j are financed by nothing else than the value of the portfolio:
    Σ_k H^k_{σ_{j−1}} S^k_{σ_j} = Σ_k H^k_{σ_j} S^k_{σ_j}    (12)
9.1 Theorem. A trading strategy (H^k)_k is self-financing iff
    V_t = V_0 + Σ_k Σ_j H^k_{σ_{j−1}} (S^k_{σ_j ∩ t} − S^k_{σ_{j−1} ∩ t})    (13)

Proof: It is easy to see that (13) implies (12). Conversely, if the trading strategy is self-financing then for t ∈ (σ_{j−1}, σ_j] we have
    V_t − V_{σ_{j−1}} = Σ_k (H^k_{σ_{j−1}} S^k_t − H^k_{σ_{j−2}} S^k_{σ_{j−1}})    (14)
                      = Σ_k H^k_{σ_{j−1}} (S^k_t − S^k_{σ_{j−1}})    (15)
This formula can be extended to t ∈ [0, T] by writing it as
    V_{σ_j ∩ t} − V_{σ_{j−1} ∩ t} = Σ_k H^k_{σ_{j−1}} (S^k_{σ_j ∩ t} − S^k_{σ_{j−1} ∩ t})
The assertion follows from V_t − V_0 = Σ_j (V_{σ_j ∩ t} − V_{σ_{j−1} ∩ t}). □

The self-financing property has the consequence that the wealth process can be written as a gambling system like (6). Later we will see that for continuous trading the corresponding representation is that of an integral.

9.2 Problem. Fill in the details of the proof of 9.1.

The value of assets is measured in terms of a unit of money. The unit of money can be a currency or some other positive value process. It is important to know how the properties of trading strategies behave under a change of the unit of money.

9.3 Theorem. Let N be any positive right-continuous process. A trading strategy (H^k)_k is self-financing for S = (S⁰, S¹, ..., S^m) iff it is self-financing for S̃ where S̃^k = S^k/N. If V and Ṽ are the corresponding wealth processes then
    Ṽ_t = V_t / N_t,  t ≥ 0.    (16)

Proof: The first part follows easily by dividing (12) by N_{σ_j}. For proving (16) we get from (14) that
    Ṽ_t − Ṽ_{σ_{j−1}} = V_t/N_t − V_{σ_{j−1}}/N_{σ_{j−1}} whenever σ_{j−1} < t ≤ σ_j
With t = σ_j an induction argument implies Ṽ_{σ_j} = V_{σ_j}/N_{σ_j}. This proves the assertion. □

It should be noted that formula (16) only holds for wealth processes of self-financing trading strategies.
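The bookkeeping behind the self-financing condition (12) and the numeraire invariance (16) can be traced in a tiny deterministic example. All prices and the 50/50 rebalancing rule below are our own illustration:

```python
import numpy as np

# Two assets observed at four trading dates t_0 < t_1 < t_2 < t_3.
S = np.array([[1.0, 1.0, 1.0, 1.0],       # S^0: a flat "cash" asset
              [10.0, 12.0, 11.0, 14.0]])  # S^1: a risky asset

# H[:, j] are the units held on the interval (t_j, t_{j+1}].
H = np.zeros((2, 3))
H[:, 0] = [5.0, 1.0]                       # portfolio bought at t_0
for j in (1, 2):
    # Self-financing rebalancing at t_j (condition (12)): the new holdings
    # cost exactly the current portfolio value at the prices S(t_j).
    value = H[:, j - 1] @ S[:, j]
    H[:, j] = value / (2 * S[:, j])        # move to a 50/50 split

# Wealth process: V(t_0) = H_0 . S(t_0) and V(t_j) = H_{j-1} . S(t_j).
V = np.array([H[:, 0] @ S[:, 0]] + [H[:, j - 1] @ S[:, j] for j in (1, 2, 3)])

# Change of numeraire: with prices S/N the same H is still self-financing
# and the wealth process becomes V/N, which is formula (16).
N = np.array([1.0, 1.1, 1.25, 1.4])
V_tilde = np.array([H[:, 0] @ (S[:, 0] / N[0])] +
                   [H[:, j - 1] @ (S[:, j] / N[j]) for j in (1, 2, 3)])
assert np.allclose(V_tilde, V / N)
print(np.round(V, 2))
```

The assertion holds exactly here because both sides are the same holdings valued at rescaled prices — the content of dividing (12) by N.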
9.2 Financial markets and arbitrage

A claim at time t = T is any F_T-measurable random variable C. The fundamental problem of mathematical finance is to find a reasonable price x_0 at time t = 0 for the claim C.

There are two methods to find a price x_0 for the claim. The insurance method is to define x_0 as the expectation under P of the discounted claim. The risk of this kind of pricing is controlled by selling a large number of claims: then by the LLN the average cost of that set of claims equals x_0. But this works only if the claims are independent. That might be true for insurance but not for financial markets.

The more recent and most important method of pricing is risk neutral pricing using hedge strategies. A claim C has a hedge in the market M (is attainable) if there is a wealth process V ∈ M satisfying V_T = C. In this case it looks reasonable to define x_0 = V_0. This is called risk neutral pricing. However, the risk neutral pricing method is only reasonable if the price does not allow arbitrage. This leads to the concept of a market model.

9.4 Definition. A financial market is a set M of wealth processes (for the moment right-continuous processes) with the following properties:
(1) M is a vector space, i.e. every linear combination of wealth processes in M is contained in M.
(2) Every self-financing trading strategy based on finitely many wealth processes in M leads to a wealth process in M.

9.5 Problem. Show that (2) implies (1) in 9.4.

The idea is the following: A price x_0 for a claim C is arbitrage-free if there does not exist a wealth process V ∈ M such that V_0 = x_0 and V_T ≥ C with P(V_T > C) > 0. Alas, such a concept does not work since there are very plausible market models where arbitrage can be achieved with highly risky wealth processes. Such wealth processes have to be excluded from competition.

9.6 Definition. A wealth process V ∈ M is called admissible if it is bounded from below.

9.7 Definition. A price x_0 is an arbitrage-free price for a claim C if there does not exist an admissible wealth process V ∈ M such that V_0 = x_0 and V_T ≥ C with P(V_T > C) > 0.

9.3 Martingale measures

Let M be a market model. How can we be sure that risk neutral pricing leads to arbitrage-free prices? The common answer to the question posed is the existence of a so-called martingale measure.
9.8 Definition. A generating system of M is a subset M_0 ⊆ M such that every wealth process V ∈ M is generated by a trading strategy based on finitely many elements of M_0. A martingale measure is a probability measure Q ∼ P such that all wealth processes of some generating system are martingales.

9.9 Lemma. (Admissible wealth processes are "supermartingales") If there exists a martingale measure then all admissible wealth processes V ∈ M satisfy E(V_t | F_s) ≤ V_s whenever s < t.

The proof of this lemma is postponed. The following theorem is fundamental for the modern theory of pricing in financial markets.

9.10 Theorem. Let M be a market model and Q some martingale measure. Let C be an attainable claim with a hedge whose wealth process is a Q-martingale (a martingale hedge). Then x_0 := E_Q(C) is an arbitrage-free price.

Proof: Let C be a claim and let V be the wealth process of a martingale hedge of the claim. Clearly, we have x_0 = V_0. Let V¹ ∈ M be an admissible wealth process such that V¹_0 = V_0 = x_0 and V¹_T ≥ C = V_T. Then we have
    E_Q(V_T) = V_0 and E_Q(V¹_T) ≤ V¹_0 = V_0
Since V¹_T − V_T ≥ 0 and E_Q(V¹_T − V_T) ≤ 0 it follows that V¹_T = V_T = C Q-a.e., and hence P-a.e. □

Theorem 9.10 shows that the existence of a martingale measure makes risk neutral pricing an easy exercise (at least in theory).

9.11 Definition. A market model M is arbitrage-free if x_0 = 0 is an arbitrage-free price of the claim C = 0.

9.12 Problem. Show that the existence of a martingale measure implies that the market is arbitrage-free.

Clearly, for a mathematician it is a challenge to ask whether the existence of a martingale measure is only some sufficient condition for a market to be arbitrage-free, or whether it is even necessary. This turns out to be a rather delicate question. It is not true that every arbitrage-free market admits martingale measures. However, for common financial market models martingale measures exist.
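The pricing machinery can be made concrete in the simplest possible market, a one-period binomial model. This is our own illustrative example (numbers arbitrary), not a model discussed in the notes:

```python
import numpy as np

# One-period binomial market: bank account with rate r as numeraire N,
# stock S with up/down factors u, d.  The martingale measure Q for the
# normalized market must satisfy E_Q(S_1 / N_1) = S_0.
S0, u, d, r = 100.0, 1.2, 0.9, 0.05
N1 = 1 + r
q = (N1 - d) / (u - d)                  # risk-neutral up-probability
assert 0 < q < 1                        # d < 1 + r < u: no arbitrage
assert np.isclose((q * u * S0 + (1 - q) * d * S0) / N1, S0)

# Risk-neutral price of a call with strike K, and its exact hedge.
K = 100.0
C_u, C_d = max(u * S0 - K, 0.0), max(d * S0 - K, 0.0)
x0 = (q * C_u + (1 - q) * C_d) / N1     # x0 = N0 * E_Q(C / N_T)

# Hedge: delta units of stock plus beta units of the bank account.
delta = (C_u - C_d) / ((u - d) * S0)
beta = (C_u - delta * u * S0) / N1
assert np.isclose(delta * u * S0 + beta * N1, C_u)   # replicates in "up"
assert np.isclose(delta * d * S0 + beta * N1, C_d)   # replicates in "down"
assert np.isclose(delta * S0 + beta, x0)             # hedge cost = price
print(round(x0, 4), round(delta, 4))
```

Here the hedge exists for every claim, so the arbitrage-free price is unique and equal to the risk-neutral expectation — the ideal situation the chapter aims at in continuous time.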
9.13 Problem. Show: If a martingale measure exists then all riskless assets of the market are constant.

9.14 Problem. Show: In an arbitrage-free market all riskless wealth processes are proportional.

9.4 Change of numeraire

There is an easy aspect of the existence of martingale measures which is important for practical purposes. Assume that one of the assets of the market is a riskless (for the moment: nonstochastic) positive asset, e.g. a bank account N_t = e^{rt}. If a market contains a bank account with positive interest then there cannot exist a martingale measure! This message sounds a bit disappointing. Fortunately, there is an easy solution of that problem.

If we try to find martingale measures, first we have to look for numeraires in order to normalize the market. Assume that there exists some positive tradable asset N (a numeraire). Then we may define this asset as our unit of money. For the market model this amounts to dividing all value processes by N, resulting in a so-called normalized market consisting of the wealth processes given in 9.3. The normalized market has only constant riskless assets proportional to 1 = N/N and therefore does not exclude the existence of a martingale measure.

Summing up, the risk neutral pricing machinery runs as follows:
(1) Find a numeraire N and turn to the normalized market.
(2) Find a martingale measure Q for the normalized market.
(3) If C/N_T has a martingale hedge in the normalized market then define the price to be x_0 = N_0 E_Q(C/N_T).
Chapter 10

Stochastic calculus

10.1 Elementary integration

Bounded variation

Let f : [0, T] → R be any function.

10.1 Definition. The variation of f on the interval [s, t] ⊆ [0, T] is
    V_s^t(f) := sup Σ_{j=1}^n |f(t_j) − f(t_{j−1})|
where the supremum is taken over all subdivisions s = t_0 < t_1 < ... < t_n = t and all n ∈ N. A function f is of bounded variation on [0, T] if V_0^T(f) < ∞. The set of all functions of bounded variation is denoted by BV([0, T]).

10.2 Problem. (very easy) Show that monotone functions are BV and calculate their variation.

10.3 Problem. (easy) Show that BV is a vector space.

10.4 Problem. (intermediate) Let f be differentiable on [s, t] with continuous derivative. Then f ∈ BV and
    V_s^t(f) = ∫_s^t |f′(u)| du
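The formula V_s^t(f) = ∫_s^t |f′(u)| du can be checked numerically on a fine subdivision; the example function is our own choice. For f = sin on [0, 2π] the variation is ∫_0^{2π} |cos u| du = 4:

```python
import numpy as np

# Approximate the variation of f = sin on [0, 2*pi] by a fine subdivision.
t = np.linspace(0.0, 2 * np.pi, 100001)
f = np.sin(t)
variation = np.abs(np.diff(f)).sum()   # sup over subdivisions, approximated
print(round(variation, 6))             # ≈ 4.0 = integral of |cos|
```

Since sin is monotone between its extrema, the sum of absolute increments telescopes piecewise and the fine-grid value is already essentially exact.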
10.5 Problem. (intermediate) Show that any function f ∈ BV can be written as f = g − h where g, h are increasing and satisfy V_0^t(f) = g(t) + h(t). Hint: Let g(t) := (V_0^t(f) + f(t))/2 and h(t) := (V_0^t(f) − f(t))/2.

10.6 Problem. Which BV-functions are Borel-measurable?

There are continuous functions on compact intervals which are not of bounded variation.

The Cauchy-Stieltjes integral

Let T([0, T]) be the set of all left-continuous step functions on [0, T], i.e. functions of the form
    f(t) = Σ_{k=1}^n a_k 1_{(t_{k−1}, t_k]}
where 0 = t_0 < t_1 < ... < t_n = T is some subdivision. If f ∈ T and if g is right-continuous then we define
    ∫_0^T f dg := Σ_{k=1}^n a_k (g(t_k) − g(t_{k−1}))    (17)
If g is increasing then this definition coincides with ∫ f dλ_g. In a similar way as for the abstract integral we conclude that (17) is a valid definition being linear both in f and g.

We want to extend the integral to a more general class of functions f. Of course, this could be done along the lines of general integration theory. But we will describe the older and more elementary approach by Cauchy for its formal similarity to Protter's ([19]) definition of the stochastic integral.

Recall that a sequence of subdivisions 0 = t_0 < t_1 < ... < t_n = T is called Riemannian if max |t_k − t_{k−1}| → 0.

10.7 Lemma. Let f be left-continuous with limits from the right (caglad). Then for any Riemannian sequence of subdivisions the sequence of step functions
    f_n := Σ_{k=1}^n f(t_{k−1}) 1_{(t_{k−1}, t_k]}
converges uniformly to f, i.e. ||f_n − f||_u → 0.

Proof: This is a beginner's lemma if f is continuous on [0, T]. If f has infinitely many jumps one has to work a little harder. □

Such step functions on Riemannian sequences of subdivisions can be used to extend the integral to arbitrary caglad functions due to the following inequality.
10.8 Lemma. If f ∈ T and if g is right-continuous then
    |∫_0^T f dg| ≤ ||f||_u V_0^T(g)

This lemma implies that the integral is continuous under uniform convergence of the integrands provided that g is of bounded variation. In particular, if (f_n) converges uniformly then the sequence of integrals is also convergent. This leads to the definition of the integral.

10.9 Definition. Let f be caglad on [0, T] and g ∈ BV([0, T]) be cadlag. Then the Cauchy-Stieltjes integral is
    ∫_0^T f dg := lim_{n→∞} ∫_0^T f_n dg
where (f_n) is any sequence in T converging uniformly to f.

For g(t) = t this is the notion of an integral that is taught in schools.

10.10 Problem. Show that for increasing g the CS-integral coincides with the abstract integral for λ_g.

For notational convenience we define
    ∫_0^t f dg := ∫_0^T 1_{(0,t]} f dg,  0 ≤ t ≤ T,  and  f•g : t → ∫_0^t f dg,  0 ≤ t ≤ T.

10.11 Remark. Later the notion of the stochastic integral will be defined in the following way (due to Protter [19]). The integrators g are extended to adapted cadlag processes for which the integrals of adapted caglad step processes have a continuity property similar to the CS-integral. For such processes (so-called semimartingales) the definition of the integral works for adapted caglad processes. It turns out that not only processes with BV-paths have such a continuity property but also martingales. In this way all processes can be used as integrators which can be written as a sum of a cadlag martingale and an adapted cadlag BV-process.

10.12 Theorem. Let f be caglad on [0, T] and g ∈ BV([0, T]) be cadlag. Then the following assertions are true:
(a) f•g is cadlag.
(b) f•g is of bounded variation since V(f•g) = |f| • V(g).
(c) If g is continuous then f•g is continuous.

Proof: (a) and (c) are due to the fact that cadlag and continuity are inherited under uniform convergence. The equation under (b) is a consequence of the corresponding equation for step functions. □

Differential calculus

Now we turn to the three basic rules of differential calculus. These are the prototypes for the corresponding rules of stochastic calculus. The rules are concerned with the evaluation of
    ∫_0^T f d(g•h),  ∫_0^T f d(gh),  ∫_0^T f d(g∘h)
We assume tacitly that all involved functions fulfil those conditions which are required for making the expressions well-defined.

The first rule is associativity:
    ∫_0^T f d(g•h) = ∫_0^T f g dh,  in short: d(g•h) = g dh
This is true by definition for f = 1_{(0,t]} and extends to general f by a straightforward induction argument; the proof runs over approximation by step functions.

There is an important special case. Let h(t) = t and let G be the primitive of g. Since G = g•h we obtain
    ∫_0^T f dG = ∫_0^T f(s) G′(s) ds,  in short: dG(s) = G′(s) ds

The second rule is the product rule which in integral notation is called integration by parts. Let f and g be caglad and h cadlag and BV. Then
    ∫_0^T f d(gh) = ∫_0^T f g dh + ∫_0^T f h dg,  in short: d(gh) = g dh + h dg
For f = 1_{(0,t]} this means
    g(t)h(t) = g(0)h(0) + ∫_0^t g dh + ∫_0^t h dg
This gives well-known formulas if g and h are differentiable.

For the proof let g and h be continuous and BV. Let 0 = t_0 < t_1 < ... < t_n = t be some subdivision. Define Δg(t_j) := g(t_j) − g(t_{j−1}) and Δh(t_j) similarly. Then
    g(t)h(t) = g(0)h(0) + Σ_{j=1}^n g(t_{j−1})Δh(t_j) + Σ_{j=1}^n h(t_{j−1})Δg(t_j) + Σ_{j=1}^n Δg(t_j)Δh(t_j)
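The role of the cross term Σ Δg(t_j)Δh(t_j) can be seen numerically: for smooth (hence BV) functions it vanishes as the subdivision gets finer, while for a Wiener path against itself it approaches the quadratic variation instead. The function choices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 20001)          # a fine subdivision of [0, 1]

def cross_term(g, h):
    """Sum of dg * dh over the subdivision -- the last term of the identity."""
    return float((np.diff(g) * np.diff(h)).sum())

# Smooth BV functions: the discrete product-rule identity holds exactly and
# the cross term is tiny, so g h = g(0)h(0) + int g dh + int h dg survives.
g, h = np.exp(t), np.cos(t)
lhs = g[-1] * h[-1] - g[0] * h[0]
parts = (g[:-1] * np.diff(h)).sum() + (h[:-1] * np.diff(g)).sum()
print(round(lhs - parts, 5), round(cross_term(g, h), 5))   # both ≈ 0

# A Wiener path: the cross term does NOT vanish; against itself it
# approaches the quadratic variation t = 1.
dW = rng.normal(0.0, np.sqrt(np.diff(t)))
W = np.concatenate([[0.0], np.cumsum(dW)])
print(round(cross_term(W, W), 3))          # ≈ 1.0
```

This is exactly the friction mentioned below: for Wiener paths the "negligible" last term survives in the limit.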
The assertion follows since by the BV-property the last term tends to zero for a Riemannian sequence of subdivisions. But if both functions are Wiener paths then the last term tends to the quadratic variation!

The last rule is the chain rule or substitution rule. Let g be continuously differentiable. Then
    ∫_0^T f d(g∘h) = ∫_0^T f (g′∘h) dh,  in short: d(g∘h) = (g′∘h) dh
The special case dg(t) = g′(t) dt is the chain rule of ordinary calculus. There is an elegant proof verifying the formula for arbitrary power functions g(t) = t^k by the product rule, passing to polynomials and finally applying the Weierstrass approximation theorem. A different approach is based on Taylor polynomials where the terms of higher order vanish by the BV-property of h. The notorious last term vanishes if at least one of the functions g or h is BV.

At this point we can already see some of the frictions with extending such formulas to the stochastic case. Both proofs indicate that in the stochastic case the formula will have to be changed by including quadratic variation terms. The result will be the Ito formula.

10.2 The stochastic integral

Let (Z_t)_{t≥0} be any right-continuous adapted process. It is our goal to define an integral ∫_0^T H dZ for left-continuous adapted processes (H_t)_{t≥0}. Let L_0 be the set of all left-continuous adapted processes. There are subsets of L_0 where the definition of the integral is easy.

The integral of step functions

Let E_0 be the set of processes of the form
    H_t(ω) = Σ_{j=1}^n a_{j−1}(ω) 1_{(s_{j−1}, s_j]}(t)
where 0 = s_0 < s_1 < ... < s_n = T is a subdivision and a_j is F_{s_j}-measurable for every j. It is easy to see that (H_t)_{t≥0} is left-continuous and adapted. A bit more general is the set E of processes
    H_t(ω) = Σ_{j=1}^n a_{j−1}(ω) 1_{(σ_{j−1}, σ_j]}(t)    (18)
where 0 = σ_0 < σ_1 < ... < σ_n = T is a subdivision of stopping times and a_j is F_{σ_j}-measurable for every j. Again it is obvious that the paths are left-continuous, and from 8.55(b) we know that the processes in E are adapted.

For functions in E it is obvious how to define the integral. This can be done pathwise and leads to the following definition:
    ∫_0^t H dZ := ∫_0^T 1_{(0,t]} H dZ = Σ_{j=1}^n a_{j−1} (Z_{σ_j ∩ t} − Z_{σ_{j−1} ∩ t})
if H is defined by (18).

10.13 Problem. Show that 1_{(0,t]} H = Σ_{j=1}^n a_{j−1} 1_{(σ_{j−1} ∩ t, σ_j ∩ t]}.

Since for each single path this is an ordinary Stieltjes integral we immediately have the properties:
    ∫_0^t (αH_1 + βH_2) dZ = α ∫_0^t H_1 dZ + β ∫_0^t H_2 dZ    (19)
    ∫_0^t H d(αZ_1 + βZ_2) = α ∫_0^t H dZ_1 + β ∫_0^t H dZ_2    (20)
For notational convenience denote H•Z : t → ∫_0^t H dZ.

10.14 Theorem. Let (M_t)_{t≥0} be a martingale and let H ∈ E be bounded. Then H•M is a martingale.

Proof: Apply 8.63. □

Financial markets

10.15 Discussion. Let us continue the financial market framework of chapter 9. First we observe that the representation of self-financing trading strategies in 9.1 can be written as an integral:
    V_t = V_0 + Σ_k Σ_j H^k_{σ_{j−1}} (S^k_{σ_j ∩ t} − S^k_{σ_{j−1} ∩ t}) = V_0 + Σ_k ∫_0^t H^k_s dS^k_s
Thus, the self-financing property is characterized by the equation
    V_t = Σ_k H^k_t S^k_t = Σ_k H^k_0 S^k_0 + Σ_k ∫_0^t H^k_s dS^k_s    (21)
If there is a martingale measure Q and if the trading strategy is bounded then the corresponding wealth process is a martingale under Q.
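The martingale property of H•M from 10.14 can be sketched in discrete time: integrating a bounded adapted step strategy against a random-walk martingale yields a process whose expectation stays at zero. The "sign-chasing" strategy below is our own illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, T = 50000, 20
steps = rng.choice([-1.0, 1.0], size=(n_paths, T))
M = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

# Bounded step strategy: on (t-1, t] hold H_t = sign(M_{t-1}), which is
# measurable at time t-1 -- a left-continuous adapted (predictable) integrand.
H = np.sign(M[:, :-1])
H_dot_M = np.cumsum(H * np.diff(M, axis=1), axis=1)   # (H . M)_t pathwise

# H . M is again a martingale; in particular E((H . M)_t) = 0 for all t,
# even though the strategy "chases the sign" of the walk.
print(np.round(H_dot_M.mean(axis=0)[[0, 4, 9, 19]], 3))
```

This is the discrete shadow of the financial interpretation: no bounded predictable trading rule can generate expected gains from a martingale price process.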
10.2. THE STOCHASTIC INTEGRAL 85

It follows that every claim C which can be hedged by a bounded self-financing trading strategy in E is a martingale hedge, and the pricing formula x_0 = E_Q(C) can be applied. However, many claims cannot be (exactly) hedged using only finitely many trading times. Therefore, for dealing with general claims we have to consider continuous trading strategies. The self-financing property of continuous trading strategies will be defined by a formula like (21), and for this we have to extend our notion of the integral to continuous integrands.

Semimartingales

It will turn out that a reasonable extension process of the stochastic integral can be carried out for integrator processes which are semimartingales.

10.16 Definition. A right-continuous adapted process (X_t)_{t≥0} is a semimartingale if for every sequence (H^n) of processes in E the following condition holds:

    sup_{s≤T} |H^n_s| → 0  ⇒  sup_{t≤T} |∫_0^t H^n_s dX_s| → 0 (P)

The set of all semimartingales is denoted by S.

It is therefore important to get an overview of typical processes that are semimartingales. The concept of semimartingales is only reasonable if it covers adapted cadlag processes with paths of bounded variation. From 10.8 it follows that this is actually the case. But if our assets behave like random walks, e.g. driven by a Wiener process, there is no hope to have paths of bounded variation. The following result opens the door to stochastic processes like the Wiener process. Before we turn to such examples let us study the structure of the set S of semimartingales.

10.17 Problem. (intermediate for mathematicians) Show that:
(a) The set of semimartingales is a vector space.
(b) If X ∈ S then for every stopping time τ the stopped process X^τ := (X_{τ∧t})_{t≥0} is a semimartingale.
(c) Let τ_n ↑ ∞ be a sequence of stopping times such that X^{τ_n} ∈ S for every n ∈ N. Then X ∈ S.
Hint: Note that (X_t ≠ X_t^{τ_n}) ⊆ (τ_n < t).
If the integrators are processes with paths of bounded variation then the integral extension could be done pathwise like a CS-integral.
86 CHAPTER 10. STOCHASTIC CALCULUS

10.18 Theorem. Every square integrable cadlag martingale (M_t)_{t≥0} is a semimartingale.

Proof: Let (H^n) be a sequence in E such that ||H^n||_u → 0. Since ∫_0^t H^n dM is a martingale we have by the maximal inequality

    P( sup_{s≤t} |∫_0^s H^n dM| > a ) ≤ (1/a²) E( ( ∫_0^t H^n dM )² )

For convenience let M_j := M_{σ_j∧t}. We have

    E( ( ∫_0^t H^n dM )² ) = E( ( Σ_{j=1}^n a_{j−1}(M_j − M_{j−1}) )² )
    = E( Σ_{j=1}^n a²_{j−1}(M_j − M_{j−1})² )
    ≤ ||H^n||²_u E( Σ_{j=1}^n (M_j − M_{j−1})² )
    = ||H^n||²_u E( Σ_{j=1}^n (M_j² − M²_{j−1}) )
    ≤ ||H^n||²_u E(M_t²)

Thus, the assertion is proved. □

10.19 Problem. (intermediate) Show that every cadlag martingale (M_t)_{t≥0} with continuous paths is a semimartingale.
Hint: Let τ_n = inf{t : |M_t| ≥ n}, show that M^{τ_n} is a square integrable martingale for every n ∈ N, and apply 10.17.

10.20 Problem. (easy) Show that (W_t²)_{t≥0} is a semimartingale.

Summing up, we have shown that every cadlag process which is the sum of a continuous martingale and an adapted process with paths of bounded variation is a semimartingale. In particular, we proved that (W_t)_{t≥0} is a semimartingale. Actually every cadlag martingale is a semimartingale. See Jacod-Shiryaev, [14], Chapter I.

Extending the stochastic integral

The extension of the stochastic integral from E to L0 is based on the fact that every process in L0 can be approximated by processes in E. In short, the procedure is as
10.2. THE STOCHASTIC INTEGRAL 87

follows. Let X be a semimartingale and let H ∈ L0. Consider some sequence (H^n) in E such that H^n → H and define

    ∫_0^T H dX := lim_{n→∞} ∫_0^T H^n dX   (22)

However, in order to make sure that such a definition makes sense one has to consider several mathematical issues.

10.21 Discussion. Foundations of the extension process. The main points of definition (22) are existence and uniqueness of the limit. We follow Protter, [20].

(1) One can always find a sequence (H^n) ⊆ E such that

    sup_{s≤T} |H^n_s − H_s| → 0 (P)

(2) Semimartingales satisfy

    (H^n) ⊆ E,  sup_{s≤T} |H^n_s| → 0 (P)  ⇒  sup_{t≤T} |∫_0^t H^n dX| → 0 (P)

(This is slightly stronger than the defining property of semimartingales.)

(3) From (2) it follows that for every sequence (H^n) ⊆ E satisfying (1) the corresponding sequence of stochastic integrals ∫_0^T H^n dX is a Cauchy sequence with respect to convergence in probability, uniformly on [0, T]. Therefore there exists a process Y such that

    sup_{t≤T} |∫_0^t H^n dX − Y_t| → 0 (P)

(4) From (2) it follows that the limiting process Y does not depend on the sequence (H^n).

The preceding discussion shows that there is a well-defined stochastic integral ∫ H dX whenever H ∈ L0 and X ∈ S. The stochastic integral has a strong continuity property.

10.22 Theorem. Let X ∈ S and H ∈ L0. Then for every sequence (H^n) of processes in L0

    sup_{s≤T} |H^n_s| → 0 (P)  ⇒  sup_{t≤T} |∫_0^t H^n_s dX_s| → 0 (P)
However. or JacodShiryaev. In order to achieve uniform convergence one has to replace the arbitrary deterministic sequence of subdivisions by a particular sequence of subdivisions based on stopping times instead of ﬁxed interval boundaries.24 Theorem. 10. Assume that 0 = tn < 0 tn < . 2 Let us apply 10.23 for the evaluation of a fundamental special case. It can also be proved for square integrable martingales X. Confer Protter. The assertion can be proved by our means if the paths of H are continuous. The universal validity follows from a very deep representation theorem for semimartingales which is not available for us at this stage. . This implies n Wtj−1 (Wtj − Wtj−1 ) → j=1 0 P t Ws dWs . . 10. STOCHASTIC CALCULUS For deriving (understanding) the basic properties or rules of this stochastic integral we will apply the following approximation result. Let X be a semimartingale and H ∈ L0 . (actually they converge everywhere. ≤ tn = t be an interval partition such that max tj − tj−1  → 0 as n → ∞.23 Theorem. t]. [20]. .tj ] j=1 converge to H in probability. but not necessarily uniformly on compacts. Then t 0 1 Ws dWs = (Wt2 − t) 2 (23) Proof: Let 0 = t0 ≤ t1 ≤ . < tnn = t is any Riemannian sequence of subdivisions of [0. If H is only leftcontinuous then the Riemannian step functions converge to H pointwise. Let (Wt )t≥0 be a Wiener process. the result is true anyway. In this case the Riemannian step processes kn Htj−1 1(tj−1 . Then 1 k kn t sup s≤t j=1 Htj−1 (Xtj − Xtj−1 ) − 0 H dX → 0 P Proof: This is not a proof but a comment.88 CHAPTER 10. . uniformly on compacts). For BVprocesses X it is a consequence of Lebesgue’s theorem on dominated convergence. [14]. uniformly on compacts.
10.3. CALCULUS FOR THE STOCHASTIC INTEGRAL 89

On the other hand we have

    W_t² = Σ_{j=1}^n (W²_{t_j} − W²_{t_{j−1}}) = Σ_{j=1}^n (W_{t_j} − W_{t_{j−1}})² + 2 Σ_{j=1}^n W_{t_{j−1}} (W_{t_j} − W_{t_{j−1}})

We know that

    Σ_{j=1}^n (W_{t_j} − W_{t_{j−1}})² → t (P)

This proves the assertion. □

10.25 Problem. How does (23) have to be modified if (W_t)_{t≥0} is replaced by some BV-process?

It is clear that the linearity properties (19) remain valid for the stochastic integral with H ∈ L0.

10.26 Problem. Define ∫_s^t H dX := ∫_0^t 1_{(s,∞)} H dX.
(1) Prove a concatenation property of the stochastic integral.
(2) Show that ∫_s^t 1_F H dX = 1_F ∫_s^t H dX whenever F ∈ F_s.

Path properties

UNDER CONSTRUCTION

The Wiener integral

UNDER CONSTRUCTION

10.3 Calculus for the stochastic integral

There are three fundamental rules for calculations with the stochastic integral which correspond to the three rules considered in section 10.1:
(1) the associativity rule,
(2) the integration-by-parts formula,
(3) the chain rule (Ito's formula).
90 CHAPTER 10. STOCHASTIC CALCULUS

The associativity rule

This rule can be formulated briefly as follows.

10.27 Theorem.
(1) Let X ∈ S and G ∈ L0. Then G • X is in S.
(2) Let H, G ∈ L0 and X ∈ S. Then

    ∫_0^T H d(G • X) = ∫_0^T HG dX,   in short: d(G • X) = G dX   (24)

In other words, H • (G • X) = (HG) • X.

Proof: The details are as follows. For H_n ∈ E0 we have

    ∫_0^T H_n d(G • X) = ∫_0^T H_n G dX

If H_n → 0 in an appropriate sense this implies the semimartingale property of G • X. If H_n → H in an appropriate sense the asserted equation follows. □

There is an important consequence of rule (24) which should be isolated.

10.28 Theorem. Truncation rule. Let H ∈ L0 and X ∈ S. Then for any stopping time τ

    ∫_0^T 1_{(0,τ]} H dX = ∫_0^{T∧τ} H dX = ∫_0^T H dX^τ

10.29 Problem. (intermediate) Prove 10.28.
Hint: The first equation follows from the definition of the integral on E. For the second equation note that 1_{(0,τ]} • X = X^τ.

The integration-by-parts formula

We restrict our presentation of the integration-by-parts formula to processes with continuous paths. Recall the deterministic integration-by-parts formula for continuous BV-functions:

    f(t)g(t) − f(0)g(0) = ∫_0^t f dg + ∫_0^t g df

This formula is not true for arbitrary semimartingales. The following is a definition rather than a theorem, but it is called the integration-by-parts formula.
10.3. CALCULUS FOR THE STOCHASTIC INTEGRAL 91

10.30 Definition. Let X and Y be semimartingales with continuous paths. Define

    [X, Y]_t := X_t Y_t − X_0 Y_0 − ∫_0^t X dY − ∫_0^t Y dX,   t ≥ 0.

This process is called the quadratic covariation of X and Y.

The integration-by-parts formula can be written as

    ∫_0^T H d(XY) = ∫_0^T HX dY + ∫_0^T HY dX + ∫_0^T H d[X, Y],   H ∈ L0,

or in short

    d(XY) = X dY + Y dX + d[X, Y]

However, this only makes sense if [X, Y] is a semimartingale. So let us have a closer look at [X, Y].

It is clear that [X, Y] is well-defined and is a continuous adapted process. Moreover, [X, X] =: [X] is the quadratic variation of X. This is an increasing process, hence a BV-process and a semimartingale. Since

    [X, Y] = (1/4)([X + Y] − [X − Y])

the quadratic covariation is a BV-process and a semimartingale, too.

10.31 Problem. (easy) Show that [X, Y] is linear in both arguments.

10.32 Theorem. Let X and Y be semimartingales with continuous paths. For every Riemannian sequence of subdivisions

    Σ_{j=1}^n (X_{t_j} − X_{t_{j−1}})(Y_{t_j} − Y_{t_{j−1}}) → [X, Y]_t (P),   t ≥ 0.

Proof: This follows easily from the definition of [X, Y] when it is approximated by a Riemannian sequence of step processes. □

10.33 Problem. (intermediate) Fill in the details of the proof of 10.32.

10.34 Problem. (easy) Show: If X and Y are continuous semimartingales then XY is a (continuous) semimartingale.
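The polarization identity for the covariation is purely algebraic, so it holds exactly for the discrete approximations of 10.32 over a common subdivision. The sketch below illustrates this; the sample paths are arbitrary made-up numbers:

```python
def discrete_covariation(x, y):
    """Sum of products of increments of x and y over a common subdivision."""
    return sum((x[j] - x[j - 1]) * (y[j] - y[j - 1]) for j in range(1, len(x)))

# Two paths sampled on the same grid (values are arbitrary for illustration).
x = [0.0, 0.3, -0.1, 0.4, 0.2]
y = [1.0, 0.8, 1.1, 0.9, 1.3]

xy = discrete_covariation(x, y)

# Polarization: [X, Y] = (1/4) * ([X + Y] - [X - Y]); exact per subdivision,
# since (dx + dy)^2 - (dx - dy)^2 = 4 dx dy for each pair of increments.
xpy = [a + b for a, b in zip(x, y)]
xmy = [a - b for a, b in zip(x, y)]
polar = 0.25 * (discrete_covariation(xpy, xpy) - discrete_covariation(xmy, xmy))
```

Passing to the limit along a Riemannian sequence of subdivisions turns this exact discrete identity into the polarization formula for [X, Y] stated above.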
92 CHAPTER 10. STOCHASTIC CALCULUS

10.35 Problem. (intermediate) Let X be a continuous BV-process and Y, Z continuous semimartingales. Show that:
(a) [X] = 0.
(b) [X, Y] = 0.
(c) [X + Y, Z] = [Y, Z].

10.36 Problem. (advanced) Show that [X^τ, Y] = [X, Y^τ] = [X, Y]^τ.
Hint: This is intuitively clear from 10.32 and could be made precise by approximating τ by a sequence of stopping times with finitely many values. However, it can be obtained from the definition without any approximation argument. For this note that

    ∫_0^t X^τ dY = ∫_0^{τ∧t} X dY + X_{τ∧t} (Y_t − Y_{τ∧t})

10.37 Problem. (intermediate) Show that [H • X, Y] = H • [X, Y].
Hint: Prove it for H = 1_{(σ,τ]} where σ ≤ τ are stopping times.

10.38 Problem. (easy) Let (W_t) be a Wiener process. Calculate [H • W].

10.39 Problem. (intermediate) Show that H • W is a BV-process iff H ≡ 0.

10.40 Problem. (intermediate) Let X ∈ S be continuous.
(a) Show that dX² = 2X dX + d[X].
(b) Find a formula for dX^k, k ∈ N.
Hint: Use induction on k.

10.41 Problem. (intermediate) A process of the form

    X_t = x_0 + ∫_0^t a_s ds + ∫_0^t b_s dW_s

is called an Ito process. Show that a and b are uniquely determined by X.
10.3. CALCULUS FOR THE STOCHASTIC INTEGRAL 93

Ito's formula

Now we turn to the most important and most powerful rule of stochastic analysis.

10.42 Theorem. Ito's formula. Let X ∈ S be continuous and let φ : R → R be twice differentiable with continuous derivatives. Then

    φ(X_t) = φ(X_0) + ∫_0^t φ'(X_s) dX_s + (1/2) ∫_0^t φ''(X_s) d[X]_s

Proof: The assertion is true for polynomials. Since smooth functions can be approximated uniformly by polynomials in such a way that also the corresponding derivatives are approximated, the assertion follows. □

10.43 Problem. (easy) Show that Ito's formula is true if φ is a polynomial.
Hint: Start with powers φ(x) = x^k, k ∈ N.

10.44 Problem. (very easy) State 10.42 in terms of differentials.

10.45 Problem. (easy) Calculate dW_t^a, a > 0.

10.46 Problem. (easy) Calculate de^{αW_t}.

10.47 Problem. (advanced) Use Ito's formula to find a recursion formula for E(W_t^k).

10.48 Definition. Let X ∈ S be continuous. Then E(X) = e^{X − [X]/2} is called the stochastic exponential of X.

10.49 Problem. (intermediate) Let X ∈ S be continuous and Y := E(X). Show that

    Y_t = Y_0 + ∫_0^t Y_s dX_s,   in short: dY = Y dX

There is a subtle point to discuss. Consider some positive continuous semimartingale X and a function like φ(x) = log(x) or φ(x) = 1/x. Then we may consider φ(X)
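Ito's formula can be checked along a simulated path. The sketch below takes φ(x) = x³ and X = W (so φ' = 3x², φ'' = 6x and [W]_s = s, assumptions made for this illustration only) and compares W_t³ with the discretized right hand side:

```python
import random

rng = random.Random(7)
t, n = 1.0, 100_000
dt = t / n

# One Wiener path on a uniform grid.
w = [0.0]
for _ in range(n):
    w.append(w[-1] + rng.gauss(0.0, dt ** 0.5))

# phi(x) = x^3: Ito's formula reads
#   W_t^3 = int 3 W_s^2 dW_s + (1/2) int 6 W_s d[W]_s,  with d[W]_s = ds.
stoch_int = sum(3 * w[j - 1] ** 2 * (w[j] - w[j - 1]) for j in range(1, n + 1))
drift_int = sum(3 * w[j - 1] * dt for j in range(1, n + 1))

lhs = w[-1] ** 3
rhs = stoch_int + drift_int
```

Without the correction term ½∫φ''(X) d[X] the two sides would differ by a quantity of order t, which is exactly the friction with the classical chain rule discussed in section 10.1.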
94 CHAPTER 10. STOCHASTIC CALCULUS

since it is well-defined and real-valued. But Ito's formula cannot be applied in the version in which we have proved it. The reason for this difficulty is the fact that the range of X is not contained in a compact interval where φ can be approached uniformly by polynomials.

10.50 Problem. (advanced) Let X be a positive continuous semimartingale.
(a) Show that Ito's formula holds for φ(x) = log(x) and for φ(x) = 1/x.
(b) Show that φ(X) is a semimartingale.
Hint: Let τ_n = min{t ≥ 0 : X_t ≤ 1/n}. Apply Ito's formula to X^{τ_n} and let n → ∞.

10.51 Problem. (intermediate) Let X be a continuous positive semimartingale. Find ∫_0^t (1/X_s^k) dX_s, k ∈ N.

10.52 Problem. (intermediate) Show that every positive continuous semimartingale X can be written as a stochastic exponential E(L).
Hint: Note that dX = X dL implies dL = (1/X) dX.

10.53 Theorem. Ito's formula. Let X, Y ∈ S be continuous and let φ : R² → R be twice differentiable with continuous derivatives. Then

    φ(X_t, Y_t) = φ(X_0, Y_0) + ∫_0^t φ_1(X_s, Y_s) dX_s + ∫_0^t φ_2(X_s, Y_s) dY_s
    + (1/2) ∫_0^t φ_11(X_s, Y_s) d[X]_s + ∫_0^t φ_12(X_s, Y_s) d[X, Y]_s + (1/2) ∫_0^t φ_22(X_s, Y_s) d[Y]_s

Proof: The assertion is true for polynomials. Since smooth functions can be approximated uniformly by polynomials in such a way that also the corresponding derivatives are approximated, the assertion follows. □

10.54 Problem. (advanced) Show that Ito's formula is true if φ : R² → R is a polynomial.
Hint: Start with powers φ(x, y) = x^k y^l.

10.55 Problem. (very easy) State 10.53 in terms of differentials.

10.56 Problem. (intermediate) State Ito's formula for φ(x, t).
Hint: Apply 10.53 to Y_t = t.

10.57 Problem. (easy) Use 10.53 to derive the differential equation of the stochastic exponential.
Chapter 11

Applications to financial markets

11.1 Self-financing trading strategies

Consider a financial market model M = (X, Y) which is generated by two semimartingales X and Y. Continuing the discussion 10.15 leads to the following definition.

11.1 Definition. Assume that the market model and the trading strategy are continuous. A trading strategy (H^X, H^Y) (consisting of left-continuous adapted processes) is self-financing if

    H^X_t X_t + H^Y_t Y_t = H^X_0 X_0 + H^Y_0 Y_0 + ∫_0^t H^X dX + ∫_0^t H^Y dY

or in other words

    d(H^X X) + d(H^Y Y) = H^X dX + H^Y dY

The property of being self-financing is a very strong property which narrows the set of available wealth processes considerably. Let us illustrate this fact by means of continuous models. From the integration by parts formula we have

    d(H^X X) + d(H^Y Y) = H^X dX + H^Y dY
    + X dH^X + Y dH^Y + d[X, H^X] + d[Y, H^Y]

If the trading strategy is self-financing then the expression on the second line vanishes.

In financial calculations it is often convenient to change the unit of money. The most simple example is discounting by a fixed interest rate. But there are also important applications where the "numeraire" is a stochastic process. It seems to be
96 CHAPTER 11. APPLICATIONS TO FINANCIAL MARKETS

intuitively clear that such a change of numeraire should have no influence on the trading strategy and should not destroy the self-financing property. The following theorem shows that this is actually true.

11.2 Theorem. Assume that the market model and the trading strategy are continuous and

    dV := d(H^X X) + d(H^Y Y) = H^X dX + H^Y dY

Let Z be a continuous semimartingale. Then

    d(VZ) = d(H^X XZ) + d(H^Y YZ) = H^X d(XZ) + H^Y d(YZ)

Proof: The first equality is obvious. The second follows from

    d(VZ) = Z dV + V dZ + d[Z, V]
    = H^X (Z dX + X dZ) + H^Y (Z dY + Y dZ) + d[Z, V]
    = Z H^X dX + Z H^Y dY + H^X X dZ + H^Y Y dZ + d[Z, V]
    = H^X (d(XZ) − d[X, Z]) + H^Y (d(YZ) − d[Y, Z]) + d[Z, V]
    = H^X d(XZ) + H^Y d(YZ) + d[Z, V] − H^X d[X, Z] − H^Y d[Y, Z]

If the trading strategy is self-financing then dV = H^X dX + H^Y dY, hence d[Z, V] = H^X d[X, Z] + H^Y d[Y, Z] and the last three terms cancel. □

Assume now that X is a positive continuous semimartingale. Applying the preceding result to Z = 1/X we obtain

    V_t/X_t = H^X_t + H^Y_t (Y_t/X_t) = V_0/X_0 + ∫_0^t H^Y d(Y/X)

11.3 Problem. Show that any wealth process V satisfying

    V_t/X_t = H^X_t + H^Y_t (Y_t/X_t) = V_0/X_0 + ∫_0^t H d(Y/X)

for some continuous adapted process H is a self-financing wealth process. Find the corresponding trading strategy.

11.2 Markovian wealth processes

Let M = (X^1, X^2, ..., X^n) be a financial market model consisting of Ito processes

    dX^i_t = µ_{it} dt + σ_{it} dW_t
11.2. MARKOVIAN WEALTH PROCESSES 97

This is a so-called one-factor model since only one Wiener process is responsible for random fluctuations. We assume that the processes (σ_it) are positive.

Let V be a wealth process generated by a self-financing trading strategy. The wealth process is called Markovian if there exists a function f(x, t), x ∈ R^n, t ≥ 0, such that V_t = f(X_t, t) where X_t = (X^1_t, X^2_t, ..., X^n_t). We will show that if the function f(x, t) is smooth then it necessarily satisfies partial differential equations which for special cases are known as Black-Scholes equations.

To begin with we note that the self-financing property implies the existence of a trading strategy (φ^1, φ^2, ..., φ^n) such that

    f(X_t, t) = f(X_0, 0) + Σ_{i=1}^n ∫_0^t φ^i_s dX^i_s

On the other hand the Ito formula gives

    f(X_t, t) = f(X_0, 0) + Σ_{i=1}^n ∫_0^t f_{x_i}(X_s, s) dX^i_s + ∫_0^t f_t(X_s, s) ds
    + (1/2) Σ_{i,j} ∫_0^t f_{x_i x_j}(X_s, s) σ_{is} σ_{js} ds

Both representations are Ito processes which are equal iff both the dW_t-part and the dt-part coincide. The equality of the dW_t-part gives φ^i = f_{x_i}(X_t, t) and thus the first partial differential equation:

    f(X_t, t) = Σ_{i=1}^n f_{x_i}(X_t, t) X^i_t

Comparing the dt-part gives the second partial differential equation:

    f_t(X_t, t) + (1/2) Σ_{i,j} f_{x_i x_j}(X_t, t) σ_{it} σ_{jt} = 0

In former times wealth processes were calculated by solving these partial differential equations by analytical or numerical methods.

11.3 The Black-Scholes market model

The simplest mathematical model of a financial asset is the model of a bank account (B_t) with fixed interest rate r > 0:

    B_t = B_0 e^{rt}  ⇔  dB_t = r B_t dt   (25)
98 CHAPTER 11. APPLICATIONS TO FINANCIAL MARKETS

Denoting R_t := rt, the bank account follows the differential equation

    dB_t = B_t dR_t

Stochastic models for financial assets are often based on a stochastic model of the return (Rendite) process (R_t). Assume that R_t = µt + σW_t where (W_t) is a Wiener process. If this ("generalized Wiener process") is a model of the return of an asset (S_t) then it follows that

    dS_t = S_t dR_t = µ S_t dt + σ S_t dW_t   (26)

This is a stochastic differential equation. The number σ > 0 is called the volatility of the asset.

11.4 Problem. Show that S_t = S_0 e^{(µ − σ²/2)t + σW_t} is a solution of (26).

11.5 Definition. A Black-Scholes model is a market model which is generated by two assets (B_t, S_t) following equations (25) and (26).

A Black-Scholes model consists of the assets X^1_t = e^{rt} and X^2_t = S_t, where σ_{1t} = 0 and σ_{2t} = σ S_t. Let us give an overview of the available wealth processes in the Black-Scholes model. We begin with smooth Markovian wealth processes.

The Black-Scholes equation

We are going to apply the partial differential equations of section 11.2. Let f(X^1_t, X^2_t, t) be a self-financing wealth process and define g(x, t) = f(e^{rt}, x, t). The Black-Scholes equation is the partial differential equation for the function g(x, t). Note that

    g_t = f_t + f_{x_1} r e^{rt}  and  g_x = f_{x_2}

Since g = f_{x_1} e^{rt} + f_{x_2} x = f_{x_1} e^{rt} + g_x x we obtain

    g_t = f_t + r(g − g_x x)

From the second partial differential equation we know that

    f_t = −(1/2) f_{x_2 x_2} σ²_{2t} = −(1/2) g_{xx} σ² x²

This leads to the famous Black-Scholes equation

    g_t + (1/2) g_{xx} σ² x² + r g_x x = r g   (27)
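The closed-form solution of Problem 11.4 can be compared with an Euler discretization of (26) driven by the same Wiener increments. A sketch with illustrative parameter values (µ, σ, S_0 and the step count are arbitrary choices):

```python
import math
import random

rng = random.Random(42)
mu, sigma, s0, t, n = 0.1, 0.2, 100.0, 1.0, 50_000
dt = t / n

# Euler steps of dS = mu S dt + sigma S dW, accumulating W along the way.
s_euler, w = s0, 0.0
for _ in range(n):
    dw = rng.gauss(0.0, math.sqrt(dt))
    s_euler += mu * s_euler * dt + sigma * s_euler * dw
    w += dw

# Closed form from Problem 11.4, evaluated on the same terminal W_t.
s_exact = s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w)
```

The two values agree up to the discretization error of the Euler scheme, which vanishes as the step size tends to zero.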
11.3. THE BLACK-SCHOLES MARKET MODEL 99

The market price of risk

Much more insight into the structure of wealth processes is obtained in a different way. Let (V_t) be a positive wealth process. Then it can be written as

    dV_t = µ^V_t V_t dt + σ^V_t V_t dW_t

Let λ := (µ − r)/σ (the "market price of risk").

11.6 Theorem. A wealth process

    dV_t = µ^V_t V_t dt + σ^V_t V_t dW_t

is self-financing iff

    µ^V_t − r = λ σ^V_t,   t ≥ 0   (28)

Proof: Let S̄ := S/B and V̄ := V/B. From the integration by parts formula it follows that

    dS̄_t = (µ − r) S̄_t dt + σ S̄_t dW_t  and  dV̄_t = (µ^V_t − r) V̄_t dt + σ^V_t V̄_t dW_t

On the other hand we know from 11.1 that for some process τ_t

    dV̄_t = τ_t dS̄_t = τ_t ((µ − r) S̄_t dt + σ S̄_t dW_t)

This implies

    τ_t S̄_t = (σ^V_t/σ) V̄_t  and  (µ^V_t − r)/σ^V_t = (µ − r)/σ □

11.7 Problem. Prove 11.6.

11.8 Problem. Show that for a Markovian wealth process equations (27) and (28) are equivalent.
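Equation (27) can be verified numerically for a concrete wealth process. The sketch below takes the classical Black-Scholes call value (a standard closed formula, quoted here as an assumption rather than derived in the text) and checks the PDE by central finite differences at a single point; all parameter values are arbitrary:

```python
import math

def norm_cdf(x):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(x, t, K=100.0, T=1.0, r=0.05, sigma=0.2):
    """Black-Scholes call value g(x, t) for stock price x at calendar time t < T."""
    tau = T - t
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# Check equation (27): g_t + (1/2) g_xx sigma^2 x^2 + r g_x x - r g = 0.
x, t, r, sigma = 100.0, 0.5, 0.05, 0.2
hx, ht = 0.01, 1e-5
g = call_price(x, t)
g_t = (call_price(x, t + ht) - call_price(x, t - ht)) / (2 * ht)
g_x = (call_price(x + hx, t) - call_price(x - hx, t)) / (2 * hx)
g_xx = (call_price(x + hx, t) - 2 * g + call_price(x - hx, t)) / hx ** 2
residual = g_t + 0.5 * g_xx * sigma ** 2 * x ** 2 + r * g_x * x - r * g
```

The residual is zero up to finite-difference error, illustrating that this well-known wealth process is one of the smooth Markovian wealth processes characterized by (27).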
Chapter 12

Stochastic differential equations

12.1 Introduction

A (Wiener driven) stochastic differential equation is an equation of the form

    dX_t = b(t, X_t) dt + σ(t, X_t) dW_t

where (W_t)_{t≥0} is a Wiener process and b(t, x) and σ(t, x) are given functions. Note that the differential notation is only an abbreviation for the integral equation

    X_t = x_0 + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s

The problem is to find a process (X_t)_{t≥0} that satisfies the equation. Such a process is then called a solution of the differential equation.

There are three issues to be discussed for differential equations:
(1) Theoretical answers for existence and uniqueness of solutions.
(2) Finding analytical expressions for solutions.
(3) Calculating solutions by numerical methods.

We will focus on analytical expressions for important but easy special cases. However, let us indicate some issues which are important from the theoretical point of view.

For stochastic differential equations even the concept of a solution is a subtle question. We have to distinguish between weak and strong solutions, even between weak and strong uniqueness. It is not within the scope of this text to give precise definitions of these notions. But the idea can be described in an intuitive way. A strong solution is a solution where the driving Wiener process (and the underlying probability space) is fixed in advance and the solution (X_t)_{t≥0} is a function of this given driving Wiener process. A weak solution is an answer to the question: does there exist a probability space where a process (X_t)_{t≥0} and a Wiener process (W_t)_{t≥0} exist such that the differential equation holds?
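Concerning issue (3), the simplest numerical method is the Euler (Euler-Maruyama) scheme, which discretizes the integral equation step by step. A minimal sketch (function names and parameters are illustrative):

```python
import random

def euler_maruyama(b, sigma, x0, t, n, seed=0):
    """Approximate a solution of dX = b(t, X) dt + sigma(t, X) dW on [0, t]
    with n Euler steps: X_{k+1} = X_k + b(t_k, X_k) dt + sigma(t_k, X_k) dW_k."""
    rng = random.Random(seed)
    dt = t / n
    x, s = x0, 0.0
    path = [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, dt ** 0.5)
        x = x + b(s, x) * dt + sigma(s, x) * dw
        s += dt
        path.append(x)
    return path

# Deterministic sanity check: with sigma = 0 and b(t, x) = x the scheme
# reduces to Euler's method for dx = x dt, so X_1 should approximate e.
path = euler_maruyama(lambda s, x: x, lambda s, x: 0.0, 1.0, 1.0, 100_000)
```

By construction this produces a strong approximation: the scheme is a function of the given driving increments, matching the notion of a strong solution described above.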
102 CHAPTER 12. STOCHASTIC DIFFERENTIAL EQUATIONS

When we derive analytical expressions for solutions we will derive strong solutions. There is a general theory giving sufficient conditions for existence and uniqueness of non-exploding strong solutions. Both the proofs and the assertions of this theory are quite similar to the classical theory of ordinary differential equations. We refer to Hunt-Kennedy [12] and Karatzas-Shreve [15]. In particular, for linear differential equations (to be defined below) complete formulas for strong solutions are available.

Let us introduce some terminology. Any stochastic differential equation is time homogeneous if b(t, x) = b(x) and σ(t, x) = σ(x). A linear differential equation is of the form

    dX_t = (a_0(t) + a_1(t) X_t) dt + (σ_0(t) + σ_1(t) X_t) dW_t

It is a homogeneous linear differential equation if a_0(t) = σ_0(t) = 0. The simplest homogeneous case is

    dX_t = µ X_t dt + σ X_t dW_t

which corresponds to the Black-Scholes model. The constant σ is called the volatility of the model. If the volatility is time dependent then it is a local volatility model.

There are plenty of linear differential equations used in the theory of stochastic interest rates. If (B_t) denotes a process that is a model for a bank account with stochastic interest rate then

    r_t := B'_t/B_t  ⇔  B_t = B_0 e^{∫_0^t r_s ds}

is called the short rate. Popular short rate models are the Vasicek model

    dr_t = a(b − r_t) dt + σ dW_t

and the Hull-White model

    dr_t = (θ(t) − a(t) r_t) dt + σ(t) dW_t

12.2 The abstract linear equation

Let Y and Z be any continuous semimartingales. The abstract homogeneous linear equation is

    dX_t = X_t dY_t

and its solution is known to us as

    X_t = x_0 e^{Y_t − [Y]_t/2} = x_0 E(Y)_t

This is the recipe to solve any homogeneous linear stochastic differential equation. There is nothing more to say about it at the moment.
12.2. THE ABSTRACT LINEAR EQUATION 103

Things become more interesting when we turn to the general inhomogeneous equation

    dX_t = X_t dY_t + dZ_t

There is an explicit expression for the solution, but it is much more illuminating to memorize the approach by which one arrives there. The idea is to write the equation as

    dX_t − X_t dY_t = dZ_t

and to find an integrating factor that transforms the left hand side into a total differential. Let dA_t = A_t dY_t and multiply the equation by 1/A_t, giving

    (1/A_t) dX_t − (X_t/A_t) dY_t = (1/A_t) dZ_t   (29)

Note that

    d(1/A_t) = −(1/A_t) dY_t + (1/A_t) d[Y]_t

Then

    d(X_t/A_t) = (1/A_t) dX_t + X_t d(1/A_t) + d[1/A, X]_t
    = (1/A_t) dX_t − (X_t/A_t) dY_t + (X_t/A_t) d[Y]_t − (1/A_t) d[Y, X]_t

Thus, the left hand side of (29) differs from a total differential by a known BV-function. We obtain

    d(X_t/A_t) = (1/A_t) dZ_t − (1/A_t) d[Y, Z]_t

leading to

    X_t = A_t ( x_0 − ∫_0^t (1/A_s) d[Y, Z]_s + ∫_0^t (1/A_s) dZ_s )   (30)

Note that the solution is particularly simple if either Y or Z are BV-processes.

12.1 Problem. (intermediate) Fill in and explain all details of the derivation of (30).

12.2 Problem. (easy) Solve dX_t = a(t) X_t dt + σ(t) X_t dW_t.
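In the BV case ([Y, Z] = 0) formula (30) reduces to the classical variation-of-constants formula. The sketch below checks this for dY_t = −µ dt and dZ_t = ν dt, where (30) gives X_t = e^{−µt} x_0 + (ν/µ)(1 − e^{−µt}); the parameter values are arbitrary:

```python
import math

mu, nu, x0, t, n = 2.0, 1.0, 0.0, 1.0, 100_000
dt = t / n

# Euler iteration of dX = X dY + dZ with dY = -mu dt and dZ = nu dt.
x = x0
for _ in range(n):
    x += x * (-mu * dt) + nu * dt

# Formula (30) with integrating factor A_t = e^{-mu t} and [Y, Z] = 0:
#   X_t = A_t * (x0 + int_0^t (1/A_s) dZ_s) = e^{-mu t} x0 + (nu/mu)(1 - e^{-mu t})
closed = math.exp(-mu * t) * x0 + (nu / mu) * (1.0 - math.exp(-mu * t))
```

The same integrating-factor recipe carries over verbatim to the stochastic case; only the correction term ∫(1/A) d[Y, Z] has to be added when Y and Z have nontrivial covariation.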
104 CHAPTER 12. STOCHASTIC DIFFERENTIAL EQUATIONS

12.3 Wiener driven models

The Vasicek model is

    dX_t = (ν − µ X_t) dt + σ dW_t

For ν = 0 the solution is called the Ornstein-Uhlenbeck process. The Vasicek model is a special case of the inhomogeneous linear equation for

    dY_t = −µ dt  and  dZ_t = ν dt + σ dW_t

Therefore the integrating factor is A_t = e^{−µt} and the solution is obtained as in the case of an ordinary linear differential equation.

12.3 Problem. (advanced) Show that the solution of the Vasicek equation is

    X_t = e^{−µt} x_0 + (ν/µ)(1 − e^{−µt}) + σ ∫_0^t e^{−µ(t−s)} dW_s

12.4 Problem. (advanced) Derive the following properties of the Vasicek model:
(a) The process (X_t)_{t≥0} is a Gaussian process (i.e. all joint distributions are normal distributions).
(b) Find E(X_t) and lim_{t→∞} E(X_t).
(c) Find V(X_t) and lim_{t→∞} V(X_t).
(d) Find Cov(X_t, X_{t+h}) and lim_{t→∞} Cov(X_t, X_{t+h}).

12.5 Problem. (advanced) Let X_0 ∼ N(ν/µ, σ²/(2µ)). Explore the mean and covariance structure of a Vasicek model starting with X_0.

Let us turn to models that are not time homogeneous.

12.6 Problem. (intermediate) The Brownian bridge:
(a) Find the solution of dX_t = −(X_t/(1 − t)) dt + dW_t, 0 ≤ t < 1.
(b) Show that (X_t)_{t≥0} is a Gaussian process. Find the mean and the covariance structure.
(c) Show that X_t → 0 if t → 1.
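The mean of the explicit solution in Problem 12.3 is E(X_t) = e^{−µt} x_0 + (ν/µ)(1 − e^{−µt}), since the stochastic integral has expectation zero. A Monte Carlo simulation of the Vasicek equation should reproduce this; a sketch with arbitrary parameters and a fixed seed:

```python
import math
import random

rng = random.Random(1)
mu, nu, sigma, x0, t = 1.0, 2.0, 0.3, 0.0, 1.0
n_steps, n_paths = 200, 5_000
dt = t / n_steps

# Euler simulation of dX = (nu - mu X) dt + sigma dW over many paths.
finals = []
for _ in range(n_paths):
    x = x0
    for _ in range(n_steps):
        x += (nu - mu * x) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
    finals.append(x)

mc_mean = sum(finals) / n_paths

# Mean implied by the explicit solution of Problem 12.3.
exact_mean = math.exp(-mu * t) * x0 + (nu / mu) * (1.0 - math.exp(-mu * t))
```

The discrepancy combines Monte Carlo noise and the Euler bias, both of which shrink with more paths and smaller steps; the limiting mean ν/µ is the mean-reversion level asked for in Problem 12.4(b).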
12.3. WIENER DRIVEN MODELS 105

12.7 Problem. (intermediate) Find the solution of the Hull-White model:

    dX_t = (θ(t) − a(t) X_t) dt + σ(t) dW_t

Finally, let us consider a nonlinear model.

12.8 Problem. (advanced) Let Z_t = E(µt + σW_t).
(a) For a > 0 find the differential equation of X_t := Z_t / (1 + a ∫_0^t Z_s ds).
(b) What about a < 0?
Chapter 13

Martingales and stochastic calculus

13.1 Martingale properties of the stochastic integral

Facts

Let (M_t)_{t≥0} be a continuous square integrable martingale, i.e. E(M_t²) < ∞, t ≥ 0. We would like to know for which H ∈ L0

    H • M : t → ∫_0^t H dM

is a square integrable martingale. There are two main results in this section. We will outline the proofs at the end of the section. At this point we attempt to understand the assertions and their consequences.

13.1 Theorem. For any continuous square integrable martingale (M_t)_{t≥0} the process M_t² − [M]_t is a martingale.

First we note that 13.1 is known to us for the Wiener process. Thus, it is a generalization of a familiar structure.

13.2 Theorem. Let (M_t)_{t≥0} be a continuous square integrable martingale and H ∈ L0. Then H • M is a square integrable martingale for t ∈ [0, T] iff E([H • M]_T) < ∞.

For a better understanding of 13.2 we note that

    [H • M]_T = ∫_0^T H_s² d[M]_s

Therefore ∫_0^t H_s dM_s, t ≤ T, is a square integrable martingale iff

    E( ∫_0^T H_s² d[M]_s ) < ∞
108 CHAPTER 13. MARTINGALES AND STOCHASTIC CALCULUS

Thus, we have to check the integrability of a Stieltjes integral. For Wiener driven martingales this is even an ordinary Lebesgue integral.

If the condition is satisfied then by 13.1 it follows that

    ( ∫_0^t H_s dM_s )² − ∫_0^t H_s² d[M]_s,   t ≤ T,

is a martingale, which means that

    E( ( ∫_0^t H_s dM_s )² ) = E( ∫_0^t H_s² d[M]_s )

This is one of the most important identities of stochastic analysis. It was the original starting point of the construction of the stochastic integral and it is still the starting point of further extensions of the stochastic integral to larger entities than L0. By the way, what we did (Protter's [19] approach "without tears") is the stochastic counterpart of the Cauchy-Stieltjes integral. The most general version of the stochastic integral (not being the subject of this text) could be considered as the stochastic counterpart of abstract (Lebesgue) integration theory.

Let us mention that the assertion of 13.1 is related to 13.2 by

    M_t² − [M]_t = M_0² + 2 ∫_0^t M_s dM_s

This implies that M • M is a martingale. However, it is not necessarily a square integrable martingale!

13.3 Problem. (intermediate) Let (M_t)_{t≥0} be a continuous square integrable martingale. If (A_t) is a continuous adapted process of bounded variation such that M_t² − A_t is a martingale, then A_t = [M]_t.

13.4 Problem. (intermediate) Show that every continuous square integrable martingale of bounded variation is necessarily constant.

13.5 Problem. (advanced) For every continuous martingale (M_t)_{t≥0} we have E(M_t²) ≥ E([M]_t).
Hint: Show that for any subdivision 0 = t_0 < t_1 < ... < t_n = t

    E(M_t²) = E( Σ_{j=1}^n (M_{t_j} − M_{t_{j−1}})² )

Proofs

Now let us turn to the proofs of 13.1 and 13.2. For warming up we provide some straightforward facts.
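The identity E((∫H dM)²) = E(∫H² d[M]) can be illustrated for M = W (where [W]_s = s) by Monte Carlo. The sketch below uses the deterministic integrand H_s = s, for which the right hand side is ∫_0^1 s² ds = 1/3; sample sizes and seed are arbitrary:

```python
import math
import random

rng = random.Random(2024)
t, n_steps, n_paths = 1.0, 100, 5_000
dt = t / n_steps

# Monte Carlo estimate of E( (int_0^1 s dW_s)^2 ).
second_moment = 0.0
for _ in range(n_paths):
    integral = 0.0
    s = 0.0
    for _ in range(n_steps):
        integral += s * rng.gauss(0.0, math.sqrt(dt))   # left endpoint H_s = s
        s += dt
    second_moment += integral ** 2
mc = second_moment / n_paths

# Right hand side of the identity: E( int_0^1 H_s^2 ds ) = int_0^1 s^2 ds.
exact = 1.0 / 3.0
```

The two sides agree up to Monte Carlo noise and the small bias of the left-endpoint discretization, which is the discrete shadow of the identity being an exact equality in the limit.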
13.1. MARTINGALE PROPERTIES OF THE STOCHASTIC INTEGRAL 109

and apply Fatou's lemma to an appropriate subsequence of a Riemannian sequence.

13.6 Problem. (easy) Prove the "only if" part of 13.2.

13.7 Problem. (intermediate) Suppose you know that for every continuous square integrable martingale (M_t)_{t≥0} the equation E(M_t²) = E([M]_t), t ≥ 0, is true. Show that this implies that M_t² − [M]_t is even a martingale.
Hint: Apply 8.25.

Next we prove a preliminary assertion. The proof isolates some arguments which are related to the martingale structure.

13.8 Lemma. Let (M_t)_{t≥0} be a continuous square integrable martingale. Then H • M is a square integrable martingale for every bounded H ∈ L0.

Proof: It is sufficient to show that E(∫_0^t H dM) = 0. Let 0 = t_0 < t_1 < ... < t_n = t be the n-th element of a Riemannian sequence of subdivisions and define

    H_n = Σ_{j=1}^n H_{t_{j−1}} 1_{(t_{j−1}, t_j]}

Then E(∫_0^t H_n dM) = 0 and ∫_0^t H_n dM → ∫_0^t H dM (P). It remains to show that E((∫_0^t H_n dM)²) is bounded. For this, note that

    E( ( ∫_0^t H_n dM )² ) = E( ( Σ_{j=1}^n H_{t_{j−1}} (M_{t_j} − M_{t_{j−1}}) )² )
    = Σ_{j=1}^n E( H²_{t_{j−1}} (M_{t_j} − M_{t_{j−1}})² )
    ≤ C Σ_{j=1}^n E( (M_{t_j} − M_{t_{j−1}})² ) = C (E(M_t²) − E(M_0²)) □

Now we are in a position to prove 13.1.

Proof: (of Theorem 13.1) For a bounded martingale the assertion follows from the integration by parts formula and 13.8. For proving the general case it is sufficient to show that E(M_t²) = E([M]_t), t ≥ 0. Recall that for any stopping time τ the identity [M^τ]_t = [M]_{τ∧t} holds. Let τ_n = inf{t : |M_t| ≥ n}.
110 CHAPTER 13. MARTINGALES AND STOCHASTIC CALCULUS

Then it follows that

    E(M²_{t∧τ_n}) = E((M^{τ_n}_t)²) = E([M^{τ_n}]_t) = E([M]_{t∧τ_n})

Letting n → ∞ it is clear that E([M]_{t∧τ_n}) → E([M]_t). The corresponding convergence of the left hand side follows from

    M²_{t∧τ_n} = E(M_t | F_{τ_n})² ≤ E(M_t² | F_{τ_n}) □

The following assertion is a continuation of 13.8.

13.9 Lemma. Let (M_t)_{t≥0} be a continuous square integrable martingale. Then

    E((H • M)_t²) = E([H • M]_t) for every bounded H ∈ L0.

Proof: From 13.1 it follows that

    E((M_t − M_s)² | F_s) = E([M]_t − [M]_s | F_s)

Then the equation array of the proof of 13.8 can be improved to

    E( ( ∫_0^t H_n dM )² ) = E( ( Σ_{j=1}^n H_{t_{j−1}} (M_{t_j} − M_{t_{j−1}}) )² )
    = Σ_{j=1}^n E( H²_{t_{j−1}} (M_{t_j} − M_{t_{j−1}})² )
    = Σ_{j=1}^n E( H²_{t_{j−1}} ([M]_{t_j} − [M]_{t_{j−1}}) ) = E( ∫_0^t H_n² d[M] )

This is extended to bounded H ∈ L0 by routine arguments. □

Proof: (of Theorem 13.2) We need only prove the "if" part, and for this it is sufficient to prove that 13.9 extends to arbitrary H ∈ L0. Let τ_n := inf{t : |H_t| ≥ n}. Then (by left-continuity!) H^{τ_n} is bounded and tends to H. Lemma 13.9 can be applied to H^{τ_n} and the assertion is proved again by routine arguments. □

13.2 Martingale representation

Let (W_t)_{t≥0} be a Wiener process. We know that

    ∫_0^t H_s dW_s,   t ≥ 0,

is a square integrable martingale iff

    E( ∫_0^t H_s² ds ) < ∞,   t ≥ 0.
Now, in this special case there is a remarkable converse: Each square integrable martingale arises in this way! We have to be a bit more modest: If we confine ourselves (as we have done so far) to H ∈ L0 (left-continuous adapted processes), then all square integrable martingales can only be approximated with arbitrary precision by stochastic integrals. We will comment on this point later.

The martingale representation fact is an easy consequence of the following seemingly simpler assertion: Each random variable C ∈ L²(F_t) (each "claim") can be (approximately) written as a stochastic integral ("hedged" by a self-financing strategy). Let us introduce some simplifying terminology.

13.10 Definition. A set C of random variables in L²(F_t) is called dense if for every C ∈ L²(F_t) there is a sequence (C_n) ⊆ C such that E((C_n − C)²) → 0. A set C of random variables in L²(F_t) is called total if the linear hull of C is dense.

Thus, we want to prove
13.11 Theorem. The set of all integrals ∫₀ᵗ H dW with H ∈ L0 and E(∫₀ᵗ H_s² ds) < ∞ is dense in L²(F_t).

Proof: The starting point is that F_t is generated by (W_s)_{s≤t} and therefore also by (e^{W_s})_{s≤t}. Therefore an obvious dense set consists of the functions φ(e^{W_{s_1}}, e^{W_{s_2}}, ..., e^{W_{s_n}}), where φ is some continuous function with compact support and s_1, s_2, ..., s_n is some finite subset of [0, t]. Every continuous function can be approximated uniformly by polynomials (Weierstrass' theorem) and polynomials are linear combinations of powers. Thus, we arrive at a total set consisting of
  exp(Σ_{j=1}^n k_j W_{s_j})

which after reshuffling can be written as

  exp(Σ_{j=1}^n a_{j−1} (W_{s_j} − W_{s_{j−1}})) = exp(∫₀ᵗ f(s) dW_s)        (31)

for some bounded left-continuous (step) function f : [0, t] → R. It follows that the set of functions (differing from (31) by constant factors)

  G_t = exp(∫₀ᵗ f(s) dW_s − (1/2) ∫₀ᵗ f²(s) ds)
is total when f varies in the set of all bounded left-continuous (step) functions f : [0, t] → R. Recall that (G_s)_{s≤t} is a square integrable martingale and satisfies
t t
Gt = 1 +
0
G d(f • W ) =
0
Gs f (s)dWs
From 13.1 it follows that
t
E
0
G2 f 2 (s) ds < ∞. s
Therefore, the set of integrals
t t
Hs dWs where H ∈ L0
0
and E
0
2 Hs ds < ∞
is total and by linearity of the integral even dense. UNDER CONSTRUCTION: Extension to predictable processes. Representation of martingales.
2
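The role of the processes G_t can be illustrated numerically (a sketch, not part of the notes): since G_t = exp(∫₀ᵗ f dW − ½∫₀ᵗ f² ds) is a martingale with G_0 = 1, its mean stays equal to 1 for any bounded step function f.

```python
import numpy as np

# Monte Carlo check of E(G_t) = 1 for an exponential martingale driven by
# a bounded left-continuous step function f on [0, t].
rng = np.random.default_rng(1)
t, n_steps, n_paths = 1.0, 500, 50000
dt = t / n_steps
s = np.linspace(0.0, t, n_steps, endpoint=False)    # left endpoints t_{j-1}

f = np.where(s < 0.5, 1.0, -2.0)     # a bounded step function (arbitrary choice)

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
stoch_int = dW @ f                   # ∫ f dW as Σ f(t_{j-1}) (W_{t_j} - W_{t_{j-1}})
G_t = np.exp(stoch_int - 0.5 * np.sum(f ** 2) * dt)

print(np.mean(G_t))                  # ~ 1.0
```

The step function f chosen here is a hypothetical example; any bounded left-continuous step function gives the same conclusion.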
13.3 Lévy's theorem
UNDER CONSTRUCTION
13.4
Exponential martingale and Girsanov’s theorem
UNDER CONSTRUCTION
Chapter 14 Pricing of claims
UNDER CONSTRUCTION
Part III

Appendix
Chapter 15

Foundations of modern analysis

Further reading: Dieudonné [8].

15.1 Sets and functions

Let X and Y be nonempty sets. A function f : X → Y is a set of pairs (x, f(x)) ∈ X × Y such that for every x ∈ X there is exactly one f(x) ∈ Y. X is the domain of f and Y is the range of f. If A ⊆ X then f(A) := {f(x) : x ∈ A} is the image of A under f. If B ⊆ Y then f⁻¹(B) := {x : f(x) ∈ B} is the inverse image of B under f.

A function f : X → Y is injective if f(x₁) = f(x₂) implies x₁ = x₂. It is surjective if for every y ∈ Y there is x ∈ X such that f(x) = y. If a function is injective and surjective then it is bijective.

Let f : X → Y and g : Y → Z. Then the composition g ◦ f is the function from X to Z such that (g ◦ f)(x) = g(f(x)).

15.1 Problem. (easy) Prove de Morgan's laws:

  (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ,  (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

15.2 Problem. (intermediate) Prove de Morgan's laws:

  (⋃_{i∈N} A_i)ᶜ = ⋂_{i∈N} A_iᶜ,  (⋂_{i∈N} A_i)ᶜ = ⋃_{i∈N} A_iᶜ

15.3 Problem. (easy) Show that:
(a) f⁻¹(B₁ ∪ B₂) = f⁻¹(B₁) ∪ f⁻¹(B₂).
(b) f⁻¹(B₁ ∩ B₂) = f⁻¹(B₁) ∩ f⁻¹(B₂).
(c) f⁻¹(Bᶜ) = (f⁻¹(B))ᶜ.
(d) Extend (a) and (b) to families of sets.

15.4 Problem. (easy) Show that:
(a) f(f⁻¹(B)) = f(X) ∩ B.
(b) f⁻¹(f(A)) ⊇ A.

15.5 Problem. (easy) Show that:
(a) f(A₁ ∪ A₂) = f(A₁) ∪ f(A₂).
(b) f(A₁ ∩ A₂) ⊆ f(A₁) ∩ f(A₂).
(c) Give an example where inequality holds in (b).
(d) Show that for injective functions equality holds in (b).
(e) Extend (a) and (b) to families of sets.

15.6 Problem. (easy) Let f : X → Y and g : Y → Z. Show that (g ◦ f)⁻¹(C) = f⁻¹(g⁻¹(C)), C ⊆ Z.

15.2 Sequences of real numbers

The set R of real numbers is well-known, at least regarding its basic algebraic operations. It is an advanced mathematical construction to show that such a set, having the familiar properties of real numbers including completeness, actually exists. The following is not intended to be an introduction to the subject, but a checklist which should be well understood; otherwise an introductory textbook has to be consulted.

Let us talk about topological properties of R. Let us start with sequences. An (infinite) sequence is a function from N to R, denoted by n → x_n, where n = 1, 2, ..., for short (x_n). An (open and connected) neighborhood of x ∈ R is an open interval (a, b) which contains x. Note that neighborhoods can be very small, i.e. can have any length > 0.

When we say that an assertion holds for almost all x_n then we mean that it is true for all x_n beginning with some index N, i.e. for x_n with n ≥ N for some N. A number x ∈ R is called a limit of (x_n) if every neighborhood of x contains almost all x_n. In other words: The sequence (x_n) converges to x: lim_{n→∞} x_n = x or x_n → x. A sequence can have at most one limit since two different limits could be put into disjoint neighborhoods.

A fundamental property of R is the fact that any bounded increasing sequence has a limit, which implies that every bounded monotone sequence has a limit. This is not a theorem but the completeness axiom.
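The notions of "almost all" and of a limit can be made concrete in a short numerical sketch (not part of the notes; the sequence x_n = 1 − 1/n is a hypothetical example of a bounded increasing sequence).

```python
# The bounded increasing sequence x_n = 1 - 1/n has the limit 1, and every
# neighborhood (1 - eps, 1 + eps) contains "almost all" x_n, i.e. all x_n
# from some index N(eps) onward.
x = [1 - 1 / n for n in range(1, 100001)]

assert all(x[n] <= x[n + 1] for n in range(len(x) - 1))   # increasing
assert all(xn < 1 for xn in x)                            # bounded above by 1

eps = 1e-3
N = next(n for n, xn in enumerate(x, start=1) if abs(xn - 1) < eps)
assert all(abs(xn - 1) < eps for xn in x[N - 1:])          # almost all x_n

print(N)   # 1001: from this index on, x_n stays within eps of the limit
```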
A simple fact which is an elementary consequence of the order structure says that every sequence has a monotone subsequence.

An increasing sequence (x_n) which is not bounded is said to diverge to ∞ (x_n ↑ ∞), i.e. for any a we have x_n > a for almost all x_n. Thus, we can summarize: An increasing sequence either converges to some real number (iff it is bounded) or diverges to ∞ (iff it is unbounded). A similar assertion holds for decreasing sequences.

The limit of a subsequence is called an accumulation point of the original sequence (x_n). An accumulation point x can also be explained in the following way: Every neighborhood of x contains infinitely many x_n, but not necessarily almost all x_n. A sequence can have many accumulation points, and it need not be bounded to have accumulation points. Putting terms together we arrive at a very important assertion: Every bounded sequence (x_n) has a convergent subsequence. In other words: Every bounded sequence has at least one accumulation point. A sequence has a limit iff it is bounded and has only one accumulation point.

There is a popular criterion for convergence of a sequence which is related to the assertion just stated. Call a sequence (x_n) a Cauchy sequence if there exist arbitrarily small intervals containing almost all x_n. Clearly every convergent sequence is a Cauchy sequence. But also the converse is true in view of completeness. Indeed, every Cauchy sequence is bounded and can have at most one accumulation point. By completeness it has at least one accumulation point, which then is necessarily the limit, and is therefore convergent.

15.3 Real-valued functions

UNDER CONSTRUCTION

15.4 Banach spaces

Let V be a vector space. A norm on V is a function v → ‖v‖, satisfying the following conditions:
(1) ‖v‖ ≥ 0, v ∈ V; ‖v‖ = 0 ⇔ v = o.
(2) ‖v + w‖ ≤ ‖v‖ + ‖w‖, v, w ∈ V.
(3) ‖λv‖ = |λ| ‖v‖, λ ∈ R, v ∈ V.

15.7 Definition. A pair (V, ‖·‖) consisting of a vector space V and a norm ‖·‖ is a normed space.

15.8 Example. (1) V = R is a normed space with ‖v‖ = |v|.
(2) V = R^d is a normed space under several norms, e.g.

  ‖v‖₁ = Σ_{i=1}^d |v_i|,  ‖v‖₂ = (Σ_{i=1}^d v_i²)^{1/2} (Euclidean norm),  ‖v‖_∞ = max_{1≤i≤d} |v_i|

(3) Let V = C([0, 1]) be the set of all continuous functions f : [0, 1] → R. This is a vector space. Popular norms on this vector space are

  ‖f‖_∞ = max_{0≤s≤1} |f(s)|  and  ‖f‖₁ = ∫₀¹ |f(s)| ds

The distance of two elements of V is defined to be d(v, w) := ‖v − w‖. This function has the usual properties of a distance, in particular it satisfies the triangle inequality. A set of the form B(v, r) := {w ∈ V : ‖w − v‖ < r} is called an open ball around v with radius r. A sequence (v_n) ⊆ V is convergent with limit v if ‖v_n − v‖ → 0. A sequence (v_n) is a Cauchy sequence if there exist arbitrarily small balls containing almost all members of the sequence, i.e.

  ∀ε > 0 ∃N(ε) ∈ N such that ‖v_n − v_m‖ < ε whenever n, m ≥ N(ε)

15.9 Definition. A normed space is a Banach space if it is complete, i.e. if every Cauchy sequence is convergent.

It is clear that R and R^d are complete under the usual norms. Actually they are complete under any norm. The situation is completely different with infinite dimensional normed spaces.

15.10 Problem. (easy for mathematicians) Show that C([0, 1]) is complete under ‖·‖_∞.

15.11 Problem. (easy for mathematicians) Show that C([0, 1]) is not complete under ‖·‖₁.

The latter fact is one of the reasons for extending the notion and the range of the elementary integral.

15.5 Hilbert spaces

A special class of normed spaces are inner product spaces.

15.12 Definition. Let V be a vector space. An inner product on V is a function (v, w) → ⟨v, w⟩, satisfying the following conditions:
(1) ⟨v, v⟩ ≥ 0, v ∈ V; ⟨v, v⟩ = 0 ⇔ v = o.
(2) ⟨v, w⟩ is linear in both variables.

A pair (V, ⟨·, ·⟩) consisting of a vector space V and an inner product ⟨·, ·⟩ is an inner product space. An inner product gives rise to a norm according to ‖v‖ := ⟨v, v⟩^{1/2}.

15.13 Problem. (easy) Show that ‖v‖ := ⟨v, v⟩^{1/2} is a norm.

15.14 Example. (1) V = R is an inner product space with ⟨v, w⟩ = vw. The corresponding norm is ‖v‖ = |v|.
(2) V = R^d is an inner product space with ⟨v, w⟩ = Σ_{i=1}^d v_i w_i. The corresponding norm is ‖v‖₂.
(3) Let V = C([0, 1]) be the set of all continuous functions f : [0, 1] → R. This is an inner product space with

  ⟨f, g⟩ = ∫₀¹ f(s) g(s) ds

The corresponding norm is ‖f‖₂ = (∫₀¹ f(s)² ds)^{1/2}.

15.15 Definition. An inner product space is a Hilbert space if it is complete under the norm defined by the inner product.

15.16 Problem. (easy for mathematicians) Show that C([0, 1]) is not complete under ‖·‖₂.
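A small check (a sketch, not part of the notes) of the Euclidean inner product on R^d and of the two inequalities behind the problem of showing that ‖v‖ := ⟨v, v⟩^{1/2} is a norm, namely the Cauchy-Schwarz and triangle inequalities:

```python
import math

def inner(v, w):
    # <v, w> = Σ v_i w_i, the Euclidean inner product on R^d
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    # the norm induced by the inner product: ||v|| = <v, v>^(1/2)
    return math.sqrt(inner(v, v))

v = [1.0, -2.0, 3.0]
w = [4.0, 0.0, -1.0]

assert abs(inner(v, w)) <= norm(v) * norm(w) + 1e-12             # Cauchy-Schwarz
vw = [vi + wi for vi, wi in zip(v, w)]
assert norm(vw) <= norm(v) + norm(w) + 1e-12                     # triangle inequality

print(round(norm(v) ** 2, 10))   # 14.0 = <v, v>
```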
Inner product spaces have a geometric structure which is very similar to that of R^d endowed with the Euclidean inner product. In particular, the notions of orthogonality and of projections are available on inner product spaces. The existence of orthogonal projections depends on completeness, and therefore requires Hilbert spaces.

15.17 Problem. (intermediate) Let C be a closed convex subset of a Hilbert space (V, ⟨·, ·⟩) and let v ∈ V. Show that there exists v₀ ∈ C such that

  ‖v − v₀‖ = min{‖v − w‖ : w ∈ C}

Hint: Let α := inf{‖v − w‖ : w ∈ C} and choose a sequence (w_n) ⊆ C such that ‖v − w_n‖ → α. Apply the parallelogram equality to show that (w_n) is a Cauchy sequence.
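The situation of 15.17 can be sketched numerically in the special case V = R² with C a box (an assumption made only for this illustration, since the projection onto a box is coordinate-wise clipping):

```python
import math, random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def project_box(v):
    # projection onto the closed convex set C = [0,1] x [0,1] (clipping)
    return [min(1.0, max(0.0, x)) for x in v]

v = [2.0, -0.5]
v0 = project_box(v)                  # the candidate minimizer v_0

# no random point of C is closer to v than v_0
random.seed(0)
dist0 = norm([a - b for a, b in zip(v, v0)])
for _ in range(1000):
    w = [random.random(), random.random()]
    assert dist0 <= norm([a - b for a, b in zip(v, w)]) + 1e-12

# the parallelogram equality used in the hint:
# ||a + b||^2 + ||a - b||^2 = 2 ||a||^2 + 2 ||b||^2
a, b = [0.3, -1.2], [2.5, 0.7]
lhs = norm([x + y for x, y in zip(a, b)]) ** 2 + norm([x - y for x, y in zip(a, b)]) ** 2
rhs = 2 * norm(a) ** 2 + 2 * norm(b) ** 2
assert abs(lhs - rhs) < 1e-9

print(v0)   # [1.0, 0.0]
```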
Bibliography

[1] Heinz Bauer. Probability theory. de Gruyter Studies in Mathematics 23. Berlin: Walter de Gruyter, 1996.
[2] Heinz Bauer. Measure and integration theory. Translated from the German by Robert B. Burckel. de Gruyter Studies in Mathematics 26. Berlin: de Gruyter, 2001.
[3] Tomasz R. Bielecki and Marek Rutkowski. Credit risk: Modelling, valuation and hedging. Springer Finance. Berlin: Springer, 2002.
[4] Tomas Bjoerk. Arbitrage Theory in Continuous Time. 2nd ed. Oxford: Oxford University Press, 2004.
[5] Pierre Brémaud. An introduction to probabilistic modeling. Undergraduate Texts in Mathematics. New York: Springer-Verlag, 1988.
[6] Pierre Brémaud. Markov chains. Gibbs fields, Monte Carlo simulation, and queues. Texts in Applied Mathematics. New York: Springer, 1999.
[7] Pierre Brémaud. Point processes and queues. Martingale dynamics. Springer Series in Statistics. New York-Heidelberg-Berlin: Springer-Verlag, 1981.
[8] Jean Dieudonné. Foundations of modern analysis. Enlarged and corrected printing. New York-London: Academic Press, 1969.
[9] Michael U. Dothan. Prices in financial markets. New York: Oxford University Press, 1990.
[10] Edwin Hewitt and Karl Stromberg. Real and abstract analysis. A modern treatment of the theory of functions of a real variable. 3rd printing. Graduate Texts in Mathematics 25. New York-Heidelberg-Berlin: Springer-Verlag, 1975.
[11] John C. Hull. Options, futures, and other derivatives. 5th ed. Prentice-Hall International Editions. Upper Saddle River, NJ: Prentice Hall, 2003.
[12] P. J. Hunt and J. E. Kennedy. Financial derivatives in theory and practice. Revised ed. Wiley Series in Probability and Statistics. Chichester: John Wiley & Sons, 2004.
[13] Albrecht Irle. Finanzmathematik. Die Bewertung von Derivaten. 2., überarbeitete und erweiterte Auflage. Teubner Studienbücher Mathematik. Stuttgart: Teubner, 2003. (Financial mathematics. The evaluation of derivatives.)
[14] Jean Jacod and Albert N. Shiryaev. Limit theorems for stochastic processes. 2nd ed. Grundlehren der Mathematischen Wissenschaften 288. Berlin: Springer, 2003.
[15] Ioannis Karatzas and Steven E. Shreve. Brownian motion and stochastic calculus. 2nd ed. Graduate Texts in Mathematics 113. New York: Springer-Verlag, 1991.
[16] Ioannis Karatzas and Steven E. Shreve. Methods of mathematical finance. Applications of Mathematics. New York, NY: Springer-Verlag, 1998.
[17] Marek Musiela and Marek Rutkowski. Martingale methods in financial modelling. 2nd ed. Stochastic Modelling and Applied Probability 36. Berlin: Springer, 2005.
[18] Salih N. Neftci. An introduction to the mathematics of financial derivatives. 2nd ed. Orlando, FL: Academic Press, 2000.
[19] Philip Protter. Stochastic integration and differential equations. 2nd ed. Applications of Mathematics 21. Berlin: Springer, 2004.
[20] Philip E. Protter. Stochastic integration without tears (with apology to P. A. Meyer). Stochastics, 16:295-325, 1986.
[21] A. N. Shiryaev. Probability. Translated from the Russian by R. P. Boas. 2nd ed. Graduate Texts in Mathematics 95. New York, NY: Springer-Verlag, 1995.
[22] Paul Wilmott. Paul Wilmott on Quantitative Finance. Volume One. John Wiley and Sons, 2000.
[23] Paul Wilmott. Paul Wilmott on Quantitative Finance. Volume Two. John Wiley and Sons, 2000.