Stochastic Calculus

Laura Ballotta
MSc Financial Mathematics
October 2008
© Laura Ballotta - Do not reproduce without permission.
Table of Contents
1. Review of Measure Theory and Probability Theory
(a) The basic framework: the probability space
(b) Random variables
(c) Conditional expectation
(d) Change of measure
2. Stochastic processes
(a) Some introductory definitions
(b) Classes of processes
3. Brownian motions
(a) The martingale property
(b) Construction of a Brownian motion
(c) The variation process of a Brownian motion
(d) The reflection principle and functionals of a Brownian motion
(e) Correlated Brownian motions
(f) Simulating trajectories of the Brownian motion - part 1
4. Itô Integrals and Itô Calculus
(a) Motivation
(b) The construction of the Itô integral
(c) Itô processes and stochastic calculus
(d) Stochastic differential equations
(e) Steady-state distribution
(f) The Brownian bridge and stratified Monte Carlo
5. The Change of Measure for Brownian Motions
(a) Change of probability measure: the martingale problem
(b) PDE detour
(c) Feynman-Kac representation
(d) Martingale representation theorem
References
[1] Grimmett, G. and D. Stirzaker (2003). Probability and Random Processes. Oxford
University Press.
[2] Mikosch, T. (2004). Elementary Stochastic Calculus, with Finance in View. World
Scientific Publishing Co Pte Ltd.
[3] Shreve, S. (2004). Stochastic Calculus for Finance II - Continuous-time models.
Springer Finance.
Introduction
This set of lecture notes will take you through the theory of Brownian motions and
stochastic calculus which is required for a sound understanding of modern option pricing
theory and modelling of the term structure of interest rates.
As the theory of stochastic processes has its own special “language”, the first chapter
is devoted to introducing this new notation and to revising the basic concepts
of probability theory required in the following chapters. Particular attention is given to
the conditional expectation operator, which is the building block of modern mathematical
finance. This will allow us to introduce the idea of a martingale, which underpins the theory
of contingent claim pricing. Once these concepts are clear and well understood, we will
devote the rest of the module to the Brownian motion and the rules of calculus that go
with it. These will be our main “tools” for financial applications, which are explored in
great detail in the module “Mathematical Models for Financial Derivatives”.
As the Brownian motion by construction links us to a prespecified distribution of the
increments of the process, we will introduce very briefly a more general class of processes
which can be used in the context of mathematical finance. However, the full investigation
of these processes and their applications will be the focus of the module “Advanced
Stochastic Modelling in Finance” which runs in Term 2.
The material in this booklet covers the entire module; however, it is far from being
exhaustive and students are strongly recommended to do some further reading. Some references
have been provided on the previous page.
Each chapter contains a number of sample exam questions, some in the form of solved
examples, others in the form of exercises for you to practice. Solutions to these exercises will
be posted on CitySpace at some point before the end of term, together with the solutions
to the exam papers that you will find in the very last chapter of this booklet.
Needless to say, waiting for these solutions to become available before
attempting the exercises on your own will not help you much in preparing for
the exam itself. You need to test yourself first!
1 Review of Measure Theory and Probability Theory
1.1 The basic framework: the probability space
Imagine a random experiment like the toss of a coin or the prices of securities traded in
the market in the next period of time. Imagine that we want to explore the features of this
random experiment in order to make appropriate and informed decisions. These features
could be: the expected price of the security tomorrow, or its volatility; the characteristics
of the tails of the price distribution (if for example you need to calculate some risk measure
such as VaR, or shortfall expectation).
In order to be able to do all this, we need appropriate tools describing the random
experiment in such a way that we can extract all this information, i.e. we need a
mathematical model of the random experiment. This is represented by the so-called probability
space.
Definition 1 (Probability space) We denote the probability space by the triplet
Θ := (Ω, F, P) .
A probability space can be considered as a mathematical model of a random experiment.
This definition is telling us that the probability space is made up of three building
blocks, which we are going to explore one by one.
The first piece of the probability space is Ω, which represents our sample space, i.e.
the set of all possible outcomes of the random experiment.
Example 1 Let the random experiment be defined as: choose a number from the unit
interval [0, 1]. Then Ω = {ω : 0 ≤ ω ≤ 1} = [0, 1].
Example 2 Assume now that the random experiment you are interested in is the
evolution of a stock price over an infinite time horizon, when only 2 states of nature can
occur, i.e. up or down. Then Ω is the set of all infinite sequences of ups and downs,
$$\Omega = \{\omega : \omega_1 \omega_2 \omega_3 \ldots\},$$
where $\omega_n$ is the result at the n-th period.
The second piece you need in order to have a probability space is $\mathcal{F}$, which is called a
σ-algebra. The σ-algebra of a random experiment can be interpreted as the collection of
all possible histories of the random experiment itself. Formally, it is defined as follows.
Definition 2 (σ-algebra) Given a set Ω, a collection $\mathcal{F}$ of subsets of Ω is a σ-algebra
if:
1. $\emptyset \in \mathcal{F}$;
2. $A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$;
3. $A_m \in \mathcal{F}$ for all $m$ implies $\bigcup_{m=1}^{\infty} A_m \in \mathcal{F}$ (infinite union).
Example 3
1. $\mathcal{F} = \{\emptyset, \Omega\}$ is a σ-algebra.
2. Consider some event $A \subset \Omega$. Then the σ-algebra generated by $A$ is $\mathcal{F} = \{\emptyset, \Omega, A, A^c\}$.
3. Consider the sample space defined above for the evolution of the stock price in a
2-state economy, i.e. Ω = the set of infinite sequences of ups and downs, and define
$$A_U = \{\omega : \omega_1 = U\}, \qquad A_D = \{\omega : \omega_1 = D\}.$$
The σ-algebra generated by these two sets is
$$\mathcal{F}^{(1)} = \{\emptyset, \Omega, A_U, A_D\}.$$
Now consider the sets
$$A_{UU} = \{\omega : \omega_1 = U, \omega_2 = U\}, \qquad A_{UD} = \{\omega : \omega_1 = U, \omega_2 = D\},$$
$$A_{DU} = \{\omega : \omega_1 = D, \omega_2 = U\}, \qquad A_{DD} = \{\omega : \omega_1 = D, \omega_2 = D\}.$$
Then
$$\mathcal{F}^{(2)} = \{\emptyset, \Omega, A_{UU}, A_{UD}, A_{DU}, A_{DD}, A_{UU}^c, A_{UD}^c, A_{DU}^c, A_{DD}^c, A_U, A_D, A_{UU} \cup A_{DU}, A_{UU} \cup A_{DD}, A_{DU} \cup A_{UD}, A_{UD} \cup A_{DD}\}$$
is the corresponding σ-algebra.
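The construction in Example 3 can be checked mechanically on a finite stand-in for the sample space. The sketch below is an illustration only: it truncates Ω to the first two moves (so that each outcome is one of UU, UD, DU, DD), and the helper name `generate_sigma_algebra` is hypothetical. It closes a family of generators under complement and union and counts the resulting sets.

```python
from itertools import product

# Assumption: a two-period truncation of Example 3, so each outcome is a pair of moves.
omega = frozenset("".join(w) for w in product("UD", repeat=2))

def generate_sigma_algebra(omega, generators):
    """Close a family of subsets of a finite omega under complement and union."""
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            # Candidates: the complement of a, and the union of a with every known set.
            candidates = [omega - a] + [a | b for b in current]
            for c in candidates:
                if c not in sets:
                    sets.add(c)
                    changed = True
    return sets

A_U = {"UU", "UD"}                                   # first move is up
F1 = generate_sigma_algebra(omega, [A_U])            # information after one move
atoms = [{"UU"}, {"UD"}, {"DU"}, {"DD"}]
F2 = generate_sigma_algebra(omega, atoms)            # information after two moves
print(len(F1), len(F2))
```

On a finite space, closure under complement and finite union is enough, which is why the loop above only forms pairwise unions until nothing new appears.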
Example 4 The Borel σ-algebra B on R is the σ-algebra generated by open subsets of
R.
Every σ-algebra has a set of properties that will be useful in the future.
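Theorem 3 The σ-algebra has the following properties:
1. $\Omega \in \mathcal{F}$.
2. $A_m \in \mathcal{F}$ for all $m$ implies $\bigcap_{m=1}^{\infty} A_m \in \mathcal{F}$.
Proof. 1) $\emptyset \in \mathcal{F}$ by definition, hence $\emptyset^c = \Omega \in \mathcal{F}$ as well (apply properties
1 and 2 from the previous definition).
2) By assumption, $A_m \in \mathcal{F}$ for all $m$; hence $A_m^c \in \mathcal{F}$, which implies that $\bigcup_{m=1}^{\infty} A_m^c \in \mathcal{F}$. By the
law of De Morgan (b) (see footnote 1):
$$\bigcup_{m=1}^{\infty} A_m^c = \left( \bigcap_{m=1}^{\infty} A_m \right)^c,$$
therefore
$$\left( \bigcap_{m=1}^{\infty} A_m \right)^c \in \mathcal{F}.$$
From the definition of σ-algebra, it follows that $\left( \left( \bigcap_{m=1}^{\infty} A_m \right)^c \right)^c \in \mathcal{F}$ and consequently
$$\bigcap_{m=1}^{\infty} A_m \in \mathcal{F}.$$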
The last piece of our probability space is represented by the symbol P. This is called the
probability measure, and you can consider it as a sort of “metric” that measures the
likelihood of a specific event or story of the random experiment.
Definition 4 A probability measure P is a set function $P : \mathcal{F} \to [0, 1]$ such that:
1. $P(\Omega) = 1$;
2. for any sequence of disjoint events $\{A_m\}$, $P\left( \bigcup_{m=1}^{\infty} A_m \right) = \sum_{m=1}^{\infty} P(A_m)$.
Based on this definition, you can show that
$$P(\emptyset) = 0;$$
$$P(A \cup B) = P(A) + P(B) \quad \text{whenever } A \cap B = \emptyset;$$
$$P(A^c) = 1 - P(A).$$
Moreover, we can define independent events: two events, A and B, are independent if and
only if $P(A \cap B) = P(A) P(B)$.
Example 5 Consider the previous example of the evolution of the stock price over
an infinite time horizon, so that $\Omega = \{\omega : \omega_1 \omega_2 \omega_3 \ldots\}$, and $A_U = \{\omega : \omega_1 = U\}$, $A_D = \{\omega : \omega_1 = D\}$. Assume that the different up/down movements at each time step are
independent, and let
$$P(A_U) = p; \qquad P(A_D) = q = 1 - p.$$
Then
$$P(A_{UU}) = p^2; \qquad P(A_{UD}) = P(A_{DU}) = pq; \qquad P(A_{DD}) = q^2.$$
Further, $P(A_{UU}^c) = 1 - p^2$; similarly, you can calculate the probability of each other
set in $\mathcal{F}^{(2)}$. Moreover, if $A_{UUU} = \{\omega : \omega_1 = U, \omega_2 = U, \omega_3 = U\}$, you can calculate that
$P(A_{UUU}) = p^3$. And so on. Hence, in the limit you can conclude that the probability
of the sequence UUU... is zero (provided $0 < p < 1$). The same applies for example to the sequence UDUD...;
in fact the event consisting of this sequence alone is the intersection of the events "the sequence starts with U", "starts with UD", "starts with UDU", .... From this
example, we can conclude that every single sequence in Ω has probability zero.

Footnote 1. Proposition (Law of De Morgan) (a) $(A \cup B)^c = A^c \cap B^c$. More generally: $\left( \bigcup_m A_m \right)^c = \bigcap_m A_m^c$.
(b) $(A \cap B)^c = A^c \cup B^c$. Generalising: $\left( \bigcap_m A_m \right)^c = \bigcup_m A_m^c$.
Proof. (a) Assume $x \in \bigcap_{m=1}^{\infty} A_m^c$. Then $x \in A_m^c$ for all $m$. Hence $x \notin A_m$ for all $m$, which implies $x \notin \bigcup_{m=1}^{\infty} A_m$. Therefore $x \in \left( \bigcup_{m=1}^{\infty} A_m \right)^c$.
(b) Assume $x \in \bigcup_{m=1}^{\infty} A_m^c$; then $x \in A_m^c$ for some $m$. Hence $x \notin A_m$ for the same $m$. Therefore
$x \notin \bigcap_{m=1}^{\infty} A_m$ and hence $x \in \left( \bigcap_{m=1}^{\infty} A_m \right)^c$. The other direction of each statement can be proved in a
similar fashion.
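To see how quickly the probabilities in Example 5 shrink, here is a small sketch that multiplies the one-step probabilities along a finite prefix of moves. The function name `prob_of_prefix` and the value p = 0.6 are illustrative assumptions, not part of the notes.

```python
def prob_of_prefix(prefix, p):
    """P(the first len(prefix) moves equal prefix), with independent moves and P(U) = p."""
    q = 1.0 - p
    prob = 1.0
    for move in prefix:
        prob *= p if move == "U" else q
    return prob

p = 0.6  # hypothetical up-probability
for n in (1, 2, 3, 10, 50, 200):
    # P(A_{UU...U}) = p**n, which tends to 0 as n grows whenever p < 1.
    print(n, prob_of_prefix("U" * n, p))
```

The same function applied to prefixes of UDUD... gives products of p and q, which also tend to zero.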
In the previous example, we have shown that
P(every movement is up) = 0;
this implies that this event is sure not to happen. Similarly, since the above is true, we are
sure to get at least one down movement in the sequence, although we do not know exactly
when in the sequence. Because of this fact, and the fact that the infinite sequence UUU...
is in the sample space (which means that it is still a possible outcome), mathematicians have
come up with a somewhat strange way of saying this: we will get at least one down movement
almost surely.
Definition 5 Let (Ω, F, P) be a probability space. If A ∈ F is such that
P(A) = 1,
we say that the event A occurs almost surely (a.s.).
Now, in order to introduce the next definition, consider the following, maybe a little
silly, example. Assume that you want to measure the length of a room, and assume you
express this measure in meters and centimeters. It turns out that the room is 4.30m.
long. Now assume that you want to change the reference system and express the length
of the room in terms of feet and inches. Then, the room is 14ft. long. But in the process
of switching from one reference system to the other, the room did not change: it did not
shrink; it did not expand. The same applies to events and probability measures. The idea
is given in the following.
Definition 6 (Absolutely continuous/equivalent probability measure) Given two
probability measures $P$ and $P^*$ defined on the same σ-algebra $\mathcal{F}$, then:
i) $P$ is absolutely continuous with respect to $P^*$, i.e. $P \ll P^*$, if $P(A) = 0$ whenever
$P^*(A) = 0$, for all $A \in \mathcal{F}$.
ii) If $P \ll P^*$ and also $P^* \ll P$, then $P \sim P^*$, i.e. $P$ and $P^*$ are equivalent measures.
Thus, for $P \sim P^*$ the following hold:
• $P(A) = 0 \Leftrightarrow P^*(A) = 0$ (same null sets)
• $P(A) = 1 \Leftrightarrow P^*(A) = 1$ (same a.s. sets)
• $P(A) > 0 \Leftrightarrow P^*(A) > 0$ (same sets of positive measure)
Example 6 Consider a closed interval [a, b], for 0 ≤ a ≤ b ≤ 1, and consider the
experiment of choosing a number from the unit interval [0, 1]. Define the following
$$P(\text{the number chosen is in } [a, b]) = P[a, b] := b - a.$$
But you can also define a different metric $P^*$, according to which
$$P^*(\text{the number chosen is in } [a, b]) = P^*[a, b] := b^2 - a^2.$$
As there is a conversion factor that helps you to switch between meters and feet, so that
4.30m = 14ft, there is also a conversion factor between probability measures. However,
this conversion factor depends on a few objects that we have not met yet. Therefore, the
discussion of this last feature is postponed to the end of this unit.
Exercise 1 Let A and B belong to some σ-algebra $\mathcal{F}$. Show that $\mathcal{F}$ contains the sets
$A \cap B$, $A \setminus B$, and $A \Delta B$, where Δ denotes the symmetric difference operator, i.e.
$$A \Delta B = \{x : x \in A, x \notin B \text{ or } x \notin A, x \in B\}.$$
Exercise 2 Show that for every function $f : \Omega \to \mathbb{R}$ the following hold:
1. $f^{-1}\left( \bigcup_n A_n \right) = \bigcup_n f^{-1}(A_n)$;
2. $f^{-1}\left( \bigcap_n A_n \right) = \bigcap_n f^{-1}(A_n)$;
3. $f^{-1}\left( A^C \right) = \left( f^{-1}(A) \right)^C$
for any subsets $A_n$, $A$ of $\mathbb{R}$.
Exercise 3 Let $\mathcal{F}$ be a σ-algebra of subsets of Ω and suppose that $B \in \mathcal{F}$. Show that
$\mathcal{G} = \{A \cap B : A \in \mathcal{F}\}$ is a σ-algebra of subsets of B.
Exercise 4 Let P be a probability measure on $\mathcal{F}$. Show that P has the following
properties:
1. for any $A, B \in \mathcal{F}$ such that $A \cap B = \emptyset$, $P(A \cup B) = P(A) + P(B)$;
2. for any $A, B \in \mathcal{F}$ such that $A \subset B$, $P(A) \le P(B)$ [Hint: use the fact that for any
two sets A and B such that $A \subset B$, $B = A \cup (B \setminus A)$, where we define $B \setminus A :=
\{x : x \in B, x \notin A\}$ (difference operator for sets)];
3. for any $A, B \in \mathcal{F}$ such that $A \subset B$, $P(B \setminus A) = P(B) - P(A)$.
1.2 Random variables
So far, we have considered random events, like an up or down movement in the stock price
over the next period of time, and the likelihood of such events occurring, as described by
the probability measure. The next step in which you might be interested is to “quantify”
the outcome of the random event; for example, you might want to know how much the
stock price is going to change if an up or down movement occurs in the next
time period. In order to do this, you need the idea of a random variable.
Definition 7 (Random variable) Let (Ω, F, P) be a probability space. A random
variable X is a function $X : \Omega \to \mathbb{R}$ such that $\{\omega \in \Omega \mid X(\omega) \le x\} \in \mathcal{F}$ for all $x \in \mathbb{R}$.
Note that if $B$ is an element of the Borel σ-algebra $\mathcal{B}$ of the form $B = (-\infty, x]$, $x \in \mathbb{R}$,
then Definition 7 says that $X^{-1}(B) \in \mathcal{F}$ for all $x \in \mathbb{R}$. In other words,
any random variable is a measurable function (see footnote 2), i.e. a numerical quantity whose value is
determined by the random experiment of choosing some ω ∈ Ω.
Example 7 Consider once again the random experiment of the evolution of the stock
price over an infinite time horizon in a 2-state economy, described in Example 3. Let us
define the stock prices by the formulae:
$$S_0(\omega) = 4;$$
$$S_1(\omega) = \begin{cases} 8 & \text{if } \omega_1 = \text{up} \\ 2 & \text{if } \omega_1 = \text{down} \end{cases}$$
$$S_2(\omega) = \begin{cases} 16 & \text{if } \omega_1 = \omega_2 = \text{up} \\ 4 & \text{if } \omega_1 \neq \omega_2 \\ 1 & \text{if } \omega_1 = \omega_2 = \text{down.} \end{cases}$$
All of these are random variables, assigning a numerical value to each sequence of up
and down movements in the stock price at each time period. Example 5 tells us how to
calculate the probability that the random variable S takes any of these values; for example
$$P(S_1(\omega) = 8) = P(A_U) = p;$$
$$P(S_2(\omega) = 4) = P(A_{DU} \cup A_{UD}) = 2pq.$$
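The probabilities in Example 7 can be tabulated by enumerating paths. The sketch below assumes, as the values 4, 8, 2, 16, 4, 1 suggest, that an up move doubles the price and a down move halves it; these factors, the function name `law_of_S` and the choice p = 0.5 are illustrative assumptions.

```python
from itertools import product
from collections import defaultdict

def law_of_S(n, p, s0=4.0, up=2.0, down=0.5):
    """Return {value: probability} for S_n on the n-period tree of Example 7."""
    law = defaultdict(float)
    for path in product("UD", repeat=n):
        price, prob = s0, 1.0
        for move in path:
            price *= up if move == "U" else down
            prob *= p if move == "U" else 1.0 - p
        law[price] += prob          # paths ending at the same price pool their mass
    return dict(law)

p = 0.5  # hypothetical up-probability
print(law_of_S(1, p))
print(law_of_S(2, p))
```

The dictionary returned for n = 2 has three keys, and the mass at 4 collects the two paths UD and DU, in line with P(S_2 = 4) = 2pq.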
The above Example shows that we can associate to any random variable another
function measuring the likelihood of the outcomes. This is what we call the law of X.
Precisely, by law of X we mean a probability measure on $(\mathbb{R}, \mathcal{B})$, $L_X : \mathcal{B} \to [0, 1]$, such
that
$$L_X(B) = P(X \in B) \quad \forall B \in \mathcal{B}.$$

Footnote 2. Definition (Measurable function) Let $\mathcal{F}$ be a σ-algebra on Ω and $f : \Omega \to \mathbb{R}$. For $A \subseteq \mathbb{R}$ let
$$f^{-1}(A) = \{\omega \in \Omega \mid f(\omega) \in A\};$$
then f is called $\mathcal{F}$-measurable if $f^{-1}(E) \in \mathcal{F}$ for all $E \in \mathcal{B}$, where $f^{-1}(E)$ is called the pre-image of E.
In general, we prefer to speak in terms of the distribution of a random variable; this is a
function $F_X : \mathbb{R} \to [0, 1]$ defined as
$$F_X(a) = P(X \le a) = P(\omega : X(\omega) \le a).$$
This is the law of X for any set B of the form B = (−∞, a], i.e. $F_X(a) = L_X((-\infty, a])$.
In some special cases, we can describe the distribution function of a random variable X
in even more detail. The first case is that of a discrete random variable, like the
one introduced in Example 7, which assigns lumps of mass to events. For this random
variable, we can express the distribution function as
$$F_X(a) = P(X \le a) = \sum_{x \le a} p_X(x),$$
where $p_X(x)$ is the probability mass function of X. If instead the random variable X
spreads the mass continuously over the real line, then we have a continuous random
variable and
$$F_X(a) = P(X \le a) = \int_{-\infty}^{a} f_X(x)\,dx, \qquad (1)$$
where $f_X(x)$ denotes the density function of X.
Exercise 5 Let X be a random variable. Show that the distribution $F_X$ of X defined by
$$F_X(A) = P(X \in A) = P\left( X^{-1}(A) \right), \quad A \in \mathcal{B}(\mathbb{R}),$$
is a probability measure on the σ-algebra $\mathcal{B}(\mathbb{R})$.
Remark 1 (A matter of notation) From equation (1), we see that we could write the
density function as
$$f_X(x) = \frac{dF_X}{dx} = \frac{dP(\omega)}{dx} \quad \forall x \in \mathbb{R}.$$
The expectation E of a random variable X on (Ω, F, P) is then defined by:
$$E[X] = \int_{\Omega} X(\omega)\,dP(\omega) = \int_{-\infty}^{\infty} x\,dF_X(x).$$
The expectation returns the mean of the distribution. You might also be interested in
the dispersion around the mean; this feature is described by the variance of a random
variable. Further features that characterize the distribution of a random variable are
the skewness (degree of asymmetry) and the kurtosis (behaviour of the tails). These
features are described by the moments (from the mean) of a random variable, which can
be recovered via the moment generating function (MGF)
$$M_X(k) = E\left[ e^{kX} \right] = \int_{-\infty}^{\infty} e^{kx}\,dF_X(x).$$
Example 8 A few (and very important, as we will use them throughout the entire year)
examples of random variables:
1. The Poisson random variable is an example of a discrete random variable. More
precisely, a Poisson random variable $N \sim Poi(\lambda)$, with rate λ, has probability mass
$$p_N(n) = \frac{e^{-\lambda} \lambda^n}{n!},$$
from which it follows that
$$E(N) = \lambda = Var(N); \qquad M_N(k) = e^{\lambda (e^k - 1)}.$$
2. The normal (or Gaussian) random variable $X \sim N(\mu, \sigma^2)$ is a continuous random
variable defined by the density function
$$f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}.$$
You can easily show that
$$E(X) = \mu; \qquad Var(X) = \sigma^2; \qquad M_X(k) = e^{k\mu + \frac{k^2 \sigma^2}{2}}.$$
3. Assume $X \sim \Gamma(\alpha, \lambda)$, $\alpha > 0$. Then X is a non-negative random variable which
follows a Gamma distribution; its density function is given by
$$f(x) = \frac{1}{\Gamma(\alpha)} \lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x},$$
where Γ(α) is the Gamma function, which is defined as
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx,$$
and has the property that (see footnote 3)
$$\Gamma(\alpha) = \Gamma(\alpha - 1)(\alpha - 1).$$
This means that
$$\Gamma(\alpha) = (\alpha - 1)!$$
when α is a positive integer. The MGF of X is
$$M_X(k) = \frac{1}{\Gamma(\alpha)} \lambda^{\alpha} \int_0^{\infty} x^{\alpha - 1} e^{-x(\lambda - k)}\,dx.$$
Set $y = x(\lambda - k)$, with $k < \lambda$; then
$$M_X(k) = \frac{1}{\Gamma(\alpha)} \lambda^{\alpha} \int_0^{\infty} \left( \frac{y}{\lambda - k} \right)^{\alpha - 1} \frac{e^{-y}}{\lambda - k}\,dy = \left( \frac{\lambda}{\lambda - k} \right)^{\alpha}.$$
Note that if α = 1, then X follows an exponential distribution with rate λ. Using
the MGF you can show that the Gamma random variable has mean $\mu = \alpha / \lambda$ and
variance $\nu = \alpha / \lambda^2$. The parameter α is the shape parameter, whilst λ is the rate (inverse scale)
parameter.

Footnote 3. Why don’t you try to prove this last property... just integrate by parts.
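The closed form of the Gamma MGF can be sanity-checked numerically. The sketch below is only an illustration: it approximates the defining integral with a plain midpoint rule on a truncated range, and the function name `gamma_mgf_numeric`, the truncation point and the parameter values are all assumptions chosen for the example.

```python
import math

def gamma_mgf_numeric(k, alpha, lam, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of E[exp(kX)] for X ~ Gamma(alpha, lam), with k < lam."""
    h = upper / steps
    const = lam ** alpha / math.gamma(alpha)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        # Integrand: x^(alpha-1) * exp(-lam*x) * exp(k*x), up to the constant factor.
        total += x ** (alpha - 1) * math.exp(-(lam - k) * x)
    return const * total * h

alpha, lam, k = 2.5, 1.5, 0.4   # hypothetical parameter values with k < lam
print(gamma_mgf_numeric(k, alpha, lam), (lam / (lam - k)) ** alpha)
```

The two printed numbers should agree to several decimal places; for k ≥ λ the integral diverges and the comparison is no longer meaningful.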
Moment generating functions suffer the disadvantage that the integrals which define
them may not always be finite.
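Example 9 A Cauchy random variable X has density function
$$f(x) = \frac{1}{\pi (1 + x^2)}, \quad x \in \mathbb{R}.$$
Hence the MGF of X is given by
$$M_X(k) = \int_{-\infty}^{\infty} \frac{e^{kx}}{\pi (1 + x^2)}\,dx.$$
This is an improper integral of the 1st kind which does not converge unless k = 0 (which
of course is nonsense...). In fact, if you perform the convergence test, you obtain that:
$$\lim_{x \to \infty} \left. \frac{\frac{e^{kx}}{\pi (1 + x^2)}}{\left( \frac{1}{x} \right)^{\alpha}} \right|_{\alpha = 2} = \frac{1}{\pi} \lim_{x \to \infty} e^{kx} = \begin{cases} 0 & \text{if } k < 0 \\ \infty & \text{if } k > 0, \end{cases}$$
$$\lim_{x \to -\infty} \left. \frac{\frac{e^{kx}}{\pi (1 + x^2)}}{\left( \frac{1}{x} \right)^{\alpha}} \right|_{\alpha = 2} = \frac{1}{\pi} \lim_{x \to -\infty} e^{kx} = \begin{cases} 0 & \text{if } k > 0 \\ \infty & \text{if } k < 0. \end{cases}$$
Hence, the MGF of a Cauchy random variable does not exist.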
Characteristic functions are another class of functions, equally useful, whose finiteness
is guaranteed.
Definition 8 The characteristic function of X is the function $\phi_X : \mathbb{R} \to \mathbb{C}$ defined by
$$\phi_X(u) = E\left[ e^{iuX} \right], \quad \text{where } i = \sqrt{-1}.$$
This is a common transformation and is often called the Fourier transform of the
density f of X, if this quantity exists. In this case
$$\phi_X(u) = \int e^{iux}\,dF(x) = \int e^{iux} f(x)\,dx.$$
The characteristic function of a random variable has several nice properties. Firstly,
it always exists and it is finite (in $L^1$): note that
$$\phi_X(u) = E\left[ e^{iuX} \right] = E(\cos(uX) + i \sin(uX)),$$
and (see footnote 4)
$$|\cos(uX) + i \sin(uX)| := \sqrt{\cos(uX)^2 + \sin(uX)^2} = 1.$$
Then
$$\left| E\left[ e^{iuX} \right] \right| \le E\left| e^{iuX} \right| = 1.$$
Moreover:
1. if X and Y are independent random variables, $\phi_{X+Y}(u) = \phi_X(u)\,\phi_Y(u)$;
2. if $a, b \in \mathbb{R}$ and $Y = aX + b$, then $\phi_Y(u) = e^{iub} \phi_X(au)$.
1.2.1 Examples of characteristic functions
Calculations of integrals involving complex numbers are not always pleasant; usually you
should know about contour integration... but for our purposes you can get away with
only knowing about analytic continuation.
Analytic continuation provides a way of extending the domain over which a complex
function is defined. Let us start from a complex function f (like the characteristic
function); this function is complex differentiable at $z_0$ and has derivative A if and only if
$$f(z) = f(z_0) + A (z - z_0) + o(z - z_0) \quad \text{as } z \to z_0, \ z \in \mathbb{C}.$$
A complex function is said to be analytic on a region D if it is complex differentiable at
every point in D (i.e. has no singularities, i.e. points at which the function “blows up”
or becomes degenerate). Now, let $f_1$ and $f_2$ be analytic functions on domains $D_1$ and $D_2$
respectively, with $D_1 \subset D_2$, such that $f_1 = f_2$ on $D_1 \cap D_2$. Then $f_2$ is called the analytic
continuation of $f_1$ to $D_2$. Moreover, if it exists, the analytic continuation of $f_1$ to $D_2$ is
unique.
Consider now the MGF $M_X$ of some random variable X; we can say that the function
$$M_X(z) = \int_{-\infty}^{\infty} f(x)\,e^{zx}\,dx, \quad z \in \mathbb{C},$$
is the analytic continuation of $M_X$ to the complex plane, if it respects the condition above.
Then, the characteristic function of X, $\phi_X$, is the restriction of $M_X$ to the imaginary axis,
i.e.
$$\phi_X(u) = M_X(iu).$$
And now, let’s calculate some characteristic functions.

Footnote 4. Note that this is the modulus of the complex number $z = \cos(uX) + i \sin(uX)$, and you can
interpret the notation as a norm.
1. Let $X \sim N(0, 1)$. The characteristic function is
$$\phi_X(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iux - \frac{x^2}{2}}\,dx.$$
Now consider the real valued function
$$M_X(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{kx - \frac{x^2}{2}}\,dx = e^{\frac{k^2}{2}},$$
i.e. the MGF of X. Since $\mathbb{R} \cap \mathbb{C} \neq \emptyset$, $M_X$ has an analytic continuation on the
complex plane given by
$$M_X(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{zx - \frac{x^2}{2}}\,dx = e^{\frac{z^2}{2}}, \quad z \in \mathbb{C}.$$
Therefore, by analytic continuation
$$\phi_X(u) = M_X(iu) = e^{-\frac{u^2}{2}}.$$
2. Let X be a Poisson random variable with rate λ. You can apply the same argument
as above (i.e. analytic continuation) to show that
$$\phi_X(u) = M_X(iu) = e^{\lambda (e^{iu} - 1)}.$$
3. Consider now the Gamma distribution. Analytic continuation implies that
$$\phi_X(u) = \left( \frac{\lambda}{\lambda - iu} \right)^{\alpha}.$$
4. Assume X is a Cauchy random variable, i.e.
$$f(x) = \frac{1}{\pi (1 + x^2)}.$$
We cannot use the analytic continuation argument because the function is not
analytic (can you spot why?). Here you need to use contour integration and the residue
theorem. You should obtain that
$$\phi_X(u) = e^{-|u|}.$$
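The first result in the list, the characteristic function of a standard normal, can be verified numerically without any complex analysis. The following sketch is an illustration under stated assumptions: it integrates on a truncated interval [-12, 12] with a midpoint rule, and the function name `normal_cf_numeric` is hypothetical.

```python
import cmath
import math

def normal_cf_numeric(u, lower=-12.0, upper=12.0, steps=24_000):
    """Midpoint-rule approximation of E[exp(iuX)] for X ~ N(0, 1)."""
    h = (upper - lower) / steps
    total = 0j
    for j in range(steps):
        x = lower + (j + 0.5) * h
        # exp(iux) times the unnormalised standard normal density
        total += cmath.exp(1j * u * x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

for u in (0.0, 0.5, 1.0, 2.0):
    # The real part should match exp(-u^2/2); the imaginary part should be negligible.
    print(u, normal_cf_numeric(u), math.exp(-0.5 * u * u))
```

The imaginary part vanishes because the density is symmetric about zero, so the sine component integrates to zero.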
1.3 Conditional expectation
At the beginning of this Unit, we talked about the problem of setting up a mathematical
model of a random experiment, in order to support our decision process. Specifically, we
talked about informed decisions, and we have seen that information in the probability
space is captured by the σ-algebra. Then, in the previous section, we have seen how to
quantify a random event by using random variables.
Now, consider as always that some random experiment is performed, whose outcome is
some ω ∈ Ω. Imagine that we are given some information, G, about this possible outcome,
not enough to know the precise value of ω, but enough to narrow down the possibilities.
Then, we can use this information to estimate, although not precisely, the value of the
random variable X (ω). Such an estimate is represented by the conditional expectation of
X given G.
In order to understand the definition of conditional expectation, we first need to familiarize
ourselves with the indicator function. Precisely, we use the notation $1_A$ for
$$1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{otherwise.} \end{cases}$$
Hence $1_A$ is a random variable which follows a Bernoulli distribution, taking value 1 with
probability $P(A)$, and 0 with probability $P(A^c)$. Hence $E[1_A] = P(A)$. Properties of the
indicator function are listed below.
1. $1_A + 1_{A^c} = 1_{A \cup A^c} = 1_{\Omega} = 1$;
2. $1_{A \cap B} = 1_A 1_B$.
Now, we are ready for the following.
Definition 9 (Axiomatic definition - Kolmogorov) Let (Ω, F, P) be a probability space
and X a random variable with $E|X| < \infty$. Let $\mathcal{G}$ be a sub σ-algebra of $\mathcal{F}$. Then the
random variable $Y = E[X|\mathcal{G}]$ is the conditional expectation of X with respect to $\mathcal{G}$ if:
1. Y is $\mathcal{G}$-measurable ($Y \in \mathcal{G}$).
2. $E|Y| < \infty$.
3. $\forall A \in \mathcal{G}$: $E(Y 1_A) = E(X 1_A)$, i.e. $\int_A Y\,dP = \int_A X\,dP$.
The idea is that, if X and G are somehow connected, we can expect the information
contained in G to reduce our uncertainty about X. In other words, we can better predict
X with the help of G. In fact, Definition 9 is telling us that, although the estimate
of X based on G is itself a random variable, the value of the estimate E[X|G] can be
determined from the information in G (property 1). Further, Y is an unbiased estimator
of X (property 3 with A = Ω).
Example 10 Consider once again the stock price evolution described in Example 7.
Suppose you are told that the outcome of the first stock price movement is “up”. You
can now use this information to estimate the value of $S_2$:
$$E[S_2(\omega) \mid \text{up}] = 16p + 4q = 12p + 4.$$
In this case, $\mathcal{G} = A_U$. Similarly,
$$E[S_2(\omega) \mid \text{down}] = 4p + q = 3p + 1,$$
and $\mathcal{G} = A_D$. Question: what is
$$E[S_2(\omega) \mid \mathcal{G} = A_{UD}]?$$
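The two conditional expectations computed in Example 10 can be reproduced by averaging over the second move only. This is a minimal sketch: the function name `cond_exp_S2` and the value p = 0.3 are illustrative assumptions, and the values of S_2 are those of Example 7.

```python
def cond_exp_S2(first_move, p):
    """E[S_2 | first move] on the tree of Example 7 (S_2 takes the values 16, 4, 1)."""
    s2 = {"UU": 16.0, "UD": 4.0, "DU": 4.0, "DD": 1.0}
    q = 1.0 - p
    # Given the first move, only the second move is still random.
    return p * s2[first_move + "U"] + q * s2[first_move + "D"]

p = 0.3  # hypothetical up-probability
print(cond_exp_S2("U", p), 12 * p + 4)
print(cond_exp_S2("D", p), 3 * p + 1)
```

Averaging the two conditional values with weights p and q recovers the unconditional mean of S_2, which is property 1 of the theorem that follows.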
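Theorem 10 The conditional expectation has the following properties:
1. $E[E(X|\mathcal{G})] = E[X]$, i.e. $E[Y] = E[X]$.
2. If $\mathcal{G} = \{\emptyset, \Omega\}$ (smallest σ-algebra), $E[X|\mathcal{G}] = E[X]$.
3. If $\mathcal{G} = \mathcal{F}$, $E[X|\mathcal{G}] = X$.
4. If $X \in \mathcal{G}$, $E[X|\mathcal{G}] = X$.
5. If $Z \in \mathcal{G}$, then $E[ZX|\mathcal{G}] = Z E[X|\mathcal{G}] = ZY$.
6. Let $\mathcal{G}_0 \subset \mathcal{G}$; then $E[E(X|\mathcal{G})|\mathcal{G}_0] = E[X|\mathcal{G}_0]$.
7. Let $\mathcal{G}_0 \subset \mathcal{G}$; then $E[E(X|\mathcal{G}_0)|\mathcal{G}] = E[X|\mathcal{G}_0]$.
8. If X is independent of $\mathcal{G}$, then $E[X|\mathcal{G}] = E[X]$.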
Proof. One by one:
1. Check point 3 in the previous definition for A = Ω (remember that $\Omega \in \mathcal{G}$ ...):
$E[Y 1_{\Omega}] = E[X 1_{\Omega}]$, but $1_{\Omega} = 1$.
2. Check point 3 in the axiomatic definition. For $A = \emptyset$, we have
$$\int_{\emptyset} Y\,dP = \int_{\emptyset} X\,dP = 0.$$
For $A = \Omega$,
$$E[X 1_{\Omega}] = E[X], \qquad E[E(X) 1_{\Omega}] = E[X]$$
in virtue of property 1. Hence both sides return E[X].
3. Verify the definition of conditional expectation on X for $\mathcal{G} = \mathcal{F}$:
• $X \in \mathcal{F}$ because it is $\mathcal{F}$-measurable by definition of random variable.
• $E|X| < \infty$ by assumption (axiomatic definition).
• $E(Y 1_A) = E(X 1_A)$ $\forall A \in \mathcal{G}$ holds trivially with Y = X.
In this case you have available the entire “history” of X. Hence you know everything
and therefore there is no uncertainty left.
4. If $X \in \mathcal{G}$, then we go back to the same situation as depicted in (3).
5. We prove this property for the simple case of an indicator function; hence, assume
$Z = 1_B$ for some $B \in \mathcal{G}$; then condition 3 in the definition of conditional expectation
reads:
$$\forall A \in \mathcal{G}: \quad E(ZX 1_A) = E(X 1_A 1_B) = E(X 1_{A \cap B}).$$
But since $A \cap B \in \mathcal{G}$, condition 3 implies
$$E(X 1_{A \cap B}) = E(Y 1_{A \cap B}) = E(Y 1_A 1_B) = E(ZY 1_A).$$
The extension to the case of a more general random variable relies on the
construction of a random variable as the limit of the sum of indicator functions. However,
this is beyond the scope of this unit.
6. Let $Y = E[X|\mathcal{G}]$ and $Z = E[X|\mathcal{G}_0]$. If $A \in \mathcal{G}_0$, then $E(Z 1_A) = E(X 1_A)$, but
since $\mathcal{G}_0 \subset \mathcal{G}$, $A \in \mathcal{G}$ as well, and by definition $E(Y 1_A) = E(X 1_A)$. Therefore
$E(Z 1_A) = E(Y 1_A)$ $\forall A \in \mathcal{G}_0$, which is condition 3 for $Z = E[Y|\mathcal{G}_0]$.
7. Let $Z = E[X|\mathcal{G}_0]$; then $Z \in \mathcal{G}_0$. Since $\mathcal{G}_0 \subset \mathcal{G}$, it follows that $Z \in \mathcal{G}$. Therefore
$E[Z|\mathcal{G}] = Z$.
8. $\forall A \in \mathcal{G}$: $E(X 1_A) = E(X) E(1_A) = E[E(X) 1_A]$.
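Properties 1, 6 and 7 can be checked numerically on a small finite space, where conditioning on a σ-algebra generated by a partition reduces to averaging over each block. The sketch below uses a three-move tree as the sample space; the helper `cond_exp`, the choice of X as the three-period price with doubling/halving moves, and p = 0.6 are all illustrative assumptions.

```python
from itertools import product

def cond_exp(X, prob, info):
    """E[X | G] on a finite space, where G is generated by the partition
    {w : info(w) = c}. Returns a dict mapping each outcome w to the value of E[X | G](w)."""
    mass, weighted = {}, {}
    for w, pw in prob.items():
        key = info(w)
        mass[key] = mass.get(key, 0.0) + pw
        weighted[key] = weighted.get(key, 0.0) + X[w] * pw
    return {w: weighted[info(w)] / mass[info(w)] for w in prob}

p = 0.6  # hypothetical up-probability
omega = ["".join(w) for w in product("UD", repeat=3)]
prob = {w: p ** w.count("U") * (1 - p) ** w.count("D") for w in omega}
X = {w: 4.0 * 2.0 ** (w.count("U") - w.count("D")) for w in omega}  # price after 3 moves

G0 = lambda w: w[:1]   # information after one move (coarser)
G = lambda w: w[:2]    # information after two moves (finer), so G0 is contained in G

inner = cond_exp(X, prob, G)
lhs = cond_exp(inner, prob, G0)   # E[E(X|G)|G0]
rhs = cond_exp(X, prob, G0)       # E[X|G0]
print(max(abs(lhs[w] - rhs[w]) for w in omega))
```

The printed maximum discrepancy should be at rounding-error level, which is the tower property (property 6) on this particular space.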
Exercise 6 Let $X_1, X_2, \ldots$ be identically distributed random variables with mean µ, and
let N be a random variable taking values in the non-negative integers and independent
of the $X_i$. Let $S = X_1 + X_2 + \ldots + X_N$. Show that $E(S|N) = \mu N$ and deduce that
$E(S) = \mu E(N)$.
Exercise 7 We define the conditional variance of a random variable X given a σ-algebra
$\mathcal{F}$ by
$$Var(X|\mathcal{F}) = E\left[ (X - E(X|\mathcal{F}))^2 \,|\, \mathcal{F} \right].$$
Show that
$$Var(X) = E[Var(X|\mathcal{F})] + Var[E(X|\mathcal{F})].$$
1.4 Change of measure
Let us go back to the example of measuring the length of a room and of wishing to do
this using different references. If you want to convert meters into feet, you need a “bridge”
between the two (1 ft ≈ 0.30 meters). There is something equivalent to this also for
probability measures and it is defined as follows.
Theorem 11 (Radon-Nikodým) If $P$ and $P^*$ are two probability measures on (Ω, F)
such that $P \sim P^*$, then there exists a random variable $Y \in \mathcal{F}$ such that
$$P^*(A) = \int_A Y\,dP = E[Y 1_A], \quad \forall A \in \mathcal{F}. \qquad (2)$$
Y is called the Radon-Nikodým derivative of $P^*$ with respect to $P$ and is also written as
$$Y = \frac{dP^*}{dP}.$$
Remark 2 From the discussion in Section 1.1, it should be obvious by now that Y is not
a proper derivative but more something like a likelihood ratio.
Example 11 Consider Example 6. Here we defined two metrics on the interval [a, b],
0 ≤ a ≤ b ≤ 1:
$$P(\text{the number chosen is in } [a, b]) = P[a, b] := b - a,$$
$$P^*(\text{the number chosen is in } [a, b]) = P^*[a, b] := b^2 - a^2.$$
We could be more specific and say that
$$P[a, b] = \int_a^b d\omega = \int_{[a, b]} dP(\omega);$$
$$P^*[a, b] = \int_a^b 2\omega\,d\omega = \int_{[a, b]} 2\omega\,dP(\omega).$$
The last equation is (2) with Y (ω) = 2ω.
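As a quick numerical illustration of Example 11, the integral of Y(ω) = 2ω over [a, b] can be approximated and compared with b² − a². The sketch below uses a plain midpoint rule; the function name `p_star` and the endpoints 0.2 and 0.7 are illustrative assumptions.

```python
def p_star(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of Y(w) = 2w over [a, b] with dP = dw."""
    h = (b - a) / steps
    return sum(2.0 * (a + (j + 0.5) * h) for j in range(steps)) * h

a, b = 0.2, 0.7  # hypothetical endpoints inside [0, 1]
print(p_star(a, b), b ** 2 - a ** 2)
print(p_star(0.0, 1.0))  # total mass of the new measure, which should be 1
```

The second line confirms that weighting P by Y = 2ω still produces a probability measure on [0, 1].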
Exercise 8 Consider the usual probability space (Ω, F, P) and a standard normal random
variable X, i.e. $X \sim N(0, 1)$. Define a new random variable Y as $Y = X + \theta$, and let
$\hat{P}$ be another probability measure on Ω, defined by
$$\frac{d\hat{P}}{dP} = Z,$$
where
$$Z = e^{-\theta X - \frac{\theta^2}{2}}.$$
Show that $Y \sim N(0, 1)$ on $\left( \Omega, \mathcal{F}, \hat{P} \right)$.
Note that for any random variable X, with $Y = dP^*/dP$ as in Theorem 11,
$$E^*[X] = \int X\,dP^* = \int XY\,dP = E[XY].$$
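Theorem 12 (Bayes formula) Let $P$ and $P^*$ be two equivalent probability measures on
the same measurable space (Ω, F) and let
$$Y = \frac{dP^*}{dP}$$
be the Radon-Nikodým derivative of $P^*$ with respect to $P$. Furthermore, let X be a random
variable on $(\Omega, \mathcal{F}, P^*)$ such that $E^*|X| < \infty$ and $\mathcal{G} \subset \mathcal{F}$ a sub σ-algebra of $\mathcal{F}$. Then the
following generalised version of the Bayes formula holds:
$$E^*[X|\mathcal{G}] = \frac{E[XY|\mathcal{G}]}{E[Y|\mathcal{G}]}.$$
Proof. Let $Z = E^*[X|\mathcal{G}]$. By definition: $Z \in \mathcal{G}$, $E^*|Z| < \infty$ and $E^*(Z 1_A) = E^*(X 1_A)$ $\forall A \in \mathcal{G}$. Hence
$$\int_A Z\,dP^* = \int_A X\,dP^* \ \Leftrightarrow\ \int_A ZY\,dP = \int_A XY\,dP \ \Leftrightarrow\ E(ZY 1_A) = E(XY 1_A).$$
Now
$$E(XY 1_A) = E[E(XY|\mathcal{G}) 1_A];$$
$$E(ZY 1_A) = E[E(ZY|\mathcal{G}) 1_A], \quad \forall A \in \mathcal{G}.$$
Then
$$E[(E(ZY|\mathcal{G}) - E(XY|\mathcal{G})) 1_A] = 0 \quad \forall A \in \mathcal{G},$$
which implies that $E(ZY|\mathcal{G}) = E(XY|\mathcal{G})$. Since $Z \in \mathcal{G}$, $E(ZY|\mathcal{G}) = Z E(Y|\mathcal{G})$, so that $E(XY|\mathcal{G}) = E(Y|\mathcal{G}) Z$.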
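The Bayes formula can be verified on a finite space, where the Radon-Nikodým derivative is simply the ratio of the two probability mass functions. The sketch below is an illustration only: the two-move tree, the up-probabilities 0.6 under P and 0.4 under the second measure, and the helper `cond_exp` are assumptions made for the example.

```python
from itertools import product

def cond_exp(X, prob, info):
    """E[X | G] under prob, with G generated by the partition {w : info(w) = c}."""
    mass, weighted = {}, {}
    for w, pw in prob.items():
        key = info(w)
        mass[key] = mass.get(key, 0.0) + pw
        weighted[key] = weighted.get(key, 0.0) + X[w] * pw
    return {w: weighted[info(w)] / mass[info(w)] for w in prob}

omega = ["".join(w) for w in product("UD", repeat=2)]
p, p_new = 0.6, 0.4   # hypothetical P(U) under P and under the new measure
P = {w: p ** w.count("U") * (1 - p) ** w.count("D") for w in omega}
P_star = {w: p_new ** w.count("U") * (1 - p_new) ** w.count("D") for w in omega}
Y = {w: P_star[w] / P[w] for w in omega}            # Radon-Nikodym derivative
X = {"UU": 16.0, "UD": 4.0, "DU": 4.0, "DD": 1.0}   # S_2 from Example 7
first = lambda w: w[0]                              # G: knowledge of the first move

lhs = cond_exp(X, P_star, first)                    # conditional expectation under the new measure
XY = {w: X[w] * Y[w] for w in omega}
num = cond_exp(XY, P, first)                        # E[XY | G] under P
den = cond_exp(Y, P, first)                         # E[Y | G] under P
rhs = {w: num[w] / den[w] for w in omega}
print(max(abs(lhs[w] - rhs[w]) for w in omega))
```

Both sides are computed independently: the left-hand side only uses the new measure, the right-hand side only uses P and Y.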
We will use this rule to link expectations calculated in a particular “universe” to the
ones calculated in another universe.
1.5 Some more exercises
1. a) Formally define the components of any probability space Θ = (Ω, F, P).
b) Let Ω = {1, 2, 3, 4, 5} and let $\mathcal{U}$ be the collection
$$\mathcal{U} = \{\{1, 2, 3\}, \{3, 4, 5\}\}.$$
Find the smallest σ-algebra $\mathcal{F}(\mathcal{U})$ generated by $\mathcal{U}$.
c) Define $X : \Omega \to \mathbb{R}$ by
$$X(1) = X(2) = 0; \quad X(3) = 10; \quad X(4) = X(5) = 1.$$
Define the condition of $\mathcal{F}$-measurability for X. Check if X is measurable with
respect to $\mathcal{F}(\mathcal{U})$.
d) Define $Y : \Omega \to \mathbb{R}$ by
$$Y(1) = 0; \quad Y(2) = Y(3) = Y(4) = Y(5) = 1.$$
Find the σ-algebra $\mathcal{F}(Y)$ generated by Y and show that Y is $\mathcal{F}(Y)$-measurable.
2. Let X be a non-negative random variable defined on a probability space (Ω, F, P)
with exponential distribution, which is
$$P(X \le x) = F_X(x) = 1 - e^{-\lambda x}, \quad x \ge 0,$$
where λ is a positive constant. Let $\tilde{\lambda}$ be another positive constant, and define
$$Z = \frac{\tilde{\lambda}}{\lambda} e^{-(\tilde{\lambda} - \lambda) X}.$$
Define $\tilde{P}$ by
$$\tilde{P}(A) = \int_A Z\,dP \quad \text{for all } A \in \mathcal{F}.$$
(a) Show that $\tilde{P}(\Omega) = 1$.
(b) Compute the cumulative distribution function
$$\tilde{P}(X \le x) \quad \text{for } x \ge 0$$
for the random variable X under the probability measure $\tilde{P}$.
A Set theory: quick reminder
For further references, you can look at Grimmett and Stirzaker, and Schaum (Chapter
2).
A.1 Sets, elements and subsets
• a ∈ A: stands for “a is an element of the set A”;
• if a ∈ A implies (⇒, in short) a ∈ B, then A is a subset of B, or A ⊆ B, which is
read “A is contained in B”;
• A = B ⇐⇒ (read: “if and only if”) A ⊆ B and B ⊆ A;
• Negations:
$a \notin A$; $A \not\subseteq B$; $A \neq B$
• If A ⊆ B and $A \neq B$, then A ⊂ B (proper subset)
• An example: let A = {1, 3, 5, 7, 9}; B = {1, 2, 3, 4, 5}; C = {3, 5}. Then:
– C ⊂ A
– C ⊂ B
– $A \not\subseteq B$
– $B \not\subseteq A$
• Sets can be specified in
– tabular form (roster method): A = {1, 3, 5, 7, 9}
– set-builder form (property method): B = {x : x is an even integer, x > 0}
• Special sets:
– Universal set U
– Empty set ∅: $S = \{x : x \text{ is a positive integer}, x^2 = 3\} = \emptyset$
A.2 Union and intersection
• Union of A and B: set of all elements which belong either to A, B, or both:
$$A \cup B := \{x : x \in A \text{ or } x \in B\}$$
• Intersection of A and B: set of all elements which belong to both A and B:
$$A \cap B := \{x : x \in A \text{ and } x \in B\}$$
• If $A \cap B = \emptyset$, then A and B are disjoint.
• If A ⊆ B, then
$$A \cup B = B, \qquad A \cap B = A$$
A.2.1 Properties
• $A \cup \emptyset = A$; $A \cap \emptyset = \emptyset$
• If A ⊆ U, $A \cup U = U$ and $A \cap U = A$
• Commutative Law
$$A \cup B = B \cup A, \qquad A \cap B = B \cap A$$
• Associative Law
$$(A \cup B) \cup C = A \cup (B \cup C), \qquad (A \cap B) \cap C = A \cap (B \cap C)$$
• Distributive Law
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C), \qquad A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$
• Idempotent Law
$$A \cup A = A, \qquad A \cap A = A$$
A.3 Complements and difference
• The set (absolute) complement of A is defined as
$$A^C = \{x : x \in U, x \notin A\}$$
i.e. the set of elements which do not belong to A;
• The set relative complement of B with respect to A (or difference of A and B) is
defined as
$$A \setminus B = \{x : x \in A, x \notin B\}$$
! Note that
$$A \setminus B = A \cap B^C$$
$$A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$$
$$A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$$
Example 12 Let
U = {1, 2, 3, 4, 5, ...}
A = {1, 2, 3}
B = {3, 4, 5, 6, 7}
then
$A^c$ = {4, 5, 6, ...}
A\B = {1, 2}
Note:
• $A \cup A^c = U$
• $A \cap A^c = \emptyset$
A.3.1 Properties
• $(A^c)^c = A$
• if A ⊂ B, then $B^c \subset A^c$
• De Morgan Laws
– $(A \cup B)^c = A^c \cap B^c$
– $(A \cap B)^c = A^c \cup B^c$
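The identities of this appendix are easy to test with Python's built-in sets. The sketch below reuses A and B from Example 12; since the universal set there is infinite, a finite stand-in U = {1, ..., 10} is assumed, and the third set C is a hypothetical choice made only to exercise the three-set identities.

```python
U = set(range(1, 11))     # assumption: finite stand-in for {1, 2, 3, ...}
A = {1, 2, 3}
B = {3, 4, 5, 6, 7}
C = {2, 3, 8}             # hypothetical third set

def complement(S):
    """Absolute complement relative to the universal set U."""
    return U - S

print(sorted(complement(A)))                                 # elements of U not in A
print(sorted(A - B))                                         # difference A \ B
print(A - B == A & complement(B))                            # A \ B = A ∩ B^C
print(A - (B | C) == (A - B) & (A - C))                      # difference of a union
print(A - (B & C) == (A - B) | (A - C))                      # difference of an intersection
print(complement(A | B) == complement(A) & complement(B))    # De Morgan (a)
print(complement(A & B) == complement(A) | complement(B))    # De Morgan (b)
```

A check on one particular choice of sets is of course not a proof, but it is a convenient way to catch a mis-stated identity.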
A.4 Further definitions
• A×B := {(x, y) : x ∈ A, y ∈ B} is the Cartesian product of A and B
• A is finite if it is empty or if it consists of exactly n elements, where n is a positive
integer;
• Otherwise A is infinite;
• A is countable if it is finite or if its elements can be listed in the form of a sequence
(countably infinite)
• Otherwise A is uncountable
Example 13 • A = {letters of the English alphabet}
• D = {days of the week}
• R = {x : x is a river on Earth}
• Y = {x : x is a positive integer, x is even} = {2, 4, 6, 8, ...}
• I = {x : 0 ≤ x ≤ 1}
B Modes of convergence of a random variable
Let $\{X_m\}_{m \in \mathbb{N}}$ be a sequence of random variables, and let X be another random variable.
Then:
• ALMOST SURE CONVERGENCE: $X_m \xrightarrow{a.s.} X$ if the event
$\{\omega \in \Omega : X_m(\omega) \to X(\omega) \text{ as } m \to \infty\}$ has probability 1.
• CONVERGENCE IN PROBABILITY: $X_m \xrightarrow{P} X$ if, ∀ε > 0,
$$\lim_{m \to \infty} P(|X_m - X| > \varepsilon) = 0.$$
• CONVERGENCE IN $L^p$ (in $L^p$ mean): $X_m \xrightarrow{L^p} X$ if
$$\lim_{m \to \infty} E(|X_m - X|^p) = 0.$$
• CONVERGENCE IN DISTRIBUTION: $X_m \xrightarrow{D} X$ if
$$\lim_{m \to \infty} P(X_m \le x) = P(X \le x)$$
at every point $x \in \mathbb{R}$ at which $x \mapsto P(X \le x)$ is continuous.
B.1 Further convergences
• MONOTONE CONVERGENCE: if $0 \le X_m \uparrow X$ a.s., then $E(X_m) \uparrow E(X) \le \infty$,
or equivalently, $\lim_{m \to \infty} E(X_m) = E(\lim_{m \to \infty} X_m) = E(X)$, as $X = \lim_{m \to \infty} X_m$.
• DOMINATED CONVERGENCE: for $X_m \to X$ a.s., if $|X_m| \le Y(\omega)$ with $E(Y) < \infty$, then
$$E(|X_m - X|) \to 0.$$
In particular, $\lim_{m \to \infty} E(X_m) = E(X)$.
• BOUNDED CONVERGENCE THEOREM: for $X_m \to X$ a.s., if $|X_m| \le K$ for some constant K, then
$$E(|X_m - X|) \to 0;$$
this is implied by dominated convergence.
