# STOCHASTIC ANALYSIS NOTES

Ivan F Wilde

Chapter 1 Introduction

We know from elementary probability theory that probabilities of disjoint events “add up”, that is, if $A$ and $B$ are events with $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$. For infinitely many events the situation is a little more subtle. For example, suppose that $X$ is a random variable with uniform distribution on the interval $(0,1)$. Then $\{ X \in (0,1) \} = \bigcup_{s \in (0,1)} \{ X = s \}$ and $P(X \in (0,1)) = 1$ but $P(X = s) = 0$ for every $s \in (0,1)$. So the probabilities on the right hand side do not “add up” to that on the left hand side. A satisfactory theory must be able to cope with this.

Continuing with this uniform distribution example, given an arbitrary subset $S \subset (0,1)$, we might wish to know the value of $P(X \in S)$. This seems a reasonable request but can we be sure that there is an answer, even in principle? We will consider the following similar question. Does it make sense to talk about the length of every subset of $\mathbb{R}$? More precisely, does there exist a “length” or “measure” $m$ defined on all subsets of $\mathbb{R}$ such that the following hold?

1. $m(A) \ge 0$ for all $A \subset \mathbb{R}$ and $m(\emptyset) = 0$.
2. If $A_1, A_2, \dots$ is any sequence of pairwise disjoint subsets of $\mathbb{R}$, then $m\big(\bigcup_n A_n\big) = \sum_n m(A_n)$.
3. If $I$ is an interval $[a,b]$, then $m(I) = \ell(I) = b - a$, the “length” of $I$.
4. $m(A + a) = m(A)$ for any $A \subset \mathbb{R}$ and $a \in \mathbb{R}$ (translation invariance).

The answer to this question is “no”, it is not possible. This is a famous “no-go” result.

Proof. To see this, we first construct a special collection of subsets of $[0,1)$ defined via the following equivalence relation. For $x, y \in [0,1)$, we say that $x \sim y$ if and only if $x - y \in \mathbb{Q}$. Evidently this is an equivalence relation on $[0,1)$ and so $[0,1)$ is partitioned into a family of equivalence classes.

One such is $\mathbb{Q} \cap [0,1)$. Indeed, if a class contains one rational, then it must contain all the rationals which are in $[0,1)$. No other equivalence class can contain a rational.

Choose one point from each equivalence class to form the set $A_0$, say. Then no two elements of $A_0$ are equivalent and every equivalence class has a representative which belongs to $A_0$. For each $q \in \mathbb{Q} \cap [0,1)$, we construct a subset $A_q$ of $[0,1)$ as follows. Let $A_q = \{ x + q - [x + q] : x \in A_0 \}$ where $[t]$ denotes the integer part of the real number $t$. In other words, $A_q$ is got from $A_0 + q$ by translating that portion of $A_0 + q$ which lies in the interval $[1,2)$ back to $[0,1)$ simply by subtracting 1 from each such element. If we write $A_0 = B'_q \cup B''_q$ where $B'_q = \{ x \in A_0 : x + q < 1 \}$ and $B''_q = A_0 \setminus B'_q$, then we see that $A_q = (B'_q + q) \cup (B''_q + q - 1)$, a union of two disjoint sets. The additivity and translation invariance of $m$ imply that $m(A_q) = m(B'_q) + m(B''_q) = m(A_0)$.

Claim. If $r, s \in \mathbb{Q} \cap [0,1)$ with $r \ne s$, then $A_r \cap A_s = \emptyset$.

Proof. Suppose that $x \in A_r \cap A_s$. Then there is $\alpha \in A_0$ such that $x = \alpha + r - [\alpha + r]$ and $\beta \in A_0$ such that $x = \beta + s - [\beta + s]$. It follows that $\alpha \sim \beta$ which is not possible unless $\alpha = \beta$ which would mean that $r = s$. This proves the claim.

Claim. $\bigcup_{q \in \mathbb{Q} \cap [0,1)} A_q = [0,1)$.

Proof. Since $A_q \subset [0,1)$, we need only show that the right hand side is a subset of the left hand side. To establish this, let $x \in [0,1)$. Then there is $a \in A_0$ such that $x \sim a$, that is $x - a \in \mathbb{Q}$.

Case 1. $x \ge a$. Put $q = x - a \in \mathbb{Q} \cap [0,1)$. Then $x = a + q$ so that $x \in A_q$.

Case 2. $x < a$. Put $r = 1 + x - a$. Since both $x \in [0,1)$ and $a \in [0,1)$, it follows that $2 > 1 + x > a$. Then $0 < r < 1$ (because $x < a$) and $r \in \mathbb{Q}$. Moreover $a + r = 1 + x \in [1,2)$, so that $[a + r] = 1$. Hence $x = a + r - 1$ so that $x \in A_r$. The claim is proved.

Finally, we get our contradiction. We have $m([0,1)) = \ell([0,1)) = 1$. But
$m([0,1)) = m\big(\bigcup_{q \in \mathbb{Q} \cap [0,1)} A_q\big) = \sum_{q \in \mathbb{Q} \cap [0,1)} m(A_q)$
which is not possible since $m(A_q) = m(A_0)$ for all $q \in \mathbb{Q} \cap [0,1)$: the sum is $0$ if $m(A_0) = 0$ and is infinite otherwise.

We have seen that the notion of length simply cannot be assigned to every subset of $\mathbb{R}$. We might therefore expect that it is not possible to assign a probability to every subset of the sample space, in general. The following result is a further very dramatic indication that we may not be able to do everything that we might like.

Theorem 1.1 (Banach-Tarski). It is possible to cut up a sphere of radius one (in $\mathbb{R}^3$) into a finite number of pieces and then reassemble these pieces (via standard Euclidean motions in $\mathbb{R}^3$) into a sphere of radius 2.

Moral: all the fuss with σ-algebras and so on really is necessary if we want to develop a robust (and rigorous) theory.

ifwilde Notes

Some useful results

Frequent use is made of the following.

Proposition 1.2. Let $(A_n)$ be a sequence of events such that $P(A_n) = 1$ for all $n$. Then $P\big(\bigcap_n A_n\big) = 1$.

Proof. We note that $P(A_n^c) = 0$ and so
$P\big(\bigcup_n A_n^c\big) = \lim_{m \to \infty} P\big(\bigcup_{n=1}^m A_n^c\big) = 0$
because $P\big(\bigcup_{n=1}^m A_n^c\big) \le \sum_{n=1}^m P(A_n^c) = 0$ for every $m$. But then
$P\big(\bigcap_n A_n\big) = 1 - P\big(\big(\bigcap_n A_n\big)^c\big) = 1 - P\big(\bigcup_n A_n^c\big) = 1$
as required.

Suppose that $(A_n)$ is a sequence of events in a probability space $(\Omega, \Sigma, P)$. We define the event $\{ A_n \text{ infinitely-often} \}$ to be the event
$\{ A_n \text{ infinitely-often} \} = \{ \omega \in \Omega : \forall N \ \exists n > N \text{ such that } \omega \in A_n \}$.
Evidently, $\{ A_n \text{ infinitely-often} \} = \bigcap_n \bigcup_{k \ge n} A_k$ and so $\{ A_n \text{ infinitely-often} \}$ really is an event, that is, it belongs to $\Sigma$.

Lemma 1.3 (Borel-Cantelli).
(i) (First Lemma) Suppose that $(A_n)$ is a sequence of events such that $\sum_n P(A_n)$ is convergent. Then $P(A_n \text{ infinitely-often}) = 0$.
(ii) (Second Lemma) Let $(A_n)$ be a sequence of independent events such that $\sum_n P(A_n)$ is divergent. Then $P(A_n \text{ infinitely-often}) = 1$.

Proof. (i) By hypothesis, it follows that for any $\varepsilon > 0$ there is $N \in \mathbb{N}$ such that $\sum_{n=N}^{\infty} P(A_n) < \varepsilon$. Now
$P(A_n \text{ infinitely-often}) = P\big(\bigcap_n \bigcup_{k \ge n} A_k\big) \le P\big(\bigcup_{k \ge N} A_k\big)$.
But for any $m$,
$P\big(\bigcup_{k=N}^m A_k\big) \le \sum_{k=N}^m P(A_k) < \varepsilon$
and so, taking the limit $m \to \infty$, it follows that
$P(A_n \text{ infinitely-often}) \le P\big(\bigcup_{k \ge N} A_k\big) = \lim_m P\big(\bigcup_{k=N}^m A_k\big) \le \varepsilon$
and therefore $P(A_n \text{ infinitely-often}) = 0$, as required.

November 22, 2004
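The choice of $N$ in the proof of the First Lemma can be made concrete. The following is a minimal sketch, assuming a hypothetical sequence of events with $P(A_n) = 2^{-n}$, truncated at $n = 50$; the function names are illustrative and not part of the notes. It finds the first index from which the tail sum of the probabilities, which bounds the probability of the corresponding union, falls below a given $\varepsilon$.

```python
from fractions import Fraction


def tail_sum(probs, start):
    """Sum of probs[start:], an upper bound for P(union of the events from index start on)."""
    return sum(probs[start:], Fraction(0))


def first_index_with_small_tail(probs, eps):
    """Smallest index N with tail_sum(probs, N) < eps, or None if there is none."""
    for start in range(len(probs) + 1):
        if tail_sum(probs, start) < eps:
            return start
    return None


# Hypothetical example: P(A_n) = 2**(-n) for n = 1, ..., 50 (index 0 holds n = 1).
probs = [Fraction(1, 2 ** n) for n in range(1, 51)]
N = first_index_with_small_tail(probs, Fraction(1, 100))
print(N, float(tail_sum(probs, N)))
```

Exact rational arithmetic is used here so that the comparison with $\varepsilon$ is not affected by rounding.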

(ii) We have
$P(A_n \text{ infinitely-often}) = P\big(\bigcap_n \bigcup_{k \ge n} A_k\big) = \lim_n P\big(\bigcup_{k \ge n} A_k\big) = \lim_n \lim_m P\big(\bigcup_{k=n}^{m+n} A_k\big)$.
The proof now proceeds by first taking complements, then invoking the independence hypothesis and finally by the inspirational use of the inequality $1 - x \le e^{-x}$ for any $x \ge 0$. Indeed, we have
$P\big(\big(\bigcup_{k=n}^{m+n} A_k\big)^c\big) = P\big(\bigcap_{k=n}^{m+n} A_k^c\big)$
$= \prod_{k=n}^{m+n} P(A_k^c)$, by independence,
$\le \prod_{k=n}^{m+n} e^{-P(A_k)}$, since $P(A_k^c) = 1 - P(A_k) \le e^{-P(A_k)}$,
$= e^{-\sum_{k=n}^{m+n} P(A_k)} \to 0$
as $m \to \infty$, since $\sum_k P(A_k)$ is divergent. It follows that $P\big(\bigcup_{k \ge n} A_k\big) = 1$ and so $P(A_n \text{ infinitely-often}) = 1$.

Example 1.4. A fair coin is tossed repeatedly. Let $A_n$ be the event that the outcome at the $n$th play is “heads”. Then $P(A_n) = \frac{1}{2}$ and evidently $\sum_n P(A_n)$ is divergent (and $A_1, A_2, \dots$ are independent). It follows that $P(A_n \text{ infinitely-often}) = 1$. In other words, in a sequence of coin tosses, there will be an infinite number of “heads” with probability one.

Now let $B_n$ be the event that the outcomes of the five consecutive plays at the times $5n$, $5n+1$, $5n+2$, $5n+3$ and $5n+4$ are all “heads”. Then $P(B_n) = (\frac{1}{2})^5$ and so $\sum_n P(B_n)$ is divergent. Moreover, the events $B_1, B_2, \dots$ are independent and so $P(B_n \text{ infinitely-often}) = 1$. In particular, it follows that, with probability one, there is an infinite number of “5 heads in a row”.

Functions of a random variable

If $X$ is a random variable and $g : \mathbb{R} \to \mathbb{R}$ is a Borel function, then $Y = g(X)$ is $\sigma(X)$-measurable (where $\sigma(X)$ denotes the σ-algebra generated by $X$). The converse is also true. If $X$ is discrete, then one can proceed fairly directly. Suppose, by way of illustration, that $X$ assumes the finite number of distinct values $x_1, \dots, x_m$ and that $\Omega = \bigcup_{k=1}^m A_k$ where $X = x_k$ on $A_k$. Then $\sigma(X)$ is generated by the finite collection $\{ A_1, \dots, A_m \}$ and so any $\sigma(X)$-measurable random variable $Y$ must have the form $Y = \sum_k y_k 1_{A_k}$ for some $y_1, \dots, y_m$ in $\mathbb{R}$. Define $g : \mathbb{R} \to \mathbb{R}$ by setting $g(x) = \sum_{k=1}^m y_k 1_{\{x_k\}}(x)$. Then $g$ is a Borel function and $Y = g(X)$.

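To prove the converse (that every $\sigma(X)$-measurable random variable is a Borel function of $X$) in the general case, one takes a far less direct approach.

Theorem 1.5. Let $X$ be a random variable on $(\Omega, \mathcal{S}, P)$ and suppose that $Y$ is $\sigma(X)$-measurable. Then there is a Borel function $g : \mathbb{R} \to \mathbb{R}$ such that $Y = g(X)$.

Proof. Let $\mathcal{C}$ denote the class of Borel functions of $X$,
$\mathcal{C} = \{ \varphi(X) : \text{for some Borel function } \varphi : \mathbb{R} \to \mathbb{R} \}$.
Then $\mathcal{C}$ has the following properties.

(i) $1_A \in \mathcal{C}$ for any $A \in \sigma(X)$. To see this, we note that $\sigma(X) = X^{-1}(\mathcal{B})$ and so there is $B \in \mathcal{B}$ such that $A = X^{-1}(B)$. Hence $1_A(\omega) = 1_B(X(\omega))$ and so $1_A \in \mathcal{C}$ because $1_B : \mathbb{R} \to \mathbb{R}$ is Borel.

(ii) Clearly, any linear combination of members of $\mathcal{C}$ also belongs to $\mathcal{C}$. This fact, together with (i), means that $\mathcal{C}$ contains all simple functions on $(\Omega, \sigma(X))$.

(iii) $\mathcal{C}$ is closed under pointwise convergence. Indeed, suppose that $(\varphi_n)$ is a sequence of Borel functions on $\mathbb{R}$ such that $\varphi_n(X(\omega)) \to Z(\omega)$ for each $\omega \in \Omega$. We wish to show that $Z \in \mathcal{C}$, that is, that $Z = \varphi(X)$ for some Borel function $\varphi : \mathbb{R} \to \mathbb{R}$.

Let $B = \{ s \in \mathbb{R} : \lim_n \varphi_n(s) \text{ exists} \}$. Then
$B = \{ s \in \mathbb{R} : (\varphi_n(s)) \text{ is a Cauchy sequence in } \mathbb{R} \} = \bigcap_{k \in \mathbb{N}} \bigcup_{N \in \mathbb{N}} \bigcap_{m,n > N} \{ \varphi_m(s) - \tfrac{1}{k} < \varphi_n(s) < \varphi_m(s) + \tfrac{1}{k} \}$,
where each set in the intersection is $\{ \varphi_n(s) < \varphi_m(s) + \tfrac{1}{k} \} \cap \{ \varphi_m(s) - \tfrac{1}{k} < \varphi_n(s) \}$, and it follows that $B \in \mathcal{B}$. Let $\varphi : \mathbb{R} \to \mathbb{R}$ be given by $\varphi(s) = \lim_n \varphi_n(s)$ for $s \in B$ and $\varphi(s) = 0$ for $s \notin B$.

We claim that $\varphi$ is a Borel function on $\mathbb{R}$. To see this, let $\psi_n(s) = \varphi_n(s) 1_B(s)$. Each $\psi_n$ is Borel and $\psi_n(s)$ converges to $\varphi(s)$ for each $s \in \mathbb{R}$. It follows that $\varphi$ is also a Borel function on $\mathbb{R}$. Indeed,
$\{ s : \varphi(s) < \alpha \} = \bigcup_{k \in \mathbb{N}} \bigcup_{N \in \mathbb{N}} \bigcap_{n \ge N} \{ s : \psi_n(s) < \alpha - \tfrac{1}{k} \}$
is a Borel subset of $\mathbb{R}$ for any $\alpha \in \mathbb{R}$.

To complete the proof of (iii), we note that for any $\omega \in \Omega$, by hypothesis, $\varphi_n(X(\omega))$ converges (to $Z(\omega)$) and so $X(\omega) \in B$ for all $\omega \in \Omega$. Furthermore, since $X(\omega) \in B$, we have $\varphi_n(X(\omega)) \to \varphi(X(\omega))$. But $\varphi_n(X(\omega)) \to Z(\omega)$ and so we conclude that $Z = \varphi(X)$ and is of the required form.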

Finally, to complete the proof of the theorem, we note that any $\sigma(X)$-measurable function $Y : \Omega \to \mathbb{R}$ is a pointwise limit of simple functions on $(\Omega, \sigma(X))$ and so must belong to $\mathcal{C}$, by (ii) and (iii) above. In other words, any such $Y$ has the form $Y = g(X)$ for some Borel function $g : \mathbb{R} \to \mathbb{R}$.

$L^p$ versus $\mathcal{L}^p$

Let $(\Omega, \mathcal{S}, P)$ be a probability space. For any $1 \le p < \infty$, the space $\mathcal{L}^p(\Omega, \mathcal{S}, P)$ is the collection of measurable functions (random variables) $f : \Omega \to \mathbb{R}$ (or $\mathbb{C}$) such that $E(|f|^p) < \infty$, i.e., $\int_\Omega |f(\omega)|^p \, dP$ is finite. One can show that $\mathcal{L}^p$ is a linear space. Let
$\|f\|_p = \big( \int_\Omega |f(\omega)|^p \, dP \big)^{1/p}$.
Then (using Minkowski's inequality), one can show that
$\|f + g\|_p \le \|f\|_p + \|g\|_p$
for any $f, g \in \mathcal{L}^p$. As a consequence, $\|\cdot\|_p$ is a semi-norm on $\mathcal{L}^p$, but not a norm. Indeed, if $g = 0$ almost surely, then $\|g\|_p = 0$ even though $g$ need not be the zero function.

It is interesting to note that if $q$ obeys $1/p + 1/q = 1$ ($q$ is called the exponent conjugate to $p$), then
$\|f\|_p = \sup\{ \int_\Omega |f g| \, dP : \|g\|_q \le 1 \}$.
Of further interest is Hölder's inequality
$\int_\Omega |f g| \, dP \le \|f\|_p \, \|g\|_q$
for any $f \in \mathcal{L}^p$ and $g \in \mathcal{L}^q$ and with $1/p + 1/q = 1$. This reduces (essentially) to the Cauchy-Schwarz inequality when $p = 2$ (so that $q = 2$ also).

To “correct” the norm-seminorm issue, consider equivalence classes of functions, as follows. Declare functions $f$ and $g$ in $\mathcal{L}^p$ to be equivalent, written $f \sim g$, if $f = g$ almost surely. In other words, if $\mathcal{N}$ denotes the collection of random variables equal to 0 almost surely, then $f \sim g$ if and only if $f - g \in \mathcal{N}$. (Note that every element $h$ of $\mathcal{N}$ belongs to every $\mathcal{L}^p$ and obeys $\|h\|_p = 0$.) One sees that $\sim$ is an equivalence relation. (Clearly $f \sim f$, and if $f \sim g$ then $g \sim f$. To see that $f \sim g$ and $g \sim h$ implies that $f \sim h$, note that $\{ f = g \} \cap \{ g = h \} \subset \{ f = h \}$. But $P(f = g) = P(g = h) = 1$ so that $P(f = h) = 1$, i.e., $f \sim h$.)

For any $f \in \mathcal{L}^p$, let $[f]$ be the equivalence class containing $f$, so that $[f] = f + \mathcal{N}$. Let $L^p(\Omega, \mathcal{S}, P)$ denote the collection of equivalence classes $\{ [f] : f \in \mathcal{L}^p \}$. $L^p(\Omega, \mathcal{S}, P)$ is a linear space equipped with the rule
$\alpha [f] + \beta [g] = [\alpha f + \beta g]$.
One notes that if $f_1 \sim f$ and $g_1 \sim g$, then there are elements $h', h''$ of $\mathcal{N}$ such that $f_1 = f + h'$ and $g_1 = g + h''$. Then $\alpha f_1 + \beta g_1 = \alpha f + \beta g + (\alpha h' + \beta h'')$, where $\alpha h' + \beta h'' \in \mathcal{N}$.

This shows that $\alpha f_1 + \beta g_1 \sim \alpha f + \beta g$. In other words, the definition of $\alpha [f] + \beta [g]$ above does not depend on the particular choice of $f \in [f]$ or $g \in [g]$ used, so this really does determine a linear structure on $L^p$.

Next, we define $|||[f]|||_p = \|f\|_p$. If $f \sim f_1$, then $\|f\|_p = \|f_1\|_p$, so $|||\cdot|||_p$ is well-defined on $L^p$. In fact, $|||\cdot|||_p$ is a norm on $L^p$. (If $|||[f]|||_p = 0$, then $\|f\|_p = 0$ so that $f \in \mathcal{N}$, i.e., $f \sim 0$ so that $[f]$ is the zero element of $L^p$.) It is usual to write $|||\cdot|||_p$ as just $\|\cdot\|_p$. One can think of $\mathcal{L}^p$ and $L^p$ as “more or less the same thing” except that in $L^p$ one simply identifies functions which are almost surely equal.

Note: this whole discussion applies to any measure space, not just probability spaces. The fact that the measure has total mass one is irrelevant here.

Riesz-Fischer Theorem

Theorem 1.6 (Riesz-Fischer). Let $1 \le p < \infty$ and suppose that $(f_n)$ is a Cauchy sequence in $L^p$. Then there is some $f \in L^p$ such that $f_n \to f$ in $L^p$ and there is some subsequence $(f_{n_k})$ such that $f_{n_k} \to f$ almost surely as $k \to \infty$.

Proof. Put $d(N) = \sup_{n,m \ge N} \|f_n - f_m\|_p$. Since $(f_n)$ is a Cauchy sequence in $L^p$, it follows that $d(N) \to 0$ as $N \to \infty$. Hence there is some sequence $n_1 < n_2 < n_3 < \dots$ such that $d(n_k) < 1/2^k$. In particular, it is true that $\|f_{n_{k+1}} - f_{n_k}\|_p < 1/2^k$. Let us write $g_k$ for $f_{n_k}$, simply for notational convenience.

Let $A_k = \{ \omega : |g_{k+1}(\omega) - g_k(\omega)| \ge 1/k^2 \}$. By Chebyshev's inequality,
$P(A_k) \le k^{2p} E(|g_{k+1} - g_k|^p) = k^{2p} \|g_{k+1} - g_k\|_p^p \le k^{2p}/2^{kp} \le k^{2p}/2^k$.
This estimate implies that $\sum_k P(A_k) < \infty$ and so, by the Borel-Cantelli Lemma, $P(A_k \text{ infinitely-often}) = 0$. In other words, if $B = \{ \omega : \omega \in A_k \text{ for at most finitely many } k \}$, then clearly $B = \{ A_k \text{ infinitely-often} \}^c$ so that $P(B) = 1$. But for $\omega \in B$, there is some $N$ (which may depend on $\omega$) such that $|g_{k+1}(\omega) - g_k(\omega)| < 1/k^2$ for all $k > N$. It follows that
$g_{j+1}(\omega) = g_1(\omega) + \sum_{k=1}^{j} (g_{k+1}(\omega) - g_k(\omega))$
converges (absolutely) for each $\omega \in B$ as $j \to \infty$. Define the function $f$ on $\Omega$ by $f(\omega) = \lim_j g_j(\omega)$ for $\omega \in B$ and $f(\omega) = 0$, otherwise. Then $g_j \to f$ almost surely. (Note that $f$ is measurable because $f = \lim_j g_j 1_B$ on $\Omega$.)

We claim that $f_n \to f$ in $L^p$. To see this, first observe that
$\|g_{m+1} - g_j\|_p = \big\| \sum_{k=j}^m (g_{k+1} - g_k) \big\|_p \le \sum_{k=j}^m \|g_{k+1} - g_k\|_p < \sum_{k=j}^\infty 2^{-k}$.
Hence, by Fatou's Lemma, letting $m \to \infty$, we find
$\|f - g_j\|_p \le \sum_{k=j}^\infty 2^{-k}$.
In particular, $f - g_j \in L^p$ and so $f = (f - g_j) + g_j \in L^p$.

Finally, let $\varepsilon > 0$ be given. Let $N$ be such that $d(N) < \frac{1}{2}\varepsilon$ and choose any $j$ such that $n_j > N$ and $\sum_{k=j}^\infty 2^{-k} < \frac{1}{2}\varepsilon$. Then for any $n > N$, we have
$\|f - f_n\|_p \le \|f - f_{n_j}\|_p + \|f_{n_j} - f_n\|_p \le \sum_{k=j}^\infty 2^{-k} + d(N) < \varepsilon$,
that is, $f_n \to f$ in $L^p$.

Proposition 1.7 (Chebyshev's inequality). For any $c > 0$ and $1 \le p < \infty$,
$P(|X| \ge c) \le c^{-p} \|X\|_p^p$
for any $X \in L^p$.

Proof. Let $A = \{ \omega : |X| \ge c \}$. Then
$\|X\|_p^p = \int_\Omega |X|^p \, dP = \int_A |X|^p \, dP + \int_{\Omega \setminus A} |X|^p \, dP \ge \int_A |X|^p \, dP \ge c^p P(A)$
as required.

Remark 1.8. For any random variable $g$ which is bounded almost surely, let
$\|g\|_\infty = \inf\{ M : |g| \le M \text{ almost surely} \}$.
Then $\|g\|_p \le \|g\|_\infty$ and $\|g\|_\infty = \lim_{p \to \infty} \|g\|_p$. To see this, suppose that $g$ is bounded with $|g| \le M$ almost surely. Then $\|g\|_p^p = \int_\Omega |g|^p \, dP \le M^p$.


Consequently $\|g\|_p \le M$ for every such $M$, and so $\|g\|_p$ is a lower bound for the set $\{ M : |g| \le M \text{ almost surely} \}$. It follows that $\|g\|_p \le \|g\|_\infty$. For the last part, note that if $\|g\|_\infty = 0$, then $g = 0$ almost surely. ($g \ne 0$ on the set $\bigcup_n \{ |g| > 1/n \} = A$, say. But if for each $n \in \mathbb{N}$, $|g| \le 1/n$ almost surely, then $P(A) = 0$ because $P(|g| > 1/n) = 0$ for all $n$.) It follows that $\|g\|_p = 0$ and there is nothing more to prove. So suppose that $\|g\|_\infty > 0$. Then by replacing $g$ by $g/\|g\|_\infty$, we see that we may suppose that $\|g\|_\infty = 1$. Let $0 < r < 1$ be given and choose $\delta$ such that $r < \delta < 1$. By definition of the $\|\cdot\|_\infty$-norm, there is a set $B$ with $P(B) > 0$ such that $|g| > \delta$ on $B$. But then $1 = \|g\|_\infty \ge \|g\|_p$ and
$\|g\|_p^p = \int_\Omega |g|^p \, dP \ge \delta^p P(B)$
so that $1 \ge \|g\|_p \ge \delta P(B)^{1/p}$. But $P(B)^{1/p} \to 1$ as $p \to \infty$, and so $1 \ge \|g\|_p \ge r$ for all sufficiently large $p$. The result follows.
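Chebyshev's inequality and the limit $\|g\|_p \to \|g\|_\infty$ can both be checked by hand on a finite probability space. The sketch below assumes a hypothetical four-point space with the listed values and weights (none of this comes from the notes) and uses floating-point arithmetic, so comparisons are only meaningful up to rounding.

```python
def p_norm(values, probs, p):
    """(sum_i |x_i|^p * P_i)^(1/p) on a finite probability space."""
    return sum(abs(x) ** p * w for x, w in zip(values, probs)) ** (1.0 / p)


def sup_norm(values, probs):
    """Essential supremum: largest |x_i| over points of positive probability."""
    return max(abs(x) for x, w in zip(values, probs) if w > 0)


def chebyshev_sides(values, probs, c, p):
    """Return the pair (P(|X| >= c), c^(-p) * ||X||_p^p)."""
    lhs = sum(w for x, w in zip(values, probs) if abs(x) >= c)
    rhs = sum(abs(x) ** p * w for x, w in zip(values, probs)) / c ** p
    return lhs, rhs


# Hypothetical random variable taking four values with the given probabilities.
values = [0.5, -1.0, 2.0, 3.0]
probs = [0.4, 0.3, 0.2, 0.1]

exponents = (1, 2, 4, 8, 16, 64)
norms = [p_norm(values, probs, p) for p in exponents]
for p, n in zip(exponents, norms):
    print(p, n)
print("sup norm:", sup_norm(values, probs))
print("Chebyshev sides for c = 2, p = 2:", chebyshev_sides(values, probs, 2.0, 2))
```

The printed norms increase with $p$ and stay below the sup norm, in line with Remark 1.8; the convergence is slow because $P(B)^{1/p}$ approaches 1 slowly when $P(B)$ is small.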

Monotone Class Theorem
It is notoriously difficult, if not impossible, to extend properties of collections of sets directly to the σ-algebra they generate, that is, “from the inside”. One usually has to resort to a somewhat indirect approach. The so-called Monotone Class Theorem plays the rôle of the cavalry in this respect and can usually be depended on to come to the rescue.

Definition 1.9. A collection $\mathcal{A}$ of subsets of a set $X$ is an algebra if
(i) $X \in \mathcal{A}$,
(ii) if $A \in \mathcal{A}$ and $B \in \mathcal{A}$, then $A \cup B \in \mathcal{A}$,
(iii) if $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$.

Note that it follows that if $A, B \in \mathcal{A}$, then $A \cap B = (A^c \cup B^c)^c \in \mathcal{A}$. Also, for any finite family $A_1, \dots, A_n \in \mathcal{A}$, it follows by induction that $\bigcup_{i=1}^n A_i \in \mathcal{A}$ and $\bigcap_{i=1}^n A_i \in \mathcal{A}$.

Definition 1.10. A collection $\mathcal{M}$ of subsets of $X$ is a monotone class if
(i) whenever $A_1 \subseteq A_2 \subseteq \dots$ is an increasing sequence in $\mathcal{M}$, then $\bigcup_{i=1}^\infty A_i \in \mathcal{M}$,
(ii) whenever $B_1 \supseteq B_2 \supseteq \dots$ is a decreasing sequence in $\mathcal{M}$, then $\bigcap_{i=1}^\infty B_i \in \mathcal{M}$.

One can show that the intersection of an arbitrary family of monotone classes of subsets of $X$ is itself a monotone class. Thus, given a collection $\mathcal{C}$ of subsets of $X$, we may consider $\mathcal{M}(\mathcal{C})$, the monotone class generated by the collection $\mathcal{C}$. It is the “smallest” monotone class containing $\mathcal{C}$, i.e., it is the intersection of all those monotone classes which contain $\mathcal{C}$.


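Theorem 1.11 (Monotone Class Theorem). Let $\mathcal{A}$ be an algebra of subsets of $X$. Then $\mathcal{M}(\mathcal{A}) = \sigma(\mathcal{A})$, the σ-algebra generated by $\mathcal{A}$.

Proof. It is clear that any σ-algebra is a monotone class and so $\sigma(\mathcal{A})$ is a monotone class containing $\mathcal{A}$. Hence $\mathcal{M}(\mathcal{A}) \subseteq \sigma(\mathcal{A})$. The proof is complete if we can show that $\mathcal{M}(\mathcal{A})$ is a σ-algebra, for then we would deduce that $\sigma(\mathcal{A}) \subseteq \mathcal{M}(\mathcal{A})$.

If a monotone class $\mathcal{M}$ is an algebra, then it is a σ-algebra. To see this, let $A_1, A_2, \dots \in \mathcal{M}$. For each $n \in \mathbb{N}$, set $B_n = A_1 \cup \dots \cup A_n$. Then $B_n \in \mathcal{M}$, if $\mathcal{M}$ is an algebra. But then $\bigcup_{i=1}^\infty A_i = \bigcup_{n=1}^\infty B_n \in \mathcal{M}$ if $\mathcal{M}$ is a monotone class. Thus the algebra $\mathcal{M}$ is also a σ-algebra.

It remains to prove that $\mathcal{M}(\mathcal{A})$ is, in fact, an algebra. We shall verify the three requirements.

(i) We have $X \in \mathcal{A} \subseteq \mathcal{M}(\mathcal{A})$.

(iii) Let $A \in \mathcal{M}(\mathcal{A})$. We wish to show that $A^c \in \mathcal{M}(\mathcal{A})$. To show this, let
$\widetilde{\mathcal{M}} = \{ B : B \in \mathcal{M}(\mathcal{A}) \text{ and } B^c \in \mathcal{M}(\mathcal{A}) \}$.
Since $\mathcal{A}$ is an algebra, if $A \in \mathcal{A}$ then $A^c \in \mathcal{A}$ and so $\mathcal{A} \subseteq \widetilde{\mathcal{M}} \subseteq \mathcal{M}(\mathcal{A})$.

We shall show that $\widetilde{\mathcal{M}}$ is a monotone class. Let $(B_n)$ be a sequence in $\widetilde{\mathcal{M}}$ with $B_1 \subseteq B_2 \subseteq \dots$. Then $B_n \in \mathcal{M}(\mathcal{A})$ and $B_n^c \in \mathcal{M}(\mathcal{A})$. Hence $\bigcup_n B_n \in \mathcal{M}(\mathcal{A})$ and also $\bigcap_n B_n^c \in \mathcal{M}(\mathcal{A})$, since $\mathcal{M}(\mathcal{A})$ is a monotone class (and $(B_n^c)$ is a decreasing sequence). But $\bigcap_n B_n^c = \big(\bigcup_n B_n\big)^c$ and so both $\bigcup_n B_n$ and $\big(\bigcup_n B_n\big)^c$ belong to $\mathcal{M}(\mathcal{A})$, i.e., $\bigcup_n B_n \in \widetilde{\mathcal{M}}$.

Similarly, if $B_1 \supseteq B_2 \supseteq \dots$ belong to $\widetilde{\mathcal{M}}$, then $\bigcap_n B_n \in \mathcal{M}(\mathcal{A})$ and $\big(\bigcap_n B_n\big)^c = \bigcup_n B_n^c \in \mathcal{M}(\mathcal{A})$ so that $\bigcap_n B_n \in \widetilde{\mathcal{M}}$. It follows that $\widetilde{\mathcal{M}}$ is a monotone class. Since $\mathcal{A} \subseteq \widetilde{\mathcal{M}} \subseteq \mathcal{M}(\mathcal{A})$ and $\mathcal{M}(\mathcal{A})$ is the monotone class generated by $\mathcal{A}$, we conclude that $\widetilde{\mathcal{M}} = \mathcal{M}(\mathcal{A})$. But then this means that for any $B \in \mathcal{M}(\mathcal{A})$, we also have $B^c \in \mathcal{M}(\mathcal{A})$.

(ii) We wish to show that if $A$ and $B$ belong to $\mathcal{M}(\mathcal{A})$ then so does $A \cup B$. Now, by (iii), it is enough to show that $A \cap B \in \mathcal{M}(\mathcal{A})$ (using $A \cup B = (A^c \cap B^c)^c$). To this end, let $A \in \mathcal{M}(\mathcal{A})$ and let
$\mathcal{M}_A = \{ B : B \in \mathcal{M}(\mathcal{A}) \text{ and } A \cap B \in \mathcal{M}(\mathcal{A}) \}$.
Then for $B_1 \subseteq B_2 \subseteq \dots$ in $\mathcal{M}_A$, we have
$A \cap \bigcup_{i=1}^\infty B_i = \bigcup_{i=1}^\infty (A \cap B_i) \in \mathcal{M}(\mathcal{A})$
since each $A \cap B_i \in \mathcal{M}(\mathcal{A})$ by the definition of $\mathcal{M}_A$.

Similarly, if $B_1 \supseteq B_2 \supseteq \dots$ belong to $\mathcal{M}_A$, then
$A \cap \bigcap_{i=1}^\infty B_i = \bigcap_{i=1}^\infty (A \cap B_i) \in \mathcal{M}(\mathcal{A})$.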

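Therefore $\mathcal{M}_A$ is a monotone class. Suppose $A \in \mathcal{A}$. Then for any $B \in \mathcal{A}$, we have $A \cap B \in \mathcal{A}$, since $\mathcal{A}$ is an algebra. Hence $\mathcal{A} \subseteq \mathcal{M}_A \subseteq \mathcal{M}(\mathcal{A})$ and therefore $\mathcal{M}_A = \mathcal{M}(\mathcal{A})$ for each $A \in \mathcal{A}$. Now, for any $B \in \mathcal{M}(\mathcal{A})$ and $A \in \mathcal{A}$, we have
$A \in \mathcal{M}_B \iff A \cap B \in \mathcal{M}(\mathcal{A}) \iff B \in \mathcal{M}_A = \mathcal{M}(\mathcal{A})$.
Hence, for every $B \in \mathcal{M}(\mathcal{A})$, $\mathcal{A} \subseteq \mathcal{M}_B \subseteq \mathcal{M}(\mathcal{A})$ and so (since $\mathcal{M}_B$ is a monotone class) we have $\mathcal{M}_B = \mathcal{M}(\mathcal{A})$ for every $B \in \mathcal{M}(\mathcal{A})$.

Now let $A, B \in \mathcal{M}(\mathcal{A})$. We have seen that $\mathcal{M}_B = \mathcal{M}(\mathcal{A})$ and therefore $A \in \mathcal{M}(\mathcal{A})$ means that $A \in \mathcal{M}_B$ so that $A \cap B \in \mathcal{M}(\mathcal{A})$ and the proof is complete.

Example 1.12. Suppose that $P$ and $Q$ are two probability measures on $\mathcal{B}(\mathbb{R})$ which agree on sets of the form $(-\infty, a]$ with $a \in \mathbb{R}$. Then $P = Q$ on $\mathcal{B}(\mathbb{R})$.

Proof. Let $\mathcal{S} = \{ A \in \mathcal{B}(\mathbb{R}) : P(A) = Q(A) \}$. Then $\mathcal{S}$ includes sets of the form $(a, b]$, for $a < b$, $(-\infty, a]$ and $(a, \infty)$ and so contains $\mathcal{A}$, the algebra of subsets generated by those of the form $(-\infty, a]$. However, one sees that $\mathcal{S}$ is a monotone class (because $P$ and $Q$ are σ-additive) and so $\mathcal{S}$ contains $\sigma(\mathcal{A})$. The proof is now complete since $\sigma(\mathcal{A}) = \mathcal{B}(\mathbb{R})$.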


Chapter 2 Conditional expectation

Consider a probability space $(\Omega, \mathcal{S}, P)$. The conditional expectation of an integrable random variable $X$ with respect to a sub-σ-algebra $\mathcal{G}$ of $\mathcal{S}$ is a $\mathcal{G}$-measurable random variable, denoted by $E(X \mid \mathcal{G})$, obeying
$\int_A E(X \mid \mathcal{G}) \, dP = \int_A X \, dP$
for every set $A \in \mathcal{G}$.

Where does this come from? Suppose that $X \ge 0$ and define $\nu(A) = \int_A X \, dP$ for any $A \in \mathcal{G}$. Then
$\nu(A) = \int_\Omega X 1_A \, dP = E(X 1_A)$.

Proposition 2.1. $\nu$ is a finite measure on $(\Omega, \mathcal{G})$.

Proof. Since $X 1_A \ge 0$ almost surely for any $A \in \mathcal{G}$, we see that $\nu(A) \ge 0$ for all $A \in \mathcal{G}$. Also, $\nu(\Omega) = E(X)$ which is finite by hypothesis ($X$ is integrable).

Now suppose that $A_1, A_2, \dots$ is a sequence of pairwise disjoint events in $\mathcal{G}$. Set $B_n = A_1 \cup \dots \cup A_n$. Then
$\nu(B_n) = \int_\Omega X 1_{B_n} \, dP = \int_\Omega X (1_{A_1} + \dots + 1_{A_n}) \, dP = \nu(A_1) + \dots + \nu(A_n)$.
Letting $n \to \infty$, $1_{B_n} \uparrow 1_{\bigcup_n A_n}$ on $\Omega$ and so by Lebesgue's Monotone Convergence Theorem,
$\int_\Omega X 1_{B_n} \, dP \uparrow \int_\Omega X 1_{\bigcup_n A_n} \, dP = \nu\big(\bigcup_n A_n\big)$.
It follows that $\sum_k \nu(A_k)$ is convergent and $\nu\big(\bigcup_n A_n\big) = \sum_n \nu(A_n)$. Hence $\nu$ is a finite measure on $(\Omega, \mathcal{G})$.

If $P(A) = 0$, then $X 1_A = 0$ almost surely and (since $X 1_A \ge 0$) we see that $\nu(A) = 0$. Thus $P(A) = 0 \implies \nu(A) = 0$ for $A \in \mathcal{G}$. We say that $\nu$ is absolutely continuous with respect to $P$ on $\mathcal{G}$ (written $\nu \ll P$). The following theorem is most relevant in this connection.

Theorem 2.2 (Radon-Nikodym). Suppose that $\mu_1$ and $\mu_2$ are finite measures (on some $(\Omega, \mathcal{G})$) with $\mu_1 \ll \mu_2$. Then there is a $\mathcal{G}$-measurable $\mu_2$-integrable function $g$ ($g \in L^1(\mathcal{G}, d\mu_2)$) on $\Omega$ such that $g \ge 0$ ($\mu_2$-almost everywhere) and $\mu_1(A) = \int_A g \, d\mu_2$ for any $A \in \mathcal{G}$.

With $\mu_1 = \nu$ and $\mu_2 = P$, we obtain the conditional expectation $E(X \mid \mathcal{G})$ as above. Note that if $X$ is not necessarily positive, then we can write $X = X_+ - X_-$ with $X_\pm \ge 0$ to give $E(X \mid \mathcal{G}) = E(X_+ \mid \mathcal{G}) - E(X_- \mid \mathcal{G})$.

Properties of the Conditional Expectation

Various basic properties of the conditional expectation are contained in the following proposition.

Proposition 2.3. The conditional expectation enjoys the following properties.
(i) $E(X \mid \mathcal{G})$ is unique, almost surely.
(ii) If $X \ge 0$, then $E(X \mid \mathcal{G}) \ge 0$ almost surely, that is, the conditional expectation is (almost surely) positivity preserving.
(iii) $E(X + Y \mid \mathcal{G}) = E(X \mid \mathcal{G}) + E(Y \mid \mathcal{G})$ almost surely.
(iv) For any $a \in \mathbb{R}$, $E(aX \mid \mathcal{G}) = a E(X \mid \mathcal{G})$ almost surely. Also $E(a \mid \mathcal{G}) = a$ almost surely.
(v) If $\mathcal{G} = \{ \Omega, \emptyset \}$, the trivial σ-algebra, then $E(X \mid \mathcal{G}) = E(X)$ everywhere, i.e., $E(X \mid \mathcal{G})(\omega) = E(X)$ for every $\omega \in \Omega$.
(vi) (Tower Property) If $\mathcal{G}_1 \subseteq \mathcal{G}_2$, then $E(E(X \mid \mathcal{G}_2) \mid \mathcal{G}_1) = E(X \mid \mathcal{G}_1)$ almost surely.
(vii) If $\sigma(X)$ and $\mathcal{G}$ are independent σ-algebras, then $E(X \mid \mathcal{G}) = E(X)$ almost surely.
(viii) For any $\mathcal{G}$, $E(E(X \mid \mathcal{G})) = E(X)$.

Proof. (i) Suppose that $f$ and $g$ are positive $\mathcal{G}$-measurable and satisfy
$\int_A f \, dP = \int_A g \, dP$
for all $A \in \mathcal{G}$. Then $\int_A (f - g) \, dP = 0$ for all $A \in \mathcal{G}$ and so $f - g = 0$ almost surely, that is, if $A = \{ \omega : f(\omega) \ne g(\omega) \}$, then $A \in \mathcal{G}$ and $P(A) = 0$. This last assertion follows from the following lemma.

Lemma 2.4. If $h$ is integrable on some finite measure space $(\Omega, \mathcal{S}, \mu)$ and satisfies $\int_A h \, d\mu = 0$ for all $A \in \mathcal{S}$, then $h = 0$ $\mu$-almost everywhere.

Proof of Lemma. Let $A_n = \{ \omega : h(\omega) \ge \frac{1}{n} \}$. Then
$0 = \int_{A_n} h \, d\mu \ge \frac{1}{n} \int_{A_n} d\mu = \frac{1}{n} \mu(A_n)$
and so $\mu(A_n) = 0$ for all $n \in \mathbb{N}$. But then it follows that
$\mu(\{ \omega : h(\omega) > 0 \}) = \mu\big(\bigcup_n A_n\big) = \lim_n \mu(A_n) = 0$.
Let $B_n = \{ \omega : h(\omega) \le -\frac{1}{n} \}$. Then
$0 = \int_{B_n} h \, d\mu \le -\frac{1}{n} \int_{B_n} d\mu = -\frac{1}{n} \mu(B_n)$
so that $\mu(B_n) = 0$ for all $n \in \mathbb{N}$ and therefore
$\mu(\{ \omega : h(\omega) < 0 \}) = \mu\big(\bigcup_n B_n\big) = \lim_n \mu(B_n) = 0$.
Hence
$\mu(\{ \omega : h(\omega) \ne 0 \}) = \mu(h < 0) + \mu(h > 0) = 0$,
that is, $\mu(h = 0) = \mu(\Omega)$.

Remark 2.5. Note that we have used the standard shorthand notation such as $\mu(h > 0)$ for $\mu(\{ \omega : h(\omega) > 0 \})$. There is unlikely to be any confusion.

(ii) For notational convenience, let $\hat{X}$ denote $E(X \mid \mathcal{G})$. Then for any $A \in \mathcal{G}$,
$\int_A \hat{X} \, dP = \int_A X \, dP \ge 0$.
Let $B_n = \{ \omega : \hat{X}(\omega) \le -\frac{1}{n} \}$. Then
$\int_{B_n} \hat{X} \, dP \le -\frac{1}{n} \int_{B_n} dP = -\frac{1}{n} P(B_n)$.
However, the left hand side is equal to $\int_{B_n} X \, dP \ge 0$ which forces $P(B_n) = 0$. But then $P(\hat{X} < 0) = \lim_n P(B_n) = 0$ and so $P(\hat{X} \ge 0) = 1$.

(iii) For any choices of conditional expectation and any $A \in \mathcal{G}$, we have
$\int_A E(X + Y \mid \mathcal{G}) \, dP = \int_A (X + Y) \, dP$.

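By linearity of the integral,
$\int_A (X + Y) \, dP = \int_A X \, dP + \int_A Y \, dP = \int_A E(X \mid \mathcal{G}) \, dP + \int_A E(Y \mid \mathcal{G}) \, dP = \int_A \big( E(X \mid \mathcal{G}) + E(Y \mid \mathcal{G}) \big) \, dP$
for all $A \in \mathcal{G}$. We conclude that $E(X + Y \mid \mathcal{G}) = E(X \mid \mathcal{G}) + E(Y \mid \mathcal{G})$ almost surely, as required.

(iv) This is just as the proof of (iii).

(v) With $A = \Omega$, we have
$\int_A X \, dP = \int_\Omega X \, dP = E(X) = \int_\Omega E(X) \, dP = \int_A E(X) \, dP$.
Now with $A = \emptyset$,
$\int_A X \, dP = \int_\emptyset X \, dP = 0 = \int_A E(X) \, dP$.
So the everywhere constant function $\hat{X} : \omega \mapsto E(X)$ is $\{ \Omega, \emptyset \}$-measurable and obeys
$\int_A \hat{X} \, dP = \int_A X \, dP$
for every $A \in \{ \Omega, \emptyset \}$. Hence $\omega \mapsto E(X)$ is a conditional expectation of $X$. If $X'$ is another, then $X' = \hat{X}$ almost surely, so that $P(X' = \hat{X}) = 1$. But the set $\{ X' = \hat{X} \}$ is $\{ \Omega, \emptyset \}$-measurable and so is equal to either $\emptyset$ or to $\Omega$. Since $P(X' = \hat{X}) = 1$, it must be the case that $\{ X' = \hat{X} \} = \Omega$ so that $\hat{X}(\omega) = X'(\omega)$ for all $\omega \in \Omega$.

(vi) Suppose that $\mathcal{G}_1$ and $\mathcal{G}_2$ are σ-algebras satisfying $\mathcal{G}_1 \subseteq \mathcal{G}_2$. Then, for any $A \in \mathcal{G}_1$,
$\int_A E(X \mid \mathcal{G}_1) \, dP = \int_A X \, dP$
$= \int_A E(X \mid \mathcal{G}_2) \, dP$, since $A \in \mathcal{G}_2$,
$= \int_A E(E(X \mid \mathcal{G}_2) \mid \mathcal{G}_1) \, dP$, since $A \in \mathcal{G}_1$.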

We conclude that $E(X \mid \mathcal{G}_1) = E(E(X \mid \mathcal{G}_2) \mid \mathcal{G}_1)$ almost surely, as claimed.

(vii) For any $A \in \mathcal{G}$,
$\int_A E(X \mid \mathcal{G}) \, dP = \int_A X \, dP = \int_\Omega X 1_A \, dP = E(X 1_A) = E(X) E(1_A) = \int_A E(X) \, dP$.
The result follows.

(viii) Denote $E(X \mid \mathcal{G})$ by $\hat{X}$. Then for any $A \in \mathcal{G}$,
$\int_A \hat{X} \, dP = \int_A X \, dP$.
In particular, with $A = \Omega$, we get
$\int_\Omega \hat{X} \, dP = \int_\Omega X \, dP$,
that is, $E(\hat{X}) = E(X)$.

The next result is a further characterization of the conditional expectation.

Theorem 2.6. Let $f \in L^1(\Omega, \mathcal{S}, P)$. The conditional expectation $\hat{f} = E(f \mid \mathcal{G})$ is characterized (almost surely) by
$\int_\Omega f g \, dP = \int_\Omega \hat{f} g \, dP$   (∗)
for all bounded $\mathcal{G}$-measurable functions $g$.

Proof. With $g = 1_A$, we see that (∗) implies that
$\int_A f \, dP = \int_A \hat{f} \, dP$
for all $A \in \mathcal{G}$. It follows that $\hat{f}$ is the required conditional expectation.

For the converse, suppose that we know that (∗) holds for all $f \ge 0$. Then, writing a general $f$ as $f = f^+ - f^-$ (where $f^\pm \ge 0$ and $f^+ f^- = 0$ almost surely), we get
$\int_\Omega f g \, dP = \int_\Omega f^+ g \, dP - \int_\Omega f^- g \, dP = \int_\Omega \widehat{f^+} g \, dP - \int_\Omega \widehat{f^-} g \, dP$.

Since $\widehat{f^+} - \widehat{f^-} = \hat{f}$ almost surely, the right hand side equals
$\int_\Omega (\widehat{f^+} - \widehat{f^-}) g \, dP = \int_\Omega \hat{f} g \, dP$
and we see that (∗) holds for general $f \in L^1$. Similarly, we note that by decomposing $g$ as $g = g^+ - g^-$, it is enough to prove that (∗) holds for $g \ge 0$. So we need only show that (∗) holds for $f \ge 0$ and $g \ge 0$. In this case, we know that there is a sequence $(s_n)$ of simple $\mathcal{G}$-measurable functions such that $0 \le s_n \le g$ and $s_n \to g$ everywhere. For fixed $n$, let $s_n = \sum_j a_j 1_{A_j}$ (finite sum). Then
$\int_\Omega f s_n \, dP = \sum_j a_j \int_{A_j} f \, dP = \sum_j a_j \int_{A_j} \hat{f} \, dP = \int_\Omega \hat{f} s_n \, dP$
giving the equality
$\int_\Omega f s_n \, dP = \int_\Omega \hat{f} s_n \, dP$.
By Lebesgue's Dominated Convergence Theorem, the left hand side converges to $\int_\Omega f g \, dP$ and the right hand side converges to $\int_\Omega \hat{f} g \, dP$ as $n \to \infty$, which completes the proof.

Jensen's Inequality

We begin with a definition.

Definition 2.7. The function $\varphi : \mathbb{R} \to \mathbb{R}$ is said to be convex if
$\varphi(a + s(b - a)) \le \varphi(a) + s(\varphi(b) - \varphi(a))$,
that is,
$\varphi((1 - s)a + sb) \le (1 - s)\varphi(a) + s\varphi(b)$   (2.1)
for any $a, b \in \mathbb{R}$ and all $0 \le s \le 1$. The point $a + s(b - a) = (1 - s)a + sb$ lies between $a$ and $b$ and the inequality (2.1) is the statement that the chord between the points $(a, \varphi(a))$ and $(b, \varphi(b))$ on the graph $y = \varphi(x)$ lies above the graph itself.

Let $u < v < w$. Then $v = u + s(w - u) = (1 - s)u + sw$ for some $0 < s < 1$ and from (2.1) we have
$(1 - s)\varphi(v) + s\varphi(v) = \varphi(v) \le (1 - s)\varphi(u) + s\varphi(w)$   (2.2)
which can be rearranged to give
$(1 - s)(\varphi(v) - \varphi(u)) \le s(\varphi(w) - \varphi(v))$.   (2.3)

But $v = (1 - s)u + sw = u + s(w - u) = (1 - s)(u - w) + w$ so that $s = (v - u)/(w - u)$ and $1 - s = (v - w)/(u - w) = (w - v)/(w - u)$. Inequality (2.3) can therefore be rewritten as
$\dfrac{\varphi(v) - \varphi(u)}{v - u} \le \dfrac{\varphi(w) - \varphi(v)}{w - v}$.   (2.4)
Again, from (2.2), we get
$\varphi(v) - \varphi(u) \le (1 - s)\varphi(u) + s\varphi(w) - \varphi(u) = s(\varphi(w) - \varphi(u))$
and so, substituting for $s$,
$\dfrac{\varphi(v) - \varphi(u)}{v - u} \le \dfrac{\varphi(w) - \varphi(u)}{w - u}$.   (2.5)
Once more, from (2.2),
$\varphi(v) - \varphi(w) \le (1 - s)\varphi(u) + s\varphi(w) - \varphi(w) = (s - 1)(\varphi(w) - \varphi(u))$
which gives
$\dfrac{\varphi(w) - \varphi(u)}{w - u} \le \dfrac{\varphi(w) - \varphi(v)}{w - v}$.   (2.6)
These inequalities are readily suggested from a diagram.

Now fix $v = v_0$. Then by inequality (2.5), we see that the ratio (Newton quotient) $(\varphi(w) - \varphi(v_0))/(w - v_0)$ decreases as $w \downarrow v_0$ and, by (2.4), is bounded below by $(\varphi(v_0) - \varphi(u))/(v_0 - u)$ for any $u < v_0$. Hence $\varphi$ has a right derivative at $v_0$, i.e.,
$\exists \lim_{w \downarrow v_0} \dfrac{\varphi(w) - \varphi(v_0)}{w - v_0} \equiv D^+\varphi(v_0)$.
Next, we consider $u \uparrow v_0$. By (2.6), the ratio $(\varphi(v_0) - \varphi(u))/(v_0 - u)$ is increasing as $u \uparrow v_0$ and, by the inequality (2.4), is bounded above (by $(\varphi(w) - \varphi(v_0))/(w - v_0)$ for any $w > v_0$). It follows that $\varphi$ has a left derivative at $v_0$, i.e.,
$\exists \lim_{u \uparrow v_0} \dfrac{\varphi(v_0) - \varphi(u)}{v_0 - u} \equiv D^-\varphi(v_0)$.
It follows that $\varphi$ is continuous at $v_0$ because
$\varphi(w) - \varphi(v_0) = (w - v_0) \dfrac{\varphi(w) - \varphi(v_0)}{w - v_0} \to 0$ as $w \downarrow v_0$
and
$\varphi(v_0) - \varphi(u) = (v_0 - u) \dfrac{\varphi(v_0) - \varphi(u)}{v_0 - u} \to 0$ as $u \uparrow v_0$.

i. Proof. by (2.20 By (2. λ−u v−u D+ ϕ(u) ≤ D+ ϕ(v) Chapter 2 Hence Letting λ ↓ u. we get D− ϕ(v0 ) ≤ ϕ(w) − ϕ(v0 ) w − v0 ϕ(w) − ϕ(λ) ≤ for any v0 ≤ λ ≤ w. ifwilde Notes (2.5).e. D− ϕ(v0 ) ≤ D− ϕ(w) whenever v ≤ w.4). Claim. ϕ(x) − ϕ(v) ≥ m(x − v) for x < v and the claim is proved. both D+ ϕ and D− ϕ are non-decreasing functions. we get whenever u ≤ v. Similarly. Then (ϕ(v) − ϕ(x))/(v − x) ↑ D− ϕ(v) ≤ m and so we see that ϕ(v) − ϕ(x) ≤ m(v − x). letting u ↑ v0 and w ↓ v0 in (2. That is. we see that D− ϕ(v0 ) ≤ D+ (v0 ) at each v0 .6). Furthermore. Now let x < v.. Then m(x − v) + ϕ(v) ≤ ϕ(x) for any x ∈ R. letting u ↑ v0 . Fix v and let m satisfy D− ϕ(v) ≤ m ≤ D+ ϕ(v). (ϕ(x) − ϕ(v))/(x − v) ↓ D+ ϕ(v) and so ϕ(x) − ϕ(v) ≥ D+ ϕ(v) ≥ m x−v which means that ϕ(x) − ϕ(v) ≥ m(x − v) for x > v.7) . For x > v. we ﬁnd ϕ(v) − ϕ(u) ≤ D+ ϕ(v) v−u and so ϕ(λ) − ϕ(u) ϕ(v) − ϕ(u) ≤ ≤ D+ ϕ(v). letting w ↓ v in (2. Note that the inequality in the claim becomes equality for x = v.6) w−λ ↑ D− ϕ(w) as λ ↑ w.

Let $\mathcal{A} = \{ (\alpha, \beta) : \varphi(x) \ge \alpha x + \beta \text{ for all } x \}$. From the remark above (and using the same notation), we see that for any $x \in \mathbb{R}$,
$\varphi(x) = \sup\{ \alpha x + \beta : (\alpha, \beta) \in \mathcal{A} \}$.
We wish to show that it is possible to replace the uncountable set $\mathcal{A}$ with a countable one.

For each $q \in \mathbb{Q}$, let $m(q)$ be a fixed choice such that $D^-\varphi(q) \le m(q) \le D^+\varphi(q)$ (for example, we could systematically choose $m(q) = D^-\varphi(q)$). Set $\alpha(q) = m(q)$ and $\beta(q) = \varphi(q) - m(q) q$. Let $\mathcal{A}_0 = \{ (\alpha(q), \beta(q)) : q \in \mathbb{Q} \}$. Then the discussion above shows that $\alpha x + \beta \le \varphi(x)$ for all $x \in \mathbb{R}$ and all $(\alpha, \beta) \in \mathcal{A}_0$, and that $\varphi(q) = \sup\{ \alpha q + \beta : (\alpha, \beta) \in \mathcal{A}_0 \}$ for each $q \in \mathbb{Q}$.

Claim. For any $x \in \mathbb{R}$, $\varphi(x) = \sup\{ \alpha x + \beta : (\alpha, \beta) \in \mathcal{A}_0 \}$.

Proof. Given $x \in \mathbb{R}$, fix $u < x < w$. Then we know that for any $q \in \mathbb{Q}$ with $u < q < w$,
$D^-\varphi(u) \le D^-\varphi(q) \le D^+\varphi(q) \le D^+\varphi(w)$.   (2.8)
Hence $D^-\varphi(u) \le m(q) \le D^+\varphi(w)$. Let $(q_n)$ be a sequence in $\mathbb{Q}$ with $u < q_n < w$ and such that $q_n \to x$ as $n \to \infty$. Then by (2.8), $(m(q_n))$ is a bounded sequence. We have
$0 \le \varphi(x) - (\alpha(q_n) x + \beta(q_n))$
$= \varphi(x) - \varphi(q_n) + \varphi(q_n) - (\alpha(q_n) x + \beta(q_n))$
$= (\varphi(x) - \varphi(q_n)) + \alpha(q_n) q_n + \beta(q_n) - (\alpha(q_n) x + \beta(q_n))$
$= (\varphi(x) - \varphi(q_n)) + m(q_n)(q_n - x)$.
Since $\varphi$ is continuous, the first term on the right hand side converges to 0 as $n \to \infty$ and so does the second term, because the sequence $(m(q_n))$ is bounded. It follows that $(\alpha(q_n) x + \beta(q_n)) \to \varphi(x)$ and therefore we see that $\varphi(x) = \sup\{ \alpha x + \beta : (\alpha, \beta) \in \mathcal{A}_0 \}$, as claimed.

We are now in a position to discuss Jensen's Inequality.
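The representation of a convex function as a supremum of countably many affine minorants can be seen explicitly in a simple case. The sketch below assumes the illustrative choice $\varphi(x) = x^2$ (so that $D^-\varphi(q) = D^+\varphi(q) = 2q$) and replaces the countable family indexed by all rationals with a finite grid of rationals of bounded denominator; the function names and the grid are assumptions made for the example only.

```python
from fractions import Fraction


def phi(x):
    """Illustrative convex function phi(x) = x^2, for which D-phi(q) = D+phi(q) = 2q."""
    return x * x


def supporting_line(q):
    """Return (alpha(q), beta(q)) with m(q) = 2q and beta(q) = phi(q) - m(q) * q."""
    m = 2 * q
    return m, phi(q) - m * q


def rationals(max_den, lo=-2, hi=2):
    """All fractions k/d in [lo, hi] with 1 <= d <= max_den, without repeats."""
    return sorted({Fraction(k, d)
                   for d in range(1, max_den + 1)
                   for k in range(lo * d, hi * d + 1)})


def sup_of_lines(x, max_den):
    """Finite stand-in for the sup over the countable family: max of alpha*x + beta on the grid."""
    return max(a * x + b for a, b in map(supporting_line, rationals(max_den)))


x = Fraction(7, 22)
gap = phi(x) - sup_of_lines(x, 10)
print(gap)
```

For this particular $\varphi$, the line through $q$ falls short of $\varphi(x)$ by exactly $(x - q)^2$, so the gap is the squared distance from $x$ to the nearest grid point and shrinks to 0 as the grid is refined, mirroring the limit $q_n \to x$ in the proof.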

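Theorem 2.8 (Jensen's Inequality). Suppose that $\varphi$ is convex on $\mathbb{R}$ and that both $X$ and $\varphi(X)$ are integrable. Then
$\varphi(E(X \mid \mathcal{G})) \le E(\varphi(X) \mid \mathcal{G})$ almost surely.

Proof. As above, let $\mathcal{A} = \{ (\alpha, \beta) : \varphi(x) \ge \alpha x + \beta \text{ for all } x \in \mathbb{R} \}$. We have seen that there is a countable subset $\mathcal{A}_0 \subset \mathcal{A}$ such that
$\varphi(x) = \sup_{(\alpha, \beta) \in \mathcal{A}_0} (\alpha x + \beta)$.
For any $(\alpha, \beta) \in \mathcal{A}$,
$\alpha X(\omega) + \beta \le \varphi(X(\omega))$
for all $\omega \in \Omega$. In other words, $\varphi(X) - (\alpha X + \beta) \ge 0$ on $\Omega$. Hence, taking conditional expectations,
$E(\varphi(X) - (\alpha X + \beta) \mid \mathcal{G}) \ge 0$ almost surely.
It follows that $E((\alpha X + \beta) \mid \mathcal{G}) \le E(\varphi(X) \mid \mathcal{G})$ almost surely and so
$\alpha \hat{X} + \beta \le E(\varphi(X) \mid \mathcal{G})$ almost surely,
where $\hat{X}$ is any choice of $E(X \mid \mathcal{G})$.

Now, for each $(\alpha, \beta) \in \mathcal{A}_0$, let $A(\alpha, \beta)$ be a set in $\mathcal{G}$ such that $P(A(\alpha, \beta)) = 1$ and
$\alpha \hat{X}(\omega) + \beta \le E(\varphi(X) \mid \mathcal{G})(\omega)$
for every $\omega \in A(\alpha, \beta)$. Let $A = \bigcap_{(\alpha, \beta) \in \mathcal{A}_0} A(\alpha, \beta)$. Since $\mathcal{A}_0$ is countable, $P(A) = 1$ and
$\alpha \hat{X}(\omega) + \beta \le E(\varphi(X) \mid \mathcal{G})(\omega)$
for all $\omega \in A$ and all $(\alpha, \beta) \in \mathcal{A}_0$. Taking the supremum over $(\alpha, \beta) \in \mathcal{A}_0$, we get
$\sup_{(\alpha, \beta) \in \mathcal{A}_0} (\alpha \hat{X}(\omega) + \beta) = \varphi(\hat{X}(\omega)) = \varphi(\hat{X})(\omega) \le E(\varphi(X) \mid \mathcal{G})(\omega)$
on $A$, that is,
$\varphi(\hat{X}) \le E(\varphi(X) \mid \mathcal{G})$
almost surely and the proof is complete.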
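On a finite sample space, where the sub-σ-algebra is generated by a partition, the conditional expectation is just the probability-weighted average over each block, and both the tower property and Jensen's inequality can be verified directly. The following is a minimal sketch under those assumptions; the six-point space, the partitions and the function name `cond_exp` are hypothetical choices for illustration.

```python
from fractions import Fraction


def cond_exp(X, P, partition):
    """E(X | G) where G is generated by a partition of a finite sample space.

    X and P map each outcome to its value and probability; every block of the
    partition is assumed to have positive probability.
    """
    result = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        average = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            result[w] = average  # constant on each block, hence G-measurable
    return result


# Hypothetical example: a fair die, X(w) = w.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}
fine = [[1, 2], [3, 4], [5, 6]]    # generates G2
coarse = [[1, 2, 3, 4], [5, 6]]    # generates G1, contained in G2

E_fine = cond_exp(X, P, fine)
E_coarse = cond_exp(X, P, coarse)
tower = cond_exp(E_fine, P, coarse)  # E(E(X | G2) | G1)


def square(s):
    """Convex function used for the Jensen check."""
    return s * s


jensen_lhs = {w: square(E_coarse[w]) for w in omega}
jensen_rhs = cond_exp({w: square(X[w]) for w in omega}, P, coarse)
print(E_coarse, tower, jensen_lhs, jensen_rhs)
```

In this setting the inequality $\varphi(E(X \mid \mathcal{G})) \le E(\varphi(X) \mid \mathcal{G})$ holds at every outcome, not merely almost surely, because every block has positive probability.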

Proposition 2.9. Suppose that $X \in L^r$ where $r \ge 1$. Then
\[ \| E(X \mid \mathcal{G}) \|_r \le \| X \|_r . \]
In other words, the conditional expectation is a contraction on every $L^r$ with $r \ge 1$.

Proof. Let $\varphi(s) = |s|^r$ and let $\hat{X}$ be any choice for the conditional expectation $E(X \mid \mathcal{G})$. Then $\varphi$ is convex and so by Jensen's Inequality
\[ \varphi(\hat{X}) \le E(\varphi(X) \mid \mathcal{G}) \quad \text{almost surely,} \]
that is, $|\hat{X}|^r \le E(|X|^r \mid \mathcal{G})$ almost surely. Taking expectations, we get
\[ E(|\hat{X}|^r) \le E(E(|X|^r \mid \mathcal{G})) = E(|X|^r). \]
Taking $r$th roots gives $\|\hat{X}\|_r \le \|X\|_r$ as claimed.

Functional analytic approach to the conditional expectation

Consider a vector $u = a\,\mathbf{i} + b\,\mathbf{j} + c\,\mathbf{k} \in \mathbb{R}^3$, where $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ denote the unit vectors in the $Ox$, $Oy$ and $Oz$ directions, respectively. Then the distance between $u$ and the $x$-$y$ plane is just $|c|$. The vector $u$ can be written as $v + w$, where $v = a\,\mathbf{i} + b\,\mathbf{j}$ and $w = c\,\mathbf{k}$. The vector $v$ lies in the $x$-$y$ plane and is orthogonal to the vector $w$.

The generalization to general Hilbert spaces is as follows. Recall that a Hilbert space is a complete inner product space. (We consider only real Hilbert spaces, but it is more usual to discuss complex Hilbert spaces.)

Let $X$ be a real linear space equipped with an "inner product", that is, a map $x, y \mapsto (x, y) \in \mathbb{R}$ such that

(i) $(x, y) \in \mathbb{R}$ and $(x, x) \ge 0$, for all $x \in X$,

(ii) $(x, y) = (y, x)$, for all $x, y \in X$,

(iii) $(ax + by, z) = a(x, z) + b(y, z)$, for all $x, y, z \in X$ and $a, b \in \mathbb{R}$.

Note: it is usual to also require that if $x \ne 0$, then $(x, x) > 0$ (which is why we have put the term "inner product" in quotation marks). This, of course, leads us back to the discussion of the distinction between $\mathcal{L}^2$ and $L^2$.

Example 2.10. Let $X = \mathcal{L}^2(\Omega, \mathcal{S}, P)$ with $(f, g) = \int_\Omega f(\omega) g(\omega)\, dP$ for any $f, g \in \mathcal{L}^2$.

In general, set $\|x\| = (x, x)^{1/2}$ for $x \in X$. Then in the example above, we observe that $\|f\| = \bigl( \int_\Omega f^2\, dP \bigr)^{1/2} = \|f\|_2$. It is important to note that $\|\cdot\|$ is not quite a norm. It can happen that $\|x\| = 0$ even though $x \ne 0$. Indeed, an example in $\mathcal{L}^2$ is provided by any function which is zero almost surely but not identically zero.

Proposition 2.11 (Parallelogram law). For any $x, y \in X$,
\[ \|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2). \]

Proof. This follows by direct calculation using $\|w\|^2 = (w, w)$.

Definition 2.12. A subspace $V \subset X$ is said to be complete if every Cauchy sequence in $V$ converges to an element of $V$, i.e., if $(v_n)$ is a Cauchy sequence in $V$ (so that $\|v_n - v_m\| \to 0$ as $m, n \to \infty$) then there is some $v \in V$ such that $v_n \to v$, as $n \to \infty$.

Theorem 2.13. Let $x \in X$ and suppose that $V$ is a complete subspace of $X$. Then there is some $v \in V$ such that $\|x - v\| = \inf_{y \in V} \|x - y\|$, that is, there is $v \in V$ so that $\|x - v\| = \operatorname{dist}(x, V)$, the distance between $x$ and $V$.

Proof. Let $(v_n)$ be any sequence in $V$ such that
\[ \|x - v_n\| \to d = \inf_{y \in V} \|x - y\| . \]
We claim that $(v_n)$ is a Cauchy sequence. Indeed, the parallelogram law gives
\[
\begin{aligned}
\|v_n - v_m\|^2 &= \|(x - v_m) + (v_n - x)\|^2 \\
&= 2\|x - v_m\|^2 + 2\|v_n - x\|^2 - \|\underbrace{(x - v_m) - (v_n - x)}_{2(x - \frac12(v_m + v_n))}\|^2 \\
&= 2\|x - v_m\|^2 + 2\|v_n - x\|^2 - 4\|x - \tfrac12(v_m + v_n)\|^2 . \qquad (*)
\end{aligned}
\]
Now, since $\tfrac12(v_m + v_n) \in V$,
\[ d \le \|x - \tfrac12(v_m + v_n)\| = \|\tfrac12(x - v_m) + \tfrac12(x - v_n)\| \le \tfrac12\|x - v_m\| + \tfrac12\|x - v_n\| \to d \]
so that $\|x - \tfrac12(v_m + v_n)\| \to d$ as $m, n \to \infty$. Hence $(*) \to 2d^2 + 2d^2 - 4d^2 = 0$ as $m, n \to \infty$. It follows that $(v_n)$ is a Cauchy sequence. By hypothesis, $V$ is complete and so there is $v \in V$ such that $v_n \to v$. But then
\[ d \le \|x - v\| \le \underbrace{\|x - v_n\|}_{\to d} + \underbrace{\|v_n - v\|}_{\to 0} \]
and so $d = \|x - v\|$.

Proposition 2.14. If $v' \in V$ satisfies $d = \|x - v'\|$, then $x - v' \perp V$ and $\|v - v'\| = 0$ (where $v$ is as in the previous theorem). Conversely, if $v' \in V$ and $x - v' \perp V$, then $\|x - v'\| = d$ and $\|v - v'\| = 0$.

Proof. Suppose that there is $w \in V$ such that $(x - v', w) = \lambda \ne 0$. Let $u = v' + \alpha w$ where $\alpha \in \mathbb{R}$. Then $u \in V$ and
\[ \|x - u\|^2 = \|x - v' - \alpha w\|^2 = \|x - v'\|^2 - 2\alpha(x - v', w) + \alpha^2\|w\|^2 = d^2 - 2\alpha\lambda + \alpha^2\|w\|^2 . \]
But the left hand side is $\ge d^2$, so by the definition of $d$ we obtain
\[ -2\alpha\lambda + \alpha^2\|w\|^2 = \alpha(-2\lambda + \alpha\|w\|^2) \ge 0 \]
for any $\alpha \in \mathbb{R}$. This is impossible. (The graph of $y = x(xb^2 - a)$ lies below the $x$-axis for all $x$ between 0 and $a/b^2$.) We conclude that there is no such $w$ and therefore $x - v' \perp V$. But then we have
\[
\begin{aligned}
d^2 = \|x - v\|^2 &= \|(x - v') + (v' - v)\|^2 \\
&= \|x - v'\|^2 + 2\underbrace{(x - v', v' - v)}_{=0 \text{ since } v' - v \in V} + \|v' - v\|^2 \\
&= d^2 + \|v' - v\|^2 .
\end{aligned}
\]
It follows that $\|v' - v\| = 0$.

For the converse, suppose that $x - v' \perp V$. Then we calculate
\[
\begin{aligned}
d^2 = \|x - v\|^2 &= \|(x - v') + (v' - v)\|^2 \\
&= \|x - v'\|^2 + 2\underbrace{(x - v', v' - v)}_{=0 \text{ since } v' - v \in V} + \|v - v'\|^2 \\
&= \underbrace{\|x - v'\|^2}_{\ge d^2} + \|v - v'\|^2 \qquad (*) \\
&\ge d^2 + \|v - v'\|^2 .
\end{aligned}
\]
It follows that $\|v - v'\| = 0$ and then $(*)$ implies that $\|x - v'\| = d$.

Suppose now that $\|\cdot\|$ satisfies the condition that $\|x\| = 0$ if and only if $x = 0$. Thus we are supposing that $\|\cdot\|$ really is a norm, not just a seminorm, on $X$. Then the equality $\|v - v'\| = 0$ is equivalent to $v = v'$. In this case, we can summarize the preceding discussion as follows.

Given any $x \in X$, there is a unique $v \in V$ such that $x - v \perp V$. With $x = v + (x - v)$ we see that we can write $x$ as $x = v + w$ where $v \in V$ and $w \perp V$. This decomposition of $x$ as the sum of an element $v \in V$ and an element $w \perp V$ is unique and means that we can define a map $P : X \to V$ by the formula $P : x \mapsto Px = v$. One checks that this is a linear

map from $X$ onto $V$. (Indeed, $Pv = v$ for all $v \in V$.) We also see that $P^2 x = PPx = Pv = v = Px$, so that $P^2 = P$. Moreover, for any $x, x' \in X$, write $x = v + w$ and $x' = v' + w'$ with $v, v' \in V$ and $w, w' \perp V$. Then
\[ (Px, x') = (v, x') = (v, v' + w') = (v, v') = (v + w, v') = (x, Px'). \]
A linear map $P$ with the properties that $P^2 = P$ and $(Px, x') = (x, Px')$ for all $x, x' \in X$ is called an orthogonal projection. We say that $P$ projects onto the subspace $V = \{ Px : x \in X \}$.

Suppose that $Q$ is a linear map obeying these two conditions. Then we note that $Q(1 - Q) = Q - Q^2 = 0$ and writing any $x \in X$ as $x = Qx + (1 - Q)x$, we find that the terms $Qx$ and $(1 - Q)x$ are orthogonal:
\[ (Qx, (1 - Q)x) = (Qx, x) - (Qx, Qx) = (Qx, x) - (Q^2 x, x) = 0. \]
So $Q$ is the orthogonal projection onto the linear subspace $\{ Qx : x \in X \}$.

We now wish to apply this to the linear space $\mathcal{L}^2(\Omega, \mathcal{S}, P)$. Suppose that $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{S}$. Just as we construct $\mathcal{L}^2(\Omega, \mathcal{S}, P)$, so we may construct $\mathcal{L}^2(\Omega, \mathcal{G}, P)$. Since every $\mathcal{G}$-measurable function is also $\mathcal{S}$-measurable, it follows that $\mathcal{L}^2(\Omega, \mathcal{G}, P)$ is a linear subspace of $\mathcal{L}^2(\Omega, \mathcal{S}, P)$. The Riesz-Fischer Theorem tells us that $\mathcal{L}^2(\Omega, \mathcal{G}, P)$ is complete and so the above analysis is applicable, where now $V = \mathcal{L}^2(\Omega, \mathcal{G}, P)$.

Thus, any element $f \in \mathcal{L}^2(\Omega, \mathcal{S}, P)$ can be written as
\[ f = \hat{f} + g \]
where $\hat{f} \in \mathcal{L}^2(\Omega, \mathcal{G}, P)$ and $g \perp \mathcal{L}^2(\Omega, \mathcal{G}, P)$. Because $\|\cdot\|_2$ is not a norm but only a seminorm on $\mathcal{L}^2(\Omega, \mathcal{S}, P)$, the functions $\hat{f}$ and $g$ are unique only in the sense that if also $f = f' + g'$ with $f' \in \mathcal{L}^2(\Omega, \mathcal{G}, P)$ and $g' \perp \mathcal{L}^2(\Omega, \mathcal{G}, P)$ then $\|\hat{f} - f'\|_2 = 0$, that is, $\hat{f} \in \mathcal{L}^2(\Omega, \mathcal{G}, P)$ is unique almost surely.

If we were to apply this discussion to $L^2(\Omega, \mathcal{S}, P)$, then $\|\cdot\|_2$ is a norm and the object corresponding to $\hat{f}$ should now be unique and be the image of $[f] \in L^2(\Omega, \mathcal{S}, P)$ under an orthogonal projection. However, there is a subtle point here. For this idea to go through, we must be able to identify $L^2(\Omega, \mathcal{G}, P)$ as a subspace of $L^2(\Omega, \mathcal{S}, P)$. It is certainly true that any element of $\mathcal{L}^2(\Omega, \mathcal{G}, P)$ is also an element of $\mathcal{L}^2(\Omega, \mathcal{S}, P)$, but is every element of $L^2(\Omega, \mathcal{G}, P)$ also an element of $L^2(\Omega, \mathcal{S}, P)$?

The answer lies in the sets of zero probability. Any element of $L^2(\Omega, \mathcal{G}, P)$ is a set (equivalence class) of the form $[f] = \{ f + N(\mathcal{G}) \}$, where $N(\mathcal{G})$ denotes the set of null functions that are $\mathcal{G}$-measurable. On the other hand, the corresponding element $[f] \in L^2(\Omega, \mathcal{S}, P)$ is the set $\{ f + N(\mathcal{S}) \}$, where now $N(\mathcal{S})$ is the set of $\mathcal{S}$-measurable null functions. It is certainly true that $\{ f + N(\mathcal{G}) \} \subset \{ f + N(\mathcal{S}) \}$, but in general there need not be equality. The notion of almost sure equivalence depends on the underlying $\sigma$-algebra. If we demand that $\mathcal{G}$ contain all the null sets of $\mathcal{S}$, then we do have equality

$\{ f + N(\mathcal{G}) \} = \{ f + N(\mathcal{S}) \}$ and in this case it is true that $L^2(\Omega, \mathcal{G}, P)$ really is a subspace of $L^2(\Omega, \mathcal{S}, P)$. For any $x \in L^2(\Omega, \mathcal{S}, P)$, there is a unique element $v \in L^2(\Omega, \mathcal{G}, P)$ such that $x - v \perp L^2(\Omega, \mathcal{G}, P)$. Indeed, if $x = [f]$, with $f \in \mathcal{L}^2(\Omega, \mathcal{S}, P)$, then $v$ is given by $v = [\hat{f}\,]$.

Definition 2.15. For given $f \in \mathcal{L}^2(\Omega, \mathcal{S}, P)$, any element $\hat{f} \in \mathcal{L}^2(\Omega, \mathcal{G}, P)$ such that $f - \hat{f} \perp \mathcal{L}^2(\Omega, \mathcal{G}, P)$ is called a version of the conditional expectation of $f$ and is denoted $E(f \mid \mathcal{G})$.

On square-integrable random variables, the conditional expectation map $f \mapsto \hat{f}$ is an orthogonal projection (subject to the ambiguities of sets of probability zero). We now wish to show that it is possible to recover the usual properties of the conditional expectation from this $L^2$ (inner product business) approach.

Proposition 2.16. For $f \in \mathcal{L}^2(\Omega, \mathcal{S}, P)$, (any version of) the conditional expectation $\hat{f} = E(f \mid \mathcal{G})$ satisfies
\[ \int_\Omega f(\omega) g(\omega)\, dP = \int_\Omega \hat{f}(\omega) g(\omega)\, dP \]
for any $g \in \mathcal{L}^2(\Omega, \mathcal{G}, P)$. In particular,
\[ \int_A f\, dP = \int_A \hat{f}\, dP \]
for any $A \in \mathcal{G}$.

Proof. By construction, $f - \hat{f} \perp \mathcal{L}^2(\Omega, \mathcal{G}, P)$ so that
\[ \int_\Omega (f - \hat{f})\, g\, dP = 0 \]
for any $g \in \mathcal{L}^2(\Omega, \mathcal{G}, P)$. In particular, for any $A \in \mathcal{G}$, set $g = 1_A$ to get the equality
\[ \int_A f\, dP = \int_\Omega f g\, dP = \int_\Omega \hat{f} g\, dP = \int_A \hat{f}\, dP . \]

This is the defining property of the conditional expectation except that $f$ belongs to $\mathcal{L}^2(\Omega, \mathcal{S}, P)$ rather than $\mathcal{L}^1(\Omega, \mathcal{S}, P)$. However, we can extend the result to cover the $L^1$ case as we now show. First note that if $f \ge g$ almost surely, then $\hat{f} \ge \hat{g}$ almost surely. Indeed, by considering $h = f - g$, it is enough to show that $\hat{h} \ge 0$ almost surely whenever $h \ge 0$ almost surely. But this follows directly from
\[ \int_A \hat{h}\, dP = \int_A h\, dP \ge 0 \]
for all $A \in \mathcal{G}$. (If $B_n = \{ \hat{h} < -1/n \}$ for $n \in \mathbb{N}$, then the inequalities $0 \le \int_{B_n} \hat{h}\, dP \le -\tfrac{1}{n} P(B_n)$ imply that $P(B_n) = 0$. But then $P(\hat{h} < 0) = \lim_n P(B_n) = 0$.)
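The projection picture can be checked by hand on a finite probability space, where $E(f \mid \mathcal{G})$ is the $P$-weighted average of $f$ over each block of the partition generating $\mathcal{G}$. The sketch below is only an illustration: the five-point space, the weights `P`, the partition and the test functions `f` and `h` are all hypothetical choices.

```python
from fractions import Fraction

# Hypothetical model: five outcomes with probabilities P, and G generated
# by the partition {0,1}, {2,3,4}.
P = [Fraction(1, 10), Fraction(3, 10), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5)]
PARTITION = [(0, 1), (2, 3, 4)]


def inner(f, g):
    """(f, g) = integral of f*g dP on the finite space."""
    return sum(p * a * b for p, a, b in zip(P, f, g))


def project(f):
    """f_hat = E(f | G): the P-weighted average of f over each block."""
    f_hat = [Fraction(0)] * len(f)
    for block in PARTITION:
        mass = sum(P[w] for w in block)
        mean = sum(P[w] * f[w] for w in block) / mass
        for w in block:
            f_hat[w] = mean
    return f_hat


def indicator(block):
    """Indicator function 1_A of a block A, as a list indexed by outcome."""
    return [Fraction(1) if w in block else Fraction(0) for w in range(len(P))]


f = [Fraction(v) for v in (2, -1, 0, 5, 1)]
h = [Fraction(v) for v in (1, 4, -3, 2, 2)]
f_hat = project(f)

idempotent = project(f_hat) == f_hat                         # P^2 = P
symmetric = inner(project(f), h) == inner(f, project(h))     # (Pf, h) = (f, Ph)
residual_orthogonal = all(                                   # f - f_hat is orthogonal to 1_A
    inner([a - b for a, b in zip(f, f_hat)], indicator(block)) == 0
    for block in PARTITION
)
```

Since every $\mathcal{G}$-measurable function on this space is a linear combination of the block indicators, orthogonality to the indicators is the same as the equality $\int_A f\, dP = \int_A \hat{f}\, dP$ for all $A \in \mathcal{G}$.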

Proposition 2.17. For $f \in \mathcal{L}^1(\Omega, \mathcal{S}, P)$, there exists $\hat{f} \in \mathcal{L}^1(\Omega, \mathcal{G}, P)$ such that
\[ \int_A f\, dP = \int_A \hat{f}\, dP \]
for any $A \in \mathcal{G}$. The function $\hat{f}$ is unique almost surely.

Proof. By writing $f$ as $f^+ - f^-$, it is enough to consider the case with $f \ge 0$ almost surely. For $n \in \mathbb{N}$, set $f_n(\omega) = f(\omega) \wedge n$. Then $0 \le f_n \le n$ almost surely and so $f_n \in \mathcal{L}^2(\Omega, \mathcal{S}, P)$. Let $\hat{f}_n$ be any version of the conditional expectation of $f_n$ with respect to $\mathcal{G}$. Now, if $n > m$, then $f_n \ge f_m$ and so $\hat{f}_n \ge \hat{f}_m$ almost surely. That is, there is some event $B_{mn} \in \mathcal{G}$ with $P(B_{mn}) = 0$ and such that $\hat{f}_n \ge \hat{f}_m$ on $B_{mn}^c$ (and $P(B_{mn}^c) = 1$). Let $B = \bigcup_{m,n} B_{mn}$. Then $P(B) = 0$, $P(B^c) = 1$ and $\hat{f}_n \ge \hat{f}_m$ on $B^c$ for all $m, n$ with $m \le n$. Set
\[ \hat{f}(\omega) = \begin{cases} \sup_n \hat{f}_n(\omega), & \omega \in B^c \\ 0, & \omega \in B. \end{cases} \]
Then for any $A \in \mathcal{G}$,
\[ \int_A f_n\, dP = \int_\Omega f_n 1_A\, dP = \int_\Omega \hat{f}_n 1_A\, dP = \int_\Omega \hat{f}_n 1_{B^c} 1_A\, dP \]
because $P(B^c) = 1$. The left hand side $\to \int_\Omega f 1_A\, dP$ by Lebesgue's Dominated Convergence Theorem. Applying Lebesgue's Monotone Convergence Theorem to the right hand side, we see that
\[ \int_\Omega \hat{f}_n 1_{B^c} 1_A\, dP \to \int_\Omega \hat{f}\, 1_{B^c} 1_A\, dP = \int_A \hat{f}\, dP \]
which gives the equality $\int_A f\, dP = \int_A \hat{f}\, dP$. Taking $A = \Omega$, we see that $\hat{f} \in \mathcal{L}^1(\Omega, \mathcal{G}, P)$.

Of course, the function $\hat{f}$ is called (a version of) the conditional expectation $E(f \mid \mathcal{G})$ and we have recovered our original construction as given via the Radon-Nikodym Theorem.

Chapter 3 Martingales

Let $(\Omega, \mathcal{S}, P)$ be a probability space. We consider a sequence $(\mathcal{F}_n)_{n \in \mathbb{Z}_+}$ of sub-$\sigma$-algebras of a fixed $\sigma$-algebra $\mathcal{F} \subset \mathcal{S}$ obeying
\[ \mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F} . \]
Such a sequence is called a filtration (upward filtering). The intuitive idea is to think of $\mathcal{F}_n$ as those events associated with outcomes of interest occurring up to "time $n$". Think of $n$ as (discrete) time.

Definition 3.1. A martingale (with respect to $(\mathcal{F}_n)$) is a sequence $(\xi_n)$ of random variables such that

1. Each $\xi_n$ is integrable: $E(|\xi_n|) < \infty$ for all $n \in \mathbb{Z}_+$.

2. $\xi_n$ is measurable with respect to $\mathcal{F}_n$ (we say $(\xi_n)$ is adapted).

3. $E(\xi_{n+1} \mid \mathcal{F}_n) = \xi_n$, almost surely, for all $n \in \mathbb{Z}_+$.

Note that (1) is required in order for (3) to make sense. (The conditional expectation $E(\xi \mid \mathcal{G})$ is not defined unless $\xi$ is integrable.)

Remark 3.2. Suppose that $(\xi_n)$ is a martingale. For any $n > m$, we have
\[
\begin{aligned}
E(\xi_n \mid \mathcal{F}_m) &= E(E(\xi_n \mid \mathcal{F}_{n-1}) \mid \mathcal{F}_m) \quad \text{almost surely, by the tower property} \\
&= E(\xi_{n-1} \mid \mathcal{F}_m) \quad \text{almost surely} \\
&= \cdots \\
&= E(\xi_{m+1} \mid \mathcal{F}_m) \quad \text{almost surely} \\
&= \xi_m \quad \text{almost surely.}
\end{aligned}
\]
That is, $E(\xi_n \mid \mathcal{F}_m) = \xi_m$ almost surely for all $n \ge m$. (This could have been taken as part of the definition.)

Example 3.3. Let $X_0, X_1, X_2, \dots$ be independent integrable random variables with mean zero. For each $n \in \mathbb{Z}_+$ let $\mathcal{F}_n = \sigma(X_0, X_1, \dots, X_n)$ and let $\xi_n = X_0 + X_1 + \cdots + X_n$. Evidently $(\xi_n)$ is adapted and
\[ E(|\xi_n|) = E(|X_0 + X_1 + \cdots + X_n|) \le E(|X_0|) + E(|X_1|) + \cdots + E(|X_n|) < \infty \]
so that $\xi_n$ is integrable. Finally, we note that (almost surely)
\[
\begin{aligned}
E(\xi_{n+1} \mid \mathcal{F}_n) &= E(X_{n+1} + \xi_n \mid \mathcal{F}_n) \\
&= E(X_{n+1} \mid \mathcal{F}_n) + E(\xi_n \mid \mathcal{F}_n) \\
&= E(X_{n+1}) + \xi_n \quad \text{since $\xi_n$ is adapted and $X_{n+1}$ and $\mathcal{F}_n$ are independent} \\
&= \xi_n
\end{aligned}
\]
and so $(\xi_n)$ is a martingale.

Example 3.4. Let $X \in L^1(\mathcal{F})$ and let $\xi_n = E(X \mid \mathcal{F}_n)$. Then $\xi_n \in L^1(\mathcal{F}_n)$ and
\[ E(\xi_{n+1} \mid \mathcal{F}_n) = E(E(X \mid \mathcal{F}_{n+1}) \mid \mathcal{F}_n) = E(X \mid \mathcal{F}_n) = \xi_n \quad \text{almost surely} \]
so that $(\xi_n)$ is a martingale.

Proposition 3.5. For any martingale $(\xi_n)$, $E(\xi_n) = E(\xi_0)$.

Proof. We note that $E(X) = E(X \mid \mathcal{G})$, where $\mathcal{G}$ is the trivial $\sigma$-algebra, $\mathcal{G} = \{ \Omega, \emptyset \}$. Since $\mathcal{G} \subset \mathcal{F}_n$ for all $n$, we can apply the tower property to deduce that
\[ E(\xi_n) = E(\xi_n \mid \mathcal{G}) = E(E(\xi_n \mid \mathcal{F}_0) \mid \mathcal{G}) = E(\xi_0 \mid \mathcal{G}) = E(\xi_0) \]
as required.

Definition 3.6. The sequence $X_0, X_1, X_2, \dots$ of random variables is said to be a supermartingale with respect to the filtration $(\mathcal{F}_n)$ if the following three conditions hold.

1. Each $X_n$ is integrable.

2. $(X_n)$ is adapted to $(\mathcal{F}_n)$.

3. For each $n \in \mathbb{Z}_+$, $X_n \ge E(X_{n+1} \mid \mathcal{F}_n)$ almost surely.
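Example 3.3 and Proposition 3.5 can be verified exactly in the simplest case. The sketch below assumes that each $X_j$ is $+1$ or $-1$ with probability $1/2$ (a hypothetical choice, not made in the notes) and computes $E(\,\cdot \mid \mathcal{F}_n)$ by averaging over all equally likely paths that share the same $X_0, \dots, X_n$.

```python
from fractions import Fraction
from itertools import product

HORIZON = 4  # steps X_0, ..., X_HORIZON, each +1 or -1 with probability 1/2
PATHS = list(product((1, -1), repeat=HORIZON + 1))


def xi(path, n):
    """Partial sum xi_n = X_0 + ... + X_n along one path."""
    return sum(path[: n + 1])


def cond_exp(func, n, path):
    """E(func | F_n) evaluated on `path`: average over paths agreeing up to time n."""
    same_past = [p for p in PATHS if p[: n + 1] == path[: n + 1]]
    return Fraction(sum(func(p) for p in same_past), len(same_past))


# Martingale property: E(xi_{n+1} | F_n) = xi_n on every path.
is_martingale = all(
    cond_exp(lambda p: xi(p, n + 1), n, path) == xi(path, n)
    for n in range(HORIZON)
    for path in PATHS
)

# Proposition 3.5: the expectation E(xi_n) does not depend on n.
means = [
    Fraction(sum(xi(p, n) for p in PATHS), len(PATHS))
    for n in range(HORIZON + 1)
]
```

Exact rational arithmetic is used so that the equalities are genuine identities rather than floating-point approximations.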

The sequence $X_0, X_1, \dots$ is said to be a submartingale if both (1) and (2) above hold and also the following inequalities hold.

3′. For each $n \in \mathbb{Z}_+$, $X_n \le E(X_{n+1} \mid \mathcal{F}_n)$ almost surely.

Evidently, $(X_n)$ is a submartingale if and only if $(-X_n)$ is a supermartingale.

Example 3.7. Let $X = (X_n)_{n \in \mathbb{Z}_+}$ be a supermartingale (with respect to a given filtration $(\mathcal{F}_n)$) such that $E(X_n) = E(X_0)$ for all $n \in \mathbb{Z}_+$. Then $X$ is a martingale. To see this, let $Y = X_n - E(X_{n+1} \mid \mathcal{F}_n)$. Since $(X_n)$ is a supermartingale, $Y \ge 0$ almost surely. Taking expectations, we find that $E(Y) = E(X_n) - E(X_{n+1}) = 0$, since $E(X_{n+1}) = E(X_n) = E(X_0)$. It follows that $Y = 0$, almost surely, that is, $(X_n)$ is a martingale.

Example 3.8. Let $(Y_n)$ be a sequence of independent random variables such that $Y_n$ and $e^{Y_n}$ are integrable for all $n \in \mathbb{Z}_+$ and such that $E(Y_n) \ge 0$. For $n \in \mathbb{Z}_+$, let
\[ X_n = \exp(Y_0 + Y_1 + \cdots + Y_n). \]
Then $(X_n)$ is a submartingale with respect to the filtration $(\mathcal{F}_n)$ where $\mathcal{F}_n = \sigma(Y_0, \dots, Y_n)$, the $\sigma$-algebra generated by $Y_0, \dots, Y_n$.

To see this, we first show that each $X_n$ is integrable. We have
\[ X_n = e^{(Y_0 + \cdots + Y_n)} = e^{Y_0} e^{Y_1} \cdots e^{Y_n} . \]
Each term $e^{Y_j}$ is integrable, but why should the product be? It is the independence of the $Y_j$s which does the trick. Indeed, we have
\[ E(X_n) = E(e^{(Y_0 + \cdots + Y_n)}) = E(e^{Y_0} e^{Y_1} \cdots e^{Y_n}) = E(e^{Y_0})\, E(e^{Y_1}) \cdots E(e^{Y_n}), \]
by independence.

[Let $f_j = e^{Y_j}$ and for $m \in \mathbb{N}$, set $f_j^m = f_j \wedge m$. Each $f_j^m$ is bounded and the $f_j^m$s are independent. Note also that each $f_j$ is non-negative. Then $E(f_0^m f_1^m \cdots f_n^m) = E(f_0^m)\, E(f_1^m) \cdots E(f_n^m)$. Letting $m \to \infty$ and applying Lebesgue's Monotone Convergence Theorem, we deduce that the product $f_0 \cdots f_n$ is integrable (and its integral is given by the product $E(f_0) \cdots E(f_n)$).]

By construction, $(X_n)$ is adapted. To verify the submartingale inequality, we consider
\[
\begin{aligned}
E(X_{n+1} \mid \mathcal{F}_n) &= E(e^{Y_0 + Y_1 + \cdots + Y_n + Y_{n+1}} \mid \mathcal{F}_n) \\
&= E(X_n e^{Y_{n+1}} \mid \mathcal{F}_n) \\
&= X_n\, E(e^{Y_{n+1}} \mid \mathcal{F}_n), \quad \text{since $X_n$ is $\mathcal{F}_n$-measurable,} \\
&= X_n\, E(e^{Y_{n+1}}), \quad \text{by independence,} \\
&\ge X_n\, e^{E(Y_{n+1})}, \quad \text{by Jensen's inequality ($s \mapsto e^s$ is convex),} \\
&\ge X_n, \quad \text{since $E(Y_{n+1}) \ge 0$.}
\end{aligned}
\]
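The submartingale inequality of Example 3.8 can be seen concretely in a toy case. The sketch below assumes, purely for illustration, that each $Y_j$ is $+\log 2$ or $-\log 2$ with probability $1/2$, so that $E(Y_j) = 0$ and $e^{Y_j}$ is $2$ or $1/2$; this keeps every quantity rational.

```python
from fractions import Fraction
from itertools import product

# Hypothetical choice: Y_j = +log 2 or -log 2 with probability 1/2 each, so
# E(Y_j) = 0 and exp(Y_j) is 2 or 1/2; all arithmetic stays exact.
FACTORS = (Fraction(2), Fraction(1, 2))
HORIZON = 3  # Y_0, ..., Y_HORIZON
PATHS = list(product(FACTORS, repeat=HORIZON + 1))


def x(path, n):
    """X_n = exp(Y_0 + ... + Y_n), written as a product of the factors exp(Y_j)."""
    value = Fraction(1)
    for factor in path[: n + 1]:
        value *= factor
    return value


def cond_exp_next(path, n):
    """E(X_{n+1} | F_n) on `path`: average X_{n+1} over paths with the same Y_0..Y_n."""
    same_past = [p for p in PATHS if p[: n + 1] == path[: n + 1]]
    return sum(x(p, n + 1) for p in same_past) / len(same_past)


# Gap E(X_{n+1} | F_n) - X_n for every time n and every path.
gaps = [
    cond_exp_next(path, n) - x(path, n)
    for n in range(HORIZON)
    for path in PATHS
]
is_submartingale = all(gap >= 0 for gap in gaps)
```

Here $E(e^{Y_j}) = 5/4 > 1 = e^{E(Y_j)}$, so the Jensen step in the example is strict and every gap equals $X_n/4$.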

Proposition 3.9.

(i) Suppose that $(X_n)$ and $(Y_n)$ are submartingales. Then $(X_n \vee Y_n)$ is also a submartingale.

(ii) Suppose that $(X_n)$ and $(Y_n)$ are supermartingales. Then $(X_n \wedge Y_n)$ is also a supermartingale.

Proof. (i) Set $Z_n = X_n \vee Y_n$. Then $Z_k \ge X_k$ and $Z_k \ge Y_k$ for all $k$ and so
\[ E(Z_{n+1} \mid \mathcal{F}_n) \ge E(X_{n+1} \mid \mathcal{F}_n) \ge X_n \quad \text{almost surely} \]
and similarly
\[ E(Z_{n+1} \mid \mathcal{F}_n) \ge E(Y_{n+1} \mid \mathcal{F}_n) \ge Y_n \quad \text{almost surely.} \]
It follows that $E(Z_{n+1} \mid \mathcal{F}_n) \ge Z_n$ almost surely, as required. (If $A = \{ E(Z_{n+1} \mid \mathcal{F}_n) \ge X_n \}$ and $B = \{ E(Z_{n+1} \mid \mathcal{F}_n) \ge Y_n \}$, then $A \cap B = \{ E(Z_{n+1} \mid \mathcal{F}_n) \ge Z_n \}$. However, $P(A) = P(B) = 1$ and so $P(A \cap B) = 1$.)

(ii) Set $U_n = X_n \wedge Y_n$, so that $U_k \le X_k$ and $U_k \le Y_k$ for all $k$. It follows that
\[ E(U_{n+1} \mid \mathcal{F}_n) \le E(X_{n+1} \mid \mathcal{F}_n) \le X_n \quad \text{almost surely} \]
and similarly
\[ E(U_{n+1} \mid \mathcal{F}_n) \le E(Y_{n+1} \mid \mathcal{F}_n) \le Y_n \quad \text{almost surely.} \]
We conclude that $E(X_{n+1} \wedge Y_{n+1} \mid \mathcal{F}_n) \le X_n \wedge Y_n$ almost surely, as required.

Proposition 3.10. Suppose that $(\xi_n)_{n \in \mathbb{Z}_+}$ is a martingale and $\xi_n \in L^2$ for each $n$. Then $(\xi_n^2)$ is a submartingale.

Proof. The function $\varphi : t \mapsto t^2$ is convex and so by Jensen's inequality
\[ \varphi(E(\xi_{n+1} \mid \mathcal{F}_n)) \le E(\varphi(\xi_{n+1}) \mid \mathcal{F}_n) \quad \text{almost surely.} \]
That is,
\[ \xi_n^2 \le E(\xi_{n+1}^2 \mid \mathcal{F}_n) \quad \text{almost surely} \]
as required.

Theorem 3.11 (Orthogonality of martingale increments). Suppose that $(X_n)$ is an $L^2$-martingale with respect to the filtration $(\mathcal{F}_n)$. Then
\[ E((X_n - X_m)\, Y) = 0 \]
whenever $n \ge m$ and $Y \in L^2(\mathcal{F}_m)$. In particular,
\[ E((X_k - X_j)(X_n - X_m)) = 0 \]
for any $0 \le j \le k \le m \le n$. In other words, the increments $(X_k - X_j)$ and $(X_n - X_m)$ are orthogonal in $L^2$.

Proof. Note first that $(X_n - X_m)Y$ is integrable. Next, using the "tower property" and the $\mathcal{F}_m$-measurability of $Y$, we see that
\[
\begin{aligned}
E((X_n - X_m)\, Y) &= E(E((X_n - X_m)\, Y \mid \mathcal{F}_m)) \\
&= E(E((X_n - X_m) \mid \mathcal{F}_m)\, Y) \\
&= 0
\end{aligned}
\]
since $E(X_n - X_m \mid \mathcal{F}_m) = X_m - X_m = 0$. The orthogonality of the martingale increments follows immediately by taking $Y = X_k - X_j$.

Gambling

It is customary to mention gambling. Consider a sequence $\eta_1, \eta_2, \dots$ of random variables where $\eta_n$ is thought of as the "winnings per unit stake" at game play $n$. If a gambler places a unit stake at each game, then the total winnings after $n$ games is $\xi_n = \eta_1 + \eta_2 + \cdots + \eta_n$. For $n \in \mathbb{N}$, let $\mathcal{F}_n = \sigma(\eta_1, \dots, \eta_n)$ and set $\xi_0 = 0$ and $\mathcal{F}_0 = \{ \Omega, \emptyset \}$.

To say that $(\xi_n)$ is a martingale is to say that
\[ E(\xi_{n+1} \mid \mathcal{F}_n) = \xi_n \quad \text{almost surely,} \]
or
\[ E(\xi_{n+1} - \xi_n \mid \mathcal{F}_n) = 0 \quad \text{almost surely.} \]
We can interpret this as saying that knowing everything up to play $n$, we expect the gain at the next play to be zero. We gain no advantage nor suffer any disadvantage at play $n + 1$ simply because we have already played $n$ games. In other words, the game is "fair".

On the other hand, if $(\xi_n)$ is a supermartingale, then
\[ \xi_n \ge E(\xi_{n+1} \mid \mathcal{F}_n) \quad \text{almost surely.} \]
This can be interpreted as telling us that the knowledge of everything up to (and including) game $n$ suggests that our winnings after game $n + 1$ will be at most the winnings after $n$ games. It is an unfavourable game for us.

If $(\xi_n)$ is a submartingale, then arguing as above, we conclude that the game is in our favour.

Note that if $(\xi_n)$ is a martingale, then $E(\xi_n) = E(\xi_1)$ ($= E(\xi_0) = 0$ in this case) so the expected total winnings never changes.

Definition 3.12. A stochastic process $(\alpha_n)_{n \in \mathbb{N}}$ is said to be previsible (or predictable) with respect to a filtration $(\mathcal{F}_n)$ if $\alpha_n$ is $\mathcal{F}_{n-1}$-measurable for each $n \in \mathbb{N}$.

Note that one often extends the labelling to $\mathbb{Z}_+$ and sets $\alpha_0 = 0$, mainly for notational convenience.

Using the gambling scenario, we might think of $\alpha_n$ as the number of stakes bought at game $n$. Our choice for $\alpha_n$ can be based on all events up to and including game $n - 1$. The requirement that $\alpha_n$ be $\mathcal{F}_{n-1}$-measurable is quite reasonable and natural in this setting.

The total winnings after $n$ games will be $\zeta_n = \alpha_1 \eta_1 + \cdots + \alpha_n \eta_n$. However, $\xi_n = \eta_1 + \cdots + \eta_n = \xi_{n-1} + \eta_n$ and so
\[ \zeta_n = \alpha_1(\xi_1 - \xi_0) + \alpha_2(\xi_2 - \xi_1) + \cdots + \alpha_n(\xi_n - \xi_{n-1}) . \]
This can also be expressed as
\[ \zeta_{n+1} = \zeta_n + \alpha_{n+1}\eta_{n+1} = \zeta_n + \alpha_{n+1}(\xi_{n+1} - \xi_n) . \]

Example 3.13. For each $k \in \mathbb{Z}_+$, let $B_k$ be a Borel subset of $\mathbb{R}^{k+1}$ and set
\[ \alpha_{k+1} = \begin{cases} 1, & \text{if } (\eta_0, \eta_1, \dots, \eta_k) \in B_k \\ 0, & \text{otherwise.} \end{cases} \]
Then $\alpha_{k+1}$ is $\sigma(\eta_0, \eta_1, \dots, \eta_k)$-measurable. We have a "gaming strategy". One decides whether to play at game $k + 1$ depending on the outcomes up to and including game $k$, namely, whether the outcome belongs to $B_k$ or not.

Theorem 3.14. Let $(\alpha_n)$ be a predictable process and as above, let $(\zeta_n)$ be the process $\zeta_n = \alpha_1(\xi_1 - \xi_0) + \cdots + \alpha_n(\xi_n - \xi_{n-1})$. In the following discussion, let $\mathcal{F}_0 = \{ \Omega, \emptyset \}$ and set $\zeta_0 = 0$.

(i) Suppose that $(\alpha_n)$ is bounded and that $(\xi_n)$ is a martingale. Then $(\zeta_n)$ is a martingale.

(ii) Suppose that $(\alpha_n)$ is bounded, $\alpha_n \ge 0$ and that the process $(\xi_n)$ is a submartingale. Then $(\zeta_n)$ is a submartingale.

(iii) Suppose that $(\alpha_n)$ is bounded, $\alpha_n \ge 0$ and that the process $(\xi_n)$ is a supermartingale. Then $(\zeta_n)$ is a supermartingale.

Proof. First note that
\[ |\zeta_n| \le |\alpha_1|\,(|\xi_1| + |\xi_0|) + \cdots + |\alpha_n|\,(|\xi_n| + |\xi_{n-1}|) \]
and so $\zeta_n$ is integrable because each $\xi_k$ is and $(\alpha_n)$ is bounded. Next, we have
\[
\begin{aligned}
E(\zeta_n \mid \mathcal{F}_{n-1}) &= E(\zeta_{n-1} + \alpha_n(\xi_n - \xi_{n-1}) \mid \mathcal{F}_{n-1}) \\
&= \zeta_{n-1} + E(\alpha_n(\xi_n - \xi_{n-1}) \mid \mathcal{F}_{n-1}) \\
&= \zeta_{n-1} + \alpha_n \underbrace{E((\xi_n - \xi_{n-1}) \mid \mathcal{F}_{n-1})}_{\Phi_n \,=\, E(\xi_n \mid \mathcal{F}_{n-1}) - \xi_{n-1}} .
\end{aligned}
\]
That is,
\[ E(\zeta_n \mid \mathcal{F}_{n-1}) - \zeta_{n-1} = \alpha_n \Phi_n . \]
Now, in case (i), $\Phi_n = 0$ almost surely, so $E(\zeta_n \mid \mathcal{F}_{n-1}) = \zeta_{n-1}$ almost surely. In case (ii), $\Phi_n \ge 0$ almost surely and so $\alpha_n \Phi_n \ge 0$ almost surely and therefore $\zeta_{n-1} \le E(\zeta_n \mid \mathcal{F}_{n-1})$ almost surely. In case (iii), $\Phi_n \le 0$ almost surely and therefore $\alpha_n \Phi_n \le 0$ almost surely. It follows that $\zeta_{n-1} \ge E(\zeta_n \mid \mathcal{F}_{n-1})$ almost surely.

Remarks 3.15.

1. This last result can be interpreted as saying that no matter what strategy one adopts, it is not possible to make a fair game "unfair", a favourable game unfavourable or an unfavourable game favourable.

2. The formula $\zeta_n = \alpha_1(\xi_1 - \xi_0) + \alpha_2(\xi_2 - \xi_1) + \cdots + \alpha_n(\xi_n - \xi_{n-1})$ is the martingale transform of $\xi$ by $\alpha$. The general definition follows.

Definition 3.16. Given an adapted process $X$ and a predictable process $C$, the (martingale) transform of $X$ by $C$ is the process $C \cdot X$ with
\[ (C \cdot X)_0 = 0, \qquad (C \cdot X)_n = \sum_{1 \le k \le n} C_k (X_k - X_{k-1}) \]
(which means that $(C \cdot X)_n - (C \cdot X)_{n-1} = C_n(X_n - X_{n-1})$).

We have seen that if $C$ is bounded and $X$ is a martingale, then $C \cdot X$ is also a martingale. Moreover, if $C \ge 0$ almost surely, then $C \cdot X$ is a submartingale if $X$ is and $C \cdot X$ is a supermartingale if $X$ is.
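The statement that no bounded predictable strategy can make a fair game favourable can be tested by brute force on a short sequence of fair coin tosses. In the sketch below, the "double after a loss" staking rule, the number of games and the $\pm 1$ winnings are hypothetical choices; the rule is bounded over a fixed finite horizon, which is all the theorem needs here.

```python
from fractions import Fraction
from itertools import product

GAMES = 4  # hypothetical number of fair games, eta_1, ..., eta_GAMES = +1 or -1
PATHS = list(product((1, -1), repeat=GAMES))


def stake(path, k):
    """alpha_k, computed from eta_1..eta_{k-1} only, so the process is predictable.

    Hypothetical strategy: stake 1 at first, double after a loss, reset to 1 after a win.
    """
    alpha = 1
    for eta in path[: k - 1]:
        alpha = 2 * alpha if eta == -1 else 1
    return alpha


def zeta(path, n):
    """(alpha . xi)_n = sum_{k<=n} alpha_k (xi_k - xi_{k-1}) = sum_{k<=n} alpha_k eta_k."""
    return sum(stake(path, k) * path[k - 1] for k in range(1, n + 1))


def cond_exp_zeta(path, n, m):
    """E(zeta_n | F_m) on `path`: average over paths sharing eta_1..eta_m."""
    same_past = [p for p in PATHS if p[:m] == path[:m]]
    return Fraction(sum(zeta(p, n) for p in same_past), len(same_past))


# E(zeta_n | F_{n-1}) = zeta_{n-1} on every path.
transform_is_martingale = all(
    cond_exp_zeta(path, n, n - 1) == zeta(path, n - 1)
    for n in range(1, GAMES + 1)
    for path in PATHS
)
# With m = 0 the conditioning is trivial, giving the plain expectations E(zeta_n).
expected_winnings = [cond_exp_zeta(PATHS[0], n, 0) for n in range(GAMES + 1)]
```

The expected winnings stay at zero for every $n$, even though the doubling rule wins one unit on most paths; the rare long losing runs exactly compensate.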

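Stopping Times

A map $\tau : \Omega \to \mathbb{Z}_+ \cup \{ \infty \}$ is said to be a stopping time (or a Markov time) with respect to the filtration $(\mathcal{F}_n)$ if $\{ \tau \le n \} \in \mathcal{F}_n$ for each $n \in \mathbb{Z}_+$. One can think of this as saying that the information available by time $n$ should be sufficient to tell us whether something has "stopped by time $n$" or not. For example, we should not need to be watching out for a company's profits in September if we only want to know whether it went bust in May.

Proposition 3.17. Let $\tau : \Omega \to \mathbb{Z}_+ \cup \{ \infty \}$. Then $\tau$ is a stopping time (with respect to the filtration $(\mathcal{F}_n)$) if and only if $\{ \tau = n \} \in \mathcal{F}_n$ for every $n \in \mathbb{Z}_+$.

Proof. If $\tau$ is a stopping time, then $\{ \tau \le n \} \in \mathcal{F}_n$ for all $n \in \mathbb{Z}_+$. But $\{ \tau = 0 \} = \{ \tau \le 0 \}$ and
\[ \{ \tau = n \} = \underbrace{\{ \tau \le n \}}_{\in \mathcal{F}_n} \cap \underbrace{\{ \tau \le n - 1 \}^c}_{\in \mathcal{F}_{n-1} \subset \mathcal{F}_n} \]
for any $n \in \mathbb{N}$. Hence $\{ \tau = n \} \in \mathcal{F}_n$ for all $n \in \mathbb{Z}_+$.

For the converse, suppose that $\{ \tau = n \} \in \mathcal{F}_n$ for all $n \in \mathbb{Z}_+$. Then
\[ \{ \tau \le n \} = \bigcup_{0 \le k \le n} \{ \tau = k \} . \]
Each event $\{ \tau = k \}$ belongs to $\mathcal{F}_k \subset \mathcal{F}_n$ if $k \le n$ and therefore it follows that $\{ \tau \le n \} \in \mathcal{F}_n$.

Proposition 3.18. For any fixed $k \in \mathbb{Z}_+$, the constant time $\tau(\omega) = k$ is a stopping time.

Proof. If $k > n$, then $\{ \tau \le n \} = \emptyset \in \mathcal{F}_n$. On the other hand, if $k \le n$, then $\{ \tau \le n \} = \Omega \in \mathcal{F}_n$.

Proposition 3.19. Let $\sigma$ and $\tau$ be stopping times (with respect to $(\mathcal{F}_n)$). Then $\sigma + \tau$, $\sigma \vee \tau$ and $\sigma \wedge \tau$ are also stopping times.

Proof. Since $\sigma(\omega) \ge 0$ and $\tau(\omega) \ge 0$, we see that
\[ \{ \sigma + \tau = n \} = \bigcup_{0 \le k \le n} \{ \sigma = k \} \cap \{ \tau = n - k \} . \]
Each set in this union belongs to $\mathcal{F}_n$, so $\{ \sigma + \tau = n \} \in \mathcal{F}_n$ for every $n$. Hence, by Proposition 3.17, $\{ \sigma + \tau \le n \} \in \mathcal{F}_n$.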

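Next, we note that
\[ \{ \sigma \vee \tau \le n \} = \{ \sigma \le n \} \cap \{ \tau \le n \} \in \mathcal{F}_n \]
and so $\sigma \vee \tau$ is a stopping time. Finally, we have
\[ \{ \sigma \wedge \tau \le n \} = \{ \sigma \wedge \tau > n \}^c = (\{ \sigma > n \} \cap \{ \tau > n \})^c \in \mathcal{F}_n \]
and therefore $\sigma \wedge \tau$ is a stopping time.

Definition 3.20. Let $X$ be a process and $\tau$ a stopping time. The process $X$ stopped by $\tau$ is the process $(X_n^\tau)$ with
\[ X_n^\tau(\omega) = X_{\tau(\omega) \wedge n}(\omega) \quad \text{for } \omega \in \Omega . \]

So if the outcome is $\omega$ and, say, $\tau(\omega) = 23$, then $X_n^\tau(\omega)$ is given by
\[ X_{\tau(\omega) \wedge 1}(\omega), X_{\tau(\omega) \wedge 2}(\omega), \dots = X_1(\omega), X_2(\omega), \dots, X_{22}(\omega), X_{23}(\omega), X_{23}(\omega), X_{23}(\omega), \dots \]
that is, $X_{\tau \wedge n}$ is constant for $n \ge 23$. Of course, for another outcome $\omega'$, with $\tau(\omega') = 99$ say, $X_{\tau \wedge n}$ takes the values
\[ X_1(\omega'), X_2(\omega'), \dots, X_{98}(\omega'), X_{99}(\omega'), X_{99}(\omega'), X_{99}(\omega'), \dots \]

Proposition 3.21. If $(X_n)$ is a martingale (resp., submartingale, supermartingale), so is the stopped process $(X_n^\tau)$.

Proof. Firstly, we note that
\[ X_n^\tau(\omega) = X_{\tau(\omega) \wedge n}(\omega) = \begin{cases} X_k(\omega) & \text{if } \tau(\omega) = k \text{ with } k \le n \\ X_n(\omega) & \text{if } \tau(\omega) > n, \end{cases} \]
that is,
\[ X_n^\tau = \sum_{k=0}^{n} X_k 1_{\{ \tau = k \}} + X_n 1_{\{ \tau \le n \}^c} . \]
Now, $\{ \tau = k \} \in \mathcal{F}_k$ and $\{ \tau \le n \}^c \in \mathcal{F}_n$ and so it follows that $X_n^\tau$ is $\mathcal{F}_n$-measurable. Furthermore, each term on the right hand side is integrable and so we deduce that $(X_n^\tau)$ is adapted and integrable.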

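Next, we see that
\[
\begin{aligned}
X_{n+1}^\tau - X_n^\tau &= \sum_{k=0}^{n+1} X_k 1_{\{ \tau = k \}} + X_{n+1} 1_{\{ \tau > n+1 \}} - \sum_{k=0}^{n} X_k 1_{\{ \tau = k \}} - X_n 1_{\{ \tau > n \}} \\
&= X_{n+1} 1_{\{ \tau = n+1 \}} + X_{n+1} 1_{\{ \tau > n+1 \}} - X_n 1_{\{ \tau > n \}} \\
&= X_{n+1} 1_{\{ \tau \ge n+1 \}} - X_n 1_{\{ \tau > n \}} \\
&= (X_{n+1} - X_n)\, 1_{\{ \tau \ge n+1 \}} .
\end{aligned}
\]
Taking the conditional expectation, the left hand side gives
\[ E(X_{n+1}^\tau - X_n^\tau \mid \mathcal{F}_n) = E(X_{n+1}^\tau \mid \mathcal{F}_n) - X_n^\tau \]
since $X_n^\tau$ is adapted. The right hand side gives
\[
\begin{aligned}
E((X_{n+1} - X_n)\, 1_{\{ \tau \ge n+1 \}} \mid \mathcal{F}_n) &= 1_{\{ \tau \ge n+1 \}}\, E((X_{n+1} - X_n) \mid \mathcal{F}_n) \quad \text{since } 1_{\{ \tau \ge n+1 \}} \text{ is } \mathcal{F}_n\text{-measurable} \\
&= 1_{\{ \tau \ge n+1 \}}\, (E(X_{n+1} \mid \mathcal{F}_n) - X_n) \\
&\begin{cases} = 0, & \text{if $(X_n)$ is a martingale} \\ \ge 0, & \text{if $(X_n)$ is a submartingale} \\ \le 0, & \text{if $(X_n)$ is a supermartingale} \end{cases}
\end{aligned}
\]
and so it follows that $(X_n^\tau)$ is a martingale (respectively, submartingale, supermartingale) if $(X_n)$ is.

Definition 3.22. Let $(X_n)_{n \in \mathbb{Z}_+}$ be an adapted process with respect to a given filtration $(\mathcal{F}_n)$ built on a probability space $(\Omega, \mathcal{S}, P)$ and let $\tau$ be a stopping time such that $\tau < \infty$ almost surely. The random variable stopped by $\tau$ is defined to be
\[ X_\tau(\omega) = \begin{cases} X_{\tau(\omega)}(\omega), & \text{for } \omega \in \{ \tau \in \mathbb{Z}_+ \} \\ X_\infty, & \text{if } \tau \notin \mathbb{Z}_+, \end{cases} \]
where $X_\infty$ is any arbitrary but fixed constant. Then $X_\tau$ really is a random variable, that is, it is measurable with respect to $\mathcal{S}$ (in fact, it is measurable with respect to $\sigma(\bigcup_n \mathcal{F}_n)$). To see this, let $B$ be any Borel set in $\mathbb{R}$. Then (on $\{ \tau < \infty \}$)
\[
\begin{aligned}
\{ X_\tau \in B \} &= \{ X_\tau \in B \} \cap \bigcup_{k \in \mathbb{Z}_+} \{ \tau = k \} \\
&= \bigcup_{k \in \mathbb{Z}_+} (\{ X_\tau \in B \} \cap \{ \tau = k \}) \\
&= \bigcup_{k \in \mathbb{Z}_+} (\{ X_k \in B \} \cap \{ \tau = k \}) \in \sigma\Bigl( \bigcup_{k \in \mathbb{Z}_+} \mathcal{F}_k \Bigr)
\end{aligned}
\]
which shows that $X_\tau$ is a bona fide random variable.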
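The stopped process $X^\tau$ and the stopped random variable $X_\tau$ are easy to picture along a single path. The following sketch treats one outcome $\omega$ as a finite list of values; the helper names, the first-hitting-time rule and the convention that "never hit within the list" stands for $\tau = \infty$ are assumptions made for this illustration.

```python
X_INFINITY = 0  # arbitrary fixed constant used where tau is infinite


def first_hit(path, level):
    """tau = first n with X_n >= level, or None (standing for infinity) if never.

    Whether tau <= n is decided by X_0..X_n alone, as a stopping time requires.
    """
    for n, value in enumerate(path):
        if value >= level:
            return n
    return None


def stopped_path(path, tau):
    """Return the stopped path (X_{min(tau, n)})_n for one outcome."""
    if tau is None:
        return list(path)
    return [path[min(tau, n)] for n in range(len(path))]


def value_at(path, tau):
    """X_tau for one outcome, using the constant X_INFINITY where tau is infinite."""
    return X_INFINITY if tau is None else path[tau]


path = [0, 1, 0, 1, 2, 3, 2, 1]
tau = first_hit(path, 2)
frozen = stopped_path(path, tau)
```

In this example the path first reaches level 2 at time 4, so the stopped path agrees with the original up to time 4 and is constant afterwards.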

Example 3.23 (Wald's equation). Let $(X_j)_{j \in \mathbb{N}}$ be a sequence of independent identically distributed random variables with finite expectation. For each $n \in \mathbb{N}$, let $S_n = X_1 + \cdots + X_n$. Then clearly $E(S_n) = n\, E(X_1)$.

For $n \in \mathbb{N}$, let $\mathcal{F}_n = \sigma(X_1, \dots, X_n)$ be the filtration generated by the $X_j$s and suppose that $N$ is a bounded stopping time (with respect to $(\mathcal{F}_n)$). We calculate
\[
\begin{aligned}
E(S_N) &= E(S_1 1_{\{ N = 1 \}} + S_2 1_{\{ N = 2 \}} + \cdots) \\
&= E(X_1 1_{\{ N = 1 \}} + (X_1 + X_2) 1_{\{ N = 2 \}} + (X_1 + X_2 + X_3) 1_{\{ N = 3 \}} + \cdots) \\
&= E(X_1 1_{\{ N \ge 1 \}} + X_2 1_{\{ N \ge 2 \}} + X_3 1_{\{ N \ge 3 \}} + \cdots) \\
&= E(X_1 g_1) + E(X_2 g_2) + \cdots
\end{aligned}
\]
where $g_j = 1_{\{ N \ge j \}}$. But $1_{\{ N \ge j \}} = 1 - 1_{\{ N < j \}}$ and so $(g_j)$ is predictable and, by independence,
\[ E(X_j g_j) = E(X_j)\, E(g_j) = E(X_j)\, P(N \ge j) = E(X_1)\, P(N \ge j) \]
since $E(X_j) = E(X_1)$ for all $j$. Therefore
\[
\begin{aligned}
E(S_N) &= E(X_1)\, (P(N \ge 1) + P(N \ge 2) + P(N \ge 3) + \cdots) \\
&= E(X_1)\, (P(N = 1) + 2 P(N = 2) + 3 P(N = 3) + \cdots) \\
&= E(X_1)\, E(N),
\end{aligned}
\]
known as Wald's equation.

Theorem 3.24 (Optional Stopping Theorem). Let $(X_n)_{n \in \mathbb{Z}_+}$ be a martingale and $\tau$ a stopping time such that:

(1) $\tau < \infty$ almost surely (i.e., $P(\tau \in \mathbb{Z}_+) = 1$),

(2) $X_\tau$ is integrable,

(3) $E(X_n 1_{\{ \tau > n \}}) \to 0$ as $n \to \infty$.

Then $E(X_\tau) = E(X_0)$.

Proof. We have
\[ X_{\tau \wedge n} = \begin{cases} X_\tau & \text{on } \{ \tau \le n \} \\ X_n & \text{on } \{ \tau > n \} \end{cases} \]
and so
\[
\begin{aligned}
X_\tau &= X_{\tau \wedge n} + (X_\tau - X_{\tau \wedge n}) \\
&= X_{\tau \wedge n} + (X_\tau - X_{\tau \wedge n})(1_{\tau \le n} + 1_{\tau > n}) \\
&= X_{\tau \wedge n} + (X_\tau - X_n)\, 1_{\tau > n} .
\end{aligned}
\]

Taking expectations, we get
\[ E(X_\tau) = E(X_{\tau \wedge n}) + E(X_\tau 1_{\{ \tau > n \}}) - E(X_n 1_{\{ \tau > n \}}) . \qquad (*) \]
But we know $(X_{\tau \wedge n})$ is a martingale and so $E(X_{\tau \wedge n}) = E(X_{\tau \wedge 0}) = E(X_0)$ since $\tau(\omega) \ge 0$ always.

The middle term in $(*)$ gives (using $1_{\{ \tau = \infty \}} = 0$ almost surely)
\[ E(X_\tau 1_{\{ \tau > n \}}) = \sum_{k=n+1}^{\infty} E(X_k 1_{\{ \tau = k \}}) \to 0 \quad \text{as } n \to \infty \]
because, by (2), the series
\[ E(X_\tau) = \sum_{k=0}^{\infty} E(X_k 1_{\{ \tau = k \}}) \]
converges (absolutely, since $X_\tau$ is integrable).

Finally, by hypothesis, we know that $E(X_n 1_{\{ \tau > n \}}) \to 0$ and so letting $n \to \infty$ in $(*)$ gives the desired result.

Remark 3.25. If $X$ is a submartingale and the conditions (1), (2) and (3) of the theorem hold, then we have
\[ E(X_\tau) \ge E(X_0) . \]
This follows just as for the case when $X$ is a martingale except that one now uses the inequality $E(X_{\tau \wedge n}) \ge E(X_{\tau \wedge 0}) = E(X_0)$ in equation $(*)$. In particular, we note these conditions hold if $\tau$ is a bounded stopping time.

Remark 3.26. We can recover Wald's equation as a corollary. Indeed, let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with means $E(X_j) = \mu$ and let $Y_j = X_j - \mu$. Let $\xi_n = Y_1 + \cdots + Y_n$. Then we know that $(\xi_n)$ is a martingale with respect to the filtration $(\mathcal{F}_n)$ where $\mathcal{F}_n = \sigma(X_1, \dots, X_n) = \sigma(Y_1, \dots, Y_n)$. By the Theorem (but with index set now $\mathbb{N}$ rather than $\mathbb{Z}_+$), we see that $E(\xi_N) = E(\xi_1) = E(Y_1) = 0$, where $N$ is a stopping time obeying the conditions of the theorem. If $S_n = X_1 + \cdots + X_n$, then we see that $\xi_N = S_N - N\mu$ and we conclude that
\[ E(S_N) = E(N)\, \mu = E(N)\, E(X_1) \]
which is Wald's equation.
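Wald's equation can be confirmed by direct enumeration for a bounded stopping time. The sketch below makes illustrative assumptions only: each $X_j$ is 0 or 3 with probability $1/2$, and $N$ is the first index at which a 3 appears, capped at 3 so that $N$ is bounded.

```python
from fractions import Fraction
from itertools import product

VALUES = (0, 3)   # hypothetical distribution: X_j = 0 or 3, each with probability 1/2
MAX_N = 3         # N is bounded by MAX_N
PATHS = list(product(VALUES, repeat=MAX_N))


def stopping_index(path):
    """N = first j (1-based) with X_j = 3, capped at MAX_N.

    Whether N <= n is decided by X_1..X_n alone, so N is a stopping time.
    """
    for j, value in enumerate(path, start=1):
        if value == 3:
            return j
    return MAX_N


def expectation(func):
    """Expectation over the equally likely paths."""
    return Fraction(sum(func(p) for p in PATHS), len(PATHS))


mean_x1 = expectation(lambda p: p[0])                           # E(X_1)
mean_n = expectation(stopping_index)                            # E(N)
mean_s_n = expectation(lambda p: sum(p[: stopping_index(p)]))   # E(S_N)
```

Here $E(X_1) = 3/2$, $E(N) = 7/4$ and $E(S_N) = 21/8$, in agreement with $E(S_N) = E(N)\,E(X_1)$. Note that $S_N$ is far from independent of $N$, which is why the identity is not obvious.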

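Lemma 3.27. Let $(X_n)$ be a submartingale with respect to the filtration $(\mathcal{F}_n)$ and suppose that $\tau$ is a bounded stopping time with $\tau \le m$ where $m \in \mathbb{Z}_+$. Then
\[ E(X_m) \ge E(X_\tau) . \]

Proof. We have
\[
\begin{aligned}
E(X_m) &= \sum_{j=0}^{m} \int_\Omega X_m 1_{\{ \tau = j \}}\, dP \\
&= \sum_{j=0}^{m} \int_{\{ \tau = j \}} X_m\, dP \\
&= \sum_{j=0}^{m} \int_{\{ \tau = j \}} E(X_m \mid \mathcal{F}_j)\, dP, \quad \text{since } \{ \tau = j \} \in \mathcal{F}_j, \\
&\ge \sum_{j=0}^{m} \int_{\{ \tau = j \}} X_j\, dP, \quad \text{since } E(X_m \mid \mathcal{F}_j) \ge X_j \text{ almost surely,} \\
&= E(X_\tau)
\end{aligned}
\]
as required.

Theorem 3.28 (Doob's Maximal Inequality). Suppose that $(X_n)$ is a nonnegative submartingale with respect to the filtration $(\mathcal{F}_n)$. Then, for any $m \in \mathbb{Z}_+$ and $\lambda > 0$,
\[ \lambda\, P\bigl(\max_{k \le m} X_k \ge \lambda\bigr) \le E\bigl(X_m 1_{\{ \max_{k \le m} X_k \ge \lambda \}}\bigr) . \]

Proof. Fix $m \in \mathbb{Z}_+$ and let $X_m^* = \max_{k \le m} X_k$. For $\lambda > 0$, let
\[ \tau(\omega) = \begin{cases} \min\{ k \le m : X_k(\omega) \ge \lambda \}, & \text{if } \{ k : k \le m \text{ and } X_k(\omega) \ge \lambda \} \ne \emptyset \\ m, & \text{otherwise.} \end{cases} \]
Then evidently $\tau \le m$ (and $\tau$ takes values in $\mathbb{Z}_+$). We shall show that $\tau$ is a stopping time. Indeed, for $j < m$
\[ \{ \tau = j \} = \{ X_0 < \lambda \} \cap \{ X_1 < \lambda \} \cap \cdots \cap \{ X_{j-1} < \lambda \} \cap \{ X_j \ge \lambda \} \in \mathcal{F}_j \]
and
\[ \{ \tau = m \} = \{ X_0 < \lambda \} \cap \{ X_1 < \lambda \} \cap \cdots \cap \{ X_{m-1} < \lambda \} \in \mathcal{F}_m \]
and so we see that $\tau$ is a stopping time as claimed.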

Next, we note that by the Lemma (since $\tau \le m$)
\[ E(X_m) \ge E(X_\tau) . \]
Now, set $A = \{ X_m^* \ge \lambda \}$. Then
\[ E(X_\tau) = \int_A X_\tau\, dP + \int_{A^c} X_\tau\, dP . \]
If $X_m^* \ge \lambda$, then there is some $0 \le k \le m$ with $X_k \ge \lambda$ and $\tau = k_0 \le k$ in this case. ($\tau$ is the minimum of such $k$s so $X_{k_0} \ge \lambda$.) Therefore $X_\tau = X_{k_0} \ge \lambda$.

On the other hand, if $X_m^* < \lambda$, then there is no $j$ with $0 \le j \le m$ and $X_j \ge \lambda$. Thus $\tau = m$, by construction. Hence
\[
\begin{aligned}
\int_\Omega X_m\, dP = E(X_m) \ge E(X_\tau) &= \int_A X_\tau\, dP + \int_{A^c} X_\tau\, dP \\
&\ge \lambda\, P(A) + \int_{A^c} X_m\, dP .
\end{aligned}
\]
Rearranging, we find that
\[ \lambda\, P(A) \le \int_A X_m\, dP \]
and the proof is complete.

Corollary 3.29. Let $(X_n)$ be an $L^2$-martingale. Then
\[ \lambda^2\, P\bigl(\max_{k \le m} |X_k| \ge \lambda\bigr) \le E(X_m^2) \]
for any $\lambda \ge 0$.

Proof. Since $(X_n)$ is an $L^2$-martingale, it follows that the process $(X_n^2)$ is a submartingale (Proposition 3.10). Applying Doob's Maximal Inequality to the submartingale $(X_n^2)$ (and with $\lambda^2$ rather than $\lambda$), we get
\[ \lambda^2\, P\bigl(\max_{k \le m} X_k^2 \ge \lambda^2\bigr) \le \int_{\{ \max_{k \le m} X_k^2 \ge \lambda^2 \}} X_m^2\, dP \le \int_\Omega X_m^2\, dP , \]
that is,
\[ \lambda^2\, P\bigl(\max_{k \le m} |X_k| \ge \lambda\bigr) \le E(X_m^2) \]
as required.
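Doob's Maximal Inequality can be checked exactly for a small example. The sketch below assumes a simple symmetric random walk $S_n$ started at 0 and takes $X_n = |S_n|$, which is a non-negative submartingale because $S_n$ is a martingale and $s \mapsto |s|$ is convex (Jensen's inequality); the horizon and the range of $\lambda$ are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import accumulate, product

M = 6  # hypothetical horizon
STEP_PATHS = list(product((1, -1), repeat=M))


def abs_walk(steps):
    """X_0, ..., X_M with X_n = |S_n| and S_0 = 0."""
    return [0] + [abs(s) for s in accumulate(steps)]


def maximal_sides(lam):
    """Return (lam * P(max_k X_k >= lam), E(X_M 1{max_k X_k >= lam})) exactly."""
    hits = [w for w in map(abs_walk, STEP_PATHS) if max(w) >= lam]
    left = Fraction(lam * len(hits), len(STEP_PATHS))
    right = Fraction(sum(w[-1] for w in hits), len(STEP_PATHS))
    return left, right


# Both sides of the inequality for lam = 1, ..., M.
checks = {lam: maximal_sides(lam) for lam in range(1, M + 1)}
inequality_holds = all(left <= right for left, right in checks.values())
```

For $\lambda = M$ the two sides coincide: the maximum reaches $M$ only on the two monotone paths, where $X_M = M$ as well, so the inequality cannot be improved in general.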

Proposition 3.30. Suppose that $X \ge 0$ and $X^2$ is integrable. Then
\[ E(X^2) = 2 \int_0^\infty t\, P(X \ge t)\, dt . \]

Proof. For $x \ge 0$, we can write
\[ x^2 = 2 \int_0^x t\, dt = 2 \int_0^\infty t\, 1_{\{ x \ge t \}}\, dt . \]
Setting $x = X(\omega)$, we get
\[ X(\omega)^2 = 2 \int_0^\infty t\, 1_{X(\omega) \ge t}\, dt \]
so that
\[ E(X^2) = 2 \int_0^\infty t\, E(1_{X \ge t})\, dt = 2 \int_0^\infty t\, P(X \ge t)\, dt . \]

Theorem 3.31 (Doob's Maximal $L^2$-inequality). Let $(X_n)$ be a non-negative $L^2$-submartingale. Then
\[ E((X_n^*)^2) \le 4\, E(X_n^2) \]
where $X_n^* = \max_{k \le n} X_k$. In alternative notation,
\[ \|X_n^*\|_2 \le 2\, \|X_n\|_2 . \]

Proof. Using the proposition, we find that
\[
\begin{aligned}
\|X_n^*\|_2^2 = E((X_n^*)^2) &= 2 \int_0^\infty t\, P(X_n^* \ge t)\, dt \\
&\le 2 \int_0^\infty E(X_n 1_{\{ X_n^* \ge t \}})\, dt, \quad \text{by the Maximal Inequality,} \\
&= 2 \int_0^\infty \Bigl( \int_{\{ X_n^* \ge t \}} X_n\, dP \Bigr)\, dt \\
&= 2 \int_\Omega X_n \Bigl( \int_0^{X_n^*} dt \Bigr)\, dP, \quad \text{by Tonelli's Theorem,} \\
&= 2 \int_\Omega X_n X_n^*\, dP
\end{aligned}
\]

That is,
\[ \|X_n^*\|_2^2 \le 2 \, E(X_n X_n^*) \le 2 \, \|X_n\|_2 \, \|X_n^*\|_2, \]
by Schwarz' inequality. It follows that
\[ \|X_n^*\|_2 \le 2 \, \|X_n\|_2 \quad \text{or} \quad E\big( (X_n^*)^2 \big) \le 4 \, E(X_n^2) \]
and the proof is complete.

Doob's Up-crossing inequality

We wish to discuss convergence properties of martingales and an important first result in this connection is Doob's so-called up-crossing inequality. By way of motivation, consider a sequence $(x_n)$ of real numbers. If $x_n \to \alpha$ as $n \to \infty$, then for any $\varepsilon > 0$, $x_n$ is eventually inside the interval $(\alpha - \varepsilon, \alpha + \varepsilon)$. In particular, for any $a < b$, there can be only a finite number of pairs $(n, n+k)$ for which $x_n < a$ and $x_{n+k} > b$. In other words, the graph of points $(n, x_n)$ can only cross the semi-infinite strip $\{ (x, y) : x > 0, \ a \le y \le b \}$ finitely-many times. We look at such crossings for processes $(X_n)$.

Consider a process $(X_n)$ with respect to the filtration $(\mathcal{F}_n)$. Fix $a < b$. Now fix $\omega \in \Omega$ and consider the sequence $X_0(\omega), X_1(\omega), X_2(\omega), \dots$ (although the value of $X_0(\omega)$ will play no rôle here). We wish to count the number of times the path $(X_n(\omega))$ crosses the band $\{ (x, y) : a \le y \le b \}$ from below $a$ to above $b$. Such a crossing is called an up-crossing of $[a, b]$ by $(X_n(\omega))$.

[Figure 3.1: Up-crossings of $[a, b]$ by $(X_n(\omega))$. The path is drawn against the levels $y = a$ and $y = b$, with the points $X_3(\omega)$, $X_7(\omega)$, $X_{11}(\omega)$, $X_{13}(\omega)$ and $X_{16}(\omega)$ marked.]

In the example in the figure 3.1, the path sequences $X_3(\omega), \dots, X_7(\omega)$ and $X_{11}(\omega), X_{12}(\omega), X_{13}(\omega)$ each constitute an up-crossing. The path sequence

$X_{16}(\omega), X_{17}(\omega), X_{18}(\omega), X_{19}(\omega)$ forms a partial up-crossing (which, in fact, will never be completed if $X_n(\omega) \le b$ remains valid for all $n > 16$).

As an aid to counting such up-crossings, we introduce the process $(g_n)_{n \in \mathbb{N}}$ defined as follows:
\[
\begin{aligned}
g_1(\omega) &= 0, \\
g_2(\omega) &= \begin{cases} 1, & \text{if } X_1(\omega) < a, \\ 0, & \text{otherwise,} \end{cases} \\
g_3(\omega) &= \begin{cases} 1, & \text{if } g_2(\omega) = 0 \text{ and } X_2(\omega) < a, \\ 1, & \text{if } g_2(\omega) = 1 \text{ and } X_2(\omega) \le b, \\ 0, & \text{otherwise,} \end{cases} \\
&\ \ \vdots \\
g_{n+1}(\omega) &= \begin{cases} 1, & \text{if } g_n(\omega) = 0 \text{ and } X_n(\omega) < a, \\ 1, & \text{if } g_n(\omega) = 1 \text{ and } X_n(\omega) \le b, \\ 0, & \text{otherwise.} \end{cases}
\end{aligned}
\]
We see that $g_n = 1$ when an up-crossing is in progress. Fix $m$ and consider the sum (up to time $m$), that is, the transform of $X$ by $g$ at time $m$,
\[
\sum_{j=1}^{m} g_j(\omega) \big( X_j(\omega) - X_{j-1}(\omega) \big) = g_1(\omega) \big( X_1(\omega) - X_0(\omega) \big) + \dots + g_m(\omega) \big( X_m(\omega) - X_{m-1}(\omega) \big). \tag{$*$}
\]
The $g_j$s take the values 1 or 0 depending on whether an up-crossing is in progress or not. Indeed, if
\[ g_r(\omega) = 0, \quad g_{r+1}(\omega) = \dots = g_{r+s}(\omega) = 1, \quad g_{r+s+1}(\omega) = 0, \]
then the path $X_r(\omega), \dots, X_{r+s}(\omega)$ forms an up-crossing of $[a, b]$ and we see that
\[
\sum_{j=r+1}^{r+s} g_j(\omega) \big( X_j(\omega) - X_{j-1}(\omega) \big) = \sum_{j=r+1}^{r+s} \big( X_j(\omega) - X_{j-1}(\omega) \big) = X_{r+s}(\omega) - X_r(\omega) > b - a
\]
because $X_r < a$ and $X_{r+s} > b$.

Let $U_m[a, b](\omega)$ denote the number of completed up-crossings of $(X_n(\omega))$ up to time $m$. If $g_j(\omega) = 0$ for each $1 \le j \le m$, then the sum $(*)$ is zero. If not all $g_j$s are zero, we can say that the path has made $U_m[a, b](\omega)$ up-crossings

(which may be zero) followed possibly by a partial up-crossing (which will be the case if and only if $g_m(\omega) = g_{m+1}(\omega) = 1$). Hence we may estimate $(*)$ by
\[ \sum_{j=1}^{m} g_j(\omega) \big( X_j(\omega) - X_{j-1}(\omega) \big) \ge U_m[a, b](\omega) \, (b - a) + R \]
where $R = 0$ if there is no residual partial up-crossing but otherwise is given by
\[ R = \sum_{j=k}^{m} g_j(\omega) \big( X_j(\omega) - X_{j-1}(\omega) \big) \]
where $k$ is the largest integer for which $g_j(\omega) = 1$ for all $k \le j \le m+1$ and $g_{k-1} = 0$. This means that $X_{k-1}(\omega) < a$ and $X_j(\omega) \le b$ for $k \le j \le m$. The sequence $X_{k-1}(\omega) < a$, $X_k(\omega) \le b$, \dots, $X_m(\omega) \le b$ forms the partial up-crossing at the end of the path $X_0(\omega), X_1(\omega), \dots, X_m(\omega)$. Since we have $g_k(\omega) = \dots = g_m(\omega) = 1$, we see that
\[ R = X_m(\omega) - X_{k-1}(\omega) > X_m(\omega) - a \]
where the inequality follows because $g_k(\omega) = 1$ and so $X_{k-1}(\omega) < a$, by construction.

Now, any real-valued function $f$ can be written as $f = f^+ - f^-$ where $f^\pm$ are the positive and negative parts of $f$, defined by $f^\pm = \tfrac{1}{2}(|f| \pm f)$. Evidently $f^\pm \ge 0$. The inequality $f \ge -f^-$ allows us to estimate $R$ by
\[ R \ge -(X_m - a)^-(\omega) \]
which is valid also when there is no partial up-crossing (so $R = 0$). Putting all this together, we obtain the estimate
\[ \sum_{j=1}^{m} g_j (X_j - X_{j-1}) \ge U_m[a, b] \, (b - a) - (X_m - a)^- \tag{$**$} \]
on $\Omega$.

Proposition 3.32. The process $(g_n)$ is predictable.

Proof. It is clear that $g_1$ is $\mathcal{F}_0$-measurable. Furthermore, $g_2 = 1_{\{X_1 < a\}}$ and so $g_2$ is $\mathcal{F}_1$-measurable. For the general case, let us suppose that $g_m$ is $\mathcal{F}_{m-1}$-measurable. Then
\[ g_{m+1} = 1_{\{g_m = 0\} \cap \{X_m < a\}} + 1_{\{g_m = 1\} \cap \{X_m \le b\}} \]
which is $\mathcal{F}_m$-measurable. By induction, it follows that $(g_n)_{n \in \mathbb{N}}$ is predictable, as required.
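The recursive definition of $(g_n)$ and the pathwise estimate $(**)$ can be tried out on a single path. The sketch below is an illustration only: it takes a finite list of numbers standing for $X_0(\omega), \dots, X_m(\omega)$, builds $g_1, \dots, g_{m+1}$ exactly as in the definition, counts the completed up-crossings as the times $j \le m$ at which $g_j = 1$ and $g_{j+1} = 0$, and evaluates the transform $(*)$. The function names and the sample path are hypothetical.

```python
def upcrossing_process(path, a, b):
    """Return the list g with g[n] = g_n for n = 1, ..., m+1 (g[0] is unused),
    where path = [X_0, ..., X_m] and g follows the recursion in the text."""
    m = len(path) - 1
    g = [None, 0]                       # g_1 = 0
    for n in range(1, m + 1):
        if g[n] == 0 and path[n] < a:
            g.append(1)                 # an up-crossing starts below a
        elif g[n] == 1 and path[n] <= b:
            g.append(1)                 # the up-crossing is still in progress
        else:
            g.append(0)
    return g


def completed_upcrossings(path, a, b):
    """U_m[a, b]: number of times j <= m at which g_j = 1 and g_{j+1} = 0."""
    g = upcrossing_process(path, a, b)
    m = len(path) - 1
    return sum(1 for j in range(1, m + 1) if g[j] == 1 and g[j + 1] == 0)


def transform(path, a, b):
    """The sum (*): sum over j = 1..m of g_j (X_j - X_{j-1})."""
    g = upcrossing_process(path, a, b)
    m = len(path) - 1
    return sum(g[j] * (path[j] - path[j - 1]) for j in range(1, m + 1))


def negative_part(x):
    return max(-x, 0)


if __name__ == "__main__":
    path = [0, -1, 0.5, 2, 1, -2, -1, 3, 0.5, -1, 0]   # hypothetical sample path
    a, b = 0, 1
    lhs = transform(path, a, b)
    rhs = completed_upcrossings(path, a, b) * (b - a) - negative_part(path[-1] - a)
    print(lhs, rhs, lhs >= rhs)
```

Note that $g_{n+1}$ is computed from $X_1, \dots, X_n$ only, which is the predictability asserted in Proposition 3.32.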

Theorem 3.33. For any supermartingale $(X_n)$, we have
\[ (b - a) \, E\big( U_m[a, b] \big) \le E\big( (X_m - a)^- \big). \]

Proof. The non-negative, bounded process $(g_n)$ is predictable and so the martingale transform $((g \cdot X)_n)$ is a supermartingale. The estimate $(**)$ becomes
\[ (g \cdot X)_m \ge (b - a) \, U_m[a, b] - (X_m - a)^-. \]
Taking expectations, we obtain the inequality
\[ E\big( (g \cdot X)_m \big) \ge (b - a) \, E\big( U_m[a, b] \big) - E\big( (X_m - a)^- \big). \]
But $E( (g \cdot X)_m ) \le E( (g \cdot X)_1 )$ since $((g \cdot X)_n)$ is a supermartingale, and $(g \cdot X)_1 = 0$, by construction, so we may say that
\[ 0 \ge (b - a) \, E\big( U_m[a, b] \big) - E\big( (X_m - a)^- \big) \]
and the result follows.

Lemma 3.34. Suppose that $(f_n)$ is a sequence of random variables such that $f_n \ge 0$, $f_n \uparrow$ and such that $E(f_n) \le K$ for all $n$. Then
\[ P\big( \lim_n f_n < \infty \big) = 1. \]

Proof. Fix $m \in \mathbb{N}$ and set $g_n = f_n \wedge m$. Then $g_n \uparrow$ and $0 \le g_n \le m$ for all $n$. It follows that $g = \lim_n g_n$ exists and obeys $0 \le g \le m$. Let $B = \{ \omega : f_n(\omega) \to \infty \}$. Then $g = m$ on $B$ (because $f_n(\omega)$ is eventually $> m$ if $\omega \in B$). Now
\[ 0 \le g_n \le f_n \implies E(g_n) \le E(f_n) \le K. \]
By Lebesgue's Monotone Convergence Theorem, $E(g_n) \uparrow E(g)$ and therefore $E(g) \le K$. Hence
\[ m \, P(B) \le E(g) \le K. \]
This holds for any $m$ and so it follows that $P(B) = 0$.

Theorem 3.35 (Doob's Martingale Convergence Theorem). Let $(X_n)$ be a supermartingale such that $E(|X_n|) < M$ for all $n \in \mathbb{Z}^+$. Then there is an integrable random variable $X$ such that $X_n \to X$ almost surely as $n \to \infty$.

Proof. We have seen that
\[ E\big( U_n[a, b] \big) \le \frac{E\big( (X_n - a)^- \big)}{b - a} \]
for any $a < b$. However,
\[ (X_n - a)^- = \tfrac{1}{2} \big( |X_n - a| - (X_n - a) \big) \le |X_n| + |a| \]

and so, by hypothesis,
\[ E\big( (X_n - a)^- \big) \le E|X_n| + |a| \le M + |a| \]
giving
\[ E\big( U_n[a, b] \big) \le \frac{M + |a|}{b - a}. \]
However, by its very construction, $U_n[a, b] \le U_{n+1}[a, b]$ for any $n \in \mathbb{Z}^+$ and so $\lim_n E( U_n[a, b] )$ exists and obeys
\[ \lim_n E\big( U_n[a, b] \big) \le \frac{M + |a|}{b - a}. \]
By the lemma, it follows that $U_n[a, b]$ converges almost surely (to a finite value). Let
\[ A = \bigcap_{a < b, \ a, b \in \mathbb{Q}} \big\{ \lim_n U_n[a, b] < \infty \big\}. \]
Then $A$ is a countable intersection of sets of probability 1 and so $P(A) = 1$.

Claim: $(X_n)$ converges almost surely.

For, if not, then
\[ B = \{ \liminf X_n < \limsup X_n \} \subset \Omega \]
would have positive probability. For any $\omega \in B$, there exist $a, b \in \mathbb{Q}$ with $a < b$ such that
\[ \liminf X_n(\omega) < a < b < \limsup X_n(\omega) \]
which means that $\lim_n U_n[a, b](\omega) = \infty$ (because $(X_n(\omega))$ would cross $[a, b]$ infinitely-many times). Hence $B \cap A = \emptyset$ which means that $B \subset A^c$ and so $P(B) \le P(A^c) = 0$. It follows that $X_n$ converges almost surely as claimed.

Denote this limit by $X$, with $X(\omega) = 0$ for $\omega \notin A$. Then
\[
E(|X|) = E\big( \liminf_n |X_n| \big) \le \liminf_n E(|X_n|), \quad \text{by Fatou's Lemma,}
\]
\[ \le M \]
and the proof is complete.

Corollary 3.36. Let $(X_n)$ be a positive supermartingale. Then there is some $X \in L^1$ such that $X_n \to X$ almost surely.

Proof. Since $X_n \ge 0$ almost surely (and is a supermartingale), it follows that $0 \le E(X_n) \le E(X_0)$ and so $(E(X_n))$ is bounded. Now apply the theorem.

Remark 3.37. These results also hold for martingales and submartingales (because $(-X_n)$ is a supermartingale whenever $(X_n)$ is a submartingale).
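To see Corollary 3.36 at work on a concrete positive martingale, here is a small sketch under assumptions of our own choosing: $X_n = \xi_1 \cdots \xi_n$ where the $\xi_k$ are independent and equal to $\tfrac12$ or $\tfrac32$ with probability $\tfrac12$ each, so that $E(\xi_k) = 1$ and $(X_n)$ is a positive martingale. Its law can be written down exactly from the binomial distribution. The code computes $E(X_n)$ and $P(X_n \le \varepsilon)$; the latter is close to 1 for large $n$, which is consistent with $X_n$ converging almost surely (here the limit is in fact 0). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def product_martingale_law(n):
    """Exact law of X_n = xi_1 * ... * xi_n for independent xi_k taking the
    values 1/2 and 3/2 with probability 1/2 each.
    Returns a list of (value, probability) pairs."""
    law = []
    for k in range(n + 1):                       # k = number of factors equal to 3/2
        value = Fraction(3 ** k, 2 ** n)         # (3/2)^k (1/2)^(n-k)
        prob = Fraction(comb(n, k), 2 ** n)
        law.append((value, prob))
    return law


def mean(law):
    return sum(value * prob for value, prob in law)


def prob_at_most(law, eps):
    return sum(prob for value, prob in law if value <= eps)


if __name__ == "__main__":
    for n in (10, 50, 200):
        law = product_martingale_law(n)
        print(n, mean(law), float(prob_at_most(law, Fraction(1, 100))))
```

The expectation stays equal to 1 for every $n$ even though the mass of $X_n$ concentrates near 0, so the almost sure limit need not have the same expectation as the $X_n$.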

Remark 3.38. The supermartingale $(X_n)$ is $L^1$-bounded if and only if the process $(X_n^-)$ is, that is,
\[ \sup_n E(|X_n|) < \infty \iff \sup_n E(X_n^-) < \infty. \]
We can see this as follows. Since $E(X_n^-) \le E(X_n^+ + X_n^-) = E(|X_n|)$, we see immediately that if $E(|X_n|) \le M$ for all $n$, then also $E(X_n^-) \le M$ for all $n$. Conversely, suppose there is some positive constant $M$ such that $E(X_n^-) \le M$ for all $n$. Then
\[ E(|X_n|) = E(X_n^+ + X_n^-) = E(X_n) + 2 \, E(X_n^-) \le E(X_0) + 2M \]
and so $(X_n)$ is $L^1$-bounded. ($E(X_n) \le E(X_0)$ because $(X_n)$ is a supermartingale.)

Remark 3.39. Note that the theorem gives convergence almost surely. In general, almost sure convergence does not necessarily imply $L^1$ convergence. For example, suppose that $Y$ has a uniform distribution on $(0, 1)$ and for each $n \in \mathbb{N}$ let $X_n = n \, 1_{\{Y < 1/n\}}$. Then $X_n \to 0$ almost surely but $\|X_n\|_1 = 1$ for all $n$ and so it is false that $X_n \to 0$ in $L^1$. However, under an extra condition, $L^1$ convergence is assured.

Definition 3.40. The collection $\{ Y_\alpha : \alpha \in I \}$ of integrable random variables labeled by some index set $I$ is said to be uniformly integrable if for any given $\varepsilon > 0$ there is some $M > 0$ such that
\[ E\big( |Y_\alpha| \, 1_{\{|Y_\alpha| > M\}} \big) = \int_{\{|Y_\alpha| > M\}} |Y_\alpha| \, dP < \varepsilon \]
for all $\alpha \in I$.

Remark 3.41. Note that any uniformly integrable family $\{ Y_\alpha : \alpha \in I \}$ is $L^1$-bounded. Indeed, by definition, for any $\varepsilon > 0$ there is $M$ such that $\int_{\{|Y_\alpha| > M\}} |Y_\alpha| \, dP < \varepsilon$ for all $\alpha$. But then
\[ \int_\Omega |Y_\alpha| \, dP = \int_{\{|Y_\alpha| \le M\}} |Y_\alpha| \, dP + \int_{\{|Y_\alpha| > M\}} |Y_\alpha| \, dP \le M + \varepsilon \]
for all $\alpha \in I$.

Before proceeding, we shall establish the following result.

Lemma 3.42. Let $X \in L^1$. Then for any $\varepsilon > 0$ there is $\delta > 0$ such that if $A$ is any event with $P(A) < \delta$ then $\int_A |X| \, dP < \varepsilon$.

Proof. Set $\xi = |X|$ and for $n \in \mathbb{N}$ let $\xi_n = \xi \, 1_{\{\xi \le n\}}$. Then $\xi_n \to \xi$ almost surely and by Lebesgue's Monotone Convergence Theorem
\[ \int_{\{\xi > n\}} \xi \, dP = \int_\Omega \xi \, (1 - 1_{\{\xi \le n\}}) \, dP = \int_\Omega \xi \, dP - \int_\Omega \xi_n \, dP \to 0 \]

as $n \to \infty$. Let $\varepsilon > 0$ be given and let $n$ be so large that $\int_{\{\xi > n\}} \xi \, dP < \tfrac{1}{2} \varepsilon$. For any event $A$, we have
\[
\int_A \xi \, dP = \int_{A \cap \{\xi \le n\}} \xi \, dP + \int_{A \cap \{\xi > n\}} \xi \, dP \le \int_A n \, dP + \int_{\{\xi > n\}} \xi \, dP < n \, P(A) + \tfrac{1}{2} \varepsilon < \varepsilon
\]
whenever $P(A) < \delta$ where $\delta = \varepsilon / 2n$.

Theorem 3.43 ($L^1$-convergence Theorem). Suppose that $(X_n)$ is a uniformly integrable supermartingale. Then there is an integrable random variable $X$ such that $X_n \to X$ in $L^1$.

Proof. By hypothesis, the family $\{ X_n : n \in \mathbb{Z}^+ \}$ is uniformly integrable and we know that $X_n$ converges almost surely to some $X \in L^1$. We shall show that these two facts imply that $X_n \to X$ in $L^1$. Let $\varepsilon > 0$ be given. We estimate
\[
\begin{aligned}
\|X_n - X\|_1 = \int_\Omega |X_n - X| \, dP &= \int_{\{|X_n| \le M\}} |X_n - X| \, dP + \int_{\{|X_n| > M\}} |X_n - X| \, dP \\
&\le \int_{\{|X_n| \le M\}} |X_n - X| \, dP + \int_{\{|X_n| > M\}} |X_n| \, dP + \int_{\{|X_n| > M\}} |X| \, dP.
\end{aligned}
\]
We shall consider separately the three terms on the right hand side.

By the hypothesis of uniform integrability, we may say that the second term $\int_{\{|X_n| > M\}} |X_n| \, dP < \varepsilon$ for all sufficiently large $M$.

As for the third term, the integrability of $X$ means that there is $\delta > 0$ such that $\int_A |X| \, dP < \varepsilon$ whenever $P(A) < \delta$. Now we note that (provided $M > 1$)
\[ P(|X_n| > M) = \int_\Omega 1_{\{|X_n| > M\}} \, dP \le \int_\Omega 1_{\{|X_n| > M\}} |X_n| \, dP < \delta \]
for large enough $M$, by uniform integrability. Hence, for all sufficiently large $M$,
\[ \int_{\{|X_n| > M\}} |X| \, dP < \varepsilon. \]

Finally, we consider the first term. Fix $M > 0$ so that the above bounds on the second and third terms hold. The random variable $1_{\{|X_n| \le M\}} |X_n - X|$ is bounded by the integrable random variable $M + |X|$ and so it follows from Lebesgue's Dominated Convergence Theorem that
\[ \int_{\{|X_n| \le M\}} |X_n - X| \, dP = \int_\Omega 1_{\{|X_n| \le M\}} |X_n - X| \, dP \to 0 \]
as $n \to \infty$ and the result follows.

Proposition 3.44. Let $(X_n)$ be a martingale with respect to the filtration $(\mathcal{F}_n)$ such that $X_n \to X$ in $L^1$. Then, for each $n$, $X_n = E(X \mid \mathcal{F}_n)$ almost surely.

Proof. The conditional expectation is a contraction on $L^1$, that is, $\|E(X \mid \mathcal{F}_m)\|_1 \le \|X\|_1$ for any $X \in L^1$. Therefore, for fixed $m$,
\[ \|E(X - X_n \mid \mathcal{F}_m)\|_1 \le \|X - X_n\|_1 \to 0 \]
as $n \to \infty$. But the martingale property implies that for any $n \ge m$
\[ E(X - X_n \mid \mathcal{F}_m) = E(X \mid \mathcal{F}_m) - E(X_n \mid \mathcal{F}_m) = E(X \mid \mathcal{F}_m) - X_m \]
which does not depend on $n$. It follows that $\|E(X \mid \mathcal{F}_m) - X_m\|_1 = 0$ and so $E(X \mid \mathcal{F}_m) = X_m$ almost surely.

Theorem 3.45 ($L^2$-martingale Convergence Theorem). Suppose that $(X_n)$ is an $L^2$-martingale such that $E(X_n^2) < K$ for all $n$. Then there is $X \in L^2$ such that
(1) $X_n \to X$ almost surely,
(2) $E( (X - X_n)^2 ) \to 0$.
In other words, $X_n \to X$ almost surely and also in $L^2$.

Proof. Since $\|X_n\|_1 \le \|X_n\|_2$, it follows that $(X_n)$ is a bounded $L^1$-martingale and so there is some $X \in L^1$ such that $X_n \to X$ almost surely. We must show that $X \in L^2$ and that $X_n \to X$ in $L^2$. We have
\[
\begin{aligned}
\|X_n - X_0\|_2^2 = E\big( (X_n - X_0)^2 \big) &= E\Big( \sum_{k=1}^{n} (X_k - X_{k-1}) \sum_{j=1}^{n} (X_j - X_{j-1}) \Big) \\
&= \sum_{k=1}^{n} E\big( (X_k - X_{k-1})^2 \big), \quad \text{by orthogonality of increments.}
\end{aligned}
\]

However, by the martingale property,
\[ E\big( (X_n - X_0)^2 \big) = E(X_n^2) - E(X_0^2) < K \]
and so $\sum_{k=1}^{n} E\big( (X_k - X_{k-1})^2 \big)$ increases with $n$ but is bounded, and so must converge. Let $\varepsilon > 0$ be given. Then (as above)
\[ E\big( (X_n - X_m)^2 \big) = \sum_{k=m+1}^{n} E\big( (X_k - X_{k-1})^2 \big) < \varepsilon \]
for all sufficiently large $m$, $n$. Letting $n \to \infty$ and applying Fatou's Lemma, we get
\[ E\big( (X - X_m)^2 \big) = E\big( \liminf_n (X_n - X_m)^2 \big) \le \liminf_n E\big( (X_n - X_m)^2 \big) \le \varepsilon \]
for all sufficiently large $m$. Hence $X - X_m \in L^2$ and so $X \in L^2$ and $X_n \to X$ in $L^2$, as required.

Doob-Meyer Decomposition

The (continuous time formulation of the) following decomposition is a crucial idea in the abstract development of stochastic integration.

Theorem 3.46. Suppose that $(X_n)$ is an adapted $L^1$ process. Then $(X_n)$ has the decomposition
\[ X = X_0 + M + A \]
where $(M_n)$ is a martingale, null at 0, and $(A_n)$ is a predictable process, null at 0. Such a decomposition is unique, that is, if also $X = X_0 + M' + A'$, then $M = M'$ almost surely and $A = A'$ almost surely. Furthermore, if $X$ is a submartingale, then $A$ is increasing.

Proof. We define the process $(A_n)$ by
\[ A_0 = 0, \qquad A_n = A_{n-1} + E(X_n - X_{n-1} \mid \mathcal{F}_{n-1}). \]
Evidently, $A$ is null at 0 and $A_n$ is $\mathcal{F}_{n-1}$-measurable, so $(A_n)$ is predictable. Furthermore, we see (by induction) that $A_n$ is integrable.

Now, from the construction of $A$, we find that
\[
\begin{aligned}
E\big( (X_n - A_n) \mid \mathcal{F}_{n-1} \big) &= E(X_n \mid \mathcal{F}_{n-1}) - E(A_{n-1} \mid \mathcal{F}_{n-1}) - E\big( (X_n - X_{n-1}) \mid \mathcal{F}_{n-1} \big) \quad \text{a.s.} \\
&= X_{n-1} - A_{n-1} \quad \text{a.s.,}
\end{aligned}
\]
which means that $(X_n - A_n)$ is an $L^1$-martingale. Let $M_n = (X_n - A_n) - X_0$. Then $M$ is a martingale, $M$ is null at 0 and we have the Doob decomposition
\[ X = X_0 + M + A. \]

To establish uniqueness, suppose that
\[ X = X_0 + M' + A' = X_0 + M + A. \]
Then
\[ E(X_n - X_{n-1} \mid \mathcal{F}_{n-1}) = E\big( (M_n + A_n) - (M_{n-1} + A_{n-1}) \mid \mathcal{F}_{n-1} \big) = A_n - A_{n-1} \quad \text{a.s.} \]
since $M$ is a martingale and $A$ is predictable. Similarly,
\[ E(X_n - X_{n-1} \mid \mathcal{F}_{n-1}) = A'_n - A'_{n-1} \quad \text{a.s.} \]
and so
\[ A_n - A_{n-1} = A'_n - A'_{n-1} \quad \text{a.s.} \]
Now, both $A$ and $A'$ are null at 0 and so $A_0 = A'_0 \ (= 0)$ almost surely and therefore
\[ A_1 = A'_1 + (A_0 - A'_0) \implies A_1 = A'_1 \quad \text{a.s.,} \]
since $A_0 - A'_0 = 0$ a.s. Continuing in this way, we see that $A_n = A'_n$ a.s. for each $n$. It follows that there is some $\Lambda \subset \Omega$ with $P(\Lambda) = 1$ and such that $A_n(\omega) = A'_n(\omega)$ for all $n$ for all $\omega \in \Lambda$. However, $M = X - X_0 - A$ and $M' = X - X_0 - A'$ and so $M_n(\omega) = M'_n(\omega)$ for all $n$ for all $\omega \in \Lambda$, that is, $M = M'$ almost surely.

Now suppose that $X = X_0 + M + A$ is a submartingale. Then $A = X - M - X_0$ is also a submartingale so, since $A$ is predictable,
\[ A_n = E(A_n \mid \mathcal{F}_{n-1}) \ge A_{n-1} \quad \text{a.s.} \]
Once again, it follows that there is some $\Lambda \subset \Omega$ with $P(\Lambda) = 1$ and such that
\[ A_n(\omega) \ge A_{n-1}(\omega) \]
for all $n$ and all $\omega \in \Lambda$, that is, $A$ is increasing almost surely.
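The construction $A_n = A_{n-1} + E(X_n - X_{n-1} \mid \mathcal{F}_{n-1})$ can be carried out by hand when the filtration is generated by a simple random walk, because conditioning on $\mathcal{F}_{n-1}$ then amounts to averaging over the two possible next steps. The following sketch is an illustration under that assumption (fair $\pm 1$ steps, $S_0 = 0$, and $X_n = f(S_0, \dots, S_n)$ for a function $f$ supplied by the caller); the names `doob_decomposition` and `square` are hypothetical. For $X_n = S_n^2$ it gives $A_n = n$ and $M_n = S_n^2 - n$.

```python
from fractions import Fraction
from itertools import product


def doob_decomposition(f, steps):
    """Doob decomposition of X_n = f([S_0, ..., S_n]) along one path of a
    simple random walk with the given +1/-1 steps.
    Returns the three lists (X, M, A), indexed by n = 0, ..., len(steps)."""
    path = [0]
    X = [Fraction(f(path))]
    A = [Fraction(0)]
    for step in steps:
        # E(X_n | F_{n-1}): average of f over the two possible next positions
        up = f(path + [path[-1] + 1])
        down = f(path + [path[-1] - 1])
        drift = Fraction(up + down, 2) - X[-1]
        A.append(A[-1] + drift)          # uses only S_0, ..., S_{n-1}: predictable
        path.append(path[-1] + step)
        X.append(Fraction(f(path)))
    M = [x - X[0] - a for x, a in zip(X, A)]
    return X, M, A


def square(path):
    """X_n = S_n^2, the square of the walk's current position."""
    return path[-1] ** 2


if __name__ == "__main__":
    for steps in product((-1, 1), repeat=3):
        X, M, A = doob_decomposition(square, steps)
        print(steps, [int(a) for a in A], [int(m) for m in M])
```

Here the predictable part is deterministic, which is special to this example; in general $A_n$ depends on the path up to time $n-1$.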

Remark 3.47. Note that if $A$ is any predictable, increasing, integrable process, then
\[ E(A_n \mid \mathcal{F}_{n-1}) = A_n \ge A_{n-1} \]
so $A$ is a submartingale. It follows that if $X = X_0 + M + A$ is the Doob decomposition of $X$ and if the predictable part $A$ is increasing, then $X$ must be a submartingale (because $M$ is a martingale).

Corollary 3.48. Let $X$ be an $L^2$-martingale. Then $X^2$ has decomposition
\[ X^2 = X_0^2 + M + A \]
where $M$ is an $L^1$-martingale, null at 0 and $A$ is an increasing, predictable process, null at 0.

Proof. We simply note that $X^2$ is a submartingale and apply the theorem.

Definition 3.49. The almost surely increasing (and null at 0) process $A$ is called the quadratic variation of the $L^2$-process $X$.

The following result is an application of a Monotone Class argument.

Proposition 3.50 (Lévy's Upward Theorem). Let $(\mathcal{F}_n)_{n \in \mathbb{Z}^+}$ be a filtration on a probability space $(\Omega, \mathcal{S}, P)$. Set $\mathcal{F}_\infty = \sigma\big( \bigcup_n \mathcal{F}_n \big)$ and let $X \in L^2$ be $\mathcal{F}_\infty$-measurable. For $n \in \mathbb{Z}^+$, let $X_n = E(X \mid \mathcal{F}_n)$. Then $X_n \to X$ almost surely.

Proof. The family $(X_n)$ is an $L^2$-martingale with respect to the filtration $(\mathcal{F}_n)$ and so the Martingale Convergence Theorem (applied to the filtered probability space $(\Omega, \mathcal{F}_\infty, P, (\mathcal{F}_n))$) tells us that there is some $Y \in L^2(\mathcal{F}_\infty)$ such that $X_n \to Y$ almost surely, and also $X_n \to Y$ in $L^2(\mathcal{F}_\infty)$. We shall show that $Y = X$ almost surely.

For any $n \in \mathbb{Z}^+$ and any $B \in \mathcal{F}_n$, we have
\[ \int_B X_n \, dP = \int_B E(X \mid \mathcal{F}_n) \, dP = \int_B X \, dP. \]
On the other hand,
\[ \Big| \int_B X_n \, dP - \int_B Y \, dP \Big| = \Big| \int_\Omega 1_B \, (X_n - Y) \, dP \Big| \le \|X_n - Y\|_2, \]
by the Cauchy-Schwarz inequality. Since $\|X_n - Y\|_2 \to 0$, we must have the equality
\[ \int_B (X - Y) \, dP = 0 \]
for any $B \in \bigcup_n \mathcal{F}_n$.

Let $\mathcal{M}$ denote the subset of $\mathcal{F}_\infty$ given by
\[ \mathcal{M} = \Big\{ B \in \mathcal{F}_\infty : \int_B (X - Y) \, dP = 0 \Big\}. \]
We claim that $\mathcal{M}$ is a monotone class. Indeed, suppose that $B_1 \subset B_2 \subset \dots$ is an increasing sequence in $\mathcal{M}$. Then $1_{B_n} \uparrow 1_B$, where $B = \bigcup_n B_n$. The function $X - Y$ is integrable and so we may appeal to Lebesgue's Dominated Convergence Theorem to deduce that
\[ 0 = \int_{B_n} (X - Y) \, dP = \int_\Omega 1_{B_n} (X - Y) \, dP \to \int_\Omega 1_B \, (X - Y) \, dP = \int_B (X - Y) \, dP \]
which proves the claim.

Now, $\bigcup_n \mathcal{F}_n$ is an algebra and belongs to $\mathcal{M}$. By the Monotone Class Theorem, $\mathcal{M}$ contains the $\sigma$-algebra generated by this algebra, namely $\mathcal{F}_\infty$. But then we are done: $\int_B (X - Y) \, dP = 0$ for every $B \in \mathcal{F}_\infty$ and so we must have $X = Y$ almost surely.

Remark 3.51. This result is also valid in $L^1$. In fact, if $X \in L^1(\mathcal{F}_\infty)$ then $(E(X \mid \mathcal{F}_n))$ is uniformly integrable and the Martingale Convergence Theorem applies. $X_n$ converges almost surely and also in $L^1$ to some $Y$. An analogous proof shows that $X = Y$ almost surely.

In the $L^2$ case, one would expect the result to be true on the following grounds. The subspaces $L^2(\mathcal{F}_n)$ increase and would seem to "exhaust" the space $L^2(\mathcal{F}_\infty)$. The projections $P_n$ from $L^2(\mathcal{F}_\infty)$ onto $L^2(\mathcal{F}_n)$ should therefore increase to the identity operator, $P_n g \to g$ for any $g \in L^2(\mathcal{F}_\infty)$. But $X_n = P_n X$, so we would expect $X_n \to X$. Of course, to work this argument through would bring us back to the start.
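Lévy's Upward Theorem can be made concrete in a standard setting which we assume here purely for illustration: $\Omega = [0, 1)$ with Lebesgue measure, $\mathcal{F}_n$ generated by the dyadic intervals $[k/2^n, (k+1)/2^n)$, and $X(y) = y^2$. Then $E(X \mid \mathcal{F}_n)$ is constant on each dyadic interval and equal to the average of $X$ over it. The sketch below (function name hypothetical) evaluates this average exactly and shows it approaching $X(y)$ at a fixed sample point.

```python
from fractions import Fraction


def conditional_expectation(n, y):
    """E(X | F_n) at the sample point y, for X(y) = y**2 on [0, 1) with
    Lebesgue measure and F_n generated by the dyadic intervals of length 2**-n."""
    y = Fraction(y)
    k = int(y * 2 ** n)                  # index of the dyadic interval containing y
    h = Fraction(1, 2 ** n)              # length of that interval
    # average of y**2 over [k*h, (k+1)*h) equals h**2 * (3k^2 + 3k + 1) / 3
    return h * h * (3 * k * k + 3 * k + 1) / 3


if __name__ == "__main__":
    y = Fraction(1, 3)
    for n in (0, 1, 2, 5, 10, 20):
        x_n = conditional_expectation(n, y)
        print(n, float(x_n), float(abs(x_n - y * y)))
```

Averaging the level-$(n+1)$ values over the two halves of a level-$n$ interval returns the level-$n$ value, which is the martingale property of $(X_n)$ in this example.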


Chapter 4 Stochastic integration - informally

We suppose that we are given a probability space $(\Omega, \mathcal{S}, P)$ together with a filtration $(\mathcal{F}_t)_{t \in \mathbb{R}^+}$ of sub-$\sigma$-algebras indexed by $\mathbb{R}^+ = [0, \infty)$ (so $\mathcal{F}_s \subset \mathcal{F}_t$ if $s < t$). (Recall that our notation $A \subset B$ means that $A \cap B^c = \emptyset$ so that $\mathcal{F}_s = \mathcal{F}_t$ is permissible here.) Just as for a discrete index, one can define martingales etc.

Definition 4.1. The process $(X_t)_{t \in \mathbb{R}^+}$ is adapted with respect to the filtration $(\mathcal{F}_t)_{t \in \mathbb{R}^+}$ if $X_t$ is $\mathcal{F}_t$-measurable for each $t \in \mathbb{R}^+$. The adapted process $(X_t)_{t \in \mathbb{R}^+}$ is a martingale with respect to the filtration $(\mathcal{F}_t)_{t \in \mathbb{R}^+}$ if $X_t$ is integrable for each $t \in \mathbb{R}^+$ and if
\[ E(X_t \mid \mathcal{F}_s) = X_s \quad \text{almost surely for any } s \le t. \]
Supermartingales indexed by $\mathbb{R}^+$ are defined as above but with the inequality $E(X_t \mid \mathcal{F}_s) \le X_s$ almost surely for any $s \le t$ instead of equality above. A process is a submartingale if its negative is a supermartingale.

Remark 4.2. If $(X_t)$ is an $L^2$-martingale, then $(X_t^2)$ is a submartingale. The proof of this is just as for a discrete index, as in Proposition 3.10.

Remark 4.3. It must be stressed that although it may at first appear quite innocuous, the change from a discrete index to a continuous one is anything but. There are enormous technical complications involved in the theory with a continuous index. Indeed, one might immediately anticipate measure-theoretic difficulties simply because $\mathbb{R}^+$ is not countable.

Remark 4.4. Stochastic processes $(X_n)$ and $(Y_n)$, indexed by $\mathbb{Z}^+$, are said to be indistinguishable if
\[ P(X_n = Y_n \text{ for all } n) = 1. \]
The process $(Y_n)_{n \in \mathbb{Z}^+}$ is said to be a version (or a modification) of $(X_n)_{n \in \mathbb{Z}^+}$ if $X_n = Y_n$ almost surely for every $n \in \mathbb{Z}^+$.

If $(Y_n)$ is a version of $(X_n)$ then $(X_n)$ and $(Y_n)$ are indistinguishable. Indeed, for each $k \in \mathbb{Z}^+$, let $A_k = \{ X_k = Y_k \}$. We know that $P(A_k) = 1$, since $(Y_n)$ is a version of $(X_n)$, by hypothesis. But then
\[ P(X_n = Y_n \text{ for all } n \in \mathbb{Z}^+) = P\big( \textstyle\bigcap_n A_n \big) = 1. \]

Warning: the extension of these definitions to processes indexed by $\mathbb{R}^+$ is clear, but the above implication can be false in the situation of continuous time, as the following example shows.

Example 4.5. Let $\Omega = [0, 1]$ equipped with the Borel $\sigma$-algebra and the probability measure $P$ determined by (Lebesgue measure) $P([a, b]) = b - a$ for any $0 \le a \le b \le 1$. For $t \in \mathbb{R}^+$, let $X_t(\omega) = 0$ for all $\omega \in \Omega = [0, 1]$ and let
\[ Y_t(\omega) = \begin{cases} 0, & \omega \ne t, \\ 1, & \omega = t. \end{cases} \]
Fix $t \in \mathbb{R}^+$. Then $X_t(\omega) = Y_t(\omega)$ unless $\omega = t$. Hence $\{ X_t = Y_t \} = \Omega \setminus \{ t \}$ or $\Omega$ depending on whether $t \in [0, 1]$ or not. So $P(X_t = Y_t) = 1$. On the other hand, $\{ X_t = Y_t \text{ for all } t \in \mathbb{R}^+ \} = \emptyset$ and so we have
\[ P(X_t = Y_t) = 1 \ \text{ for each } t \in \mathbb{R}^+, \qquad P(X_t = Y_t \text{ for all } t \in \mathbb{R}^+) = 0, \]
which is to say that $(Y_t)$ is a version of $(X_t)$ but these processes are far from indistinguishable.

Note also that the path $t \mapsto X_t(\omega)$ is constant for every $\omega$, whereas for every $\omega$, the path $t \mapsto Y_t(\omega)$ has a jump at $t = \omega$. So the paths $t \mapsto X_t(\omega)$ are continuous almost surely, whereas with probability one, no path $t \mapsto Y_t(\omega)$ is continuous.

Example 4.6. A filtration $(\mathcal{G}_t)_{t \in \mathbb{R}^+}$ of sub-$\sigma$-algebras of a $\sigma$-algebra $\mathcal{S}$ is said to be right-continuous if $\mathcal{G}_t = \bigcap_{s > t} \mathcal{G}_s$ for any $t \in \mathbb{R}^+$.

For any given filtration $(\mathcal{F}_t)$, set $\mathcal{G}_t = \bigcap_{s > t} \mathcal{F}_s$. Fix $t \ge 0$ and suppose that $A \in \bigcap_{s > t} \mathcal{G}_s$. For any $r > t$, choose $s$ with $t < s < r$. Then we see that $A \in \mathcal{G}_s = \bigcap_{v > s} \mathcal{F}_v$ and, in particular, $A \in \mathcal{F}_r$. Hence $A \in \bigcap_{r > t} \mathcal{F}_r = \mathcal{G}_t$ and so $(\mathcal{G}_t)_{t \in \mathbb{R}^+}$ is right-continuous.

Remark 4.7. Let $(X_t)_{t \in \mathbb{R}^+}$ be a process indexed by $\mathbb{R}^+$. Then one might be interested in the process given by $Y_t = \sup_{s \le t} X_s$ for $t \in \mathbb{R}^+$. Even though each $X_t$ is a random variable, it is not at all clear that $Y_t$ is measurable. However, suppose that $t \mapsto X_t$ is almost surely continuous and let $\Omega_0 \subset \Omega$ be such that $P(\Omega_0) = 1$ and $t \mapsto X_t(\omega)$ is continuous for each $\omega \in \Omega_0$. Then we can define
\[ Y_t(\omega) = \begin{cases} \sup_{s \le t} X_s(\omega), & \text{for } \omega \in \Omega_0, \\ 0, & \text{otherwise.} \end{cases} \]

Claim: $Y_t$ is measurable.

Proof of claim. Suppose that $\varphi : [a, b] \to \mathbb{R}$ is a continuous function. If $D = \{ t_0, t_1, \dots, t_k \}$ with $a = t_0 < t_1 < \dots < t_k = b$ is a partition of $[a, b]$, let $\max_D \varphi$ denote $\max\{ \varphi(t) : t \in D \}$. Let
\[ D_n = \{ a = t_0^{(n)} < t_1^{(n)} < \dots < t_{m_n}^{(n)} = b \} \]
be a sequence of partitions of $[a, b]$ such that $\operatorname{mesh}(D_n) \to 0$, where $\operatorname{mesh}(D)$ is given by $\operatorname{mesh}(D) = \max\{ t_{j+1} - t_j : t_j \in D \}$. Then it follows that
\[ \sup\{ \varphi(s) : s \in [a, b] \} = \lim_n \max\nolimits_{D_n} \varphi. \]
This is because a continuous function $\varphi$ on a closed interval $[a, b]$ is uniformly continuous there and so for any $\varepsilon > 0$, there is some $\delta > 0$ such that $|\varphi(t) - \varphi(s)| < \varepsilon$ whenever $|t - s| < \delta$. Suppose then that $n$ is sufficiently large that $\operatorname{mesh}(D_n) < \delta$. Then for any $s \in [a, b]$, there is some $t_j \in D_n$ such that $|s - t_j| < \delta$. This means that $|\varphi(s) - \varphi(t_j)| < \varepsilon$. It follows that
\[ \max\nolimits_{D_n} \varphi \le \sup\{ \varphi(s) : s \in [a, b] \} < \max\nolimits_{D_n} \varphi + \varepsilon \]
and so $\sup\{ \varphi(s) : s \in [a, b] \} = \lim_n \max_{D_n} \varphi$.

Now fix $t \ge 0$, set $a = 0$, $b = t$, fix $\omega \in \Omega$ and let $\varphi(s) = 1_{\Omega_0}(\omega) \, X_s(\omega)$. Evidently $s \mapsto \varphi(s)$ is continuous on $[0, t]$ and so
\[ Y_t(\omega) = \sup\{ \varphi(s) : s \in [0, t] \} = \lim_n \max\nolimits_{D_n} \varphi \]
as above. But for each $n$, $\omega \mapsto \max_{D_n} \varphi = \max\{ 1_{\Omega_0}(\omega) \, X_s(\omega) : s \in D_n \}$ is measurable and so the pointwise limit $\omega \mapsto Y_t(\omega)$ is measurable, which completes the proof of the claim.

Example 4.8 (Doob's Maximal Inequality). Suppose now that $(X_t)$ is a non-negative submartingale with respect to a filtration $(\mathcal{F}_t)$ such that $\mathcal{F}_0$ contains all sets of probability zero. Then $\Omega_0 \in \mathcal{F}_0$ and so the process $(\xi_t) = (1_{\Omega_0} X_t)$ is also a non-negative submartingale.

Fix $t \ge 0$ and $n \in \mathbb{N}$. For $0 \le j \le 2^n$, let $t_j = tj / 2^n$. Set $\eta_j = \xi_{t_j}$ and $\mathcal{G}_j = \mathcal{F}_{t_j}$ for $j \le 2^n$ and $\eta_j = \xi_t$ and $\mathcal{G}_j = \mathcal{F}_t$ for $j > 2^n$. Then $(\eta_j)$ is a discrete parameter non-negative submartingale with respect to the filtration $(\mathcal{G}_j)$. According to the discussion above,
\[ Y_t = \sup_{s \le t} \xi_s = \lim_n \max_{j \le 2^n} \eta_j = \sup_n \max_{j \le 2^n} \eta_j \]
on $\Omega$. Now, by adjusting the inequalities in the proof of Doob's Maximal inequality, Theorem 3.28, we see that the result is also valid if the inequality $\max_k X_k \ge \lambda$ is replaced by $\max_k X_k > \lambda$ on both sides and so
\[ \lambda \, P\big( \max_{j \le 2^n} \eta_j > \lambda \big) \le E\big( \eta_{2^n} 1_{\{\max_{j \le 2^n} \eta_j > \lambda\}} \big). \]

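Since $\eta_{2^n} = X_t$, we may say that
\[ \lambda \, E\big( 1_{\{\max_{j \le 2^n} \eta_j > \lambda\}} \big) \le E\big( X_t 1_{\{\max_{j \le 2^n} \eta_j > \lambda\}} \big). \]
But $\max_{j \le 2^n} \eta_j \le Y_t$ and $\max_{j \le 2^n} \eta_j \to Y_t$ on $\Omega$ as $n \to \infty$ and therefore $1_{\{\max_{j \le 2^n} \eta_j > \lambda\}} \to 1_{\{Y_t > \lambda\}}$. Hence, letting $n \to \infty$, we find that
\[ \lambda \, E\big( 1_{\{Y_t > \lambda\}} \big) \le E\big( X_t 1_{\{Y_t > \lambda\}} \big). \tag{$*$} \]
Replacing $\lambda$ by $\lambda_n$ in $(*)$ where now $\lambda_n \uparrow \lambda > 0$ (so that $1_{\{Y_t > \lambda_n\}} \downarrow 1_{\{Y_t \ge \lambda\}}$) and letting $n \to \infty$, we obtain
\[ \lambda \, E\big( 1_{\{Y_t \ge \lambda\}} \big) \le E\big( X_t 1_{\{Y_t \ge \lambda\}} \big), \]
that is, a continuous version of Doob's Maximal inequality.

Similarly, we can obtain a version of Corollary 3.29 for continuous time.

Suppose that $(X_t)_{t \in \mathbb{R}^+}$ is an $L^2$ martingale such that the map $t \mapsto X_t(\omega)$ is almost surely continuous. Then for any $\lambda > 0$
\[ P\big( \sup_{s \le t} |X_s| \ge \lambda \big) \le \frac{1}{\lambda^2} \, \|X_t\|_2^2. \]

Proof. Let $(D_n)$ be the sequence of partitions of $[0, t]$ as above, $D_n = \{ 0 = t_0^n < t_1^n < \dots < t_{m_n}^n = t \}$, and let $f_n(\omega) = \max_j |X_{t_j^n}(\omega)|$. Then $f_n \to f = \sup_{s \le t} |X_s|$ almost surely. Let $\mu < \lambda$. Since $f_n \le f$ almost surely, it follows that $1_{\{f_n > \mu\}} \to 1_{\{f > \mu\}}$ almost surely. By Lebesgue's Dominated Convergence Theorem, it follows that $E(1_{\{f_n > \mu\}}) \to E(1_{\{f > \mu\}})$, that is, $P(f_n > \mu) \to P(f > \mu)$. Applying Doob's maximal inequality, Corollary 3.29, for the discrete-time filtration $(\mathcal{F}_s)$ with $s \in \{ 0 = t_0^n, t_1^n, \dots, t_{m_n}^n = t \}$, we get
\[ P(f_n > \mu) \le P(f_n \ge \mu) \le \frac{1}{\mu^2} \, \|X_t\|_2^2. \]
Letting $n \to \infty$ gives
\[ \mu^2 \, P(f > \mu) \le \|X_t\|_2^2. \tag{$*$} \]
For $j \in \mathbb{N}$, let $\mu_j \uparrow \lambda$ and set $A_j = \{ f > \mu_j \}$. Then $A_j \downarrow \{ f \ge \lambda \}$ so that $P(A_j) \downarrow P(f \ge \lambda)$. Replacing $\mu$ in $(*)$ by $\mu_j$ and letting $j \to \infty$, we get the inequality
\[ \lambda^2 \, P(f \ge \lambda) \le \|X_t\|_2^2 \]
as required.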
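The passage from discrete to continuous time above rests on the fact that, for a continuous path, the maximum over the dyadic points $tj/2^n$ increases to the supremum over $[0, t]$. The following sketch illustrates this for one continuous function, chosen arbitrarily as a stand-in for a sample path $s \mapsto X_s(\omega)$ (here $\varphi(s) = \sin 5s$ on $[0, 1]$, whose supremum is 1); the names are hypothetical.

```python
import math


def dyadic_max(phi, t, n):
    """Maximum of phi over the dyadic partition points t*j/2**n, 0 <= j <= 2**n."""
    return max(phi(t * j / 2 ** n) for j in range(2 ** n + 1))


def phi(s):
    # a continuous stand-in for one sample path s -> X_s(omega)
    return math.sin(5.0 * s)


# maxima over successively finer dyadic partitions of [0, 1]
maxima = [dyadic_max(phi, 1.0, n) for n in range(13)]

if __name__ == "__main__":
    for n, value in enumerate(maxima):
        print(n, value)
```

Because each dyadic partition contains the previous one, the sequence of maxima is non-decreasing, which is why the limit over $n$ can be written as a supremum over $n$.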

Now. we recall that if Z is a random variable.b] (s) for some 0 ≤ a < b and some bounded random variable h(ω) which is Fa measurable.informally 61 Stochastic integration – ﬁrst steps We wish to try to perform some kind of abstract integration with respect to martingales. the family ( is an L2 -martingale. if T ≤ a.b] (s) dXs Proposition 4. 2004 .b] and let t  h (Xb − Xa ). First. it November 22. ω) = h(ω) 1(a. suppose that (Xt )t∈R+ is a square-integrable martingale (that is. then its (cumulative) distribution function F is deﬁned to be F (t) = P (Z ≤ t) and we have P (a < Z ≤ b) = P (Z ≤ b) − P (Z ≤ a) = F (b) − F (a) . The stochastic integral 0 g dX of the elementary process g with respect the L2 -martingale (Xt ) is deﬁned to be the random variable T T T g dX = 0 0 h 1(a. It is usually straightforward to integrate step functions. then h is Ft -measurable and so is Xb∧t − Xa∧t . if T ≥ b  = h (XT − Xa ).b] (s) dF (s) = F (b) − F (a) .9. if a < T < b   0. Note that g is piecewise-constant in s (and continuous from the left). since h is bounded. if t ≥ a. T Xt ∈ L2 for each t). so we shall try to do that here. Since Yt = 0 for t < a. Deﬁnition 4.Stochastic integration . Notice that step-functions of the form 1(a.b] (left-continuous) appear quite naturally here. A process g : R+ × Ω → R is said to be elementary if it has the form g(s. Furthermore. Suppose that g = h 1(a. t 0 g dX)t∈R+ Yt = 0 g dX = h (Xb∧t − Xa∧t ) . Now. Proof. we see that (Yt ) is adapted. This can be written as E(1{ a<Z≤b } ) = F (b) − F (a) or as ∞ −∞ 1(a. For a ﬁxed elementary process g.10. We want to set-up something like 0 f dXt so we begin with integrands which are step-functions.

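follows that $Y_t \in L^2$ for each $t \in \mathbb{R}^+$.

To show that $Y_t$ is a martingale, let $0 \le s \le t$ and consider three cases.

Case 1. $s \le a$.
Using the tower property $E(\,\cdot \mid \mathcal{F}_s) = E(E(\,\cdot \mid \mathcal{F}_a) \mid \mathcal{F}_s)$, we find that
\[
\begin{aligned}
E(Y_t \mid \mathcal{F}_s) &= E\big( h \, (X_{t \wedge b} - X_{t \wedge a}) \mid \mathcal{F}_s \big) \\
&= E\big( E( h \, (X_{t \wedge b} - X_{t \wedge a}) \mid \mathcal{F}_a ) \mid \mathcal{F}_s \big) \\
&= E\big( h \, E( (X_{t \wedge b} - X_{t \wedge a}) \mid \mathcal{F}_a ) \mid \mathcal{F}_s \big) \quad \text{a.s.} \\
&= 0, \quad \text{since } (X_t) \text{ is a martingale.}
\end{aligned}
\]
But $s \le a$ implies that $Y_s = h \, (X_{s \wedge b} - X_{s \wedge a}) = 0$ and so $E(Y_t \mid \mathcal{F}_s) = 0 = Y_s$ almost surely.

Case 2. $a \le s \le b$.
We have
\[
\begin{aligned}
E(Y_t \mid \mathcal{F}_s) &= E\big( h \, (X_{t \wedge b} - X_a) \mid \mathcal{F}_s \big) \\
&= h \, E\big( (X_{t \wedge b} - X_a) \mid \mathcal{F}_s \big) \quad \text{a.s.} \\
&= h \, (X_{t \wedge b \wedge s} - X_a) \quad \text{a.s.} \\
&= h \, (X_{s \wedge b} - X_{s \wedge a}) = Y_s.
\end{aligned}
\]

Case 3. $b < s$.
We find
\[
\begin{aligned}
E(Y_t \mid \mathcal{F}_s) &= E\big( h \, (X_b - X_a) \mid \mathcal{F}_s \big) \\
&= h \, (X_b - X_a) \quad \text{a.s.} \\
&= h \, (X_{s \wedge b} - X_{s \wedge a}) = Y_s
\end{aligned}
\]
and the proof is complete.

Notation. Let $\mathcal{E}$ denote the real linear span of the set of elementary processes. So any element $h \in \mathcal{E}$ has the form
\[ h(\omega, s) = \sum_{i=1}^{n} g_i(\omega) \, 1_{(a_i, b_i]}(s) \]
for some $n$, pairs $a_i < b_i$ and bounded random variables $g_i$ where each $g_i$ is $\mathcal{F}_{a_i}$-measurable. Notice that $h(\omega, 0) = 0$. We could have included random variables of the form $g_0(\omega) \, 1_{\{0\}}(s)$ in the construction of $\mathcal{E}$, where $g_0$ is $\mathcal{F}_0$-measurable, but such elements play no rôle. In fact, we are not interested in the value of $h$ at $s = 0$ as far as integration is concerned.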
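For a fixed $\omega$, the stochastic integral of an elementary process is just a number computed from the path $t \mapsto X_t(\omega)$ and the value $h(\omega)$. The sketch below, an illustration only, evaluates the formula $h \, (X_{b \wedge T} - X_{a \wedge T})$ used in the proof of Proposition 4.10 and confirms that it reproduces the three cases of Definition 4.9. The path $X(t) = t^2$ is an arbitrary deterministic stand-in for a sample path, and the function name is hypothetical.

```python
def elementary_integral(h, a, b, X, T):
    """Integral over [0, T] of g = h * 1_(a, b] against the path X (a function
    of time), computed as h * (X(min(b, T)) - X(min(a, T)))."""
    return h * (X(min(b, T)) - X(min(a, T)))


def X(t):
    # deterministic stand-in for a sample path t -> X_t(omega)
    return t * t


if __name__ == "__main__":
    h, a, b = 2, 1, 3
    for T in (0.5, 2, 5):
        # T <= a gives 0;  a < T < b gives h (X_T - X_a);  T >= b gives h (X_b - X_a)
        print(T, elementary_integral(h, a, b, X, T))
```

The single formula with minima is convenient because it covers all three cases at once, which is what makes the adaptedness and martingale checks in the proof uniform in $t$.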

Definition 4.11. For $h = \sum_{i=1}^{n} g_i \, 1_{(a_i, b_i]} \in \mathcal{E}$ and $T \ge 0$, the stochastic integral $\int_0^T h \, dX$ is defined to be the random variable
\[ \int_0^T h \, dX = \sum_{i=1}^{n} \int_0^T h_i \, dX = \sum_{i=1}^{n} g_i \, (X_{T \wedge b_i} - X_{T \wedge a_i}) \]
where $h_i = g_i \, 1_{(a_i, b_i]}$.

Remark 4.12. It is crucial to note that the stochastic integral is a sum of terms $g_i \, (X_{T \wedge b_i} - X_{T \wedge a_i})$ where $g_i$ is $\mathcal{F}_{a_i}$-measurable and $b_i > a_i$. The "increment" $(X_{T \wedge b_i} - X_{T \wedge a_i})$ "points to the future". This will play a central rôle in the development of the theory.

Proposition 4.13. For $h \in \mathcal{E}$, the process $\big( \int_0^t h \, dX \big)_{t \in \mathbb{R}^+}$ is an $L^2$-martingale.

Proof. If $h = \sum_{i=1}^{n} h_i$, where each $h_i$ is elementary, then
\[ \int_0^t h \, dX = \sum_{i=1}^{n} \int_0^t h_i \, dX \]
which is a linear combination of $L^2$-martingales, by Proposition 4.10. The result follows.

So far so good. Now if $Y_t = \int_0^t h \, dX$, can anything be said about $Y_t^2$ (or $E(Y_t^2)$)? We pursue this now.

Let $h \in \mathcal{E}$. Then $h$ can be written as
\[ h = \sum_{i=1}^{m-1} g_i \, 1_{(t_i, t_{i+1}]} \]
for some $m$, $0 = t_1 < \dots < t_m$ and $\mathcal{F}_{t_i}$-measurable random variables $g_i$ (which may be 0). Let $Y_t = \int_0^t h \, dX$ and consider $E(Y_t^2)$. First we note that $Y_t$ is unchanged if $h$ is replaced by $h \, 1_{(0, t]}$ which means that we may assume, without loss of generality, that $t_m = t$. This done, we consider
\[
E(Y_t^2) = E\Big( \Big( \sum_{i=1}^{m-1} g_i \, (X_{t_{i+1}} - X_{t_i}) \Big)^2 \Big) = E\Big( \sum_{i=1}^{m-1} \sum_{j=1}^{m-1} g_i \, g_j \, (X_{t_{i+1}} - X_{t_i}) (X_{t_{j+1}} - X_{t_j}) \Big).
\]

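Now, suppose $i \ne j$, say $i < j$. Writing $\Delta X_i$ for $(X_{t_{i+1}} - X_{t_i})$, we find that
\[
\begin{aligned}
E( g_i \, g_j \, \Delta X_i \, \Delta X_j ) &= E\big( E( g_i \, g_j \, \Delta X_i \, \Delta X_j \mid \mathcal{F}_{t_j} ) \big) \\
&= E\big( g_i \, g_j \, \Delta X_i \, E( \Delta X_j \mid \mathcal{F}_{t_j} ) \big), \quad \text{a.s., since } g_i \, g_j \, \Delta X_i \text{ is } \mathcal{F}_{t_j}\text{-measurable,} \\
&= 0, \quad \text{since } (X_t) \text{ is a martingale.}
\end{aligned}
\]
Of course, this also holds for $i > j$ (simply interchange $i$ and $j$) and so we may say that
\[ E\Big( \Big( \int_0^t g \, dX \Big)^2 \Big) = E\Big( \sum_{j=1}^{m-1} g_j^2 \, \Delta X_j^2 \Big) = \sum_{j=1}^{m-1} E( g_j^2 \, \Delta X_j^2 ). \]
Next, consider
\[
\begin{aligned}
E( g_j^2 \, \Delta X_j^2 ) &= E\big( g_j^2 \, (X_{t_{j+1}} - X_{t_j})^2 \big) \\
&= E\big( g_j^2 \, (X_{t_{j+1}}^2 + X_{t_j}^2) \big) - 2 \, E\big( g_j^2 \, X_{t_{j+1}} X_{t_j} \big) \\
&= E\big( g_j^2 \, (X_{t_{j+1}}^2 + X_{t_j}^2) \big) - 2 \, E\big( E( g_j^2 \, X_{t_{j+1}} X_{t_j} \mid \mathcal{F}_{t_j} ) \big) \\
&= E\big( g_j^2 \, (X_{t_{j+1}}^2 + X_{t_j}^2) \big) - 2 \, E\big( g_j^2 \, X_{t_j} \, E( X_{t_{j+1}} \mid \mathcal{F}_{t_j} ) \big) \\
&= E\big( g_j^2 \, (X_{t_{j+1}}^2 + X_{t_j}^2) \big) - 2 \, E\big( g_j^2 \, X_{t_j}^2 \big), \quad \text{since } (X_t) \text{ is a martingale,} \\
&= E\big( g_j^2 \, (X_{t_{j+1}}^2 - X_{t_j}^2) \big).
\end{aligned}
\]
What now? We have already seen that the square of a discrete-time $L^2$-martingale has a decomposition as the sum of an $L^1$-martingale and a predictable increasing process. So let us assume that $(X_t^2)$ has the Doob-Meyer decomposition
\[ X_t^2 = M_t + A_t \]
where $(M_t)$ is an $L^1$-martingale and $(A_t)$ is an $L^1$-process such that $A_0 = 0$ and $A_s(\omega) \le A_t(\omega)$ almost surely whenever $s \le t$. How does this help?

Setting $\Delta M_j = M_{t_{j+1}} - M_{t_j}$ and similarly $\Delta A_j = A_{t_{j+1}} - A_{t_j}$, we see that
\[
\begin{aligned}
E\big( g_j^2 \, (X_{t_{j+1}}^2 - X_{t_j}^2) \big) &= E( g_j^2 \, \Delta M_j ) + E( g_j^2 \, \Delta A_j ) \\
&= E\big( E( g_j^2 \, \Delta M_j \mid \mathcal{F}_{t_j} ) \big) + E( g_j^2 \, \Delta A_j ) \\
&= E\big( g_j^2 \, E( (M_{t_{j+1}} - M_{t_j}) \mid \mathcal{F}_{t_j} ) \big) + E( g_j^2 \, \Delta A_j ) \\
&= E( g_j^2 \, \Delta A_j ),
\end{aligned}
\]
since $E( (M_{t_{j+1}} - M_{t_j}) \mid \mathcal{F}_{t_j} ) = 0$ because $(M_t)$ is a martingale.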

Hence
\[ E\Big( \Big( \int_0^t g \, dX \Big)^2 \Big) = \sum_{j=1}^{m-1} E\big( g_j^2 \, (A_{t_{j+1}} - A_{t_j}) \big). \]
Now, on a set of probability one, $A_t(\omega)$ is an increasing function of $t$ and we can consider Stieltjes integration using this. Indeed,
\[
\int_0^t g(s, \omega)^2 \, dA_s(\omega) = \sum_{j=1}^{m-1} \int_0^t g_j^2(\omega) \, 1_{(t_j, t_{j+1}]}(s) \, dA_s(\omega) = \sum_{j=1}^{m-1} g_j^2(\omega) \, \big( A_{t_{j+1}}(\omega) - A_{t_j}(\omega) \big).
\]
Taking expectations,
\[ E\Big( \int_0^t g^2 \, dA \Big) = E\Big( \sum_{j=1}^{m-1} g_j^2 \, \Delta A_j \Big) \]
and this leads us finally to the formula
\[ E\Big( \Big( \int_0^t g \, dX_s \Big)^2 \Big) = E\Big( \int_0^t g^2 \, dA_s \Big) \]
known as the isometry property.

This isometry relation allows one to extend the class of integrands in the stochastic integral. Indeed, suppose that $(g_n)$ is a sequence from $\mathcal{E}$ which converges to a map $h : \mathbb{R}^+ \times \Omega \to \mathbb{R}$ in the sense that $E\big( \int_0^t (g_n - h)^2 \, dA_s \big) \to 0$. The isometry property then tells us that the sequence $\big( \int_0^t g_n \, dX \big)$ of random variables is a Cauchy sequence in $L^2$ and so converges to some $Y_t$ in $L^2$. This allows us to define $\int_0^t h \, dX$ as this $Y_t$. We will consider this again for the case when $(X_t)$ is a Wiener process.

Remark 4.14. The Doob-Meyer decomposition of $X_t^2$ as $M_t + A_t$ is far from straightforward and involves further technical assumptions. The reader should consult the text of Meyer for the full picture.
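The isometry property can be checked exactly in a discrete stand-in for the situation above. We assume, for illustration only, that the martingale observed at the partition times is a simple random walk $S_0, S_1, \dots, S_m$ (fair $\pm 1$ steps), for which $S_n^2 = M_n + A_n$ with $A_n = n$, and that the integrand at the $j$-th step is $g(S_j)$ for a bounded function $g$, so that it depends only on the past. The sketch enumerates all paths and returns both sides of the isometry as exact fractions; `isometry_sides` and `clip` are hypothetical names.

```python
from fractions import Fraction
from itertools import product


def isometry_sides(m, g):
    """For a simple random walk S with m fair +1/-1 steps, return exactly
        E( (sum_j g(S_j) (S_{j+1} - S_j))^2 )   and   E( sum_j g(S_j)^2 ),
    the sums running over j = 0, ..., m-1.  The second expression is
    E( sum_j g(S_j)^2 (A_{j+1} - A_j) ) with A_j = j."""
    weight = Fraction(1, 2 ** m)         # probability of each path
    lhs = Fraction(0)
    rhs = Fraction(0)
    for steps in product((-1, 1), repeat=m):
        s, integral, energy = 0, 0, 0
        for step in steps:
            integral += g(s) * step      # integrand is evaluated before the step
            energy += g(s) ** 2
            s += step
        lhs += weight * integral ** 2
        rhs += weight * energy
    return lhs, rhs


def clip(s):
    """A bounded integrand: the walk's position clipped to [-1, 1]."""
    return max(-1, min(1, s))


if __name__ == "__main__":
    print(isometry_sides(8, clip))
```

The equality fails in general if the integrand is allowed to look at the step it multiplies, which is the discrete counterpart of the requirement that $g_j$ be $\mathcal{F}_{t_j}$-measurable.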


Chapter 5 Wiener process

We begin, without further ado, with the definition.

Definition 5.1. An adapted process $(W_t)_{t \in \mathbb{R}_+}$ on a filtered probability space $(\Omega, \mathcal{S}, P, (\mathcal{F}_t))$ is said to be a Wiener process starting at 0 if it obeys the following:
(a) $W_0 = 0$ almost surely and the map $t \mapsto W_t$ is continuous almost surely.
(b) For any $0 \le s < t$, the random variable $W_t - W_s$ has a normal distribution with mean 0 and variance $t - s$.
(c) For all $0 \le s < t$, $W_t - W_s$ is independent of $\mathcal{F}_s$.

Remarks 5.2.
1. From (b), we see that the distribution of $W_{s+t} - W_s$ is normal with mean 0 and variance $t$, so
\[
P( W_{s+t} - W_s \in A ) = \frac{1}{\sqrt{2\pi t}} \int_A e^{-x^2/2t} \, dx
\]
for any Borel set $A$ in $\mathbb{R}$. Furthermore, by (a), $W_t = W_{t+0} - W_0$ almost surely, so each $W_t$ has a normal distribution with mean 0 and variance $t$.
2. For each fixed $\omega \in \Omega$, think of the map $t \mapsto W_t(\omega)$ as the path of a particle (with $t$ interpreted as time). Then (a) says that all paths start at 0 (almost surely) and are continuous (almost surely).
3. The independence property (c) implies that any collection of increments $W_{t_1} - W_{s_1}, W_{t_2} - W_{s_2}, \dots, W_{t_n} - W_{s_n}$ with $0 \le s_1 < t_1 \le s_2 < t_2 \le s_3 < \dots \le s_n < t_n$ are independent. Indeed, writing $\Delta W_j = W_{t_j} - W_{s_j}$, for any Borel sets $A_1, \dots, A_n$ in $\mathbb{R}$ we have
\[
\begin{aligned}
P( \Delta W_1 \in A_1, \dots, \Delta W_n \in A_n ) &= P( \{ \Delta W_1 \in A_1, \dots, \Delta W_{n-1} \in A_{n-1} \} \cap \{ \Delta W_n \in A_n \} ) \\
&= P( \Delta W_1 \in A_1, \dots, \Delta W_{n-1} \in A_{n-1} ) \, P( \Delta W_n \in A_n ), \\
&\qquad \text{since $\{ \Delta W_j \in A_j \} \in \mathcal{F}_{s_n}$ for all $1 \le j \le n-1$ and $\Delta W_n$ is independent of $\mathcal{F}_{s_n}$,} \\
&= \cdots \\
&= P( \Delta W_1 \in A_1 ) \, P( \Delta W_2 \in A_2 ) \cdots P( \Delta W_n \in A_n )
\end{aligned}
\]
as required.
4. A $d$-dimensional Wiener process (starting from $0 \in \mathbb{R}^d$) is a $d$-tuple $(W_t^1, \dots, W_t^d)$ where the $(W_t^1), \dots, (W_t^d)$ are independent Wiener processes in $\mathbb{R}$ (starting at 0).
5. A Wiener process is also referred to as Brownian motion.
6. Such a Wiener process exists. In fact, let $p(x,t) = \frac{1}{\sqrt{2\pi t}} e^{-x^2/2t}$ denote the density of $W_t = W_t - W_0$ and let $0 < t_1 < \dots < t_n$. Then to say that $W_{t_1} = x_1, W_{t_2} = x_2, \dots, W_{t_n} = x_n$ is to say that $W_{t_1} = x_1, W_{t_2} - W_{t_1} = x_2 - x_1, \dots, W_{t_n} - W_{t_{n-1}} = x_n - x_{n-1}$. Now, the random variables $W_{t_1}, W_{t_2} - W_{t_1}, \dots, W_{t_n} - W_{t_{n-1}}$ are independent so their joint density is a product of individual densities. This suggests that the joint probability density of $W_{t_1}, \dots, W_{t_n}$ is
\[
p(x_1, \dots, x_n; t_1, \dots, t_n) = p(x_1, t_1) \, p(x_2 - x_1, t_2 - t_1) \cdots p(x_n - x_{n-1}, t_n - t_{n-1}) .
\]
Let $\Omega_t = \dot{\mathbb{R}}$, the one-point compactification of $\mathbb{R}$, and let $\Omega = \prod_{t \in \mathbb{R}_+} \Omega_t$ be the (compact) product space. Suppose that $f \in C(\Omega)$ depends only on a finite number of coordinates in $\Omega$, $f(\omega) = f(x_{t_1}, \dots, x_{t_n})$, say. Then we define
\[
\rho(f) = \int_{\mathbb{R}^n} p(x_1, \dots, x_n; t_1, \dots, t_n) \, f(x_1, \dots, x_n) \, dx_1 \dots dx_n .
\]
It can be shown that $\rho$ can be extended into a (normalized) positive linear functional on the algebra $C(\Omega)$ and so one can apply the Riesz-Markov Theorem to deduce the existence of a probability measure $\mu$ on $\Omega$ such that
\[
\rho(f) = \int_\Omega f(\omega) \, d\mu .
\]
Then $W_t(\omega) = \omega_t$, the $t$-th component of $\omega \in \Omega$. (This elegant construction is due to Edward Nelson.)

We collect together some basic facts in the following theorem.

Theorem 5.3. The Wiener process enjoys the following properties.
(i) $E( W_s W_t ) = \min\{ s, t \} = s \wedge t$.
(ii) $E( (W_t - W_s)^2 ) = |t - s|$.
(iii) $E( W_t^4 ) = 3t^2$.
(iv) $(W_t)$ is an $L^2$-martingale.
(v) If $M_t = W_t^2 - t$, then $(M_t)$ is a martingale, null at 0.

Proof. (i) Suppose $s \le t$. Using independence, we calculate
\[
\begin{aligned}
E( W_s W_t ) &= E( W_s (W_t - W_s) ) + E( W_s^2 ) \\
&= E( W_s ) \, E( W_t - W_s ) + \operatorname{var} W_s, \quad \text{since $E(W_s) = 0$,} \\
&= s,
\end{aligned}
\]
because the product $E(W_s) \, E(W_t - W_s)$ is zero.
(ii) Again, suppose $s \le t$. Then
\[
E( (W_t - W_s)^2 ) = \operatorname{var}(W_t - W_s), \ \text{since $E(W_t - W_s) = 0$,} \ = t - s, \ \text{by definition.}
\]
(iii) We have
\[
\int_{-\infty}^{\infty} e^{-\alpha x^2/2} \, dx = \alpha^{-1/2} \sqrt{2\pi} .
\]
Differentiating both sides twice with respect to $\alpha$ gives
\[
\int_{-\infty}^{\infty} x^4 e^{-\alpha x^2/2} \, dx = 3 \alpha^{-5/2} \sqrt{2\pi} .
\]
Replacing $\alpha$ by $1/t$ and rearranging, we get
\[
E( W_t^4 ) = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} x^4 e^{-x^2/2t} \, dx = 3t^2,
\]
as claimed.
(iv) $W_t = W_t - W_0$ has a normal distribution, so $W_t \in L^2$. For $0 \le s < t$, we have (using independence)
\[
E( W_t \mid \mathcal{F}_s ) = E( W_t - W_s \mid \mathcal{F}_s ) + E( W_s \mid \mathcal{F}_s ) = E( W_t - W_s ) + W_s = W_s \quad \text{a.s.}
\]

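(v) Let $0 \le s < t$. Then, with probability one,
\[
\begin{aligned}
E( W_t^2 \mid \mathcal{F}_s ) &= E( (W_t - W_s)^2 \mid \mathcal{F}_s ) + 2 E( W_t W_s \mid \mathcal{F}_s ) - E( W_s^2 \mid \mathcal{F}_s ) \\
&= E( (W_t - W_s)^2 ) + 2 W_s E( W_t \mid \mathcal{F}_s ) - W_s^2, \quad \text{by independence,} \\
&= t - s + 2 W_s^2 - W_s^2, \quad \text{by (ii) and (iv),} \\
&= t - s + W_s^2
\end{aligned}
\]
so that $E( (W_t^2 - t) \mid \mathcal{F}_s ) = (W_s^2 - s)$ almost surely.

Example 5.4. For $k \in \mathbb{N}$,
\[
E( W_t^{2k} ) = \frac{(2k)! \, t^k}{2^k \, k!} .
\]
To see this, let $I_k = E( W_t^{2k} )$ and for $n \in \mathbb{N}$, let $P(n)$ be the statement that
\[
E( W_t^{2n} ) = \frac{(2n)! \, t^n}{2^n \, n!} .
\]
Since $I_1 = t$, we see that $P(1)$ is true. Integration by parts gives
\[
E( W_t^{2k} ) = \frac{E( W_t^{2k+2} )}{t (2k+1)}
\]
and so the truth of $P(k)$ implies that
\[
E( W_t^{2k+2} ) = t (2k+1) \, E( W_t^{2k} ) = t (2k+1) \, \frac{(2k)! \, t^k}{2^k \, k!} = \frac{(2(k+1))! \, t^{k+1}}{2^{k+1} \, (k+1)!}
\]
which says that $P(k+1)$ is true. By induction, $P(n)$ is true for all $n \in \mathbb{N}$.

Example 5.5. For $a \in \mathbb{R}$, $( e^{a W_t - \frac{1}{2} a^2 t} )$ (and so $( e^{W_t - \frac{1}{2} t} )$, in particular) is a martingale. Indeed, for $s \le t$,
\[
\begin{aligned}
E( e^{a W_t - \frac{1}{2} a^2 t} \mid \mathcal{F}_s ) &= E( e^{a (W_t - W_s) + a W_s - \frac{1}{2} a^2 t} \mid \mathcal{F}_s ) \\
&= e^{-\frac{1}{2} a^2 t} \, E( e^{a (W_t - W_s)} e^{a W_s} \mid \mathcal{F}_s ) \\
&= e^{-\frac{1}{2} a^2 t} \, e^{a W_s} \, E( e^{a (W_t - W_s)} \mid \mathcal{F}_s ) \\
&= e^{-\frac{1}{2} a^2 t} \, e^{a W_s} \, E( e^{a (W_t - W_s)} ) \\
&= e^{-\frac{1}{2} a^2 t} \, e^{a W_s} \, e^{\frac{1}{2} a^2 (t - s)} \\
&= e^{a W_s - \frac{1}{2} a^2 s}
\end{aligned}
\]
since we know that $W_t - W_s$ has a normal distribution with mean zero and variance $t - s$.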
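The moment formula of Example 5.4 can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, and the Monte Carlo estimate assumes that Python's `random.gauss` is an adequate source of normal samples. It compares the closed form with the recursion $I_{k+1} = t(2k+1) I_k$ used in the induction step, and with a sample average.

```python
import math
import random


def even_moment_formula(k: int, t: float) -> float:
    """Closed form E(W_t^{2k}) = (2k)! t^k / (2^k k!) from Example 5.4."""
    return math.factorial(2 * k) * t**k / (2**k * math.factorial(k))


def even_moment_recursion(k: int, t: float) -> float:
    """Same moment built from I_1 = t and I_{j+1} = t (2j + 1) I_j."""
    moment = t
    for j in range(1, k):
        moment *= t * (2 * j + 1)
    return moment


def even_moment_monte_carlo(k: int, t: float, samples: int = 200_000, seed: int = 0) -> float:
    """Sample average of W_t^{2k}, drawing W_t from N(0, t)."""
    rng = random.Random(seed)
    sd = math.sqrt(t)
    return sum(rng.gauss(0.0, sd) ** (2 * k) for _ in range(samples)) / samples


if __name__ == "__main__":
    t = 1.0
    for k in (1, 2, 3):
        print(k, even_moment_formula(k, t), even_moment_recursion(k, t))
    print("Monte Carlo, k = 2:", even_moment_monte_carlo(2, t))
```

For $k = 2$ the closed form reduces to $3t^2$, in agreement with Theorem 5.3 (iii).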

Example 5.6. For any $c > 0$, $Y_t = \frac{1}{c} W_{c^2 t}$ is a Wiener process with respect to the filtration generated by the $Y_t$s. We can see this as follows. Clearly, $Y_0 = 0$ almost surely and the map $t \mapsto Y_t(\omega) = W_{c^2 t}(\omega)/c$ is almost surely continuous because $t \mapsto W_t(\omega)$ is. Also, for any $0 \le s < t$, the distribution of the increment $Y_t - Y_s$ is that of $(W_{c^2 t} - W_{c^2 s})/c$, namely, normal with mean zero and variance $(c^2 t - c^2 s)/c^2 = t - s$. Let $\mathcal{G}_t = \mathcal{F}_{c^2 t}$ which is equal to the $\sigma$-algebra generated by the random variables $\{ Y_s : s \le t \}$. Then $c (Y_t - Y_s) = (W_{c^2 t} - W_{c^2 s})$ is independent of $\mathcal{F}_{c^2 s} = \mathcal{G}_s$, and so therefore is $Y_t - Y_s$. Hence $(Y_t)$ is a Wiener process with respect to $(\mathcal{G}_t)$.

Example 5.7. Let $D_n = \{ 0 = t_0^{(n)} < t_1^{(n)} < \dots < t_{m_n}^{(n)} = T \}$ be a sequence of partitions of the interval $[0, T]$ such that $\operatorname{mesh}(D_n) \to 0$, where $\operatorname{mesh}(D) = \max\{ t_{j+1} - t_j : t_j \in D \}$. For each $n$ and $0 \le j \le m_n - 1$, let $\Delta_n W_j$ denote the increment $\Delta_n W_j = W_{t_{j+1}^{(n)}} - W_{t_j^{(n)}}$ and set $\Delta_n t_j = t_{j+1}^{(n)} - t_j^{(n)}$. Then
\[
\sum_{j=0}^{m_n - 1} ( \Delta_n W_j )^2 \to T
\]
in $L^2$ as $n \to \infty$. To see this, we calculate
\[
\begin{aligned}
\Bigl\| \sum_{j=0}^{m_n - 1} ( \Delta_n W_j )^2 - T \Bigr\|_2^2 &= E\Bigl( \Bigl( \sum_{j=0}^{m_n - 1} \{ ( \Delta_n W_j )^2 - \Delta_n t_j \} \Bigr)^2 \Bigr) \\
&= \sum_{i,j} E\bigl( ( ( \Delta_n W_i )^2 - \Delta_n t_i )( ( \Delta_n W_j )^2 - \Delta_n t_j ) \bigr) \\
&= \sum_j E\bigl( ( ( \Delta_n W_j )^2 - \Delta_n t_j )^2 \bigr), \quad \text{(off-diagonal terms vanish by independence),} \\
&= \sum_j E\bigl( ( \Delta_n W_j )^4 - 2 ( \Delta_n W_j )^2 \Delta_n t_j + ( \Delta_n t_j )^2 \bigr) \\
&= \sum_j 2 ( \Delta_n t_j )^2 \\
&\le 2 \operatorname{mesh}(D_n) \sum_j \Delta_n t_j \\
&= 2 \operatorname{mesh}(D_n) \, T \to 0
\end{aligned}
\]
as required. This is the quadratic variation of the Wiener process.
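The convergence in Example 5.7 is easy to observe in simulation. The sketch below is an illustration only: it assumes a uniform partition of $[0, T]$ into $n$ pieces (so that the increments are independent $N(0, T/n)$ variables), and the function name is hypothetical. By the computation above, the squared $L^2$ distance from $T$ is then exactly $2T^2/n$.

```python
import math
import random


def quadratic_variation(T: float, n: int, rng: random.Random) -> float:
    """Sum of squared Wiener increments over a uniform partition of [0, T]."""
    sd = math.sqrt(T / n)  # each increment is N(0, T/n)
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(1)
    T = 2.0
    for n in (10, 100, 1_000, 10_000):
        # The sums should settle near T as the mesh T/n shrinks.
        print(n, quadratic_variation(T, n, rng))
```

Each call draws a fresh path, so the printed values fluctuate, with fluctuations of order $T\sqrt{2/n}$.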

Remark 5.8. Let $(X_t)$ be a Wiener process. For $t > 0$, let $Y_t = t X_{1/t}$ and set $Y_0 = 0$. Let $(\mathcal{G}_t)$ be the filtration where $\mathcal{G}_t$ is the $\sigma$-algebra generated by the family $\{ Y_s : s \le t \}$. Now, suppose that $0 < s < t$. Then $0 < 1/t < 1/s$ and so $X_{1/t}$ and $(X_{1/s} - X_{1/t})$ are independent normal random variables with zero means and variances given by $1/t$ and $1/s - 1/t$, respectively. Then for any $0 < s < t$
\[
Y_t - Y_s = t X_{1/t} - s X_{1/s} = (t - s) X_{1/t} - s ( X_{1/s} - X_{1/t} ) .
\]
It follows that $Y_t - Y_s$ is a normal random variable with mean zero and variance $(t - s)^2 / t + s^2 (1/s - 1/t) = (t - s)$. When $s = 0$, we see that $Y_t - Y_0 = Y_t = t X_{1/t}$ which is a normal random variable with mean zero and variance $t^2 / t = t$.

Again, suppose that $0 < s < t$. Then for any $r < s$
\[
\begin{aligned}
E( (Y_t - Y_s) Y_r ) &= E( ( t X_{1/t} - s X_{1/s} ) \, r X_{1/r} ) \\
&= rt \, E( X_{1/t} X_{1/r} ) - rs \, E( X_{1/s} X_{1/r} ) \\
&= rt \, E( X_{1/t} ( X_{1/r} - X_{1/t} ) ) + rt \, E( X_{1/t} X_{1/t} ) \\
&\qquad - rs \, E( X_{1/s} ( X_{1/r} - X_{1/s} ) ) - rs \, E( X_{1/s} X_{1/s} ) \\
&= rt \, E( X_{1/t} ) \, E( X_{1/r} - X_{1/t} ) + rt \operatorname{var}( X_{1/t} ) \\
&\qquad - rs \, E( X_{1/s} ) \, E( X_{1/r} - X_{1/s} ) - rs \operatorname{var}( X_{1/s} ) \\
&= 0 + rt (1/t) - 0 - rs (1/s) \\
&= 0 .
\end{aligned}
\]
This shows that $(Y_t - Y_s)$ is orthogonal in $L^2$ to each $Y_r$. But orthogonality for jointly normal random variables implies independence and so it follows that the increment $Y_t - Y_s$ is independent of $\mathcal{G}_s$.

Finally, it can be shown that the map $t \mapsto Y_t$ on $\mathbb{R}_+$ is continuous almost surely. In fact, it is evidently continuous almost surely on $(0, \infty)$ because this is true of $t \mapsto X_t$. The continuity at $t = 0$ requires additional argument.

Notice that it is easy to show that $t \mapsto Y_t$ is $L^2$-continuous. For $s > 0$ and $t > 0$, we have
\[
\begin{aligned}
\| Y_t - Y_s \|_2 &= \| t X_{1/t} - s X_{1/s} \|_2 \\
&\le \| t X_{1/t} - t X_{1/s} \|_2 + \| t X_{1/s} - s X_{1/s} \|_2 \\
&= t \, | 1/s - 1/t |^{1/2} + | t - s | \, (1/s)^{1/2}
\end{aligned}
\]
and so we find that $\| Y_t - Y_s \|_2 \to 0$ as $s \to t$. At $t = 0$, we have
\[
\| Y_t - Y_0 \|_2^2 = \| Y_t \|_2^2 = t^2 \| X_{1/t} \|_2^2 = t^2 \operatorname{var} X_{1/t} = t \to 0
\]
as $t \downarrow 0$.

This example is useful in that it relates large time behaviour of the Wiener process to small time behaviour. The behaviour of $X_t$ for large $t$ is related to that of $Y_s$ for small $s$ and both $X_t$ and $Y_s$ are Wiener processes.

Example 5.9. Let $(W_t)$ be a Wiener process and let $X_t = \mu t + \sigma W_t$ for $t \ge 0$ (where $\mu$ and $\sigma$ are constants). Then $(X_t)$ is a martingale if $\mu = 0$ but is a submartingale if $\mu \ge 0$. We see that for $0 \le s < t$,
\[
E( X_t \mid \mathcal{F}_s ) = E( \mu t + \sigma W_t \mid \mathcal{F}_s ) = \mu t + \sigma W_s \ge \mu s + \sigma W_s = X_s,
\]
with strict inequality when $\mu > 0$.

Non-differentiability of Wiener process paths

Whilst the sample paths $W_t(\omega)$ are almost surely continuous in $t$, they are nevertheless extremely jagged, as the next result indicates.

Theorem 5.10. With probability one, the sample path $t \mapsto W_t(\omega)$ is nowhere differentiable.

Proof. First we just consider $0 \le t \le 1$. Fix $\beta > 0$. Now, if a given function $f$ is differentiable at some point $s$ with $f'(s) \le \beta$, then certainly we can say that
\[
| f(t) - f(s) | \le 2\beta \, | t - s |
\]
whenever $| t - s |$ is sufficiently small. (If $t$ is sufficiently close to $s$, then $(f(t) - f(s))/(t - s)$ is within $\beta$ of $f'(s)$ and so is smaller than $f'(s) + \beta \le 2\beta$.) So let
\[
A_n = \{ \omega : \exists s \in [0, 1) \text{ such that } | W_t(\omega) - W_s(\omega) | \le 2\beta \, | t - s | \text{ whenever } | t - s | \le \tfrac{2}{n} \} .
\]
Evidently $A_n \subset A_{n+1}$ and $\bigcup_n A_n$ includes all $\omega \in \Omega$ for which the function $t \mapsto W_t(\omega)$ has a derivative at some point in $[0, 1)$ with value $\le \beta$.

Once again, suppose $f$ is some function such that for given $s$ there is some $\delta > 0$ such that
\[
| f(t) - f(s) | \le 2\beta \, | t - s | \qquad (*)
\]
if $| t - s | \le 2\delta$. Let $n \ge 1/\delta$ and let $k$ be the largest integer such that $k/n \le s$. Then
\[
\max\bigl\{ | f(\tfrac{k+2}{n}) - f(\tfrac{k+1}{n}) | , \ | f(\tfrac{k+1}{n}) - f(\tfrac{k}{n}) | , \ | f(\tfrac{k}{n}) - f(\tfrac{k-1}{n}) | \bigr\} \le \frac{6\beta}{n} \qquad (**)
\]
To verify this, note first that $\frac{k-1}{n} < \frac{k}{n} \le s < \frac{k+1}{n} < \frac{k+2}{n}$ and so we may estimate each of the three terms involved in $(**)$ with the help of $(*)$. We find
\[
\begin{aligned}
| f(\tfrac{k+2}{n}) - f(\tfrac{k+1}{n}) | &\le | f(\tfrac{k+2}{n}) - f(s) | + | f(s) - f(\tfrac{k+1}{n}) | \\
&\le 2\beta \, | \tfrac{k+2}{n} - s | + 2\beta \, | s - \tfrac{k+1}{n} | \\
&\le \frac{6\beta}{n}
\end{aligned}
\]

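and
\[
\begin{aligned}
| f(\tfrac{k+1}{n}) - f(\tfrac{k}{n}) | &\le | f(\tfrac{k+1}{n}) - f(s) | + | f(s) - f(\tfrac{k}{n}) | \\
&\le 2\beta \, | \tfrac{k+1}{n} - s | + 2\beta \, | s - \tfrac{k}{n} | \\
&\le \frac{4\beta}{n}
\end{aligned}
\]
and
\[
\begin{aligned}
| f(\tfrac{k}{n}) - f(\tfrac{k-1}{n}) | &\le | f(\tfrac{k}{n}) - f(s) | + | f(s) - f(\tfrac{k-1}{n}) | \\
&\le 2\beta \, | \tfrac{k}{n} - s | + 2\beta \, | s - \tfrac{k-1}{n} | \\
&\le \frac{6\beta}{n}
\end{aligned}
\]
which establishes the inequality $(**)$.

For given $\omega$, let
\[
g_k(\omega) = \max\bigl\{ | W_{\frac{k+2}{n}}(\omega) - W_{\frac{k+1}{n}}(\omega) | , \ | W_{\frac{k+1}{n}}(\omega) - W_{\frac{k}{n}}(\omega) | , \ | W_{\frac{k}{n}}(\omega) - W_{\frac{k-1}{n}}(\omega) | \bigr\}
\]
and let
\[
B_n = \{ \omega : g_k(\omega) \le \tfrac{6\beta}{n} \text{ for some } k \le n - 2 \} .
\]
Now, if $\omega \in A_n$, then there is some $s \in [0, 1)$ such that $| W_t(\omega) - W_s(\omega) | \le 2\beta \, | t - s |$ if $| t - s | \le 2/n$. However, according to our discussion above, this means that $g_k(\omega) \le 6\beta/n$ where $k$ is the largest integer with $k/n \le s$. Hence $\omega \in B_n$ and so $A_n \subset B_n$. Now,
\[
B_n = \bigcup_{k=1}^{n-2} \{ \omega : g_k(\omega) \le \tfrac{6\beta}{n} \}
\]
and so $P(B_n) \le \sum_{k=1}^{n-2} P( g_k \le \frac{6\beta}{n} )$. We estimate
\[
\begin{aligned}
P( g_k \le \tfrac{6\beta}{n} ) &= P\bigl( \max\{ | W_{\frac{k+2}{n}} - W_{\frac{k+1}{n}} | , \ | W_{\frac{k+1}{n}} - W_{\frac{k}{n}} | , \ | W_{\frac{k}{n}} - W_{\frac{k-1}{n}} | \} \le \tfrac{6\beta}{n} \bigr) \\
&= P\bigl( \{ | W_{\frac{k+2}{n}} - W_{\frac{k+1}{n}} | \le \tfrac{6\beta}{n} \} \cap \{ | W_{\frac{k+1}{n}} - W_{\frac{k}{n}} | \le \tfrac{6\beta}{n} \} \cap \{ | W_{\frac{k}{n}} - W_{\frac{k-1}{n}} | \le \tfrac{6\beta}{n} \} \bigr) \\
&= P\bigl( | W_{\frac{k+2}{n}} - W_{\frac{k+1}{n}} | \le \tfrac{6\beta}{n} \bigr) \, P\bigl( | W_{\frac{k+1}{n}} - W_{\frac{k}{n}} | \le \tfrac{6\beta}{n} \bigr) \, P\bigl( | W_{\frac{k}{n}} - W_{\frac{k-1}{n}} | \le \tfrac{6\beta}{n} \bigr), \\
&\qquad \text{by independence of increments,} \\
&= \Bigl( \sqrt{\frac{n}{2\pi}} \int_{-6\beta/n}^{6\beta/n} e^{-n x^2/2} \, dx \Bigr)^3 \\
&\le \Bigl( \sqrt{\frac{n}{2\pi}} \, \frac{12\beta}{n} \Bigr)^3 \\
&= C \, n^{-3/2}
\end{aligned}
\]
where $C$ is independent of $n$. Hence
\[
P(B_n) \le n \, C \, n^{-3/2} = C \, n^{-1/2} \to 0
\]
as $n \to \infty$. Now for fixed $j$, $A_j \subset A_n$ for all $n \ge j$, so
\[
P(A_j) \le P(A_n) \le P(B_n) \to 0
\]
as $n \to \infty$, and this forces $P(A_j) = 0$. But then $P( \bigcup_j A_j ) = 0$.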

What have we achieved so far? We have shown that $P( \bigcup_n A_n ) = 0$ and so, with probability one, $W_t$ has no derivative at any $s \in [0, 1)$ with value smaller than $\beta$. Now consider any unit interval $[j, j+1)$ with $j \in \mathbb{Z}_+$. Repeating the previous argument but with $W_t$ replaced by $X_t = W_{t+j}$, we deduce that almost surely $X_t$ has no derivative at any $s \in [0, 1)$ whose value is smaller than $\beta$. But to say that $X_t$ has a derivative at $s$ is to say that $W_t$ has a derivative at $s + j$ and so it follows that with probability one, $W_t$ has no derivative in $[j, j+1)$ whose value is smaller than $\beta$. We have shown that if
\[
C_j = \{ \omega : W_t(\omega) \text{ has a derivative at some } s \in [j, j+1) \text{ with value} \le \beta \}
\]
then $P(C_j) = 0$ and so $P( \bigcup_{j \in \mathbb{Z}_+} C_j ) = 0$, which means that, with probability one, $W_t$ has no derivative at any point in $\mathbb{R}_+$ whose value is $\le \beta$. This holds for any given $\beta > 0$. To complete the proof, for $m \in \mathbb{N}$, let
\[
S_m = \{ \omega : W_t(\omega) \text{ has a derivative at some } s \in \mathbb{R}_+ \text{ with value} \le m \} .
\]
Then we have seen above that $P(S_m) = 0$ and so $P( \bigcup_m S_m ) = 0$. It follows that, with probability one, $W_t$ has no derivative anywhere.
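The key estimate in the proof, $P(B_n) \le n \, P( | W_{1/n} | \le 6\beta/n )^3 \le C n^{-1/2}$, can be evaluated numerically. The sketch below is illustrative only; the function names are hypothetical. It uses the fact that an increment over an interval of length $1/n$ has the same law as $Z/\sqrt{n}$ with $Z$ standard normal, together with $P( |Z| \le a ) = \operatorname{erf}( a/\sqrt{2} )$.

```python
import math


def prob_small_increment(beta: float, n: int) -> float:
    """P(|W_{(k+1)/n} - W_{k/n}| <= 6*beta/n) for an N(0, 1/n) increment."""
    # |Z| / sqrt(n) <= 6*beta/n  is the same event as  |Z| <= 6*beta/sqrt(n).
    return math.erf(6.0 * beta / math.sqrt(2.0 * n))


def union_bound(beta: float, n: int) -> float:
    """Upper bound n * P(...)^3 for P(B_n) (three independent increments)."""
    return n * prob_small_increment(beta, n) ** 3


def crude_bound(beta: float, n: int) -> float:
    """The bound C * n^(-1/2) from the proof, with C = (12*beta/sqrt(2*pi))^3."""
    return (12.0 * beta / math.sqrt(2.0 * math.pi)) ** 3 / math.sqrt(n)


if __name__ == "__main__":
    beta = 1.0
    for n in (10**2, 10**4, 10**6, 10**8):
        print(n, union_bound(beta, n), crude_bound(beta, n))
```

Both columns decrease like $n^{-1/2}$; it is the cube, coming from three independent increments, that beats the factor $n$ from the union over $k$.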


Itô integration

We wish to indicate how one can construct stochastic integrals with respect to the Wiener process. The resulting integral is called the Itô integral. We follow the strategy discussed earlier, namely, we first set up the integral for integrands which are step-functions. Next, we establish an appropriate isometry property and it then follows that the definition can be extended abstractly by continuity considerations.

We shall consider integration over the time interval $[0, T]$, where $T > 0$ is fixed throughout. For an elementary process, $h \in \mathcal{E}$, we define the stochastic integral $\int_0^T h \, dW$ by
\[
\int_0^T h \, dW \equiv I(h)(\omega) = \sum_{i=1}^{n} g_i(\omega) \, ( W_{t_{i+1}}(\omega) - W_{t_i}(\omega) ),
\]
where $h = \sum_{i=1}^{n} g_i 1_{(t_i, t_{i+1}]}$ with $0 = t_1 < \dots < t_{n+1} = T$ and where $g_i$ is bounded and $\mathcal{F}_{t_i}$-measurable. Now, we know that $W_t^2$ has Doob-Meyer decomposition $W_t^2 = M_t + t$, where $(M_t)$ is an $L^1$-martingale. Using this, we can calculate $E( I(h)^2 )$ as in the abstract set-up and we find that
\[
E( I(h)^2 ) = E\Bigl( \int_0^T h^2(t, \omega) \, dt \Bigr)
\]
which is the isometry property for the Itô integral.

For any $0 \le t \le T$, we define the stochastic integral $\int_0^t h \, dW$ by
\[
\int_0^t h \, dW = I( h \, 1_{(0,t]} ) = \sum_{i=1}^{n} g_i(\omega) \, ( W_{t_{i+1} \wedge t}(\omega) - W_{t_i \wedge t}(\omega) )
\]

and it is convenient to denote this by $I_t(h)$. We see (as in the abstract theory discussed earlier) that for any $0 \le s < t \le T$
\[
E( I_t(h) \mid \mathcal{F}_s ) = I_s(h),
\]
that is, $( I_t(h) )_{0 \le t \le T}$ is an $L^2$-martingale.

Next, we wish to extend the family of allowed integrands. The right hand side of the isometry property suggests the way forward. Let $K_T$ denote the linear subspace of $L^2( (0, T] \times \Omega )$ of adapted processes $f(t, \omega)$ such that there is some sequence $(h_n)$ of elementary processes such that
\[
E\Bigl( \int_0^T ( f(s, \omega) - h_n(s, \omega) )^2 \, ds \Bigr) \to 0 . \qquad (*)
\]

We construct the stochastic integral $I(f)$ (and $I_t(f)$) for any $f \in K_T$ via the isometry property. Indeed, we have
\[
E( ( I(h_n) - I(h_m) )^2 ) = E( I(h_n - h_m)^2 ) = E\Bigl( \int_0^T ( h_n - h_m )^2 \, ds \Bigr) .
\]

But by $(*)$, $(h_n)$ is a Cauchy sequence in $K_T$ (with respect to the norm $\| h \|_{K_T} = E( \int_0^T h^2 \, ds )^{1/2}$) and so $( I(h_n) )$ is a Cauchy sequence in $L^2$. It follows that there is some $F \in L^2( \mathcal{F}_T )$ such that
\[
E( ( F - I(h_n) )^2 ) \to 0 .
\]
We denote $F$ by $I(f)$ or by $\int_0^T f \, dW_s$. One checks that this construction does not depend on the particular choice of the sequence $(h_n)$ converging to $f$ in $K_T$. The Itô stochastic integral obeys the isometry property
\[
E\Bigl( \Bigl( \int_0^T f \, dW_s \Bigr)^2 \Bigr) = \int_0^T E( f^2 ) \, ds
\]
for any $f \in K_T$. For any $f, g \in K_T$, we can apply the isometry property to $f \pm g$ to get
\[
E( ( I(f) \pm I(g) )^2 ) = E( ( I(f \pm g) )^2 ) = E\Bigl( \int_0^T ( f \pm g )^2 \, ds \Bigr) .
\]
On subtraction and division by 4, we find that
\[
E( I(f) I(g) ) = E\Bigl( \int_0^T f(s) g(s) \, ds \Bigr) .
\]

Replacing $f$ by $f \, 1_{(0,t]}$ in the discussion above and using $h_n 1_{(0,t]} \in \mathcal{E}$ rather than $h_n$, we construct $I_t(f)$ for any $0 \le t \le T$. Taking the limit in $L^2$, the martingale property $E( I_t(h_n) \mid \mathcal{F}_s ) = I_s(h_n)$ gives the martingale property of $I_t(f)$, namely,
\[
E( I_t(f) \mid \mathcal{F}_s ) = I_s(f), \qquad 0 \le s \le t \le T .
\]
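The isometry property can be illustrated by simulation. The following sketch makes several assumptions that are not in the text: the integrand is the step process taking the value $W_{t_j}$ on $(t_j, t_{j+1}]$ for a uniform grid (adapted, but not bounded, so it is an illustration rather than a literal instance of an elementary process), the expectation is replaced by a sample average, and the function names are hypothetical. For this integrand, $E \int_0^T h^2 \, dt = \sum_j t_j \, \Delta t = \frac{T^2}{2} ( 1 - \frac{1}{n} )$.

```python
import math
import random


def ito_sum_of_w(T: float, n: int, rng: random.Random) -> float:
    """Left-endpoint sum  sum_j W_{t_j} (W_{t_{j+1}} - W_{t_j})  on a uniform grid."""
    sd = math.sqrt(T / n)
    w = 0.0       # current value W_{t_j}
    total = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        total += w * dw   # integrand evaluated before the increment is added
        w += dw
    return total


def second_moment(T: float, n: int, paths: int, seed: int = 0) -> float:
    """Sample average of I(h)^2 over independent simulated paths."""
    rng = random.Random(seed)
    return sum(ito_sum_of_w(T, n, rng) ** 2 for _ in range(paths)) / paths


if __name__ == "__main__":
    T, n = 1.0, 50
    print("Monte Carlo E(I(h)^2):", second_moment(T, n, paths=5_000))
    print("E int h^2 dt         :", 0.5 * T**2 * (1.0 - 1.0 / n))
```

The integrand is evaluated at the left endpoint of each subinterval, which is what makes each summand a product of an $\mathcal{F}_{t_j}$-measurable factor and an independent increment.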

For the following important result, we make a further assumption about the filtration $(\mathcal{F}_t)$.

Assumption: each $\mathcal{F}_t$ contains all events of probability zero.

Note that by taking complements, it follows that each $\mathcal{F}_t$ contains all events with probability one.

Theorem 5.11. Let $f \in K_T$. Then there is a continuous modification of $I_t(f)$, $0 \le t \le T$, that is, there is an adapted process $J_t(\omega)$, $0 \le t \le T$, for which $t \mapsto J_t(\omega)$ is continuous for all $\omega \in \Omega$ and such that for each $t$, $I_t(f) = J_t$, almost surely.

Proof. Let $(h_n)$ in $\mathcal{E}$ be a sequence of approximations to $f$ in $K_T$, that is
\[
E\Bigl( \int_0^T ( h_n - f )^2 \, ds \Bigr) \to 0
\]
so that $I(h_n) \to I(f)$ in $L^2$ (and also $I_t(h_n) \to I_t(f)$ in $L^2$). It follows that $( I(h_n) )$ is a Cauchy sequence in $L^2$,
\[
\| I(h_n) - I(h_m) \|_2^2 = \int_0^T E( h_n - h_m )^2 \, ds \to 0
\]
as $n, m \to \infty$. Hence, for any $k \in \mathbb{N}$, there is $N_k$ such that if $n, m \ge N_k$ then
\[
\| I(h_n) - I(h_m) \|_2^2 < \frac{1}{2^k} \, \frac{1}{4^k} .
\]
If we set $n_k = N_1 + \dots + N_k$, then clearly $n_{k+1} > n_k \ge N_k$ and
\[
\| I(h_{n_{k+1}}) - I(h_{n_k}) \|_2^2 < \frac{1}{2^k} \, \frac{1}{4^k}
\]
for $k \in \mathbb{N}$. Now, the map $t \mapsto I_t(h_n)$ is almost surely continuous (because this is true of the map $t \mapsto W_t$) and $I_t(h_n) - I_t(h_m) = I_t(h_n - h_m)$ is an almost surely continuous $L^2$-martingale. So by Doob's Martingale Inequality, we have
\[
P\bigl( \sup_{0 \le t \le T} | I_t(h_n - h_m) | > \varepsilon \bigr) \le \frac{1}{\varepsilon^2} \, \| I(h_n - h_m) \|_2^2
\]
for any $\varepsilon > 0$. In particular, if we set $\varepsilon = 1/2^k$ and denote by $A_k$ the event
\[
A_k = \bigl\{ \sup_{0 \le t \le T} | I_t( h_{n_{k+1}} - h_{n_k} ) | > \tfrac{1}{2^k} \bigr\}
\]
then we get
\[
P(A_k) \le (2^k)^2 \, \| I( h_{n_{k+1}} - h_{n_k} ) \|_2^2 < (2^k)^2 \, \frac{1}{2^k} \, \frac{1}{4^k} = \frac{1}{2^k} .
\]
But then $\sum_k P(A_k) < \infty$ and so by the Borel-Cantelli Lemma (Lemma 1.3), it follows that
\[
P( A_k \text{ infinitely-often} ) = 0 .
\]
Write $B$ for the event $\{ A_k \text{ infinitely-often} \}$. For $\omega \in B^c$, we must have
\[
\sup_{0 \le t \le T} | I_t(h_{n_{k+1}})(\omega) - I_t(h_{n_k})(\omega) | \le \frac{1}{2^k}
\]
for all $k > k_0$, where $k_0$ may depend on $\omega$. Hence, for $j > k > k_0$,
\[
\begin{aligned}
\sup_{0 \le t \le T} | I_t(h_{n_j})(\omega) - I_t(h_{n_k})(\omega) | &\le \sup_{0 \le t \le T} | I_t(h_{n_j})(\omega) - I_t(h_{n_{j-1}})(\omega) | \\
&\qquad + \sup_{0 \le t \le T} | I_t(h_{n_{j-1}})(\omega) - I_t(h_{n_{j-2}})(\omega) | \\
&\qquad + \dots + \sup_{0 \le t \le T} | I_t(h_{n_{k+1}})(\omega) - I_t(h_{n_k})(\omega) | \\
&\le \frac{1}{2^{j-1}} + \frac{1}{2^{j-2}} + \dots + \frac{1}{2^k} \\
&< \frac{2}{2^k} .
\end{aligned}
\]
This means that for each $\omega \in B^c$, the sequence of functions $( I_t(h_{n_k})(\omega) )$ of $t$ is a Cauchy sequence with respect to the norm $\| \varphi \| = \sup_{0 \le t \le T} | \varphi(t) |$. In other words, it is uniformly Cauchy and so must converge uniformly on $[0, T]$ to some function of $t$, say $J_t(\omega)$.

Now, for each $k$, there is a set $E_k$ with $P(E_k) = 1$ such that if $\omega \in E_k$ then $t \mapsto I_t(h_{n_k})(\omega)$ is continuous on $[0, T]$. Set $E = \bigcap_k E_k$, so $P(E) = 1$. Then $P( B^c \cap E ) = 1$ and if $\omega \in B^c \cap E$ then $t \mapsto J_t(\omega)$ is continuous on $[0, T]$. We set $J_t(\omega) = 0$ for all $t$ if $\omega \notin B^c \cap E$ which means that $t \mapsto J_t(\omega)$ is continuous for all $\omega \in \Omega$.

However, $I_t(h_{n_k}) \to I_t(f)$ in $L^2$ and so there is some subsequence $( I_t(h_{n_{k_j}}) )$ such that $I_t(h_{n_{k_j}})(\omega) \to I_t(f)(\omega)$ almost surely, say, on $S_t$ with $P(S_t) = 1$. But $I_t(h_{n_{k_j}})(\omega) \to J_t(\omega)$ on $B^c \cap E$ and so $I_t(h_{n_{k_j}})(\omega) \to J_t(\omega)$ on $B^c \cap E \cap S_t$ and therefore $J_t(\omega) = I_t(f)(\omega)$ for $\omega \in B^c \cap E \cap S_t$. Since $P( B^c \cap E \cap S_t ) = 1$, we may say that $J_t = I_t(f)$ almost surely.

We still have to show that the process $(J_t)$ is adapted. This is where we use the hypothesis that $\mathcal{F}_t$ contains all events of zero probability. Indeed, by construction, we know that
\[
J_t = \lim_{k \to \infty} I_t(h_{n_k}) \, 1_{B^c \cap E},
\]
where each $I_t(h_{n_k}) \, 1_{B^c \cap E}$ is $\mathcal{F}_t$-measurable, and so it follows that $J_t$ is $\mathcal{F}_t$-measurable and the proof is complete.


Chapter 6 Itô's Formula

We have seen that $E( (W_t - W_s)^2 ) = t - s$ for any $0 \le s \le t$. In particular, if $t - s$ is small, then we see that $(W_t - W_s)^2$ is of first order (on average) rather than second order. This might suggest that we should not expect stochastic calculus to be simply calculus with $\omega \in \Omega$ playing a kind of parametric rôle. Indeed, the Itô stochastic integral is not an integral in the usual sense. It is constructed via limits of sums in which the integrator points to the future. There is no reason to suppose that there is a stochastic fundamental theorem of calculus. This, of course, makes it difficult to evaluate stochastic integrals. After all, we know, for example, that the derivative of $x^3$ is $3x^2$ and so the usual fundamental theorem of calculus tells us that the integral of $3x^2$ is $x^3$. By differentiating many functions, one can consequently build up a list of integrals. The following theorem allows a similar construction for stochastic integrals.

Theorem 6.1 (Itô's Formula). Let $F(t, x)$ be a function such that the partial derivatives $\partial_t F$ and $\partial_{xx} F$ are continuous. Suppose that $\partial_x F(t, W_t) \in K_T$. Then
\[
F(T, W_T) = F(0, W_0) + \int_0^T \bigl( \partial_t F(t, W_t) + \tfrac{1}{2} \partial_{xx} F(t, W_t) \bigr) \, dt + \int_0^T \partial_x F(t, W_t) \, dW_t
\]
almost surely.

Proof. Suppose first that $F(t, x)$ is such that the partial derivatives $\partial_x F$ and $\partial_{xx} F$ are bounded on $[0, T] \times \mathbb{R}$, say, $| \partial_x F | < C$ and $| \partial_{xx} F | < C$. Let $\Omega_0 \subset \Omega$ be such that $P(\Omega_0) = 1$ and $t \mapsto W_t(\omega)$ is continuous for each $\omega \in \Omega_0$. Fix $\omega \in \Omega_0$. Let $t_j^{(n)} = jT/n$, so that $0 = t_0^{(n)} < t_1^{(n)} < \dots < t_n^{(n)} = T$ partitions the interval $[0, T]$ into $n$ equal subintervals. Suppressing the $n$ dependence, let $\Delta_j t = t_{j+1}^{(n)} - t_j^{(n)}$ and $\Delta_j W = W_{t_{j+1}}(\omega) - W_{t_j}(\omega)$. Then we have
\[
\begin{aligned}
F(T, W_T(\omega)) - F(0, W_0(\omega)) &= \sum_{j=0}^{n-1} \bigl( F(t_{j+1}, W_{t_{j+1}}(\omega)) - F(t_j, W_{t_j}(\omega)) \bigr) \\
&= \sum_{j=0}^{n-1} \bigl( F(t_{j+1}, W_{t_{j+1}}(\omega)) - F(t_j, W_{t_{j+1}}(\omega)) \bigr) + \sum_{j=0}^{n-1} \bigl( F(t_j, W_{t_{j+1}}(\omega)) - F(t_j, W_{t_j}(\omega)) \bigr) \\
&= \sum_{j=0}^{n-1} \partial_t F(\tau_j, W_{t_{j+1}}(\omega)) \, \Delta_j t \\
&\qquad + \sum_{j=0}^{n-1} \bigl( \partial_x F(t_j, W_{t_j}(\omega)) \, \Delta_j W + \tfrac{1}{2} \partial_{xx} F(t_j, z_j) \, (\Delta_j W)^2 \bigr) \\
&\equiv \Gamma_1(n) + \Gamma_2(n) + \Gamma_3(n),
\end{aligned}
\]
for some $\tau_j \in [t_j, t_{j+1}]$, by Taylor's Theorem (to 1st order), and for some $z_j$ between $W_{t_j}(\omega)$ and $W_{t_{j+1}}(\omega)$, by Taylor's Theorem (to 2nd order). We shall consider separately the behaviour of these three terms as $n \to \infty$.

Consider first $\Gamma_1(n)$. We write
\[
\partial_t F(\tau_j, W_{t_{j+1}}(\omega)) \, \Delta_j t = \bigl( \partial_t F(\tau_j, W_{t_{j+1}}(\omega)) - \partial_t F(t_{j+1}, W_{t_{j+1}}(\omega)) \bigr) \, \Delta_j t + \partial_t F(t_{j+1}, W_{t_{j+1}}(\omega)) \, \Delta_j t .
\]
By hypothesis, $\partial_t F(t, x)$ is continuous and so is uniformly continuous on any rectangle in $\mathbb{R}_+ \times \mathbb{R}$. Also $t \mapsto W_t(\omega)$ is continuous and so is bounded on the interval $[0, T]$; say, $| W_t(\omega) | \le M$ on $[0, T]$. In particular, then, $\partial_t F(t, x)$ is uniformly continuous on $[0, T] \times [-M, M]$ and so, for any given $\varepsilon > 0$,
\[
| \partial_t F(\tau_j, W_{t_{j+1}}(\omega)) - \partial_t F(t_{j+1}, W_{t_{j+1}}(\omega)) | < \varepsilon
\]
for sufficiently large $n$ (so that $| \tau_j - t_{j+1} | \le T/n$ is sufficiently small).

Hence, for sufficiently large $n$,
\[
\Gamma_1(n) = \sum_{j=0}^{n-1} \bigl( \partial_t F(\tau_j, W_{t_{j+1}}(\omega)) - \partial_t F(t_{j+1}, W_{t_{j+1}}(\omega)) \bigr) \, \Delta_j t + \sum_{j=0}^{n-1} \partial_t F(t_{j+1}, W_{t_{j+1}}(\omega)) \, \Delta_j t .
\]
For large $n$, the first summation on the right hand side is bounded by $\sum_j \varepsilon \, \Delta_j t = \varepsilon T$ and, as $n \to \infty$, the second summation converges to the integral $\int_0^T \partial_s F(s, W_s(\omega)) \, ds$. It follows that
\[
\Gamma_1(n) \to \int_0^T \partial_s F(s, W_s(\omega)) \, ds
\]
almost surely as $n \to \infty$.

Next, consider term $\Gamma_2(n)$. This is
\[
\sum_{j=0}^{n-1} \partial_x F(t_j, W_{t_j}(\omega)) \, \Delta_j W = I(g_n)(\omega)
\]
where $g_n \in \mathcal{E}$ is given by
\[
g_n(s, \omega) = \sum_{j=0}^{n-1} \partial_x F(t_j, W_{t_j}(\omega)) \, 1_{(t_j, t_{j+1}]}(s) .
\]
Now, the function $t \mapsto \partial_x F(t, W_t(\omega))$ is continuous (and so uniformly continuous) on $[0, T]$ and $g_n(\cdot, \omega) \to g(\cdot, \omega)$ uniformly on $(0, T]$, where $g(s, \omega) = \partial_x F(s, W_s(\omega))$. It follows that
\[
\int_0^T | g_n(s, \omega) - g(s, \omega) |^2 \, ds \to 0
\]
as $n \to \infty$ for each $\omega \in \Omega_0$. But $\int_0^T | g_n(s, \omega) - g(s, \omega) |^2 \, ds \le 4 T C^2$ and therefore (since $P(\Omega_0) = 1$)
\[
E\Bigl( \int_0^T | g_n - g |^2 \, ds \Bigr) \to 0
\]
as $n \to \infty$, by Lebesgue's Dominated Convergence Theorem. Applying the Isometry Property, we see that
\[
\| I(g_n) - I(g) \|_2 \to 0,
\]
that is, $\Gamma_2(n) = I(g_n) \to I(g)$ in $L^2$ and so there is a subsequence $(g_{n_k})$ such that $I(g_{n_k}) \to I(g)$ almost surely. That is, $\Gamma_2(n_k) \to I(g)$ almost surely.

By independence. Wi (ω)) (∆i W )2 − ∆i t + 1 ∂xx F (ti . ∂xx F (ti . Chapter 6 For any ω ∈ Ω. As before. Wi ))2 E (∆i W )2 − ∆i t)2 . Wi ))2 ( (∆i W )2 − ∆i t)2 E (∂xx F (ti . Wi (ω)) (∆i W )2 − ∆i t etc.84 We now turn to term Γ3 (n) = write 1 2 1 2 n−1 2 j=0 ∂xx F (tj . if i < j. Ws (ω)) ds almost surely as E(Φ1 (nk )2 ) = E(( = E( 2 i αi ) ) i.j αi αj ) where αi = ∂xx F (ti . Wi (ω)) ∆i t → 1 2 T 0 1 2 T ∂xx F (s. Ws (ω)) ds 0 as n → ∞. Wi (ω)) (∆i W )2 . zi )(∆i W )2 = 1 2 ∂xx F (ti . we j=0 2 see that for ω ∈ Ω0 Φ2 (n) = n−1 1 i=0 2 ∂xx F (ti . To discuss Φ1 (nk ). Summing over j. = i ≤C 2 by independence. Wi (ω)) (∆i W )2 − ∆i t . Hence Φ2 (nk ) → k → ∞. zj )(∆j W ) . Wi (ω)) ∆i t 2 + 1 2 ∂xx F (ti . E(αi αj ) = E( αi ∂xx F (tj . zi ) − ∂xx F (ti . ≤ C2 =C =C ifwilde 2 i 2 i i E (∆i W )2 − ∆i t)2 E (∆i W )4 − 2∆i t(∆i W )2 + (∆i t)2 ( E (∆i W )4 − (∆i t)2 ) Notes . we write Γ3 (n) as Γ3 (n) ≡ Φ1 (n) + Φ2 (n) + Φ3 (n) where Φ1 (n) = n−1 1 ∂xx F (ti . Wj ) ) E( (∆j W )2 − ∆j t ) =0 and so E(Φ1 (nk )2 ) = i 2 E( αi ) = i E (∂xx F (ti . consider ∂xx F (s.

Then (just as we have argued for ∂x F ). 2 i E(βi ) + i=j E(βi βj ) 2 by independence. n−1 i=0 as n → ∞. |∂xx F (ti .j E(βi βj ) . 2 since E(βi ) = 0. for all suﬃciently large n. which establishes the claim. 4 2 2 i E( (∆i W ) − 2∆i t (∆i W ) + (∆i t) ) 2 i 2 (∆i t) 2 2 i ∆i t) ) 2 2 i {(∆Wi ) − ∆i t}) ) =2T n as n → ∞. we consider Sn − T 2 2 = E( ( = = = = = = →0 = E( (Sn − = E( (Sn − T )2 ) where βi = (∆Wi )2 − ∆i t.Itˆ’s Formula o 85 = C2 i ( 3(∆i t)2 − (∆i t)2 ) (∆i t)2 i T nk 2 =C 2 = C2 2 = i 2 T2 2 C nk 2 → 0. zi ) − ∂xx F (ti . zi ) − ∂xx F (ti . 2004 . i. Fix ω ∈ Ω0 . It follows that for any given ε > 0. To see this. we may say that the function t → ∂xx F (t. ( ∂xx F (ti . Hence. i E(βi ) . November 22. Hence Φ1 (nk ) → 0 in L2 . Wi (ω))| < ε for all 0 ≤ i ≤ n − 1. i E(βi ) + i=j E(βi ) E(βj ) . Wi (ω)) )(∆i W )2 . for suﬃciently large n. Now we consider Φ3 (n) = i ( ∂xx F (ti . T ]. Wi (ω)) )(∆i W )2 ≤ ε Sn (∗) where Sn = n−1 2 i=0 (∆i W ) . Claim: Sn → T in L2 as n → ∞. Wt (ω)) is uniformly continuous on [0. zi ) − ∂xx F (ti .

It follows that $S_{n_k} \to T$ in $L^2$ and so there is a subsequence such that both $\Phi_1(n_m) \to 0$ almost surely and $S_{n_m} \to T$ almost surely. It follows from $(*)$ that $\Phi_3(n_m) \to 0$ almost surely as $m \to \infty$. Combining these results, we see that
\[
\Gamma_1(n_m) + \Gamma_2(n_m) + \Gamma_3(n_m) \to \int_0^T \partial_s F(s, W_s(\omega)) \, ds + \int_0^T \partial_x F(s, W_s) \, dW_s + \frac{1}{2} \int_0^T \partial_{xx} F(s, W_s(\omega)) \, ds
\]
almost surely, which concludes the proof for the case when $\partial_x F$ and $\partial_{xx} F$ are both bounded.

To remove this restriction, let $\varphi_n : \mathbb{R} \to \mathbb{R}$ be a smooth function such that $\varphi_n(x) = 1$ for $|x| \le n + 1$ and $\varphi_n(x) = 0$ for $|x| \ge n + 2$. Set $F_n(t, x) = \varphi_n(x) F(t, x)$. Then $\partial_x F_n$ and $\partial_{xx} F_n$ are continuous and bounded on $[0, T] \times \mathbb{R}$. Moreover, $F_n = F$, $\partial_t F_n = \partial_t F$, $\partial_x F_n = \partial_x F$ and $\partial_{xx} F_n = \partial_{xx} F$ whenever $|x| \le n$. We can apply the previous argument to deduce that for each $n$ there is some set $B_n \subset \Omega$ with $P(B_n) = 1$ such that
\[
F_n(T, W_T) = F_n(0, W_0) + \int_0^T \bigl( \partial_t F_n(t, W_t) + \tfrac{1}{2} \partial_{xx} F_n(t, W_t) \bigr) \, dt + \int_0^T \partial_x F_n(t, W_t) \, dW_t \qquad (**)
\]
for $\omega \in B_n$. Let $A = \Omega_0 \cap \bigl( \bigcap_n B_n \bigr)$ so that $P(A) = 1$. Fix $\omega \in A$. Then $\omega \in \Omega_0$ and so $t \mapsto W_t(\omega)$ is continuous. It follows that there is some $N \in \mathbb{N}$ such that $| W_t(\omega) | < N$ for all $t \in [0, T]$ and so $F_N(t, W_t(\omega)) = F(t, W_t(\omega))$ for $t \in [0, T]$ and the same remark applies to the partial derivatives of $F_N$ and $F$. But $\omega \in B_N$ and so $(**)$ holds with $F_n$ replaced by $F_N$ which in turn means that $(**)$ holds with now $F_N$ replaced by $F$, simply because $F_N$ and $F$ (and the derivatives) agree for such $\omega$ and all $t$ in the relevant range. We conclude that
\[
F(T, W_T) = F(0, W_0) + \int_0^T \bigl( \partial_t F(t, W_t) + \tfrac{1}{2} \partial_{xx} F(t, W_t) \bigr) \, dt + \int_0^T \partial_x F(t, W_t) \, dW_t
\]
for all $\omega \in A$ and the proof is complete.

Example 6.2. Let $F(t, x) = x^2$. Then Itô's Formula tells us that
\[
W_t^2 = W_0^2 + \int_0^t \tfrac{1}{2} \cdot 2 \, ds + \int_0^t 2 W_s \, dW_s,
\]
that is,
\[
W_t^2 = t + 2 \int_0^t W_s \, dW_s .
\]
This can also be expressed as
\[
\int_0^t W_s \, dW_s = \tfrac{1}{2} W_t^2 - \tfrac{1}{2} t .
\]

Example 6.3. Let $F(t, x) = \sin x$, so we find that $\partial_t F = 0$, $\partial_x F = \cos x$ and $\partial_{xx} F = -\sin x$. Applying Itô's Formula, we get
\[
F(t, W_t) - F(0, W_0) = \sin W_t = \int_0^t \cos(W_s) \, dW_s - \frac{1}{2} \int_0^t \sin(W_s) \, ds
\]
or
\[
\int_0^t \cos(W_s) \, dW_s = \sin W_t + \frac{1}{2} \int_0^t \sin(W_s) \, ds .
\]

Example 6.4. Let $F(s, x) = s x$. Then we find that
\[
F(t, W_t) - F(0, W_0) = t W_t - 0 = \int_0^t W_s \, ds + \int_0^t s \, dW_s,
\]
that is,
\[
\int_0^t s \, dW_s = t W_t - \int_0^t W_s \, ds .
\]

Example 6.5. Suppose that the function $F(t, x)$ obeys $\partial_t F = -\frac{1}{2} \partial_{xx} F$. Then if $\partial_x F(t, W_t) \in K_T$, Itô's formula gives
\[
F(T, W_T) = F(0, W_0) + \int_0^T \bigl( \partial_t F(t, W_t) + \tfrac{1}{2} \partial_{xx} F(t, W_t) \bigr) \, dt + \int_0^T \partial_x F(t, W_t) \, dW_t,
\]
in which the $dt$-integrand is zero and the stochastic integral is a martingale. Taking $F(t, x) = e^{\alpha x - \frac{1}{2} t \alpha^2}$, we may say that $e^{\alpha W_t - \frac{1}{2} t \alpha^2}$ is a martingale.
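Example 6.2 has an exact discrete counterpart which makes the extra term $-\frac{1}{2} t$ visible. Since $W_{j+1}^2 - W_j^2 = 2 W_j \Delta W_j + (\Delta W_j)^2$, the left-endpoint sums satisfy $\sum_j W_j \Delta W_j = \frac{1}{2} W_T^2 - \frac{1}{2} \sum_j (\Delta W_j)^2$ on every path, and the quadratic variation sum tends to $T$. The sketch below is illustrative only: it assumes a uniform grid, and the function name is hypothetical.

```python
import math
import random


def left_sum_and_pieces(T: float, n: int, rng: random.Random):
    """Return (sum_j W_j dW_j, W_T, sum_j dW_j^2) along one simulated path."""
    sd = math.sqrt(T / n)
    w = 0.0
    ito_sum = 0.0
    quad_var = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        ito_sum += w * dw     # integrand taken at the left endpoint
        quad_var += dw * dw
        w += dw
    return ito_sum, w, quad_var


if __name__ == "__main__":
    T = 1.0
    ito_sum, w_T, quad_var = left_sum_and_pieces(T, 100_000, random.Random(5))
    print("left-endpoint sum       :", ito_sum)
    print("W_T^2/2 - quad_var/2    :", 0.5 * w_T**2 - 0.5 * quad_var)
    print("W_T^2/2 - T/2 (Ex. 6.2) :", 0.5 * w_T**2 - 0.5 * T)
```

The first two printed values agree up to rounding; the third differs from them by $\frac{1}{2}$ of the gap between the quadratic variation sum and $T$.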

Itô's formula also holds when $W_t$ is replaced by the somewhat more general processes of the form
\[
X_t = X_0 + \int_0^t u(s, \omega) \, ds + \int_0^t v(s, \omega) \, dW_s
\]
where $u$ and $v$ obey $P( \int_0^t | u |^2 \, ds < \infty ) = 1$ and similarly for $v$. One has
\[
F(t, X_t) - F(0, X_0) = \int_0^t \bigl( \partial_s F(s, X_s) + u \, \partial_x F(s, X_s) + \tfrac{1}{2} v^2 \, \partial_{xx} F(s, X_s) \bigr) \, ds + \int_0^t v(s, \omega) \, \partial_x F(s, X_s) \, dW_s .
\]
In symbolic differential form, this can be written as
\[
dF(t, X_t) = \partial_t F \, dt + u \, \partial_x F \, dt + \tfrac{1}{2} v^2 \, \partial_{xx} F \, dt + v \, \partial_x F \, dW . \qquad (*)
\]
On the other hand, in the same spirit, we might also write
\[
dF(t, X) = \partial_t F \, dt + \partial_x F \, dX + \tfrac{1}{2} \partial_{xx} F \, dX \, dX . \qquad (**)
\]
However, we may write $dX = u \, dt + v \, dW$ which leads to
\[
dX \, dX = u^2 \, dt^2 + 2 u v \, dt \, dW + v^2 \, dW \, dW = v^2 \, dW \, dW,
\]
the first two terms being set to zero on the grounds that $dt \, dW$ and $dt \, dt$ are negligible. But we know that $E( (\Delta W)^2 ) = \Delta t$ and so if we replace $dW \, dW$ by $dt$ and substitute these expressions for $dX$ and $dX \, dX$ into $(**)$, then we recover the expression $(*)$. It would appear that standard practice is to manipulate such symbolic differential expressions (rather than the proper integral equation form) in conjunction with the following Itô table:

Itô table
$dt^2 = 0$
$dt \, dW = 0$
$dW^2 = dt$

Example 6.6. Consider
\[
Z_t = e^{ \int_0^t g \, dW - \frac{1}{2} \int_0^t g^2 \, ds } .
\]
Let
\[
X_t = X_0 + \int_0^t g \, dW - \frac{1}{2} \int_0^t g^2 \, ds
\]
so that $dX = -\frac{1}{2} g^2 \, ds + g \, dW$. Then, with $F(t, x) = e^x$, we have $\partial_t F = 0$, $\partial_x F = e^x$ and $\partial_{xx} F = e^x$ so that
\[
\begin{aligned}
dF &= \partial_t F(X) \, dt + \partial_x F(X) \, dX + \tfrac{1}{2} \partial_{xx} F(X) \, dX \, dX \\
&= 0 + e^X \bigl( -\tfrac{1}{2} g^2 \, dt + g \, dW \bigr) + \tfrac{1}{2} e^X g^2 \, dt, \quad \text{since } dX \, dX = g^2 \, dt, \\
&= e^X g \, dW .
\end{aligned}
\]
We deduce that $F(X_t) = e^{X_t} = Z_t$ obeys $dZ = e^X g \, dW = Z g \, dW$, so that
\[
Z_t = Z_0 + \int_0^t g Z_s \, dW_s .
\]
It can be shown that this process is a martingale for suitable $g$; such a condition is given by the requirement that $E( e^{ \frac{1}{2} \int_0^T g^2 \, ds } ) < \infty$, the so-called Novikov condition.

Example 6.7. Suppose that $dX = u \, dt + dW$, let
\[
M_t = e^{ -\int_0^t u \, dW - \frac{1}{2} \int_0^t u^2 \, ds }
\]
and consider the process $Y_t = X_t M_t$. Then
\[
dY_t = X_t \, dM_t + M_t \, dX_t + dX_t \, dM_t .
\]
The question now is what is $dM_t$? Let $dV_t = -\frac{1}{2} u^2 \, dt - u \, dW_t$ so that $M_t = e^{V_t}$. Applying Itô's formula (equation $(*)$ above) with $F(t, x) = e^x$ (so $\partial_t F = 0$ and $\partial_x F = \partial_{xx} F = e^x$ and $F(V_t) = M_t$), we obtain
\[
dM_t = dF(V_t) = \bigl( 0 - \tfrac{1}{2} u^2 e^V + \tfrac{1}{2} u^2 e^V \bigr) \, dt - u \, e^V \, dW = -u \, e^{V_t} \, dW_t .
\]
So
\[
\begin{aligned}
dY_t &= -X_t u \, e^{V_t} \, dW_t + M_t ( u \, dt + dW_t ) - u \, e^{V_t} \, dW_t \, ( u \, dt + dW_t ) \\
&= -u X_t M_t \, dW_t + u M_t \, dt + M_t \, dW_t - u \, e^{V_t} \, dt \\
&= ( -u X_t M_t + M_t ) \, dW_t, \quad \text{since } M_t = e^{V_t},
\end{aligned}
\]
or, in integral form,
\[
Y_t = Y_0 + \int_0^t ( M_s - u X_s M_s ) \, dW_s
\]
which is a martingale.

Remark 6.8. There is also an $m$-dimensional version of Itô's formula. Let $(W^{(1)}, W^{(2)}, \dots, W^{(m)})$ be an $m$-dimensional Wiener process (that is, the $W^{(j)}$ are independent Wiener processes) and suppose that
\[
\begin{aligned}
dX_1 &= u_1 \, dt + v_{11} \, dW^{(1)} + \dots + v_{1m} \, dW^{(m)} \\
&\ \, \vdots \\
dX_n &= u_n \, dt + v_{n1} \, dW^{(1)} + \dots + v_{nm} \, dW^{(m)}
\end{aligned}
\]
and let $Y_t(\omega) = F(t, X_1(\omega), \dots, X_n(\omega))$. Then
\[
dY = \partial_t F(t, X) \, dt + \sum_{i=1}^{n} \partial_{x_i} F(t, X) \, dX_i + \frac{1}{2} \sum_{i,j} \partial_{x_i x_j} F(t, X) \, dX_i \, dX_j
\]
where the Itô table is enhanced by the extra rule that $dW^{(i)} \, dW^{(j)} = \delta_{ij} \, dt$.

Stochastic Differential Equations

The purpose of this section is to illustrate how Itô's formula can be used to obtain solutions to so-called stochastic differential equations. These are usually obtained from physical considerations in essentially the same way as are "usual" differential equations but when a random influence is to be taken into account. Typically, one might wish to consider the behaviour of a certain quantity as time progresses. Then one theorizes that the change of such a quantity over a small period of time is approximately given by such and such terms (depending on the physical problem under consideration) plus some extra factor which is supposed to take into account some kind of random influence.

Example 6.9. Consider the "stochastic differential equation"
\[
dX_t = \mu X_t \, dt + \sigma X_t \, dW_t \qquad (*)
\]
with $X_0 = x_0 > 0$. Here, $\mu$ and $\sigma$ are constants with $\sigma > 0$. The quantity $X_t$ is the object under investigation and the second term on the right is supposed to represent some random input to the change $dX_t$. Of course, mathematically, this is just a convenient shorthand for the corresponding integral equation. Such an equation has been used in financial mathematics (to model a risky asset).

To solve this stochastic differential equation, let us seek a solution of the form $X_t = f(t, W_t)$ for some suitable function $f(t, x)$. According to Itô's formula, such a process would satisfy
\[
dX_t = df = \bigl( f_t(t, W_t) + \tfrac{1}{2} f_{xx}(t, W_t) \bigr) \, dt + f_x(t, W_t) \, dW_t . \qquad (**)
\]

Comparing coefficients in (∗) and (∗∗), we find
\[
\begin{aligned}
\mu\,f(t,W_t) &= f_t(t,W_t) + \tfrac12 f_{xx}(t,W_t) \\
\sigma\,f(t,W_t) &= f_x(t,W_t) .
\end{aligned}
\]
So we try to find a function $f(t,x)$ satisfying
\[
\begin{aligned}
\mu\,f(t,x) &= f_t(t,x) + \tfrac12 f_{xx}(t,x) && \text{(i)} \\
\sigma\,f(t,x) &= f_x(t,x) . && \text{(ii)}
\end{aligned}
\]
Equation (ii) leads to $f(t,x) = e^{\sigma x}\,C(t)$. Substitution into equation (i) requires $C(t) = e^{t(\mu - \frac12\sigma^2)}\,C(0)$ which leads to the solution
\[ f(t,x) = C(0)\,e^{\sigma x + t(\mu - \frac12\sigma^2)} . \]
But then $X_0 = f(0,W_0) = f(0,0) = C(0)$ so that $C(0) = x_0$ and we have found the required solution
\[ X_t = x_0\,e^{\sigma W_t + t(\mu - \frac12\sigma^2)} . \]
From this, we can calculate the expectation $E(X_t)$. We find
\[ E(X_t) = x_0\,e^{t\mu}\,\underbrace{E\big( e^{\sigma W_t - \frac12 t\sigma^2} \big)}_{=1} = x_0\,e^{t\mu} \]
and so we see that if $\mu > 0$ then the expectation $E(X_t)$ grows exponentially as $t \to \infty$.

However, we can write
\[ \frac{W_n}{n} = \frac{(W_n - W_{n-1}) + (W_{n-1} - W_{n-2}) + \dots + (W_1 - W_0)}{n} = \frac{Z_1 + Z_2 + \dots + Z_n}{n} \]
where $Z_1, \dots, Z_n$ are independent, identically distributed random variables with mean zero (in fact, standard normal). So we can apply the Law of Large Numbers to deduce that
\[ P\Big( \frac{W_n}{n} \to 0 \text{ as } n \to \infty \Big) = 1 . \]
Now, we have
\[ X_n = x_0\,e^{\sigma W_n + n(\mu - \frac12\sigma^2)} = x_0\,e^{\sigma n \left( \frac1n W_n - \frac1\sigma\left(\frac12\sigma^2 - \mu\right) \right)} . \]
If $\sigma^2 > 2\mu$, the right hand side above $\to 0$ almost surely as $n \to \infty$. We see that $X_n$ has the somewhat surprising behaviour that $X_n \to 0$ with probability one, whereas (when $\mu > 0$) $E(X_n) \to \infty$ (exponentially quickly).
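The contrast between $E(X_t) = x_0 e^{t\mu}$ and $X_n \to 0$ almost surely can be illustrated numerically. The sketch below is an illustration under assumed parameter values ($\mu = 0.1$, $\sigma = 1$, so that $\sigma^2 > 2\mu$), not part of the original notes. It samples the closed-form solution directly, using $W_t \sim N(0,t)$; the function names and the threshold $10^{-3}$ are hypothetical choices.

```python
import math
import random


def sample_gbm(x0, mu, sigma, t, rng):
    """Sample X_t = x0 * exp(sigma*W_t + t*(mu - sigma^2/2)) with W_t ~ N(0, t)."""
    w_t = rng.gauss(0.0, math.sqrt(t))
    return x0 * math.exp(sigma * w_t + t * (mu - 0.5 * sigma * sigma))


def gbm_summary(x0=1.0, mu=0.1, sigma=1.0, n_samples=100000, seed=1):
    """Return (sample mean of X_1, fraction of samples of X_50 below 1e-3)."""
    rng = random.Random(seed)
    # Sample mean at t = 1, to be compared with x0 * exp(mu * 1).
    total = 0.0
    for _ in range(n_samples):
        total += sample_gbm(x0, mu, sigma, 1.0, rng)
    mean_t1 = total / n_samples
    # At t = 50 most samples are already tiny, although E(X_50) = x0 * exp(5).
    n_small = 0
    for _ in range(n_samples):
        if sample_gbm(x0, mu, sigma, 50.0, rng) < 1e-3:
            n_small += 1
    return mean_t1, n_small / n_samples
```

The sample mean is only estimated at $t = 1$ on purpose: for large $t$ the expectation is dominated by very rare, very large values, so a Monte Carlo average of this size would no longer be a reliable estimate of $x_0 e^{t\mu}$.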

Example 6.10. Ornstein-Uhlenbeck Process. Consider
\[ dX_t = -\alpha X_t\,dt + \sigma\,dW_t \]
with $X_0 = x_0$, where $\alpha$ and $\sigma$ are positive constants. We look for a solution of the form $X_t = g(t)\,Y_t$ where $dY_t = h(t)\,dW_t$. In differential form, this becomes (using the Itô table)
\[ dX_t = g\,dY_t + dg\,Y_t + dg\,dY_t = g h\,dW_t + g' Y_t\,dt + 0 . \]
Comparing this with the original equation, we require
\[ g' Y = -\alpha g Y , \qquad g h = \sigma . \]
The first of these equations leads to the solution $g(t) = C e^{-\alpha t}$. This gives $h = \sigma/g = \sigma e^{\alpha t}/C$ and so $dY = \frac{\sigma}{C}\,e^{\alpha t}\,dW$, or
\[ Y_t = Y_0 + \frac{\sigma}{C} \int_0^t e^{\alpha s}\,dW_s . \]
Hence
\[ X_t = C e^{-\alpha t} \Big( Y_0 + \frac{\sigma}{C} \int_0^t e^{\alpha s}\,dW_s \Big) = e^{-\alpha t} \Big( C Y_0 + \sigma \int_0^t e^{\alpha s}\,dW_s \Big) . \]
When $t = 0$, $X_0 = x_0$ and so $x_0 = C Y_0$ and we finally obtain the solution
\[ X_t = x_0\,e^{-\alpha t} + \sigma \int_0^t e^{\alpha(s-t)}\,dW_s . \]

Example 6.11. Let $(W^{(1)}, W^{(2)})$ be a two-dimensional Wiener process and let $M_t = \varphi(W_t^{(1)}, W_t^{(2)})$. Let $f(t,x,y) = \varphi(x,y)$ so that $\partial_t f = 0$. Itô's formula then says that
\[
\begin{aligned}
dM_t = df &= \partial_x \varphi(W_t^{(1)}, W_t^{(2)})\,dW_t^{(1)} + \partial_y \varphi(W_t^{(1)}, W_t^{(2)})\,dW_t^{(2)} \\
 &\quad + \tfrac12 \big( \partial_x^2 \varphi(W_t^{(1)}, W_t^{(2)}) + \partial_y^2 \varphi(W_t^{(1)}, W_t^{(2)}) \big)\,dt .
\end{aligned}
\]
In particular, if $\varphi$ is harmonic (so $\partial_x^2 \varphi + \partial_y^2 \varphi = 0$) then
\[ dM_t = \partial_x \varphi(W_t^{(1)}, W_t^{(2)})\,dW_t^{(1)} + \partial_y \varphi(W_t^{(1)}, W_t^{(2)})\,dW_t^{(2)} \]
and so $M_t = \varphi(W_t^{(1)}, W_t^{(2)})$ is a martingale.
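The closed-form Ornstein-Uhlenbeck solution of Example 6.10 can be compared, path by path, with a direct time-stepping of the equation $dX_t = -\alpha X_t\,dt + \sigma\,dW_t$. The sketch below is an illustration, not part of the original notes. It assumes an Euler-Maruyama discretization on a uniform grid and approximates the stochastic integral by a left-endpoint sum driven by the same Wiener increments; the function name and parameters are hypothetical.

```python
import math
import random


def ou_euler_vs_closed_form(x0, alpha, sigma, t_end, n_steps, seed=2):
    """Return (Euler value, closed-form value) of X_T, both driven by the same noise."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    x_euler = x0
    weighted_noise = 0.0  # left-endpoint sum for int_0^T e^{alpha (s - T)} dW_s
    for k in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, distributed N(0, dt)
        # Euler-Maruyama step for dX = -alpha X dt + sigma dW.
        x_euler += -alpha * x_euler * dt + sigma * dw
        weighted_noise += math.exp(alpha * (k * dt - t_end)) * dw
    # Closed form: X_T = x0 e^{-alpha T} + sigma int_0^T e^{alpha (s - T)} dW_s.
    x_closed = x0 * math.exp(-alpha * t_end) + sigma * weighted_noise
    return x_euler, x_closed
```

With a fine grid the two returned values should agree closely, since both are discretizations of the same path; setting `sigma=0.0` reduces the comparison to the deterministic decay $x_0 e^{-\alpha t}$.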

Feynman-Kac Formula

Consider the diffusion equation
\[ u_t(t,x) = \tfrac12\,u_{xx}(t,x) , \]
with $u(0,x) = f(x)$ (where $f$ is well-behaved). The solution can be written as
\[ u(t,x) = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} f(v)\,e^{-(v-x)^2/2t}\,dv . \]
However, the right hand side here is equal to the expectation $E(f(x + W_t))$ and $x + W_t$ is a Wiener process started at $x$, rather than at 0.

Theorem 6.12 (Feynman-Kac). Let $q : \mathbb{R} \to \mathbb{R}$ and $f : \mathbb{R} \to \mathbb{R}$ be bounded. Then (the unique) solution to the initial-value problem
\[ \partial_t u(t,x) = \tfrac12\,\partial_{xx} u(t,x) + q(x)\,u(t,x) , \]
with $u(0,x) = f(x)$, has the representation
\[ u(t,x) = E\Big( f(x + W_t)\,e^{\int_0^t q(x + W_s)\,ds} \Big) . \]

Proof. Consider Itô's formula applied to the function $f(s,y) = u(t-s, x+y)$. In this case $\partial_s f = -\partial_1 u = -u_1$, $\partial_y f = \partial_2 u = u_2$, $\partial_{yy} f = \partial_{22} u = u_{22}$ and so we get
\[
\begin{aligned}
df(s, W_s) &= -u_1(t-s, x+W_s)\,ds + u_2(t-s, x+W_s)\,dW_s + \tfrac12\,u_{22}(t-s, x+W_s)\,\underbrace{dW_s\,dW_s}_{ds} \\
 &= -q(x+W_s)\,u(t-s, x+W_s)\,ds + u_2(t-s, x+W_s)\,dW_s
\end{aligned}
\]
where we have made use of the partial differential equation satisfied by $u$. For $0 \le s \le t$, set
\[ M_s = u(t-s, x+W_s)\,e^{\int_0^s q(x+W_v)\,dv} . \]
Applying the rule $d(XY) = (dX)\,Y + X\,dY + dX\,dY$ together with the Itô table, we get
\[
\begin{aligned}
dM_s &= df\;e^{\int_0^s q(x+W_v)\,dv} + f\,d\Big( e^{\int_0^s q(x+W_v)\,dv} \Big) + df\,d\Big( e^{\int_0^s q(x+W_v)\,dv} \Big) \\
 &= -q(x+W_s)\,u(t-s, x+W_s)\,e^{\int_0^s q(x+W_v)\,dv}\,ds + u_2(t-s, x+W_s)\,e^{\int_0^s q(x+W_v)\,dv}\,dW_s \\
 &\quad + f\,q(x+W_s)\,e^{\int_0^s q(x+W_v)\,dv}\,ds + 0 \\
 &= u_2(t-s, x+W_s)\,e^{\int_0^s q(x+W_v)\,dv}\,dW_s .
\end{aligned}
\]

It follows that
\[ M_\tau = M_0 + \int_0^\tau u_2(t-s, x+W_s)\,e^{\int_0^s q(x+W_v)\,dv}\,dW_s \]
and so $M_\tau$ is a martingale and, in particular, $E(M_t) = E(M_0)$. However, by construction, $M_0 = u(t,x)$ almost surely and so $E(M_0) = u(t,x)$ and
\[ E(M_t) = E\Big( \underbrace{u(0, x+W_t)}_{=f(x+W_t)}\,e^{\int_0^t q(x+W_v)\,dv} \Big) \]
by the initial condition. The result follows.

Martingale Representation Theorems

Let $(\mathcal{F}_s)$ be the standard Wiener process filtration (the minimal filtration generated by the $W_t$'s).

Theorem 6.13 ($L^2$-martingale representation theorem). Let $X \in L^2(\Omega, \mathcal{F}_T, P)$. Then there is $f \in K_T$ (unique almost surely) such that
\[ X = \alpha + \int_0^T f(s,\omega)\,dW_s \]
where $\alpha = E(X)$. If $(X_t)_{t \le T}$ is an $L^2$-martingale, then there is $f \in K_T$ (unique almost surely) such that
\[ X_t = X_0 + \int_0^t f\,dW_s \]
for all $0 \le t \le T$.

We will not prove this here but will just make some remarks. To say that $X$ in the Hilbert space $L^2(\mathcal{F}_T)$ obeys $E(X) = 0$ is to say that $X$ is orthogonal to 1, so the theorem says that every element of $L^2(\mathcal{F}_T)$ orthogonal to 1 is a stochastic integral. Moreover, if we denote $\int_0^T f\,dW$ by $I(f)$, then the isometry property tells us that $f \mapsto I(f)$ is an isometric isomorphism between $K_T$ and the subspace of $L^2(\mathcal{F}_T)$ orthogonal to 1. The uniqueness of $f \in K_T$ in the theorem is a consequence of the isometry property.

Note, further, that the second part of the theorem follows from the first part by setting $X = X_T$. Indeed, by the first part, $X_T = \alpha + \int_0^T f\,dW$ and so
\[ X_t = E(X_T \mid \mathcal{F}_t) = \alpha + \int_0^t f\,dW_s \]
for $0 \le t \le T$. Evidently, $\alpha = X_0 = E(X_T)$.
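Returning to the Feynman-Kac representation of Theorem 6.12, the expectation $E\big(f(x+W_t)\,e^{\int_0^t q(x+W_s)\,ds}\big)$ can be estimated by simulating Wiener paths. The following is a minimal sketch, not part of the original notes. It assumes a uniform time grid and a left-endpoint Riemann sum for the time integral; the function name and its defaults are illustrative.

```python
import math
import random


def feynman_kac_estimate(f, q, t, x, n_steps=20, n_paths=20000, seed=3):
    """Monte Carlo estimate of E[ f(x + W_t) * exp(int_0^t q(x + W_s) ds) ]."""
    rng = random.Random(seed)
    dt = t / n_steps
    total = 0.0
    for _path in range(n_paths):
        w = 0.0         # current value of the Wiener path W_s, started at 0
        integral = 0.0  # left-endpoint Riemann sum for int_0^t q(x + W_s) ds
        for _step in range(n_steps):
            integral += q(x + w) * dt
            w += rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, N(0, dt)
        total += f(x + w) * math.exp(integral)
    return total / n_paths
```

A simple sanity check uses a case with a known solution: for constant $q \equiv c$ and $f = \cos$, the function $u(t,x) = e^{(c - \frac12)t} \cos x$ satisfies $\partial_t u = \frac12 \partial_{xx} u + c\,u$ with $u(0,x) = \cos x$, so `feynman_kac_estimate(math.cos, lambda y: 0.2, 1.0, 0.3)` should be close to $e^{-0.3}\cos(0.3)$, up to Monte Carlo error.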

Recall that if $dX_t = u\,dt + v\,dW_t$ then Itô's formula is
\[ df(t, X_t) = \partial_t f(t, X_t)\,dt + \partial_x f(t, X_t)\,dX_t + \tfrac12\,\partial_{xx} f(t, X_t)\,dX_t\,dX_t \]
where, according to the Itô table, $dX_t\,dX_t = v^2\,dt$. Set $u = 0$ and let $f(t,x) = x^2$. Then $\partial_t f = 0$, $\partial_x f = 2x$ and $\partial_{xx} f = 2$ so that
\[ d(X_t^2) = 0 + 2X\,dX_t + \underbrace{dX_t\,dX_t}_{=v^2\,dt} . \]
That is,
\[ X_t^2 = X_0^2 + \int_0^t 2X_s\,dX_s + \int_0^t v^2\,ds . \]
But now $dX_s = u\,ds + v\,dW_s = 0 + v\,dW_s$ and $X_s$ is a martingale and
\[ X_t^2 = \underbrace{X_0^2 + \int_0^t 2X_s v\,dW_s}_{\text{martingale part, } Z_t} + \underbrace{\int_0^t v^2\,ds}_{\text{increasing process part, } A_t} . \]
This is the Doob-Meyer decomposition of the submartingale $(X_t^2)$.

Note that $X_t^2 - \int_0^t v^2\,ds$ is the martingale part. If $v = 1$, then $\int_0^t v^2\,ds = t$ and so this says that $X_t^2 - t$ is the martingale part. But if $v = 1$, we have $dX = v\,dW = dW$ so that $X_t = X_0 + W_t$ and in this case $(X_t)$ is a Wiener process. The converse is true (Lévy's characterization) as we show next using the following result.

Proposition 6.14. Let $\mathcal{G} \subset \mathcal{F}$ be $\sigma$-algebras. Suppose that $X$ is $\mathcal{F}$-measurable and for each $\theta \in \mathbb{R}$
\[ E(e^{i\theta X} \mid \mathcal{G}) = e^{-\theta^2\sigma^2/2} \]
almost surely. Then $X$ is independent of $\mathcal{G}$ and $X$ has a normal distribution with mean 0 and variance $\sigma^2$.

Proof. For any $A \in \mathcal{G}$,
\[ E(e^{i\theta X}\,1_A) = E(e^{-\theta^2\sigma^2/2}\,1_A) = e^{-\theta^2\sigma^2/2}\,E(1_A) = e^{-\theta^2\sigma^2/2}\,P(A) . \]
In particular, with $A = \Omega$, we see that
\[ E(e^{i\theta X}) = e^{-\theta^2\sigma^2/2} \]
and so it follows that $X$ is normal with mean zero and variance $\sigma^2$.

To show that $X$ is independent of $\mathcal{G}$, suppose that $A \in \mathcal{G}$ with $P(A) > 0$. Define $Q$ on $\mathcal{F}$ by
\[ Q(B) = \frac{P(B \cap A)}{P(A)} = \frac{E(1_A 1_B)}{P(A)} . \]

Then $Q$ is a probability measure on $\mathcal{F}$ and
\[ E_Q(e^{i\theta X}) = \int_\Omega e^{i\theta X}\,dQ = \int_A e^{i\theta X}\,\frac{dP}{P(A)} = \frac{E(e^{i\theta X}\,1_A)}{P(A)} = e^{-\theta^2\sigma^2/2} \]
from the above. It follows that $X$ also has a normal distribution, with mean zero and variance $\sigma^2$, with respect to $Q$. Hence, if $\Phi$ denotes the standard normal distribution function, then for any $x \in \mathbb{R}$ we have
\[
\begin{aligned}
Q(X \le x) = \Phi(x/\sigma) &\implies \frac{P(\{X \le x\} \cap A)}{P(A)} = \Phi(x/\sigma) = P(X \le x) \\
 &\implies P(\{X \le x\} \cap A) = P(X \le x)\,P(A)
\end{aligned}
\]
for any $A \in \mathcal{G}$ with $P(A) \ne 0$. This trivially also holds for $A \in \mathcal{G}$ with $P(A) = 0$ and so we may conclude that this holds for all $A \in \mathcal{G}$ which means that $X$ is independent of $\mathcal{G}$.

Using this, we can now discuss Lévy's theorem where, as before, we work with the minimal Wiener filtration.

Theorem 6.15 (Lévy's characterization of the Wiener process). Let $(X_t)_{t \le T}$ be a (continuous) $L^2$-martingale with $X_0 = 0$ and such that $X_t^2 - t$ is an $L^1$-martingale. Then $(X_t)$ is a Wiener process.

Proof. By the $L^2$-martingale representation theorem, there is $\beta \in K_T$ such that
\[ X_t = \int_0^t \beta(s,\omega)\,dW_s . \]
We have seen that $d(X_t^2) = dZ_t + \beta^2\,dt$ where $(Z_t)$ is a martingale and so by hypothesis and the uniqueness of the Doob-Meyer decomposition $d(X^2) = dZ + dA$, it follows that $\beta^2\,dt = dt$ (the increasing part is $A_t = t$). Let $f(t,x) = e^{i\theta x}$ so that $\partial_t f = 0$, $\partial_x f = i\theta\,e^{i\theta x}$ and $\partial_{xx} f = -\theta^2 e^{i\theta x}$ and apply Itô's formula to $f(t, X_t)$, where now $dX = \beta\,dW$, to get
\[ d(e^{i\theta X_t}) = -\tfrac12\,\theta^2 e^{i\theta X_t} \underbrace{dX_t\,dX_t}_{\beta^2 dt\,=\,dA_t\,=\,dt} + i\theta\,e^{i\theta X_t} \underbrace{dX_t}_{\beta\,dW_t} . \]

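So
\[ e^{i\theta X_t} - e^{i\theta X_s} = -\tfrac12\,\theta^2 \int_s^t e^{i\theta X_u}\,du + i\theta \int_s^t e^{i\theta X_u}\,\beta\,dW_u \]
and therefore
\[ e^{i\theta(X_t - X_s)} - 1 = -\tfrac12\,\theta^2 \int_s^t e^{i\theta(X_u - X_s)}\,du + i\theta \int_s^t e^{i\theta(X_u - X_s)}\,\beta\,dW_u . \tag{∗} \]
Let $Y_t = \int_0^t e^{i\theta X_u}\,\beta\,dW_u$. Then $(Y_t)$ is a martingale and the second term on the right hand side of equation (∗) is equal to $i\theta\,e^{-i\theta X_s}(Y_t - Y_s)$. Taking the conditional expectation with respect to $\mathcal{F}_s$, this term then drops out and we find that
\[ E(e^{i\theta(X_t - X_s)} \mid \mathcal{F}_s) - 1 = -\tfrac12\,\theta^2 \int_s^t E(e^{i\theta(X_u - X_s)} \mid \mathcal{F}_s)\,du . \]
Letting $\varphi(t) = E(e^{i\theta(X_t - X_s)} \mid \mathcal{F}_s)$, we have
\[
\begin{aligned}
\varphi(t) - 1 = -\tfrac12\,\theta^2 \int_s^t \varphi(u)\,du
 &\implies \varphi'(t) = -\tfrac12\,\theta^2 \varphi(t) \text{ and } \varphi(s) = 1 \\
 &\implies \varphi(t) = e^{-(t-s)\theta^2/2}
\end{aligned}
\]
for $t \ge s$. The result now follows from the proposition (applied to $X_t - X_s$ with $\mathcal{G} = \mathcal{F}_s$ and $\sigma^2 = t - s$).

Cameron-Martin, Girsanov change of measure

The process $W_t + \mu t$ is a Wiener process "with drift" and is not a martingale. However, it becomes a martingale if the underlying probability measure is changed. To discuss this, we consider the process
\[ X_t = W_t + \int_0^t \mu(s,\omega)\,ds \]
where $\mu$ is bounded and adapted on $[0,T]$. Let
\[ M_t = e^{-\int_0^t \mu\,dW - \frac12 \int_0^t \mu^2\,ds} . \]

Claim: $(M_t)_{t \le T}$ is a martingale.

Proof. Apply Itô's formula to $f(t, Y)$ where $f(t,x) = e^x$ and where $Y_t$ is the process $Y_t = -\int_0^t \mu\,dW - \tfrac12 \int_0^t \mu^2\,ds$ so $M_t = e^{Y_t}$ and $dY = -\mu\,dW - \tfrac12\,\mu^2\,ds$. We see that $\partial_t f = 0$ and $\partial_x f = \partial_{xx} f = e^x$ and therefore
\[
\begin{aligned}
dM = d(e^Y) &= 0 + \tfrac12\,e^Y\,dY\,dY + e^Y\,dY \\
 &= \tfrac12\,e^Y \mu^2\,ds + e^Y\big( -\mu\,dW - \tfrac12\,\mu^2\,ds \big) \\
 &= -\mu\,e^Y\,dW
\end{aligned}
\]
and so $M_t = e^{Y_t}$ is a martingale, as claimed.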

Claim: $(X_t M_t)_{t \le T}$ is a martingale.

Proof. Itô's product formula gives
\[
\begin{aligned}
d(XM) &= (dX)\,M + X\,dM + dX\,dM \\
 &= M(dW + \mu\,ds) + X(-\mu M\,dW) + (-\mu M)\,ds \\
 &= (M - \mu X M)\,dW
\end{aligned}
\]
and therefore
\[ X_t M_t = X_0 M_0 + \int_0^t (M - \mu X M)\,dW \]
is a martingale, as required.

Theorem 6.16. Let $Q$ be the measure on $\mathcal{F}$ given by $Q(A) = E(1_A M_T)$ for $A \in \mathcal{F}$. Then $Q$ is a probability measure and $(X_t)_{t \le T}$ is a martingale with respect to $Q$.

Proof. Since $Q$ is the map $A \mapsto Q(A) = \int_A M_T\,dP$ on $\mathcal{F}$, it is clearly a measure. Furthermore, $Q(\Omega) = E(M_T) = E(M_0)$ since $M_t$ is a martingale. But $E(M_0) = 1$ and so we see that $Q$ is a probability measure on $\mathcal{F}$. (Note, incidentally, that $Q(A) = 0$ if and only if $P(A) = 0$.)

We can write $Q(A) = \int_\Omega 1_A M_T\,dP$ and therefore $\int_\Omega f\,dQ = \int_\Omega f M_T\,dP$ for $Q$-integrable $f$ (symbolically, $dQ = M_T\,dP$).

To show that $X_t$ is a martingale with respect to $Q$, let $0 \le s \le t \le T$ and let $A \in \mathcal{F}_s$. Then, using the facts shown above that $M_t$ and $X_t M_t$ are martingales, we find that
\[
\begin{aligned}
\int_A X_t\,dQ &= \int_A X_t M_T\,dP \\
 &= \int_A E(X_t M_T \mid \mathcal{F}_t)\,dP , \quad \text{since } A \in \mathcal{F}_s \subset \mathcal{F}_t, \\
 &= \int_A X_t M_t\,dP \\
 &= \int_A X_s M_s\,dP \\
 &= \int_A E(X_s M_T \mid \mathcal{F}_s)\,dP \\
 &= \int_A X_s M_T\,dP \\
 &= \int_A X_s\,dQ
\end{aligned}
\]
and so $E_Q(X_t \mid \mathcal{F}_s) = X_s$ and the proof is complete.
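The change of measure in Theorem 6.16 can be checked numerically in the simplest case. The sketch below is an illustration, not part of the original notes. It assumes a constant drift $\mu$, so that $X_T = W_T + \mu T$ and $M_T = e^{-\mu W_T - \frac12\mu^2 T}$ depend only on $W_T \sim N(0,T)$, and it uses the relation $E_Q(X_T) = E(X_T M_T)$ coming from $dQ = M_T\,dP$. The function name and defaults are hypothetical.

```python
import math
import random


def girsanov_check(mu=0.5, t_end=1.0, n_samples=100000, seed=4):
    """Estimate E_P(X_T), E_P(M_T) and E_Q(X_T) = E_P(X_T M_T) for constant drift mu."""
    rng = random.Random(seed)
    sum_x = 0.0
    sum_m = 0.0
    sum_xm = 0.0
    for _ in range(n_samples):
        w = rng.gauss(0.0, math.sqrt(t_end))           # W_T sampled under P
        x = w + mu * t_end                             # X_T = W_T + mu T
        m = math.exp(-mu * w - 0.5 * mu * mu * t_end)  # M_T, the density of Q w.r.t. P
        sum_x += x
        sum_m += m
        sum_xm += x * m
    return sum_x / n_samples, sum_m / n_samples, sum_xm / n_samples
```

Under $P$ the first estimate should be close to $\mu T$ (the drift), the second close to 1 (so that $Q$ has total mass one), and the third close to $X_0 = 0$, as expected of a $Q$-martingale.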
