
Lawrence C. Evans
Department of Mathematics, UC Berkeley

Chapter 1: Introduction
Chapter 2: A crash course in basic probability theory
Chapter 3: Brownian motion and "white noise"
Chapter 4: Stochastic integrals, Itô's formula
Chapter 5: Stochastic differential equations
Chapter 6: Applications
Appendices
Exercises
References


PREFACE

These notes survey, without too many precise details, the basic theory of probability, random differential equations and some applications. Stochastic differential equations is usually, and justly, regarded as a graduate level subject. A really careful treatment assumes the students' familiarity with probability theory, measure theory, ordinary differential equations, and partial differential equations as well. But as an experiment I tried to design these lectures so that starting graduate students (and maybe really strong undergraduates) can follow most of the theory, at the cost of some omission of detail and precision. I for instance downplayed most measure theoretic issues, but did emphasize the intuitive idea of σ-algebras as "containing information". Similarly, I "prove" many formulas by confirming them in easy cases (for simple random variables or for step functions), and then just stating that by approximation these rules hold in general. I also did not reproduce in class some of the more complicated proofs provided in these notes, although I did try to explain the guiding ideas.

My thanks especially to Lisa Goldberg, who several years ago presented my class with several lectures on financial applications, and to Fraydoun Rezakhanlou, who has taught from these notes and added several improvements. I am also grateful to Jonathan Weare for several computer simulations illustrating the text. Thanks also to many readers who have found errors, especially Robert Piche, who provided me with an extensive list of typos and suggestions that I have incorporated into this latest version of the notes.


CHAPTER 1: INTRODUCTION

A. MOTIVATION

Fix a point x0 ∈ Rn and consider then the ordinary differential equation:

(ODE)   ẋ(t) = b(x(t))   (t > 0)
        x(0) = x0,

where b : Rn → Rn is a given, smooth vector field and the solution is the trajectory x(·) : [0, ∞) → Rn.

[Figure: trajectory of the differential equation, starting at x0]

Notation. x(t) is the state of the system at time t ≥ 0, and ẋ(t) := (d/dt) x(t).

In many applications, however, the experimentally measured trajectories of systems modeled by (ODE) do not in fact behave as predicted:

[Figure: sample path of the stochastic differential equation, starting at x0]

Hence it seems reasonable to modify (ODE), somehow to include the possibility of random effects disturbing the system. A formal way to do so is to write:

(1)   Ẋ(t) = b(X(t)) + B(X(t))ξ(t)   (t > 0)
      X(0) = x0,

where B : Rn → M^(n×m) (= space of n × m matrices) and ξ(·) := m-dimensional "white noise".

This approach presents us with these mathematical problems:
• Define the "white noise" ξ(·) in a rigorous way.
• Define what it means for X(·) to solve (1).
• Show (1) has a solution, discuss uniqueness, asymptotic behavior, dependence upon x0, b, B, etc.
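Before any of that rigorous machinery is in place, equation (1) can already be explored numerically. The sketch below is not from the notes: it is a minimal Euler-type scheme (usually called Euler-Maruyama) for the scalar case, in which the white-noise term over a small step dt is replaced by an independent Gaussian increment of variance dt. The function name and the sample drift are my own choices for illustration.

```python
import math
import random

def euler_maruyama(b, B, x0, T=1.0, n=1000, seed=0):
    """Simulate one sample path of the scalar equation
    dX = b(X) dt + B(X) dW, replacing "white noise times dt"
    by independent Gaussian increments dW ~ N(0, dt)."""
    rng = random.Random(seed)
    dt = T / n
    x = x0
    path = [x]
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t, t + dt]
        x = x + b(x) * dt + B(x) * dW
        path.append(x)
    return path

# illustrative choice: mean-reverting drift b(x) = -x, constant noise B(x) = 1
path = euler_maruyama(lambda x: -x, lambda x: 1.0, x0=2.0)
```

Rerunning with different seeds produces different jagged sample paths around the smooth solution of (ODE), which is exactly the behavior pictured above.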

B. SOME HEURISTICS

Let us first study (1) in the case m = n, x0 = 0, b ≡ 0, and B ≡ I. The solution of (1) in this setting turns out to be the n-dimensional Wiener process, or Brownian motion, denoted W(·). Thus we may symbolically write

Ẇ(·) = ξ(·),

thereby asserting that "white noise" is the time derivative of the Wiener process.

Now return to the general case of the equation (1), write d/dt instead of the dot:

dX(t)/dt = b(X(t)) + B(X(t)) dW(t)/dt,

and finally multiply by "dt":

(SDE)   dX(t) = b(X(t))dt + B(X(t))dW(t)
        X(0) = x0.

This expression, properly interpreted, is a stochastic differential equation. We say that X(·) solves (SDE) provided

(2)   X(t) = x0 + ∫_0^t b(X(s)) ds + ∫_0^t B(X(s)) dW   for all times t > 0.

Now we must:
• Construct W(·): See Chapter 3.
• Define the stochastic integral ∫_0^t ⋯ dW: See Chapter 4.
• Show (2) has a solution, etc.: See Chapter 5.

And once all this is accomplished, there will still remain these modeling problems:
• Does (SDE) truly model the physical situation?
• Is the term ξ(·) in (1) "really" white noise, or is it rather some ensemble of smooth, but highly oscillatory functions? See Chapter 6.

As we will see later these questions are subtle, and different answers can yield completely different solutions of (SDE). Part of the trouble is the strange form of the chain rule in the stochastic calculus:

C. ITÔ'S FORMULA

Assume n = 1 and X(·) solves the SDE

(3)   dX = b(X)dt + dW
      X(0) = x0.

Suppose next that u : R → R is a given smooth function. We ask: what stochastic differential equation does Y(t) := u(X(t)) (t ≥ 0) solve? Offhand, we would guess from (3) that dY = u′dX = u′b dt + u′dW, according to the usual chain rule.

Here ′ = d/dx. This is wrong, however! In fact, as we will see,

(4)   dW ≈ (dt)^(1/2) in some sense.

Consequently if we compute dY and keep all terms of order dt or (dt)^(1/2), we obtain

dY = u′dX + (1/2)u″(dX)² + ⋯
   = u′(b dt + dW) + (1/2)u″(b dt + dW)² + ⋯   from (3)
   = (u′b + (1/2)u″) dt + u′dW + {terms of order (dt)^(3/2) and higher}.

Here we used the "fact" that (dW)² = dt, which follows from (4). Hence

dY = (u′b + (1/2)u″) dt + u′dW,

with the extra term "(1/2)u″ dt" not present in ordinary calculus. A major goal of these notes is to provide a rigorous interpretation for calculations like these, involving stochastic differentials.

Example 1. According to Itô's formula, the solution of the stochastic differential equation

dY = Y dW
Y(0) = 1

is

Y(t) := e^(W(t) − t/2),

and not what might seem the obvious guess, namely Y(t) := e^(W(t)).

Example 2. Let P(t) denote the (random) price of a stock at time t ≥ 0. A standard model assumes that dP/P, the relative change of price, evolves according to the SDE

dP/P = μ dt + σ dW

for certain constants μ > 0 and σ, called respectively the drift and the volatility of the stock. In other words,

dP = μP dt + σP dW
P(0) = p0,

where p0 is the starting price. Using once again Itô's formula we can check that the solution is

P(t) = p0 e^(σW(t) + (μ − σ²/2)t).
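Example 1 can be checked numerically along a single simulated Brownian path: integrate dY = Y dW by a crude Euler scheme and compare the result with both e^(W(t) − t/2) and the naive guess e^(W(t)). This sketch is mine, not from the notes; the step size and function name are arbitrary choices.

```python
import math
import random

def ito_example_check(T=1.0, n=50_000, seed=1):
    """Integrate dY = Y dW by an Euler scheme along one Brownian path W,
    then compare with Example 1's solution exp(W(T) - T/2) and with the
    naive chain-rule guess exp(W(T))."""
    rng = random.Random(seed)
    dt = T / n
    W, Y = 0.0, 1.0
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))
        Y += Y * dW          # Euler step for dY = Y dW
        W += dW
    return Y, math.exp(W - T / 2), math.exp(W)

Y, ito_solution, naive_guess = ito_example_check()
```

The numerically integrated Y tracks the Itô solution, while the naive guess is off by the systematic factor e^(t/2), which is exactly the "(1/2)u″ dt" correction accumulated over [0, t].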

[Figure: a sample path for stock prices]

CHAPTER 2: A CRASH COURSE IN BASIC PROBABILITY THEORY

A. Basic definitions
B. Expected value, variance
C. Distribution functions
D. Independence
E. Borel–Cantelli Lemma
F. Characteristic functions
G. Strong Law of Large Numbers, Central Limit Theorem
H. Conditional expectation
I. Martingales

This chapter is a very rapid introduction to the measure theoretic foundations of probability theory. More details can be found in any good introductory text, for instance Bremaud [Br], Chung [C] or Lamperti [L1].

A. BASIC DEFINITIONS

Let us begin with a puzzle:

Bertrand's paradox. Take a circle of radius 2 inches in the plane and choose a chord of this circle at random. What is the probability this chord intersects the concentric circle of radius 1 inch?

Solution #1. Any such chord (provided it does not hit the center) is uniquely determined by the location of its midpoint. Thus

probability of hitting inner circle = (area of inner circle)/(area of larger circle) = 1/4.

Solution #2. By symmetry under rotation we may assume the chord is vertical. The diameter of the large circle is 4 inches and the chord will hit the small circle if it falls within its 2-inch diameter.

Hence

probability of hitting inner circle = (2 inches)/(4 inches) = 1/2.

Solution #3. By symmetry we may assume one end of the chord is at the far left point of the larger circle. The angle θ the chord makes with the horizontal lies between ±π/2, and the chord hits the inner circle if θ lies between ±π/6. Therefore

probability of hitting inner circle = (2π/6)/(2π/2) = 1/3.

PROBABILITY SPACES. This example shows that we must carefully define what we mean by the term "random". The correct way to do so is by introducing as follows the precise mathematical structure of a probability space.

We start with a nonempty set, denoted Ω, certain subsets of which we will in a moment interpret as being "events".

DEFINITION. A σ-algebra is a collection U of subsets of Ω with these properties:
(i) ∅, Ω ∈ U.
(ii) If A ∈ U, then Ac ∈ U.
(iii) If A1, A2, ⋯ ∈ U, then ∪_(k=1)^∞ Ak, ∩_(k=1)^∞ Ak ∈ U.

Here Ac := Ω − A is the complement of A.

DEFINITION. Let U be a σ-algebra of subsets of Ω. We call P : U → [0, 1] a probability measure provided:
(i) P(∅) = 0, P(Ω) = 1.
(ii) If A1, A2, ⋯ ∈ U, then

P(∪_(k=1)^∞ Ak) ≤ Σ_(k=1)^∞ P(Ak).

(iii) If A1, A2, ⋯ are disjoint sets in U, then

P(∪_(k=1)^∞ Ak) = Σ_(k=1)^∞ P(Ak).

It follows that if A, B ∈ U, then

A ⊆ B implies P(A) ≤ P(B).

DEFINITION. A triple (Ω, U, P) is called a probability space provided Ω is any set, U is a σ-algebra of subsets of Ω, and P is a probability measure on U.

Terminology. (i) A set A ∈ U is called an event; points ω ∈ Ω are sample points.
(ii) P(A) is the probability of the event A.
(iii) A property which is true except for an event of probability zero is said to hold almost surely (usually abbreviated "a.s.").

Example 1. Let Ω = {ω1, ω2, . . . , ωN} be a finite set, and suppose we are given numbers 0 ≤ pj ≤ 1 for j = 1, . . . , N, satisfying Σ pj = 1. We take U to comprise all subsets of Ω. For each set A = {ωj1, ωj2, . . . , ωjm} ∈ U, with 1 ≤ j1 < j2 < . . . < jm ≤ N, we define

P(A) := pj1 + pj2 + · · · + pjm.

Example 2. The smallest σ-algebra containing all the open subsets of Rn is called the Borel σ-algebra, denoted B. Assume that f is a nonnegative, integrable function, such that ∫_Rn f dx = 1. We define

P(B) := ∫_B f(x) dx   for each B ∈ B.

Then (Rn, B, P) is a probability space. We call f the density of the probability measure P.

Example 3. Here is another example. Suppose instead we fix a point z ∈ Rn, and now define

P(B) := 1 if z ∈ B, 0 if z ∉ B,

for sets B ∈ B. Then (Rn, B, P) is a probability space. We call P the Dirac mass concentrated at the point z, and write P = δz.

A probability space is the proper setting for mathematical probability theory. This means that we must first of all carefully identify an appropriate (Ω, U, P) when we try to solve problems. The reader should convince himself or herself that the three "solutions" to Bertrand's paradox discussed above represent three distinct interpretations of the phrase "at random", that is, three distinct models of (Ω, U, P).
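That the three interpretations really do give three different answers can be confirmed by simulation. The sketch below (my own, not from the notes) samples chords of the radius-2 circle under the three models of "at random" used in Solutions #1, #2 and #3.

```python
import math
import random

def bertrand(method, trials=200_000, seed=2):
    """Monte Carlo estimate of P(random chord of the radius-2 circle
    hits the concentric radius-1 circle), under three meanings of
    'random chord'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if method == "midpoint":
            # Solution #1: midpoint uniform in the disk; the chord hits
            # the inner circle iff its midpoint lies within radius 1
            hit = 2 * math.sqrt(rng.random()) <= 1
        elif method == "vertical":
            # Solution #2: vertical chord at a uniform horizontal position
            hit = abs(rng.uniform(-2, 2)) <= 1
        else:
            # Solution #3: one endpoint fixed, angle uniform in (-pi/2, pi/2)
            hit = abs(rng.uniform(-math.pi / 2, math.pi / 2)) <= math.pi / 6
        hits += hit
    return hits / trials
```

The three estimates cluster near 1/4, 1/2 and 1/3 respectively: same English sentence, three different probability spaces.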

Example 4 (Buffon's needle problem). The plane is ruled by parallel lines 2 inches apart and a 1-inch long needle is dropped at random on the plane. What is the probability that it hits one of the parallel lines?

The first issue is to find some appropriate probability space (Ω, U, P). For this, let

h = distance from the center of the needle to the nearest line,
θ = angle (≤ π/2) that the needle makes with the horizontal.

[Figure: the needle, with angle θ and distance h to the nearest line]

These fully determine the position of the needle, up to translations and reflection. Let us next take

Ω = [0, π/2) × [0, 1],
U = Borel subsets of Ω,
P(B) = (2 · area of B)/π for each B ∈ U.

We denote by A the event that the needle hits a horizontal line. We can now check that this happens provided h ≤ (sin θ)/2. Consequently A = {(θ, h) ∈ Ω | h ≤ (sin θ)/2}, and so

P(A) = 2(area of A)/π = (2/π) ∫_0^(π/2) (sin θ)/2 dθ = 1/π.

RANDOM VARIABLES. We can think of the probability space as being an essential mathematical construct, which is nevertheless not "directly observable". We are therefore interested in introducing mappings X from Ω to Rn, the values of which we can observe.

Remember from Example 2 above that B denotes the collection of Borel subsets of Rn, which is the smallest σ-algebra of subsets of Rn containing all open sets. We may henceforth informally just think of B as containing all the "nice, well-behaved" subsets of Rn.

DEFINITION. Let (Ω, U, P) be a probability space. A mapping X : Ω → Rn is called an n-dimensional random variable if for each B ∈ B, we have X−1(B) ∈ U. We equivalently say that X is U-measurable.
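The value 1/π in Example 4 can also be confirmed by sampling directly from the probability space (Ω, U, P) built above; the following sketch (mine, for illustration) draws (θ, h) uniformly from [0, π/2) × [0, 1] and counts crossings.

```python
import math
import random

def buffon(trials=200_000, seed=3):
    """Monte Carlo version of Example 4: sample (theta, h) uniformly
    from [0, pi/2) x [0, 1] and count how often the needle crosses a
    line, i.e. h <= sin(theta) / 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(0.0, math.pi / 2)
        h = rng.uniform(0.0, 1.0)
        if h <= math.sin(theta) / 2:
            hits += 1
    return hits / trials

estimate = buffon()   # close to 1/pi = 0.3183...
```

Historically this is run in reverse: counting needle crossings gives an experimental estimate of π.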

Notation. In these notes we will usually use capital letters to denote random variables. Boldface usually means a vector-valued mapping. We usually write "X" and not "X(ω)"; this follows the custom within probability theory of mostly not displaying the dependence of random variables on the sample point ω ∈ Ω. We also denote P(X−1(B)) as "P(X ∈ B)", the probability that X is in B.

Example 1. Let A ∈ U. Then the indicator function of A,

χA(ω) := 1 if ω ∈ A, 0 if ω ∉ A,

is a random variable.

Example 2. More generally, if A1, A2, . . . , Am ∈ U, with Ω = ∪_(i=1)^m Ai, and a1, a2, . . . , am are real numbers, then

X = Σ_(i=1)^m ai χAi

is a random variable, called a simple function.

LEMMA. Let X : Ω → Rn be a random variable. Then

U(X) := {X−1(B) | B ∈ B}

is a σ-algebra, called the σ-algebra generated by X. This is the smallest sub-σ-algebra of U with respect to which X is measurable.

Proof. Check that {X−1(B) | B ∈ B} is a σ-algebra; clearly it is the smallest σ-algebra with respect to which X is measurable. □

IMPORTANT REMARK. It is essential to understand that, in probabilistic terms, the σ-algebra U(X) can be interpreted as "containing all relevant information" about the random variable X.

In particular, if a random variable Y is a function of X, that is, if Y = Φ(X) for some reasonable function Φ, then Y is U(X)-measurable. Conversely, suppose Y : Ω → R is U(X)-measurable. Then there exists a function Φ such that Y = Φ(X); that is, Y is in fact a function of X. Consequently if we know the value X(ω), we in principle know also Y(ω) = Φ(X(ω)), although we may have no practical way to construct Φ.

We will also use without further comment various standard facts from measure theory, for instance that sums and products of random variables are random variables.

STOCHASTIC PROCESSES. We introduce next random variables depending upon time.

DEFINITIONS. (i) A collection {X(t) | t ≥ 0} of random variables is called a stochastic process.
(ii) For each point ω ∈ Ω, the mapping t → X(t, ω) is the corresponding sample path.

The idea is that if we run an experiment and observe the random values of X(·) as time evolves, we are in fact looking at a sample path {X(t, ω) | t ≥ 0} for some fixed ω ∈ Ω. If we rerun the experiment, we will in general observe a different sample path.

[Figure: two sample paths X(t, ω1), X(t, ω2) of a stochastic process]

B. EXPECTED VALUE, VARIANCE

Integration with respect to a measure. If (Ω, U, P) is a probability space and X = Σ_(i=1)^k ai χAi is a real-valued simple random variable, we define the integral of X by

∫_Ω X dP := Σ_(i=1)^k ai P(Ai).

If next X is a nonnegative random variable, we define

∫_Ω X dP := sup {∫_Ω Y dP | Y ≤ X, Y simple}.

Finally, if X : Ω → R is a random variable, we write

∫_Ω X dP := ∫_Ω X+ dP − ∫_Ω X− dP,

provided at least one of the integrals on the right is finite. Here X+ = max(X, 0) and X− = max(−X, 0), so that X = X+ − X−.

Next, suppose X : Ω → Rn is a vector-valued random variable, X = (X1, X2, . . . , Xn). Then we write

∫_Ω X dP = (∫_Ω X1 dP, . . . , ∫_Ω Xn dP).

We will assume without further comment the usual rules for these integrals.

DEFINITIONS. We call

E(X) := ∫_Ω X dP

the expected value (or mean value) of X, and

V(X) := ∫_Ω |X − E(X)|² dP

the variance of X, where | · | denotes the Euclidean norm. Observe that

V(X) = E(|X − E(X)|²) = E(|X|²) − |E(X)|².

LEMMA (Chebyshev's inequality). If X is a random variable and 1 ≤ p < ∞, then

P(|X| ≥ λ) ≤ (1/λ^p) E(|X|^p)   for all λ > 0.

Proof. We have

E(|X|^p) = ∫_Ω |X|^p dP ≥ ∫_{|X|≥λ} |X|^p dP ≥ λ^p P(|X| ≥ λ). □

C. DISTRIBUTION FUNCTIONS

Let (Ω, U, P) be a probability space and suppose X : Ω → Rn is a random variable.

Notation. Let x = (x1, . . . , xn) ∈ Rn, y = (y1, . . . , yn) ∈ Rn. Then x ≤ y means xi ≤ yi for i = 1, . . . , n.

DEFINITIONS. (i) The distribution function of X is the function FX : Rn → [0, 1] defined by

FX(x) := P(X ≤ x)   for all x ∈ Rn.

(ii) If X1, . . . , Xm : Ω → Rn are random variables, their joint distribution function is FX1,...,Xm : (Rn)^m → [0, 1],

FX1,...,Xm(x1, . . . , xm) := P(X1 ≤ x1, . . . , Xm ≤ xm)   for all xi ∈ Rn.
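Chebyshev's inequality is easy to see in action. The sketch below (my own illustration) draws standard normal samples and compares the empirical tail probability P(|X| ≥ λ) with the empirical moment bound E(|X|^p)/λ^p; the bound is valid but typically far from tight.

```python
import random

def chebyshev_check(lam=2.0, p=2, trials=100_000, seed=4):
    """Empirically compare P(|X| >= lam) with E(|X|^p) / lam^p for
    X standard normal (any random variable with a finite p-th moment
    would do)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(trials)]
    tail = sum(abs(x) >= lam for x in xs) / trials
    moment_bound = (sum(abs(x) ** p for x in xs) / trials) / lam ** p
    return tail, moment_bound

tail, bound = chebyshev_check()
```

For λ = 2 and p = 2 the bound is about E(X²)/4 = 1/4, while the true tail P(|X| ≥ 2) is about 0.046: correct, but loose by a factor of roughly five.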

[Figure: the random variable X mapping Ω into Rn]

DEFINITION. Suppose X : Ω → Rn is a random variable and F = FX its distribution function. If there exists a nonnegative, integrable function f : Rn → R such that

F(x) = F(x1, . . . , xn) = ∫_(−∞)^(x1) · · · ∫_(−∞)^(xn) f(y1, . . . , yn) dyn . . . dy1,

then f is called the density function for X. It follows then that

(1)   P(X ∈ B) = ∫_B f(x) dx   for all B ∈ B.

This formula is important, as the expression on the right hand side is an ordinary integral and can often be explicitly calculated.

Example 1. If X : Ω → R has density

f(x) = (1/√(2πσ²)) e^(−|x−m|²/(2σ²))   (x ∈ R),

we say X has a Gaussian (or normal) distribution, with mean m and variance σ². In this case let us write

X is an N(m, σ²) random variable.

Example 2. If X : Ω → Rn has density

f(x) = (1/((2π)^n det C)^(1/2)) e^(−(x−m)·C−1(x−m)/2)   (x ∈ Rn)

for some m ∈ Rn and some positive definite, symmetric matrix C, we say X has a Gaussian (or normal) distribution, with mean m and covariance matrix C. We then write

X is an N(m, C) random variable.

LEMMA. Let X : Ω → Rn be a random variable, and assume that its distribution function F = FX has the density f. Suppose g : Rn → R, and Y = g(X) is integrable. Then

E(Y) = ∫_Rn g(x) f(x) dx.

In particular,

E(X) = ∫_Rn x f(x) dx and V(X) = ∫_Rn |x − E(X)|² f(x) dx.

Proof. Suppose first g is a simple function on Rn:

g = Σ_(i=1)^m bi χBi   (Bi ∈ B).

Then

E(g(X)) = Σ_(i=1)^m bi ∫_Ω χBi(X) dP = Σ_(i=1)^m bi P(X ∈ Bi).

But also

∫_Rn g(x) f(x) dx = Σ_(i=1)^m bi ∫_Bi f(x) dx = Σ_(i=1)^m bi P(X ∈ Bi)   by (1).

Consequently the formula holds for all simple functions g and, by approximation, it holds therefore for general functions g. □

IMPORTANT REMARK. Hence we can compute E(X), V(X), etc. in terms of integrals over Rn. This is an important observation, since as mentioned before the probability space (Ω, U, P) is "unobservable": all that we "see" are the values X takes on in Rn. Indeed, all quantities of interest in probability theory can be computed in Rn in terms of the density f.

Example. If X is N(m, σ²), then

E(X) = (1/√(2πσ²)) ∫_(−∞)^∞ x e^(−(x−m)²/(2σ²)) dx = m

and

V(X) = (1/√(2πσ²)) ∫_(−∞)^∞ (x − m)² e^(−(x−m)²/(2σ²)) dx = σ².

Therefore m is indeed the mean, and σ² the variance.

D. INDEPENDENCE

MOTIVATION. Let (Ω, U, P) be a probability space, and let A, B ∈ U be two events, with P(B) > 0. We want to find a reasonable definition of

P(A | B), the probability of A, given B.

Think this way. Suppose some point ω ∈ Ω is selected "at random" and we are told ω ∈ B. What then is the probability that ω ∈ A also?

[Figure: the events A and B within Ω, with ω in their overlap]

Since we know ω ∈ B, we can regard B as being a new probability space. Therefore we can define Ω̃ := B, Ũ := {C ∩ B | C ∈ U} and P̃ := P/P(B), so that P̃(Ω̃) = 1. Then the probability that ω lies in A is

P̃(A ∩ B) = P(A ∩ B)/P(B).

This observation motivates the following

DEFINITION. We write

P(A | B) := P(A ∩ B)/P(B)   if P(B) > 0.

Now what should it mean to say "A and B are independent"? This should mean P(A | B) = P(A), since presumably any information that the event B has occurred is irrelevant in determining the probability that A has occurred. Thus

P(A) = P(A | B) = P(A ∩ B)/P(B),

and so P(A ∩ B) = P(A)P(B) if P(B) > 0. We take this for the definition, even if P(B) = 0:

DEFINITION. Two events A and B are called independent if

P(A ∩ B) = P(A)P(B).

To gain some insight, the reader may wish to check that if A and B are independent events, then so are Ac and B. Likewise, Ac and Bc are independent. This concept and its ramifications are the hallmarks of probability theory.

DEFINITION. Let A1, . . . , An, . . . be events. These events are independent if for all choices 1 ≤ k1 < k2 < · · · < km, we have

P(Ak1 ∩ Ak2 ∩ · · · ∩ Akm) = P(Ak1)P(Ak2) · · · P(Akm).

It is important to extend this definition to σ-algebras:

DEFINITION. Let Ui ⊆ U be σ-algebras, for i = 1, . . . . We say that {Ui}_(i=1)^∞ are independent if for all choices of 1 ≤ k1 < k2 < · · · < km and of events Aki ∈ Uki, we have

P(Ak1 ∩ Ak2 ∩ · · · ∩ Akm) = P(Ak1)P(Ak2) · · · P(Akm).

Lastly, we transfer our definitions to random variables:

DEFINITION. Let Xi : Ω → Rn be random variables (i = 1, . . . ). We say the random variables X1, . . . , Xn, . . . are independent if for all integers k ≥ 2 and all choices of Borel sets B1, . . . , Bk ⊆ Rn:

P(X1 ∈ B1, X2 ∈ B2, . . . , Xk ∈ Bk) = P(X1 ∈ B1)P(X2 ∈ B2) · · · P(Xk ∈ Bk).

This is equivalent to saying that the σ-algebras {U(Xi)}_(i=1)^∞ are independent.

Example. Take Ω = [0, 1), U the Borel subsets of [0, 1), and P Lebesgue measure. Define for n = 1, 2, . . . :

Xn(ω) := 1 if k/2^n ≤ ω < (k+1)/2^n, k even; −1 if k/2^n ≤ ω < (k+1)/2^n, k odd   (0 ≤ ω < 1).

These are the Rademacher functions, which we assert are in fact independent random variables. To prove this, it suffices to verify

P(X1 = e1, X2 = e2, . . . , Xk = ek) = P(X1 = e1)P(X2 = e2) · · · P(Xk = ek)

for all choices of e1, . . . , ek ∈ {−1, 1}. This can be checked by showing that both sides are equal to 2^(−k).

LEMMA. Let X1, . . . , Xm+n be independent Rk-valued random variables, and suppose f : (Rk)^n → R and g : (Rk)^m → R. Then

Y := f(X1, . . . , Xn) and Z := g(Xn+1, . . . , Xn+m)

are independent. We omit the proof, which may be found in Breiman [B].

THEOREM. The random variables X1, . . . , Xm : Ω → Rn are independent if and only if

(2)   FX1,...,Xm(x1, . . . , xm) = FX1(x1) · · · FXm(xm)   for all xi ∈ Rn, i = 1, . . . , m.

If the random variables have densities, (2) is equivalent to

(3)   fX1,...,Xm(x1, . . . , xm) = fX1(x1) · · · fXm(xm)   for all xi ∈ Rn, i = 1, . . . , m,

where the functions f are the appropriate densities.

Proof. 1. Assume first that {Xk}_(k=1)^m are independent. Then

FX1,...,Xm(x1, . . . , xm) = P(X1 ≤ x1, . . . , Xm ≤ xm) = P(X1 ≤ x1) · · · P(Xm ≤ xm) = FX1(x1) · · · FXm(xm).

2. We prove the converse statement for the case that all the random variables have densities. Select Ai ∈ U(Xi), i = 1, . . . , m. Then Ai = Xi−1(Bi) for some Bi ∈ B. Hence

P(A1 ∩ · · · ∩ Am) = P(X1 ∈ B1, . . . , Xm ∈ Bm)
  = ∫_(B1×···×Bm) fX1,...,Xm(x1, . . . , xm) dx1 · · · dxm
  = (∫_B1 fX1(x1) dx1) · · · (∫_Bm fXm(xm) dxm)   by (3)
  = P(X1 ∈ B1) · · · P(Xm ∈ Bm) = P(A1) · · · P(Am).

Therefore U(X1), . . . , U(Xm) are independent σ-algebras. □

One of the most important properties of independent random variables is this:

THEOREM. If X1, . . . , Xm are independent, real-valued random variables, with E(|Xi|) < ∞ (i = 1, . . . , m), then E(|X1 · · · Xm|) < ∞ and

E(X1 · · · Xm) = E(X1) · · · E(Xm).

Proof. Suppose that each Xi is bounded and has a density. Then

E(X1 · · · Xm) = ∫_(R^m) x1 · · · xm fX1,...,Xm(x1, . . . , xm) dx1 . . . dxm
  = (∫_R x1 fX1(x1) dx1) · · · (∫_R xm fXm(xm) dxm)   by (3)
  = E(X1) · · · E(Xm). □

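The claim above that both sides of the Rademacher identity equal 2^(−k) can be verified exactly, not just by sampling: each sign pattern (e1, . . . , ek) picks out exactly one dyadic interval of length 2^(−k). A small sketch (function names mine) that enumerates level-k dyadic intervals under Lebesgue measure on [0, 1):

```python
from itertools import product

def rademacher(n, omega):
    """X_n(omega) = +1 on [k/2^n, (k+1)/2^n) with k even, else -1."""
    k = int(omega * 2 ** n)   # index of the level-n dyadic interval containing omega
    return 1 if k % 2 == 0 else -1

def joint_prob(signs):
    """Exact P(X_1 = e_1, ..., X_k = e_k) under Lebesgue measure:
    each level-k dyadic interval has measure 2^(-k); count the
    intervals whose sign pattern matches, using their midpoints."""
    k = len(signs)
    count = 0
    for j in range(2 ** k):
        omega = (j + 0.5) / 2 ** k   # midpoint of the j-th interval
        if all(rademacher(n + 1, omega) == signs[n] for n in range(k)):
            count += 1
    return count / 2 ** k
```

Every pattern of k signs is matched by exactly one interval, so joint_prob returns 2^(−k) for each of the 2^k patterns, which is precisely the product of the marginals P(Xn = en) = 1/2.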
THEOREM. If X1, . . . , Xm are independent, real-valued random variables, with V(Xi) < ∞ (i = 1, . . . , m), then

V(X1 + · · · + Xm) = V(X1) + · · · + V(Xm).

Proof. Use induction, the case m = 2 holding as follows. Let m1 := E(X1), m2 := E(X2). Then E(X1 + X2) = m1 + m2 and

V(X1 + X2) = ∫_Ω (X1 + X2 − (m1 + m2))² dP
  = ∫_Ω (X1 − m1)² dP + ∫_Ω (X2 − m2)² dP + 2 ∫_Ω (X1 − m1)(X2 − m2) dP
  = V(X1) + V(X2) + 2 E(X1 − m1) E(X2 − m2),

where we used independence in the last step. Since E(X1 − m1) = 0 and E(X2 − m2) = 0, the cross term vanishes. □

E. BOREL–CANTELLI LEMMA

We introduce next a simple and very useful way to check if some sequence A1, . . . , An, . . . of events "occurs infinitely often".

DEFINITION. Let A1, . . . , An, . . . be events in a probability space. Then the event

∩_(n=1)^∞ ∪_(m=n)^∞ Am = {ω ∈ Ω | ω belongs to infinitely many of the An}

is called "An infinitely often", abbreviated "An i.o.".

BOREL–CANTELLI LEMMA. If Σ_(n=1)^∞ P(An) < ∞, then P(An i.o.) = 0.

Proof. By definition An i.o. = ∩_(n=1)^∞ ∪_(m=n)^∞ Am, and so for each n

P(An i.o.) ≤ P(∪_(m=n)^∞ Am) ≤ Σ_(m=n)^∞ P(Am).

The limit of the left-hand side is zero as n → ∞ because Σ P(Am) < ∞. □

APPLICATION. We illustrate a typical use of the Borel–Cantelli Lemma.

DEFINITION. A sequence of random variables {Xk}_(k=1)^∞ defined on some probability space converges in probability to a random variable X, provided

lim_(k→∞) P(|Xk − X| > ε) = 0   for each ε > 0.

THEOREM. If Xk → X in probability, then there exists a subsequence {Xkj}_(j=1)^∞ ⊂ {Xk}_(k=1)^∞ such that

Xkj(ω) → X(ω)   for almost every ω.

Proof. For each positive integer j we select kj so large that

P(|Xkj − X| > 1/j) ≤ 1/j²,

with kj−1 < kj < . . . and kj → ∞. Let Aj := {|Xkj − X| > 1/j}. Since Σ 1/j² < ∞, the Borel–Cantelli Lemma implies P(Aj i.o.) = 0. Therefore for almost all sample points ω,

|Xkj(ω) − X(ω)| ≤ 1/j provided j ≥ J,

for some index J depending on ω. □

F. CHARACTERISTIC FUNCTIONS

It is convenient to introduce next a clever integral transform, which will later provide us with a useful means to identify normal random variables.

DEFINITION. Let X be an Rn-valued random variable. Then

φX(λ) := E(e^(iλ·X))   (λ ∈ Rn)

is the characteristic function of X.

Example. If the real-valued random variable X is N(m, σ²), then

φX(λ) = e^(imλ − λ²σ²/2)   (λ ∈ R).

To see this, let us suppose that m = 0, σ = 1, and calculate

φX(λ) = ∫_(−∞)^∞ e^(iλx) (1/√(2π)) e^(−x²/2) dx = (e^(−λ²/2)/√(2π)) ∫_(−∞)^∞ e^(−(x−iλ)²/2) dx.

We move the path of integration in the complex plane from the line {Im(z) = −λ} to the real axis, and recall that ∫_(−∞)^∞ e^(−x²/2) dx = √(2π). (Here Im(z) means the imaginary part of the complex number z.) Hence φX(λ) = e^(−λ²/2).

LEMMA.
(i) If X1, . . . , Xm are independent random variables, then for each λ ∈ Rn

φ_(X1+···+Xm)(λ) = φX1(λ) · · · φXm(λ).

(ii) If X is a real-valued random variable, then

φ^((k))(0) = i^k E(X^k)   (k = 0, 1, . . . ).

(iii) If X and Y are random variables and φX(λ) = φY(λ) for all λ, then FX(x) = FY(x) for all x.

Proof. 1. Let us calculate

φ_(X1+···+Xm)(λ) = E(e^(iλ·(X1+···+Xm))) = E(e^(iλ·X1) e^(iλ·X2) · · · e^(iλ·Xm))
  = E(e^(iλ·X1)) · · · E(e^(iλ·Xm))   by independence
  = φX1(λ) · · · φXm(λ).

2. We have φ′(λ) = iE(Xe^(iλX)), and so φ′(0) = iE(X). The formulas in (ii) for k = 2, 3, . . . follow similarly.

3. See Breiman [B] for the proof of (iii). □

Assertion (iii) says the characteristic function of X determines the distribution of X.

Example. If X and Y are independent, real-valued random variables, and if X is N(m1, σ1²) and Y is N(m2, σ2²), then

X + Y is N(m1 + m2, σ1² + σ2²).

To see this, just calculate

φ_(X+Y)(λ) = φX(λ)φY(λ) = e^(im1λ − λ²σ1²/2) e^(im2λ − λ²σ2²/2) = e^(i(m1+m2)λ − λ²(σ1²+σ2²)/2).

G. STRONG LAW OF LARGE NUMBERS, CENTRAL LIMIT THEOREM

This section discusses a mathematical model for "repeated, independent experiments". The idea is this. Suppose we are given a probability space and on it a real-valued random variable X, which records the outcome of some sort of random experiment. We can model repetitions of this experiment by introducing a sequence of random variables X1, . . . , Xn, . . . , each of which "has the same probabilistic information as X":

DEFINITION. A sequence X1, . . . , Xn, . . . of random variables is called identically distributed if

FX1(x) = FX2(x) = · · · = FXn(x) = · · ·   for all x.

If we additionally assume that the random variables X1, . . . , Xn, . . . are independent, we can regard this sequence as a model for repeated and independent runs of the experiment, the outcomes of which we can measure. More precisely, imagine that a "random" sample point ω ∈ Ω is given and we can observe the sequence of values X1(ω), X2(ω), . . . , Xn(ω), . . . . What can we infer from these observations?

STRONG LAW OF LARGE NUMBERS. First, we can deduce the common expected value of the random variables.

THEOREM (Strong Law of Large Numbers). Let X1, . . . , Xn, . . . be a sequence of independent, identically distributed, integrable random variables defined on the same probability space. Write m := E(Xi) for i = 1, . . . . Then

P( lim_(n→∞) (X1 + · · · + Xn)/n = m ) = 1.

Proof. 1. Supposing that the random variables are real-valued entails no loss of generality. We will as well suppose for simplicity that

E(Xi^4) < ∞   (i = 1, . . . ).

We may also assume m = 0, as we could otherwise consider Xi − m in place of Xi.

2. If i is distinct from j, k and l, independence implies

E(Xi Xj Xk Xl) = E(Xi) E(Xj Xk Xl) = 0,

since E(Xi) = 0. Consequently, since the Xi are identically distributed,

E((Σ_(i=1)^n Xi)^4) = Σ_(i=1)^n E(Xi^4) + 3 Σ_(i≠j) E(Xi² Xj²) = n E(X1^4) + 3(n² − n)(E(X1²))² ≤ n²C

for some constant C.

3. Now fix ε > 0. Then, using the Chebyshev inequality,

P( |(1/n) Σ_(i=1)^n Xi| ≥ ε ) = P( |Σ_(i=1)^n Xi| ≥ εn ) ≤ (1/(εn)^4) E((Σ_(i=1)^n Xi)^4) ≤ C/(ε^4 n²).

4. Take ε = 1/k. By the Borel–Cantelli Lemma, therefore,

P( |(1/n) Σ_(i=1)^n Xi| ≥ 1/k i.o. ) = 0.

The foregoing says that

lim sup_(n→∞) |(1/n) Σ_(i=1)^n Xi(ω)| ≤ 1/k,

except possibly for ω lying in an event Bk, with P(Bk) = 0. Write B := ∪_(k=1)^∞ Bk. Then P(B) = 0 and

lim_(n→∞) (1/n) Σ_(i=1)^n Xi(ω) = 0   for each sample point ω ∉ B. □

FLUCTUATIONS. The Strong Law of Large Numbers says that for almost every sample point ω ∈ Ω,

(X1(ω) + · · · + Xn(ω))/n → m   as n → ∞.

We turn next to the Laplace–De Moivre Theorem, and its generalization the Central Limit Theorem, which estimate the "fluctuations" we can expect in this limit. Let us start with a simple calculation.

LEMMA. Suppose the real-valued random variables X1, . . . , Xn, . . . are independent and identically distributed, with

P(Xi = 1) = p, P(Xi = 0) = q,

for p, q ≥ 0, p + q = 1. Then

E(X1 + · · · + Xn) = np,
V(X1 + · · · + Xn) = npq.

We can imagine these random variables as modeling for example repeated tosses of a biased coin, which has probability p of coming up heads, and probability q = 1 − p of coming up tails.

Proof. We have E(X1) = ∫_Ω X1 dP = p, and

V(X1) = ∫_Ω (X1 − p)² dP = (1 − p)² P(X1 = 1) + p² P(X1 = 0) = q²p + p²q = qp.

Therefore E(X1 + · · · + Xn) = np, and by independence

V(X1 + · · · + Xn) = V(X1) + · · · + V(Xn) = npq. □

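The Strong Law is easy to watch happening for the coin-toss variables of the Lemma: the running sample means settle down to m = 1/2 along a single simulated sample point ω. This sketch is mine, for illustration only.

```python
import random

def running_means(trials, seed=5):
    """Running sample means (X_1 + ... + X_n)/n of i.i.d. fair-coin
    variables with P(X = 1) = P(X = 0) = 1/2; the Strong Law says these
    converge to m = 1/2 for almost every sample point."""
    rng = random.Random(seed)
    total = 0
    means = []
    for n in range(1, trials + 1):
        total += rng.randint(0, 1)   # one coin toss X_n(omega)
        means.append(total / n)
    return means

means = running_means(100_000)
```

Early means jump around; by n = 100,000 the mean sits within a few thousandths of 1/2, while (as the fluctuation discussion below makes precise) the unnormalized sum S_n - n/2 keeps wandering without bound.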
THEOREM (Laplace–De Moivre). Let X1, . . . , Xn, . . . be the independent, identically distributed, real-valued random variables of the preceding Lemma. Define the sums

Sn := X1 + · · · + Xn.

Then for all −∞ < a < b < +∞,

lim_(n→∞) P( a ≤ (Sn − np)/√(npq) ≤ b ) = (1/√(2π)) ∫_a^b e^(−x²/2) dx.

A proof is in Appendix A.

Interpretation of the Laplace–De Moivre Theorem. In view of the Lemma,

(Sn − np)/√(npq) = (Sn − E(Sn))/V(Sn)^(1/2).

Hence the Laplace–De Moivre Theorem says that the sums Sn, properly renormalized, have a distribution which tends to the Gaussian N(0, 1) as n → ∞.

Consider in particular the situation p = q = 1/2. If a > 0, then

lim_(n→∞) P( −(a√n)/2 ≤ Sn − n/2 ≤ (a√n)/2 ) = (1/√(2π)) ∫_(−a)^a e^(−x²/2) dx.

If instead we fix b > 0 and write a = 2b/√n, then for large n

P( −b ≤ Sn − n/2 ≤ b ) ≈ (1/√(2π)) ∫_(−2b/√n)^(2b/√n) e^(−x²/2) dx → 0   as n → ∞.

Thus for almost every ω, (1/n)Sn(ω) → 1/2, in accord with the Strong Law of Large Numbers; but Sn(ω) − n/2 "fluctuates" with probability 1 to exceed any finite bound b.

CENTRAL LIMIT THEOREM. We now generalize the Laplace–De Moivre Theorem:

THEOREM (Central Limit Theorem). Let X1, . . . , Xn, . . . be independent, identically distributed, real-valued random variables with

E(Xi) = m, V(Xi) = σ² > 0.

Set Sn := X1 + · · · + Xn. Then for all −∞ < a < b < +∞,

(1)   lim_(n→∞) P( a ≤ (Sn − nm)/(√n σ) ≤ b ) = (1/√(2π)) ∫_a^b e^(−x²/2) dx.
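The limit (1) can be seen numerically for a distribution with no Gaussian structure at all, say Uniform(0, 1), which has m = 1/2 and σ² = 1/12. The sketch below (my own illustration) estimates the left-hand side of (1) by Monte Carlo and evaluates the right-hand side with the error function.

```python
import math
import random

def clt_estimate(a, b, n=400, trials=10_000, seed=6):
    """Estimate P(a <= (S_n - n m)/(sqrt(n) sigma) <= b) for sums of
    i.i.d. Uniform(0,1) variables, with m = 1/2 and sigma^2 = 1/12."""
    rng = random.Random(seed)
    m, sigma = 0.5, math.sqrt(1 / 12)
    count = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        z = (s - n * m) / (math.sqrt(n) * sigma)
        count += (a <= z <= b)
    return count / trials

def gauss_integral(a, b):
    """(1/sqrt(2 pi)) * integral over [a, b] of exp(-x^2/2), via erf."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))

approx = clt_estimate(-1.0, 1.0)
exact = gauss_integral(-1.0, 1.0)
```

Already at n = 400 the empirical probability for [−1, 1] agrees with the Gaussian value 0.6827 to within sampling error.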

Thus the conclusion of the Laplace–De Moivre Theorem holds not only for the 0- or 1-valued random variables considered before, but for any sequence of independent, identically distributed random variables with finite variance. We will later invoke this assertion to motivate our requirement that Brownian motion be normally distributed for each time t ≥ 0.

Outline of Proof. For simplicity assume m = 0, σ = 1, since we can always rescale to this case. Then

φ_{S_n/√n}(λ) = ( φ_{X_1}(λ/√n) )^n  for λ ∈ R,

because the random variables are independent and identically distributed. Now φ = φ_{X_1} satisfies

φ(µ) = φ(0) + φ′(0)µ + (1/2)φ″(0)µ² + o(µ²)  as µ → 0,

with φ(0) = 1, φ′(0) = iE(X_1) = 0, φ″(0) = −E(X_1²) = −1. Consequently setting µ = λ/√n gives

φ_{X_1}(λ/√n) = 1 − λ²/(2n) + o(λ²/n),

and so

φ_{S_n/√n}(λ) = ( 1 − λ²/(2n) + o(λ²/n) )^n → e^{−λ²/2}  for all λ, as n → ∞.

But e^{−λ²/2} is the characteristic function of an N(0, 1) random variable. It turns out that this convergence of the characteristic functions implies the limit (1): see Breiman [B] for more. □

H. CONDITIONAL EXPECTATION

MOTIVATION. We earlier decided to define P(A | B), the probability of A, given the event B, to be P(A ∩ B)/P(B), provided P(B) > 0. How then should we define E(X | B), the expected value of the random variable X, given the event B? Remember that we can think of B as the new probability space, with P̃ := P/P(B). Thus if P(B) > 0, we should set

E(X | B) := mean value of X over B = (1/P(B)) ∫_B X dP.

Next we pose a more interesting question: what is a reasonable definition of E(X | Y), the expected value of the random variable X, given another random variable Y?

In other words, if "chance" selects a sample point ω ∈ Ω and all we know about ω is the value Y(ω), what is our best guess as to the value X(ω)? This turns out to be a subtle, but extremely important, issue, for which we provide two introductory discussions.

FIRST APPROACH TO CONDITIONAL EXPECTATION. We start with an example.

Example. Assume we are given a probability space (Ω, U, P), on which is defined a simple random variable Y. That is,

Y = Σ_{i=1}^m a_i χ_{A_i},

for distinct real numbers a_1, a_2, ..., a_m and disjoint events A_1, A_2, ..., A_m, each of positive probability, whose union is Ω; and so

Y = { a_1 on A_1, a_2 on A_2, ..., a_m on A_m }.

Next, let X be any other real-valued random variable on Ω. What is our best guess of X, given Y?

Think about the problem this way: if we know the value of Y(ω), we can tell which of the events A_1, A_2, ..., A_m contains ω. This, and only this, is known; our best estimate for X should then be the average value of X over each appropriate event. That is, we should take

E(X | Y) := { (1/P(A_1)) ∫_{A_1} X dP on A_1,
              (1/P(A_2)) ∫_{A_2} X dP on A_2,
              ...
              (1/P(A_m)) ∫_{A_m} X dP on A_m }.

We note for this example that
• E(X | Y) is a random variable, and not a constant;
• E(X | Y) is U(Y)-measurable;
• ∫_A X dP = ∫_A E(X | Y) dP for all A ∈ U(Y).

Let us take these properties as the definition in the general case:

DEFINITION. Let Y be a random variable. Then E(X | Y) is any U(Y)-measurable random variable such that

∫_A X dP = ∫_A E(X | Y) dP  for all A ∈ U(Y).

Finally, notice that it is not really the values of Y that are important, but rather just the σ-algebra it generates. This motivates the general definition below.
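The averaging recipe above can be carried out verbatim on a finite probability space. In the sketch below the sample space, the simple random variable Y and the random variable X are made-up toy data; the code builds E(X | Y) by averaging X over each level set {Y = a_i} and then checks the defining property ∫_A X dP = ∫_A E(X | Y) dP on each such event:

```python
# Conditional expectation given a simple random variable, on a toy space.
from collections import defaultdict

omega = list(range(6))                      # six sample points
P = {w: 1 / 6 for w in omega}               # uniform probability
Y = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3}    # a simple random variable
X = {0: 10, 1: 20, 2: 3, 3: 6, 4: 9, 5: 7}  # any other random variable

# group sample points into the events A_i = {Y = a_i}
levels = defaultdict(list)
for w in omega:
    levels[Y[w]].append(w)

# E(X | Y)(w) = (1/P(A_i)) * integral of X over A_i, on the event holding w
cond_exp = {}
for a, A in levels.items():
    pA = sum(P[w] for w in A)
    avg = sum(X[w] * P[w] for w in A) / pA
    for w in A:
        cond_exp[w] = avg

# defining property: integrals of X and E(X|Y) agree on every A in U(Y)
for A in levels.values():
    lhs = sum(X[w] * P[w] for w in A)
    rhs = sum(cond_exp[w] * P[w] for w in A)
    assert abs(lhs - rhs) < 1e-12

print(round(cond_exp[0], 6), round(cond_exp[2], 6))
```

On {Y = 1} the estimate is the average of 10 and 20; on {Y = 2} it is the average of 3, 6 and 9.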

DEFINITION. Let (Ω, U, P) be a probability space and suppose V is a σ-algebra, V ⊆ U. If X : Ω → R^n is an integrable random variable, we define E(X | V) to be any random variable on Ω such that

(i) E(X | V) is V-measurable, and
(ii) ∫_A X dP = ∫_A E(X | V) dP for all A ∈ V.

Interpretation. We can understand E(X | V) as follows. We are given the "information" available in a σ-algebra V, from which we intend to build an estimate of the random variable X. Condition (i) in the definition requires that E(X | V) be constructed from the information in V, and (ii) requires that our estimate be consistent with X, at least as regards integration over events in V.

THEOREM. Let X be an integrable random variable. Then for each σ-algebra V ⊆ U, the conditional expectation E(X | V) exists and is unique up to V-measurable sets of probability zero.

We omit the proof, which uses a few advanced concepts from measure theory.

Remark. We can check without difficulty that
(i) E(X | Y) = E(X | U(Y)),
(ii) E(E(X | V)) = E(X),
(iii) E(X) = E(X | W), where W = {∅, Ω} is the trivial σ-algebra.

We will later see that the conditional expectation E(X | V), so defined, has various additional nice properties.

SECOND APPROACH TO CONDITIONAL EXPECTATION. An elegant alternative approach to conditional expectations is based upon projections onto closed subspaces, and is motivated by this example:

Least squares method. Consider for the moment R^n, and suppose that V is a proper subspace. Suppose we are given a vector x ∈ R^n. The least squares problem asks us to find a vector z ∈ V so that

|z − x| = min_{y ∈ V} |y − x|.

It is not particularly difficult to show that, given x, there exists a unique vector z ∈ V solving this minimization problem. We call z the projection of x onto V:

(7)  z = proj_V(x).

Now we want to find a formula characterizing z. For this take any other vector w ∈ V, and define

i(τ) := |z + τw − x|².

Since z + τw ∈ V for all τ, we see that the function i(·) has a minimum at τ = 0. Hence

0 = i′(0) = 2(z − x)·w;

that is,

(8)  x·w = z·w  for all w ∈ V.

The geometric interpretation is that the "error" x − z is perpendicular to the subspace V.

Projection of random variables. Motivated by the example above, we return now to conditional expectation. Let us take the linear space L²(Ω) = L²(Ω, U), which consists of all real-valued, U-measurable random variables Y such that

||Y|| := ( ∫_Ω Y² dP )^{1/2} < ∞.

We call ||Y|| the norm of Y; and if X, Y ∈ L²(Ω), we define their inner product to be

(X, Y) := ∫_Ω XY dP = E(XY).

Next, take as before V to be a σ-algebra contained in U. Consider then

V := L²(Ω, V),

the space of square-integrable random variables that are V-measurable. This is a closed subspace of L²(Ω). Consequently if X ∈ L²(Ω), we can define its projection

(9)  Z = proj_V(X),

by analogy with (7) in the finite dimensional case. Almost exactly as we established (8) above, we can likewise show

(X, W) = (Z, W)  for all W ∈ V.

Take in particular W = χ_A for any set A ∈ V. In view of the definition of the inner product, it follows that

∫_A X dP = ∫_A Z dP  for all A ∈ V.
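The finite-dimensional least squares computation (7)–(8) can be illustrated directly. In this sketch the subspace V ⊂ R³ and the vector x are arbitrary examples; the code forms the projection z and verifies the orthogonality relation x·w = z·w for several w ∈ V:

```python
# Projection onto a subspace of R^3 and the orthogonality relation (8).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# V = span{v1, v2}: the (x1, x2)-plane, with an orthonormal basis
v1 = (1.0, 0.0, 0.0)
v2 = (0.0, 1.0, 0.0)
x = (3.0, -2.0, 5.0)

# for an orthonormal basis, proj_V(x) = (x.v1) v1 + (x.v2) v2
z = tuple(dot(x, v1) * a + dot(x, v2) * b for a, b in zip(v1, v2))

# (8): the error x - z is perpendicular to V
for w in (v1, v2, tuple(a + b for a, b in zip(v1, v2))):
    assert abs(dot(x, w) - dot(z, w)) < 1e-12

print(z)   # the projection (3.0, -2.0, 0.0)
```

The same orthogonality characterization, with the inner product E(XY), is what identifies the projection in L²(Ω, V) as the conditional expectation.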

Returning to the projection Z = proj_V(X): since Z ∈ V is V-measurable, we see that Z is in fact E(X | V), as defined in the earlier discussion. That is,

E(X | V) = proj_V(X).

We could therefore alternatively take this last identity as a definition of conditional expectation. This point of view also makes it clear that Z = E(X | V) solves the least squares problem

||Z − X|| = min_{Y ∈ V} ||Y − X||;

and so E(X | V) can be interpreted as that V-measurable random variable which is the best least squares approximation of the random variable X.

The two introductory discussions now completed, we turn next to examining conditional expectation more closely.

THEOREM (Properties of conditional expectation).
(i) If X is V-measurable, then E(X | V) = X a.s.
(ii) If a, b are constants, E(aX + bY | V) = aE(X | V) + bE(Y | V) a.s.
(iii) If X is V-measurable and XY is integrable, then E(XY | V) = XE(Y | V) a.s.
(iv) If X is independent of V, then E(X | V) = E(X) a.s.
(v) If W ⊆ V, we have E(X | W) = E(E(X | V) | W) = E(E(X | W) | V) a.s.
(vi) The inequality X ≤ Y a.s. implies E(X | V) ≤ E(Y | V) a.s.

Proof. 1. Statement (i) is obvious, and (ii) is easy to check.

2. By uniqueness a.s. of E(XY | V), it is enough in proving (iii) to show

(10)  ∫_A XE(Y | V) dP = ∫_A XY dP  for all A ∈ V.

First suppose X = Σ_{i=1}^m b_i χ_{B_i}, where B_i ∈ V for i = 1, ..., m. Then

∫_A XE(Y | V) dP = Σ_{i=1}^m b_i ∫_{A∩B_i} E(Y | V) dP   (A ∩ B_i ∈ V)
  = Σ_{i=1}^m b_i ∫_{A∩B_i} Y dP = ∫_A XY dP.

This proves (10) if X is a simple function. The general case follows by approximation.

3. To show (iv), it suffices to prove ∫_A E(X) dP = ∫_A X dP for all A ∈ V. Let us compute:

∫_A X dP = ∫_Ω χ_A X dP = E(χ_A X) = E(X)P(A) = ∫_A E(X) dP,

the third equality owing to independence.

4. Assume W ⊆ V and let A ∈ W. Then

∫_A E(E(X | V) | W) dP = ∫_A E(X | V) dP = ∫_A X dP,

since A ∈ W ⊆ V. Thus E(X | W) = E(E(X | V) | W) a.s. Furthermore, assertion (i) implies that E(E(X | W) | V) = E(X | W), since E(X | W) is W-measurable and so also V-measurable. This establishes assertion (v).

5. Finally, suppose X ≤ Y, and note that

∫_A E(Y | V) − E(X | V) dP = ∫_A E(Y − X | V) dP = ∫_A Y − X dP ≥ 0

for all A ∈ V. Take A := {E(Y | V) − E(X | V) < 0}. This event lies in V, and we deduce from the previous inequality that P(A) = 0. □

LEMMA (Conditional Jensen's Inequality). Suppose Φ : R → R is convex, with E(|Φ(X)|) < ∞. Then

Φ(E(X | V)) ≤ E(Φ(X) | V).

We leave the proof as an exercise.

I. MARTINGALES

MOTIVATION. Suppose Y_1, Y_2, ... are independent real-valued random variables, with E(Y_i) = 0 (i = 1, 2, ...). Define the sums S_n := Y_1 + ... + Y_n. What is our best guess of S_{n+k}, given the values of S_1, ..., S_n? The answer is

(11)  E(S_{n+k} | S_1, ..., S_n) = E(Y_1 + ... + Y_n | S_1, ..., S_n) + E(Y_{n+1} + ... + Y_{n+k} | S_1, ..., S_n)
  = Y_1 + ... + Y_n + E(Y_{n+1} + ... + Y_{n+k}) = S_n,

since E(Y_{n+1} + ... + Y_{n+k}) = 0. Thus the best estimate of the "future value" of S_{n+k}, given the history up to time n, is just S_n.

If we interpret Y_i as the payoff of a "fair" gambling game at time i, and therefore S_n as the total winnings at time n, the calculation above says that at any time one's

future expected winnings, given the winnings to date, is just the current amount of money. So the formula (11) characterizes a "fair" game.

We incorporate these ideas into a formal definition:

DEFINITION. Let X_1, ..., X_n, ... be a sequence of real-valued random variables, with E(|X_i|) < ∞ (i = 1, 2, ...). If

X_k = E(X_j | X_1, ..., X_k)  a.s. for all j ≥ k,

we call {X_i}_{i=1}^∞ a (discrete) martingale.

DEFINITION. Let X(·) be a real-valued stochastic process. Then

U(t) := U(X(s) | 0 ≤ s ≤ t),

the σ-algebra generated by the random variables X(s) for 0 ≤ s ≤ t, is called the history of the process until (and including) time t ≥ 0.

DEFINITIONS. Let X(·) be a stochastic process, such that E(|X(t)|) < ∞ for all t ≥ 0.
(i) If

X(s) = E(X(t) | U(s))  a.s. for all t ≥ s ≥ 0,

then X(·) is called a martingale.
(ii) If

X(s) ≤ E(X(t) | U(s))  a.s. for all t ≥ s ≥ 0,

then X(·) is a submartingale.

Example. Let W(·) be a 1-dimensional Wiener process, as defined later in Chapter 3. Then W(·) is a martingale. To see this, write W(s) := U(W(r) | 0 ≤ r ≤ s), and let t ≥ s. Then

E(W(t) | W(s)) = E(W(t) − W(s) | W(s)) + E(W(s) | W(s))
  = E(W(t) − W(s)) + W(s) = W(s)  a.s.

(The reader should refer back to this calculation after reading Chapter 3.)

LEMMA. Suppose X(·) is a real-valued martingale and Φ : R → R is convex. Then if E(|Φ(X(t))|) < ∞ for all t ≥ 0, Φ(X(·)) is a submartingale.

We omit the proof, which uses Jensen's inequality.

Martingales are important in probability theory mainly because they admit the following powerful estimates:
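The computation (11) can be checked by simulation. Below, a fair ±1 random walk plays the role of S_n: among all trials sharing the same value of S_n, the empirical average of S_{n+k} should be close to that common value. The walk length, horizon and trial count are illustrative choices:

```python
# Empirical check of the martingale property E(S_{n+k} | S_n) = S_n
# for a fair +/-1 random walk.
import random
from collections import defaultdict

random.seed(1)
n, k, trials = 20, 10, 100000

groups = defaultdict(list)
for _ in range(trials):
    steps = [random.choice((-1, 1)) for _ in range(n + k)]
    s_n = sum(steps[:n])              # winnings at time n
    groups[s_n].append(sum(steps))    # winnings at time n + k

checked = 0
for s_n, futures in groups.items():
    if len(futures) > 5000:           # only well-populated groups
        est = sum(futures) / len(futures)
        assert abs(est - s_n) < 0.15  # future average = current winnings
        checked += 1
print("groups checked:", checked)
```

The per-group tolerance 0.15 is a few standard errors for these sample sizes.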

THEOREM (Discrete martingale inequalities).
(i) If {X_n}_{n=1}^∞ is a submartingale, then

P( max_{1≤k≤n} X_k ≥ λ ) ≤ (1/λ) E(X_n^+)

for all n = 1, 2, ... and λ > 0.
(ii) If {X_n}_{n=1}^∞ is a martingale and 1 < p < ∞, then

E( max_{1≤k≤n} |X_k|^p ) ≤ ( p/(p−1) )^p E(|X_n|^p)

for all n = 1, 2, ... .

A proof is provided in Appendix B. Notice that (i) is a generalization of the Chebyshev inequality. We can also extend these estimates to continuous-time martingales:

THEOREM (Martingale inequalities). Let X(·) be a stochastic process with continuous sample paths a.s.
(i) If X(·) is a submartingale, then

P( max_{0≤s≤t} X(s) ≥ λ ) ≤ (1/λ) E(X(t)^+)

for all λ > 0 and t ≥ 0.
(ii) If X(·) is a martingale and 1 < p < ∞, then

E( max_{0≤s≤t} |X(s)|^p ) ≤ ( p/(p−1) )^p E(|X(t)|^p)

for all t ≥ 0.

Outline of Proof. Choose λ > 0, t > 0 and select 0 = t_0 < t_1 < ... < t_n = t. We check that {X(t_i)}_{i=1}^n is a (sub)martingale and apply the discrete martingale inequality. Next choose a finer and finer partition of [0, t] and pass to limits. The proof of assertion (ii) is similar. □

CHAPTER 3: BROWNIAN MOTION AND "WHITE NOISE"

A. Motivation and definitions
B. Construction of Brownian motion
C. Sample paths
D. Markov property

A. MOTIVATION AND DEFINITIONS

SOME HISTORY. R. Brown in 1826-27 observed the irregular motion of pollen particles suspended in water. He and others noted that
• the path of a given particle is very irregular, having a tangent at no point, and
• the motions of two distinct particles appear to be independent.

In 1900 L. Bachelier attempted to describe fluctuations in stock prices mathematically, and essentially discovered first certain results later rederived and extended by A. Einstein in 1905.

Einstein studied the Brownian phenomena this way. Let us consider a long, thin tube filled with clear water, into which we inject at time t = 0 a unit amount of ink, at the location x = 0. Now let f(x, t) denote the density of ink particles at position x ∈ R and time t ≥ 0. Initially we have f(x, 0) = δ_0, the unit mass at 0. Next, suppose that the probability density of the event that an ink particle moves from x to x + y in (small) time τ is ρ(τ, y). Then

(1)  f(x, t + τ) = ∫_{−∞}^{∞} f(x − y, t) ρ(τ, y) dy
  = ∫_{−∞}^{∞} ( f − f_x y + (1/2) f_xx y² + ... ) ρ(τ, y) dy.

But since ρ is a probability density, ∫_{−∞}^{∞} ρ dy = 1; whereas ρ(τ, −y) = ρ(τ, y) by symmetry, and consequently ∫_{−∞}^{∞} yρ dy = 0. We further assume that ∫_{−∞}^{∞} y²ρ dy, the variance of ρ, is linear in τ:

∫_{−∞}^{∞} y²ρ dy = Dτ,  D > 0.

We insert these identities into (1), thereby to obtain

( f(x, t + τ) − f(x, t) )/τ = (D/2) f_xx(x, t) + {higher order terms}.

Sending now τ → 0, we discover

f_t = (D/2) f_xx.

This is the diffusion equation, also known as the heat equation. This partial differential equation, with the initial condition f(x, 0) = δ_0, has the solution

f(x, t) = (1/(2πDt)^{1/2}) e^{−x²/(2Dt)}.

This says the probability density at time t is N(0, Dt), for some constant D.

In fact, Einstein computed

D = RT/(N_A f),  where R = gas constant, T = absolute temperature, f = friction coefficient, N_A = Avogadro's number.
This equation and the observed properties of Brownian motion allowed J. Perrin to compute NA (≈ 6 × 1023 = the number of molecules in a mole) and help to conﬁrm the atomic theory of matter. N. Wiener in the 1920’s (and later) put the theory on a ﬁrm mathematical basis. His ideas are at the heart of the mathematics in §B–D below. RANDOM WALKS. A variant of Einstein’s argument follows. We introduce a 2-dimensional rectangular lattice, comprising the sites {(m∆x, n∆t) | m = 0, ±1, ±2, . . . ; n = 0, 1, 2, . . . }. Consider a particle starting at x = 0 and time t = 0, and at each time n∆t moves to the left an amount ∆x with probability 1/2, to the right an amount ∆x with probability 1/2. Let p(m, n) denote the probability that the particle is at position m∆x at time n∆t. Then p(m, 0) = Also p(m, n + 1) = and hence p(m, n + 1) − p(m, n) = Now assume (∆x)2 =D ∆t for some positive constant D. 1 (p(m + 1, n) − 2p(m, n) + p(m − 1, n)). 2 0 1 m=0 m = 0.

1 1 p(m − 1, n) + p(m + 1, n), 2 2

This implies D p(m, n + 1) − p(m, n) = ∆t 2 p(m + 1, n) − 2p(m, n) + p(m − 1, n) (∆x)2

2

.

Let ∆t → 0, ∆x → 0, m∆x → x, n∆t → t, with (∆x) ≡ D. Then presumably ∆t p(m, n) → f (x, t), which we now interpret as the probability density that particle 34

is at x at time t. The above diﬀerence equation becomes formally in the limit D fxx , 2 and so we arrive at the diﬀusion equation again. ft = MATHEMATICAL JUSTIFICATION. A more careful study of this technique of passing to limits with random walks on a lattice depends upon the Laplace– De Moivre Theorem. As above we assume the particle moves to the left or right a distance ∆x with probability 1/2. Let X(t) denote the position of particle at time t = n∆t (n = 0, . . . ). Deﬁne

n

Sn :=

i=1

Xi ,

where the Xi are independent random variables such that P (Xi = 0) = 1/2 P (Xi = 1) = 1/2 for i = 1, . . . . Then V (Xi ) = 1 . 4 Now Sn is the number of moves to the right by time t = n∆t. Consequently X(t) = Sn ∆x + (n − Sn )(−∆x) = (2Sn − n)∆x. Note also V (X(t)) = (∆x)2 V (2Sn − n) = (∆x)2 n = Again assume

(∆x)2 ∆t

= (∆x)2 4V (Sn ) = (∆x)2 4nV (X1 ) (∆x)2 t. ∆t

= D. Then Sn −

n 2 n 4

X(t) = (2Sn − n)∆x =

√

n∆x =

Sn −

n 2

n 4

√

tD.

**The Laplace–De Moivre Theorem thus implies lim n→∞
**

t=n∆t,

(∆x)2 ∆t

=D

P (a ≤ X(t) ≤ b) = lim

n→∞

Sn − n a b 2 √ ≤ ≤ √ n tD tD 4

√b tD √a tD

1 =√ 2π =√

e−

x2 2

dx

1 2πDt

b a

e− 2Dt dx.

x2

Once again, and rigorously this time, we obtain the N (0, Dt) distribution. 35
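The scaled walk X(t) = (2S_n − n)∆x is easy to simulate and compare with the predicted N(0, Dt) statistics. In the sketch below D, ∆t, the final time and the trial count are illustrative choices; the grid spacing is fixed by the relation (∆x)²/∆t = D from the text:

```python
# Simulating X(t) = (2 S_n - n) dx and checking mean 0, variance D*t.
import math
import random

random.seed(2)
D, dt = 1.0, 2.5e-3
dx = math.sqrt(D * dt)            # enforces (dx)^2 / dt = D
t = 1.0
n = int(round(t / dt))            # number of steps up to time t
trials = 4000

xs = []
for _ in range(trials):
    s_n = sum(random.random() < 0.5 for _ in range(n))   # moves to the right
    xs.append((2 * s_n - n) * dx)                        # X(t)

mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials
print(round(mean, 3), round(var, 3))                     # should be near 0 and D*t
```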

Inspired by all these considerations, we now introduce Brownian motion, for which we take D = 1:

DEFINITION. A real-valued stochastic process W(·) is called a Brownian motion or Wiener process if
(i) W(0) = 0 a.s.,
(ii) W(t) − W(s) is N(0, t − s) for all t ≥ s ≥ 0,
(iii) for all times 0 < t_1 < t_2 < ... < t_n, the random variables W(t_1), W(t_2) − W(t_1), ..., W(t_n) − W(t_{n−1}) are independent ("independent increments").

Notice in particular that

E(W(t)) = 0,  E(W²(t)) = t  for each time t ≥ 0,

since W(t) is N(0, t). The Central Limit Theorem provides some further motivation for our definition of Brownian motion, since we can expect that any suitably scaled sum of independent, random disturbances affecting the position of a moving particle will result in a Gaussian distribution.

B. CONSTRUCTION OF BROWNIAN MOTION

COMPUTATION OF JOINT PROBABILITIES. From the definition we know that if W(·) is a Brownian motion, then for all t > 0 and a ≤ b,

P( a ≤ W(t) ≤ b ) = (1/√(2πt)) ∫_a^b e^{−x²/(2t)} dx.

Suppose we now choose times 0 < t_1 < ... < t_n and real numbers a_i ≤ b_i, for i = 1, ..., n. What is the joint probability

P( a_1 ≤ W(t_1) ≤ b_1, ..., a_n ≤ W(t_n) ≤ b_n )?

In other words, what is the probability that a sample path of Brownian motion takes values between a_i and b_i at time t_i for each i = 1, ..., n?

[Figure omitted: a sample path threading the intervals [a_i, b_i] at times t_1 < ... < t_5.]

We can guess the answer as follows. We know

P( a_1 ≤ W(t_1) ≤ b_1 ) = ∫_{a_1}^{b_1} (1/√(2πt_1)) e^{−x_1²/(2t_1)} dx_1.

Given that W(t_1) = x_1, a_1 ≤ x_1 ≤ b_1, then presumably the process is N(x_1, t_2 − t_1) on the interval [t_1, t_2]. Thus the probability that a_2 ≤ W(t_2) ≤ b_2, given that W(t_1) = x_1, should equal

∫_{a_2}^{b_2} (1/√(2π(t_2 − t_1))) e^{−|x_2 − x_1|²/(2(t_2 − t_1))} dx_2.

Hence it should be that

P( a_1 ≤ W(t_1) ≤ b_1, a_2 ≤ W(t_2) ≤ b_2 ) = ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} g(x_1, t_1 | 0) g(x_2, t_2 − t_1 | x_1) dx_2 dx_1

for

g(x, t | y) := (1/√(2πt)) e^{−(x − y)²/(2t)}.

In general, we would therefore guess that

(2)  P( a_1 ≤ W(t_1) ≤ b_1, ..., a_n ≤ W(t_n) ≤ b_n )
  = ∫_{a_1}^{b_1} ... ∫_{a_n}^{b_n} g(x_1, t_1 | 0) g(x_2, t_2 − t_1 | x_1) ... g(x_n, t_n − t_{n−1} | x_{n−1}) dx_n ... dx_1.

The next assertion confirms and extends this formula.

THEOREM. Let W(·) be a one-dimensional Wiener process. Then for all positive integers n, all choices of times 0 = t_0 < t_1 < ... < t_n, and each function f : R^n → R, we have

E f(W(t_1), ..., W(t_n)) = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x_1, ..., x_n) g(x_1, t_1 | 0) g(x_2, t_2 − t_1 | x_1) ... g(x_n, t_n − t_{n−1} | x_{n−1}) dx_n ... dx_1.

Taking f(x_1, ..., x_n) = χ_{[a_1,b_1]}(x_1) ... χ_{[a_n,b_n]}(x_n) gives (2).

Proof. Let us write X_i := W(t_i) and Y_i := X_i − X_{i−1} for i = 1, ..., n. We also define

h(y_1, y_2, ..., y_n) := f(y_1, y_1 + y_2, ..., y_1 + ... + y_n).

Then

E f(W(t_1), ..., W(t_n)) = E h(Y_1, ..., Y_n)
  = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} h(y_1, ..., y_n) g(y_1, t_1 | 0) g(y_2, t_2 − t_1 | 0) ... g(y_n, t_n − t_{n−1} | 0) dy_n ... dy_1
  = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x_1, ..., x_n) g(x_1, t_1 | 0) g(x_2, t_2 − t_1 | x_1) ... g(x_n, t_n − t_{n−1} | x_{n−1}) dx_n ... dx_1.

For the second equality we recalled that the random variables Y_i = W(t_i) − W(t_{i−1}) are independent for i = 1, ..., n, and that each Y_i is N(0, t_i − t_{i−1}). We also changed variables using the identities y_i = x_i − x_{i−1} for i = 1, ..., n and x_0 = 0; the Jacobian for this change of variables equals 1. □

BUILDING A ONE-DIMENSIONAL WIENER PROCESS. The main issue now is to demonstrate that a Brownian motion actually exists.

HEURISTICS. Remember from Chapter 1 that the formal time-derivative

Ẇ(t) = dW(t)/dt = ξ(t)

is "1-dimensional white noise". As we will see later however, for a.e. ω the sample path t → W(t, ω) is in fact differentiable for no time t ≥ 0. Thus W(t) = ξ(t) does not really exist.

Our method will be to develop a formal expansion of white noise ξ(·) in terms of a cleverly selected orthonormal basis of L²(0, 1), the space of all real-valued, square-integrable functions defined on (0, 1). We will then integrate the resulting expression in time, show that this series converges, and prove then that we have built a Wiener process. This procedure is a form of "wavelet analysis": see Pinsky [P].

We start with an easy lemma.

LEMMA. Suppose W(·) is a one-dimensional Brownian motion. Then

E(W(t)W(s)) = t ∧ s = min{s, t}  for t, s ≥ 0.

Proof. Assume t ≥ s ≥ 0. Then

E(W(t)W(s)) = E( (W(s) + W(t) − W(s)) W(s) )
  = E(W²(s)) + E( (W(t) − W(s)) W(s) )
  = s + E(W(t) − W(s)) E(W(s)) = s = t ∧ s,

since W(s) is N(0, s) and W(t) − W(s) is independent of W(s), with E(W(t) − W(s)) = E(W(s)) = 0. □
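The covariance formula E(W(t)W(s)) = min(t, s) can be checked by Monte Carlo, sampling W directly from the independent-increments definition. The times and trial count below are illustrative:

```python
# Monte-Carlo check of E(W(t) W(s)) = min(t, s).
import random

random.seed(3)
s, t, trials = 0.4, 1.0, 100000

acc = 0.0
for _ in range(trials):
    w_s = random.gauss(0.0, s ** 0.5)                # W(s) ~ N(0, s)
    w_t = w_s + random.gauss(0.0, (t - s) ** 0.5)    # add independent increment
    acc += w_s * w_t

print(round(acc / trials, 3))   # should be close to min(s, t) = 0.4
```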

However, we do have the heuristic formula

(3)  "E(ξ(t)ξ(s)) = δ_0(s − t)",

where δ_0 is the unit mass at 0. A formal "proof" is this. Suppose h > 0, fix t > 0, and set

φ_h(s) := E( ((W(t+h) − W(t))/h) ((W(s+h) − W(s))/h) )
  = (1/h²) [ E(W(t+h)W(s+h)) − E(W(t+h)W(s)) − E(W(t)W(s+h)) + E(W(t)W(s)) ]
  = (1/h²) [ ((t+h) ∧ (s+h)) − ((t+h) ∧ s) − (t ∧ (s+h)) + (t ∧ s) ].

The graph of φ_h is a tent of height 1/h lying above the interval [t − h, t + h]. Then φ_h(s) → 0 as h → 0, for t ≠ s; but ∫ φ_h(s) ds = 1, and so presumably φ_h(s) → δ_0(s − t) in some sense, as h → 0. In addition, we expect that φ_h(s) → E(ξ(t)ξ(s)). This gives the formula (3) above.

Remark: Why Ẇ(·) = ξ(·) is called white noise. If X(·) is any real-valued stochastic process with E(X²(t)) < ∞ for all t ≥ 0, we define

r(t, s) := E(X(t)X(s))  (t, s ≥ 0),

the autocorrelation function of X(·). If r(t, s) = c(t − s) for some function c : R → R and if E(X(t)) = E(X(s)) for all t, s ≥ 0, X(·) is called stationary in the wide sense. A white noise process ξ(·) is by definition Gaussian, wide sense stationary, with c(·) = δ_0.

In general we define

f(λ) := (1/2π) ∫_{−∞}^{∞} e^{−iλt} c(t) dt  (λ ∈ R)

to be the spectral density of a process X(·). For white noise, we have

f(λ) = (1/2π) ∫_{−∞}^{∞} e^{−iλt} δ_0 dt = 1/(2π)  for all λ.

Thus the spectral density of ξ(·) is flat; that is, all "frequencies" contribute equally in the correlation function, just as, by analogy, all colors contribute equally to make white light.

RANDOM FOURIER SERIES. Suppose now {ψ_n}_{n=0}^∞ is a complete, orthonormal basis of L²(0, 1), where ψ_n = ψ_n(t) are functions of 0 ≤ t ≤ 1 only and so are not random variables. The orthonormality means that

∫_0^1 ψ_n(s) ψ_m(s) ds = δ_{mn}  for all m, n.

We write formally

(4)  ξ(t) = Σ_{n=0}^∞ A_n ψ_n(t)  (0 ≤ t ≤ 1).

It is easy to see that then

A_n = ∫_0^1 ξ(t) ψ_n(t) dt.

We expect that the A_n are independent and Gaussian, with E(A_n) = 0. Therefore to be consistent we must have, for m ≠ n,

0 = E(A_n)E(A_m) = E(A_n A_m) = ∫_0^1 ∫_0^1 E(ξ(t)ξ(s)) ψ_n(t) ψ_m(s) dt ds
  = ∫_0^1 ∫_0^1 δ_0(s − t) ψ_n(t) ψ_m(s) dt ds   (by (3))
  = ∫_0^1 ψ_n(s) ψ_m(s) ds.

But this is already automatically true, as the ψ_n are orthogonal. Similarly,

E(A_n²) = ∫_0^1 ψ_n²(s) ds = 1.

Consequently if the A_n are independent and N(0, 1), it is reasonable to believe that formula (4) makes sense. But then the Brownian motion W(·) should be given by

(5)  W(t) := ∫_0^t ξ(s) ds = Σ_{n=0}^∞ A_n ∫_0^t ψ_n(s) ds.

This seems to be true for any orthonormal basis, and we will next make this rigorous by choosing a particularly nice basis.

LÉVY–CIESIELSKI CONSTRUCTION OF BROWNIAN MOTION

DEFINITION. The family {h_k(·)}_{k=0}^∞ of Haar functions is defined for 0 ≤ t ≤ 1 as follows:

h_0(t) := 1  for 0 ≤ t ≤ 1;

h_1(t) := { 1 for 0 ≤ t ≤ 1/2, −1 for 1/2 < t ≤ 1 };

and if 2^n ≤ k < 2^{n+1}, n = 1, 2, ..., we set

h_k(t) := { 2^{n/2} for (k − 2^n)/2^n ≤ t ≤ (k − 2^n + 1/2)/2^n,
            −2^{n/2} for (k − 2^n + 1/2)/2^n < t ≤ (k − 2^n + 1)/2^n,
            0 otherwise }.

Graph of a Haar function: a step of height 2^{n/2} and width 2^{−(n+1)}, followed by a step of depth −2^{n/2}.

LEMMA 1. The functions {h_k(·)}_{k=0}^∞ form a complete, orthonormal basis of L²(0, 1).

Proof. 1. We have ∫_0^1 h_k² dt = 2^n ( 1/2^{n+1} + 1/2^{n+1} ) = 1. Note also that for all l > k, either h_k h_l = 0 for all t, or else h_k is constant on the support of h_l. In this second case

∫_0^1 h_l h_k dt = ±2^{n/2} ∫_0^1 h_l dt = 0.

2. Suppose f ∈ L²(0, 1), with ∫_0^1 f h_k dt = 0 for all k = 0, 1, 2, ... . We will prove f = 0 almost everywhere.

If k = 0, we have ∫_0^1 f dt = 0. If k = 1, then ∫_0^{1/2} f dt = ∫_{1/2}^1 f dt; and since

0 = ∫_0^1 f dt = ∫_0^{1/2} f dt + ∫_{1/2}^1 f dt,

we deduce that both of these integrals are equal to zero. Continuing in this way, we deduce

∫_{k/2^{n+1}}^{(k+1)/2^{n+1}} f dt = 0  for all 0 ≤ k < 2^{n+1}.

Thus ∫_s^r f dt = 0 for all dyadic

rationals 0 ≤ s ≤ r ≤ 1, and so for all 0 ≤ s ≤ r ≤ 1. But

f(r) = d/dr ∫_0^r f(t) dt = 0  a.e. □

DEFINITION. For k = 0, 1, 2, ...,

s_k(t) := ∫_0^t h_k(s) ds  (0 ≤ t ≤ 1)

is the k-th Schauder function.

Graph of a Schauder function: a "tent" of height 2^{−(n+2)/2} over an interval of width 2^{−n}.

The graph of s_k is a "tent" of height 2^{−n/2−1}, lying above the interval [(k − 2^n)/2^n, (k − 2^n + 1)/2^n]. Consequently if 2^n ≤ k < 2^{n+1}, then

max_{0≤t≤1} |s_k(t)| = 2^{−n/2−1}.

Our goal is to define

W(t) := Σ_{k=0}^∞ A_k s_k(t)  for times 0 ≤ t ≤ 1,

where the coefficients {A_k}_{k=0}^∞ are independent, N(0, 1) random variables defined on some probability space. We must first of all check whether this series converges.

LEMMA 2. Let {a_k}_{k=0}^∞ be a sequence of real numbers such that

|a_k| = O(k^δ)  as k → ∞

for some 0 ≤ δ < 1/2. Then the series Σ_{k=0}^∞ a_k s_k(t) converges uniformly for 0 ≤ t ≤ 1.

Proof. Fix ε > 0. Notice that for 2^n ≤ k < 2^{n+1}, the functions s_k(·) have disjoint supports. Set

b_n := max_{2^n ≤ k < 2^{n+1}} |a_k| ≤ C (2^{n+1})^δ.

Then for 0 ≤ t ≤ 1,

Σ_{k=2^m}^∞ |a_k||s_k(t)| ≤ Σ_{n=m}^∞ b_n max_{2^n ≤ k < 2^{n+1}, 0 ≤ t ≤ 1} |s_k(t)| ≤ C Σ_{n=m}^∞ (2^{n+1})^δ 2^{−n/2−1} < ε

for m large enough, since 0 ≤ δ < 1/2. □

LEMMA 3. Suppose {A_k}_{k=1}^∞ are independent, N(0, 1) random variables. Then for almost every ω,

|A_k(ω)| = O(√(log k))  as k → ∞.

In particular, the numbers {A_k(ω)}_{k=1}^∞ almost surely satisfy the hypothesis of Lemma 2 above.

Proof. For all x > 0, we have

P(|A_k| > x) = (2/√(2π)) ∫_x^∞ e^{−s²/2} ds ≤ (2/√(2π)) e^{−x²/4} ∫_x^∞ e^{−s²/4} ds ≤ C e^{−x²/4}

for some constant C. Take x := 4√(log k), k = 2, ...; then

P(|A_k| ≥ 4√(log k)) ≤ C e^{−4 log k} = C (1/k⁴).

Since Σ 1/k⁴ < ∞, the Borel–Cantelli Lemma implies

P(|A_k| ≥ 4√(log k) i.o.) = 0.

Therefore for almost every sample point ω, we have

|A_k(ω)| ≤ 4√(log k)  provided k ≥ K,

where K depends on ω. □

LEMMA 4. For each 0 ≤ s, t ≤ 1,

Σ_{k=0}^∞ s_k(s) s_k(t) = t ∧ s.

Proof. Define for 0 ≤ s ≤ 1,

φ_s(τ) := { 1 for 0 ≤ τ ≤ s, 0 for s < τ ≤ 1 }.

Then if s ≤ t, Lemma 1 implies

s = ∫_0^1 φ_t φ_s dτ = Σ_{k=0}^∞ a_k b_k,

where

a_k = ∫_0^1 φ_t h_k dτ = ∫_0^t h_k dτ = s_k(t),  b_k = ∫_0^1 φ_s h_k dτ = s_k(s). □

THEOREM. Let {A_k}_{k=0}^∞ be a sequence of independent, N(0, 1) random variables defined on the same probability space. Then the sum

W(t, ω) := Σ_{k=0}^∞ A_k(ω) s_k(t)  (0 ≤ t ≤ 1)

converges uniformly in t, for a.e. ω. Furthermore
(i) W(·) is a Brownian motion for 0 ≤ t ≤ 1, and
(ii) for a.e. ω, the sample path t → W(t, ω) is continuous.

Proof. 1. The uniform convergence is a consequence of Lemmas 2 and 3; this implies (ii).

2. To prove W(·) is a Brownian motion, we first note that clearly W(0) = 0 a.s. We assert as well that W(t) − W(s) is N(0, t − s) for all 0 ≤ s ≤ t ≤ 1. To prove this, let us compute

E(e^{iλ(W(t) − W(s))}) = E( e^{iλ Σ_{k=0}^∞ A_k (s_k(t) − s_k(s))} )
  = Π_{k=0}^∞ E( e^{iλ A_k (s_k(t) − s_k(s))} )   (by independence)
  = Π_{k=0}^∞ e^{−(λ²/2)(s_k(t) − s_k(s))²}   (since A_k is N(0, 1))
  = e^{−(λ²/2) Σ_{k=0}^∞ (s_k²(t) − 2 s_k(t)s_k(s) + s_k²(s))}
  = e^{−(λ²/2)(t − 2s + s)}   (by Lemma 4)
  = e^{−(λ²/2)(t − s)}.

. 45 . . . Ak sk (t1 )+iλ2 ∞ k=0 Ak sk (t2 ) ) E(eiAk [(λ1 −λ2 )sk (t1 )+λ2 sk (t2 )] ) e− 2 ((λ1 −λ2 )sk (t1 )+λ2 sk (t2 )) 1 2 = e− 2 1 1 ∞ 2 2 2 2 k=0 (λ1 −λ2 ) sk (t1 )+2(λ1 −λ2 )λ2 sk (t1 )sk (t2 )+λ2 sk (t2 ) 2 = e− 2 [(λ1 −λ2 ) =e t1 +2(λ1 −λ2 )λ2 t1 +λ2 t2 ] 2 by Lemma 4 − 1 [λ2 t1 +λ2 (t2 −t1 )] 1 2 2 . This is (6) for m = 2. We assemble these inductively by setting W (t) := W (n − 1) + W n (t − (n − 1)) for n − 1 ≤ t ≤ n. The theorem above demonstrated how to build a Brownian motion on 0 ≤ t ≤ 1.. THEOREM (Existence of one-dimensional Brownian motion).By uniqueness of characteristic functions. we will know from uniqueness of characteristic functions that FW (t1 ). deﬁned for all times t ≥ 0. . U. Let (Ω. .. t − s). and the general case follows similarly. . Next we claim for all m = 1. 1). independent random variables {An }∞ are deﬁned. Outline of proof.W (tm )−W (tm−1 ) (x1 . .. . the increment W (t) − W (s) is N (0. W (tm ) − W (tm−1 ) Thus (6) will establish the Theorem. that (6) E(e i m j=1 m λj (W (tj )−W (tj−1 )) )= j=1 e − λ2 j 2 (tj −tj−1 ) . . As we can reindex the N (0. we have E(ei[λ1 W (t1 )+λ2 (W (t2 )−W (t1 ))] ) = E(ei[(λ1 −λ2 )W (t1 )+λ2 W (t2 )] ) = E(ei(λ1 −λ2 ) = = ∞ k=0 ∞ k=0 ∞ k=0 are independent. Then W (·) is a one-dimensional Brownian motion. P ) be a probability space on which countably many N (0. xm ) = FW (t1 ) (x1 ) · · · FW (tm )−W (tm−1 ) (xm ) for all x1 . and for all 0 = t0 < t1 < · · · < tm ≤ 1. t ≥ 0. . 3. . as asserted.. Now in the case m = 2. This proves that W (t1 ). Then there exists a 1-dimensional Brownian n=1 motion W (·) deﬁned for ω ∈ Ω. 2. . 1) random variables to obtain countably many families of countably many random variables. . xm ∈ R. . we can therefore build countably many independent Brownian motions W n (t) for 0 ≤ t ≤ 1. Once this is proved.

This theorem shows we can construct a Brownian motion defined on any probability space on which there exist countably many independent N(0, 1) random variables.

BROWNIAN MOTION IN R^n. It is straightforward to extend our definitions to Brownian motions taking values in R^n.

DEFINITION. An R^n-valued stochastic process W(·) = (W^1(·), ..., W^n(·)) is an n-dimensional Wiener process (or Brownian motion) provided
(i) for each k = 1, ..., n, W^k(·) is a one-dimensional Wiener process, and
(ii) the σ-algebras W^k := U(W^k(t) | t ≥ 0) are independent, k = 1, ..., n.

By the arguments above we can build a probability space and on it n independent one-dimensional Wiener processes W^k(·) (k = 1, ..., n). Then W(·) := (W^1(·), ..., W^n(·)) is an n-dimensional Brownian motion.

LEMMA. If W(·) is an n-dimensional Wiener process, then
(i) E(W^k(t)W^l(s)) = (t ∧ s)δ_kl  (k, l = 1, ..., n),
(ii) E((W^k(t) − W^k(s))(W^l(t) − W^l(s))) = (t − s)δ_kl  (k, l = 1, ..., n; t ≥ s ≥ 0).

Proof. If k ≠ l, then by independence

E(W^k(t)W^l(s)) = E(W^k(t))E(W^l(s)) = 0.

The proof of (ii) is similar. □

THEOREM. (i) If W(·) is an n-dimensional Brownian motion, then W(t) is N(0, tI) for each time t > 0. Therefore

P(W(t) ∈ A) = (1/(2πt)^{n/2}) ∫_A e^{−|x|²/(2t)} dx

for each Borel subset A ⊆ R^n.
(ii) More generally, for each m = 1, 2, ..., each choice of times 0 = t_0 < t_1 < ... < t_m, and each function f : R^n × R^n × ··· × R^n → R, we have

(7)  E f(W(t_1), ..., W(t_m)) = ∫_{R^n} ··· ∫_{R^n} f(x_1, ..., x_m) g(x_1, t_1 | 0) g(x_2, t_2 − t_1 | x_1) ··· g(x_m, t_m − t_{m−1} | x_{m−1}) dx_m ... dx_1,

where

g(x, t | y) := (1/(2πt)^{n/2}) e^{−|x − y|²/(2t)}.

Proof. For each time t > 0, the random variables W^1(t), ..., W^n(t) are independent. Consequently, for each point x = (x_1, ..., x_n) ∈ R^n, we have

f_{W(t)}(x_1, ..., x_n) = f_{W^1(t)}(x_1) ··· f_{W^n(t)}(x_n)
  = (1/(2πt)^{1/2}) e^{−x_1²/(2t)} ··· (1/(2πt)^{1/2}) e^{−x_n²/(2t)}
  = (1/(2πt)^{n/2}) e^{−|x|²/(2t)} = g(x, t | 0).

We prove formula (7) as in the one-dimensional case. □

C. SAMPLE PATH PROPERTIES

In this section we will demonstrate that, for almost every ω, the sample path t → W(t, ω) is uniformly Hölder continuous for each exponent γ < 1/2, but is nowhere Hölder continuous with any exponent γ > 1/2. In particular, t → W(t, ω) almost surely is nowhere differentiable and is of infinite variation on each time interval.

CONTINUITY OF SAMPLE PATHS.

DEFINITIONS. Let 0 < γ ≤ 1. (i) A function f : [0, T] → R is called uniformly Hölder continuous with exponent γ if there exists a constant K such that

|f(t) − f(s)| ≤ K|t − s|^γ  for all s, t ∈ [0, T].

(ii) We say f is Hölder continuous with exponent γ at the point s if there exists a constant K such that

|f(t) − f(s)| ≤ K|t − s|^γ  for all t ∈ [0, T].

A good general theorem to prove Hölder continuity is this important theorem of Kolmogorov:

THEOREM. Let X(·) be a stochastic process with continuous sample paths a.s., such that

E(|X(t) − X(s)|^β) ≤ C|t − s|^{1+α}

for constants β, α > 0, C ≥ 0 and for all 0 ≤ t, s ≤ T. Then for each 0 < γ < α/β, T > 0 and almost every ω, there exists a constant K = K(ω, γ, T) such that

|X(t, ω) − X(s, ω)| ≤ K|t − s|^γ  for all 0 ≤ s, t ≤ T.

Hence the sample path t → X(t, ω) is uniformly Hölder continuous with exponent γ on [0, T].

APPLICATION TO BROWNIAN MOTION. Consider W(·), an n-dimensional Brownian motion. We have for all integers m = 1, 2, …
E(|W(t) − W(s)|^{2m}) = (2πr)^{-n/2} ∫_{R^n} |x|^{2m} e^{-|x|²/2r} dx   (for r := t − s > 0)
= r^m (2π)^{-n/2} ∫_{R^n} |y|^{2m} e^{-|y|²/2} dy   (substituting y = x/√r)
= C r^m = C|t − s|^m.
Thus the hypotheses of Kolmogorov's theorem hold for β = 2m, α = m − 1, and the process W(·) is therefore Hölder continuous a.s. for exponents
0 < γ < α/β = 1/2 − 1/(2m), for all m.
Thus for almost all ω and any T > 0, the sample path t → W(t, ω) is uniformly Hölder continuous on [0, T] for each exponent 0 < γ < 1/2.

Proof of Theorem. 1. For simplicity take T = 1, and pick any
(8) 0 < γ < α/β.
Now define for n = 1, 2, …
A_n := { |X((i+1)/2^n) − X(i/2^n)| > 1/2^{nγ} for some integer 0 ≤ i < 2^n }.
Then
P(A_n) ≤ Σ_{i=0}^{2^n−1} P(|X((i+1)/2^n) − X(i/2^n)| > 1/2^{nγ})
≤ Σ_{i=0}^{2^n−1} 2^{nγβ} E(|X((i+1)/2^n) − X(i/2^n)|^β)   (by Chebyshev's inequality)
≤ C 2^n 2^{nγβ} (1/2^n)^{1+α} = C 2^{n(−α+γβ)}.
Since (8) forces −α + γβ < 0, we deduce Σ_{n=1}^∞ P(A_n) < ∞, whence the Borel–Cantelli Lemma implies P(A_n i.o.) = 0. So for a.e. ω there exists m = m(ω) such that
|X((i+1)/2^n, ω) − X(i/2^n, ω)| ≤ 1/2^{nγ} for 0 ≤ i ≤ 2^n − 1, provided n ≥ m(ω).
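The moment computation above can be illustrated numerically (a sketch, not part of the proof): for a one-dimensional Brownian increment W(t) − W(s) ~ N(0, t − s), the case m = 2 of E(|W(t) − W(s)|^{2m}) = C|t − s|^m reads E(|W(t) − W(s)|^4) = 3|t − s|^2.

```python
import numpy as np

# Check E|W(t) - W(s)|^4 / (t - s)^2 = 3 for two different step sizes,
# sampling the increment directly as sqrt(dt) * N(0, 1).
rng = np.random.default_rng(1)
ratios = []
for dt in (0.1, 0.01):
    incr = rng.standard_normal(200_000) * np.sqrt(dt)  # samples of W(t) - W(s)
    ratios.append(np.mean(incr**4) / dt**2)
print(ratios)  # both entries close to 3, independent of dt
```

The ratio is (approximately) the same for both step sizes, which is exactly the scaling E(|ΔW|^{2m}) = C(Δt)^m that feeds Kolmogorov's theorem.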

Enlarging the constant if necessary, we may therefore select K = K(ω) so large that
(9) |X((i+1)/2^n, ω) − X(i/2^n, ω)| ≤ K/2^{nγ} for 0 ≤ i ≤ 2^n − 1
holds for all n ≥ 0.

2. We now claim (9) implies the stated Hölder continuity. (*Readers may omit this second step on a first reading.) To see this, fix ω ∈ Ω for which (9) holds, and let t_1, t_2 ∈ [0, 1] be dyadic rationals with 0 < t_2 − t_1 < 1. Select n ≥ 1 so that
(10) 2^{−n} ≤ t < 2^{−(n−1)} for t := t_2 − t_1.
We can write
t_1 = i/2^n − 1/2^{p_1} − ⋯ − 1/2^{p_k}   (n < p_1 < ⋯ < p_k),
t_2 = j/2^n + 1/2^{q_1} + ⋯ + 1/2^{q_l}   (n < q_1 < ⋯ < q_l),
with i/2^n ≤ t_1 ≤ t_2 ≤ j/2^n. Then
(j − i)/2^n ≤ t < 1/2^{n−1},
and so j = i or j = i + 1; consequently, by (9),
|X(i/2^n, ω) − X(j/2^n, ω)| ≤ K/2^{nγ} ≤ Kt^γ.
Furthermore
|X(i/2^n − 1/2^{p_1} − ⋯ − 1/2^{p_r}, ω) − X(i/2^n − 1/2^{p_1} − ⋯ − 1/2^{p_{r−1}}, ω)| ≤ K/2^{p_r γ} for r = 1, …, k,
and consequently
|X(t_1, ω) − X(i/2^n, ω)| ≤ Σ_{r=1}^k K/2^{p_r γ} ≤ (K/2^{nγ}) Σ_{r=1}^∞ 1/2^{rγ}   (since p_r > n)
≤ Ct^γ, by (10).
In the same way we deduce
|X(t_2, ω) − X(j/2^n, ω)| ≤ Ct^γ.
Adding up the estimates above, we discover
|X(t_1, ω) − X(t_2, ω)| ≤ C|t_1 − t_2|^γ

for all dyadic rationals t_1, t_2 ∈ [0, 1]. Since t → X(t, ω) is continuous for a.e. ω, the estimate above then holds for all t_1, t_2 ∈ [0, 1] and some constant K. □

Remark. The proof above can in fact be modified to show that if X(·) is a stochastic process such that
E(|X(t) − X(s)|^β) ≤ C|t − s|^{1+α}   (α, β > 0, C ≥ 0),
then X(·) has a version X̃(·) such that a.s. the sample path is Hölder continuous for each exponent 0 < γ < α/β. (We call X̃(·) a version of X(·) if P(X(t) = X̃(t)) = 1 for all t ≥ 0.) So any Wiener process has a version with continuous sample paths a.s.

2. NOWHERE DIFFERENTIABILITY. Next we prove that sample paths of Brownian motion are with probability one nowhere Hölder continuous with any exponent greater than 1/2, and thus in particular are nowhere differentiable.

THEOREM (Dvoretzky, Erdős, Kakutani). (i) For each 1/2 < γ ≤ 1 and almost every ω, the sample path t → W(t, ω) is nowhere Hölder continuous with exponent γ.
(ii) In particular, for almost every ω, the sample path t → W(t, ω) is nowhere differentiable and is of infinite variation on each subinterval.

Proof. 1. It suffices to consider a one-dimensional Brownian motion, and we may for simplicity consider only times 0 ≤ t ≤ 1. Fix an integer N so large that
N(γ − 1/2) > 1.
Now if the function t → W(t, ω) is Hölder continuous with exponent γ at some point 0 ≤ s < 1, then
|W(t, ω) − W(s, ω)| ≤ K|t − s|^γ for all t ∈ [0, 1] and some constant K.
For n ≫ 1, set i = [ns] + 1, and note that for j = i, …, i + N − 1,
|W(j/n, ω) − W((j+1)/n, ω)| ≤ |W(j/n, ω) − W(s, ω)| + |W(s, ω) − W((j+1)/n, ω)|
≤ K(|j/n − s|^γ + |(j+1)/n − s|^γ) ≤ M/n^γ

for some constant M. Thus
ω ∈ A^i_{M,n} := { |W((j+1)/n) − W(j/n)| ≤ M/n^γ for j = i, …, i + N − 1 }
for some 1 ≤ i ≤ n, some M ≥ 1, and all large n. Therefore the set of ω ∈ Ω such that W(·, ω) is Hölder continuous with exponent γ at some time 0 ≤ s < 1 is contained in
∪_{M=1}^∞ ∪_{k=1}^∞ ∩_{n=k}^∞ ∪_{i=1}^n A^i_{M,n}.

2. We will show this event has probability 0. Since the random variables W((j+1)/n) − W(j/n) are N(0, 1/n) and independent,
P(A^i_{M,n}) ≤ P(|W(1/n)| ≤ M/n^γ)^N.
Now
P(|W(1/n)| ≤ M/n^γ) = (n/2π)^{1/2} ∫_{−M/n^γ}^{M/n^γ} e^{−nx²/2} dx
= (2π)^{-1/2} ∫_{−Mn^{1/2−γ}}^{Mn^{1/2−γ}} e^{−y²/2} dy ≤ C n^{1/2−γ}.
We use this calculation to deduce
P(∩_{n=k}^∞ ∪_{i=1}^n A^i_{M,n}) ≤ lim inf_{n→∞} P(∪_{i=1}^n A^i_{M,n})
≤ lim inf_{n→∞} Σ_{i=1}^n P(A^i_{M,n}) ≤ lim inf_{n→∞} nC[n^{1/2−γ}]^N = 0,
since N(γ − 1/2) > 1. This holds for all k and M. Thus
P(∪_{M=1}^∞ ∪_{k=1}^∞ ∩_{n=k}^∞ ∪_{i=1}^n A^i_{M,n}) = 0,
and assertion (i) of the Theorem follows.

3. If W(t, ω) were differentiable at s, then W(t, ω) would be Hölder continuous with exponent 1 at s; but this is almost surely not so. If W(t, ω) were of finite variation on some subinterval, it would then be differentiable almost everywhere there; but this too is almost surely not so. □
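A quick numerical illustration of nondifferentiability (a sketch of ours, not from the notes): the difference quotient |W(t + h) − W(t)|/h is distributed as |N(0, h)|/h, whose mean is √(2/(πh)). This blows up as h → 0, so no finite derivative can emerge in the limit.

```python
import numpy as np

# Compare the empirical mean of |W(t+h) - W(t)| / h with the exact value
# sqrt(2 / (pi * h)) for shrinking h; both grow like h^(-1/2).
rng = np.random.default_rng(2)
means = {}
for h in (1e-2, 1e-4, 1e-6):
    q = np.abs(rng.standard_normal(100_000)) * np.sqrt(h) / h
    means[h] = q.mean()
    print(h, means[h], np.sqrt(2 / (np.pi * h)))  # empirical vs exact mean
```

The printed pairs agree closely for each h, and each factor-of-100 decrease in h multiplies the typical difference quotient by 10.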

Interpretation. The idea underlying the proof is that if
|W(t, ω) − W(s, ω)| ≤ K|t − s|^γ for all t,
then
|W(j/n, ω) − W((j+1)/n, ω)| ≤ M/n^γ
for all n ≫ 1 and at least N values of j. But these are independent events of small probability. The probability that the above inequality holds for all these j's is a small number to the large power N, and is therefore extremely small.

[Figure: a sample path of Brownian motion.]

D. MARKOV PROPERTY

DEFINITION. If V is a σ-algebra, V ⊆ U, then
P(A | V) := E(χ_A | V) for A ∈ U.
Therefore P(A | V) is a random variable, the conditional probability of A, given V.

DEFINITION. If X(·) is a stochastic process, the σ-algebra
U(s) := U(X(r) | 0 ≤ r ≤ s)
is called the history of the process up to and including time s. We can informally interpret U(s) as recording the information available from our observing X(r) for all times 0 ≤ r ≤ s.

DEFINITION. An R^n-valued stochastic process X(·) is called a Markov process if
P(X(t) ∈ B | U(s)) = P(X(t) ∈ B | X(s)) a.s.
for all 0 ≤ s ≤ t and all Borel subsets B of R^n.

The idea of this definition is that, given the current value X(s), you can predict the probabilities of future values of X(t) just as well as if you knew the entire history of the process before time s. Loosely speaking, the process only "knows" its value at time s and does not "remember" how it got there.

THEOREM. Let W(·) be an n-dimensional Wiener process. Then W(·) is a Markov process, and
(13) P(W(t) ∈ B | W(s)) = (2π(t − s))^{-n/2} ∫_B e^{-|x−W(s)|²/2(t−s)} dx a.s.
for all 0 ≤ s < t and Borel sets B. Note carefully that each side of this identity is a random variable.

Proof. We will only prove (13). Let A be a Borel set and write
Φ(y) := (2π(t − s))^{-n/2} ∫_A e^{-|x−y|²/2(t−s)} dx.
As Φ(W(s)) is U(W(s))-measurable, we must show
(14) ∫_C χ_{W(t)∈A} dP = ∫_C Φ(W(s)) dP for all C ∈ U(W(s)).
Now if C ∈ U(W(s)), then C = {W(s) ∈ B} for some Borel set B ⊆ R^n. Hence
∫_C χ_{W(t)∈A} dP = P(W(s) ∈ B, W(t) ∈ A) = ∫_B ∫_A g(y, s | 0) g(x, t − s | y) dx dy = ∫_B g(y, s | 0)Φ(y) dy.
On the other hand,
∫_C Φ(W(s)) dP = ∫_Ω χ_B(W(s))Φ(W(s)) dP = ∫_{R^n} (2πs)^{-n/2} e^{-|y|²/2s} χ_B(y)Φ(y) dy = ∫_B g(y, s | 0)Φ(y) dy,
and this last expression agrees with that above. This verifies (14), and so establishes (13). □

Interpretation. The Markov property partially explains the nondifferentiability of the sample paths of Brownian motion, as discussed before in §C. If W(s, ω) = b, say, then the future behavior of W(t, ω) for t > s depends only upon this fact, and not on how W(t, ω) approached the point b as t → s⁻. Thus the path "cannot remember" how to leave b in such a way that W(·, ω) will have a tangent there.
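In one dimension, identity (13) can be checked by direct simulation (a sketch of ours): given W(s) = x, the increment W(t) − W(s) is N(0, t − s) independent of the past, so P(W(t) ≤ b | W(s) = x) is simply a Gaussian distribution function in the variable b, regardless of how the path reached x.

```python
import numpy as np
from math import erf, sqrt

def transition_cdf(x, s, t, b):
    """P(W(t) <= b | W(s) = x) from the Gaussian density in (13), n = 1."""
    return 0.5 * (1.0 + erf((b - x) / sqrt(2.0 * (t - s))))

rng = np.random.default_rng(3)
x, s, t, b = 0.7, 1.0, 1.5, 1.0
# simulate W(t) conditional on W(s) = x: add an independent N(0, t-s) increment
w_t = x + np.sqrt(t - s) * rng.standard_normal(100_000)
emp = np.mean(w_t <= b)
print(emp, transition_cdf(x, s, t, b))  # close agreement
```

Note that the simulation uses only the value x at time s — no history of the path before time s is needed, which is exactly the Markov property.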

CHAPTER 4: STOCHASTIC INTEGRALS, ITÔ'S FORMULA

A. Motivation
B. Definition and properties of the Itô integral
C. Indefinite Itô integrals
D. Itô's formula
E. Itô integral in higher dimensions

A. MOTIVATION

Remember from Chapter 1 that we want to develop a theory of stochastic differential equations of the form
(SDE) dX = b(X, t)dt + B(X, t)dW, X(0) = X_0,
which we will in Chapter 5 interpret to mean
(1) X(t) = X_0 + ∫_0^t b(X, s) ds + ∫_0^t B(X, s) dW for all times t ≥ 0.
But before we can study and solve such an integral equation, we must first define
∫_0^T G dW
for some wide class of stochastic processes G, so that the right-hand side of (1) at least makes sense. Observe that this is not at all obvious: since t → W(t, ω) is of infinite variation for almost every ω, ∫_0^T G dW simply cannot be understood as an ordinary integral.

A FIRST DEFINITION. Suppose now n = m = 1. One possible definition is due to Paley, Wiener and Zygmund [P-W-Z]. Suppose g : [0, 1] → R is continuously differentiable, with g(0) = g(1) = 0. (Note carefully: g is an ordinary deterministic function, not a stochastic process.) Then let us define
∫_0^1 g dW := −∫_0^1 g′ W dt.
Note that ∫_0^1 g dW is therefore a random variable. Let us check the properties following from this definition:

LEMMA (Properties of the Paley–Wiener–Zygmund integral).
(i) E(∫_0^1 g dW) = 0, and
(ii) E((∫_0^1 g dW)²) = ∫_0^1 g² dt.

Proof. 1. For (i),
E(∫_0^1 g dW) = −∫_0^1 g′ E(W(t)) dt = 0.
2. To confirm (ii), we calculate
E((∫_0^1 g dW)²) = E(∫_0^1 g′(t)W(t) dt ∫_0^1 g′(s)W(s) ds)
= ∫_0^1 ∫_0^1 g′(t)g′(s) E(W(t)W(s)) ds dt   (and E(W(t)W(s)) = t ∧ s)
= ∫_0^1 g′(t) ( ∫_0^t s g′(s) ds + t ∫_t^1 g′(s) ds ) dt
= ∫_0^1 g′(t) ( tg(t) − ∫_0^t g ds − tg(t) ) dt
= −∫_0^1 g′(t) ( ∫_0^t g ds ) dt = ∫_0^1 g² dt,
integrating by parts and using g(0) = g(1) = 0. □

Discussion. Suppose now g ∈ L²(0, 1). We can take a sequence of C¹ functions g_n, with g_n(0) = g_n(1) = 0, such that ∫_0^1 (g_n − g)² dt → 0. In view of property (ii),
E((∫_0^1 g_m dW − ∫_0^1 g_n dW)²) = ∫_0^1 (g_m − g_n)² dt,
and therefore {∫_0^1 g_n dW}_{n=1}^∞ is a Cauchy sequence in L²(Ω). Consequently we can define
∫_0^1 g dW := lim_{n→∞} ∫_0^1 g_n dW.
The extended definition still satisfies properties (i) and (ii).

This is a reasonable definition of ∫_0^1 g dW, except that it only makes sense for deterministic functions g ∈ L²(0, 1), and not for stochastic processes. If we wish to define the integral ∫_0^t B(X, s) dW appearing in (1), whose integrand B(X, t) is a stochastic process, the definition above will not suffice. We must devise a definition for a wider class of integrands (although the definition we finally decide on will agree with that of Paley, Wiener and Zygmund if g happens to be a deterministic C¹ function with g(0) = g(1) = 0).
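The Paley–Wiener–Zygmund definition lends itself to a quick Monte Carlo check (our sketch; the function g(t) = sin(πt), with g(0) = g(1) = 0, is just a convenient choice): the random variable −∫_0^1 g′W dt should have mean 0 and variance ∫_0^1 g² dt = 1/2.

```python
import numpy as np

# Monte Carlo for the PWZ integral of g(t) = sin(pi t):
#   int_0^1 g dW := -int_0^1 g' W dt,  mean 0, variance int_0^1 g^2 dt = 1/2.
rng = np.random.default_rng(4)
N, M = 1000, 5000                     # time steps per path, number of paths
t = np.linspace(0.0, 1.0, N + 1)
gp = np.pi * np.cos(np.pi * t[:-1])   # g'(t) at the left endpoints
vals = np.empty(M)
for i in range(M):
    # one Brownian path on the grid, W(0) = 0
    W = np.cumsum(np.concatenate(([0.0], rng.standard_normal(N) * np.sqrt(1.0 / N))))
    vals[i] = -np.sum(gp * W[:-1]) / N   # Riemann sum for -int_0^1 g' W dt
print(vals.mean(), vals.var())           # approximately 0 and 0.5
```

Note that the integrand g′W is an ordinary (pathwise continuous) function of t, so the inner integral is a plain Riemann integral — no stochastic integration is needed yet, which is the whole point of the PWZ device.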

RIEMANN SUMS. To continue our study of stochastic integrals with random integrands, let us think about what might be an appropriate definition for
∫_0^T W dW = ?,
where W(·) is a 1-dimensional Brownian motion. A reasonable procedure is to construct a Riemann sum approximation and then — if possible — pass to limits.

DEFINITIONS. (i) If [0, T] is an interval, a partition P of [0, T] is a finite collection of points in [0, T]:
P := {0 = t_0 < t_1 < ⋯ < t_m = T}.
(ii) The mesh size of P is |P| := max_{0≤k≤m−1} |t_{k+1} − t_k|.
(iii) For fixed 0 ≤ λ ≤ 1 and a given partition P of [0, T], set
τ_k := (1 − λ)t_k + λt_{k+1}   (k = 0, …, m − 1).
For such a partition P and for 0 ≤ λ ≤ 1, we define
R = R(P, λ) := Σ_{k=0}^{m−1} W(τ_k)(W(t_{k+1}) − W(t_k)),
the corresponding Riemann sum approximation of ∫_0^T W dW. The key question is this: what happens if |P| → 0, with λ fixed?

LEMMA (Quadratic variation). Let [a, b] be an interval in [0, ∞), and suppose
P^n := {a = t_0^n < t_1^n < ⋯ < t_{m_n}^n = b}
are partitions of [a, b] with |P^n| → 0 as n → ∞. Then
Σ_{k=0}^{m_n−1} (W(t_{k+1}^n) − W(t_k^n))² → b − a in L²(Ω) as n → ∞.
This assertion partly justifies the heuristic idea, introduced in Chapter 1, that dW ≈ (dt)^{1/2}.

Proof. Set Q_n := Σ_{k=0}^{m_n−1} (W(t_{k+1}^n) − W(t_k^n))². Then
Q_n − (b − a) = Σ_{k=0}^{m_n−1} ((W(t_{k+1}^n) − W(t_k^n))² − (t_{k+1}^n − t_k^n)),
and hence
E((Q_n − (b − a))²) = Σ_{k,j=0}^{m_n−1} E([(W(t_{k+1}^n) − W(t_k^n))² − (t_{k+1}^n − t_k^n)][(W(t_{j+1}^n) − W(t_j^n))² − (t_{j+1}^n − t_j^n)]).
For k ≠ j the term in the double sum factors, by the independence of the increments, as
E((W(t_{k+1}^n) − W(t_k^n))² − (t_{k+1}^n − t_k^n)) E(⋯) = 0,

since each factor has mean zero. Hence
E((Q_n − (b − a))²) = Σ_{k=0}^{m_n−1} E((Y_k² − 1)²(t_{k+1}^n − t_k^n)²),
where
Y_k := (W(t_{k+1}^n) − W(t_k^n)) / (t_{k+1}^n − t_k^n)^{1/2}
is N(0, 1), as W(t) − W(s) is N(0, t − s) for all t ≥ s ≥ 0. Therefore for some constant C,
E((Q_n − (b − a))²) ≤ C Σ_{k=0}^{m_n−1} (t_{k+1}^n − t_k^n)² ≤ C|P^n|(b − a) → 0 as n → ∞. □

Remark. Passing if necessary to a subsequence, we have
Σ_{k=0}^{m_n−1} (W(t_{k+1}^n) − W(t_k^n))² → b − a a.s.
Pick an ω for which this holds and for which the sample path is uniformly Hölder continuous with some exponent 0 < γ < 1/2. Then
b − a ≤ K lim sup_{n→∞} |P^n|^γ Σ_{k=0}^{m_n−1} |W(t_{k+1}^n) − W(t_k^n)|
for a constant K. Since |P^n| → 0, we see again that sample paths have infinite variation with probability one:
sup_P Σ_{k=0}^{m−1} |W(t_{k+1}) − W(t_k)| = ∞.

Let us now return to the question posed above, as to the limit of the Riemann sum approximations.

LEMMA. If P^n denotes a partition of [0, T] with |P^n| → 0 and 0 ≤ λ ≤ 1 is fixed, define
R_n := Σ_{k=0}^{m_n−1} W(τ_k^n)(W(t_{k+1}^n) − W(t_k^n)).
Then
lim_{n→∞} R_n = W(T)²/2 + (λ − 1/2)T,

the limit being taken in L²(Ω); that is,
E((R_n − W(T)²/2 − (λ − 1/2)T)²) → 0.
In particular, the limit of the Riemann sum approximations depends upon the choice of the intermediate points t_k^n ≤ τ_k^n ≤ t_{k+1}^n, where τ_k^n = (1 − λ)t_k^n + λt_{k+1}^n.

Proof. We have
R_n := Σ_{k=0}^{m_n−1} W(τ_k^n)(W(t_{k+1}^n) − W(t_k^n))
= W(T)²/2 − (1/2) Σ_{k=0}^{m_n−1} (W(t_{k+1}^n) − W(t_k^n))²   (call this sum A)
+ Σ_{k=0}^{m_n−1} (W(τ_k^n) − W(t_k^n))²   (=: B)
+ Σ_{k=0}^{m_n−1} (W(t_{k+1}^n) − W(τ_k^n))(W(τ_k^n) − W(t_k^n))   (=: C).
According to the foregoing Lemma, A → T in L²(Ω) as n → ∞; a similar argument shows that B → λT. Next we study the term C:
E([Σ_{k=0}^{m_n−1} (W(t_{k+1}^n) − W(τ_k^n))(W(τ_k^n) − W(t_k^n))]²)
= Σ_{k=0}^{m_n−1} E([W(t_{k+1}^n) − W(τ_k^n)]²) E([W(τ_k^n) − W(t_k^n)]²)   (independent increments)
= Σ_{k=0}^{m_n−1} (1 − λ)(t_{k+1}^n − t_k^n) λ(t_{k+1}^n − t_k^n) ≤ λ(1 − λ)T|P^n| → 0.
Hence C → 0 in L²(Ω) as n → ∞. We combine the limiting expressions for the terms A, B, C, and thereby establish the Lemma. □
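The λ-dependence of the limit is easy to see in a simulation (our sketch). Simulating W on a grid twice as fine as the partition makes the left endpoints, midpoints, and right endpoints all exact path values, so the Riemann sums for λ = 0, 1/2, 1 can be compared directly with the claimed limit W(T)²/2 + (λ − 1/2)T.

```python
import numpy as np

# One Brownian path on 2N+1 grid points; use every other point as the
# partition, so midpoints W(tau_k) for lambda = 1/2 are exact path values.
rng = np.random.default_rng(5)
N, T = 100_000, 1.0
W = np.cumsum(np.concatenate(([0.0],
        rng.standard_normal(2 * N) * np.sqrt(T / (2 * N)))))
left, mid, right = W[0:-1:2], W[1::2], W[2::2]   # W(t_k), W(midpoint), W(t_{k+1})
dW = right - left
results = {}
for lam, W_tau in ((0.0, left), (0.5, mid), (1.0, right)):
    results[lam] = np.sum(W_tau * dW)
    print(lam, results[lam], W[-1]**2 / 2 + (lam - 0.5) * T)  # R_n vs its limit
```

The three sums differ from one another by approximately T — the same data, summed with different evaluation points, converging to genuinely different limits.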

It turns out that Itô's definition (later, in §B) of ∫_0^T W dW corresponds to the choice λ = 0:
∫_0^T W dW = W(T)²/2 − T/2,
and, more generally,
∫_s^r W dW = (W(r)² − W(s)²)/2 − (r − s)/2 for all r ≥ s ≥ 0.
This is not what one would guess offhand. An alternative definition, due to Stratonovich, takes λ = 1/2, so that
∫_0^T W ∘ dW = W(T)²/2   (Stratonovich integral).
See Chapter 6 for more.

What then are the advantages of taking λ = 0 and getting ∫_0^T W dW = W(T)²/2 − T/2? First and most importantly, building the Riemann sum approximation by evaluating the integrand at the left-hand endpoint τ_k^n = t_k^n on each subinterval [t_k^n, t_{k+1}^n] will ultimately permit the definition of
∫_0^T G dW
for a wide class of so-called "nonanticipating" stochastic processes G(·). Exact definitions are later, but the idea is that t represents time: G(·) will in general depend on the Brownian motion W(·), and since we do not know at time t_k^n what W(·) will do on [t_k^n, t_{k+1}^n], it is best to use the known value G(t_k^n) in the approximation, rather than the value at the future time τ_k^n = (1 − λ)t_k^n + λt_{k+1}^n if λ > 0.

B. DEFINITION AND PROPERTIES OF ITÔ'S INTEGRAL

Let W(·) be a 1-dimensional Brownian motion defined on some probability space (Ω, U, P).

DEFINITIONS. (i) The σ-algebra W(t) := U(W(s) | 0 ≤ s ≤ t) is called the history of the Brownian motion up to (and including) time t.
(ii) The σ-algebra W⁺(t) := U(W(s) − W(t) | s ≥ t) is the future of the Brownian motion beyond time t.

DEFINITION. A family F(·) of σ-algebras ⊆ U is called nonanticipating (with respect to W(·)) if
(a) F(t) ⊇ F(s) for all t ≥ s ≥ 0,
(b) F(t) ⊇ W(t) for all t ≥ 0,
(c) F(t) is independent of W⁺(t) for all t ≥ 0.
We also refer to F(·) as a filtration.

IMPORTANT REMARK. We should informally think of F(t) as "containing all information available to us at time t". Our primary example will be F(t) := U(W(s) (0 ≤ s ≤ t), X_0), where X_0 is a random variable independent of W⁺(0).

This will be employed in Chapter 5, where X_0 will be the (possibly random) initial condition for a stochastic differential equation.

DEFINITION. A real-valued stochastic process G(·) is called nonanticipating (with respect to F(·)) if for each time t ≥ 0, G(t) is F(t)-measurable. The idea is that the random variable G(t) "depends upon only the information available in the σ-algebra F(t)".

Discussion. We will actually need a slightly stronger notion, namely that G(·) be progressively measurable. This is however a bit subtle to define, and we will not do so here. The idea is that G(·) is nonanticipating and, in addition, is appropriately jointly measurable in the variables t and ω together.

These measure theoretic issues can be confusing to students, and so we pause here to emphasize the basic point, to be developed below. For progressively measurable integrands G(·), we will be able to define, and understand, the stochastic integral ∫_0^T G dW in terms of some simple, useful and elegant formulas. In other words, we will see that since at each moment of time "G depends only upon the past history of the Brownian motion", some nice identities hold, which would be false if G "depended upon the future behavior of the Brownian motion".

DEFINITIONS. (i) We denote by L²(0, T) the space of all real-valued, progressively measurable stochastic processes G(·) such that
E(∫_0^T G² dt) < ∞.
(ii) Likewise, L¹(0, T) denotes the space of all real-valued,
progressively measurable processes F(·) such that
E(∫_0^T |F| dt) < ∞.

DEFINITION. A process G ∈ L²(0, T) is called a step process if there exists a partition P = {0 = t_0 < t_1 < ⋯ < t_m = T} such that
G(t) ≡ G_k for t_k ≤ t < t_{k+1}   (k = 0, …, m − 1).
Then each G_k is an F(t_k)-measurable random variable.

DEFINITION. Let G ∈ L²(0, T) be a step process. Then
∫_0^T G dW := Σ_{k=0}^{m−1} G_k (W(t_{k+1}) − W(t_k))
is the Itô stochastic integral of G on the interval (0, T). Note carefully that this is a random variable.

LEMMA (Properties of stochastic integral for step processes). We have for all constants a, b ∈ R and for all step processes G, H ∈ L²(0, T):
(i) ∫_0^T (aG + bH) dW = a ∫_0^T G dW + b ∫_0^T H dW,
(ii) E(∫_0^T G dW) = 0,
(iii) E((∫_0^T G dW)²) = E(∫_0^T G² dt).

Proof. 1. The first assertion is easy to check.
2. Suppose G(t) ≡ G_k for t_k ≤ t < t_{k+1}. Then
E(∫_0^T G dW) = Σ_{k=0}^{m−1} E(G_k (W(t_{k+1}) − W(t_k))).
Now G_k is F(t_k)-measurable, and F(t_k) is independent of W⁺(t_k); furthermore, W(t_{k+1}) − W(t_k) is W⁺(t_k)-measurable, and so G_k is independent of W(t_{k+1}) − W(t_k). Hence
E(G_k (W(t_{k+1}) − W(t_k))) = E(G_k) E(W(t_{k+1}) − W(t_k)) = 0.
3. Next,
E((∫_0^T G dW)²) = Σ_{k,j=0}^{m−1} E(G_k G_j (W(t_{k+1}) − W(t_k))(W(t_{j+1}) − W(t_j))).
If j < k, then W(t_{k+1}) − W(t_k) is independent of G_k G_j (W(t_{j+1}) − W(t_j)), and thus
E(G_k G_j (W(t_{k+1}) − W(t_k))(W(t_{j+1}) − W(t_j))) = E(G_k G_j (W(t_{j+1}) − W(t_j))) E(W(t_{k+1}) − W(t_k)) = 0.

Consequently
E((∫_0^T G dW)²) = Σ_{k=0}^{m−1} E(G_k² (W(t_{k+1}) − W(t_k))²)
= Σ_{k=0}^{m−1} E(G_k²) E((W(t_{k+1}) − W(t_k))²)   (and E((W(t_{k+1}) − W(t_k))²) = t_{k+1} − t_k)
= E(∫_0^T G² dt). □

APPROXIMATION BY STEP PROCESSES. The plan now is to approximate an arbitrary process G ∈ L²(0, T) by step processes in L²(0, T), and then pass to limits to define the Itô integral of G.

LEMMA (Approximation by step processes). If G ∈ L²(0, T), there exists a sequence of bounded step processes G^n ∈ L²(0, T) such that
E(∫_0^T |G − G^n|² dt) → 0.

Outline of proof. We omit the details, but the idea is this: if t → G(t, ω) is continuous for almost every ω, we can set
G^n(t) := G(k/n) for k/n ≤ t < (k+1)/n, k = 0, …, [nT].
For a general G ∈ L²(0, T), define
G^m(t) := ∫_0^t m e^{m(s−t)} G(s) ds.
Then G^m ∈ L²(0, T), t → G^m(t, ω) is continuous for a.e. ω, and ∫_0^T |G^m − G|² dt → 0 a.s. Now approximate G^m by step processes, as above. □

DEFINITION. If G ∈ L²(0, T), take step processes G^n as above. Then
E((∫_0^T G^n dW − ∫_0^T G^m dW)²) = E(∫_0^T (G^n − G^m)² dt) → 0 as n, m → ∞,
and so the limit
∫_0^T G dW := lim_{n→∞} ∫_0^T G^n dW

exists in L²(Ω). It is not hard to check that this definition does not depend upon the particular sequence of step process approximations.

THEOREM (Properties of Itô integral). For all constants a, b ∈ R and for all G, H ∈ L²(0, T), we have:
(i) ∫_0^T (aG + bH) dW = a ∫_0^T G dW + b ∫_0^T H dW,
(ii) E(∫_0^T G dW) = 0,
(iii) E((∫_0^T G dW)²) = E(∫_0^T G² dt),
(iv) E(∫_0^T G dW ∫_0^T H dW) = E(∫_0^T GH dt).

Proof. 1. Assertion (i) follows at once from the corresponding linearity property for step processes. Statements (ii) and (iii) are likewise easy consequences of the corresponding rules for step processes.
2. Finally, assertion (iv) results from (iii) and the identity 2ab = (a + b)² − a² − b², and is left as an exercise. □

EXTENDING THE DEFINITION. For many applications it is important to consider a wider class of integrands than L²(0, T). To this end we define M²(0, T) to be the space of all real-valued, progressively measurable processes G(·) such that
∫_0^T G² dt < ∞ a.s.
The idea is to find a sequence of step processes G^n ∈ M²(0, T) such that
∫_0^T (G − G^n)² dt → 0 a.s. as n → ∞.
It turns out that we can then define
∫_0^T G dW := lim_{n→∞} ∫_0^T G^n dW,

the expressions on the right converging in probability. In this way it is possible to extend the definition of the Itô integral to cover G ∈ M²(0, T), although we will not do so in these notes. See for instance Friedman [F] or Gihman–Skorohod [G-S] for details.

C. INDEFINITE ITÔ INTEGRALS

DEFINITION. For G ∈ L²(0, T), set
I(t) := ∫_0^t G dW   (0 ≤ t ≤ T),
the indefinite integral of G(·). Note I(0) = 0.

In this section we note two properties of the process I(·): it is a martingale, and it has continuous sample paths a.s. These facts will be quite useful for proving Itô's formula later in §D and for solving the stochastic differential equations in Chapter 5.

THEOREM. (i) If G ∈ L²(0, T), then the indefinite integral I(·) is a martingale.
(ii) Furthermore, I(·) has a version with continuous sample paths a.s.

We will not prove assertion (i); a proof of (ii) is in Appendix C. Henceforth when we refer to I(·), we will always mean this version.

More on Riemann sums. If G ∈ L²(0, T) and t → G(t, ω) is continuous for a.e. ω, then
Σ_{k=0}^{m_n−1} G(t_k^n)(W(t_{k+1}^n) − W(t_k^n)) → ∫_0^T G dW
in probability, where P^n = {0 = t_0^n < ⋯ < t_{m_n}^n = T} is any sequence of partitions with |P^n| → 0. This confirms the consistency of the Itô integral with the earlier calculations involving Riemann sums, the integrand being evaluated at τ_k^n = t_k^n.

D. ITÔ'S FORMULA

DEFINITION. Suppose that X(·) is a real-valued stochastic process satisfying
X(r) = X(s) + ∫_s^r F dt + ∫_s^r G dW
for some F ∈ L¹(0, T), G ∈ L²(0, T) and all times 0 ≤ s ≤ r ≤ T. We say that X(·) has the stochastic differential
dX = F dt + G dW   for 0 ≤ t ≤ T.
Note carefully that the differential symbols are simply an abbreviation for the integral expressions above: strictly speaking, "dX", "dt", and "dW" have no meaning alone.
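The mean-zero and isometry properties of §B can be seen in a small simulation (our sketch; the adapted integrand choice G(t) = W(t_k) on [t_k, t_{k+1}) is ours, picked because it is nonanticipating and its isometry value is easy to compute by hand: E ∫_0^1 G² dt = Σ_k t_k (t_{k+1} − t_k) = 0.45 for the uniform partition t_k = k/10).

```python
import numpy as np

# Ito integral of the step process G(t) = W(t_k) on [t_k, t_{k+1}):
# check E int G dW = 0 and E (int G dW)^2 = E int G^2 dt.
rng = np.random.default_rng(6)
tk = np.linspace(0.0, 1.0, 11)          # partition 0 = t_0 < ... < t_10 = 1
vals = np.empty(100_000)
for i in range(vals.size):
    W = np.cumsum(np.concatenate(([0.0], rng.standard_normal(10) * np.sqrt(0.1))))
    vals[i] = np.sum(W[:-1] * np.diff(W))   # sum_k G_k (W(t_{k+1}) - W(t_k))
expected = np.sum(tk[:-1] * np.diff(tk))    # E int_0^1 G^2 dt = 0.45
print(vals.mean(), vals.var(), expected)    # approx 0, 0.45, 0.45
```

Using `W[:-1]` (the left endpoints) is essential here: evaluating G at right endpoints instead would make the integrand anticipating and both identities would fail, in keeping with the λ-dependence found in §A.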

THEOREM (Itô's formula). Suppose that X(·) has the stochastic differential
dX = F dt + G dW,
for F ∈ L¹(0, T), G ∈ L²(0, T). Assume u : R × [0, T] → R is continuous and that ∂u/∂t, ∂u/∂x, ∂²u/∂x² exist and are continuous. Set
Y(t) := u(X(t), t).
Then Y(·) has the stochastic differential
(2) dY = (∂u/∂t) dt + (∂u/∂x) dX + (1/2)(∂²u/∂x²) G² dt
     = (∂u/∂t + (∂u/∂x)F + (1/2)(∂²u/∂x²)G²) dt + (∂u/∂x)G dW.
We call (2) Itô's formula or Itô's chain rule.

Remarks. (i) The argument of u, ∂u/∂t, etc. above is (X(t), t).
(ii) In view of our definitions, the expression (2) means that for all 0 ≤ s ≤ r ≤ T,
(3) Y(r) − Y(s) = u(X(r), r) − u(X(s), s)
= ∫_s^r (∂u/∂t + (∂u/∂x)F + (1/2)(∂²u/∂x²)G²)(X, t) dt + ∫_s^r (∂u/∂x)(X, t) G dW almost surely.
(iii) Since X(t) = X(0) + ∫_0^t F ds + ∫_0^t G dW, X(·) has continuous sample paths almost surely. Thus for almost every ω the functions t → (∂u/∂t)(X(t), t), (∂u/∂x)(X(t), t), (∂²u/∂x²)(X(t), t) are continuous, and so the integrals in (3) are defined.

We will prove Itô's formula below, but first here are some applications.

Example 1. Let X(·) = W(·) and u(x) = x^m. Then dX = dW, so F ≡ 0 and G ≡ 1, and Itô's formula gives
d(W^m) = mW^{m−1} dW + (1/2)m(m − 1)W^{m−2} dt.
In particular the case m = 2 reads
d(W²) = 2W dW + dt,
which integrated is the identity
∫_s^r W dW = (W(r)² − W(s)²)/2 − (r − s)/2,
a formula we have established from first principles before.

Example 2. Again take X(·) = W(·), so F ≡ 0 and G ≡ 1, and let u(x, t) = e^{λx − λ²t/2}. Then Itô's formula gives
d(e^{λW(t) − λ²t/2}) = (−λ²/2) e^{λW(t) − λ²t/2} dt + (λ²/2) e^{λW(t) − λ²t/2} dt + λ e^{λW(t) − λ²t/2} dW.
Thus Y(t) := e^{λW(t) − λ²t/2} satisfies
dY = λY dW, Y(0) = 1.
This is a stochastic differential equation, about which more in Chapters 5 and 6. In the Itô stochastic calculus the expression e^{λW(t) − λ²t/2} plays the role that e^{λt} plays in ordinary calculus. We build upon this observation:

Example 3 (from McKean [McK]). For n = 0, 1, …, define
h_n(x, t) := ((−t)^n / n!) e^{x²/2t} (d^n/dx^n)(e^{−x²/2t}),
the n-th Hermite polynomial. For instance,
h_0(x, t) = 1, h_1(x, t) = x, h_2(x, t) = x²/2 − t/2,
h_3(x, t) = x³/6 − tx/2, h_4(x, t) = x⁴/24 − tx²/4 + t²/8, etc.
Since
(d^n/dλ^n)(e^{−(x−λt)²/2t})|_{λ=0} = (−t)^n (d^n/dx^n)(e^{−x²/2t}),
we have
(d^n/dλ^n)(e^{λx − λ²t/2})|_{λ=0} = (−t)^n e^{x²/2t} (d^n/dx^n)(e^{−x²/2t}) = n! h_n(x, t),
and hence
e^{λx − λ²t/2} = Σ_{n=0}^∞ λ^n h_n(x, t).
Consequently, in the Itô stochastic calculus the expression h_n(W(t), t) plays the role that t^n/n! plays in ordinary calculus.

THEOREM (Stochastic calculus with Hermite polynomials). For t ≥ 0 and n = 0, 1, …,
∫_0^t h_n(W, s) dW = h_{n+1}(W(t), t);
that is, dh_{n+1}(W, t) = h_n(W, t) dW.
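Before the proof, the generating function identity can be checked numerically. The sketch below uses an explicit expansion h_n(x, t) = Σ_k (−t/2)^k x^{n−2k} / (k! (n − 2k)!), which is our rewriting of the definition above (it reproduces h_0 through h_4 as listed).

```python
from math import exp, factorial

def hermite(n, x, t):
    """h_n(x, t) via the explicit sum (-t/2)^k x^(n-2k) / (k! (n-2k)!)."""
    return sum((-t / 2.0) ** k * x ** (n - 2 * k)
               / (factorial(k) * factorial(n - 2 * k))
               for k in range(n // 2 + 1))

x, t, lam = 1.3, 0.7, 0.4
# partial sum of sum_n lambda^n h_n(x, t); terms decay very fast here
series = sum(lam ** n * hermite(n, x, t) for n in range(30))
print(series, exp(lam * x - lam ** 2 * t / 2))  # the two agree closely
print(hermite(2, x, t), x**2 / 2 - t / 2)       # matches the listed h_2
```

Thirty terms already reproduce e^{λx − λ²t/2} to high accuracy for moderate λ, mirroring how the partial sums of λ^n t^n/n! reproduce e^{λt} in ordinary calculus.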

r]. ˆ PROOF OF ITO’S FORMULA. and (ii ) d(tW ) = W dt + tdW . Proof. Plug in the expansion above for Y (t): ∞ n=0 λn hn (W (t). This identity holds for all λ and so the coeﬃcients of λn on both sides are equal. Similarly. s) dW.. Y (t) = 1 + λ 0 t Y dW for all t ≥ 0. dY = λY dW Y (0) = 1. s) dW t λn 0 hn−1 (W (s). 68 . The limit above is taken in L2 (Ω). note that r t dW = lim 0 n mn −1 k=0 n→∞ tn (W (tn ) − W (tn )). that is.and so Y (t) = e But Y (·) solves λW (t)− λ2 t 2 = ∞ n=0 λn hn (W (t). We have (i) d(W 2 ) = 2W dW + dt. k+1 k+1 k since for amost every ω the sum is an ordinary Riemann sum approximation and for this we can take the right-hand endpoint tn at which to evaluate the continuous k+1 integrand. To verify (ii). o by verifying directly two important special cases: LEMMA (Two simple stochastic differentials). We add these formulas to obtain r r t dW + 0 0 W dt = rW (r). k k+1 k where P = {0 = < < · · · < = r} is a sequence of partitions of [0. We now begin the proof of Itˆ’s formula. We have already established formula (i). since t → W (t) is continuous a. t) = 1 + λ =1+ ∞ n=1 t ∞ 0 n=0 λn hn (W (s).s. t). with n |P | → 0. r tn 0 tn 1 tn n m W dt = lim 0 mn −1 k=0 n→∞ W (tn )(tn − tn ).

These integral identities for all r ≥ 0 are abbreviated d(tW) = t dW + W dt. □

This confirms in particular that the Paley–Wiener–Zygmund definition
∫_0^1 g dW := −∫_0^1 g′ W dt,
for deterministic C¹ functions g with g(0) = g(1) = 0, agrees with the Itô definition. Recall also from §A that
∫_s^r W dW = (W(r)² − W(s)²)/2 − (r − s)/2 for all r ≥ s ≥ 0.

These special cases in hand, we now prove:

THEOREM (Itô product rule). Suppose
dX_1 = F_1 dt + G_1 dW and dX_2 = F_2 dt + G_2 dW   (0 ≤ t ≤ T),
for F_i ∈ L¹(0, T), G_i ∈ L²(0, T) (i = 1, 2). Then
(4) d(X_1 X_2) = X_2 dX_1 + X_1 dX_2 + G_1 G_2 dt.

Remarks. (i) The expression G_1 G_2 dt is the Itô correction term. The integrated version of the product rule is the Itô integration-by-parts formula:
(5) ∫_s^r X_2 dX_1 = X_1(r)X_2(r) − X_1(s)X_2(s) − ∫_s^r X_1 dX_2 − ∫_s^r G_1 G_2 dt.
(ii) If either G_1 or G_2 is identically equal to 0, we recover the ordinary calculus integration-by-parts formula.

Proof. 1. Choose 0 ≤ r ≤ T. First suppose
F_i(t) ≡ F_i, G_i(t) ≡ G_i   (i = 1, 2),
where the F_i, G_i are time-independent, F(0)-measurable random variables, and assume for simplicity that X_1(0) = X_2(0) = 0. Then
X_i(t) = F_i t + G_i W(t)   (t ≥ 0, i = 1, 2).

tk+1 ) on which Fi and Gi are constant random variables. X1 (s). 70 . X2 (s) are arbitrary. The case that s ≥ 0. r) and pass to limits. 2. we apply Step 1 on each subinterval [tk . T ). 2. and Fi .Thus r X2 dX1 + X1 dX2 + G1 G2 dt 0 r r = 0 X1 F2 + X2 F1 dt + 0 r X1 G2 + X2 G1 dW + 0 r G1 G2 dt = 0 (F1 t + G1 W )F2 + (F2 t + G2 W )F1 dt r + 0 2 (F1 t + G1 W )G2 + (F2 t + G2 W )G1 dW + G1 G2 r r r = F1 F2 r + (G1 F2 + G2 F1 ) 0 r W dt + 0 t dW + 2G1 G2 0 W dW + G1 G2 r. Gn ∈ i L (0. to obtain the formula r 0 Fin ds + 0 Gn dW (i = 1. If Fi . and Fi . Employing these identities. T ). we deduce: 0 r W dt + X2 dX1 + X1 dX2 + G1 G2 dt 0 = F1 F2 r 2 + (G1 F2 + G2 F1 )rW (r) + G1 G2 W 2 (r) = X1 (r)X2 (r). This is formula (5) for the special circumstance that s = 0. and add the resulting integral expressions. Gi time–independent random variables. 3. 2). In the general situation. with 2 E( E( Deﬁne T 0 |Fin − Fi | dt) → 0 − Gi )2 dt) → 0 t T (Gn i 0 as n → ∞. Gi are constant F (s)measurable random variables has a similar proof. Xi (0) = 0. we select step processes Fin ∈ L1 (0. t n Xi (t) := Xi (0) + n We apply Step 2 to Xi (·) on (s. Gi are step processes. i X1 (r)X2 (r) = X1 (s)X2 (s) + s X1 dX2 + X2 dX1 + G1 G2 dt. r r 0 We now use the Lemma above to compute 2 0 W dW = W 2 (r) − r and r t dW = rW (r). i = 1.

Now we are ready for the

CONCLUSION OF THE PROOF OF ITÔ'S FORMULA. 1. Suppose dX = F dt + G dW, with F ∈ L¹(0, T), G ∈ L²(0, T). We start with the case u(x) = x^m, m = 0, 1, 2, …, and first of all claim that
(6) d(X^m) = mX^{m−1} dX + (1/2)m(m − 1)X^{m−2} G² dt.
This is clear for m = 0, 1, and the case m = 2 follows from the Itô product rule. Now assume the stated formula for m − 1:
d(X^{m−1}) = (m − 1)X^{m−2} dX + (1/2)(m − 1)(m − 2)X^{m−3} G² dt
= (m − 1)X^{m−2}(F dt + G dW) + (1/2)(m − 1)(m − 2)X^{m−3} G² dt,
and we prove it for m:
d(X^m) = d(X · X^{m−1})
= X d(X^{m−1}) + X^{m−1} dX + (m − 1)X^{m−2} G² dt   (by the product rule)
= X((m − 1)X^{m−2} dX + (1/2)(m − 1)(m − 2)X^{m−3} G² dt) + X^{m−1} dX + (m − 1)X^{m−2} G² dt
= mX^{m−1} dX + (1/2)m(m − 1)X^{m−2} G² dt,
because (m − 1) + (1/2)(m − 1)(m − 2) = (1/2)m(m − 1). This proves (6).

2. Since Itô's formula thus holds for the functions u(x) = x^m, m = 0, 1, …, and since the operator "d" is linear, Itô's formula is valid for all polynomials u in the variable x.

3. Suppose now u(x, t) = f(x)g(t), where f and g are polynomials. Then
d(u(X(t), t)) = d(f(X)g) = f(X) dg + g df(X)
= f(X)g′ dt + g(f′(X) dX + (1/2)f′′(X)G² dt)
= (∂u/∂t) dt + (∂u/∂x) dX + (1/2)(∂²u/∂x²) G² dt.
This calculation confirms Itô's formula for u(x, t) = f(x)g(t), where f and g are polynomials. Thus it is true as well for any function u having the form
u(x, t) = Σ_{i=1}^m f^i(x)g^i(t),

where the f^i and g^i are polynomials. That is, Itô's formula is valid for all polynomial functions u of the variables x, t.

4. Given u as in Itô's formula, there exists a sequence of polynomials u_n such that
u_n → u, ∂u_n/∂t → ∂u/∂t, ∂u_n/∂x → ∂u/∂x, ∂²u_n/∂x² → ∂²u/∂x²,
uniformly on compact subsets of R × [0, T]. Invoking Step 3, we know that for all 0 ≤ r ≤ T,
u_n(X(r), r) − u_n(X(0), 0) = ∫_0^r (∂u_n/∂t + (∂u_n/∂x)F + (1/2)(∂²u_n/∂x²)G²) dt + ∫_0^r (∂u_n/∂x) G dW almost surely,
the argument of the partial derivatives of u_n being (X(t), t). We may pass to limits as n → ∞ in this expression, thereby proving Itô's formula in general. □

A similar proof gives this:

GENERALIZED ITÔ FORMULA. Suppose
dX^i = F^i dt + G^i dW, with F^i ∈ L¹(0, T), G^i ∈ L²(0, T), for i = 1, …, n.
If u : R^n × [0, T] → R is continuous, with continuous partial derivatives ∂u/∂t, ∂u/∂x_i, ∂²u/∂x_i∂x_j (i, j = 1, …, n), then
d(u(X^1, …, X^n, t)) = (∂u/∂t) dt + Σ_{i=1}^n (∂u/∂x_i) dX^i + (1/2) Σ_{i,j=1}^n (∂²u/∂x_i∂x_j) G^i G^j dt.

E. ITÔ'S INTEGRAL IN HIGHER DIMENSIONS

Notation. (i) Let W(·) = (W^1(·), …, W^m(·)) be an m-dimensional Brownian motion.
(ii) We assume F(·) is a family of nonanticipating σ-algebras, meaning that
(a) F(t) ⊇ F(s) for all t ≥ s ≥ 0,
(b) F(t) ⊇ W(t) = U(W(s) | 0 ≤ s ≤ t),
(c) F(t) is independent of W⁺(t) := U(W(s) − W(t) | t ≤ s < ∞).

DEFINITIONS. (i) An M^{n×m}-valued stochastic process G = ((G^{ij})) belongs to L²_{n×m}(0, T) if
G^{ij} ∈ L²(0, T)   (i = 1, …, n; j = 1, …, m).
(ii) An R^n-valued stochastic process F = (F^1, F^2, …, F^n) belongs to L¹_n(0, T) if
F^i ∈ L¹(0, T)   (i = 1, …, n).

DEFINITION. If $\mathbf{G} \in \mathbb{L}^2_{n\times m}(0,T)$, then
$$\int_0^T \mathbf{G}\,d\mathbf{W}$$
is the $\mathbb{R}^n$-valued random variable whose $i$-th component is
$$\sum_{j=1}^m \int_0^T G^{ij}\,dW^j \qquad (i = 1,\dots,n).$$

Approximating by step processes as before, we can establish this

LEMMA. If $\mathbf{G} \in \mathbb{L}^2_{n\times m}(0,T)$, then
$$E\Big(\int_0^T \mathbf{G}\,d\mathbf{W}\Big) = 0 \qquad\text{and}\qquad E\Big(\Big|\int_0^T \mathbf{G}\,d\mathbf{W}\Big|^2\Big) = E\Big(\int_0^T |\mathbf{G}|^2\,dt\Big),$$
where $|\mathbf{G}|^2 := \sum_{1\le i\le n,\ 1\le j\le m} |G^{ij}|^2$.

DEFINITION. If $\mathbf{X}(\cdot) = (X^1(\cdot),\dots,X^n(\cdot))$ is an $\mathbb{R}^n$-valued stochastic process such that
$$\mathbf{X}(r) = \mathbf{X}(s) + \int_s^r \mathbf{F}\,dt + \int_s^r \mathbf{G}\,d\mathbf{W}$$
for some $\mathbf{F} \in \mathbb{L}^1_n(0,T)$, $\mathbf{G} \in \mathbb{L}^2_{n\times m}(0,T)$ and all $0 \le s \le r \le T$, we say $\mathbf{X}(\cdot)$ has the stochastic differential
$$d\mathbf{X} = \mathbf{F}\,dt + \mathbf{G}\,d\mathbf{W}.$$
This means that
$$dX^i = F^i\,dt + \sum_{j=1}^m G^{ij}\,dW^j \qquad\text{for } i = 1,\dots,n.$$

THEOREM (Itô's formula in n dimensions). Suppose that $d\mathbf{X} = \mathbf{F}\,dt + \mathbf{G}\,d\mathbf{W}$, as above. Let $u : \mathbb{R}^n \times [0,T] \to \mathbb{R}$ be continuous, with continuous partial derivatives $\frac{\partial u}{\partial t}$, $\frac{\partial u}{\partial x_i}$, $\frac{\partial^2 u}{\partial x_i \partial x_j}$ $(i,j = 1,\dots,n)$. Then
$$d(u(\mathbf{X}(t),t)) = \frac{\partial u}{\partial t}\,dt + \sum_{i=1}^n \frac{\partial u}{\partial x_i}\,dX^i + \frac{1}{2}\sum_{i,j=1}^n \frac{\partial^2 u}{\partial x_i \partial x_j}\sum_{l=1}^m G^{il}G^{jl}\,dt, \tag{5}$$
where the argument of the partial derivatives of $u$ is $(\mathbf{X}(t),t)$.

An outline of the proof follows some preliminary results:

LEMMA (Another simple stochastic differential). Let $W(\cdot)$ and $\bar W(\cdot)$ be independent 1-dimensional Brownian motions. Then
$$d(W\bar W) = \bar W\,dW + W\,d\bar W.$$
There is no correction term "$dt$" here, since $W$, $\bar W$ are independent. Compare this to the case $W = \bar W$, for which $d(W^2) = 2W\,dW + dt$.

Proof. To begin, set $X(t) := \frac{W(t) + \bar W(t)}{\sqrt 2}$. We claim that $X(\cdot)$ is a 1-dimensional Brownian motion. To see this, note firstly that $X(0) = 0$ a.s., and that $X(\cdot)$ has independent increments. Next observe that since $X(t)$ is the sum of two independent $N(0,\frac{t}{2})$ random variables, $X(t)$ is $N(0,t)$. A similar observation shows that $X(t) - X(s)$ is $N(0,t-s)$ for $t \ge s$. This establishes the claim.

From the 1-dimensional Itô calculus, we know
$$d(X^2) = 2X\,dX + dt,\qquad d(W^2) = 2W\,dW + dt,\qquad d(\bar W^2) = 2\bar W\,d\bar W + dt.$$
Thus, since $W\bar W = X^2 - \frac{1}{2}W^2 - \frac{1}{2}\bar W^2$,
$$d(W\bar W) = d\Big(X^2 - \frac{1}{2}W^2 - \frac{1}{2}\bar W^2\Big) = 2X\,dX + dt - \frac{1}{2}(2W\,dW + dt) - \frac{1}{2}(2\bar W\,d\bar W + dt)$$
$$= (W + \bar W)(dW + d\bar W) - W\,dW - \bar W\,d\bar W = \bar W\,dW + W\,d\bar W.\ \square$$

We will also need the following modification of the product rule:

LEMMA (Itô product rule with several Brownian motions). Suppose
$$dX_1 = F_1\,dt + \sum_{k=1}^m G_1^k\,dW^k \qquad\text{and}\qquad dX_2 = F_2\,dt + \sum_{l=1}^m G_2^l\,dW^l,$$
where $F_i \in \mathbb{L}^1(0,T)$ and $G_i^k \in \mathbb{L}^2(0,T)$ for $i = 1,2$; $k = 1,\dots,m$. Then
$$d(X_1X_2) = X_1\,dX_2 + X_2\,dX_1 + \sum_{k=1}^m G_1^kG_2^k\,dt.$$

The proof is a modification of that for the one-dimensional Itô product rule, with the new feature that
$$d(W^iW^j) = W^i\,dW^j + W^j\,dW^i + \delta_{ij}\,dt,$$
according to the Lemma above.

The Itô formula in $n$ dimensions can now be proved by a suitable modification of the one-dimensional proof. We first establish the formula for multinomials $u = u(x) = x_1^{k_1}\cdots x_n^{k_n}$, proving this by an induction on $k_1,\dots,k_n$. Then, as before, the formula follows easily for polynomials $u = u(x,t)$ in the variables $x = (x_1,\dots,x_n)$ and $t$, and then, after an approximation, for all functions $u$ as stated. $\square$

CLOSING REMARKS.

1. HOW TO REMEMBER ITÔ'S FORMULA. We may symbolically compute
$$d(u(\mathbf{X},t)) = \frac{\partial u}{\partial t}\,dt + \sum_{i=1}^n \frac{\partial u}{\partial x_i}\,dX^i + \frac{1}{2}\sum_{i,j=1}^n \frac{\partial^2 u}{\partial x_i \partial x_j}\,dX^i\,dX^j,$$
and then simplify the term "$dX^i\,dX^j$" by expanding it out and using the formal multiplication rules
$$(dt)^2 = 0,\qquad dt\,dW^k = 0,\qquad dW^k\,dW^l = \delta_{kl}\,dt \qquad (k,l = 1,\dots,m).$$

2. ALTERNATIVE NOTATION. When $d\mathbf{X} = \mathbf{F}\,dt + \mathbf{G}\,d\mathbf{W}$, we sometimes write
$$H^{ij} := \sum_{k=1}^m G^{ik}G^{jk}.$$
Then Itô's formula reads
$$du(\mathbf{X},t) = \Big(\frac{\partial u}{\partial t} + \mathbf{F}\cdot Du + \frac{1}{2}H:D^2u\Big)dt + Du\cdot\mathbf{G}\,d\mathbf{W},$$
where $Du = \big(\frac{\partial u}{\partial x_1},\dots,\frac{\partial u}{\partial x_n}\big)$ is the gradient of $u$ in the $x$-variables, $D^2u = \big(\big(\frac{\partial^2 u}{\partial x_i \partial x_j}\big)\big)$ is the Hessian matrix, and
$$\mathbf{F}\cdot Du = \sum_{i=1}^n F^i\frac{\partial u}{\partial x_i},\qquad H:D^2u = \sum_{i,j=1}^n H^{ij}\frac{\partial^2 u}{\partial x_i \partial x_j},\qquad Du\cdot\mathbf{G}\,d\mathbf{W} = \sum_{i=1}^n\sum_{k=1}^m \frac{\partial u}{\partial x_i}G^{ik}\,dW^k.$$
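The formal multiplication rules can be illustrated numerically (a sketch, not part of the notes): over $[0,1]$ the quadratic-variation sum $\sum(\Delta W)^2$ tends to $t$, while the sums corresponding to $dt\,dW$ and $dW^1\,dW^2$ vanish.

```python
import math, random

random.seed(0)

# Check the heuristics (dt)^2 = 0, dt dW = 0, dW^k dW^l = delta_kl dt:
# over [0,1], sum (dW)^2 -> t = 1, sum dt dW -> 0, sum dW^1 dW^2 -> 0.
N = 100_000
dt = 1.0 / N
s_ww = s_tdw = s_w1w2 = 0.0
for _ in range(N):
    dW1 = random.gauss(0.0, math.sqrt(dt))
    dW2 = random.gauss(0.0, math.sqrt(dt))
    s_ww += dW1 * dW1      # behaves like dt
    s_tdw += dt * dW1      # negligible
    s_w1w2 += dW1 * dW2    # negligible (independent components)

print(s_ww, s_tdw, s_w1w2)
```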

The foregoing theory provides a rigorous meaning for all this.

B. . . (i) Let W(·) be an m-dimensional Brownian motion and X0 an n-dimensional random variable which is independent of W(·). . W(s) (0 ≤ s ≤ t)) (t ≥ 0). T ] → Rn .s. b = (b1 . .. bn ). the σ-algebra generated by X0 and the history of the Wiener process up to (and including) time t. (ii) Assume T > 0 is given. 77 . D. B : Rn × [0. We will henceforth take F (t) := U(X0 . . . s) ds + 0 B(X(s). T ). C. t) ∈ L2 (0. n1 b . We say that an Rn -valued stochastic process X(·) is a solution of the Itˆ stochastic diﬀerential equation o (SDE) dX = b(X.CHAPTER 5: STOCHASTIC DIFFERENTIAL EQUATIONS A. t)dt + B(X. . for all 0 ≤ t ≤ T . (Note carefully: these are not random variables. n×m and t t (iv) X(t) = X0 + 0 b(X(s). b2 . s) dW a... b1m . (ii) F := b(X. t) ∈ L1 (0. . . . Deﬁnitions and examples Existence and uniqueness of solutions Properties of solutions Linear stochastic diﬀerential equations A. T ). and b : Rn × [0. bnm DEFINITION. t)dW X(0) = X0 for 0 ≤ t ≤ T .) We display the components of these functions by writing b11 ... DEFINITIONS AND EXAMPLES We are ﬁnally ready to study stochastic diﬀerential equations: Notation. provided (i) X(·) is progressively measurable with respect to F (·). B = . . n (iii) G := B(X. T ] → Mn×m are given functions.

Remarks. (i) A higher-order SDE of the form
$$Y^{(n)} = f(t, Y, \dots, Y^{(n-1)}) + g(t, Y, \dots, Y^{(n-1)})\xi,$$
where as usual $\xi$ denotes "white noise", can be rewritten into the form above by the device of setting
$$\mathbf{X}(t) = \begin{pmatrix} X^1(t) \\ X^2(t) \\ \vdots \\ X^n(t) \end{pmatrix} = \begin{pmatrix} Y(t) \\ \dot Y(t) \\ \vdots \\ Y^{(n-1)}(t) \end{pmatrix}.$$
Then
$$d\mathbf{X} = \begin{pmatrix} X^2 \\ \vdots \\ X^n \\ f(\cdots) \end{pmatrix} dt + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ g(\cdots) \end{pmatrix} dW.$$

(ii) In view of the integral identity (iv), we can always assume $\mathbf{X}(\cdot)$ has continuous sample paths almost surely.

EXAMPLES OF LINEAR STOCHASTIC DIFFERENTIAL EQUATIONS.

Example 1. Let $m = n = 1$ and suppose $g$ is a continuous function (not a random variable). Then the unique solution of
$$\begin{cases} dX = gX\,dW \\ X(0) = 1 \end{cases} \tag{1}$$
is
$$X(t) = e^{-\frac{1}{2}\int_0^t g^2\,ds + \int_0^t g\,dW}$$
for $0 \le t \le T$. To verify this, note that
$$Y(t) := -\frac{1}{2}\int_0^t g^2\,ds + \int_0^t g\,dW$$
satisfies $dY = -\frac{1}{2}g^2\,dt + g\,dW$. Thus Itô's lemma for $u(x) = e^x$ gives
$$dX = \frac{\partial u}{\partial x}\,dY + \frac{1}{2}\frac{\partial^2 u}{\partial x^2}g^2\,dt = e^Y\Big(-\frac{1}{2}g^2\,dt + g\,dW\Big) + \frac{1}{2}e^Yg^2\,dt = gX\,dW,$$
as claimed. We will prove uniqueness later, in §B.

Example 2. Similarly, the unique solution of
$$\begin{cases} dX = fX\,dt + gX\,dW \\ X(0) = 1 \end{cases} \tag{2}$$
is
$$X(t) = e^{\int_0^t \left(f - \frac{1}{2}g^2\right)ds + \int_0^t g\,dW}$$
for $0 \le t \le T$.

Example 3 (Stock prices). Let $P(t)$ denote the price of a stock at time $t$. We can model the evolution of $P(t)$ in time by supposing that $\frac{dP}{P}$, the relative change of price, evolves according to the SDE
$$\frac{dP}{P} = \mu\,dt + \sigma\,dW$$
for certain constants $\mu > 0$ and $\sigma$, called the drift and the volatility of the stock. Hence
$$dP = \mu P\,dt + \sigma P\,dW,\qquad P(0) = p_0, \tag{3}$$
and so
$$d(\log P) = \frac{dP}{P} - \frac{1}{2}\frac{\sigma^2P^2\,dt}{P^2} = \Big(\mu - \frac{\sigma^2}{2}\Big)dt + \sigma\,dW \qquad\text{by Itô's formula}.$$
Consequently
$$P(t) = p_0\,e^{\sigma W(t) + \left(\mu - \frac{\sigma^2}{2}\right)t}.$$
Observe that the price is always positive, assuming the initial price $p_0$ is positive.

Since (3) implies
$$P(t) = p_0 + \int_0^t \mu P\,ds + \int_0^t \sigma P\,dW$$
and $E\big(\int_0^t \sigma P\,dW\big) = 0$, we see that
$$E(P(t)) = p_0 + \int_0^t \mu E(P(s))\,ds.$$
Hence
$$E(P(t)) = p_0e^{\mu t} \qquad\text{for } t \ge 0.$$
The expected value of the stock price consequently agrees with the deterministic solution of (3) corresponding to $\sigma = 0$.
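Example 3 is easy to check by Monte Carlo (an illustration with parameter values of our own choosing): sampling $P(t)$ from the explicit formula and averaging should reproduce $E(P(t)) = p_0e^{\mu t}$.

```python
import math, random

random.seed(2)

# Sample P(t) = p0 * exp(sigma*W(t) + (mu - sigma^2/2) t) exactly,
# using W(t) ~ N(0, t), and compare the mean with p0 * e^{mu t}.
p0, mu, sigma, t = 1.0, 0.1, 0.2, 1.0
n_paths = 20_000
total = 0.0
for _ in range(n_paths):
    W = random.gauss(0.0, math.sqrt(t))
    total += p0 * math.exp(sigma * W + (mu - 0.5 * sigma**2) * t)
mc_mean = total / n_paths
print(mc_mean, p0 * math.exp(mu * t))  # both close to 1.105...
```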

Example 4 (Brownian bridge). The solution of the SDE
$$\begin{cases} dB = -\frac{B}{1-t}\,dt + dW & (0 \le t < 1) \\ B(0) = 0 \end{cases} \tag{4}$$
is
$$B(t) = (1-t)\int_0^t \frac{1}{1-s}\,dW \qquad (0 \le t < 1),$$
as we confirm by a direct calculation. It turns out also that $\lim_{t\to1^-}B(t) = 0$ almost surely. We call $B(\cdot)$ a Brownian bridge, between the origin at time 0 and at time 1.

[Figure: a sample path of the Brownian bridge.]

Example 5 (Langevin's equation). A possible improvement of our mathematical model of the motion of a Brownian particle models frictional forces as follows for the one-dimensional case:
$$\dot X = -bX + \sigma\xi,$$
where $\xi(\cdot)$ is "white noise", $b > 0$ is a coefficient of friction, and $\sigma$ is a diffusion coefficient. In this interpretation $X(\cdot)$ is the velocity of the Brownian particle: see Example 6 for the position process $Y(\cdot)$. We interpret this to mean
$$\begin{cases} dX = -bX\,dt + \sigma\,dW \\ X(0) = X_0 \end{cases} \tag{5}$$
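As a numerical illustration of Example 4 (using the standard fact, not derived in the text, that the bridge has mean $0$ and variance $t(1-t)$), an Euler discretization of the bridge SDE should show $\operatorname{Var}B(\tfrac12) \approx \tfrac14$:

```python
import math, random

random.seed(3)

# Euler stepping of dB = -B/(1-t) dt + dW from t = 0 to t = 0.5;
# the sample mean and variance of B(0.5) should be near 0 and 0.25.
N, dt = 500, 1.0 / 1000
n_paths = 5000
vals = []
for _ in range(n_paths):
    B, t = 0.0, 0.0
    for _ in range(N):
        B += -B / (1.0 - t) * dt + random.gauss(0.0, math.sqrt(dt))
        t += dt
    vals.append(B)
mean = sum(vals) / n_paths
var = sum((v - mean) ** 2 for v in vals) / n_paths
print(mean, var)  # near 0 and 0.25 = 0.5 * (1 - 0.5)
```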

for some initial distribution $X_0$, independent of the Brownian motion. This is the Langevin equation. The solution is
$$X(t) = e^{-bt}X_0 + \sigma\int_0^t e^{-b(t-s)}\,dW \qquad (t \ge 0),$$
as is straightforward to verify. Observe that
$$E(X(t)) = e^{-bt}E(X_0)$$
and
$$E(X^2(t)) = E\Big(e^{-2bt}X_0^2 + 2\sigma e^{-bt}X_0\int_0^t e^{-b(t-s)}\,dW + \sigma^2\Big(\int_0^t e^{-b(t-s)}\,dW\Big)^2\Big)$$
$$= e^{-2bt}E(X_0^2) + 2\sigma e^{-bt}E(X_0)\,E\Big(\int_0^t e^{-b(t-s)}\,dW\Big) + \sigma^2\int_0^t e^{-2b(t-s)}\,ds$$
$$= e^{-2bt}E(X_0^2) + \frac{\sigma^2}{2b}(1 - e^{-2bt}).$$
Thus the variance is given by
$$V(X(t)) = E(X^2(t)) - E(X(t))^2 = e^{-2bt}V(X_0) + \frac{\sigma^2}{2b}(1 - e^{-2bt}),$$
assuming, of course, $V(X_0) < \infty$. For any such initial condition $X_0$ we therefore have
$$E(X(t)) \to 0,\qquad V(X(t)) \to \frac{\sigma^2}{2b} \qquad\text{as } t \to \infty.$$
From the explicit form of the solution we see that the distribution of $X(t)$ approaches $N\big(0, \frac{\sigma^2}{2b}\big)$ as $t \to \infty$. We interpret this to mean that, irrespective of the initial distribution, the solution of the SDE for large time "settles down" into a Gaussian distribution whose variance $\frac{\sigma^2}{2b}$ represents a balance between the random disturbing force $\sigma\xi(\cdot)$ and the frictional damping force $-bX(\cdot)$.

Example 6 (Ornstein–Uhlenbeck process). A better model of Brownian movement is provided by the Ornstein–Uhlenbeck equation
$$\begin{cases} \ddot Y = -b\dot Y + \sigma\xi \\ Y(0) = Y_0,\ \dot Y(0) = Y_1, \end{cases}$$
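The settling-down behavior of Example 5 can be illustrated by simulation. The sketch below steps with the exact Gaussian transition implied by the solution formula (a standard reformulation, not spelled out in the text) and checks that $\operatorname{Var}X(t)$ approaches $\sigma^2/(2b)$:

```python
import math, random

random.seed(4)

# X(t) = e^{-bt} X0 + sigma * int_0^t e^{-b(t-s)} dW has
# Var X(t) -> sigma^2 / (2b). Step with the exact transition
# X_{k+1} = e^{-b dt} X_k + sqrt(sigma^2 (1 - e^{-2 b dt}) / (2b)) * Z.
b, sigma, dt, steps, n_paths = 1.0, 1.0, 0.01, 500, 5000  # t = 5 >> 1/b
decay = math.exp(-b * dt)
noise_sd = math.sqrt(sigma**2 * (1.0 - math.exp(-2.0 * b * dt)) / (2.0 * b))
vals = []
for _ in range(n_paths):
    x = 0.0
    for _ in range(steps):
        x = decay * x + noise_sd * random.gauss(0.0, 1.0)
    vals.append(x)
var = sum(v * v for v in vals) / n_paths
print(var)  # near sigma^2 / (2b) = 0.5
```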

where $Y(t)$ is the position of the Brownian particle at time $t$, and $Y_0$ and $Y_1$ are given Gaussian random variables. As before $b > 0$ is the friction coefficient, $\sigma$ is the diffusion coefficient, and $\xi(\cdot)$ as usual is "white noise".

[Figure: a simulation of Langevin's equation.]

Then $X := \dot Y$, the velocity process, satisfies the Langevin equation
$$\begin{cases} dX = -bX\,dt + \sigma\,dW \\ X(0) = Y_1, \end{cases} \tag{6}$$
studied in Example 5. We assume $Y_1$ to be normal, whence the explicit formula for the solution,
$$X(t) = e^{-bt}Y_1 + \sigma\int_0^t e^{-b(t-s)}\,dW,$$
shows $X(t)$ to be Gaussian for all times $t \ge 0$. Now the position process is
$$Y(t) = Y_0 + \int_0^t X\,ds.$$
Therefore
$$E(Y(t)) = E(Y_0) + \int_0^t E(X(s))\,ds = E(Y_0) + \int_0^t e^{-bs}E(Y_1)\,ds = E(Y_0) + \frac{1 - e^{-bt}}{b}E(Y_1),$$

and a somewhat lengthy calculation shows
$$V(Y(t)) = V(Y_0) + \frac{\sigma^2}{b^2}t + \frac{\sigma^2}{2b^3}(-3 + 4e^{-bt} - e^{-2bt}).$$
Nelson [N, p. 57] discusses this model as compared with Einstein's.

Example 7 (Random harmonic oscillator). This is the SDE
$$\begin{cases} \ddot X = -\lambda^2X - b\dot X + \sigma\xi \\ X(0) = X_0,\ \dot X(0) = X_1, \end{cases}$$
where $-\lambda^2X$ represents a linear restoring force and $-b\dot X$ is a frictional damping term. An explicit solution can be worked out using the general formulas presented below in §D. For the special case $X_1 = 0$, $b = 0$, $\sigma = 1$, we have
$$X(t) = X_0\cos(\lambda t) + \frac{1}{\lambda}\int_0^t \sin(\lambda(t-s))\,dW.$$

B. EXISTENCE AND UNIQUENESS OF SOLUTIONS

In this section we address the problem of building solutions to stochastic differential equations. We start with a simple case:

1. AN EXAMPLE IN ONE DIMENSION. Let us first suppose $b : \mathbb{R} \to \mathbb{R}$ is $C^1$, with $|b'| \le L$ for some constant $L$, and try to solve the one-dimensional stochastic differential equation
$$\begin{cases} dX = b(X)\,dt + dW \\ X(0) = x, \end{cases} \tag{7}$$
where $x \in \mathbb{R}$. Now the SDE means
$$X(t) = x + \int_0^t b(X)\,ds + W(t)$$
for all times $t \ge 0$, and this formulation suggests that we try a successive approximation method to construct a solution. So define $X^0(t) \equiv x$, and then
$$X^{n+1}(t) := x + \int_0^t b(X^n)\,ds + W(t) \qquad (t \ge 0)$$
for $n = 0, 1, \dots$. Next write
$$D^n(t) := \max_{0\le s\le t}|X^{n+1}(s) - X^n(s)| \qquad (n = 0, 1, \dots),$$

and notice that for a given continuous sample path of the Brownian motion, we have
$$D^0(t) = \max_{0\le s\le t}\Big|\int_0^s b(x)\,dr + W(s)\Big| \le C$$
for all times $0 \le t \le T$, where $C$ depends on $\omega$. We now claim that
$$D^n(t) \le C\frac{L^nt^n}{n!} \qquad\text{for } n = 0, 1, \dots,\ 0 \le t \le T.$$
To see this, note that
$$D^n(t) = \max_{0\le s\le t}\Big|\int_0^s b(X^n(r)) - b(X^{n-1}(r))\,dr\Big| \le L\int_0^t D^{n-1}(s)\,ds \le L\int_0^t C\frac{L^{n-1}s^{n-1}}{(n-1)!}\,ds = C\frac{L^nt^n}{n!},$$
by the induction assumption.

In view of the claim, for $m \ge n$ we have
$$\max_{0\le t\le T}|X^m(t) - X^n(t)| \le C\sum_{k=n}^\infty \frac{L^kT^k}{k!} \to 0 \qquad\text{as } n \to \infty.$$
Thus for almost every $\omega$, $X^n(\cdot)$ converges uniformly for $0 \le t \le T$ to a limit process $X(\cdot)$ which, as is easy to check, solves (7).

2. SOLVING SDE BY CHANGING VARIABLES. Next is a procedure for solving SDE by means of a clever change of variables (McKean [McK, p. 60]). Given a general one-dimensional SDE of the form
$$\begin{cases} dX = b(X)\,dt + \sigma(X)\,dW \\ X(0) = x, \end{cases} \tag{8}$$
let us first solve
$$\begin{cases} dY = f(Y)\,dt + dW \\ Y(0) = y, \end{cases} \tag{9}$$
where $f$ will be selected later, and try to find a function $u$ such that $X := u(Y)$ solves our SDE (8). Note that we can in principle at least solve (9), according to the previous example. Assuming for the moment $u$ and $f$ are known, we compute,
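The successive approximation scheme of part 1 is easy to try on a computer along one fixed Brownian path; with a Lipschitz drift (here $b = \sin$, an illustrative choice) the sup-norm differences $D^n$ collapse factorially, as the claim predicts.

```python
import math, random

random.seed(5)

# Picard iteration X^{n+1}(t) = x + int_0^t b(X^n) ds + W(t) on a grid,
# along one fixed discretized Brownian path.
b = math.sin          # |b'| <= 1, a convenient Lipschitz drift
T, N, x = 1.0, 1000, 0.5
dt = T / N
W = [0.0]
for _ in range(N):
    W.append(W[-1] + random.gauss(0.0, math.sqrt(dt)))

X = [x] * (N + 1)     # X^0 = x
diffs = []
for _ in range(8):    # eight Picard iterations
    integral, Xnew = 0.0, [x]
    for k in range(N):
        integral += b(X[k]) * dt
        Xnew.append(x + integral + W[k + 1])
    diffs.append(max(abs(a - c) for a, c in zip(Xnew, X)))
    X = Xnew
print(diffs)          # decreasing rapidly toward 0
```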

using Itô's formula, that
$$dX = u'(Y)\,dY + \frac{1}{2}u''(Y)\,dt = \Big(u'f + \frac{1}{2}u''\Big)dt + u'\,dW.$$
Thus $X(\cdot)$ solves (8) provided
$$u'(Y) = \sigma(X) = \sigma(u(Y)) \qquad\text{and}\qquad u'(Y)f(Y) + \frac{1}{2}u''(Y) = b(X) = b(u(Y)).$$
So let us first solve the ODE
$$u'(z) = \sigma(u(z)),\qquad u(y) = x,$$
where $' = \frac{d}{dz}$ $(z \in \mathbb{R})$, and then, once $u$ is known, solve for
$$f(z) = \frac{1}{\sigma(u(z))}\Big[b(u(z)) - \frac{1}{2}u''(z)\Big].$$
We will not discuss here conditions under which all of this is possible: see Lamperti [L2].

Notice that both of the methods described above avoid all use of martingale estimates.

3. A GENERAL EXISTENCE AND UNIQUENESS THEOREM

We start with a useful calculus lemma:

GRONWALL'S LEMMA. Let $\phi$ and $f$ be nonnegative, continuous functions defined for $0 \le t \le T$, and let $C_0 \ge 0$ denote a constant. If
$$\phi(t) \le C_0 + \int_0^t f\phi\,ds \qquad\text{for all } 0 \le t \le T,$$
then
$$\phi(t) \le C_0e^{\int_0^t f\,ds} \qquad\text{for all } 0 \le t \le T.$$

Proof. Set $\Phi(t) := C_0 + \int_0^t f\phi\,ds$. Then $\Phi' = f\phi \le f\Phi$, and so
$$\Big(\Phi e^{-\int_0^t f\,ds}\Big)' = (\Phi' - f\Phi)e^{-\int_0^t f\,ds} \le (f\phi - f\phi)e^{-\int_0^t f\,ds} = 0.$$
Therefore
$$\Phi(t)e^{-\int_0^t f\,ds} \le \Phi(0)e^{-0} = C_0,$$
and thus
$$\phi(t) \le \Phi(t) \le C_0e^{\int_0^t f\,ds}.\ \square$$

EXISTENCE AND UNIQUENESS THEOREM. Suppose that $\mathbf{b} : \mathbb{R}^n \times [0,T] \to \mathbb{R}^n$ and $\mathbf{B} : \mathbb{R}^n \times [0,T] \to \mathbb{M}^{n\times m}$ are continuous and satisfy the following conditions:
(a) $|\mathbf{b}(x,t) - \mathbf{b}(\hat x,t)| \le L|x - \hat x|$ and $|\mathbf{B}(x,t) - \mathbf{B}(\hat x,t)| \le L|x - \hat x|$ for all $0 \le t \le T$, $x, \hat x \in \mathbb{R}^n$;
(b) $|\mathbf{b}(x,t)| \le L(1 + |x|)$ and $|\mathbf{B}(x,t)| \le L(1 + |x|)$ for all $0 \le t \le T$, $x \in \mathbb{R}^n$,
for some constant $L$. Let $\mathbf{X}_0$ be any $\mathbb{R}^n$-valued random variable such that
(c) $E(|\mathbf{X}_0|^2) < \infty$ and
(d) $\mathbf{X}_0$ is independent of $\mathcal{W}^+(0)$,
where $\mathbf{W}(\cdot)$ is a given $m$-dimensional Brownian motion. Then there exists a unique solution $\mathbf{X} \in \mathbb{L}^2_n(0,T)$ of the stochastic differential equation
$$\text{(SDE)}\qquad \begin{cases} d\mathbf{X} = \mathbf{b}(\mathbf{X},t)\,dt + \mathbf{B}(\mathbf{X},t)\,d\mathbf{W} \\ \mathbf{X}(0) = \mathbf{X}_0 \end{cases} \qquad (0 \le t \le T).$$

Remarks. (i) "Unique" means that if $\mathbf{X}, \hat{\mathbf{X}} \in \mathbb{L}^2_n(0,T)$, with continuous sample paths almost surely, and both solve (SDE), then
$$P(\mathbf{X}(t) = \hat{\mathbf{X}}(t) \text{ for all } 0 \le t \le T) = 1.$$
(ii) Hypothesis (a) says that $\mathbf{b}$ and $\mathbf{B}$ are uniformly Lipschitz continuous in the variable $x$. Notice also that hypothesis (b) actually follows from (a).

Proof. 1. Uniqueness. Suppose $\mathbf{X}$ and $\hat{\mathbf{X}}$ are solutions, as above. Then for all $0 \le t \le T$,
$$\mathbf{X}(t) - \hat{\mathbf{X}}(t) = \int_0^t \mathbf{b}(\mathbf{X},s) - \mathbf{b}(\hat{\mathbf{X}},s)\,ds + \int_0^t \mathbf{B}(\mathbf{X},s) - \mathbf{B}(\hat{\mathbf{X}},s)\,d\mathbf{W}.$$
Since $(a+b)^2 \le 2a^2 + 2b^2$, we can estimate
$$E(|\mathbf{X}(t) - \hat{\mathbf{X}}(t)|^2) \le 2E\Big(\Big|\int_0^t \mathbf{b}(\mathbf{X},s) - \mathbf{b}(\hat{\mathbf{X}},s)\,ds\Big|^2\Big) + 2E\Big(\Big|\int_0^t \mathbf{B}(\mathbf{X},s) - \mathbf{B}(\hat{\mathbf{X}},s)\,d\mathbf{W}\Big|^2\Big).$$

The Cauchy–Schwarz inequality implies that
$$\Big|\int_0^t \mathbf{f}\,ds\Big|^2 \le t\int_0^t |\mathbf{f}|^2\,ds$$
for any $t > 0$ and $\mathbf{f} : [0,t] \to \mathbb{R}^n$. We use this to estimate
$$E\Big(\Big|\int_0^t \mathbf{b}(\mathbf{X},s) - \mathbf{b}(\hat{\mathbf{X}},s)\,ds\Big|^2\Big) \le TE\Big(\int_0^t |\mathbf{b}(\mathbf{X},s) - \mathbf{b}(\hat{\mathbf{X}},s)|^2\,ds\Big) \le L^2T\int_0^t E(|\mathbf{X} - \hat{\mathbf{X}}|^2)\,ds.$$
Furthermore
$$E\Big(\Big|\int_0^t \mathbf{B}(\mathbf{X},s) - \mathbf{B}(\hat{\mathbf{X}},s)\,d\mathbf{W}\Big|^2\Big) = E\Big(\int_0^t |\mathbf{B}(\mathbf{X},s) - \mathbf{B}(\hat{\mathbf{X}},s)|^2\,ds\Big) \le L^2\int_0^t E(|\mathbf{X} - \hat{\mathbf{X}}|^2)\,ds.$$
Therefore for some appropriate constant $C$ we have
$$E(|\mathbf{X}(t) - \hat{\mathbf{X}}(t)|^2) \le C\int_0^t E(|\mathbf{X} - \hat{\mathbf{X}}|^2)\,ds,$$
provided $0 \le t \le T$. If we now set $\phi(t) := E(|\mathbf{X}(t) - \hat{\mathbf{X}}(t)|^2)$, then the foregoing reads
$$\phi(t) \le C\int_0^t \phi(s)\,ds \qquad\text{for all } 0 \le t \le T.$$
Therefore Gronwall's Lemma, with $C_0 = 0$, implies $\phi \equiv 0$. Thus $\mathbf{X}(t) = \hat{\mathbf{X}}(t)$ a.s. for all $0 \le t \le T$, and so $\mathbf{X}(r) = \hat{\mathbf{X}}(r)$ for all rational $0 \le r \le T$, except for some set of probability zero. As $\mathbf{X}$ and $\hat{\mathbf{X}}$ have continuous sample paths almost surely,
$$P\Big(\max_{0\le t\le T}|\mathbf{X}(t) - \hat{\mathbf{X}}(t)| > 0\Big) = 0.$$

2. Existence. We will utilize the iterative scheme introduced earlier. Define
$$\mathbf{X}^0(t) := \mathbf{X}_0,\qquad \mathbf{X}^{n+1}(t) := \mathbf{X}_0 + \int_0^t \mathbf{b}(\mathbf{X}^n(s),s)\,ds + \int_0^t \mathbf{B}(\mathbf{X}^n(s),s)\,d\mathbf{W},$$
for $n = 0, \dots$ and $0 \le t \le T$. Define also
$$d^n(t) := E(|\mathbf{X}^{n+1}(t) - \mathbf{X}^n(t)|^2).$$
We claim that
$$d^n(t) \le \frac{(Mt)^{n+1}}{(n+1)!} \qquad\text{for all } n = 0, 1, \dots,\ 0 \le t \le T,$$

for some constant $M$, depending on $L$, $T$ and $\mathbf{X}_0$. Indeed for $n = 0$, we have
$$d^0(t) = E(|\mathbf{X}^1(t) - \mathbf{X}^0(t)|^2) = E\Big(\Big|\int_0^t \mathbf{b}(\mathbf{X}_0,s)\,ds + \int_0^t \mathbf{B}(\mathbf{X}_0,s)\,d\mathbf{W}\Big|^2\Big)$$
$$\le 2E\Big(\Big|\int_0^t L(1 + |\mathbf{X}_0|)\,ds\Big|^2\Big) + 2E\Big(\int_0^t L^2(1 + |\mathbf{X}_0|^2)\,ds\Big) \le tM$$
for some large enough constant $M$. This confirms the claim for $n = 0$.

Next assume the claim is valid for some $n - 1$. Then
$$d^n(t) = E(|\mathbf{X}^{n+1}(t) - \mathbf{X}^n(t)|^2) = E\Big(\Big|\int_0^t \mathbf{b}(\mathbf{X}^n,s) - \mathbf{b}(\mathbf{X}^{n-1},s)\,ds + \int_0^t \mathbf{B}(\mathbf{X}^n,s) - \mathbf{B}(\mathbf{X}^{n-1},s)\,d\mathbf{W}\Big|^2\Big)$$
$$\le 2TL^2E\Big(\int_0^t |\mathbf{X}^n - \mathbf{X}^{n-1}|^2\,ds\Big) + 2L^2E\Big(\int_0^t |\mathbf{X}^n - \mathbf{X}^{n-1}|^2\,ds\Big)$$
$$\le 2L^2(1+T)\int_0^t \frac{M^ns^n}{n!}\,ds \qquad\text{by the induction hypothesis}$$
$$\le \frac{M^{n+1}t^{n+1}}{(n+1)!},$$
provided we choose $M \ge 2L^2(1+T)$. This proves the claim.

3. Now note
$$\max_{0\le t\le T}|\mathbf{X}^{n+1}(t) - \mathbf{X}^n(t)|^2 \le 2TL^2\int_0^T |\mathbf{X}^n - \mathbf{X}^{n-1}|^2\,ds + 2\max_{0\le t\le T}\Big|\int_0^t \mathbf{B}(\mathbf{X}^n,s) - \mathbf{B}(\mathbf{X}^{n-1},s)\,d\mathbf{W}\Big|^2.$$
Consequently the martingale inequality from Chapter 2 implies
$$E\Big(\max_{0\le t\le T}|\mathbf{X}^{n+1}(t) - \mathbf{X}^n(t)|^2\Big) \le 2TL^2\int_0^T E(|\mathbf{X}^n - \mathbf{X}^{n-1}|^2)\,ds + 8L^2\int_0^T E(|\mathbf{X}^n - \mathbf{X}^{n-1}|^2)\,ds \le C\frac{(MT)^n}{n!}$$
by the claim above.

4. The Borel–Cantelli Lemma thus applies, since
$$P\Big(\max_{0\le t\le T}|\mathbf{X}^{n+1}(t) - \mathbf{X}^n(t)| > \frac{1}{2^n}\Big) \le 2^{2n}E\Big(\max_{0\le t\le T}|\mathbf{X}^{n+1}(t) - \mathbf{X}^n(t)|^2\Big) \le 2^{2n}\frac{C(MT)^n}{n!}$$
and $\sum_{n=1}^\infty 2^{2n}\frac{(MT)^n}{n!} < \infty$. Thus
$$P\Big(\max_{0\le t\le T}|\mathbf{X}^{n+1}(t) - \mathbf{X}^n(t)| > \frac{1}{2^n} \text{ i.o.}\Big) = 0.$$
In light of this, for almost every $\omega$,
$$\mathbf{X}^n = \mathbf{X}^0 + \sum_{j=0}^{n-1}(\mathbf{X}^{j+1} - \mathbf{X}^j)$$
converges uniformly on $[0,T]$ to a process $\mathbf{X}(\cdot)$. We pass to limits in the definition of $\mathbf{X}^{n+1}(\cdot)$, to prove
$$\mathbf{X}(t) = \mathbf{X}_0 + \int_0^t \mathbf{b}(\mathbf{X},s)\,ds + \int_0^t \mathbf{B}(\mathbf{X},s)\,d\mathbf{W} \qquad\text{for } 0 \le t \le T.$$
Thus $\mathbf{X}(\cdot)$ solves (SDE).

5. We must still show $\mathbf{X}(\cdot) \in \mathbb{L}^2_n(0,T)$. We have
$$E(|\mathbf{X}^{n+1}(t)|^2) \le CE(|\mathbf{X}_0|^2) + CE\Big(\Big|\int_0^t \mathbf{b}(\mathbf{X}^n,s)\,ds\Big|^2\Big) + CE\Big(\Big|\int_0^t \mathbf{B}(\mathbf{X}^n,s)\,d\mathbf{W}\Big|^2\Big)$$
$$\le C(1 + E(|\mathbf{X}_0|^2)) + C\int_0^t E(|\mathbf{X}^n|^2)\,ds,$$
where, as usual, "$C$" denotes various constants. By induction, therefore,
$$E(|\mathbf{X}^{n+1}(t)|^2) \le \Big(C + C^2t + \dots + C^{n+2}\frac{t^{n+1}}{(n+1)!}\Big)(1 + E(|\mathbf{X}_0|^2)).$$
Consequently
$$E(|\mathbf{X}^{n+1}(t)|^2) \le C(1 + E(|\mathbf{X}_0|^2))e^{Ct}.$$
Let $n \to \infty$:
$$E(|\mathbf{X}(t)|^2) \le C(1 + E(|\mathbf{X}_0|^2))e^{Ct} \qquad\text{for all } 0 \le t \le T,$$

and so $\mathbf{X} \in \mathbb{L}^2_n(0,T)$. $\square$

C. PROPERTIES OF SOLUTIONS

In this section we mention, without proofs, a few properties of the solution to various SDE.

THEOREM (Estimate on higher moments of solutions). Suppose that $\mathbf{b}$, $\mathbf{B}$ and $\mathbf{X}_0$ satisfy the hypotheses of the Existence and Uniqueness Theorem. If, in addition,
$$E(|\mathbf{X}_0|^{2p}) < \infty \qquad\text{for some integer } p > 1,$$
then the solution $\mathbf{X}(\cdot)$ of
$$\text{(SDE)}\qquad \begin{cases} d\mathbf{X} = \mathbf{b}(\mathbf{X},t)\,dt + \mathbf{B}(\mathbf{X},t)\,d\mathbf{W} \\ \mathbf{X}(0) = \mathbf{X}_0 \end{cases}$$
satisfies the estimates
(i) $E(|\mathbf{X}(t)|^{2p}) \le C_2(1 + E(|\mathbf{X}_0|^{2p}))e^{C_1t}$
and
(ii) $E(|\mathbf{X}(t) - \mathbf{X}_0|^{2p}) \le C_2(1 + E(|\mathbf{X}_0|^{2p}))t^pe^{C_2t}$,
for certain constants $C_1$ and $C_2$, depending only on $T, L, m, n$.

The estimates above on the moments of $\mathbf{X}(\cdot)$ are fairly crude, but are nevertheless sometimes useful:

APPLICATION: SAMPLE PATH PROPERTIES. The possibility that $\mathbf{B} \equiv 0$ is not excluded, and consequently it could happen that the solution of our SDE is really a solution of the ODE
$$\dot{\mathbf{X}} = \mathbf{b}(\mathbf{X},t),$$
with possibly random initial data. In this case the mapping $t \mapsto \mathbf{X}(t)$ will be smooth if $\mathbf{b}$ is. On the other hand, if for some $1 \le i \le n$
$$\sum_{1\le l\le m}|b^{il}(x,t)|^2 > 0 \qquad\text{for all } x \in \mathbb{R}^n,\ 0 \le t \le T,$$
then almost every sample path $t \mapsto X^i(t)$ is nowhere differentiable for a.e. $\omega$.

We can however use estimates (i) and (ii) above to check the hypotheses of Kolmogorov's Theorem from §C in Chapter 3. It follows that for almost all sample paths, the mapping $t \mapsto \mathbf{X}(t)$ is Hölder continuous with each exponent less than $\frac{1}{2}$, provided $E(|\mathbf{X}_0|^{2p}) < \infty$ for each $1 \le p < \infty$.

THEOREM (Dependence on parameters). Suppose for $k = 1, 2, \dots$ that $\mathbf{b}_k$, $\mathbf{B}_k$ and $\mathbf{X}_0^k$ satisfy the hypotheses of the Existence and Uniqueness Theorem, with the same constant $L$. Assume further that
(a) $\lim_{k\to\infty}E(|\mathbf{X}_0^k - \mathbf{X}_0|^2) = 0$,
and
(b) for each $M > 0$,
$$\lim_{k\to\infty}\max_{0\le t\le T,\ |x|\le M}\big(|\mathbf{b}_k(x,t) - \mathbf{b}(x,t)| + |\mathbf{B}_k(x,t) - \mathbf{B}(x,t)|\big) = 0.$$
Finally suppose that $\mathbf{X}_k(\cdot)$ solves
$$\begin{cases} d\mathbf{X}_k = \mathbf{b}_k(\mathbf{X}_k,t)\,dt + \mathbf{B}_k(\mathbf{X}_k,t)\,d\mathbf{W} \\ \mathbf{X}_k(0) = \mathbf{X}_0^k. \end{cases}$$
Then
$$\lim_{k\to\infty}E\Big(\max_{0\le t\le T}|\mathbf{X}_k(t) - \mathbf{X}(t)|^2\Big) = 0,$$
where $\mathbf{X}$ is the unique solution of
$$\begin{cases} d\mathbf{X} = \mathbf{b}(\mathbf{X},t)\,dt + \mathbf{B}(\mathbf{X},t)\,d\mathbf{W} \\ \mathbf{X}(0) = \mathbf{X}_0. \end{cases}$$

Example (Small noise limits). In particular, for almost every $\omega$ the random trajectories of the SDE
$$\begin{cases} d\mathbf{X}_\varepsilon = \mathbf{b}(\mathbf{X}_\varepsilon)\,dt + \varepsilon\,d\mathbf{W} \\ \mathbf{X}_\varepsilon(0) = x_0 \end{cases}$$
converge uniformly on $[0,T]$ as $\varepsilon \to 0$ to the deterministic trajectory of
$$\dot{\mathbf{x}} = \mathbf{b}(\mathbf{x}),\qquad \mathbf{x}(0) = x_0.$$

D. LINEAR STOCHASTIC DIFFERENTIAL EQUATIONS

This section presents some fairly explicit formulas for solutions of linear SDE.

DEFINITION. The stochastic differential equation
$$d\mathbf{X} = \mathbf{b}(\mathbf{X},t)\,dt + \mathbf{B}(\mathbf{X},t)\,d\mathbf{W}$$
is linear provided the coefficients $\mathbf{b}$ and $\mathbf{B}$ have this form:
$$\mathbf{b}(x,t) := \mathbf{c}(t) + \mathbf{D}(t)x,$$
for $\mathbf{c} : [0,T] \to \mathbb{R}^n$, $\mathbf{D} : [0,T] \to \mathbb{M}^{n\times n}$, and
$$\mathbf{B}(x,t) := \mathbf{E}(t) + \mathbf{F}(t)x,$$
for $\mathbf{E} : [0,T] \to \mathbb{M}^{n\times m}$, $\mathbf{F} : [0,T] \to L(\mathbb{R}^n, \mathbb{M}^{n\times m})$, the space of bounded linear mappings from $\mathbb{R}^n$ to $\mathbb{M}^{n\times m}$.

DEFINITION. A linear SDE is called homogeneous if $\mathbf{c} \equiv \mathbf{E} \equiv 0$ for $0 \le t \le T$. It is called linear in the narrow sense if $\mathbf{F} \equiv 0$.

Remark. If
$$\sup_{0\le t\le T}[|\mathbf{c}(t)| + |\mathbf{D}(t)| + |\mathbf{E}(t)| + |\mathbf{F}(t)|] < \infty,$$
then $\mathbf{b}$ and $\mathbf{B}$ satisfy the hypotheses of the Existence and Uniqueness Theorem. Thus the linear SDE
$$\begin{cases} d\mathbf{X} = (\mathbf{c}(t) + \mathbf{D}(t)\mathbf{X})\,dt + (\mathbf{E}(t) + \mathbf{F}(t)\mathbf{X})\,d\mathbf{W} \\ \mathbf{X}(0) = \mathbf{X}_0 \end{cases}$$
has a unique solution, provided $E(|\mathbf{X}_0|^2) < \infty$ and $\mathbf{X}_0$ is independent of $\mathcal{W}^+(0)$.

FORMULAS FOR SOLUTIONS: linear equations in the narrow sense

Suppose first $\mathbf{D} \equiv D$ is constant. Then the solution of
$$\begin{cases} d\mathbf{X} = (\mathbf{c}(t) + D\mathbf{X})\,dt + \mathbf{E}(t)\,d\mathbf{W} \\ \mathbf{X}(0) = \mathbf{X}_0 \end{cases} \tag{10}$$
is
$$\mathbf{X}(t) = e^{Dt}\mathbf{X}_0 + \int_0^t e^{D(t-s)}(\mathbf{c}(s)\,ds + \mathbf{E}(s)\,d\mathbf{W}), \tag{11}$$
where
$$e^{Dt} := \sum_{k=0}^\infty \frac{D^kt^k}{k!}.$$
More generally, the solution of
$$\begin{cases} d\mathbf{X} = (\mathbf{c}(t) + \mathbf{D}(t)\mathbf{X})\,dt + \mathbf{E}(t)\,d\mathbf{W} \\ \mathbf{X}(0) = \mathbf{X}_0 \end{cases} \tag{12}$$
is
$$\mathbf{X}(t) = \Phi(t)\Big(\mathbf{X}_0 + \int_0^t \Phi(s)^{-1}(\mathbf{c}(s)\,ds + \mathbf{E}(s)\,d\mathbf{W})\Big), \tag{13}$$
where $\Phi(\cdot)$ is the fundamental matrix of the nonautonomous system of ODE
$$\frac{d\Phi}{dt} = \mathbf{D}(t)\Phi,\qquad \Phi(0) = I.$$

Remark. These assertions follow formally from standard formulas in ODE theory if we write $\mathbf{E}\,d\mathbf{W} = \mathbf{E}\xi\,dt$, $\xi$ as usual denoting white noise, and regard $\mathbf{E}\xi$ as an inhomogeneous term driving the ODE
$$\dot{\mathbf{X}} = \mathbf{c}(t) + \mathbf{D}(t)\mathbf{X} + \mathbf{E}(t)\xi.$$
This will not be so if $\mathbf{F}(\cdot) \not\equiv 0$, owing to the extra term in Itô's formula.
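Formula (11) can be checked pathwise in the scalar narrow-sense case (a sketch with illustrative constants of our own choosing): building the stochastic integral and an Euler path from the same Brownian increments, the two should agree up to discretization error.

```python
import math, random

random.seed(6)

# Scalar narrow-sense SDE dX = (c + d X) dt + e dW.
# Formula (11): X(t) = e^{dt} x0 + int_0^t e^{d(t-s)} (c ds + e dW),
# built here with the same increments as an Euler path.
c, d, e, x0 = 0.5, -1.0, 0.3, 1.0
T, N = 1.0, 10_000
dt = T / N
x_euler = x0
stoch = 0.0          # accumulates int_0^t e^{-d s} e dW
for k in range(N):
    s = k * dt
    dW = random.gauss(0.0, math.sqrt(dt))
    x_euler += (c + d * x_euler) * dt + e * dW
    stoch += math.exp(-d * s) * e * dW
# e^{dT}(x0 + (c/d)(1 - e^{-dT}) + stoch) = e^{dT}x0 + (c/d)(e^{dT}-1) + ...
x_formula = math.exp(d * T) * (x0 + (c / d) * (1.0 - math.exp(-d * T)) + stoch)
print(x_euler, x_formula)  # agree closely
```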

Observe also that formula (13) shows $\mathbf{X}(t)$ to be Gaussian if $\mathbf{X}_0$ is.

FORMULAS FOR SOLUTIONS: general scalar linear equations

Suppose now $n = 1$, but $m \ge 1$ is arbitrary. Then the solution of
$$\begin{cases} dX = (c(t) + d(t)X)\,dt + \sum_{l=1}^m (e^l(t) + f^l(t)X)\,dW^l \\ X(0) = X_0 \end{cases} \tag{14}$$
is
$$X(t) = \Phi(t)\Big(X_0 + \int_0^t \Phi(s)^{-1}\Big(c(s) - \sum_{l=1}^m e^l(s)f^l(s)\Big)ds + \sum_{l=1}^m \int_0^t \Phi(s)^{-1}e^l(s)\,dW^l\Big), \tag{15}$$
where
$$\Phi(t) := \exp\Big(\int_0^t d - \sum_{l=1}^m \frac{(f^l)^2}{2}\,ds + \sum_{l=1}^m \int_0^t f^l\,dW^l\Big).$$
See Arnold [A, Chapter 8] for more formulas for solutions of general linear equations.

3. SOME METHODS FOR SOLVING LINEAR SDE

For practice with Itô's formula, let us derive some of the formulas stated above.

Example 1. Consider first the linear stochastic differential equation
$$\begin{cases} dX = d(t)X\,dt + f(t)X\,dW \\ X(0) = X_0 \end{cases} \tag{16}$$
for $m = n = 1$. We will try to find a solution having the product form
$$X(t) = X_1(t)X_2(t),$$
where
$$\begin{cases} dX_1 = f(t)X_1\,dW \\ X_1(0) = X_0 \end{cases} \tag{17}$$
and
$$\begin{cases} dX_2 = A(t)\,dt + B(t)\,dW \\ X_2(0) = 1, \end{cases} \tag{18}$$
where the functions $A$ and $B$ are to be selected. Then
$$dX = d(X_1X_2) = X_1\,dX_2 + X_2\,dX_1 + f(t)X_1B(t)\,dt = f(t)X\,dW + (X_1\,dX_2 + f(t)X_1B(t)\,dt),$$

according to (17). Now we try to choose $A$, $B$ so that
$$dX_2 + f(t)B(t)\,dt = d(t)X_2\,dt.$$
For this, $B \equiv 0$ and $A(t) = d(t)X_2(t)$ will work. Thus (18) reads
$$\begin{cases} dX_2 = d(t)X_2\,dt \\ X_2(0) = 1. \end{cases}$$
This is non-random:
$$X_2(t) = e^{\int_0^t d(s)\,ds}.$$
Since the solution of (17) is
$$X_1(t) = X_0e^{\int_0^t f(s)\,dW - \frac{1}{2}\int_0^t f^2(s)\,ds},$$
we conclude that
$$X(t) = X_1(t)X_2(t) = X_0e^{\int_0^t f(s)\,dW + \int_0^t \left(d(s) - \frac{1}{2}f^2(s)\right)ds},$$
a formula noted earlier.

Example 2. Consider next the general equation
$$\begin{cases} dX = (c(t) + d(t)X)\,dt + (e(t) + f(t)X)\,dW \\ X(0) = X_0, \end{cases} \tag{19}$$
again for $m = n = 1$. As above, we try for a solution of the form
$$X(t) = X_1(t)X_2(t),$$
where now
$$\begin{cases} dX_1 = d(t)X_1\,dt + f(t)X_1\,dW \\ X_1(0) = 1 \end{cases} \tag{20}$$
and
$$\begin{cases} dX_2 = A(t)\,dt + B(t)\,dW \\ X_2(0) = X_0, \end{cases} \tag{21}$$
the functions $A$, $B$ to be chosen. Then
$$dX = X_2\,dX_1 + X_1\,dX_2 + f(t)X_1B(t)\,dt = d(t)X\,dt + f(t)X\,dW + X_1(A(t)\,dt + B(t)\,dW) + f(t)X_1B(t)\,dt.$$
We now require
$$X_1(A(t)\,dt + B(t)\,dW) + f(t)X_1B(t)\,dt = c(t)\,dt + e(t)\,dW,$$
and this identity will hold if we take
$$B(t) := e(t)(X_1(t))^{-1},\qquad A(t) := [c(t) - f(t)e(t)](X_1(t))^{-1}.$$

Observe that since
$$X_1(t) = e^{\int_0^t f\,dW + \int_0^t \left(d - \frac{1}{2}f^2\right)ds},$$
we have $X_1(t) > 0$ almost surely. Consequently
$$X_2(t) = X_0 + \int_0^t [c(s) - f(s)e(s)](X_1(s))^{-1}\,ds + \int_0^t e(s)(X_1(s))^{-1}\,dW.$$
Employing this and the expression above for $X_1$, we arrive at the formula, a special case of (15):
$$X(t) = X_1(t)X_2(t) = \exp\Big(\int_0^t d(s) - \frac{1}{2}f^2(s)\,ds + \int_0^t f(s)\,dW\Big)$$
$$\times\Big(X_0 + \int_0^t \exp\Big(-\int_0^s d(r) - \frac{1}{2}f^2(r)\,dr - \int_0^s f(r)\,dW\Big)(c(s) - e(s)f(s))\,ds$$
$$+ \int_0^t \exp\Big(-\int_0^s d(r) - \frac{1}{2}f^2(r)\,dr - \int_0^s f(r)\,dW\Big)e(s)\,dW\Big).$$

Remark. There is great theoretical and practical interest in numerical methods for simulation of solutions to random differential equations. The paper of Higham [H] is a good introduction.
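In the spirit of the numerical remark above (a sketch under our own choice of parameters, not code from Higham's paper), one can observe strong convergence of the Euler scheme for a geometric Brownian motion by comparing against the exact solution driven by the same Brownian path:

```python
import math, random

# Euler scheme for dX = mu X dt + sigma X dW versus the exact solution
# X(T) = x0 exp((mu - sigma^2/2) T + sigma W(T)) on the same path;
# halving the step size should shrink the average strong error.
def strong_error(n_steps, n_paths, seed):
    rng = random.Random(seed)
    mu, sigma, x0, T = 0.05, 0.2, 1.0, 1.0
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, W = x0, 0.0
        for _ in range(n_steps):
            dW = rng.gauss(0.0, math.sqrt(dt))
            x += mu * x * dt + sigma * x * dW
            W += dW
        exact = x0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * W)
        total += abs(x - exact)
    return total / n_paths

err_coarse = strong_error(50, 2000, seed=8)
err_fine = strong_error(400, 2000, seed=8)
print(err_coarse, err_fine)  # the finer grid has the smaller error
```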

CHAPTER 6: APPLICATIONS

A. Stopping times
B. Applications to PDE, Feynman–Kac formula
C. Optimal stopping
D. Options pricing
E. The Stratonovich integral

This chapter is devoted to some applications and extensions of the theory developed earlier.

A. STOPPING TIMES

DEFINITIONS, BASIC PROPERTIES. Let $(\Omega, \mathcal{U}, P)$ be a probability space and $\mathcal{F}(\cdot)$ a filtration of σ-algebras, as in Chapters 4 and 5. We introduce now some random times that are well-behaved with respect to $\mathcal{F}(\cdot)$:

DEFINITION. A random variable $\tau : \Omega \to [0,\infty]$ is called a stopping time with respect to $\mathcal{F}(\cdot)$ provided
$$\{\tau \le t\} \in \mathcal{F}(t) \qquad\text{for all } t \ge 0.$$
This says that the set of all $\omega \in \Omega$ such that $\tau(\omega) \le t$ is an $\mathcal{F}(t)$-measurable set. Note that $\tau$ is allowed to take on the value $+\infty$.

THEOREM (Properties of stopping times). Let $\tau_1$ and $\tau_2$ be stopping times with respect to $\mathcal{F}(\cdot)$. Then
(i) $\{\tau < t\} \in \mathcal{F}(t)$, and so $\{\tau = t\} \in \mathcal{F}(t)$, for all times $t \ge 0$.
(ii) $\tau_1 \wedge \tau_2 := \min(\tau_1, \tau_2)$ and $\tau_1 \vee \tau_2 := \max(\tau_1, \tau_2)$ are stopping times.

Proof. Observe that
$$\{\tau < t\} = \bigcup_{k=1}^\infty \{\tau \le t - 1/k\},$$
with $\{\tau \le t - 1/k\} \in \mathcal{F}(t - 1/k) \subseteq \mathcal{F}(t)$; and so $\{\tau = t\} \in \mathcal{F}(t)$. Note also that any constant $\tau \equiv t_0$ is a stopping time. Also, we have
$$\{\tau_1 \wedge \tau_2 \le t\} = \{\tau_1 \le t\} \cup \{\tau_2 \le t\} \in \mathcal{F}(t),$$
and furthermore
$$\{\tau_1 \vee \tau_2 \le t\} = \{\tau_1 \le t\} \cap \{\tau_2 \le t\} \in \mathcal{F}(t).\ \square$$

The notion of stopping times comes up naturally in the study of stochastic differential equations, as it allows us to investigate phenomena occurring over "random time intervals". An example will make this clearer:

Example (Hitting a set). Consider the solution $\mathbf{X}(\cdot)$ of the SDE
$$\begin{cases} d\mathbf{X}(t) = \mathbf{b}(t,\mathbf{X})\,dt + \mathbf{B}(t,\mathbf{X})\,d\mathbf{W} \\ \mathbf{X}(0) = \mathbf{X}_0, \end{cases}$$
where $\mathbf{b}$, $\mathbf{B}$ and $\mathbf{X}_0$ satisfy the hypotheses of the Existence and Uniqueness Theorem.

THEOREM. Let $E$ be either a nonempty closed subset or a nonempty open subset of $\mathbb{R}^n$. Then
$$\tau := \inf\{t \ge 0 \mid \mathbf{X}(t) \in E\}$$
is a stopping time. (We put $\tau = +\infty$ for those sample paths of $\mathbf{X}(\cdot)$ that never hit $E$.)

[Figure: a sample path of X(·), stopped at the time τ it first hits the set E.]

Proof. Fix $t \ge 0$; we must show $\{\tau \le t\} \in \mathcal{F}(t)$. Take $\{t_i\}_{i=1}^\infty$ to be a countable dense subset of $[0,\infty)$.

First we assume that $E = U$ is an open set. Then the event
$$\{\tau \le t\} = \bigcup_{t_i \le t}\{\mathbf{X}(t_i) \in U\}$$
belongs to $\mathcal{F}(t)$, since $\{\mathbf{X}(t_i) \in U\} \in \mathcal{F}(t_i) \subseteq \mathcal{F}(t)$.

Next we assume that $E = C$ is a closed set. Set $d(x,C) := \operatorname{dist}(x,C)$ and define the open sets
$$U_n = \{x : d(x,C) < \tfrac{1}{n}\}.$$
The event
$$\{\tau \le t\} = \bigcap_{n=1}^\infty \bigcup_{t_i \le t}\{\mathbf{X}(t_i) \in U_n\}$$
also belongs to $\mathcal{F}(t)$. $\square$

Discussion. (In applications $\mathcal{F}(t)$ "contains the history of $\mathbf{X}(\cdot)$ up to and including time $t$, but does not contain information about the future".) The name "stopping time" comes from the example, where we sometimes think of halting the sample path $\mathbf{X}(\cdot)$ at the first time $\tau$ that it hits a set $E$.

The random variable
$$\sigma := \sup\{t \ge 0 \mid \mathbf{X}(t) \in E\},$$
the last time that $\mathbf{X}(t)$ hits $E$, is in general not a stopping time. The heuristic reason is that the event $\{\sigma \le t\}$ would depend upon the entire future history of the process and thus would not in general be $\mathcal{F}(t)$-measurable. But there are

many examples where we do not really stop the process at time $\tau$. Thus "stopping time" is not a particularly good name and "Markov time" would be better.

STOCHASTIC INTEGRALS AND STOPPING TIMES. Our next task is to consider stochastic integrals with random limits of integration and to work out an Itô formula for these.

DEFINITION. If $G \in \mathbb{L}^2(0,T)$ and $0 \le \tau \le T$ is a stopping time, we define
$$\int_0^\tau G\,dW := \int_0^T \chi_{\{t\le\tau\}}G\,dW.$$

LEMMA (Itô integrals with stopping times). If $G \in \mathbb{L}^2(0,T)$ and $\tau$ is a stopping time with $0 \le \tau \le T$, then
(i) $E\big(\int_0^\tau G\,dW\big) = 0$,
(ii) $E\big(\big(\int_0^\tau G\,dW\big)^2\big) = E\big(\int_0^\tau G^2\,dt\big)$.

Proof. We have, since $\chi_{\{t\le\tau\}}G \in \mathbb{L}^2(0,T)$,
$$E\Big(\int_0^\tau G\,dW\Big) = E\Big(\int_0^T \chi_{\{t\le\tau\}}G\,dW\Big) = 0,$$
and
$$E\Big(\Big(\int_0^\tau G\,dW\Big)^2\Big) = E\Big(\Big(\int_0^T \chi_{\{t\le\tau\}}G\,dW\Big)^2\Big) = E\Big(\int_0^T (\chi_{\{t\le\tau\}}G)^2\,dt\Big) = E\Big(\int_0^\tau G^2\,dt\Big).\ \square$$

ITÔ'S FORMULA WITH STOPPING TIMES. As usual, let $\mathbf{W}(\cdot)$ denote $m$-dimensional Brownian motion. Recall next from Chapter 4 that if
$$d\mathbf{X} = \mathbf{b}(\mathbf{X},t)\,dt + \mathbf{B}(\mathbf{X},t)\,d\mathbf{W},$$
then for each $C^2$ function $u$,
$$du(\mathbf{X},t) = \frac{\partial u}{\partial t}\,dt + \sum_{i=1}^n \frac{\partial u}{\partial x_i}\,dX_i + \frac{1}{2}\sum_{i,j=1}^n \frac{\partial^2 u}{\partial x_i \partial x_j}\sum_{k=1}^m b^{ik}b^{jk}\,dt. \tag{1}$$

Written in integral form, this means:
$$u(\mathbf{X}(t),t) - u(\mathbf{X}(0),0) = \int_0^t \frac{\partial u}{\partial t} + Lu\,ds + \int_0^t Du\cdot\mathbf{B}\,d\mathbf{W}, \tag{2}$$
for the differential operator
$$Lu := \frac{1}{2}\sum_{i,j=1}^n a^{ij}u_{x_ix_j} + \sum_{i=1}^n b^iu_{x_i},\qquad a^{ij} = \sum_{k=1}^m b^{ik}b^{jk},$$
and
$$Du\cdot\mathbf{B}\,d\mathbf{W} = \sum_{k=1}^m \sum_{i=1}^n u_{x_i}b^{ik}\,dW^k.$$
The argument of $u$ and its partial derivatives in these integrals is $(\mathbf{X}(s),s)$. We call $L$ the generator.

For a fixed $\omega \in \Omega$, formula (2) holds for all $0 \le t \le T$. Thus we may set $t = \tau$, where $\tau$ is a stopping time, $0 \le \tau \le T$:
$$u(\mathbf{X}(\tau),\tau) - u(\mathbf{X}(0),0) = \int_0^\tau \frac{\partial u}{\partial t} + Lu\,ds + \int_0^\tau Du\cdot\mathbf{B}\,d\mathbf{W}.$$
Take expected value:
$$E(u(\mathbf{X}(\tau),\tau)) - E(u(\mathbf{X}(0),0)) = E\Big(\int_0^\tau \frac{\partial u}{\partial t} + Lu\,ds\Big). \tag{3}$$
Similar formulas hold for vector-valued processes. We will see in the next section that this formula provides a very important link between stochastic differential equations and (nonrandom) partial differential equations.

B. APPLICATIONS TO PDE, FEYNMAN–KAC FORMULA

PROBABILISTIC FORMULAS FOR SOLUTIONS OF PDE. BROWNIAN MOTION AND THE LAPLACIAN. The most important case is $\mathbf{X}(\cdot) = \mathbf{W}(\cdot)$, $n$-dimensional Brownian motion, the generator of which is
$$Lu = \frac{1}{2}\sum_{i=1}^n u_{x_ix_i} =: \frac{1}{2}\Delta u.$$
The expression $\Delta u$ is called the Laplacian of $u$ and occurs throughout mathematics and physics. We will demonstrate in this section some important links with Brownian motion.

Example 1 (Expected hitting time to a boundary). Let $U \subset \mathbb{R}^n$ be a bounded open set, with smooth boundary $\partial U$. According to standard PDE theory, there exists a smooth solution $u$ of the equation
$$\begin{cases} -\frac{1}{2}\Delta u = 1 & \text{in } U \\ u = 0 & \text{on } \partial U. \end{cases} \tag{4}$$

Our goal is to find a probabilistic representation formula for $u$. For this, fix any point $x \in U$ and consider then an $n$-dimensional Brownian motion $\mathbf{W}(\cdot)$. Then $\mathbf{X}(\cdot) := \mathbf{W}(\cdot) + x$ represents a "Brownian motion starting at $x$". Define
$$\tau_x := \text{first time } \mathbf{X}(\cdot) \text{ hits } \partial U.$$

THEOREM. We have
$$u(x) = E(\tau_x) \qquad\text{for all } x \in U. \tag{5}$$
In particular, $E(\tau_x) < \infty$, and so $\tau_x < \infty$ a.s. This says that Brownian sample paths starting at any point $x \in U$ will with probability 1 eventually hit $\partial U$. Note also that $u > 0$ in $U$.

Proof. We employ formula (3), with $Lu = \frac{1}{2}\Delta u$. We have for each $n = 1, 2, \dots$
$$E(u(\mathbf{X}(\tau_x \wedge n))) - E(u(\mathbf{X}(0))) = E\Big(\int_0^{\tau_x\wedge n}\frac{1}{2}\Delta u(\mathbf{X})\,ds\Big).$$
Since $\frac{1}{2}\Delta u = -1$ and $u$ is bounded,
$$\lim_{n\to\infty}E(\tau_x \wedge n) < \infty.$$
Hence $E(\tau_x) < \infty$; thus $\tau_x$ is integrable, and so $\tau_x < \infty$ a.s. Again recall that $u$ is bounded on $\bar U$. Thus if we let $n \to \infty$ above, we get
$$u(x) - E(u(\mathbf{X}(\tau_x))) = E\Big(\int_0^{\tau_x}1\,ds\Big) = E(\tau_x).$$
But $u = 0$ on $\partial U$, and so $u(\mathbf{X}(\tau_x)) \equiv 0$. Formula (5) follows. $\square$

Example 2 (Probabilistic representation of harmonic functions). Let $U \subset \mathbb{R}^n$ be a smooth, bounded domain and $g : \partial U \to \mathbb{R}$ a given continuous function. It is known from classical PDE theory that there exists a function $u \in C^2(U) \cap C(\bar U)$ satisfying the boundary value problem
$$\begin{cases} \Delta u = 0 & \text{in } U \\ u = g & \text{on } \partial U. \end{cases} \tag{6}$$
We call $u$ a harmonic function.

THEOREM. We have for each point $x \in U$
$$u(x) = E(g(\mathbf{X}(\tau_x))), \tag{7}$$
for $\mathbf{X}(\cdot) := \mathbf{W}(\cdot) + x$, Brownian motion starting at $x$, where $\tau_x$ is, as before, the first hitting time of $\partial U$.
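Formula (5) can be tested by Monte Carlo in one dimension (an illustration, not from the text): for $U = (0,1)$ the solution of $-\frac{1}{2}u'' = 1$ with zero boundary values is $u(x) = x(1-x)$, so the mean exit time starting from $x = \frac{1}{2}$ should be about $\frac{1}{4}$.

```python
import math, random

random.seed(9)

# Mean exit time of 1-d Brownian motion from (0,1), started at x = 0.5;
# u(x) = x(1-x) predicts E(tau) = 0.25.
x0, dt, n_paths = 0.5, 1e-3, 2000
sqdt = math.sqrt(dt)
total = 0.0
for _ in range(n_paths):
    x, t = x0, 0.0
    while 0.0 < x < 1.0:
        x += random.gauss(0.0, sqdt)
        t += dt
    total += t
print(total / n_paths)  # near 0.25 = 0.5 * (1 - 0.5)
```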

Proof. As shown above, $E(\tau_x) < \infty$. We have
$$E(u(\mathbf{X}(\tau_x))) = E(u(\mathbf{X}(0))) + E\Big(\int_0^{\tau_x}\frac{1}{2}\Delta u(\mathbf{X})\,ds\Big) = E(u(\mathbf{X}(0))) = u(x),$$
the second equality valid since $\Delta u = 0$ in $U$. Since $u = g$ on $\partial U$, formula (7) follows. $\square$

APPLICATION: In particular, if $\Delta u = 0$ in some open set containing the ball $B(x,r)$, then
$$u(x) = E(u(\mathbf{X}(\tau_x))),$$
where $\tau_x$ now denotes the hitting time of Brownian motion starting at $x$ to $\partial B(x,r)$. Since Brownian motion is isotropic in space, we may reasonably guess that the term on the right hand side is just the average of $u$ over the sphere $\partial B(x,r)$, with respect to surface measure. That is, we have the identity
$$u(x) = \frac{1}{\text{area of }\partial B(x,r)}\int_{\partial B(x,r)}u\,dS. \tag{8}$$
This is the mean value formula for harmonic functions.

Example 3 (Hitting one part of a boundary first). Assume next that we can write $\partial U$ as the union of two disjoint parts $\Gamma_1$, $\Gamma_2$. Let $u$ solve the PDE
$$\begin{cases} \Delta u = 0 & \text{in } U \\ u = 1 & \text{on } \Gamma_1 \\ u = 0 & \text{on } \Gamma_2. \end{cases}$$

THEOREM. For each point $x \in U$, $u(x)$ is the probability that a Brownian motion starting at $x$ hits $\Gamma_1$ before hitting $\Gamma_2$.

[Figure: a domain U whose boundary consists of the two pieces Γ₁ and Γ₂, with an interior point x.]

Proof. Apply (7) for
$$g = \begin{cases} 1 & \text{on } \Gamma_1 \\ 0 & \text{on } \Gamma_2. \end{cases}$$
Then
$$u(x) = E(g(\mathbf{X}(\tau_x))) = \text{probability of hitting } \Gamma_1 \text{ before } \Gamma_2.\ \square$$
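The theorem of Example 3 is easy to test in one dimension (an illustration), where $\Gamma_1 = \{1\}$, $\Gamma_2 = \{0\}$ and the harmonic function with these boundary values is $u(x) = x$: a Brownian path from $x = 0.3$ should hit $1$ before $0$ with probability about $0.3$.

```python
import math, random

random.seed(10)

# Probability that 1-d Brownian motion from x = 0.3 exits (0,1) at 1.
x0, dt, n_paths = 0.3, 1e-3, 2000
sqdt = math.sqrt(dt)
hits = 0
for _ in range(n_paths):
    x = x0
    while 0.0 < x < 1.0:
        x += random.gauss(0.0, sqdt)
    if x >= 1.0:
        hits += 1
print(hits / n_paths)  # near u(0.3) = 0.3
```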

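The Γ_1-before-Γ_2 theorem has a clean one-dimensional test case, sketched below under my own choice of data (not from the notes): take U = (0, 1), Γ_1 = {1}, Γ_2 = {0}; then the harmonic function with those boundary values is u(x) = x. Replacing Brownian motion by a symmetric random walk on a fine grid gives, by the gambler's-ruin identity, exit probability exactly x, so the estimate should match closely.

```python
import random

def prob_hit_right_first(x, h=0.01, n_paths=800, seed=2):
    """Estimate the probability that a symmetric walk started at x in (0, 1),
    with grid step h, reaches 1 before 0.  Gambler's ruin gives exactly x."""
    rng = random.Random(seed)
    start = round(x / h)
    top = round(1.0 / h)
    hits = 0
    for _ in range(n_paths):
        k = start
        while 0 < k < top:
            k += 1 if rng.random() < 0.5 else -1
        if k == top:
            hits += 1
    return hits / n_paths
```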
FEYNMAN–KAC FORMULA. Now we extend Example 2 above to obtain a probabilistic representation for the unique solution of the PDE

  −(1/2)∆u + cu = f in U,  u = 0 on ∂U.

We assume c, f are smooth functions, with c ≥ 0 in U.

THEOREM (Feynman–Kac formula). For each x ∈ U,

  u(x) = E( ∫_0^{τ_x} f(X(t)) e^{−∫_0^t c(X) ds} dt ),

where, as before, X(·) := W(·) + x is a Brownian motion starting at x, and τ_x denotes the first hitting time of ∂U.

We know E(τ_x) < ∞; since c ≥ 0, the integral above converges.

Proof. First look at the process

  Y(t) := e^{Z(t)}  for Z(t) := −∫_0^t c(X) ds.

Then dZ = −c(X)dt, and so Itô's formula yields dY = −c(X)Y dt. Hence the Itô product rule implies

  d( u(X) e^{−∫_0^t c(X) ds} ) = (du(X)) e^{−∫_0^t c(X) ds} + u(X) d( e^{−∫_0^t c(X) ds} )
  = ( (1/2)∆u(X) dt + Σ_{i=1}^n (∂u(X)/∂x_i) dW^i ) e^{−∫_0^t c(X) ds} + u(X)(−c(X) dt) e^{−∫_0^t c(X) ds}.

We use formula (3) for τ = τ_x and take the expected value, obtaining

  E( u(X(τ_x)) e^{−∫_0^{τ_x} c(X) ds} ) − E(u(X(0))) = E( ∫_0^{τ_x} ( (1/2)∆u(X) − c(X)u(X) ) e^{−∫_0^t c(X) ds} dt ).

Since u solves the PDE above and u = 0 on ∂U, this simplifies to give

  u(x) = E(u(X(0))) = E( ∫_0^{τ_x} f(X) e^{−∫_0^t c(X) ds} dt ),

as claimed.  □

An interpretation. We can explain this formula as describing a Brownian motion with "killing", as follows.

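The Feynman–Kac formula can be illustrated numerically on assumed data of my own choosing (not from the notes): U = (−1, 1), c ≡ 1, f ≡ 1, where the ODE −(1/2)u″ + u = 1, u(±1) = 0 has the closed form u(x) = 1 − cosh(√2 x)/cosh(√2), so u(0) ≈ 0.541. The sketch estimates the expectation by Euler simulation.

```python
import math
import random

def feynman_kac_mc(x, c=1.0, dt=2e-3, n_paths=1500, seed=3):
    """Estimate u(x) = E( ∫_0^tau f(X) e^{-∫_0^t c ds} dt ) on U = (-1, 1)
    with constant killing rate c and running payoff f ≡ 1."""
    rng = random.Random(seed)
    step = math.sqrt(dt)
    decay = math.exp(-c * dt)          # per-step discount e^{-c dt}
    total = 0.0
    for _ in range(n_paths):
        y, disc, acc = x, 1.0, 0.0
        while -1.0 < y < 1.0:
            acc += disc * dt           # f ≡ 1 contribution, discounted
            disc *= decay
            y += step * rng.gauss(0.0, 1.0)
        total += acc
    return total / n_paths
```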
Suppose that the Brownian particles may disappear at a random killing time σ, for example by being absorbed into the medium within which they are moving. Assume further that the probability of a particle's being killed in a short time interval [t, t + h] is c(X(t))h + o(h). Then the probability of the particle surviving until time t is approximately equal to

  (1 − c(X(t_1))h)(1 − c(X(t_2))h) ⋯ (1 − c(X(t_n))h),

where 0 = t_0 < t_1 < ⋯ < t_n = t and h = t_{k+1} − t_k. As h → 0, this converges to e^{−∫_0^t c(X) ds}. Hence it should be that

  u(x) = average of f(X(·)) over all sample paths which survive to hit ∂U
       = E( ∫_0^{τ_x} f(X) e^{−∫_0^t c(X) ds} dt ).

Remark. If we consider in these examples the solution of the SDE

  dX = b(X)dt + B(X)dW,  X(0) = x,

we can obtain similar formulas, where now τ_x = hitting time of ∂U for X(·), and (1/2)∆u is replaced by the operator

  Lu := (1/2) Σ_{i,j=1}^n a^{ij} u_{x_i x_j} + Σ_{i=1}^n b^i u_{x_i}.

Note, however, that to justify these formulas we need to know that the various PDE have smooth solutions. This need not always be the case for degenerate elliptic operators L.

C. OPTIMAL STOPPING

The general mathematical setting for many control theory problems is this. We are given some "system" whose state evolves in time according to a differential equation (deterministic or stochastic). Given also are certain controls which affect somehow the behavior of the system: these controls typically either modify some parameters in the dynamics or else stop the process, or both. Finally we are given a cost criterion, depending upon our choice of control and the corresponding state of the system. The goal is to discover an optimal choice of controls, to minimize the cost criterion.

The easiest stochastic control problem of the general type outlined above occurs when we cannot directly affect the SDE controlling the evolution of X(·) and can only decide at each instant whether or not to stop. A typical such problem follows.

STOPPING A STOCHASTIC DIFFERENTIAL EQUATION. Let U ⊂ R^n be a bounded, smooth domain. Suppose b : R^n → R^n, B : R^n → M^{n×m} satisfy the usual assumptions. Then for each x ∈ U the stochastic differential equation

  dX = b(X)dt + B(X)dW,  X(0) = x

has a unique solution. Let τ = τ_x denote the hitting time of ∂U. Let θ be any stopping time with respect to F(·), and for each such θ define the expected cost of stopping X(·) at time θ ∧ τ to be

(9)  J_x(θ) := E( ∫_0^{θ∧τ} f(X(s)) ds + g(X(θ∧τ)) ).

The idea is that if we stop at the possibly random time θ < τ, then the cost is a given function g of the current state X(θ). If instead we do not stop the process before it hits ∂U, that is, if θ ≥ τ, the cost is g(X(τ)). In addition there is a running cost per unit time f of keeping the system in operation until time θ ∧ τ.

OPTIMAL STOPPING. The main question is this: does there exist an optimal stopping time θ* = θ*_x, for which

  J_x(θ*) = min_{θ stopping time} J_x(θ)?

And if so, how can we find θ*? It turns out to be very difficult to try to design θ* directly. A much better idea is to turn attention to the value function

(10)  u(x) := inf_θ J_x(θ),

and to try to figure out what u is as a function of x ∈ U. Note that u(x) is the minimum expected cost, given that we start the process at x. It turns out that once we know u, we will then be able to construct an optimal θ*. This approach is called dynamic programming.

OPTIMALITY CONDITIONS. So assume u is defined by (10) and suppose u is smooth enough to justify the following calculations. We wish to determine the properties of this function.

First of all, notice that we could just take θ ≡ 0 in the definition (10); that is, we could stop immediately and incur the cost g(X(0)) = g(x). Hence

(11)  u(x) ≤ g(x) for each point x ∈ U.

Furthermore, τ ≡ 0 if x ∈ ∂U, and so

(12)  u(x) = g(x) for each point x ∈ ∂U.

Next take any point x ∈ U and fix some small number δ > 0. Now if we do not stop the system for time δ, then according to the SDE the new state of the system at time δ will be X(δ).

Then, given that we are at the point X(δ), the best we can achieve in minimizing the cost thereafter must be u(X(δ)). So if we choose not to stop the system for time δ, and assuming we do not hit ∂U, our cost is at least

  E( ∫_0^δ f(X) ds + u(X(δ)) ).

Since u(x) is the infimum of costs over all stopping times, we therefore have

  u(x) ≤ E( ∫_0^δ f(X) ds + u(X(δ)) ).

Now by Itô's formula

  E(u(X(δ))) = u(x) + E( ∫_0^δ Lu(X) ds ),

for

  Lu = (1/2) Σ_{i,j=1}^n a^{ij} ∂²u/(∂x_i ∂x_j) + Σ_{i=1}^n b^i ∂u/∂x_i,  a^{ij} = Σ_{k=1}^m b^{ik} b^{jk}.

Hence

  0 ≤ E( ∫_0^δ f(X) + Lu(X) ds ).

Divide by δ > 0, and then let δ → 0:

  0 ≤ f(x) + Lu(x).

Equivalently, we have

(13)  Mu ≤ f in U,

where Mu := −Lu.

Finally, we observe that if in (11) a strict inequality held, that is, if u(x) < g(x) at some point x ∈ U, then it is optimal not to stop the process at once. Thus it is plausible to think that we should leave the system running, for at least some very small time δ. In this circumstance we would then have equality in the formula above, and so

(14)  Mu = f at those points where u < g.

In summary, we combine (11)–(14) to find that if the formal reasoning above is valid, then the value function u satisfies:

(15)  max(Mu − f, u − g) = 0 in U,  u = g on ∂U.

These are the optimality conditions.

SOLVING FOR THE VALUE FUNCTION. Our rigorous study of the stopping time problem now begins by showing first that there exists a unique solution u of (15), and second that this u is in fact min_θ J_x(θ). Along the way we will also learn how to design an optimal stopping time θ*.

THEOREM. Suppose f, g are given smooth functions. There exists a unique function u, with bounded second derivatives, such that:
(i) u ≤ g in U,
(ii) Mu ≤ f almost everywhere in U,
(iii) max(Mu − f, u − g) = 0 almost everywhere in U,
(iv) u = g on ∂U.

In general u ∉ C²(U). The idea of the proof is to approximate (15) by a penalized problem of this form:

  Mu_ε + β_ε(u_ε − g) = f in U,  u_ε = g on ∂U,

where β_ε : R → R is a smooth, convex function with β_ε ≥ 0, β_ε ≡ 0 for x ≤ 0, and lim_{ε→0} β_ε(x) = ∞ for x > 0. Then u_ε → u. We omit the details.

It will in practice be difficult to find a precise formula for u, but computers can provide accurate numerical approximations.

DESIGNING AN OPTIMAL STOPPING POLICY. Now we show that our solution of (15) is in fact the value function, and we use it to design an optimal stopping time. Define the stopping set

  S := {x ∈ Ū | u(x) = g(x)}

and the continuation set

  C := U − S = {x ∈ U | u(x) < g(x)}.

First note that the stopping set S is closed. Define, for each x ∈ U,

  θ* := first hitting time of S.

THEOREM. Let u be the solution of (15) and define θ* as above. Then

  u(x) = J_x(θ*) = inf_θ J_x(θ)  for all x ∈ Ū.

This says that we should first compute the solution to (15) to find S, and then we should run X(·) until it hits S (or else exits from U).

Proof. 1. Suppose first x ∈ C. Since τ ∧ θ* is the exit time from C, since Mu = f on that set, and since u = g on ∂C, we have for x ∈ C

  u(x) = E( ∫_0^{τ∧θ*} f(X(s)) ds + g(X(θ*∧τ)) ) = J_x(θ*).

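As the notes remark, u is usually found numerically. A minimal sketch (my own, not from the notes) is projected Gauss–Seidel on a finite-difference discretization of the obstacle problem max(−(1/2)u″ − f, u − g) = 0 on (0, 1), u = g at the endpoints. With the assumed test data f ≡ −1 (a running reward, so stopping is never optimal) and obstacle g ≡ 0, the exact value function is u(x) = x² − x.

```python
def solve_stopping_1d(f, g, n=51, sweeps=6000):
    """Projected Gauss-Seidel for max(-(1/2)u'' - f, u - g) = 0 on (0,1),
    u = g at the endpoints, on an n-point grid.  Illustrative sketch."""
    h = 1.0 / (n - 1)
    xs = [i * h for i in range(n)]
    u = [g(x) for x in xs]                 # start from the obstacle
    for _ in range(sweeps):
        for i in range(1, n - 1):
            # unconstrained update solving -(1/2)u'' = f at node i
            cand = 0.5 * (u[i - 1] + u[i + 1]) + h * h * f(xs[i])
            u[i] = min(cand, g(xs[i]))     # enforce the constraint u <= g
    return xs, u
```

The stopping set S can then be read off as the nodes where u equals g.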
On the other hand, if x ∈ S, then τ ∧ θ* = 0, and so u(x) = g(x) = J_x(θ*). Thus for all x ∈ Ū we have u(x) = J_x(θ*).

2. Now let θ be any other stopping time. We must show u(x) = J_x(θ*) ≤ J_x(θ). By Itô's formula,

  u(x) = E( ∫_0^{τ∧θ} Mu(X) ds + u(X(τ∧θ)) ).

But Mu ≤ f and u ≤ g in Ū. Hence

  u(x) ≤ E( ∫_0^{τ∧θ} f(X) ds + g(X(τ∧θ)) ) = J_x(θ).

But since u(x) = J_x(θ*), we consequently have

  u(x) = J_x(θ*) = min_θ J_x(θ),

as asserted.  □

D. OPTIONS PRICING

In this section we outline an application to mathematical finance, mostly following Baxter–Rennie [B-R] and the class lectures of L. Goldberg. Another basic reference is Hull [Hu].

THE BASIC PROBLEM. Let us consider a given security, say a stock, whose price at time t is S(t). We suppose that S evolves according to the SDE introduced in Chapter 5:

(16)  dS = µS dt + σS dW,  S(0) = s_0,

where µ > 0 is the drift and σ ≠ 0 the volatility. The initial price s_0 is known.

A derivative is a financial instrument whose payoff depends upon (i.e., is derived from) the behavior of S(·). We will investigate a European call option, which is the right to buy one share of the stock S, at the price p at time T. The number p is called the strike price and T > 0 the strike (or expiration) time. The basic question is this:

  What is the "proper price" at time t = 0 of this option?

In other words, if you run a financial firm and wish to sell your customers this call option, how much should you charge? (We are looking for the "break-even" price, for which the firm neither makes nor loses money.)

ARBITRAGE AND HEDGING. To simplify, we assume hereafter that the prevailing, risk-free interest rate is the constant r > 0. This means that $1 put in a bank at time t = 0 becomes $e^{rT} at time t = T. Equivalently, $1 at time t = T is worth only $e^{−rT} at time t = 0.

As for the problem of pricing our call option, a first guess might be that the proper price should be

(17)  e^{−rT} E((S(T) − p)^+),

for x^+ := max(x, 0). The reasoning behind this guess is that if S(T) < p, then the option is worthless; but if S(T) > p, we can buy a share for the price p, immediately sell at price S(T), and thereby make a profit of (S(T) − p)^+. We average this over all sample paths and multiply by the discount factor e^{−rT}, to arrive at (17).

As reasonable as this may all seem, (17) is in fact not the proper price. Other forces are at work in financial markets. Indeed the fundamental factor in options pricing is arbitrage, meaning the possibility of risk-free profits. We must price our option so as to create no arbitrage opportunities for others.

To go further, we introduce also the notion of hedging. This means somehow eliminating our risk as the seller of the call option. The exact details appear below, but the basic idea is that we can in effect "duplicate" our option by a portfolio consisting of (continually changing) holdings of a risk-free bond and of the stock on which the call is written.

A PARTIAL DIFFERENTIAL EQUATION. We demonstrate next how to use these principles to convert our pricing problem into a PDE. We introduce, for s ≥ 0 and 0 ≤ t ≤ T, the unknown price function

(18)  u(s, t) = the proper price of the option at time t, given that S(t) = s.

Then u(s_0, 0) is the price we are seeking. We seek a PDE that u solves.

Boundary conditions. Notice first that at the expiration time T

(19)  u(s, T) = (s − p)^+  (s ≥ 0).

Furthermore, if s = 0, then S(t) = 0 for all 0 ≤ t ≤ T, and so

(20)  u(0, t) = 0  (0 ≤ t ≤ T).

Duplicating an option. To go further, define the process

(21)  C(t) := u(S(t), t)  (0 ≤ t ≤ T).

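The naive guess (17) is easy to evaluate numerically, since S(T) is lognormal. The sketch below (my own; the closed-form lognormal expectation is a standard identity, stated here as an assumption, not something derived in the notes) computes (17) by Monte Carlo and checks it against that closed form. As the text explains, this number is not the arbitrage-free price.

```python
import math
import random

def naive_price_mc(s0, p, mu, sigma, r, T, n=20000, seed=4):
    """Monte Carlo value of formula (17): e^{-rT} E((S(T)-p)^+), sampling
    S(T) = s0 exp((mu - sigma^2/2) T + sigma W(T)) exactly."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma * sigma) * T
    vol = sigma * math.sqrt(T)
    acc = 0.0
    for _ in range(n):
        sT = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        acc += max(sT - p, 0.0)
    return math.exp(-r * T) * acc / n

def naive_price_exact(s0, p, mu, sigma, r, T):
    """Closed form of the same lognormal expectation (standard identity)."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    v = sigma * math.sqrt(T)
    d1 = (math.log(s0 / p) + (mu + 0.5 * sigma * sigma) * T) / v
    d2 = d1 - v
    return math.exp(-r * T) * (s0 * math.exp(mu * T) * Phi(d1) - p * Phi(d2))
```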
Thus C(t) is the current price of the option at time t, and is random since the stock price S(t) is random. According to Itô's formula and (16),

(22)  dC = u_t dt + u_s dS + (1/2) u_ss (dS)²
        = (u_t + µS u_s + (σ²/2) S² u_ss) dt + σS u_s dW.

Now comes the key idea: we propose to "duplicate" C by a portfolio consisting of shares of S and of a bond B. To see this more clearly, assume that B is a risk-free investment, which therefore grows at the prevailing interest rate r:

(23)  dB = rB dt,  B(0) = 1.

This just means B(t) = e^{rt}. We will try to find processes φ and ψ so that

(24)  C = φS + ψB  (0 ≤ t ≤ T).

Discussion. The point is that if we can construct φ, ψ so that (24) holds, we can eliminate all risk. To understand this better, imagine that your financial firm sells a call option, as above. The firm thereby incurs the risk that at time T, the stock price S(T) will exceed p, and so the buyer will exercise the option. But if in the meantime the firm has constructed the portfolio (24), the profits from it will exactly equal the funds needed to pay the customer. Conversely, if the option is worthless at time T, the portfolio will have no profit.

But to make this work, the financial firm should not have to inject any new money into the hedging scheme, beyond the initial investment to set it up. We ensure this by requiring that the portfolio represented on the right-hand side of (24) be self-financing. This means that the changes in the value of the portfolio should depend only upon the changes in S, B. We therefore require that

(25)  dC = φ dS + ψ dB  (0 ≤ t ≤ T).

Remark (discrete version of self-financing). Roughly speaking, a portfolio is self-financing if it is financially self-contained. To understand this better, let us consider a different model in which time is discrete, and the values of the stock and bond at a time t_i are given by S_i and B_i respectively. Here {t_i}_{i=0}^N is an increasing sequence of times and we suppose that each time step t_{i+1} − t_i is small. A portfolio can now be thought of as a sequence {(φ_i, ψ_i)}_{i=0}^N, corresponding to our changing holdings of the stock S and the bond B over each time interval.

Now for a given time interval (t_i, t_{i+1}), C_i = φ_i S_i + ψ_i B_i is the opening value of the portfolio and C_{i+1} = φ_i S_{i+1} + ψ_i B_{i+1} represents the closing value. The self-financing condition means that the financing gap C_{i+1} − C_i of cash (that would otherwise have to be injected to pay for our new holdings) must be zero. This is equivalent to saying that

  C_{i+1} − C_i = φ_i (S_{i+1} − S_i) + ψ_i (B_{i+1} − B_i),

the continuous version of which is condition (25).

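The discrete self-financing condition can be demonstrated directly: pick any stock holding φ_i, put the remaining wealth in the bond, and the financing gap vanishes identically. A small sketch (my own; the trading strategy φ here is arbitrary, not a hedge):

```python
import math
import random

def simulate_self_financing(n_steps=250, T=1.0, s0=1.0, mu=0.1,
                            sigma=0.2, r=0.05, seed=5):
    """Discrete illustration of C_{i+1} - C_i = phi_i dS + psi_i dB:
    hold an arbitrary phi_i in stock, finance the rest with the bond,
    and return the largest bookkeeping gap C_i - (phi S_i + psi B_i)."""
    rng = random.Random(seed)
    dt = T / n_steps
    S, B, C = s0, 1.0, 1.0              # initial wealth C_0 = 1
    max_gap = 0.0
    for i in range(n_steps):
        phi = math.sin(0.1 * i)         # arbitrary (hypothetical) strategy
        psi = (C - phi * S) / B         # remainder of wealth in the bond
        S_new = S * math.exp((mu - 0.5 * sigma**2) * dt
                             + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        B_new = B * math.exp(r * dt)
        C_new = C + phi * (S_new - S) + psi * (B_new - B)
        gap = C_new - (phi * S_new + psi * B_new)   # zero by construction
        max_gap = max(max_gap, abs(gap))
        S, B, C = S_new, B_new, C_new
    return max_gap
```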
Combining formulas (22), (23) and (25) provides the identity

(26)  (u_t + µS u_s + (σ²/2) S² u_ss) dt + σS u_s dW = φ(µS dt + σS dW) + ψ rB dt.

The argument of u and its partial derivatives is (S(t), t), and we are trying to select φ, ψ so that (26) is valid. We observe in particular that the terms multiplying dW on each side of (26) will match provided we take

(27)  φ(t) := u_s(S(t), t)  (0 ≤ t ≤ T).

Then (26) simplifies, to read

(28)  (u_t + (σ²/2) S² u_ss) dt = r ψ B dt.

But ψB = C − φS = u − u_s S, according to (24), (27). Consequently, we ask that the function u = u(s, t) solve the Black–Scholes–Merton PDE

(29)  u_t + r s u_s + (σ²/2) s² u_ss − r u = 0  (0 ≤ t ≤ T).

Observe that the parameter µ does not appear.

More on self-financing. Before going on, we return to the self-financing condition (25). The Itô product rule and (24) imply

  dC = φ dS + ψ dB + S dφ + B dψ + dφ dS.

To ensure (25), we consequently must make sure that

(30)  S dφ + B dψ + dφ dS = 0,

where we recall φ = u_s(S(t), t). Now dφ dS = σ² S² u_ss dt; and ψ = B^{−1}(C − φS) = e^{−rt}(u(S, t) − u_s(S, t)S), according to (24). Thus (30) is valid provided

(31)  dψ = −B^{−1}(S dφ + σ² S² u_ss dt).

A direct calculation using (24), (27) and (28) verifies (31).

SUMMARY. To price our call option, we solve the boundary-value problem

(32)  u_t + r s u_s + (σ²/2) s² u_ss − r u = 0  (s > 0, 0 ≤ t ≤ T),
      u = (s − p)^+  (s > 0, t = T),
      u = 0  (s = 0, 0 ≤ t ≤ T).

Remember that u(s_0, 0) is the price we are trying to find. It turns out that this problem can be solved explicitly, although we omit the details here: see for instance Baxter–Rennie [B-R].

E. THE STRATONOVICH INTEGRAL

We next discuss the Stratonovich stochastic calculus, which is an alternative to Itô's approach. Most of the following material is from Arnold [A, Chapter 10].

1. Motivation. Let us consider first of all the formal random differential equation

(33)  Ẋ = d(t)X + f(t)Xξ,  X(0) = X_0,

where m = n = 1 and ξ(·) is 1-dimensional "white noise". If we interpret this rigorously as the stochastic differential equation

(34)  dX = d(t)X dt + f(t)X dW,  X(0) = X_0,

we then recall from Chapter 5 that the unique solution is

(35)  X(t) = X_0 e^{∫_0^t d(s) − (1/2)f²(s) ds + ∫_0^t f(s) dW}.

On the other hand, perhaps (33) is a proposed mathematical model of some physical process and we are not really sure whether ξ(·) is "really" white noise. It could perhaps instead be some process with smooth (but highly complicated) sample paths. How would this possibility change the solution?

APPROXIMATING WHITE NOISE. More precisely, suppose that {ξ^k(·)}_{k=1}^∞ is a sequence of stochastic processes satisfying:

(a) E(ξ^k(t)) = 0,
(b) E(ξ^k(t)ξ^k(s)) := d_k(t − s),
(c) ξ^k(t) is Gaussian for all t ≥ 0,
(d) t → ξ^k(t) is smooth for all ω,

where we suppose that the functions d_k(·) converge as k → ∞ to δ_0, the Dirac measure at 0. In light of the formal definition of the white noise ξ(·) as a Gaussian process with Eξ(t) = 0, E(ξ(t)ξ(s)) = δ_0(t − s), the ξ^k(·) are thus presumably smooth approximations of ξ(·).

LIMITS OF SOLUTIONS. Now consider the problem

(36)  Ẋ^k = d(t)X^k + f(t)X^k ξ^k,  X^k(0) = X_0.

For each ω this is just a regular ODE, whose solution is

  X^k(t) := X_0 e^{∫_0^t d(s) ds + ∫_0^t f(s)ξ^k(s) ds}.

Next look at Z^k(t) := ∫_0^t f(s)ξ^k(s) ds. For each time t ≥ 0, this is a Gaussian random variable, with E(Z^k(t)) = 0. Furthermore,

  E(Z^k(t)Z^k(s)) = ∫_0^t ∫_0^s f(τ)f(σ) d_k(τ − σ) dσ dτ
                 → ∫_0^t ∫_0^s f(τ)f(σ) δ_0(τ − σ) dσ dτ = ∫_0^{t∧s} f²(τ) dτ.

Hence as k → ∞, Z^k(t) converges in L² to a process whose distributions agree with those of ∫_0^t f(s) dW. And therefore X^k(t) converges to a process whose distributions agree with

(37)  X̂(t) := X_0 e^{∫_0^t d(s) ds + ∫_0^t f(s) dW}.

This does not agree with the solution (35)!

Discussion. Thus if we regard (33) as an Itô SDE with ξ(·) a "true" white noise, (35) is our solution. But if we approximate ξ(·) by smooth processes ξ^k(·), solve the approximate problems (36) and pass to limits with the approximate solutions X^k(·), we get a different solution. This means that (33) is unstable with respect to changes in the random term ξ(·). This conclusion has important consequences in questions of modeling, since it may be unclear experimentally whether we really have ξ(·) or instead ξ^k(·) in (33) and similar problems.

In view of all this, it is appropriate to ask if there is some way to redefine the stochastic integral so that these difficulties do not come up. One answer is the Stratonovich integral.

2. Definition of the Stratonovich integral. A one-dimensional example. Recall that in Chapter 4 we defined, for 1-dimensional Brownian motion,

  ∫_0^T W dW := lim_{|P^n|→0} Σ_{k=0}^{m_n−1} W(t^n_k)(W(t^n_{k+1}) − W(t^n_k)) = (W²(T) − T)/2,

where P^n := {0 = t^n_0 < t^n_1 < ⋯ < t^n_{m_n} = T} is a partition of [0, T]. This corresponds to a sequence of Riemann sum approximations where the integrand is evaluated at the left-hand endpoint of each subinterval [t^n_k, t^n_{k+1}]. The corresponding Stratonovich integral is instead defined this way:

  ∫_0^T W ∘ dW := lim_{|P^n|→0} Σ_{k=0}^{m_n−1} ((W(t^n_{k+1}) + W(t^n_k))/2)(W(t^n_{k+1}) − W(t^n_k)) = W²(T)/2.

(Observe the notational change: we hereafter write a small circle before the dW to signify the Stratonovich integral.) According to calculations in Chapter 4, we also have

  ∫_0^T W ∘ dW = lim_{|P^n|→0} Σ_{k=0}^{m_n−1} W((t^n_{k+1} + t^n_k)/2)(W(t^n_{k+1}) − W(t^n_k)).

Therefore for this case the Stratonovich integral corresponds to a Riemann sum approximation where we evaluate the integrand at the midpoint of each subinterval [t^n_k, t^n_{k+1}].

We generalize this example and so introduce the

DEFINITION. Let W(·) be an n-dimensional Brownian motion and let B : R^n × [0, T] → M^{n×n} be a C¹ function such that

  E( ∫_0^T |B(W, t)|² dt ) < ∞.

Then we define

  ∫_0^T B(W, t) ∘ dW := lim_{|P^n|→0} Σ_{k=0}^{m_n−1} B( (W(t^n_k) + W(t^n_{k+1}))/2, t^n_k ) (W(t^n_{k+1}) − W(t^n_k)).

It can be shown that this limit exists in L²(Ω).

A CONVERSION FORMULA. Remember that Itô's integral can be computed this way:

  ∫_0^T B(W, t) dW = lim_{|P^n|→0} Σ_{k=0}^{m_n−1} B(W(t^n_k), t^n_k)(W(t^n_{k+1}) − W(t^n_k)).

This is in general not equal to the Stratonovich integral, but there is a conversion formula

(38)  ( ∫_0^T B(W, t) ∘ dW )^i = ( ∫_0^T B(W, t) dW )^i + (1/2) ∫_0^T Σ_{j=1}^n (∂b^{ij}/∂x_j)(W, t) dt,

for i = 1, ..., n. Here v^i means the i-th component of the vector function v. This formula is proved by noting

  ∫_0^T B(W, t) ∘ dW − ∫_0^T B(W, t) dW = lim_{|P^n|→0} Σ_{k=0}^{m_n−1} [ B( (W(t^n_k) + W(t^n_{k+1}))/2, t^n_k ) − B(W(t^n_k), t^n_k) ] (W(t^n_{k+1}) − W(t^n_k)),

and using the Mean Value Theorem plus some usual methods for evaluating the limit. We omit details.

Special case. If n = 1, then

  ∫_0^T b(W, t) ∘ dW = ∫_0^T b(W, t) dW + (1/2) ∫_0^T (∂b/∂x)(W, t) dt.

3. Stratonovich chain rule. DEFINITION. If X(·) is a stochastic process with values in R^n, we define

  ∫_0^T B(X, t) ∘ dW := lim_{|P^n|→0} Σ_{k=0}^{m_n−1} B( (X(t^n_k) + X(t^n_{k+1}))/2, t^n_k ) (W(t^n_{k+1}) − W(t^n_k)),

provided this limit exists in L²(Ω) for all sequences of partitions P^n with |P^n| → 0.

DEFINITION. Suppose that the process X(·) solves the Stratonovich integral equation

  X(t) = X(0) + ∫_0^t b(X, s) ds + ∫_0^t B(X, s) ∘ dW  (0 ≤ t ≤ T)

for b : R^n × [0, T] → R^n and B : R^n × [0, T] → M^{n×m}, where W(·) is an m-dimensional Brownian motion. We then write

  dX = b(X, t) dt + B(X, t) ∘ dW,

the second term on the right being the Stratonovich stochastic differential.

THEOREM (Stratonovich chain rule). Assume

  dX = b(X, t) dt + B(X, t) ∘ dW

and suppose u : R^n × [0, T] → R is smooth. Define Y(t) := u(X(t), t). Then

  dY = (∂u/∂t) dt + Σ_{i=1}^n (∂u/∂x_i) ∘ dX^i
     = ( ∂u/∂t + Σ_{i=1}^n b^i ∂u/∂x_i ) dt + Σ_{i=1}^n Σ_{k=1}^m (∂u/∂x_i) b^{ik} ∘ dW^k.

Thus the ordinary chain rule holds for Stratonovich stochastic differentials, and there is no additional term involving ∂²u/(∂x_i ∂x_j) as there is for Itô's formula. We omit the proof, which is similar to that for the Itô rule. The main difference is that we make use of the formula ∫_0^T W ∘ dW = (1/2)W²(T) in the approximations.

More discussion. Next let us return to the motivational example we began with. We have seen that if the differential equation (33) is interpreted to mean

  dX = d(t)X dt + f(t)X dW,  X(0) = X_0  (Itô's sense),

then X(t) = X_0 e^{∫_0^t d(s) − (1/2)f²(s) ds + ∫_0^t f(s) dW}. However, if we interpret (33) to mean

  dX = d(t)X dt + f(t)X ∘ dW,  X(0) = X_0  (Stratonovich's sense),

the solution is

  X̃(t) = X_0 e^{∫_0^t d(s) ds + ∫_0^t f(s) dW},

as is easily checked using the Stratonovich calculus described above. This solution X̃(·) is also the solution obtained by approximating the "white noise" ξ(·) by smooth processes ξ^k(·) and passing to limits. This suggests that interpreting (33) and similar formal random differential equations in the Stratonovich sense will provide solutions which are stable with respect to perturbations in the random terms. This is indeed the case: see the articles [S1-2] by Sussmann.

Note also that these considerations clarify a bit the problems of interpreting mathematically the formal random differential equation (33), but do not say which interpretation is physically correct. This is a question of modeling and is not, strictly speaking, a mathematical issue.

CONVERSION RULES FOR SDE. Let W(·) be an m-dimensional Wiener process and suppose b : R^n × [0, T] → R^n, B : R^n × [0, T] → M^{n×m} satisfy the hypotheses of the basic existence and uniqueness theorem. Then X(·) solves the Itô stochastic differential equation

  dX = b(X, t) dt + B(X, t) dW,  X(0) = X_0

if and only if X(·) solves the Stratonovich stochastic differential equation

  dX = ( b(X, t) − (1/2)c(X, t) ) dt + B(X, t) ∘ dW,  X(0) = X_0,

for

  c^i(x, t) = Σ_{k=1}^m Σ_{j=1}^n (∂b^{ik}/∂x_j)(x, t) b^{jk}(x, t)  (1 ≤ i ≤ n).

A special case. For m = n = 1, this says

  dX = b(X) dt + σ(X) dW

if and only if

  dX = ( b(X) − (1/2)σ′(X)σ(X) ) dt + σ(X) ∘ dW.

4. Summary. We conclude these lectures by summarizing the advantages of each definition of the stochastic integral:

Advantages of the Itô integral
1. Simple formulas: E( ∫_0^t G dW ) = 0, E( ( ∫_0^t G dW )² ) = E( ∫_0^t G² dt ).
2. I(t) = ∫_0^t G dW is a martingale.

Advantages of the Stratonovich integral
1. The ordinary chain rule holds.
2. Solutions of stochastic differential equations interpreted in the Stratonovich sense are stable with respect to changes in the random terms.
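The distinction between the two integrals can be seen numerically on a single sampled path of W. A small sketch (mine, not from the notes): the left-endpoint sum approximates the Itô value (W²(T) − T)/2, while the averaged-endpoint sum equals W²(T)/2 exactly, by telescoping.

```python
import math
import random

def riemann_sums_for_W_dW(T=1.0, n=20000, seed=6):
    """Left-endpoint (Ito) and averaged-endpoint (Stratonovich) Riemann
    sums for ∫_0^T W dW along one sampled Brownian path; returns
    (W(T), ito_sum, stratonovich_sum)."""
    rng = random.Random(seed)
    dt = T / n
    s = math.sqrt(dt)
    w, ito, strat = 0.0, 0.0, 0.0
    for _ in range(n):
        dw = s * rng.gauss(0.0, 1.0)
        ito += w * dw                  # integrand at the left endpoint
        strat += (w + 0.5 * dw) * dw   # integrand at the endpoint average
        w += dw
    return w, ito, strat
```

The Itô sum misses W²(T)/2 by Σ(∆W)²/2, which concentrates at T/2 as the mesh refines — exactly the correction term in the conversion formula (38).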

4. 1 = −k log = −k log 1 + √ = −(np + x npq) Similarly. 2π a Appendix B: Proof of discrete martingale inequalities (from §I in Chapter 2) 118 . log nq n−k n−k √ = −(nq − x npq) − p p 2 x− x nq 2nq + O n− 2 . In view of (1)−(3).Note also log(1 ± y) = ±y − log np k k y2 2 + O(y 3 ) as y → 0. n − k = nq − x npq. Now ∗ P (a ≤ Sn ≤ b) = pn (k) a≤xk ≤b √ xk = k−np npq for a < b. the latter expression is a Riemann sum approximation as n → ∞ of the integral b x2 1 √ e− 2 dx. Finally. k(n − k) npq √ √ since k = np + x npq. Hence k np q x np q 2 q x− x np 2np + O n− 2 . 1 Add these expressions and simplify. observe (3) n 1 = √ (1 + o(1)) = h(1 + o(1)). 2 Consequently (2) lim n→∞ np k k k−np √ npq →x nq n−k n−k = e− x2 2 . to discover lim log n→∞ np k k k−np √ →x npq nq n−k n−k =− x2 .

Notice next that the proof above in fact demonstrates λP max Xk > λ ≤ + Xn dP. . . . . . . . Xk )) + E(χAk E(Xn | X1 . .Proof. Xk )) E(χAk Xk ) by the submartingale property k=1 ≥ λP (A) by (4). 1≤k≤n {max1≤k≤n Xk >λ} Apply this to the submartingale |Xk |: (5) λP (X > λ) ≤ 119 Y dP. 2. Deﬁne k−1 Ak := j=1 {Xj ≤ λ} ∩ {Xk > λ} (k = 1. . k=1 Therefore n + E(Xn ) ≥ + E(Xn χAk ) k=1 n k=1 n + E(E(Xn χAk | X1 . . . n). 1. {X>λ} . Xk )) = = k=1 n ≥ ≥ k=1 n E(χAk E(Xn | X1 . . . we have n n λP (A) = λ k=1 P (Ak ) ≤ E(χAk Xk ). . Then n A := 1≤k≤n max Xk > λ = k=1 Ak . . disjoint union Since λP (Ak ) ≤ (4) Ak Xk dP . .
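Both ingredients of this proof — Stirling's formula and the Riemann-sum limit — can be checked numerically. A small sketch (function names and test parameters are my own):

```python
import math

def stirling_ratio(n):
    """n! divided by Stirling's approximation e^{-n} n^n sqrt(2 pi n);
    tends to 1 from above (roughly 1 + 1/(12 n))."""
    return math.factorial(n) / (math.exp(-n) * n**n * math.sqrt(2 * math.pi * n))

def binomial_clt_gap(n, p, a, b):
    """Exact P(a <= (S_n - np)/sqrt(npq) <= b) for S_n ~ Binomial(n, p),
    minus the normal limit (1/sqrt(2 pi)) ∫_a^b e^{-x^2/2} dx."""
    q = 1.0 - p
    sd = math.sqrt(n * p * q)
    lo = max(math.ceil(n * p + a * sd), 0)
    hi = min(math.floor(n * p + b * sd), n)
    exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(lo, hi + 1))
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return exact - (Phi(b) - Phi(a))
```

The gap shrinks like the lattice spacing h = 1/√(npq), consistent with the Riemann-sum argument above.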

it follows that |I n −I m |2 is a submartingale.for X := max1≤k≤n |Xk |. Y := |Xn |. We will assume assertion (i) of the Theorem in §C of Chapter 4. Now take some 1 < p < ∞. such that T E 0 (Gn − G)2 dt → 0. since Brownian k+1 k motion does. The martingale inequality now implies P sup |I n (t) − I m (t)| > ε =P sup |I n (t) − I m (t)|2 > ε2 0≤t≤T 0≤t≤T 1 ≤ 2 E(|I n(T ) − I m (T )|2 ) ε T 1 |Gn − Gm |2 dt . If Gn (s) ≡ Gn for tn ≤ s < tn .. which states that the indeﬁnite integral I(·) is a martingale. Therefore I n (·) has continuous sample paths a. Then E(|X| ) = − =p ≤p =p Ω p ∞ 0 ∞ 0 ∞ 0 X λp dP (λ) for P (λ) := P (X > λ) λp−1 P (λ) dλ λp−1 1 λ Y dP {X>λ} dλ by (5) Y 0 λp−2 dλ Y X p−1 dP Ω 1/p dP = ≤ p p−1 p p−1 1−1/p Y p dP Ω Ω X p dP . There exist step processes Gn ∈ L2 (0. T ). then k k k+1 k−1 I n (t) = i=0 Gn (W (tn ) − W (tn )) + Gn (W (t) − W (tn )) i i+1 i k k for tn ≤ t < tn .s. = 2E ε 0 120 . for 0 ≤ t ≤ T . Write I n (t) := t 0 Gn dW . Since I n (·) is a martingale. Appendix C: Proof of continuity of indeﬁnite Itˆ integral (from §C in o Chapter 4) Proof.
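The refined inequality from step 2 can be verified exactly, not just sampled, on a small example of my own choosing: enumerate every path of a simple ±1 random walk (whose absolute value is a submartingale) and compare both sides.

```python
from itertools import product

def check_maximal_inequality(n=14, lam=3.0):
    """Exact check of lam * P(max_k |S_k| > lam) <= E(|S_n| ; max_k |S_k| > lam)
    for the simple +-1 random walk, by enumerating all 2^n equally likely paths."""
    count, weighted, total = 0, 0.0, 2 ** n
    for steps in product((-1, 1), repeat=n):
        s, m = 0, 0
        for d in steps:
            s += d
            if abs(s) > m:
                m = abs(s)
        if m > lam:
            count += 1
            weighted += abs(s)
    return lam * count / total, weighted / total
```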

Choose ε = P

1 . 2k

**Then there exists nk such that 1 2k
**

T

0≤t≤T

sup |I n (t) − I m (t)| >

≤ 22k E ≤

0

|Gn (t) − Gm (t)|2 dt

**We may assume nk+1 ≥ nk ≥ nk−1 Ak := Then
**

0≤t≤T

1 for m, n ≥ nk . k2 ≥ . . . , and nk → ∞. Let 1 2k .

sup |I nk (t) − I nk+1 (t)| > P (Ak ) ≤

1 . k2 Thus by the Borel–Cantelli Lemma, P (Ak i.o.) = 0; which is to say, for almost all ω 1 provided k ≥ k0 (ω). sup |I nk (t, ω) − I nk+1 (t, ω)| ≤ k 2 0≤t≤T Hence I nk (·, ω) converges uniformly on [0, T ] for almost every ω, and therefore J(t, ω) := limk→∞ I nk (t, ω) is continuous for amost every ω. As I n (t) → I(t) in L2 (Ω) for all 0 ≤ t ≤ T , we deduce as well that J(t) = I(t) amost every for all 0 ≤ t ≤ T . In other words, J(·) is a version of I(·). Since for almost every ω, J(·, ω) is the uniform limit of continuous functions, J(·) has continuous sample paths a.s.

121

EXERCISES (1) Show, using the formal manipulations for Itˆ’s formula discussed in Chapter o 1, that t Y (t) := eW (t)− 2 solves the stochastic diﬀerential equation dY = Y dW, Y (0) = 1.

t (Hint: If X(t) := W (t) − 2 , then dX = − dt + dW .) 2

**(2) Show that P (t) = p0 e solves
**

σW (t)+ µ− σ 2

2

t

,

dP = µP dt + σP dW, P (0) = p0 .

(3) Let Ω be any set and A any collection of subsets of Ω. Show that there exists a unique smallest σ-algebra U of subsets of Ω containing A. We call U the σ-algebra generated by A. (Hint: Take the intersection of all the σ-algebras containing A.) (4) Let X = i=1 ai χAi be a simple random variable, where the real numbers ai are distinct, the events Ai are pairwise disjoint, and Ω = ∪k Ai . Let i=1 U(X) be the σ-algebra generated by X. (i) Describe precisely which sets are in U(X). (ii) Suppose the random variable Y is U(X)-measurable. Show that Y is constant on each set Ai . (iii) Show that therefore Y can be written as a function of X. (5) Verify:

∞ −∞ k

e−x dx = √ 1 2πσ 2

2

√

π,

∞

√

1 2πσ 2

∞ −∞

xe−

(x−m)2 2σ 2

dx = m,

−∞

(x − m)2 e−

(x−m)2 2σ 2

dx = σ 2 .

(6) (i) Suppose A and B are independent events in some probability space. Show that Ac and B are independent. Likewise, show that Ac and B c are independent. (ii) Suppose that A1 , A2 , . . . , Am are disjoint events, each of positive probability, such that Ω = ∪m Aj . Prove Bayes’ formula: j=1 P (Ak | B) = provided P (B) > 0. 122

m j=1

P (B | Ak )P (Ak ) P (B | Aj )P (Aj )

(k = 1, . . . , m),

(7) During the Fall, 1999 semester, 105 women applied to UC Sunnydale, of whom 76 were accepted, and 400 men applied, of whom 230 were accepted. During the Spring, 2000 semester, 300 women applied, of whom 100 were accepted, and 112 men applied, of whom 21 were accepted.
Calculate numerically
a. the probability of a female applicant being accepted during the fall,
b. the probability of a male applicant being accepted during the fall,
c. the probability of a female applicant being accepted during the spring,
d. the probability of a male applicant being accepted during the spring.
Consider now the total applicant pool for both semesters together, and calculate
e. the probability of a female applicant being accepted,
f. the probability of a male applicant being accepted.
Are the University's admission policies biased towards females? or males?

(8) Let X be a real-valued, N(0, 1) random variable, and set Y := X². Calculate the density g of the distribution function for Y. (Hint: You must find g so that P(−∞ < Y ≤ a) = ∫_{−∞}^a g dy for all a.)

(9) Take Ω = [0, 1] × [0, 1], with U the Borel sets and P Lebesgue measure. Let g: [0, 1] → R be a continuous function. Define the random variables

X_1(ω) := g(x_1), X_2(ω) := g(x_2) for ω = (x_1, x_2) ∈ Ω.

Show that X_1 and X_2 are independent and identically distributed.

(10) (i) Let (Ω, U, P) be a probability space and A_1 ⊆ A_2 ⊆ ... ⊆ A_n ⊆ ... be events. Show that

P(∪_{n=1}^∞ A_n) = lim_{m→∞} P(A_m).

(Hint: Look at the disjoint events B_n := A_{n+1} − A_n.)
(ii) Likewise, show that if A_1 ⊇ A_2 ⊇ ... ⊇ A_n ⊇ ..., then

P(∩_{n=1}^∞ A_n) = lim_{m→∞} P(A_m).
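The arithmetic in Problem (7) can be checked with a few lines of code; the counts below come straight from the problem statement. Comparing the per-semester rates with the pooled rates is the point of the exercise — it is an instance of Simpson's paradox:

```python
from fractions import Fraction

# (applied, accepted) counts from Problem (7).
fall = {"women": (105, 76), "men": (400, 230)}
spring = {"women": (300, 100), "men": (112, 21)}

def rate(applied, accepted):
    """Acceptance probability as an exact fraction."""
    return Fraction(accepted, applied)

# a.-d.: per-semester acceptance rates.
for sem_name, sem in (("fall", fall), ("spring", spring)):
    for group, (applied, accepted) in sem.items():
        print(sem_name, group, float(rate(applied, accepted)))

# e., f.: rates for the pooled applicant pool over both semesters.
women_pooled = rate(105 + 300, 76 + 100)
men_pooled = rate(400 + 112, 230 + 21)
print("pooled", float(women_pooled), float(men_pooled))
```

Women are accepted at a higher rate in each semester separately, yet at a lower rate in the pooled data.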

(11) Let f: [0, 1] → R be continuous and define the Bernstein polynomial

b_n(x) := Σ_{k=0}^n f(k/n) C(n, k) x^k (1 − x)^{n−k},

where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient. Prove that b_n → f uniformly on [0, 1] as n → ∞, by providing the details for the following steps.

(i) Since f is uniformly continuous, for each ε > 0 there exists δ(ε) > 0 such that |f(x) − f(y)| ≤ ε if |x − y| ≤ δ(ε).
(ii) Given x ∈ [0, 1], take a sequence of independent random variables X_k such that P(X_k = 1) = x, P(X_k = 0) = 1 − x. Write S_n = X_1 + ... + X_n. Then b_n(x) = E(f(S_n/n)).
(iii) Therefore

|b_n(x) − f(x)| ≤ E(|f(S_n/n) − f(x)|) = ∫_A |f(S_n/n) − f(x)| dP + ∫_{A^c} |f(S_n/n) − f(x)| dP,

for A = {ω ∈ Ω | |S_n/n − x| ≤ δ(ε)}.
(iv) Then show

|b_n(x) − f(x)| ≤ ε + (2M/δ(ε)²) V(S_n/n) = ε + (2M/(n δ(ε)²)) V(X_1),

for M = max |f|. Conclude that b_n → f uniformly.

(12) Let X and Y be independent random variables, each with density

f(x) = e^{−x} if x ≥ 0, 0 if x < 0.

Find the density of X + Y.

(13) Let X and Y be two independent positive random variables, and suppose that f_X and f_Y are the density functions for X, Y. Show that the density function for X + Y is

f_{X+Y}(z) = ∫_{−∞}^∞ f_X(z − y) f_Y(y) dy.

(Hint: If g: R → R, we have E(g(X + Y)) = ∫_{−∞}^∞ ∫_{−∞}^∞ f_{X,Y}(x, y) g(x + y) dxdy, where f_{X,Y} is the joint density function of X, Y.)

(14) Show that

lim_{n→∞} ∫_0^1 ∫_0^1 ... ∫_0^1 f((x_1 + ... + x_n)/n) dx_1 dx_2 ... dx_n = f(1/2)

for each continuous function f. (Hint: P(|(x_1 + ... + x_n)/n − 1/2| > ε) ≤ (1/ε²) V((x_1 + ... + x_n)/n) = 1/(12ε²n).)

(15) Prove that
(i) E(E(X | V)) = E(X),
(ii) E(X) = E(X | W), where W = {∅, Ω} is the trivial σ-algebra.

(16) Let X, Y be two real-valued random variables and suppose their joint distribution function has the density f(x, y). Show that E(X | Y) = Φ(Y) a.s. for

Φ(y) = ∫_{−∞}^∞ x f(x, y) dx / ∫_{−∞}^∞ f(x, y) dx.

(Hints: Φ(Y) is a function of Y and so is U(Y)-measurable. Therefore we must show that

(∗) ∫_A X dP = ∫_A Φ(Y) dP for all A ∈ U(Y).

Now A = Y^{−1}(B) for some Borel subset B of R. So the left hand side of (∗) is

(∗∗) ∫_A X dP = ∫_Ω χ_B(Y) X dP = ∫_B ∫_{−∞}^∞ x f(x, y) dx dy.

The right hand side of (∗) is

∫_A Φ(Y) dP = ∫_B ∫_{−∞}^∞ Φ(y) f(x, y) dx dy,

which equals the right hand side of (∗∗).)

(17) A smooth function Φ: R → R is called convex if Φ″(x) ≥ 0 for all x ∈ R.
(i) Show that if Φ is convex, then Φ(y) ≥ Φ(x) + Φ′(x)(y − x) for all x, y ∈ R.
(ii) Show that Φ((x + y)/2) ≤ (1/2)Φ(x) + (1/2)Φ(y) for all x, y ∈ R.
(iii) A smooth function Φ: R^n → R is called convex if the matrix ((Φ_{x_i x_j})) is nonnegative definite for all x ∈ R^n. (This means that Σ_{i,j=1}^n Φ_{x_i x_j} ξ_i ξ_j ≥ 0 for all ξ ∈ R^n.) Prove

Φ(y) ≥ Φ(x) + DΦ(x) · (y − x) and Φ((x + y)/2) ≤ (1/2)Φ(x) + (1/2)Φ(y)

for all x, y ∈ R^n. (Here “D” denotes the gradient.)

(18) (i) Prove Jensen’s inequality:

Φ(E(X)) ≤ E(Φ(X))

for a random variable X: Ω → R, where Φ is convex. (Hint: Use assertion (iii) from the previous problem.)
(ii) Prove the conditional Jensen’s inequality:

Φ(E(X | V)) ≤ E(Φ(X) | V).
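The uniform convergence asserted in Problem (11) is easy to observe numerically. The sketch below evaluates b_n on a grid and watches the worst-case error shrink; the test function |x − 1/2| (continuous but not smooth) and the grid resolution are arbitrary choices for the illustration:

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the n-th Bernstein polynomial of f at x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)          # continuous, but not smooth at 1/2
grid = [i / 100 for i in range(101)]

errors = {}
for n in (10, 100, 1000):
    errors[n] = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
    print(n, errors[n])
```

The worst-case error decreases roughly like 1/√n at the kink, consistent with the variance bound in step (iv).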

(19) Let W(·) be a one-dimensional Brownian motion. Show that

E(W^{2k}(t)) = (2k)! t^k / (2^k k!).

(20) Show that if W(·) is an n-dimensional Brownian motion, then so are
(i) W(t + s) − W(s) for all s ≥ 0,
(ii) cW(t/c²) for all c > 0 (“Brownian scaling”).

(21) Let W(·) be a one-dimensional Brownian motion. Show that

lim_{m→∞} W(m)/m = 0 almost surely.

(Hint: Fix ε > 0 and define the event A_m := {|W(m)/m| ≥ ε}. Then A_m = {|X| ≥ √m ε} for the N(0, 1) random variable X = W(m)/√m. Apply the Borel–Cantelli Lemma.)

(22) Define X(t) := ∫_0^t W(s) ds, where W(·) is a one-dimensional Brownian motion. Show that E(X²(t)) = t³/3 for each t > 0.

(23) Define X(t) as in the previous problem. Show that

E(e^{λX(t)}) = e^{λ²t³/6} for each t > 0.

(Hint: X(t) is a Gaussian random variable, the variance of which we know from the previous homework problem.)

(24) Define U(t) := e^{−t} W(e^{2t}), where W(·) is a one-dimensional Brownian motion. Show that

E(U(t)U(s)) = e^{−|t−s|} for all −∞ < s, t < ∞.

(25) Let W(·) be a one-dimensional Brownian motion, and define

W̄(t) := t W(1/t) for t > 0, W̄(0) := 0.

Show that W̄(t) − W̄(s) is N(0, t − s) for times 0 ≤ s ≤ t. (W̄(·) also has independent increments and so is a one-dimensional Brownian motion. You do not need to show this.)

(26) (i) Let 0 < γ ≤ 1. Show that if f: [0, T] → R^n is uniformly Hölder continuous with exponent γ, it is also uniformly Hölder continuous with each exponent 0 < δ < γ.
(ii) Show that f(t) = t^γ is uniformly Hölder continuous with exponent γ on the interval [0, 1].
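As a sanity check on Problem (22), one can estimate E(X²(t)) by Monte Carlo, discretizing the time integral of a simulated Brownian path. This is only a sketch: the step count, sample size, and seed are arbitrary choices, and agreement with t³/3 is statistical, not exact:

```python
import math
import random

random.seed(0)

def sample_X(t, steps=200):
    """One sample of X(t) = ∫_0^t W(s) ds, via a left-endpoint Riemann sum."""
    dt = t / steps
    w = 0.0          # current value of the Brownian path
    integral = 0.0   # running Riemann sum for ∫ W ds
    for _ in range(steps):
        integral += w * dt
        w += random.gauss(0.0, math.sqrt(dt))  # independent N(0, dt) increment
    return integral

t, n_samples = 2.0, 10_000
estimate = sum(sample_X(t) ** 2 for _ in range(n_samples)) / n_samples
print(estimate, t ** 3 / 3)  # the Monte Carlo estimate should land near t³/3
```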

(27) Let 0 < γ < 1/2. These notes show that if W(·) is a one-dimensional Brownian motion, then for almost every ω there exists a constant K, depending on ω, such that

(∗) |W(t, ω) − W(s, ω)| ≤ K|t − s|^γ for all 0 ≤ s, t ≤ 1.

Show that there does not exist a constant K such that (∗) holds for almost all ω.

(28) Prove that if G, H ∈ L²(0, T), then

E( ∫_0^T G dW ∫_0^T H dW ) = E( ∫_0^T GH dt ).

(29) Let (Ω, U, P) be a probability space, and take F(·) to be a filtration of σ-algebras. Assume X is an integrable random variable, and define X(t) := E(X | F(t)) for times t ≥ 0. Show that X(·) is a martingale.

(30) Show directly that I(t) := W²(t) − t is a martingale. (Hint: W²(t) = (W(t) − W(s))² − W²(s) + 2W(t)W(s). Take the conditional expectation with respect to W(s), the history of W(·), and then condition with respect to the history of I(·).)

(31) Suppose X(·) is a real-valued martingale and Φ: R → R is convex. Assume also E(|Φ(X(t))|) < ∞ for all t ≥ 0. Show that Φ(X(·)) is a submartingale. (Hint: Use the conditional Jensen’s inequality.)

(32) Use the Itô chain rule to show that Y(t) := e^{t/2} cos(W(t)) is a martingale.

(33) Let W(·) = (W¹, ..., Wⁿ) be an n-dimensional Brownian motion, and write Y(t) := |W(t)|² − nt for times t ≥ 0. Show that Y(·) is a martingale. (Hint: Compute dY; the identity 2ab = (a + b)² − a² − b² is useful.)

(34) Show that

∫_0^T W² dW = (1/3)W³(T) − ∫_0^T W dt

and

∫_0^T W³ dW = (1/4)W⁴(T) − (3/2) ∫_0^T W² dt.

(35) Recall from the notes that Y := e^{∫_0^t g dW − (1/2)∫_0^t g² ds} satisfies

dY = gY dW.

Use this to prove

E( e^{∫_0^T g dW} ) = e^{(1/2)∫_0^T g² ds}.

(36) Let u = u(x, t) be a smooth solution of the backwards diffusion equation

∂u/∂t + (1/2) ∂²u/∂x² = 0,

and suppose W(·) is a one-dimensional Brownian motion. Show that for each time t > 0:

E(u(W(t), t)) = u(0, 0).

(37) Calculate E(B²(t)) for the Brownian bridge B(·), and show in particular that E(B²(t)) → 0 as t → 1⁻.

(38) Let X solve the Langevin equation, and suppose that X_0 is an N(0, σ²/(2b)) random variable. Show that

E(X(s)X(t)) = (σ²/(2b)) e^{−b|t−s|}.

(39) (i) Consider the ODE

ẋ = x² (t > 0), x(0) = x_0.

Show that if x_0 > 0, the solution “blows up to infinity” in finite time.
(ii) Next, look at the ODE

ẋ = x^{1/2} (t > 0), x(0) = 0.

Show that this problem has infinitely many solutions. (Hint: x ≡ 0 is a solution. Find also a solution which is positive for times t > 0, and then combine these solutions to find ones which are zero for some time and then become positive.)

(40) (i) Use the substitution X = u(W) to solve the SDE

dX = −(1/2) e^{−2X} dt + e^{−X} dW, X(0) = x_0.

(ii) Show that the solution blows up at a finite, random time.

(41) Solve the SDE dX = −X dt + e^{−t} dW.

(42) Let W = (W¹, W², ..., Wⁿ) be an n-dimensional Brownian motion and write

R := |W| = ( Σ_{i=1}^n (W^i)² )^{1/2} (t > 0).

Show that R solves the stochastic Bessel equation

dR = Σ_{i=1}^n (W^i/R) dW^i + ((n − 1)/(2R)) dt.

(43) (i) Show that X = (cos(W), sin(W)) solves the SDE system

dX¹ = −(1/2) X¹ dt − X² dW
dX² = −(1/2) X² dt + X¹ dW.

(ii) Show also that if X = (X¹, X²) is any other solution, then |X| is constant in time.

(44) Solve the system

dX¹ = dt + dW¹
dX² = X¹ dW².

(45) Solve the system

dX¹ = X² dt + dW¹
dX² = X¹ dt + dW².

(46) Solve

dX = (1/2) σ′(X)σ(X) dt + σ(X) dW, X(0) = 0,

where W is a one-dimensional Brownian motion and σ is a smooth, positive function. (Hint: Let f(x) := ∫_0^x dy/σ(y) and set g := f^{−1}, the inverse function of f. Show X := g(W).)

(47) Let τ be the first time a one-dimensional Brownian motion hits the half-open interval (a, b]. Show τ is a stopping time.

(48) Let W denote an n-dimensional Brownian motion, for n ≥ 3. Write X = W + x_0, where the point x_0 lies in the region U = {0 < R_1 < |x| < R_2}. Calculate explicitly the probability that X will hit the outer sphere {|x| = R_2} before hitting the inner sphere {|x| = R_1}. (Hint: Check that Φ(x) = 1/|x|^{n−2} satisfies ΔΦ = 0 for x ≠ 0. Modify Φ to build a function u which equals 0 on the inner sphere and 1 on the outer sphere.)
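Problem (41) is linear, and applying Itô’s rule to e^t X shows that its solution is X(t) = e^{−t}(X(0) + W(t)). The sketch below integrates the SDE with the Euler–Maruyama scheme and compares the endpoint against this closed form driven by the same noise; the step size, horizon, and seed are arbitrary choices:

```python
import math
import random

random.seed(1)

x0, T, steps = 1.0, 1.0, 50_000
dt = T / steps

x = x0      # Euler–Maruyama state for dX = -X dt + e^{-t} dW
w = 0.0     # the driving Brownian path, built from the same increments
t = 0.0
for _ in range(steps):
    dW = random.gauss(0.0, math.sqrt(dt))  # N(0, dt) increment
    x += -x * dt + math.exp(-t) * dW       # one Euler–Maruyama step
    w += dW
    t += dt

exact = math.exp(-T) * (x0 + w)  # closed-form solution at time T
print(x, exact)  # the two should agree up to the discretization error
```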
