
Stochastic

Differential Equations:
Theory and Applications

LUDWIG ARNOLD

A WILEY-INTERSCIENCE PUBLICATION

JOHN WILEY & SONS, New York · London · Sydney · Toronto


Original German language edition published
by R. Oldenbourg Verlag, Munich.
Copyright ©1973 by R. Oldenbourg Verlag.
Copyright ©1974, by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
No part of this book may be reproduced by any means, nor
transmitted, nor translated into a machine language with-
out the written permission of the publisher.

Library of Congress Cataloging in Publication Data:


Arnold, Ludwig, 1937-
Stochastic differential equations.
"A Wiley-Interscience publication."
Translation of Stochastische Differentialgleichungen.
Bibliography: p.
1. Stochastic differential equations. I. Title.
QA274.23.A7713 519.2 73-22256
ISBN 0-471-03359-6

Printed in the United States of America


10 9 8 7 6 5 4 3 2 1
Preface

The theory of stochastic differential equations was originally developed by


mathematicians as a tool for explicit construction of the trajectories of diffusion
processes for given coefficients of drift and diffusion. In the physical and engi-
neering sciences, on the other hand, stochastic differential equations arise in a
quite natural manner in the description of systems on which so-called "white
noise" acts.
As a result of this variety in the motivations, the existing detailed studies of the
subject, as a rule, either are not written from a standpoint of applications or are
inaccessible to the person intending to apply them. This holds for the important
original work of Itô [42] as well as the books by Gikhman and Skorokhod [5],
Dynkin [21], McKean [45], and Skorokhod [47]. The shorter accounts of sto-
chastic dynamic systems in books on stability, filtering, and control (see, for
example, Bucy and Joseph [61], Khas'minskiy [65], Jazwinski [66], or Kushner
[72]) are rather unsuited for the study and understanding of the subject. Because
of the language barrier, the comprehensive book by Gikhman and Skorokhod
[36] in Russian is likely to be inaccessible to most persons interested in the sub-
ject. (Note added during printing: An English translation has just been published
by Springer Verlag.) Also, to a great extent it deals with a considerably more
general case.
This book is based on a course that I gave in the summer semester of 1970 at the
University of Stuttgart for fifth-semester students of mathematics and engineer-
ing. It is written at a moderately advanced level. Apart from probability theory,
the only prerequisite is the mathematical preparation usual for students of
physical and engineering sciences. The reader can obtain for himself the necessary
familiarity with probability theory from any one of a large number of good texts
(see [1]-[17]). I have summarized the most important concepts and results of
probability theory in Chapter 1, though this is intended only for reference and
review purposes. It is only with an intuitive understanding of the basic concepts
of probability theory that the reader will be able to distinguish between method-
ological considerations and technical details.
Throughout, I have gone to great pains (by means of remarks, examples, special
cases, and mention of heuristic considerations) to achieve the greatest clarity
possible. Proofs that do not provide much information for the development of
the subject proper have been omitted. The chapters on stability, filtering, and
control include a few examples of possible uses of this new tool.
Professor Peter Sagirow of the University of Stuttgart suggested that I write this
book. He also read the manuscript and made many valuable suggestions for im-
provement. For this I wish to express my gratitude to him. I also thank sincerely
Hede Schneider of Stuttgart and Dominique Gaillochet of Montreal for typing
the manuscript. Finally, I wish to express my thanks to the Centre de Recherches
Mathematiques of the University of Montreal, where I was able to complete the
manuscript during my stay there in the academic year 1970-1971.

Ludwig Arnold
Montreal, Quebec
March 1971
Contents
Introduction ........................................... xi

Notation and Abbreviations .............................. xv


1. Fundamentals of Probability Theory ...................... 1

1.1 Events and random variables ............................ 1

1.2 Probability and distribution functions ..................... 3
1.3 Integration theory, expectation ......................... 7
1.4 Convergence concepts ................................ 12
1.5 Products of probability spaces, independence ................ 14
1.6 Limit theorems .................................... 17
1.7 Conditional expectations and conditional probabilities .......... 18
1.8 Stochastic processes ................................. 21
1.9 Martingales ....................................... 25

2. Markov Processes and Diffusion Processes .................. 27

2.1 The Markov property ................................ 27


2.2 Transition probabilities, the Chapman-Kolmogorov equation ...... 29
2.3 Examples ........................................ 34
2.4 The infinitesimal operator ............................. 36
2.5 Diffusion processes .................................. 39
2.6 Backward and forward equations ........................ 41
3. Wiener Process and White Noise ......................... 45
3.1 Wiener process .................................... 45
3.2 White noise ....................................... 50

4. Stochastic Integrals .................................. 57


4.1 Introduction ...................................... 57
4.2 An example ...................................... 58
4.3 Nonanticipating functions ............................. 61
4.4 Definition of the stochastic integral ....................... 64
4.5 Examples and remarks ................................ 75

5. The Stochastic Integral as a Stochastic Process, Stochastic


Differentials ...................................... 79
5.1 The stochastic integral as a function of the upper limit .......... 79
5.2 Examples and remarks ................................ 84

5.3 Stochastic differentials. Itô's theorem ..................... 88


5.4 Examples and remarks in connection with Itô's theorem ......... 92
5.5 Proof of Itô's theorem ................................ 96

6. Stochastic Differential Equations, Existence and


Uniqueness of Solutions .............................. 100
6.1 Definition and examples .............................. 100
6.2 Existence and uniqueness of a solution .................... 105

6.3 Supplements to the existence-and-uniqueness theorem .......... 111

7. Properties of the Solutions of Stochastic Differential


Equations ....................................... 116

7.1 The moments of the solutions ......................... 116

7.2 Analytical properties of the solutions ..................... 120


7.3 Dependence of the solutions on parameters and initial values ..... 122

8. Linear Stochastic Differential Equations ................... 125

8.1 Introduction ..................................... 125



8.2 Linear equations in the narrow sense ..................... 128

8.3 The Ornstein-Uhlenbeck process ....................... 134

8.4 The general scalar linear equation ....................... 136

8.5 The general vector linear equation ....................... 141

9. The Solutions of Stochastic Differential Equations as Markov and


Diffusion Processes .................................. 145

9.1 Introduction ..................................... 145


9.2 The solutions as Markov processes ....................... 146
9.3 The solutions as diffusion processes ...................... 152
9.4 Transition probabilities .............................. 156

10. Questions of Modeling and Approximation ................. 163

10.1 The shift from a real to a Markov process .................. 163


10.2 Stratonovich's stochastic integral ....................... 167
10.3 Approximation of stochastic differential equations ........... 172
11. Stability of Stochastic Dynamic Systems ................... 176

11.1 Stability of deterministic systems ....................... 176

11.2 The basic ideas of stochastic stability theory ................ 179


11.3 Stability of the moments ............................. 188

11.4 Linear equations .................................. 190


11.5 The disturbed nth-order linear equation ................... 196
11.6 Proof of stability by linearization ....................... 198

11.7 An example from satellite dynamics ..................... 199

12. Optimal Filtering of a Disturbed Signal ................... 202

12.1 Description of the problem ........................... 202



12.2 The conditional expectation as optimal estimate ............. 205


12.3 The Kalman-Bucy filter ............................. 206
12.4 Optimal filters for linear systems ....................... 208

13. Optimal Control of Stochastic Dynamic Systems ............. 211

13.1 Bellman's equation ................................. 211


13.2 Linear systems ................................... 213
13.3 Control on the basis of filtered observations ................ 215

Bibliography ......................................... 217

Name and Subject Index ................................. 222


Introduction

Differential equations for random functions (stochastic processes) arise in the


investigation of numerous problems in physics and engineering. They are usually of
one of the following two fundamentally different types.
On the one hand, certain functions, coefficients, parameters, and boundary or
initial values in classical differential equation problems can be random. Simple
examples are
$\dot X_t = A(t)\,X_t + B(t), \qquad X_{t_0} = c,$

with random functions A (t) and B (t) as coefficients and with random initial
value c or

$\dot X_t = f(t, X_t, \eta_t), \qquad X_{t_0} = c,$


with the random function η_t, the random initial value c, and the fixed function f
(all the functions are scalar). If these random functions have certain regularity
properties, one can consider the above-mentioned problems simply as a family
of classical problems for the individual sample functions and treat them with the
classical methods of the theory of differential equations.
The situation is quite different if random "functions" of the so-called "white
noise" type appear in what is written formally as an ordinary differential equa-
tion, for example, the "function" ξ_t in the equation
(a)  $\dot X_t = f(t, X_t) + G(t, X_t)\,\xi_t, \qquad X_{t_0} = c.$
This "white noise" is conceived as a stationary Gaussian stochastic process with
mean value zero and a constant spectral density on the entire real axis. Such a
process does not exist in the conventional sense, since it would have to have the
Dirac delta function as covariance function, and hence an infinite variance (and
independent values at all points). Nonetheless, the "white noise" ξ_t is a very
useful mathematical idealization for describing random influences that fluctuate
rapidly and hence are virtually uncorrelated for different instants of time.
Such equations were first treated in 1908 by Langevin [44] in the study of the
Brownian motion of a particle in a fluid. If X_t is a component of the velocity, at
an instant t, of a free particle that performs a Brownian motion, Langevin's equa-
tion is
(b)  $\dot X_t = -\alpha X_t + \sigma \xi_t, \qquad \alpha > 0,\ \sigma \ \text{constants}.$
Here, −αX_t is the systematic part of the influence of the surrounding me-
dium due to dynamic friction. The constant α is found from Stokes's law to be
α = 6πaη/m, where a is the radius of the (spherical) particle, m is its mass, and
η is the viscosity of the surrounding fluid. On the other hand, the term σξ_t repre-
sents the force exerted on the particle by the molecular collisions. Since under
normal conditions the particle undergoes about $10^{21}$ molecular colli-
sions per second uniformly from all directions, σξ_t is indeed a rapidly varying fluctuational
term, which can be idealized as "white noise." If we normalize ξ_t so that
its covariance is the delta function, then σ² = 2αkT/m (where k is Boltzmann's
constant and T is the absolute temperature of the surrounding fluid). The same
equation (b) arises formally for the current in an electric circuit. This time, ξ_t
represents the thermal noise. Of course, (b) is a special case of equation (a), the
right-hand member of which is decomposed as the sum of a systematic part f
and a fluctuational part G ξ_t.
In model (b) of Brownian motion, one can calculate explicitly the probability
distributions of X_t even though ξ_t is not a random function in the usual sense.
As a matter of fact, every process X_t with these distributions (Ornstein-Uhlenbeck
process) has sample functions that, with probability 1, are nowhere differentiable, so that
(b) and, more generally, (a) cannot be regarded as ordinary differential equations.
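Numerically, equation (b) is usually handled by replacing σξ_t dt over a small time step Δt with the Wiener increment σ(W_{t+Δt} − W_t), which is normally distributed with mean 0 and variance σ²Δt (anticipating (c) and (d) below). The following Python sketch, added here only as an illustration and with arbitrary parameter values, simulates one trajectory this way.

    import numpy as np

    def langevin_path(alpha=1.0, sigma=0.5, x0=1.0, T=5.0, n=5000, seed=0):
        """Simple difference scheme for dX_t = -alpha*X_t dt + sigma dW_t."""
        rng = np.random.default_rng(seed)
        dt = T / n
        x = np.empty(n + 1)
        x[0] = x0
        for k in range(n):
            dW = rng.normal(0.0, np.sqrt(dt))  # increment W_{t+dt} - W_t ~ N(0, dt)
            x[k + 1] = x[k] - alpha * x[k] * dt + sigma * dW
        return x

    print(langevin_path()[-1])  # one realization of X_T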
For a mathematically rigorous treatment of equations of type (a), a new theory
is necessary. This is the subject of the present book. It turns out that, whereas
"white noise" is only a generalized stochastic process, the indefinite integral

(c)  $W_t = \int_0^t \xi_s\,ds$
can nonetheless be identified with the Wiener process. This is a Gaussian stochas-
tic process with continuous (but nowhere differentiable) sample functions, with mean
E W_t = 0 and with covariance E W_t W_s = min(t, s).
If we write (c) symbolically as
$dW_t = \xi_t\,dt,$
(a) can be put in the differential form
(d)  $dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, \qquad X_{t_0} = c.$
This is a stochastic differential equation (Itô's) for the process X_t. It should be
understood as an abbreviation for the integral equation

(e)  $X_t = c + \int_{t_0}^t f(s, X_s)\,ds + \int_{t_0}^t G(s, X_s)\,dW_s.$

Since the sample functions of W_t are with probability 1 continuous though not of
bounded variation in any interval, the second integral in (e) cannot, even for
smooth G, be regarded in general as an ordinary Riemann-Stieltjes integral with
respect to the sample functions of W_t, because the value depends on the intermediate
points chosen in the approximating sums. In 1951, Itô [42] defined integrals of the form

$Y_t = \int_{t_0}^t G(s)\,dW_s$

for a broad class of so-called nonanticipating functionals G of the Wiener process
W_t and in doing so put the theory of stochastic differential equations on a solid
foundation. This theory has its peculiarities. For example, the solution of the
equation
$dX_t = X_t\,dW_t, \qquad X_0 = 1,$
is not $e^{W_t}$, but
$X_t = e^{W_t - t/2},$
which one does not derive by purely formal calculation according to the classical
rules. It turns out that the solution of a stochastic differential equation of the
form (d) is a Markov process with continuous sample functions, in fact a diffusion
process. Conversely, every (smooth) diffusion process is the solution of a stochas-
tic differential equation of the form (d), where f and G² are respectively the coef-
ficients of drift and diffusion.
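Anticipating Itô's theorem of Chapter 5, the reader can already verify the solution quoted above. For $X_t = u(t, W_t)$ with a twice continuously differentiable u, the theorem gives $dX_t = (u_t + \tfrac{1}{2}u_{xx})\,dt + u_x\,dW_t$. With $u(t, x) = e^{x - t/2}$ we have $u_t = -\tfrac{1}{2}u$ and $u_x = u_{xx} = u$, so that
$dX_t = \left(-\tfrac{1}{2}X_t + \tfrac{1}{2}X_t\right)dt + X_t\,dW_t = X_t\,dW_t, \qquad X_0 = 1,$
whereas the formal candidate $e^{W_t}$ would instead satisfy $dX_t = \tfrac{1}{2}X_t\,dt + X_t\,dW_t$.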
For diffusion processes, there exist effective methods for calculating transition
and finite-dimensional distributions and the distributions of many functionals. These
methods belong to the so-called analytical or indirect methods of probability theory,
which deal not with the timewise development of the state X_t but, for example,
with the timewise development of the transition probabilities P(X_t ∈ B | X_s = x).
In contrast, the calculus of stochastic differential equations belongs to the prob-
abilistic or direct methods since with them we are concerned with the random
variable X_t and its variation. An equation of the form (d) or (e) represents a
construction rule (albeit in general a complicated one) with which one can con-
struct the trajectories of X_t from the trajectories of a Wiener process W_t and
an initial value c.
The law of motion for the state of a stochastic dynamic system without after-effects
("without memory") can be described by an equation more general than (d),
namely, by an equation of the form
(f)  $dX_t = g(t, X_t, dt).$
In the case of fluctuational influences that are additively superimposed on a sys-
tematic part, we have
$g(t, x, h) = f(t, x)\,h + G(t, x)\,(Y_{t+h} - Y_t).$
Here, Y_t is a process with independent increments, and equation (f) takes the
form
$dX_t = f(t, X_t)\,dt + G(t, X_t)\,dY_t.$


Such equations have been studied by Itô [42]. We shall confine ourselves here to
the most important special case Y_t = W_t.
Notation and Abbreviations

Vectors of dimension d are basically treated as d × 1 matrices (column vectors).


Equations or inequalities involving random variables hold in general only with
probability 1. As a rule, the argument w of random variables is omitted.

A' Transpose of the matrix A


X_t  A d-dimensional stochastic process with index set [t₀, T] ⊂ [0, ∞) = R₊
X_t′  The transpose (a row vector) of X_t
Ẋ_t  The derivative of X_t with respect to t
R^d  d-dimensional Euclidean space with distance function |x − y|
I Unit matrix
1_A  The indicator function of the set A
|x|  The norm of x ∈ R^d: $|x|^2 = \sum_{i=1}^d x_i^2 = x'x = \mathrm{tr}(x x')$
x′y  Scalar product of x, y ∈ R^d: $x'y = \sum_{i=1}^d x_i y_i = \mathrm{tr}(x y')$
x y′  The d × d matrix (x_i y_j)
|A|  Norm of the d × m matrix A: $|A|^2 = \sum_{i=1}^d \sum_{j=1}^m a_{ij}^2 = \mathrm{tr}\,A A'$ (note that
|Ax| ≤ |A| |x|, |AB| ≤ |A| |B|)
tr A = $\sum_{i=1}^d a_{ii}$  Trace of the d × d matrix A
A positive-definite (resp. nonnegative-definite): x′A x > 0 (resp. ≥ 0) for
all x ≠ 0
δ(t)  Dirac's delta function
δ_x  Probability measure concentrated at the point x
sup (resp. inf)  Least upper (resp. greatest lower) bound of a scalar set or sequence
lim sup (resp. lim inf)  Greatest (resp. least) point of accumulation of
a scalar sequence

o(g(t)), O(g(t))  Quantity whose ratio to g(t) (usually as t → 0) approaches
0 (resp. remains bounded)
(Ω, 𝔄, P)  Probability space
𝔄(ℭ)  The sigma-algebra generated by the family ℭ of sets
𝔅^d, 𝔅^d(M)  The sigma-algebra of Borel sets in R^d (resp. in M ⊂ R^d)
𝔄([t₁, t₂])  The sigma-algebra generated by the random variables X_t
for t₁ ≤ t ≤ t₂
ξ_t  A (vector-valued) white noise
W_t  A (vector-valued) Wiener process
L^p = L^p(Ω, 𝔄, P)  All random variables X such that E|X|^p < ∞


X.(w), f(·, x)  X (resp. f) as a function of the variable replaced by the dot,
for fixed w (resp. x)
L²[t₀, T]  All measurable functions f such that $\int_{t_0}^T |f(s)|^2\,ds < \infty$

P(s, x, t, B)  Probability of transition of the point x at the instant s into the
set B at the instant t
p(s, x, t, y)  Density of P(s, x, t, ·)
n(t, x, y) = $(2\pi t)^{-d/2}\, e^{-|y-x|^2/2t}$

𝔑(m, C)  d-dimensional normal distribution with expectation vector m
and covariance matrix C
st-lim (resp. qm-lim, resp. ac-lim)  Stochastic limit (resp. mean square or qua-
dratic mean limit, resp. limit with probability 1) of a sequence of random variables
Stochastic Differential Equations:
Theory and Applications
Chapter 1

Fundamentals of Probability Theory

1.1 Events and Random Variables

Probability theory deals with mathematical models of trials whose outcomes de-
pend on chance. We group together the possible outcomes (the elementary events)
in a set Ω with typical element w ∈ Ω. If the trial is the tossing of a coin, then Ω =
{heads, tails}; for the throwing of a pair of (distinguishable) dice, Ω = {(i, j): 1 ≤ i,
j ≤ 6}; for the life length of a light bulb, Ω = (0, ∞); in the observation of water
level from time t₁ to time t₂, Ω is the set of all real functions (or perhaps all con-
tinuous functions) defined on the interval [t₁, t₂]. An observable event A is a sub-
set of Ω, which we indicate by writing A ⊂ Ω. (In the dice example, A might be
{(i, j): i + j even}, and in the light bulb example, A might be {w: w ≥ c}.)
On the other hand, not every subset of Ω is in general an observable or interest-
ing event. Let 𝔄 denote the set of observable events for a single trial. Of course,
𝔄 must include the certain event Ω, the impossible event ∅ (the empty set) and,
for every event A, its complement Ā. Furthermore, given two events A and B in
𝔄, the union A ∪ B and the intersection A ∩ B also belong to 𝔄; thus, 𝔄 is an
algebra of events. In many practical problems, one must be able to form count-
able unions and intersections in 𝔄. To do this, it is sufficient to assume that
⋃_{n=1}^∞ A_n ∈ 𝔄
when A_n ∈ 𝔄 for n ≥ 1. An algebra 𝔄 of events with this property is called a sig-
ma algebra. Henceforth, we shall deal with sigma algebras exclusively. In the
terminology of measure theory, which is parallel to the terminology of proba-
bility theory, the elements of 𝔄 are called measurable sets and the pair (Ω, 𝔄) is
called a measurable space. Two events A and B are said to be incompatible if
they are disjoint, that is, if A ∩ B = ∅. If A is a subset of B, indicated by writing
A ⊂ B (where A = B is allowed), we say that A implies B.
Let ℭ denote a family of subsets of Ω. Then, there exists in Ω a smallest sigma-
algebra 𝔄(ℭ) that contains all sets belonging to ℭ. This 𝔄(ℭ) is called the sigma
algebra generated by ℭ.

Let (Ω, 𝔄) and (Ω′, 𝔄′) denote measurable spaces. A mapping X: Ω → Ω′ that
assigns to every w ∈ Ω a member w′ = X(w) of Ω′ is said to be (𝔄-𝔄′)-measur-
able (and is called an Ω′-valued random variable on (Ω, 𝔄)) if the preimages of
measurable sets in Ω′ are measurable sets in Ω, that is, if, for A′ ∈ 𝔄′,
{w: X(w) ∈ A′} = [X ∈ A′] = X⁻¹(A′) ∈ 𝔄.
The set 𝔄(X) of preimages of measurable sets is itself a sigma algebra in Ω and is
the smallest sigma algebra with respect to which X is measurable. It is called the
sigma algebra generated by X in Ω.
Suppose that Ω′ is the d-dimensional Euclidean space R^d with distance function
$|x - y| = \left( \sum_{k=1}^d |x_k - y_k|^2 \right)^{1/2}.$
In this special case, we shall always choose as the sigma algebra 𝔄′ of events the
sigma algebra 𝔅^d of Borel sets in R^d generated by the d-dimensional intervals
a_k < x_k ≤ b_k for k = 1, 2, …, d. Borel sets include in particular all closed and all
open sets and all kinds of d-dimensional intervals (half-open, unbounded, etc.).
Although there are "many" non-Borel sets, it is not easy to exhibit specific ex-
amples of them. If Ω′ is a subset of R^d (examples for d = 1 are [0, ∞), [0, 1], {0,
1, 2, …}, etc.), then we always choose 𝔄′ = 𝔅^d(Ω′) = {A′ = B ∩ Ω′: B ∈ 𝔅^d}.

An Rd-valued function X : 9 Rd, where (9, 21) is a measurable space, is mea-


surable (resp. Borel-measurable, resp. 21 -measurable) if and only if the d com-
ponents Xk, for 1:_5 k:_5 d, are real-valued (or scalar) measurable (resp. Borel-
measurable, resp.21-measurable) functions. A real-valued function X: a---* R1 is
measurable if and only if the preimages of all intervals of the form (- oo, aJ are
measurable, that is, if and only if

{w: X (w) <a} = (X <aj E 21 for all a E R1.

A (d × m)-matrix-valued function
$X(w) = \begin{pmatrix} X_{11}(w) & \cdots & X_{1m}(w) \\ \vdots & & \vdots \\ X_{d1}(w) & \cdots & X_{dm}(w) \end{pmatrix}$
is measurable (as a function with range in R^{dm}) if and only if the elements X_{ij} are
measurable.
The indicator function 1_A of a set A ⊂ Ω is defined by
$1_A(w) = \begin{cases} 1 & \text{for } w \in A, \\ 0 & \text{for } w \notin A. \end{cases}$
The indicator function 1_A is measurable if and only if A is measurable, that is,
A ∈ 𝔄.
Calculation with measurable functions is greatly facilitated by the fact that the
set of real-valued measurable functions is closed under the operations of classical
analysis, that is, under at most countably infinite applications of addition, sub-
traction, multiplication, division, and evaluation of the supremum, infimum, and
limits superior and inferior provided the operations lead to another real-valued
function. In particular, the limit of a sequence of measurable functions is a mea-
surable function.
The intuitive background of the concept of a random variable or measurable
function is as follows: Suppose that we are given a trial described by a measur-
able space (Ω, 𝔄), where Ω is the set of possible observable elementary events
and 𝔄 is the sigma algebra of events that are observable or interesting in the
framework of the trial.
Now it could happen that we would not be able to observe w directly but, with
the help of a measuring instrument, could measure a value X(w) in a set Ω′,
where the sigma algebra of events 𝔄′ is defined on Ω′. Thus, the value of X(w)
would depend on w, that is, on chance. The assumption of measurability of the
function X means that for every observable meaningful event in the space Ω′
there is a meaningful event in the original space. Now, information is, in general,
lost by observing X(w) instead of w, a fact expressed by the condition that
𝔄(X) is only a sub-sigma-algebra of 𝔄. In the throwing of a pair of dice, we can
choose in Ω = {(i, j): 1 ≤ i, j ≤ 6} the set of all subsets of Ω to serve as 𝔄. Every
function defined on Ω is then measurable. On the other hand, if 𝔄 is the sigma
algebra generated by the sets A_k = {(i, j): i + j = k}, where 2 ≤ k ≤ 12 (that is, if
only the sum of the spots can be observed), then, for example, X((i, j)) = i is
not measurable.
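The last assertion is easy to check mechanically. The following sketch (an added illustration, not part of the original text) lists the atoms A_k and tests whether a function on Ω is constant on each atom, which, for a sigma algebra generated by a countable partition, is equivalent to measurability.

    from itertools import product

    omega = set(product(range(1, 7), repeat=2))   # all 36 outcomes (i, j)
    atoms = [frozenset(w for w in omega if sum(w) == k) for k in range(2, 13)]

    def measurable(X):
        """Measurable w.r.t. the sigma algebra generated by the atoms
        iff X is constant on every atom of the partition."""
        return all(len({X(w) for w in atom}) == 1 for atom in atoms)

    print(measurable(lambda w: w[0] + w[1]))   # True: the sum is observable
    print(measurable(lambda w: w[0]))          # False: X((i, j)) = i is not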

1.2 Probability and Distribution Functions


Let (Ω, 𝔄) denote a measurable space. A set function μ defined on 𝔄 is called a
measure if

a) 0 ≤ μ(A) ≤ ∞ for all A ∈ 𝔄,

b) μ(∅) = 0,

c) μ(⋃_{n=1}^∞ A_n) = Σ_{n=1}^∞ μ(A_n) if A_n ∈ 𝔄 for all
n ≥ 1 and A_n ∩ A_m = ∅ (n ≠ m) (sigma-additivity).
The triple (Ω, 𝔄, μ) is called a measure space, and μ(A) is called the measure of
the set A. If Ω is the union of an at most countably infinite family of sets in 𝔄,
each with finite measure, then μ is said to be sigma-finite. If μ(Ω) < ∞, then μ
is said to be finite. A normed finite measure P, that is, a measure P with the
property
P(Ω) = 1,
is called a probability measure or simply a probability, and the triple (Ω, 𝔄, P)
is called a probability space. The set function P thus assigns to every event A a
number P(A), known as the probability of A, such that 0 ≤ P(A) ≤ 1.
Furthermore, we have
P(Ā) = 1 − P(A), A ∈ 𝔄,
and

P(A) ≤ P(B) for A ⊂ B, A, B ∈ 𝔄.


Probability theory is concerned, roughly speaking, only with the calculation of
new probabilities from given ones. The a priori probabilities are obtained either
from theoretical considerations or from observation of frequencies in long series
of trials. We shall use the frequency interpretation of probability theory as the
basis of our discussion. This interpretation means that P (A) is the theoretical
value of the relative frequency of A in a large number n of performances of a
trial characterized by the measurable space (Ω, 𝔄); that is,
P(A) ≈ n_A / n,
where n is the number of performances of the trial and n_A is the number of oc-
currences of the event A in the course of the n performances. Estimates of
P(A) are provided by statistics.
When a pair of dice is to be thrown, a probability is assigned to all subsets of
Ω = {(i, j): 1 ≤ i, j ≤ 6} by assigning (and this holds for all countable sets Ω) to
the singleton sets {(i, j)} the specific probabilities

P({(i, j)}) = p_{ij} ≥ 0,  Σ_{i=1}^6 Σ_{j=1}^6 p_{ij} = 1.

Then, for every set A,

P(A) = Σ_{(i,j)∈A} p_{ij}.
If the dice are independent (see section 1.5), the p_{ij} have the form

p_{ij} = p_i q_j,  Σ_{i=1}^6 p_i = Σ_{j=1}^6 q_j = 1.

Finally, if the dice are "fair," then p_i = q_j = 1/6, so that p_{ij} = 1/36.
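As a small added illustration, the formula P(A) = Σ_{(i,j)∈A} p_{ij} for fair dice can be evaluated directly; for A = {(i, j): i + j even} it gives 1/2.

    from itertools import product

    p = {(i, j): 1 / 36 for (i, j) in product(range(1, 7), repeat=2)}  # fair dice
    A = {w for w in p if (w[0] + w[1]) % 2 == 0}                       # even sum
    print(sum(p[w] for w in A))   # 0.5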
Of all nonnormed measures, we are interested here only in Lebesgue measure λ,
defined on the set 𝔅^d of Borel sets in R^d. This measure assigns to every d-
dimensional interval its "length":
λ({x: a_k < x_k ≤ b_k}) = ∏_{k=1}^d (b_k − a_k).
Thus, in the case of simple sets, it corresponds to their elementary geometric
content, and it can be carried over in an unambiguous way from intervals to all
Borel sets. Since λ(R^d) = ∞, Lebesgue measure is not finite. However, since
λ({x: −n ≤ x_k ≤ n}) = (2n)^d < ∞, n = 1, 2, …,
it is a sigma-finite measure. Every countable set (for example, the set of points
with rational coordinates) has Lebesgue measure 0.
Let (Ω, 𝔄, μ) denote a measure space and let E(w) denote a proposition regard-
ing the elements w of Ω. Then, we shall write E [μ] to mean that E is true for all
w in Ω except possibly for those w in some set N (belonging to 𝔄) such that
μ(N) = 0, so that E is true for all w ∈ N̄. If μ is a probability P, we shall say
"almost certainly [P]" or, since P(N̄) = 1, "with probability 1".
Now, suppose that (Ω, 𝔄, P) is a probability space, that (Ω′, 𝔄′) is a measur-
able space, and that X is a random variable with values in Ω′. The function X
maps the probability P onto the measurable space of the images by
P_X(A′) = P(X⁻¹(A′)) = P{w: X(w) ∈ A′} = P{X ∈ A′} for all
A′ ∈ 𝔄′.
The function P_X is called the distribution of X. It contains the information
needed for the probability-theoretic examination of X. For an R^d-valued random
variable, the distribution P_X is uniquely defined on 𝔅^d by its distribution func-
tion
F(x) = F(x₁, …, x_d) = P{w: X₁(w) ≤ x₁, …, X_d(w) ≤ x_d}
= P{X ≤ x},
which shows how likely it is that X will assume values to the "left" of the point
x ∈ R^d. The function F(x) is a convenient tool for describing the distribution
of X inasmuch as it is not a set function but an ordinary point function defined
on R^d. It is also called the joint distribution function of the d scalar random
variables X₁, …, X_d, which are the components of X. For d = 1, F(x) is an
increasing function that is everywhere continuous from the right; also,

F(−∞) = lim_{x→−∞} F(x) = 0,  F(∞) = lim_{x→∞} F(x) = 1.
These properties carry over in an obvious way to the case d ≥ 2. Every function
with these properties is called a distribution function. In applications, random
variables are defined in terms of their distribution functions. In fact, for given
F, it is always possible to construct a probability space (Ω, 𝔄, P) and a random
variable X in such a way that F is the distribution function of X. For example,
one might choose (Ω, 𝔄, P) = (R^d, 𝔅^d, P_F) and X(w) = w, where P_F is the
probability uniquely defined on 𝔅^d by F.
If F is the distribution function of an R^d-valued random variable X, it is possible
to obtain the distribution function (marginal distribution) of a group
of k components (where k ≤ d) of X by replacing the remaining arguments in
F(x₁, …, x_d) with ∞. For example, we obtain the one-dimensional
marginal distributions as follows:
F_k(x_k) = P{X_k ≤ x_k} = F(∞, …, ∞, x_k, ∞, …, ∞).
If the probability P_F generated on 𝔅^d by F is Lebesgue-continuous (that is, if
P_F(N) = 0 for every set of Lebesgue measure zero belonging to 𝔅^d), then P_F
has a density (see section 1.3); that is, there exists an integrable function f(x) ≥
0 such that
$F(x_1, \ldots, x_d) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f(y_1, \ldots, y_d)\,dy_1 \cdots dy_d.$

The distribution function F is then absolutely continuous (and hence continuous)
in the classical sense; that is, for every ε > 0, there exists a δ such that, for finite-
ly many disjoint intervals I₁, …, I_n, the total increase of F on these intervals is
less than ε whenever their total length is less than δ. Since F is an absolutely con-
tinuous function, it is differentiable [λ], and
$\frac{\partial^d F}{\partial x_1\,\partial x_2 \cdots \partial x_d} = f(x_1, x_2, \ldots, x_d) \quad [\lambda].$

In particular, equality holds wherever the density f is continuous.


An R^d-valued random variable has a normal distribution (a Gaussian distribution)
𝔑(m, C), where m ∈ R^d and C is a positive-definite d × d matrix, whenever its
distribution function has the density

$f(x) = \left((2\pi)^d \det C\right)^{-1/2} \exp\left(-\tfrac{1}{2}(x-m)'\,C^{-1}(x-m)\right).$

The probability-theoretic significance of the parameters m and C will be made
clear in the next section.
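As a quick numerical illustration (added here, not in the original), the density formula can be evaluated directly with numpy; the parameter values below are arbitrary.

    import numpy as np

    def normal_density(x, m, C):
        """Density of the d-dimensional normal distribution N(m, C) at x."""
        d = len(m)
        diff = x - m
        quad = diff @ np.linalg.solve(C, diff)        # (x-m)' C^{-1} (x-m)
        norm = ((2 * np.pi) ** d * np.linalg.det(C)) ** -0.5
        return norm * np.exp(-0.5 * quad)

    m = np.array([0.0, 1.0])
    C = np.array([[2.0, 0.5], [0.5, 1.0]])            # positive definite
    print(normal_density(np.array([0.5, 0.5]), m, C))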

1.3 Integration Theory, Expectation


To a real-valued random variable X defined on (Ω, 𝔄, P) we now wish to assign
a specific number, namely its expectation. If X assumes only finitely many dis-
tinct values c₁, …, c_n, the expectation is defined by
$E\,X = \sum_{i=1}^n c_i\,P(X = c_i).$

This is a special case of an integral, which by an approximation process can be ex-
tended to arbitrary real-valued random variables.
Let us summarize integration theory for real-valued random variables over an
arbitrary space (Ω, 𝔄, P). An integral is defined in three steps:
Step 1. Suppose that
X = Σ_{i=1}^n c_i 1_{A_i},  ⋃_{i=1}^n A_i = Ω,  A_i ∩ A_j = ∅ (i ≠ j),  A_i ∈ 𝔄,

is a so-called step function. We define

$\int_\Omega X(w)\,dP(w) = \int_\Omega X\,dP = \sum_{i=1}^n c_i\,P(A_i).$

Step 2. Now, let X ≥ 0 be any measurable function. There exists an increasing
sequence {X_n} of nonnegative measurable step functions such that
lim_{n→∞} X_n(w) = X(w) for all w ∈ Ω

and

$\lim_{n\to\infty} \int_\Omega X_n\,dP = c \le \infty.$

Here, c is independent of the special sequence {X_n}. We define

$\int_\Omega X\,dP = c.$

Step 3. Now, let X denote any measurable function. We decompose it into posi-
tive and negative parts:

X = X⁺ − X⁻,  X⁺ = X·1_{[X ≥ 0]},  X⁻ = −X·1_{[X < 0]}.

Then we define

$\int_\Omega X\,dP = \int_\Omega X^+\,dP - \int_\Omega X^-\,dP,$
assuming that this does not yield ∞ − ∞.


A random variable X is said to be (P-)integrable if the integral ∫_Ω X dP is finite.
In probability theory, the integral is also called the expectation, and one writes

E(X) = E X = ∫_Ω X dP.
For example, for every A ∈ 𝔄, we have E 1_A = P(A).
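The three-step construction can be imitated concretely. In the sketch below (an added illustration, not from the book), the probability space is Ω = [0, 1] with P = the uniform distribution, approximated by a fine grid of equally weighted points; X(w) = w², and X_n = min(⌊2ⁿX⌋/2ⁿ, n) are increasing step functions as in Step 2, whose expectations increase to E X = 1/3.

    import numpy as np

    # Probability space Omega = [0, 1], P = uniform, approximated by a grid
    # of equally weighted points; X(w) = w^2, so that E X = 1/3.
    w = (np.arange(100_000) + 0.5) / 100_000
    X = w ** 2

    for n in (1, 2, 4, 8, 16):
        Xn = np.minimum(np.floor(2.0**n * X) / 2.0**n, n)  # step function of Step 2
        print(n, Xn.mean())                                # increases to E X = 1/3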
For R^d- and matrix-valued random variables, we define

$\int_\Omega X\,dP = E\,X = (E\,X_1, \ldots, E\,X_d)', \qquad X = (X_1, \ldots, X_d)',$

and

$\int_\Omega A\,dP = E\,A = (E\,A_{ij}), \qquad A = (A_{ij})_{d \times m},$

respectively. These expectations exist (in the sense of the existence of the expectation of
each component) if and only if E|X| < ∞ in the first case and E|A| < ∞ in the
second. In the remainder of this section, we shall discuss only R^d-valued random
variables unless the contrary is stated.
For the actual evaluation of expectations (for example, from distribution functions),
the following transformation theorem is of importance: Let g: R^d → R^m denote
a measurable function. Then, for Y = g(X),

$E\,Y = \int_{R^d} g(x)\,dF(x).$

In particular, for d = m and g(x) ≡ x,

$E\,X = \int_{R^d} x\,dF(x),$

provided the integral in question exists. Here, F is the distribution function of X,
and we write ∫ g(x) dF(x) for the integral of g over the probability space (R^d,
𝔅^d, P_X), the so-called Lebesgue-Stieltjes integral. If it exists, it coincides for con-
tinuous g with the usual Riemann-Stieltjes integral.
In the case of an integrable discrete random variable that assumes the values
c_i ∈ R^d with probabilities p_i, the last formula yields

$E\,X = \sum_{i=1}^{\infty} c_i\,p_i,$

and, for an integrable absolutely continuous random variable with density f (see
the Radon-Nikodym theorem below),

$E\,X = \int_{R^d} x f(x)\,dx, \qquad E\,X_i = \int_{R^d} x_i f(x_1, \ldots, x_d)\,dx_1 \cdots dx_d = \int_{R^1} x_i f_i(x_i)\,dx_i,$

where f_i is the density of X_i.

Here, these integrals are in general Lebesgue integrals, which we shall treat at
the end of this section.
Suppose that, for p ≥ 1,
𝔏^p = 𝔏^p(Ω, 𝔄, P) = {X: X is an R^d-valued random variable,
E|X|^p < ∞}.
We have 𝔏^p ⊂ 𝔏^q (where p ≥ q) and 𝔏^p is a linear space. If in 𝔏^p we shift to the
set L^p of equivalence classes of random variables that coincide with probability 1
and if we set
‖X‖_p = (E|X|^p)^{1/p},
then L^p is a Banach space with respect to this norm. In fact, L² is a Hilbert
space with scalar product
$(X, Y) = E\,X'Y = \sum_{i=1}^d E\,X_i Y_i.$

In L¹, we have |E X| ≤ E|X| and
E(αX + βY) = αE X + βE Y (linearity).
For d = 1, we have
E X ≤ E Y for X ≤ Y (monotonicity).
In addition, we have Hölder's inequality (for p = 2, Schwarz's inequality)
|(X, Y)| ≤ ‖X‖_p ‖Y‖_q  (p > 1, 1/p + 1/q = 1, X ∈ L^p, Y ∈ L^q),
Minkowski's inequality (the triangle inequality in L^p)
‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p  (p ≥ 1, X, Y ∈ L^p),
and Chebyshev's inequality

P{|X| > c} ≤ c^{−p} E|X|^p  (p > 0, c > 0).
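Chebyshev's inequality follows at once from the preceding properties: since $c^p\,1_{[|X|>c]} \le |X|^p$ pointwise, monotonicity and the identity $E\,1_A = P(A)$ yield
$P\{|X| > c\} = E\,1_{[|X|>c]} \le E\,\frac{|X|^p}{c^p} = c^{-p}\,E|X|^p.$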
For d = 1, the number
V(X) = E(X − E X)² = σ²(X),  X ∈ L²,
is called the variance of X, the number σ = (V(X))^{1/2} is called the standard de-
viation of X, the number E X^k (where k is a natural number and X belongs to
L^k) is called the kth moment of X, the number E(X − E X)^k is called the kth
central moment of X, and the number
Cov(X, Y) = E(X − E X)(Y − E Y)
(where X and Y belong to L²) is called the covariance of X and Y. If Cov(X, Y) =
0, we say that X and Y are uncorrelated. For d ≥ 1, the symmetric nonnegative-
definite d × d matrix

Cov(X, Y) = E(X − E X)(Y − E Y)′ = (Cov(X_i, Y_j))

is called the covariance matrix of the R^d-valued random variables X and Y. For
Cov(X, X), we write simply Cov(X). The characteristic function of a random
variable X or of its distribution function F is

$\varphi(t) = \varphi_X(t) = E\,e^{it'X} = \int_{R^d} e^{it'x}\,dF(x), \qquad t \in R^d.$

The distribution function F is uniquely defined by φ.


A random variable X that has the normal distribution % (m, C) defined in section
1.2 has expectation vector E X = m, covariance matrix Cov (X) = C and charac-
teristic function

p(t)=exp(it'm-1 t'Ct tERd.


`` 2

This last relation can also serve as a definition of normal distribution, provided C
is nonnegative-definite (and the distribution can therefore be concentrated on a
certain linear subspace of Rd whose dimension is equal to the rank of C). For
d = 1, % (m, a2) has central moments

n?I, odd,
E (X - m)" =
1.3.5-... (n-1) a", n?2, even.

If X has, for d ≥ 1, the distribution 𝔑(m, C), A is a fixed p × d matrix, and a is a
member of R^p, then Y = AX + a has the distribution 𝔑(Am + a, ACA′). In
particular, every group of components of a normally distributed vector is nor-
mally distributed.
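This transformation rule is easy to confirm empirically. The following sketch (an added illustration with arbitrary parameter values, not from the book) samples X ~ 𝔑(m, C) and compares the sample mean and covariance of Y = AX + a with Am + a and ACA′.

    import numpy as np

    rng = np.random.default_rng(0)
    m = np.array([1.0, -2.0])
    C = np.array([[2.0, 0.3], [0.3, 0.5]])
    A = np.array([[1.0, 2.0]])            # p = 1, d = 2
    a = np.array([0.5])

    X = rng.multivariate_normal(m, C, size=200_000)
    Y = X @ A.T + a                       # Y = A X + a, applied row-wise

    print(Y.mean(axis=0), A @ m + a)      # sample mean vs. A m + a
    print(np.cov(Y.T), A @ C @ A.T)       # sample covariance vs. A C A'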
Some important convergence theorems dealing with the integration theory sum-
marized here are:
a) Theorem on monotone convergence: If {X_n} is an increasing sequence of
nonnegative random variables, then
lim_{n→∞} E(X_n) = E(lim_{n→∞} X_n).
b) Theorem on dominated convergence: Let {X_n} denote a sequence of random
variables such that X_n ∈ L^p, where p ≥ 1,
st-lim_{n→∞} X_n = X
(see section 1.4), and |X_n| ≤ Y ∈ L^p. Then, X ∈ L^p, ‖X_n − X‖_p → 0, and
lim_{n→∞} E X_n = E X.
This theorem holds, in particular, when almost certain convergence of X_n to X
rather than stochastic convergence obtains.
An integration theory can be developed for an arbitrary measure space (Ω, 𝔄, μ)
in exactly the same way as in the case μ(Ω) = 1. However, the relation 𝔏^p ⊂ 𝔏^q
for p > q is false in the case μ(Ω) = ∞. The only infinite measure space that we
consider here is the space (R^d, 𝔅^d, λ). The resulting integral is called the
Lebesgue integral. We write

$\int_{R^d} f\,d\lambda = \int_{R^d} f(x)\,dx$

and we define, for every Borel subset B of R^d,

$\int_B f(x)\,dx = \int_{R^d} f\,1_B\,d\lambda.$

In the case d = 1 and B = [a, b], (−∞, b], R¹, etc., the notations

$\int_a^b f(x)\,dx, \qquad \int_{-\infty}^b f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx$

are the usual ones. The Lebesgue integral is more general than the familiar Rie-
mann integral defined in terms of upper and lower sums, inasmuch as it is de-
fined for more functions. Every bounded Riemann-integrable (for example, every
continuous) function defined on a bounded interval is also Lebesgue-integrable

and the two integrals coincide. The same holds for nonnegative integrands and
for improper Riemann integrals. All the integrals over Rd or subsets of it that
we have been considering are basically Lebesgue integrals though, for the most
part, they might, by virtue of the smoothness of the integrands, be regarded as
Riemann integrals.
Let ν and μ denote any two measures on (Ω, 𝔄). The measure ν is said to be μ-
continuous if ν(N) = 0 whenever μ(N) = 0. Also, ν is said to have μ-density
f ≥ 0 whenever

$\nu(A) = \int_A f\,d\mu \quad \text{for all } A \in \mathfrak{A}.$

We then write

$f = \frac{d\nu}{d\mu}.$
Radon-Nikodym theorem: Let ν and μ denote two measures defined on (Ω, 𝔄),
and suppose that μ is sigma-finite. Then, ν is μ-continuous if and only if ν has a
μ-density. This density is uniquely defined [μ]. We then have

$\int_\Omega X\,d\nu = \int_\Omega X\,\frac{d\nu}{d\mu}\,d\mu$
as long as one side of this equation is meaningful.
If (Ω, 𝔄) = (R^d, 𝔅^d) and μ = λ, we also speak of Lebesgue-continuity. If ν = P
is a Lebesgue-continuous probability on (R^d, 𝔅^d) with distribution function F,
then there exists a density f ≥ 0, uniquely defined [λ], such that

$P(B) = \int_B f(x)\,dx \quad \text{for all } B \in \mathfrak{B}^d,$

and we have [λ] (in particular, wherever f is continuous)

$\frac{\partial^d F}{\partial x_1 \cdots \partial x_d} = f(x_1, \ldots, x_d).$

1.4 Convergence Concepts


Let X and X_n, where n ≥ 1, denote R^d-valued random variables defined on a
probability space (Ω, 𝔄, P). The following four convergence concepts find ap-
plication in probability theory:
a) If there exists a set of measure zero N ∈ 𝔄 such that, for all w ∉ N, the
sequence of the X_n(w) ∈ R^d converges in the usual sense to X(w) ∈ R^d, then {X_n}
is said to converge almost certainly [P] or with probability 1 to X. We
write
ac-lim_{n→∞} X_n = X.

b) If, for every ε > 0,

p_n(ε) = P{w: |X_n(w) − X(w)| > ε} → 0 (n → ∞),
then {X_n} is said to converge stochastically or in probability to X, and we write
st-lim_{n→∞} X_n = X.
c) If X_n and X belong to L^p and E|X_n − X|^p → 0, then {X_n} is said to converge
in pth mean to X. For p = 1, we speak simply of convergence in mean; for p = 2,
we speak of convergence in mean square or in quadratic mean, and we write
qm-lim_{n→∞} X_n = X.
d) Let F_n and F denote the distribution functions of X_n and X. Then, if

$\lim_{n\to\infty} \int_{R^d} g(x)\,dF_n(x) = \int_{R^d} g(x)\,dF(x)$

for every real-valued continuous bounded function g defined on R^d, the se-
quence {X_n} is said to converge in distribution to X. This is the case if and only
if
lim_{n→∞} F_n(x) = F(x)
at every point at which F is continuous, or
lim_{n→∞} φ_n(t) = φ(t) for all t ∈ R^d,
where φ_n and φ are characteristic functions.
These convergence concepts are in the following relationship to each other:

convergence in qth mean
  ⇓
convergence in pth mean (p < q)
  ⇓
almost certain convergence ⇒ stochastic convergence ⇒ convergence in distribution

Furthermore, a sequence converges stochastically if and only if every subsequence
of it contains an almost certainly convergent subsequence. A sufficient condition
for X_n → X (almost certainly) is the condition
$\sum_{n=1}^{\infty} E|X_n - X|^p < \infty \quad \text{for some } p > 0.$
Let {X_n} denote a sequence of R^d-valued random variables with distributions
𝔑(m_n, C_n). This sequence converges in distribution if and only if
m_n → m and C_n → C.
The limit distribution is 𝔑(m, C). This follows from consideration of the char-
acteristic functions.

1.5 Products of Probability Spaces, Independence


Suppose that we are given finitely many measurable spaces (Ω_i, 𝔄_i) for i = 1,
…, n. From these spaces, we construct a product measurable space (Ω, 𝔄) in
the following way: The set Ω is the Cartesian product
Ω = ⨉_{i=1}^n Ω_i = Ω₁ × ⋯ × Ω_n,
consisting of all n-tuples w = (w₁, …, w_n) such that w_i ∈ Ω_i.
In Ω, we define the product sigma-algebra
𝔄 = ⨉_{i=1}^n 𝔄_i = 𝔄₁ × ⋯ × 𝔄_n

as the sigma-algebra generated by the cylinder sets in Ω:

A = A₁ × ⋯ × A_n,  A_i ∈ 𝔄_i.
The sigma-algebra 𝔄 is also the smallest sigma-algebra with respect to which the
projections p_i: Ω → Ω_i defined by p_i(w) = w_i ("projection onto the ith axis")
are (𝔄-𝔄_i)-measurable.
If probabilities P_i are given on (Ω_i, 𝔄_i), then there exists on (Ω, 𝔄) exactly one
probability P, the so-called product probability
P = ⨉_{i=1}^n P_i = P₁ × ⋯ × P_n,

with the property

P(A₁ × ⋯ × A_n) = P₁(A₁) ⋯ P_n(A_n) for all A_i ∈ 𝔄_i.
This holds also for sigma-finite measures P_i.
In the case (Ω₁, 𝔄₁) = ⋯ = (Ω_n, 𝔄_n), we write (Ω₁ⁿ, 𝔄₁ⁿ) for the product space.
For example, (R^d, 𝔅^d) = ((R¹)^d, (𝔅¹)^d), and Lebesgue measure in R^d is the
product of the d one-dimensional Lebesgue measures.

Fubini's theorem, written for the case n = 2, is as follows: Let X denote a non-
negative or (P₁ × P₂)-integrable (𝔄₁ × 𝔄₂ – 𝔅¹)-measurable scalar function de-
fined on Ω₁ × Ω₂. Then,

$\int_{\Omega_1\times\Omega_2} X\,d(P_1\times P_2) = \int_{\Omega_1}\left(\int_{\Omega_2} X(w_1, w_2)\,dP_2(w_2)\right)dP_1(w_1)$

$= \int_{\Omega_2}\left(\int_{\Omega_1} X(w_1, w_2)\,dP_1(w_1)\right)dP_2(w_2).$
Product probability spaces serve as models for describing composite experiments


that consist in carrying out n individual "mutually unaffected" experiments
(statistically independent experiments [see below] ) either simultaneously or
successively.
The theory of product spaces can be extended to products of an arbitrary
family {(Ω_i, 𝔄_i, P_i)}_{i∈I} (where the index set I is not the empty set) of prob-
ability spaces. Suppose that

Ω = ⨉_{i∈I} Ω_i

is the set of functions w defined on I and assuming at the point i ∈ I a value
w_i ∈ Ω_i and that

𝔄 = ⨉_{i∈I} 𝔄_i

is the sigma algebra in Ω generated by the cylinder sets

A = ⨉_{i∈I} A_i,  A_i ∈ 𝔄_i,  A_i ≠ Ω_i for only finitely many i.

Again, 𝔄 is also the smallest sigma-algebra with respect to which every pro-
jection p_i defined by p_i(w) = w_i is measurable. Then, there exists exactly one
product probability P defined on (Ω, 𝔄), that is, one probability P that assigns
to the cylinders the value

P(⨉_{i∈I} A_i) = ∏_{i∈I} P_i(A_i).

The projections p_i have distributions P_i.


A fundamental concept of probability theory is that of the independence of events
or of random variables. Let (Ω, 𝔄, P) denote a probability space. The events
A₁, …, A_n are said to be (statistically) independent if
P(A₁ ∩ ⋯ ∩ A_n) = P(A₁) ⋯ P(A_n).
Sub-sigma-algebras 𝔄₁, …, 𝔄_n of 𝔄 are said to be independent if this equation
holds for every possible choice of events A_i ∈ 𝔄_i. Finally, random variables
X₁, …, X_n (whose ranges may differ for different values of the subscript) are
said to be independent if the sigma-algebras 𝔄(X₁), …, 𝔄(X_n) generated by
them are independent.
Any family of events (sigma-algebras, random variables) is said to be independ-
ent if the events belonging to every finite subfamily of that family are inde-
pendent in the sense of the definition given above.
In a product probability space, the projections are independent random vari-
ables. For a given sequence F₁, F₂, … of d-dimensional distribution functions,
we can therefore construct an explicit sequence X₁, X₂, … of independent
random variables such that the distribution function of X_i is F_i. We do so by
choosing Ω = (R^d)^∞ = set of all sequences w = (w₁, w₂, …) with elements
w_i ∈ R^d, 𝔄 = (𝔅^d)^∞, P = P_{F₁} × P_{F₂} × ⋯, and X_i(w) = p_i(w) = w_i.
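In computational practice, the same construction is usually realized with independent uniformly distributed random numbers and quantile functions: if U is uniform on (0, 1), then F⁻¹(U) has distribution function F. The sketch below (an added illustration, not from the book) produces independent samples with two prescribed one-dimensional distributions this way.

    import numpy as np

    rng = np.random.default_rng(0)

    # Quantile (inverse distribution) functions of two target distributions:
    F1_inv = lambda u: -np.log(1.0 - u)              # exponential(1)
    F2_inv = lambda u: np.where(u < 0.5, 0.0, 1.0)   # fair coin {0, 1}

    U1, U2 = rng.random(100_000), rng.random(100_000)  # independent uniforms
    X1, X2 = F1_inv(U1), F2_inv(U2)    # independent, with d.f. F1 and F2
    print(X1.mean(), X2.mean())        # approximately 1.0 and 0.5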
Let X₁ denote an R^d-valued and X₂ an R^m-valued random variable. Let F(x₁, x₂)
denote their joint distribution function and let F₁(x₁) = F(x₁, ∞) and F₂(x₂) =
F(∞, x₂) denote the marginal distributions of X₁ and X₂, respectively. Then, X₁
and X₂ are independent if and only if, for every x₁ ∈ R^d and x₂ ∈ R^m,

F(x₁, x₂) = F₁(x₁) F₂(x₂)

or, in the case of the existence of densities,

f(x₁, x₂) = f₁(x₁) f₂(x₂).

This can be carried over in an analogous manner to the case of n > 2 random
variables.
Let X₁, …, X_n denote independent real-valued integrable random variables.
Then, their product is also integrable and
$E\left(\prod_{i=1}^n X_i\right) = \prod_{i=1}^n E(X_i).$

If the X_i belong to L², they are uncorrelated and

V(X₁ + ⋯ + X_n) = V(X₁) + ⋯ + V(X_n).
If X = (X₁, X₂) has a normal distribution, uncorrelatedness of X₁ and X₂ im-
plies their independence. If the X_i are independent and have distributions
𝔑(m_i, C_i), then

$S_n = \sum_{i=1}^n X_i$

has distribution 𝔑(m₁ + ⋯ + m_n, C₁ + ⋯ + C_n).

1.6 Limit Theorems


For an arbitrary sequence {A_n} of events in a probability space (Ω, 𝔄, P), the set
A = {w: w ∈ A_n for infinitely many n}
is also an event. With regard to its probability, we have the
Borel-Cantelli lemma: If Σ P(A_n) < ∞, then P(A) = 0. If the sequence {A_n} is
independent, then, conversely, Σ P(A_n) = ∞ implies P(A) = 1. In order to give
the simplest possible form to the following theorems, we shall consider an inde-
pendent sequence {X_n} of real-valued identically distributed random variables de-
fined on (Ω, 𝔄, P), all with the same distribution function F, and we shall study
the limiting behavior of the partial sums
S_n = X₁ + ⋯ + X_n.
The strong law of large numbers:

$\text{ac-}\lim_{n\to\infty} \frac{S_n}{n} = c \ (\text{finite})$

if and only if E X₁ exists. If it does, then c = E X₁.


Law of the iterated logarithm: Suppose that V(X_n) = σ² < ∞. If we set E X_n = a,
then, with probability 1,

$\limsup_{n\to\infty} \frac{S_n - na}{\sigma\sqrt{2n\log\log n}} = 1$

and

$\liminf_{n\to\infty} \frac{S_n - na}{\sigma\sqrt{2n\log\log n}} = -1.$
(Here, "log" denotes the natural logarithm.)
All versions of the central limit theorem assert that the sum of a large number of
independent random variables has, under quite general conditions, an approxi-
mately normal distribution. In our special case of identically distributed sum-
mands, it is as follows: Suppose that 0 < V(X_n) = σ² < ∞ and that E(X_n) = a.
Then, (S_n − an)/σ√n tends in distribution to 𝔑(0, 1); that is, for all x ∈ R¹,
$\lim_{n\to\infty} P\left[\frac{S_n - an}{\sigma\sqrt{n}} \le x\right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2}\,dy.$
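A simulation makes the statement concrete. The sketch below (an added illustration, not from the book) sums n fair coin variables (a = 1/2, σ = 1/2) and compares the empirical probabilities of the normalized sums with the standard normal values Φ(0) = 0.5 and Φ(1) ≈ 0.8413, up to the discreteness of S_n.

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 1000, 200_000
    a, sigma = 0.5, 0.5                        # mean and std of one coin toss

    S = rng.binomial(n, 0.5, size=trials)      # S_n for many independent runs
    Z = (S - a * n) / (sigma * np.sqrt(n))     # normalized sums

    for x in (0.0, 1.0):
        print(x, (Z <= x).mean())              # close to 0.5 and 0.8413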

1.7 Conditional Expectations and Conditional Probabilities


Let (Ω, 𝔄, P) denote a probability space. The elementary conditional probability
of an event A ∈ 𝔄 under the condition B ∈ 𝔄, where P(B) > 0, is
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$
However, we frequently encounter a whole family of conditions, each of which
has probability 0. Then we need the following more general concept of condi-
tional expectation.
Let X ∈ L¹(Ω, 𝔄, P) denote an R^d-valued random variable and let ℭ ⊂ 𝔄 de-
note a sub-sigma-algebra of 𝔄. The probability space (Ω, ℭ, P) is a coarsening
of the original one, and X is, in general, no longer ℭ-measurable. We seek now a
ℭ-measurable coarsening Y of X that assumes, on the average, the same values
as X, that is, an integrable random variable Y such that
Y is ℭ-measurable,
∫_C Y dP = ∫_C X dP for all C ∈ ℭ.
According to the Radon-Nikodym theorem, there exists such a Y, al-
most certainly unique. It is called the conditional expectation of X under the
condition ℭ. We write
Y = E(X | ℭ).
As a special case, we consider a sub-sigma-algebra ℭ whose elements are arbi-
trary unions of countably many "atoms" {A_n} such that
⋃_{n=1}^∞ A_n = Ω.

The quantity E(X | ℭ) is constant on the sets A_n. For P(A_n) > 0,
$E(X \mid \mathfrak{C})(w) = E(X \mid A_n) = \frac{1}{P(A_n)} \int_{A_n} X\,dP \quad \text{for all } w \in A_n,$

with the value for P(A_n) = 0 not specified.


Therefore, the conditional expectation is, for fixed X and ℭ, a function of
w ∈ Ω. It follows from the definition that, in particular,
E(E(X | ℭ)) = E(X)
and

|E(X | ℭ)| ≤ E(|X| | ℭ) almost certainly.

Other important properties of the conditional expectation are as follows (all
the equations and inequalities shown hold with probability 1):

a) ℭ = {∅, Ω} ⇒ E(X | ℭ) = E(X),
b) X ≥ 0 ⇒ E(X | ℭ) ≥ 0,
c) X ℭ-measurable ⇒ E(X | ℭ) = X,
d) X = const = a ⇒ E(X | ℭ) = a,
e) X, Y ∈ L¹ ⇒ E(aX + bY | ℭ) = a E(X | ℭ) + b E(Y | ℭ),
f) X ≤ Y ⇒ E(X | ℭ) ≤ E(Y | ℭ),
g) X ℭ-measurable, X, XY′ ∈ L¹ ⇒ E(XY′ | ℭ) = X E(Y′ | ℭ),
   in particular E(E(X | ℭ) Y′ | ℭ) = E(X | ℭ) E(Y′ | ℭ),
h) X, ℭ independent ⇒ E(X | ℭ) = E(X).

For later use, we point out in particular that, for ℭ₁ ⊂ ℭ₂ ⊂ 𝔄,
(1.7.1)  E(E(X | ℭ₂) | ℭ₁) = E(E(X | ℭ₁) | ℭ₂) = E(X | ℭ₁).
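On a sigma-algebra generated by atoms, E(X | ℭ) can be computed directly from the defining formula above. The following sketch (an added illustration, not from the book) conditions the first of two fair dice on their sum and confirms E(E(X | ℭ)) = E(X) = 7/2.

    from itertools import product
    from fractions import Fraction

    omega = list(product(range(1, 7), repeat=2))
    P = {w: Fraction(1, 36) for w in omega}           # two fair dice
    X = lambda w: w[0]                                # the first die

    # E(X | A_k) on each atom A_k = {(i, j): i + j = k}
    cond = {}
    for k in range(2, 13):
        atom = [w for w in omega if sum(w) == k]
        cond[k] = sum(X(w) * P[w] for w in atom) / sum(P[w] for w in atom)

    # E(E(X | C)) = E(X):
    print(sum(cond[sum(w)] * P[w] for w in omega))    # 7/2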
The conditional probability P(A | ℭ) of an event A under the condition ℭ ⊂ 𝔄 is
defined by
P(A | ℭ) = E(1_A | ℭ).
Being a conditional expectation, the conditional probability is a ℭ-measurable
function on Ω. In particular, for a ℭ generated by at most countably many
atoms {A_n},

$P(A \mid \mathfrak{C})(w) = \frac{P(A \cap A_n)}{P(A_n)} \quad \text{for all } w \in A_n \text{ such that } P(A_n) > 0.$

From the properties of conditional expectation, it follows in particular that
0 ≤ P(A | ℭ) ≤ 1, P(∅ | ℭ) = 0, P(Ω | ℭ) = 1, and

{A_n} pairwise disjoint in 𝔄 ⇒ P(⋃_{n=1}^∞ A_n | ℭ) = Σ_{n=1}^∞ P(A_n | ℭ)

with probability 1. However, since P(A | ℭ) is defined only up to a set of mea-
sure 0 depending on A, it does not follow that P(· | ℭ)(w) is, for fixed w ∈ Ω, a
probability on 𝔄. On the other hand, for a random variable X, consider the
conditional probability

P(X ∈ B | ℭ) = P({w: X(w) ∈ B} | ℭ),  B ∈ 𝔅^d.

There exists a function p(w, B) defined on Ω × 𝔅^d with the following prop-
erties: For fixed w ∈ Ω, the function p(w, ·) is a probability on 𝔅^d; for fixed
B, the function p(·, B) is a version of P(X ∈ B | ℭ); that is, p(·, B) is ℭ-mea-
surable and

$P(C \cap [X \in B]) = \int_C p(w, B)\,dP(w) \quad \text{for all } C \in \mathfrak{C}.$

Such a function p (uniquely defined up to a set of measure 0 in ℭ that is inde-
pendent of B) is called the conditional (probability) distribution of X for given
ℭ. For g(X) ∈ L¹,

$E(g(X) \mid \mathfrak{C}) = \int_{R^d} g(x)\,p(w, dx).$
Rd

If (Ω′, 𝔄′) is a measurable space and ℭ = 𝔄(Y) is the sigma-algebra generated by
an Ω′-valued random variable Y, we write
E(X | ℭ) = E(X | Y).
For every 𝔄(Y)-measurable random variable Z, there exists a measurable func-
tion h such that Z = h(Y); that is, the value of Z is fixed by the value of Y at w.
This h is uniquely defined up to a set N of images of Y such that P[Y ∈ N] = 0.
For the conditional expectation E(X | Y), this means the existence of a measur-
able function h defined on Ω′ such that
E(X | Y) = h(Y).
We write suggestively

h(y) = E(X | Y = y).
In the special case of the conditional probability P(X ∈ B | Y), we go on to the
conditional distribution p(w, B), for which there is now an almost certainly
unique q such that
p(w, B) = q(Y(w), B).
We also write

q(y, B) = P(X ∈ B | Y = y),
which is measurable with respect to y and, for fixed y, is a probability with
respect to B. For g(X) ∈ L¹,

(1.7.2)  $E(g(X) \mid Y = y) = \int_{R^d} g(x)\,q(y, dx) = \int_{R^d} g(x)\,P(dx \mid Y = y)$

and

$P[X \in B] = \int_{\Omega'} P(X \in B \mid Y = y)\,dP_Y(y),$

where P_Y is the distribution of Y.
If q(y, ·) = P(X ∈ · | Y = y) has a density h(x, y) in R^d, this density is called
the conditional density of X under the condition Y = y.

Example: Let X denote an R^d-valued and Y an R^p-valued random variable and
suppose that their joint distribution in R^{d+p} has a density f(x, y). Then,
P(X ∈ B | Y = y) has a density h(x, y) which, for all y such that the marginal
density

$f_2(y) = \int_{R^d} f(x, y)\,dx$

of Y is positive, has the form

$h(x, y) = \frac{f(x, y)}{f_2(y)}.$

For all integrable functions g(x), we have, in accordance with formula (1.7.2),

$E(g(X) \mid Y = y) = \frac{\int_{R^d} g(x)\,f(x, y)\,dx}{f_2(y)}.$

If X and Y are independent, then f(x, y) = f₁(x) f₂(y), so that h(x, y) = f₁(x).
Suppose that the joint distribution of X and Y is a normal distribution 𝔑(m, C).
Let us write E X = m_x, E Y = m_y, and

$C = \begin{pmatrix} E(X-m_x)(X-m_x)' & E(X-m_x)(Y-m_y)' \\ E(Y-m_y)(X-m_x)' & E(Y-m_y)(Y-m_y)' \end{pmatrix} = \begin{pmatrix} C_x & C_{xy} \\ C_{yx} & C_y \end{pmatrix}.$

Then, for positive-definite C_y, the conditional density of X given Y = y is the
density of the d-dimensional normal distribution

$\mathfrak{N}\left(m_x + C_{xy}\,C_y^{-1}(y - m_y),\ \ C_x - C_{xy}\,C_y^{-1} C_{yx}\right).$
1.8 Stochastic Processes

The example cited in section 1.1 of the measurement of water level during an in-
terval of time [t₀, T] and the description of the position of a particle subject to
Brownian motion as a function of time make it necessary to consider simulta-
neously a family of random variables that depend on a continuous parameter
(time).
More generally, let I denote an arbitrary nonempty index set and let (Ω, 𝔄, P)
denote a probability space. Then, a family {X_t; t ∈ I} of R^d-valued random vari-
ables is called a stochastic process (random process, random function) with pa-
rameter set (index set) I and state space R^d.

If I is finite, we are dealing simply with finitely many random variables. In the
case I = {…, −1, 0, 1, …} or {1, 2, …}, we speak of a random sequence or a
time series. It is preferable to reserve the term "process" for I uncountably in-
finite.
In what follows, I is always an interval [t₀, T], where t₀ < T, of the real axis R¹.
We interpret the parameter t as time. We wish to admit the cases t₀ = −∞ and
T = ∞. Then, [t₀, T] is interpreted as (−∞, T], [t₀, ∞), or (−∞, ∞).
If {X_t; t ∈ [t₀, T]} is a stochastic process, then X_t(·) is, for every fixed t ∈ [t₀, T],
an R^d-valued random variable, whereas, for every fixed w ∈ Ω (hence for every
observation), X.(w) is an R^d-valued function defined on [t₀, T], hence an ele-
ment of the product space (R^d)^{[t₀, T]}. It is called a sample function (realization,
trajectory, path) of the stochastic process.
The finite-dimensional distributions of a stochastic process {X_t; t ∈ [t₀, T]} are
given by

P{X_t ≤ x} = F_t(x),

P{X_{t₁} ≤ x₁, X_{t₂} ≤ x₂} = F_{t₁,t₂}(x₁, x₂),
…,
P{X_{t₁} ≤ x₁, …, X_{t_n} ≤ x_n} = F_{t₁,…,t_n}(x₁, …, x_n),

where t and t_i belong to [t₀, T], x and x_i belong to R^d (the symbol ≤ applies to
the components), and n ≥ 1.
Obviously, this system of distribution functions satisfies the following two con-
ditions:
a) Condition of symmetry: If {i₁, …, i_n} is a permutation of the numbers 1, …,
n, then for arbitrary instants and n ≥ 1,
F_{t_{i₁},…,t_{i_n}}(x_{i₁}, …, x_{i_n}) = F_{t₁,…,t_n}(x₁, …, x_n).

b) Condition of compatibility: For m < n and arbitrary t_{m+1}, …, t_n ∈ [t₀, T],

F_{t₁,…,t_m,t_{m+1},…,t_n}(x₁, …, x_m, ∞, …, ∞) = F_{t₁,…,t_m}(x₁, …, x_m).
In many practical cases, we are given not a family of random variables defined on
a probability space but a family of distributions P_{t₁,…,t_n}(B₁, …, B_n) or their
distribution functions F_{t₁,…,t_n}(x₁, …, x_n) which satisfy the symmetry and com-
patibility conditions. That these two concepts are equivalent is seen from the fol-
lowing theorem:
(1.8.1) Kolmogorov's fundamental theorem: For every family of distribution
functions that satisfy the symmetry and compatibility conditions, there exists a
probability space (Ω, 𝔄, P) and a stochastic process {X_t; t ∈ [t₀, T]} defined on
it that possesses the given distributions as finite-dimensional distributions.
In particular, if we start off with given distributions, we shall always assume the following choice to have been made:

$\Omega = (R^d)^{[t_0, T]}$ = set of all $R^d$-valued functions $\omega = \omega(\cdot)$ defined on $[t_0, T]$,
$\mathfrak{A} = (\mathfrak{B}^d)^{[t_0, T]}$ = product sigma-algebra generated by the cylinder sets,
$X_t(\omega) = \omega(t)$ = projection of $\omega$ onto the "$t$-axis", that is, the value of the function $\omega$ at the point $t$.

Now, the probability $P$ on $(\Omega, \mathfrak{A})$ is not (as in section 1.5 for independent $X_t$) simply the product probability but is determined on the cylinder sets by
$$P[X_{t_1} \in B_1, \dots, X_{t_n} \in B_n] = P_{t_1, \dots, t_n}(B_1, \dots, B_n)$$
and can be continued in a unique manner to all of $\mathfrak{A}$. Henceforth, this canonical choice, for which the elementary events coincide with the sample functions, will be the one made.
Also, for $\{X_t;\ t \in [t_0, T]\}$ we shall write briefly $X_t$ or $X(t)$, usually omitting the variable $\omega$.
Two stochastic processes $X_t$ and $\tilde{X}_t$ defined on the same probability space are said to be (stochastically) equivalent if, for every $t \in [t_0, T]$, we have $X_t = \tilde{X}_t$ with probability 1. Then $\tilde{X}_t$ is called a version of $X_t$, and vice versa. The finite-dimensional distributions of $X_t$ and $\tilde{X}_t$ coincide. However, since the set $N_t$ of exceptional values of $\omega$ for which $X_t \ne \tilde{X}_t$ depends in general on $t$, the sample functions of equivalent processes can have quite different analytical properties. For example, for $\Omega = [t_0, T] = [0, 1]$ and $P$ = Lebesgue measure, the processes $X_t(\omega) \equiv 0$ and

$$\tilde{X}_t(\omega) = \begin{cases} 0, & \omega \ne t, \\ 1, & \omega = t, \end{cases}$$

are equivalent. Nonetheless, every sample function of $X_t$ is a continuous function, whereas no sample function of $\tilde{X}_t$ is. To avoid such phenomena, we shall in what follows always assume without further mention that a separable version (which always exists) has been chosen. A process $X_t$ is said to be separable if there exists a countable set $M = \{t_1, t_2, \dots\}$ of instants, dense in the interval $[t_0, T]$, and a set $N \in \mathfrak{A}$ of $P$-measure 0 such that, for every open subinterval $(a, b)$ of $[t_0, T]$ and every closed subset $A$ of $R^d$, the two sets

$$\{\omega: X_t(\omega) \in A \text{ for all } t \in (a, b) \cap M\} \in \mathfrak{A}$$

and

$$\{\omega: X_t(\omega) \in A \text{ for all } t \in (a, b)\} \quad \text{(in general not measurable)}$$

differ, if at all, only on a subset of $N$. If we arrange for all subsets of the sets of measure zero to belong to $\mathfrak{A}$ (which is always possible), the second set will also belong to $\mathfrak{A}$ and will possess the same probability as the first.
How can we tell from the finite-dimensional distributions of a process whether this process has continuous sample functions or not? The following criterion of Kolmogorov asserts that only the two-dimensional distributions are needed for this: Let $a$, $b$, and $c$ denote positive numbers such that, for $t$ and $s$ in $[t_0, T]$,

$$\text{(1.8.2)} \qquad E\,|X_t - X_s|^a \le c\,|t - s|^{1+b}.$$

Then, (a separable version of) $X_t$ possesses with probability 1 continuous sample functions.
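As an illustration of (1.8.2), anticipating section 3.1: for a one-dimensional Wiener process, $E\,|W_t - W_s|^4 = 3\,(t-s)^2$, so the criterion holds with $a = 4$, $b = 1$, $c = 3$. A small Monte Carlo sketch (our own check, with arbitrary instants and seed) confirms the moment identity:

```python
import numpy as np

rng = np.random.default_rng(0)
t, s = 0.7, 0.3
dW = rng.normal(0.0, np.sqrt(t - s), size=1_000_000)  # W_t - W_s ~ N(0, t-s)
print(np.mean(dW**4))     # Monte Carlo estimate of E|W_t - W_s|^4, ~0.48
print(3 * (t - s)**2)     # exact fourth moment 3(t-s)^2 = 0.48
```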
A stochastic process is said to be (strictly) stationary if its finite-dimensional distributions are invariant under time displacements, that is, if, for $t_i,\ t_i + t \in [t_0, T]$,
$$F_{t_1+t, \dots, t_n+t}(x_1, \dots, x_n) = F_{t_1, \dots, t_n}(x_1, \dots, x_n).$$
For stationary processes, $I$ is generally all of $R^1$. If in addition $X_t \in L^2$ for all $t$, it then follows that $E\,X_t = m = \text{const}$ and $\mathrm{Cov}(X_t, X_s) = C(t-s)$. A process with the last two properties is said to be stationary in the wide sense. If it has the property $\lim_{s \to t} E\,|X_t - X_s|^2 = 0$ (mean-square continuity), then the covariance matrix $C$ has the representation
$$C(t) = \int_{-\infty}^{\infty} e^{itu}\,dF(u), \quad -\infty < t < \infty,$$
where the $d \times d$ matrix $F(u) = (F_{ij}(u))$ is the so-called spectral distribution function of $X_t$. This matrix has the following properties: a) for arbitrary $u_1 < u_2$, the matrix $F(u_2) - F(u_1)$ is nonnegative-definite, and b) $\mathrm{tr}(F(\infty) - F(-\infty)) < \infty$.
From an intuitive standpoint, $F$ gives the distribution of the frequencies of the harmonic oscillations participating in the construction of $X_t$. If $F$ has a density $f$, this is called the spectral density of $X_t$. If
$$\int_{-\infty}^{\infty} |C(t)|\,dt < \infty,$$
$f$ is obtained from the inversion formula
$$f(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itu}\,C(t)\,dt.$$

An $R^d$-valued stochastic process is called a Gaussian process if its finite-dimensional distributions are normal distributions, hence, if the joint distribution of $X_{t_1}, \dots, X_{t_n}$ has the following characteristic function:

$$f_{t_1, \dots, t_n}(u_1, \dots, u_n) = \exp\left(i \sum_{k=1}^n u_k'\,m(t_k) - \frac{1}{2} \sum_{k=1}^n \sum_{j=1}^n u_k'\,C(t_k, t_j)\,u_j\right),$$
$$u_1, \dots, u_n \in R^d, \quad t_1, \dots, t_n \in [t_0, T].$$

Here, $m(t) = E\,X_t$ and $C(t, s) = \mathrm{Cov}(X_t, X_s)$.
Therefore, the finite-dimensional distributions of a Gaussian process are uniquely determined by the two functions $m(t)$ and $C(t, s)$ (hence by the first and second moments). A Gaussian process that is stationary in the wide sense is strictly stationary.

1.9 Martingales
Let $(\Omega, \mathfrak{A}, P)$ denote a probability space, let $\{X_t;\ t \in [t_0, T]\}$ denote an $R^d$-valued stochastic process defined on $(\Omega, \mathfrak{A}, P)$, and let $\{\mathfrak{A}_t\}_{t \in [t_0, T]}$ denote an increasing family of sub-sigma-algebras of $\mathfrak{A}$, that is, one having the property $\mathfrak{A}_s \subset \mathfrak{A}_t$ for $t_0 \le s \le t \le T$.
If $X_t$ is $\mathfrak{A}_t$-measurable and integrable for all $t$, then the pair $\{X_t, \mathfrak{A}_t\}_{t \in [t_0, T]}$ is called a martingale if
$$E(X_t \mid \mathfrak{A}_s) = X_s \quad \text{almost certainly}$$
for all $s$ and $t$ in $[t_0, T]$ with $s \le t$. If $X_t$ is a real-valued process and if we replace the equality sign in the last formula with $\le$ or $\ge$, what we have is a supermartingale or a submartingale, respectively. In particular, if
$$\mathfrak{A}_t = \mathfrak{A}([t_0, t]) = \mathfrak{A}(X_s;\ t_0 \le s \le t),$$
that is, the history of the process $X_t$ prior to the instant $t$, is chosen as condition, then $X_t$ is called a martingale (or a supermartingale or submartingale).
Martingales are an abstract presentation of the concept of a "fair game", and they constitute one of the most important tools in the theory of stochastic processes. Sample functions of a (separable) martingale have no discontinuities of the second kind; that is, they have, at worst, jumps.
Let $X_t$ and $Y_t$ denote two martingales with respect to the same monotonic family $\mathfrak{A}_t$. Then $A\,X_t + B\,Y_t$ (where $A$ and $B$ are fixed $p \times d$ matrices) is a martingale and, in particular, $X_t - X_{t_0}$ is a martingale. Furthermore, for every martingale $X_t$, the process $|X_t|^p$ (where $p \ge 1$) is a submartingale whenever $X_t \in L^p$. For a real-valued martingale $X_t$, $X_t^+ = \max(X_t, 0)$ and $X_t^- = \max(-X_t, 0)$ are submartingales. The process $X_t$ is a submartingale if and only if $-X_t$ is a supermartingale. For a supermartingale (resp. submartingale), $E\,X_t$ is a monotonically decreasing (resp. increasing) function. We have the following
Convergence theorem: If $\{X_t, \mathfrak{A}_t\}$ is a supermartingale that satisfies the condition
$$\sup_{[t_0, T]} E\,|X_t| < \infty,$$
then the limit
$$\text{ac-}\lim_{t \to T} X_t = X_T$$
exists and belongs to $L^1$. This holds in particular for $X_t \ge 0$ or
$$\sup_{[t_0, T]} E(X_t^-) < \infty.$$
If $[a, b]$ is a bounded subinterval of $[t_0, T]$, then the so-called supermartingale inequalities
$$c\,P\left[\sup_{[a, b]} X_t \ge c\right] \le E\,X_a + E\,X_b^-,$$
$$c\,P\left[\inf_{[a, b]} X_t \le -c\right] \le E\,X_b^-$$
hold for every supermartingale and positive number $c$.


Since the process $-|X_t|^p$ is a supermartingale if $X_t \in L^p$ (where $p \ge 1$) is an $R^d$-valued martingale, it follows in particular that

$$\text{(1.9.1)} \qquad P\left[\sup_{[a, b]} |X_t| \ge c\right] \le E\,|X_b|^p / c^p \quad \text{for all } c > 0.$$

Furthermore, for every martingale $X_t \in L^p$ (where $p > 1$),

$$\text{(1.9.2)} \qquad E\left(\sup_{[a, b]} |X_t|^p\right) \le \left(\frac{p}{p-1}\right)^p E\,|X_b|^p.$$
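A quick simulation can illustrate (1.9.1). Taking the Wiener process of Chapter 3 (shown there to be a martingale) on $[0, 1]$ with $p = 2$, the bound reads $P[\sup_{[0,1]} |W_t| \ge c] \le E\,W_1^2/c^2 = 1/c^2$. The following sketch is our own illustration (path count, step count, and seed are arbitrary) and estimates the left side from discretized paths:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 10_000, 1_000
dW = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)                   # Brownian sample paths on [0, 1]
c = 2.0
lhs = np.mean(np.abs(W).max(axis=1) >= c)   # estimate of P[ sup |W_t| >= c ]
rhs = 1.0 / c**2                            # E|W_1|^2 / c^2
print(lhs, "<=", rhs)                       # ~0.09 <= 0.25
```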
Chapter 2
Markov Processes and Diffusion Processes

2.1 The Markov Property


The groundwork for the theory of Markov stochastic processes was laid in 1906
by A. A. Markov who, in his investigation of connected experiments, formulated
the principle (now named after him) that the "future" is independent of the
"past" when we know the "present".
On the other hand, this principle is the causality principle of classical physics
carried over to stochastic dynamic systems. It specifies that knowledge of the
state of a system at a given time is sufficient to determine its state at any future
time. An analytical example of this may be seen in the theory of ordinary dif-
ferential equations: the differential equation
$$\dot{x}_t = f(t, x_t)$$
states that the change taking place in $x_t$ at time $t$ depends only on $x_t$ and $t$ and not on the values of $x_s$ for $s < t$. A consequence of this is that, under certain conditions on $f$, the solution curve for $x_t$ is uniquely determined by an initial point $(t_0, c)$:
$$x_t = x_t(t_0, c), \quad t \ge t_0, \quad x_{t_0} = c.$$
Further information about the state $x_s$ at previous times $s < t$ is therefore not necessary for determination of the solution curve. We say that the system has no after-effects or "no memory".
If we carry this idea over to stochastic dynamic systems, we get the Markov property. It says that if the state of a system at a particular time $s$ (the present) is known, additional information regarding the behavior of the system at times $t < s$ (the past) has no effect on our knowledge of the probable development of the system at times $t > s$ (in the future).
We shall now give a formal mathematical definition of the Markov property as a
property of certain stochastic processes.
Let $\{X_t;\ t \in [t_0, T]\}$ denote a stochastic process whose state space is a $d$-dimensional Euclidean space $R^d$ (for $d \ge 1$) and whose index set is an interval $[t_0, T]$ of the real axis $R^1$. For our purposes, it will be sufficient in all cases to assume
$$[t_0, T] \subset [0, \infty) = R^+.$$
Thus, $0 \le t_0 < \infty$. Here, we admit $T = \infty$, in which case $[t_0, T]$ should be interpreted as $[t_0, \infty)$. For $\{X_t;\ t \in [t_0, T]\}$ we shall write simply $X_t$. We shall refer to the index $t$ as the "time". We shall always assume that the state space $R^d$ is endowed with the sigma-algebra $\mathfrak{B}^d$ of Borel sets.
The process $X_t$ is defined on a certain probability space $(\Omega, \mathfrak{A}, P)$. A sample function $X_\cdot(\omega)$ is therefore an $R^d$-valued function defined on the interval $[t_0, T]$. We emphasize again that we always assume that the choice
$$\Omega = (R^d)^{[t_0, T]}$$
is made, where $(R^d)^{[t_0, T]}$ is the space of all $R^d$-valued functions defined on the interval $[t_0, T]$,
$$\mathfrak{A} = (\mathfrak{B}^d)^{[t_0, T]}$$
is the product sigma-algebra generated by the Borel sets in $R^d$, and $X_t(\omega) = \omega(t)$ for all $\omega \in \Omega$. Then, $P$ is the probability uniquely defined (according to Kolmogorov's fundamental theorem (1.8.1)) by the finite-dimensional distributions of the process $X_t$ on $(\Omega, \mathfrak{A})$. If we have further information regarding the analytical properties of the sample functions, we can choose for $\Omega$ certain subspaces of $(R^d)^{[t_0, T]}$ (for example, the space of all continuous functions).
Suppose that, for $t_0 \le t_1 \le t_2 \le T$,
$$\mathfrak{A}([t_1, t_2]) = \mathfrak{A}(X_t,\ t_1 \le t \le t_2)$$
is the smallest sub-sigma-algebra of $\mathfrak{A}$ with respect to which all the random variables $X_t$, for $t_1 \le t \le t_2$, are measurable. In terms of our time picture, $\mathfrak{A}([t_1, t_2])$ contains the "history" of the process $X_t$ from time $t_1$ to time $t_2$, that is, those events that are determined by the conditions imposed on the course of the process $X_t$ during the interval $[t_1, t_2]$ and at no other time. $\mathfrak{A}([t_1, t_2])$ is generated by the cylinder events
$$\{\omega: X_{s_1}(\omega) \in B_1, \dots, X_{s_n}(\omega) \in B_n\} = [X_{s_1} \in B_1, \dots, X_{s_n} \in B_n],$$
$$t_1 \le s_1 < \dots < s_n \le t_2, \quad B_i \in \mathfrak{B}^d.$$
We can now formally define a Markov process:
(2.1.1) Definition. A stochastic process $\{X_t;\ t \in [t_0, T]\}$ defined on the probability space $(\Omega, \mathfrak{A}, P)$ with index set $[t_0, T] \subset [0, \infty)$ and with state space $R^d$ is called a Markov process (or an elementary or weak Markov process) if the following so-called Markov property (or elementary or weak Markov property) is satisfied: for $t_0 \le s \le t \le T$ and all $B \in \mathfrak{B}^d$, the equation
$$\text{(2.1.2)} \qquad P(X_t \in B \mid \mathfrak{A}([t_0, s])) = P(X_t \in B \mid X_s)$$
Fig. 1: Cylinder event.
holds with probability 1. We summarize the various equivalent clarifying formulations of the Markov property in
(2.1.3) Theorem. Each of the following conditions is equivalent to the Markov property:

a) For $t_0 \le s \le t \le T$ and $A \in \mathfrak{A}([t, T])$,
$$P(A \mid \mathfrak{A}([t_0, s])) = P(A \mid X_s).$$
b) For $t_0 \le s \le t \le T$ and $Y$ $\mathfrak{A}([t, T])$-measurable and integrable,
$$E(Y \mid \mathfrak{A}([t_0, s])) = E(Y \mid X_s).$$
c) For $t_0 \le t_1 \le t \le t_2 \le T$, $A_1 \in \mathfrak{A}([t_0, t_1])$ and $A_2 \in \mathfrak{A}([t_2, T])$,
$$P(A_1 \cap A_2 \mid X_t) = P(A_1 \mid X_t)\,P(A_2 \mid X_t).$$
d) For $n \ge 1$, $t_0 \le t_1 < \dots < t_n < t \le T$, and $B \in \mathfrak{B}^d$,
$$P(X_t \in B \mid X_{t_1}, \dots, X_{t_n}) = P(X_t \in B \mid X_{t_n}).$$

(All equations asserting equality of conditional probabilities hold with probability 1.)
Proof of these assertions can be found, for example, in Doob [3], pp. 80-85.
(2.1.4) Remark. A verbal formulation of Theorem (2.1.3c) is as follows: Given
a Markov process, the past and future are statistically independent when the
present is known.

2.2 Transition Probabilities, the Chapman-Kolmogorov Equation

Let $X_t$, for $t_0 \le t \le T$, denote a Markov process. In accordance with what was said in section 1.7, there exists a conditional distribution $q(X_s, B) = P(s, X_s, t, B)$ corresponding to the conditional probability $P(X_t \in B \mid X_s)$. The function $P(s, x, t, B)$ is a function of the four arguments $s, t \in [t_0, T]$ (with $s \le t$), $x \in R^d$, and $B \in \mathfrak{B}^d$. It has the following properties:

(2.2.1a) For fixed $s \le t$ and $B \in \mathfrak{B}^d$, we have, with probability 1,
$$P(s, X_s, t, B) = P(X_t \in B \mid X_s).$$
(2.2.1b) $P(s, x, t, \cdot)$ is a probability on $\mathfrak{B}^d$ for fixed $s \le t$ and $x \in R^d$.
(2.2.1c) $P(s, \cdot, t, B)$ is $\mathfrak{B}^d$-measurable for fixed $s \le t$ and $B \in \mathfrak{B}^d$.
(2.2.1d) For $t_0 \le s < u < t \le T$ and $B \in \mathfrak{B}^d$, and for all $x \in R^d$ with the possible exception of a set $N \subset R^d$ such that $P[X_s \in N] = 0$, we have the so-called Chapman-Kolmogorov equation

$$\text{(2.2.2)} \qquad P(s, x, t, B) = \int_{R^d} P(u, y, t, B)\,P(s, x, u, dy).$$

Fig. 2: The Chapman-Kolmogorov equation.

This can be explained as follows: we have, with probability 1,

$$P(s, X_s, t, B) = P(X_t \in B \mid \mathfrak{A}([t_0, s]))$$
$$= E\left(P(X_t \in B \mid \mathfrak{A}([t_0, u])) \mid \mathfrak{A}([t_0, s])\right)$$
$$= E\left(P(X_t \in B \mid X_u) \mid \mathfrak{A}([t_0, s])\right)$$
$$= E\left(P(u, X_u, t, B) \mid X_s\right)$$
$$= \int_{R^d} P(u, y, t, B)\,P(s, X_s, u, dy).$$

Here, we used first the Markov property, then the relationship
$$\mathfrak{A}([t_0, s]) \subset \mathfrak{A}([t_0, u]), \quad s \le u,$$
and Eq. (1.7.1), again the Markov property, and finally Theorem (2.1.3b) and Eq. (1.7.2).
One can modify the function P (s, x, t, B) in such a way that (2.2.2) holds for
all x E Rd (without destroying properties (a)-(c)). Henceforth, we shall always
assume this to have been done.
2.2 Transition Probabilities, the Chapman-Kolmogorov Equation 31

Furthermore, it is always possible to choose $P(s, x, t, B)$ in such a way that

(2.2.1e) For all $s \in [t_0, T]$ and $B \in \mathfrak{B}^d$, we have
$$P(s, x, s, B) = I_B(x) = \begin{cases} 1 & \text{for } x \in B, \\ 0 & \text{for } x \notin B. \end{cases}$$
This last statement follows from the fact that
$$P(X_s \in B \mid X_s) = I_{[X_s \in B]}$$
for $P$-almost all values $X_s = x$.
(2.2.3) Definition. A function $P(s, x, t, B)$ with the properties (2.2.1b-e) (where (2.2.2) is satisfied for all $x \in R^d$) is called a transition probability (transition function). If $X_t$ is a Markov process and $P(s, x, t, B)$ is a transition probability such that (2.2.1a) is satisfied, then $P(s, x, t, B)$ is called a transition probability of the Markov process $X_t$. Then, for fixed $s, t \in [t_0, T]$ such that $s \le t$, it is uniquely defined as a function of $x$ and $B$ with the possible exception of a set $N$ of values of $x$ (independent of $B$) such that $P[X_s \in N] = 0$.
We shall also use the notation
$$P(s, x, t, B) = P(X_t \in B \mid X_s = x),$$
which is the probability that the observed process will be in the set $B$ at time $t$ if at time $s$, where $s \le t$, it was in the state $x$. Here, the number $P(X_t \in B \mid X_s = x)$ is completely defined by the equation above, even though the condition $[X_s = x]$ may have probability 0 (as it does for most of the processes examined in the present book).

Fig. 3: The transition probability $P(s, x, t, B)$.

(2.2.4) Remarks. a) If the probability $P(s, x, t, \cdot)$ has a density, that is, if, for all $s, t \in [t_0, T]$, where $s < t$ (for $s = t$, existence of a density is impossible by virtue of (2.2.1e)), all $x \in R^d$, and all $B \in \mathfrak{B}^d$, we have

$$P(s, x, t, B) = \int_B p(s, x, t, y)\,dy,$$

where $p(s, x, t, y)$ is a nonnegative function that is measurable with respect to $y$ and whose integral is equal to 1, then Eq. (2.2.2) reduces to

$$p(s, x, t, y) = \int_{R^d} p(s, x, u, z)\,p(u, z, t, y)\,dz.$$

In imprecise language, this means that the probability of a transition from $x$ at time $s$ to $y$ at time $t$ is equal to the probability of the transition to $z$ at an intermediate time $u$, multiplied by the probability of the transition from $z$ at the time $u$ to $y$ at the time $t$, summed over all intermediate values $z$.
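For a concrete check, one can anticipate the Wiener transition density of example (2.3.4), $p(t, x, y) = (2\pi t)^{-1/2} e^{-(y-x)^2/2t}$ in $d = 1$, and verify the density form of the Chapman-Kolmogorov equation by numerical quadrature. This sketch is our own (the grid and the chosen instants are arbitrary):

```python
import numpy as np

def p(t, x, y):
    """Wiener transition density n(t, x, y) in d = 1 (see example (2.3.4))."""
    return np.exp(-(y - x)**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

s, u, t = 0.0, 1.0, 3.0
x, y = 0.5, -1.0
z = np.linspace(-30, 30, 200_001)   # integration grid over intermediate states
lhs = p(t - s, x, y)
rhs = np.trapz(p(u - s, x, z) * p(t - u, z, y), z)
print(lhs, rhs)                      # both ~0.158: the two sides agree
```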
b) In (2.2.1e), we can replace $I_B(x)$ with $\delta_x(B)$ (where $\delta_x$ is the probability measure concentrated at the point $x$). What (2.2.1e) says is that the state of the process does not change during a time interval of length 0.
The significance of the transition probabilities for Markov processes is that all finite-dimensional distributions of the process can be obtained from them and from the initial distribution at time $t_0$. More precisely (see, for example, Krickeberg [7], p. 151), we have
(2.2.5) Theorem. If $X_t$ is a Markov process on $[t_0, T]$, if $P(s, x, t, B)$ is its transition probability, and if $P_{t_0}$ is the distribution of $X_{t_0}$, that is,
$$P_{t_0}(A) = P[X_{t_0} \in A],$$
then, for the finite-dimensional distributions
$$P[X_{t_1} \in B_1, \dots, X_{t_n} \in B_n], \quad t_0 \le t_1 < \dots < t_n \le T, \quad B_i \in \mathfrak{B}^d,$$
we have

$$\text{(2.2.6)} \qquad P[X_{t_1} \in B_1, \dots, X_{t_n} \in B_n] = \int_{R^d} \int_{B_1} \cdots \int_{B_{n-1}} P(t_{n-1}, x_{n-1}, t_n, B_n)\,P(t_{n-2}, x_{n-2}, t_{n-1}, dx_{n-1}) \cdots P(t_0, x_0, t_1, dx_1)\,P_{t_0}(dx_0),$$

and hence, in particular,
$$P[X_t \in B] = \int_{R^d} P(t_0, x, t, B)\,P_{t_0}(dx).$$

In applied problems we frequently deal with transition probabilities in the sense of definition (2.2.3), rather than Markov processes in the sense of definition (2.1.1), and we must first construct the process as a family of random variables. That this is always possible is asserted by the following theorem, which also gives us a second and more convenient way of defining a Markov process.
(2.2.7) Theorem. Let $P(s, x, t, B)$ denote a transition probability, where $s, t \in [t_0, T]$. Then, for every initial probability $P_{t_0}$ on $\mathfrak{B}^d$, there exist a probability space $(\Omega, \mathfrak{A}, P)$ and a Markov process $X_t$ (where $t \in [t_0, T]$) defined on it which has transition probability $P(s, x, t, B)$ and for which $X_{t_0}$ has the distribution $P_{t_0}$.
To prove this, we use equation (2.2.6) to construct from $P(s, x, t, B)$ and $P_{t_0}$ consistent finite-dimensional distributions and from them, in accordance with Kolmogorov's fundamental theorem (1.8.1), the desired process. Here, we can always use for $\Omega$, $X_t$, and $\mathfrak{A}$ the special choice discussed in section 2.1.
(2.2.8) Definition. A Markov process $X_t$, for $t \in [t_0, T]$, is said to be homogeneous (with respect to time) if its transition probability $P(s, x, t, B)$ is stationary, that is, if the condition
$$P(s+u, x, t+u, B) = P(s, x, t, B)$$
is identically satisfied for $t_0 \le s \le t \le T$ and $t_0 \le s+u \le t+u \le T$. In this case, the transition probability is thus a function only of $x$, $t-s$, and $B$. Hence, we can write it in the form
$$P(t-s, x, B) = P(s, x, t, B), \quad 0 \le t-s \le T-t_0.$$
Therefore, $P(t, x, B)$ is the probability of transition from $x$ to $B$ in time $t$, regardless of the actual position of the interval of length $t$ on the time axis. For homogeneous processes, the Chapman-Kolmogorov equation becomes

$$P(t+s, x, B) = \int_{R^d} P(s, y, B)\,P(t, x, dy).$$

As a rule, homogeneous Markov processes are defined on an interval of the form $[t_0, \infty)$, so that the transition probability $P(t, x, B)$ is defined for $t \in [0, \infty)$.
(2.2.9) Remark. Every Markov process $X_t$ can, by taking time to be a state component, be transformed into a homogeneous Markov process $Y_t = (t, X_t)$ with state space $[t_0, T] \times R^d$. The transition probability $Q(t, y, B)$ of $Y_t$ for the special sets $B = C \times D$, $C \in \mathfrak{B}^1([t_0, T])$, $D \in \mathfrak{B}^d$, is then given by
$$\text{(2.2.10)} \qquad Q(t, y, C \times D) = Q(t, (s, x), C \times D) = P(s, x, s+t, D)\,I_C(s+t).$$
This uniquely determines the probability $Q(t, y, \cdot)$ on the entire product sigma-algebra $\mathfrak{B}^1([t_0, T]) \times \mathfrak{B}^d$.
(2.2.11) Remark. When is the Markov process $X_t$, for $t \in [t_0, T]$, also a stationary process? A necessary and sufficient condition for stationarity (see, for example, Khas'minskiy [65], p. 97) is:
a) $X_t$ is homogeneous;
b) there exists an invariant distribution $P^0$ in the state space, that is,

$$P^0(B) = \int_{R^d} P(t, x, B)\,P^0(dx) \quad \text{for all } B \in \mathfrak{B}^d,\ t \in [0, T-t_0].$$

If we choose this $P^0$ as initial distribution for $X_{t_0}$, then $X_t$ is a stationary process.



Furthermore, if there exists an invariant $P^0$, we have, for arbitrary initial distributions and $T = \infty$,
$$\lim_{t \to \infty} P[X_t \in B] = P^0(B)$$
for all $B \in \mathfrak{B}^d$ whose boundary has $P^0$-measure 0; that is, the invariant distribution is a stationary limit distribution and is in fact independent of the initial distribution. There are probabilistic and analytical conditions under which a homogeneous transition function $P(t, x, B)$ admits an invariant distribution (see Prohorov and Rozanov [15], p. 272, or Khas'minskiy [65], p. 99). Compare Theorem (8.2.12) and Remark (9.2.14).

2.3 Examples
By Theorem (2.2.7), an initial and a transition probability determine a Markov process. In the following examples, we shall assume these probabilities given.
(2.3.1) Example: Deterministic motion. Suppose that to every pair $(s, t)$, where $t_0 \le s \le t \le T$, is assigned a measurable mapping $G_{s,t}$ of $R^d$ into itself such that, for all $x \in R^d$,
$$G_{s,s}(x) = x$$
and

$$\text{(2.3.2)} \qquad G_{u,t}(G_{s,u}(x)) = G_{s,t}(x), \quad s \le u \le t.$$

These equations define in the state space $R^d$ a deterministic motion which shifts into $y = G_{s,t}(x)$, over a time interval of length $t-s$, a point that is at $x$ at time $s$. A special case is
$$G_{s,t}(x) = x + v\,(t-s),$$
which describes a uniform motion with velocity $v \in R^d$, or, more generally,
$$G_{s,t}(x) = x_t(s, x),$$
where $x_t$ is the solution of the differential equation
$$\dot{x}_t = f(t, x_t)$$
with initial condition $x_s = x$ (and $f$ is such that there exists a unique solution on the interval $[s, T]$). The corresponding transition probability is

$$P(s, x, t, B) = \delta_{G_{s,t}(x)}(B) = \begin{cases} 1 & \text{for } G_{s,t}(x) \in B, \\ 0 & \text{for } G_{s,t}(x) \notin B. \end{cases}$$

Property (2.2.1b) is obvious, (2.2.1c) follows from the measurability of the mapping $G_{s,t}$, (2.2.1e) follows from the property $G_{s,s}(x) = x$, and the Chapman-Kolmogorov equation (2.2.1d) follows since

$$\int_{R^d} \delta_{G_{u,t}(y)}(B)\,\delta_{G_{s,u}(x)}(dy) = \delta_{G_{u,t}(G_{s,u}(x))}(B) = \delta_{G_{s,t}(x)}(B).$$

A nontrivial stochastic effect can be achieved only by the choice of the initial probability.
(2.3.3) Example: Whether a process is Markovian depends essentially on the choice of the state space. Whereas the solution $x_t$ of the first-order differential equation
$$\dot{x}_t = f(t, x_t), \quad x_{t_0} = c, \quad t_0 \le t \le T,$$
is a Markov process (and this also holds for a differential equation of the form
$$\dot{x}_t = f(t, x_t, \xi_t),$$
where $\xi_t$ is a family of independent random variables which are also independent of $x_{t_0}$), this is not in general true for the solution of an $n$th-order differential equation
$$x_t^{(n)} = f(t, x_t, \dot{x}_t, \dots, x_t^{(n-1)}).$$
Nevertheless, the customary shift to a first-order differential equation for the $d\,n$-dimensional process
$$y_t = \begin{pmatrix} x_t \\ \dot{x}_t \\ \vdots \\ x_t^{(n-1)} \end{pmatrix}$$
shows that the Markov property holds for $y_t$.


(2.3.4) Example: Wiener process. The Wiener process is a homogeneous $d$-dimensional Markov process $W_t$ defined on $[0, \infty)$ with stationary transition probability

$$P(t, x, \cdot) = \begin{cases} \mathfrak{N}(x, t\,I), & t > 0, \\ \delta_x, & t = 0; \end{cases}$$

that is, for $t > 0$,

$$P(t, x, B) = P(W_{s+t} \in B \mid W_s = x) = \int_B (2\pi t)^{-d/2}\,e^{-|y-x|^2/2t}\,dy.$$

By virtue of the familiar formula

$$\int_{R^d} n(s, x, z)\,n(t, z, y)\,dz = n(s+t, x, y),$$

the Chapman-Kolmogorov equation holds for the densities

$$p(t, x, y) = n(t, x, y) = (2\pi t)^{-d/2}\,e^{-|y-x|^2/2t}.$$

As a rule, the initial probability $P_0$ is taken equal to $\delta_0$; that is, $W_0 = 0$. Since
$$n(t, x+z, y+z) = n(t, x, y) \quad \text{for all } z \in R^d,$$
we are dealing with a spacewise as well as a timewise homogeneous process. We shall examine it in greater detail in section 3.1. With the criterion (1.8.2), we shall be able later to show easily that $W_t$ can be chosen in such a way that it possesses continuous sample functions with probability 1. Henceforth, we shall assume that $W_t$ was so chosen. The process $W_t$ is a mathematical model of the Brownian motion of a free particle in the absence of friction.
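Sample functions of $W_t$ are easily simulated from the transition density: on a grid of step $\Delta t$, consecutive increments are independent $\mathfrak{N}(0, \Delta t)$ variables. A minimal NumPy sketch (our own illustration; seed and grid are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_steps, n_paths = 1.0, 1_000, 10_000
dt = T / n_steps
# Consecutive increments are independent N(0, dt); cumulating them with
# W_0 = 0 gives discretized sample paths of the one-dimensional process.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
print(W[:, -1].var())    # sample variance of W_1, close to t = 1
```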

2.4 The Infinitesimal Operator


Just as in example (2.3.1) we assigned to a differential equation $\dot{x}_t = f(t, x_t)$ the family $G_{s,t}$ of transformations of the state space, where $G_{s,t}(x) = x_t(s, x)$ is the value, at time $t$, of the solution trajectory that begins at $x$ at time $s$, we can also assign to a general Markov process $X_t$ a family of mappings, namely operators defined on a function space.
We begin with the discussion of the homogeneous case. Let $X_t$, for $t \in [t_0, T]$, denote a homogeneous Markov process with transition probability $P(t, x, B)$. We define the operator $T_t$ on the space $B(R^d)$ of bounded measurable scalar functions defined on $R^d$ and equipped with the norm
$$\|g\| = \sup_{x \in R^d} |g(x)|$$
as follows: for $t \in [0, T-t_0]$, let $T_t\,g$ denote the function defined by

$$\text{(2.4.1)} \qquad T_t\,g(x) = E_x\,g(X_t) = \int_{R^d} g(y)\,P(t, x, dy).$$

Obviously, $T_t\,g(x)$ is the mean value (independent of $s$) of $g(X_{s+t})$ under the condition $X_s = x$. Since
$$T_t\,I_B(x) = P(t, x, B),$$
we can recover the transition probability from the operators $T_t$. These operators have the following properties:
(2.4.2) Theorem. The operators $T_t$, for $t \in [0, T-t_0]$, map the space $B(R^d)$ into itself; they are linear, positive, and continuous, and they have norm $\|T_t\| = 1$. The operator $T_0$ is the identity operator, and
$$T_{t+s} = T_t\,T_s = T_s\,T_t, \quad t,\ s,\ t+s \in [0, T-t_0].$$
In particular, in the case $T = \infty$ the $T_t$ constitute a commutative one-parameter semigroup, the so-called semigroup of Markov transition operators.
(2.4.3) Examples. In the case of the deterministic motion generated by an autonomous ordinary differential equation $\dot{x}_t = f(x_t)$, we have
$$T_t\,g(x) = g(x_t(0, x)),$$
where $x_t(0, x)$ is the solution with the initial value $x_0 = x$. For the Wiener process $W_t$, we have

$$T_t\,g(x) = (2\pi t)^{-d/2} \int_{R^d} g(y)\,e^{-|y-x|^2/2t}\,dy = (2\pi)^{-d/2} \int_{R^d} e^{-|z|^2/2}\,g(x+\sqrt{t}\,z)\,dz, \quad t > 0.$$

The dynamics of a Markov process may be described by a single operator representing the derivative of the family $T_t$ at the point $t = 0$.
(2.4.4) Definition. The infinitesimal operator (generator) $A$ of a homogeneous Markov process $X_t$, for $t_0 \le t \le T$, is defined by
$$\text{(2.4.5)} \qquad A\,g(x) = \lim_{t \downarrow 0} \frac{T_t\,g(x) - g(x)}{t}, \quad g \in B(R^d),$$
where the limit is uniform with respect to $x$ (that is, the limit in $B(R^d)$). The domain of definition $D_A \subset B(R^d)$ consists of all functions for which the limit in (2.4.5) exists. The quantity $A\,g(x)$ is interpreted as the mean infinitesimal rate of change of $g(X_t)$ in case $X_t = x$.
The operator $A$ is in general an unbounded closed linear operator. If the transition probabilities of $X_t$ are stochastically continuous, that is, if, for every $x \in R^d$ and every $\varepsilon > 0$,
$$\lim_{t \downarrow 0} P(t, x, U_\varepsilon) = 1, \quad U_\varepsilon = \{y: |y-x| < \varepsilon\},$$
then $P(t, x, B)$ is uniquely defined by $A$. In particular, Markov processes with sample functions that are continuous from the right (or, in particular, continuous) have stochastically continuous transition probabilities.
(2.4.6) Example. For uniform motion with velocity $v \in R^d$, we have
$$T_t\,g(x) = g(x + v\,t)$$
and
$$A\,g(x) = \lim_{t \downarrow 0} \frac{g(x + v\,t) - g(x)}{t} = \sum_{i=1}^d v_i\,\frac{\partial g(x)}{\partial x_i}.$$
The existence and the required uniformity of the limit are guaranteed in the domain $D_A$ = set of all bounded uniformly continuous functions with bounded uniformly continuous first partial derivatives.
(2.4.7) Example. For the $d$-dimensional Wiener process $W_t$, we must calculate

$$A\,g(x) = (2\pi)^{-d/2} \lim_{t \downarrow 0} \frac{1}{t} \int_{R^d} e^{-|z|^2/2}\left(g(x+\sqrt{t}\,z) - g(x)\right) dz$$

in accordance with (2.4.3). For this we use Taylor's theorem, which, for every twice continuously partially differentiable function $g$, yields
$$g(x+\sqrt{t}\,z) - g(x) = \sqrt{t}\,\sum_{i=1}^d z_i\,g_{x_i}(x) + \frac{t}{2}\,\sum_{i=1}^d \sum_{j=1}^d z_i\,z_j\,g_{x_i x_j}(x) + \frac{t}{2}\,\sum_{i=1}^d \sum_{j=1}^d z_i\,z_j\left(g_{x_i x_j}(\bar{x}) - g_{x_i x_j}(x)\right),$$
where $\bar{x}$ is a point between $x$ and $x+\sqrt{t}\,z$. When we substitute this into the expression given above for $A\,g(x)$, we get

$$A\,g = \frac{1}{2} \sum_{i=1}^d \frac{\partial^2 g}{\partial x_i^2} = \frac{1}{2}\,\Delta g,$$

where $\Delta$ is the Laplacian operator. The existence and uniformity of the limit are ensured for all $g$ in $D_A$, which now is the set of bounded twice continuously partially differentiable functions with bounded and uniformly continuous second partial derivatives.
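The limit (2.4.5) with $A = \frac{1}{2}\Delta$ can be checked by Monte Carlo: approximate $T_t\,g(x) = E\,g(x + \sqrt{t}\,Z)$, $Z \sim \mathfrak{N}(0, I)$, for a small $t$ and compare the difference quotient with $\frac{1}{2}g''(x)$. A sketch for $d = 1$ (our own; the test function $g = \cos$, the point $x$, and $t$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
g = np.cos                             # test function g(x) = cos x, d = 1
x, t = 0.7, 1e-4
z = rng.normal(size=2_000_000)
Ttg = np.mean(g(x + np.sqrt(t) * z))   # Monte Carlo value of T_t g(x)
print((Ttg - g(x)) / t)                # difference quotient (up to MC error)...
print(-0.5 * np.cos(x))                # ...vs (1/2) g''(x) = -cos(x)/2
```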
Let us turn now to the nonhomogeneous case. Let $X_t$, for $t \in [t_0, T]$, denote an arbitrary Markov process with transition probability $P(s, x, t, B)$. We refer to remark (2.2.9), according to which $Y_t = (t, X_t)$ is a homogeneous Markov process with state space $[t_0, T] \times R^d \subset R^{d+1}$. We now define the Markov transition operators $T_t$ and the infinitesimal operator $A$ of $X_t$ as being equal to the same quantities for the corresponding homogeneous process $Y_t = (t, X_t)$ under the definitions given earlier.
By virtue of (2.2.10),

$$T_t\,g(s, x) = E_{s,x}\,g(s+t, X_{s+t}) = \int_{R^d} g(s+t, y)\,P(s, x, t+s, dy), \quad 0 \le t \le T-s,$$

where $g(s, x)$ is a bounded measurable function on $[t_0, T] \times R^d$, and

$$\text{(2.4.8)} \qquad A\,g(s, x) = \lim_{t \downarrow 0} \frac{T_t\,g(s, x) - g(s, x)}{t},$$

where the limit means the uniform limit in $(s, x) \in [t_0, T] \times R^d$. Once again, stochastically continuous transition probabilities $P(s, x, t, B)$ (in particular, the transition probabilities of Markov processes with sample functions that are continuous from the right) are uniquely defined by $A$.
Frequently, it is sufficient to determine the action of $A$ on only those functions $g$ that are independent of $s$, that is, to consider

$$\text{(2.4.9)} \qquad A\,g(s, x) = \lim_{t \downarrow 0} \frac{T_t\,g(s, x) - g(x)}{t}, \quad g \in B(R^d).$$

For homogeneous processes, this reduces to (2.4.5); but, in general, $T_t\,g$ is, like $A\,g$, a function of $s$ and $x$.

2.5 Diffusion Processes

Diffusion processes are special Markov processes with continuous sample functions, which serve as probability-theoretic models of physical diffusion phenomena. The simplest and oldest example is the motion of very small particles, such as grains of pollen, in a fluid: the so-called Brownian motion. The Wiener process $W_t$ of example (2.3.4) is a mathematical model of this timewise homogeneous phenomenon in a homogeneous medium (see section 12.1 and Wax [51]).
Besides the original significance of the diffusion process, there is another one, which is emphasized in this book, namely, the description of technical systems subject to "white noise". Also, continuous models for random-walk problems lead to diffusion processes.
Depending on the classification of methods (see the Introduction), there are two basically different approaches to the class of diffusion processes. On the one hand, one can define them in terms of conditions on the transition probabilities $P(s, x, t, B)$, which is what we shall do in the present section. On the other hand, one can study the state $X_t$ itself and its variation with respect to time. This leads to a stochastic differential equation for $X_t$. As we shall see in Chapter 9, the two approaches lead essentially to the same class of processes.
(2.5.1) Definition. A Markov process $X_t$, for $t_0 \le t \le T$, with values in $R^d$ and almost certainly continuous sample functions is called a diffusion process if its transition probability $P(s, x, t, B)$ satisfies the following three conditions for every $s \in [t_0, T)$, $x \in R^d$, and $\varepsilon > 0$:

a) $$\lim_{t \downarrow s} \frac{1}{t-s} \int_{|y-x| > \varepsilon} P(s, x, t, dy) = 0;$$

b) there exists an $R^d$-valued function $f(s, x)$ such that

$$\lim_{t \downarrow s} \frac{1}{t-s} \int_{|y-x| \le \varepsilon} (y-x)\,P(s, x, t, dy) = f(s, x);$$

c) there exists a $d \times d$ matrix-valued function $B(s, x)$ such that

$$\lim_{t \downarrow s} \frac{1}{t-s} \int_{|y-x| \le \varepsilon} (y-x)(y-x)'\,P(s, x, t, dy) = B(s, x).$$

The functions $f$ and $B$ are called the coefficients of the diffusion process. In particular, $f$ is called the drift vector and $B$ is called the diffusion matrix. $B(s, x)$ is symmetric and nonnegative-definite.
(2.5.2) Remark. In conditions b) and c) of definition (2.5.1), we had to use truncated moments, since $E_{s,x}\,X_t$ and $E_{s,x}\,X_t X_t'$ do not necessarily exist. Nonetheless, if, for some $\delta > 0$,

$$\lim_{t \downarrow s} \frac{1}{t-s}\,E_{s,x}\,|X_t - X_s|^{2+\delta} = \lim_{t \downarrow s} \frac{1}{t-s} \int_{R^d} |y-x|^{2+\delta}\,P(s, x, t, dy) = 0,$$

then, since

$$\int_{|y-x| > \varepsilon} |y-x|^k\,P(s, x, t, dy) \le \varepsilon^{k-2-\delta} \int_{R^d} |y-x|^{2+\delta}\,P(s, x, t, dy)$$

for $k = 0, 1, 2$, condition a) is automatically satisfied, and we can choose $R^d$ as the region of integration in conditions b) and c).
(2.5.3) Remark. Let us make clear just what conditions a), b), and c) in definition (2.5.1) mean. Condition a) means that large changes in $X_t$ over a short period of time are improbable:
$$P(|X_t - X_s| \le \varepsilon \mid X_s = x) = 1 - o(t-s).$$
Let us suppose that the truncated moments in b) and c) are replaced with the usual ones. Then, for the first two moments of the increment $X_t - X_s$ under the condition $X_s = x$, as $t \downarrow s$,

$$E_{s,x}(X_t - X_s) = f(s, x)\,(t-s) + o(t-s)$$

and

$$E_{s,x}(X_t - X_s)(X_t - X_s)' = B(s, x)\,(t-s) + o(t-s).$$

Therefore,
$$\mathrm{Cov}_{s,x}(X_t - X_s) = B(s, x)\,(t-s) + o(t-s),$$
where $\mathrm{Cov}_{s,x}(X_t - X_s)$ is the covariance matrix of $X_t - X_s$ with respect to the probability $P(s, x, t, \cdot)$. Therefore, $f(s, x)$ is the mean velocity vector of the random motion described by $X_t$ under the assumption $X_s = x$, whereas $B(s, x)$ is a measure of the local magnitude of the fluctuation of $X_t - X_s$ about the mean value. If we neglect the term $o(t-s)$, we can write
$$X_t - X_s \approx f(s, X_s)\,(t-s) + G(s, X_s)\,\xi,$$
where $E_{s,x}\,\xi = 0$, $\mathrm{Cov}_{s,x}\,\xi = (t-s)\,I$, and $G(s, x)$ is any $d \times d$ matrix with the property $G\,G' = B$. Now, the increments $W_t - W_s$ of the Wiener process have distribution $\mathfrak{N}(0, (t-s)\,I)$. Since we are now only concerned with the distributions, we can write
$$X_t - X_s \approx f(s, X_s)\,(t-s) + G(s, X_s)\,(W_t - W_s).$$
The shift (which is usual in analysis) to differentials yields
$$dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t.$$
This is a stochastic differential equation that, under a suitable definition of "solution", has as its solution the diffusion process $X_t$ that we started with.
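The increment recipe above is precisely the standard Euler-Maruyama discretization of the stochastic differential equation (the scheme itself is not discussed at this point of the book; the following sketch and its test case with drift $f = -x$, $G = 1$, anticipating the Ornstein-Uhlenbeck process of section 8.3, are our own):

```python
import numpy as np

def euler_maruyama(f, G, x0, t0, T, n_steps, rng):
    """Simulate dX_t = f(t, X_t) dt + G(t, X_t) dW_t (d = 1) via the recipe
    X_{t+h} - X_t ~ f(t, X_t) h + G(t, X_t) (W_{t+h} - W_t)."""
    h = (T - t0) / n_steps
    x, t = float(x0), t0
    for _ in range(n_steps):
        x += f(t, x) * h + G(t, x) * rng.normal(0.0, np.sqrt(h))
        t += h
    return x

rng = np.random.default_rng(4)
xs = [euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 0.0, 0.0, 1.0, 200, rng)
      for _ in range(2_000)]
print(np.var(xs))   # compare with the exact value (1 - e^{-2})/2 ~ 0.432
```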
(2.5.4) Examples. a) Uniform motion with velocity $v$ is a diffusion process with $f \equiv v$ and $B \equiv 0$.
b) The Wiener process $W_t$ is a diffusion process with drift vector $f \equiv 0$ and diffusion matrix $B \equiv I$.

2.6 Backward and Forward Equations


The decisive property of diffusion processes is that their transition probability $P(s, x, t, B)$ is, under certain regularity assumptions, uniquely determined merely by the drift vector and the diffusion matrix. This is surprising inasmuch as, on the basis of definition (2.5.1), $f$ and $B$ are obtained only from the first two moments of $P(s, x, t, B)$, which do not in general determine a distribution.
To each diffusion process with coefficients $f$ and $B = (b_{ij})$ is assigned the second-order differential operator

$$\text{(2.6.1)} \qquad \mathfrak{D} = \sum_{i=1}^d f_i(s, x)\,\frac{\partial}{\partial x_i} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d b_{ij}(s, x)\,\frac{\partial^2}{\partial x_i\,\partial x_j}.$$

$\mathfrak{D}\,g$ can be written formally for every twice partially differentiable function $g(x)$ and is determined by $f$ and $B$. In section 2.4, we saw that every diffusion process is uniquely determined by its infinitesimal operator $A$. We calculate this operator from

$$\text{(2.6.2)} \qquad A\,g(s, x) = \lim_{t \downarrow 0} \frac{1}{t} \int_{R^d} \left(g(s+t, y) - g(s, x)\right) P(s, x, t+s, dy)$$

by means of a Taylor expansion of $g(s+t, y)$ about $(s, x)$, under the assumption that $g$ is defined and bounded on $[t_0, T] \times R^d$ and is, on that set, twice continuously differentiable with respect to the $x_i$ and once continuously differentiable with respect to $s$. When we use conditions b) and c) of definition (2.5.1), we obtain for the right-hand member of (2.6.2) the operator $\partial/\partial s + \mathfrak{D}$. Under certain conditions on $f$ and $B$ (which we shall specify in section 9.4) we have, for all functions in $D_A$,
$$A = \frac{\partial}{\partial s} + \mathfrak{D}$$
or, for time-independent functions and the homogeneous case,
$$A = \mathfrak{D}.$$
Therefore, the diffusion process is uniquely determined in this case by $f$ and $B$. Furthermore, we see that the first derivatives in $\mathfrak{D}$ arise as a result of the systematic drift and the second derivatives as a result of the local irregular "chaotic" fluctuational motions.
In the next chapter, we shall take a purely probabilistic route to the construction of a diffusion process for a given operator. A purely analytical approach yields
(2.6.3) Theorem. Let $X_t$, for $t_0 \le t \le T$, denote a $d$-dimensional diffusion process with continuous coefficients $f(s, x)$ and $B(s, x)$ for which the limit relations in definition (2.5.1) hold uniformly in $s \in [t_0, T)$. Let $g(x)$ denote a continuous bounded scalar function such that

$$u(s, x) = E_{s,x}\,g(X_t) = \int_{R^d} g(y)\,P(s, x, t, dy),$$

for $s \le t$, where $t$ is fixed and $x \in R^d$, is continuous and bounded, as are its derivatives $\partial u/\partial x_i$ and $\partial^2 u/\partial x_i\,\partial x_j$ for $1 \le i, j \le d$. Then, $u(s, x)$ is differentiable with respect to $s$ and satisfies Kolmogorov's backward equation
$$\text{(2.6.4)} \qquad \frac{\partial u}{\partial s} + \mathfrak{D}\,u = 0,$$
where $\mathfrak{D}$ is the operator (2.6.1), with the end condition

$$u(t, x) = g(x).$$
A proof of this theorem can again be obtained by means of a Taylor expansion of $u$. For details, see Gikhman and Skorokhod [5], p. 373. The name "backward equation" stems from the fact that the differentiation is with respect to the backward time arguments $s$ and $x$, in contrast with the forward equation (see Theorem (2.6.9)), in which the transition density $p(s, x, t, y)$ is differentiated with respect to $t$ and $y$.
(2.6.5) Remark. Theoretically, the backward equation (2.6.4) enables us to determine the transition probability $P(s, x, t, \cdot)$. This transition probability is uniquely defined if we know all the integrals

$$u(s, x) = \int_{R^d} g(y)\,P(s, x, t, dy),$$

where $g$ ranges over a set of functions that is dense in the space $C(R^d)$ of continuous bounded functions. If the solution of (2.6.4) is unique for these functions $g$, we can, for known $f$ and $B$, calculate $u(s, x)$ from it and then calculate $P(s, x, t, \cdot)$.
(2.6.6) Theorem. Suppose that the assumptions of Theorem (2.6.3) regarding $X_t$ hold. If $P(s, x, t, \cdot)$ has a density $p(s, x, t, y)$ that is continuous with respect to $s$ and if the derivatives $\partial p/\partial x_i$ and $\partial^2 p/\partial x_i\,\partial x_j$ exist and are continuous with respect to $s$, then $p$ is a so-called fundamental solution of the backward equation

$$\frac{\partial p}{\partial s} + \mathfrak{D}\,p = 0;$$

that is, it satisfies the end condition
$$\lim_{s \uparrow t} p(s, x, t, y) = \delta(x - y),$$
where $\delta$ is Dirac's delta function.


(2.6.7) Example. The transition density of the Wiener process
$$p(s, x, t, y) = (2\pi (t-s))^{-d/2}\,e^{-|y-x|^2/2(t-s)}$$
is, for fixed $t$ and $y$, a fundamental solution of the backward equation

$$\frac{\partial p}{\partial s} + \frac{1}{2} \sum_{i=1}^d \frac{\partial^2 p}{\partial x_i^2} = 0.$$
(2.6.8) Remark. If $X_t$ is a homogeneous process, then the coefficients $f(s, x) = f(x)$ and $B(s, x) = B(x)$ (and hence the operator $\mathfrak{D}$) are independent of $s$. Since $P(s, x, t, B) = P(t-s, x, B)$, the sign of the time derivative changes in the backward equation, which reads, for example, for the density $p(t, x, y)$,

$$-\frac{\partial p}{\partial t} + \mathfrak{D}\,p = 0.$$
(2.6.9) Theorem. Let $X_t$, for $t_0 \le t \le T$, denote a $d$-dimensional diffusion process for which the limit relations in definition (2.5.1) hold uniformly in $s$ and $x$ and which possesses a transition density $p(s, x, t, y)$. If the derivatives $\partial p/\partial t$, $\partial(f_i(t, y)\,p)/\partial y_i$, and $\partial^2(b_{ij}(t, y)\,p)/\partial y_i\,\partial y_j$ exist and are continuous functions, then, for fixed $s$ and $x$ such that $s \le t$, this transition density $p(s, x, t, y)$ is a fundamental solution of Kolmogorov's forward or the Fokker-Planck equation

$$\text{(2.6.10)} \qquad \frac{\partial p}{\partial t} + \sum_{i=1}^d \frac{\partial}{\partial y_i}\left(f_i(t, y)\,p\right) - \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2}{\partial y_i\,\partial y_j}\left(b_{ij}(t, y)\,p\right) = 0.$$
For proof, we again refer to Gikhman and Skorokhod [5], p. 375.
If we fix the distribution of $X_{t_0}$ in terms of the initial probability $P_{t_0}$, we obtain from $p(s, x, t, y)$ the probability density $p(t, y)$ of $X_t$ itself:

$$p(t, y) = \int_{R^d} p(t_0, x, t, y)\,P_{t_0}(dx).$$

If we apply the integration with respect to $P_{t_0}(dx)$ to (2.6.10), we see that $p(t, y)$ also satisfies the forward equation.
(2.6.11) Example. For the Wiener process, the forward equation for the homogeneous transition density
$$p(t, x, y) = (2\pi t)^{-d/2}\,e^{-|y-x|^2/2t}$$
becomes

$$\frac{\partial p}{\partial t} = \frac{1}{2} \sum_{i=1}^d \frac{\partial^2 p}{\partial y_i^2},$$

which in this case is identical to the backward equation with $x$ replaced by $y$.
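Both examples (2.6.7) and (2.6.11) can be verified symbolically for $d = 1$: the density $n(t, x, y)$ satisfies $\partial p/\partial t = \frac{1}{2}\,\partial^2 p/\partial y^2$ and, by the symmetry in $x$ and $y$, also $\partial p/\partial t = \frac{1}{2}\,\partial^2 p/\partial x^2$. A SymPy sketch (our own check):

```python
import sympy as sp

t, x, y = sp.symbols('t x y', positive=True)
p = sp.exp(-(y - x)**2 / (2 * t)) / sp.sqrt(2 * sp.pi * t)   # n(t, x, y), d = 1
forward = sp.diff(p, t) - sp.Rational(1, 2) * sp.diff(p, y, 2)
backward = sp.diff(p, t) - sp.Rational(1, 2) * sp.diff(p, x, 2)
print(sp.simplify(forward), sp.simplify(backward))           # 0 0
```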
Chapter 3
Wiener Process and White Noise

3.1 Wiener Process

Let us now summarize the most important properties of the $d$-dimensional Wiener process $W_t$ defined in section 2.3. This process, a mathematical model of the Brownian motion of a free particle with friction neglected, is a spacewise and timewise homogeneous diffusion process with drift vector $f \equiv 0$ and diffusion matrix $B \equiv I$. A clear understanding of this process is especially important, since it proves to be the fundamental building block for all (smooth) diffusion processes. We have already seen this in the heuristic derivation of the stochastic differential equation in section 2.5.
Since $W_t$ is a Markov process, all the distributions of $W_t$ are defined, in accordance with (2.2.6), by the initial condition
$$W_0 = 0$$
and the stationary transition density
$$p(t, x, y) = n(t, x, y) = (2\pi t)^{-d/2}\,\exp(-|y-x|^2/2t).$$
From this we get the density $p(t, y)$ of $W_t$ itself:
$$p(t, y) = n(t, y) = n(t, 0, y) = (2\pi t)^{-d/2}\,\exp(-|y|^2/2t).$$
This is the density of the $d$-dimensional normal distribution $\mathfrak{N}(0, t\,I)$. The $n$-dimensional distribution $P[W_{t_1} \in B_1, \dots, W_{t_n} \in B_n]$, where $0 < t_1 < \dots < t_n$, then has, according to formula (2.2.6), the density
$$\text{(3.1.1)} \qquad n(t_1, 0, x_1)\,n(t_2-t_1, x_1, x_2) \cdots n(t_n-t_{n-1}, x_{n-1}, x_n).$$
Since $E\,|W_t - W_s|^4 = (d^2 + 2d)\,(t-s)^2$, it follows immediately from Kolmogorov's criterion (see section 1.8) that for these finite-dimensional distributions there exists a stochastic process with continuous sample functions. Since
$$n(t, x, y) = \prod_{i=1}^d (2\pi t)^{-1/2}\,\exp(-|y_i-x_i|^2/2t),$$
the $d$ components of $W_t$ are themselves independent one-dimensional Wiener processes.

If $\mathfrak{W}_t = \mathfrak{A}(W_s,\ s \le t)$, then, for $s < t$,

$$E(W_t \mid \mathfrak{W}_s) = E(W_t \mid W_s) = \int_{R^d} y\,n(t-s, W_s, y)\,dy = W_s;$$

that is, $W_t$ is a $d$-dimensional martingale.


Of basic importance is the following property: $W_t$ has independent increments; that is, for $0 \le t_1 < \dots < t_n$, the random variables
$$W_{t_1},\ W_{t_2} - W_{t_1},\ \dots,\ W_{t_n} - W_{t_{n-1}}$$
are independent. Here, $W_t - W_s$ (for $s < t$) has the distribution $\mathfrak{N}(0, (t-s)\,I)$, which depends only on $t-s$; that is, the increments are stationary. The last two assertions follow immediately from (3.1.1) when we remember that, in our case, the value of $n(s, x, t, y) = n(t-s, x, y)$ depends only on $y-x$ and $t-s$.
In fact, a Wiener process can be defined as a process with independent and stationary $\mathfrak{N}(0, (t-s)\,I)$-distributed increments $W_t - W_s$, with initial value $W_0 = 0$, and with almost certainly continuous sample functions.
The fact that $W_t$ is a process with independent and stationary increments makes it possible to apply the limit theorems for sums of independent identically distributed random variables. This provides valuable information regarding the order of magnitude of the sample functions of $W_t$. The strong law of large numbers states that

$$\lim_{t \to \infty} \frac{W_t}{t} = 0 \quad \text{with probability 1}.$$

The true order of magnitude of the sample functions follows from the law of the iterated logarithm: for $d = 1$ (that is, for each individual component of a $d$-dimensional Wiener process),
$$\limsup_{t \to \infty} \frac{W_t}{\sqrt{2t\,\log\log t}} = 1$$
and
$$\liminf_{t \to \infty} \frac{W_t}{\sqrt{2t\,\log\log t}} = -1,$$
both with probability 1. This means that, for every $\varepsilon > 0$ and for almost every sample function $W_t(\omega)$, there exists an instant $t_0(\omega)$ subsequent to which we always have
$$-(1+\varepsilon)\sqrt{2t\,\log\log t} < W_t(\omega) < (1+\varepsilon)\sqrt{2t\,\log\log t}.$$
On the other hand, the bounds $(1-\varepsilon)\sqrt{2t\,\log\log t}$ and $-(1-\varepsilon)\sqrt{2t\,\log\log t}$

(for $0 < \varepsilon < 1$) are exceeded in every neighborhood of $\infty$ for almost every sample function.
For a $d$-dimensional Wiener process, we have in general

$$\text{(3.1.2)} \qquad \limsup_{t \to \infty} \frac{|W_t|}{\sqrt{2t\,\log\log t}} = 1,$$

which is somewhat surprising in that it means that the individual (independent!) components of $W_t$ are not simultaneously of the order $\sqrt{2t\,\log\log t}$. This is true because, otherwise, $\sqrt{d}$ would appear in the right-hand member of (3.1.2). For a generalization, see Theorem (7.2.5).
Equation (3.1.2) becomes plausible by virtue of the invariance of $W_t$ under rotation. We have
(3.1.3) Lemma. a) $W_t$ is a Gaussian stochastic process with expectation $E\,W_t = 0$ and covariance matrix
$$E\,W_t\,W_s' = \min(t, s)\,I.$$
b) $W_t$ is invariant under rotations in $R^d$; that is, if $W_t$ is a Wiener process, so is $V_t = U\,W_t$, where $U$ is an orthogonal matrix.
c) If $W_t$ is a Wiener process, the processes $-W_t$, $c\,W_{t/c^2}$ (where $c \ne 0$), $t\,W_{1/t}$, and $W_{s+t} - W_s$ (where $s$ is fixed and $t \ge 0$) are also Wiener processes.
Having Lemma (3.1.3c) at our disposal, we are in a position to study also the local behavior of $W_t$.
The application of the law of the iterated logarithm to $t\,W_{1/t}$ yields, for $d = 1$ (and hence for every component of a $d$-dimensional Wiener process),
$$\limsup_{t \downarrow 0} \frac{W_t}{\sqrt{2t\,\log\log(1/t)}} = 1$$
and
$$\liminf_{t \downarrow 0} \frac{W_t}{\sqrt{2t\,\log\log(1/t)}} = -1,$$
and, for $d$-dimensional $W_t$,
$$\limsup_{t \downarrow 0} \frac{|W_t|}{\sqrt{2t\,\log\log(1/t)}} = 1,$$
for almost all sample functions. One consequence of this is that every component of almost every sample function of $W_t$ has, with probability 1, in every interval of the form $[0, \varepsilon)$ with $\varepsilon > 0$, infinitely many zeros, which cluster about the point $t = 0$. This behavior is exhibited at every point $s > 0$ because, by Lemma (3.1.3), part c), when $W_t$ is a Wiener process, $W_{s+t} - W_s$ (for fixed $s$ and nonnegative $t$) is also a Wiener process (independent, in fact, of $W_u$ for $u \le s$).
Almost all sample functions of a Wiener process are continuous though, in accordance with a theorem of N. Wiener, nowhere differentiable functions. Proof of this assertion and of most of the previously made assertions regarding $W_t$ can be found, for example, in McKean [45]. For fixed $t$, the nondifferentiability can be made clear as follows: the distribution of the difference quotient $(W_{t+h} - W_t)/h$ is $\mathfrak{N}(0, (1/|h|)\,I)$. As $h \to 0$, this normal distribution spreads out, so that, for every bounded measurable set $B$,
$$P[(W_{t+h} - W_t)/h \in B] \to 0.$$
Therefore, the difference quotient cannot converge with positive probability to a finite random variable.
We can get more precise information from the law of the iterated logarithm. For $d = 1$ (hence for every individual component of a many-dimensional process), we obtain, for almost every sample function and arbitrary $\varepsilon$ in the interval $0 < \varepsilon < 1$, as $h \downarrow 0$,
$$\frac{W_{t+h} - W_t}{h} \ge (1-\varepsilon)\sqrt{\frac{2\,\log\log(1/h)}{h}} \quad \text{infinitely often}$$
and, simultaneously,
$$\frac{W_{t+h} - W_t}{h} \le (-1+\varepsilon)\sqrt{\frac{2\,\log\log(1/h)}{h}} \quad \text{infinitely often}.$$
Since the right-hand members approach $+\infty$ and $-\infty$, respectively, as $h \downarrow 0$, the ratio $(W_{t+h} - W_t)/h$ has with probability 1, for every fixed $t$, the extended real line $[-\infty, +\infty]$ as its set of cluster points.
For the case of Brownian motion described by a Wiener process, the nondifferentiability means that the particle under observation does not possess a velocity at any instant. This disadvantage is offset in a model treated in section 8.3, namely, the Ornstein-Uhlenbeck process.
The local law of the iterated logarithm therefore reveals enormous local fluctuations in $W_t$. The crucial property, as regards the difficulties in the definition of a Stieltjes integral with respect to $W_t$, is that each portion of almost every sample function of $W_t$ is of unbounded variation in a finite interval of time; that is, its length is infinite. This is a consequence of the following more precise result:
(3.1.4) Lemma. Let $W_t$ denote a $d$-dimensional Wiener process and let $s = t_0^{(n)} < t_1^{(n)} < \dots < t_n^{(n)} = t$ denote a sequence of decompositions of the interval $[s, t]$ such that $\delta_n = \max_k (t_k^{(n)} - t_{k-1}^{(n)}) \to 0$. Then (writing $t_k$ for $t_k^{(n)}$ for short),

$$\text{(3.1.5)} \qquad \operatorname{qm-lim}_{\delta_n \to 0} \sum_{k=1}^n (W_{t_k} - W_{t_{k-1}})(W_{t_k} - W_{t_{k-1}})' = (t-s)\,I,$$

where elementwise convergence of the matrices is meant. In particular,

$$\text{(3.1.6)} \qquad \operatorname{qm-lim}_{\delta_n \to 0} \sum_{k=1}^n |W_{t_k} - W_{t_{k-1}}|^2 = d\,(t-s).$$

If $\delta_n$ approaches 0 so fast that $\sum_n \delta_n < \infty$, then convergence occurs in (3.1.5) and (3.1.6) also with probability 1.
Proof. Let $W_t^i$, for $i = 1, \dots, d$, denote the $i$th component of $W_t$. If

$$S_n^{ij} = \sum_{k=1}^n (W_{t_k}^i - W_{t_{k-1}}^i)(W_{t_k}^j - W_{t_{k-1}}^j),$$

then
$$E(S_n^{ij}) = (t-s)\,\delta_{ij}$$
and

$$V(S_n^{ij}) = \sum_{k=1}^n \left(E\,(W_{t_k}^i - W_{t_{k-1}}^i)^2 (W_{t_k}^j - W_{t_{k-1}}^j)^2 - \delta_{ij}\,(t_k - t_{k-1})^2\right) \le (1 + \delta_{ij}) \sum_{k=1}^n (t_k - t_{k-1})^2 \le 2\,(t-s)\,\delta_n \to 0 \quad (\delta_n \to 0),$$

which proves (3.1.5). If we apply the trace operator to both sides of (3.1.5), we obtain (3.1.6). If $\sum_n \delta_n < \infty$, then $\sum_n V(S_n^{ij}) < \infty$, which, by virtue of section 1.4, is sufficient for almost certain convergence in (3.1.5). ∎
Let us look at the decomposition with intermediate points $t_k^{(n)} = s + (t-s)\,k/2^n$, for $k = 0, 1, \dots, 2^n$ and $n = 1, 2, \dots$. Since $\delta_n = (t-s)\,2^{-n}$ and $\sum_n \delta_n < \infty$, the left-hand member of the inequality

$$\sum_{k=1}^{2^n} |W_{t_k} - W_{t_{k-1}}|^2 \le \max_{k=1,\dots,2^n} |W_{t_k} - W_{t_{k-1}}|\,\sum_{k=1}^{2^n} |W_{t_k} - W_{t_{k-1}}|$$

converges, for almost every sample function, to the finite limit $d\,(t-s)$ as $n \to \infty$. The almost certain continuity of the sample functions implies that
$$\max_{k=1,\dots,2^n} |W_{t_k} - W_{t_{k-1}}| \to 0 \quad (n \to \infty),$$
and this implies that
Fig. 4: Sample function of the Wiener process.

$$\sum_{k=1}^{2^n} |W_{t_k} - W_{t_{k-1}}| \to \infty$$

with probability 1; that is, almost all sample functions of $W_t$ are of unbounded variation in every finite interval.
(3.1.7) Remark. Equations (3.1.5) and (3.1.6) serve as motivation for the symbolic notation (frequently used, especially in the case $d = 1$)
$$(dW_t)(dW_t)' = I\,dt$$
and, for $d = 1$,
$$(dW_t)^2 = dt.$$
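Lemma (3.1.4) and the unbounded-variation argument are easy to observe numerically: along dyadic decompositions, the sum of squared increments stabilizes near $t-s$, while the sum of absolute increments grows without bound. A sketch (our own, for $d = 1$ and $[s, t] = [0, 1]$):

```python
import numpy as np

rng = np.random.default_rng(5)
t = 1.0
for n in (8, 12, 16):                    # dyadic decompositions with 2^n steps
    dW = rng.normal(0.0, np.sqrt(t / 2**n), size=2**n)  # increments on [0, t]
    print(n,
          np.sum(dW**2),                 # quadratic variation -> t = 1
          np.sum(np.abs(dW)))            # total variation -> infinity
```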

3.2 White Noise


We confine ourselves for the moment to the one-dimensional case. So-called (Gaussian) white noise $\xi_t$ is generally understood in the engineering literature as a stationary Gaussian process, for $-\infty < t < \infty$, with mean $E\,\xi_t = 0$ and a constant spectral density $f(\lambda)$ on the entire real axis. If $C(t) = E\,\xi_s\,\xi_{s+t}$ is the covariance function of $\xi_t$, then

$$\text{(3.2.1)} \qquad f(\lambda) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\lambda t}\,C(t)\,dt = \frac{c}{2\pi} \quad \text{for all } \lambda \in R^1,$$

where $c$ is a positive constant, which we can, without loss of generality, take equal to 1.
Therefore, such a process has a spectrum in which all frequencies participate with the same intensity, hence a "white" spectrum (in analogy with "white light" in optics, which contains all frequencies of visible light uniformly). However, such a process does not exist in the traditional sense, because (3.2.1) is compatible only with the choice
$$C(t) = \delta(t),$$
where $\delta$ is Dirac's delta function. In particular, we would have

$$C(0) = E\,\xi_t^2 = \int_{-\infty}^{\infty} f(\lambda)\,d\lambda = \infty.$$

Since $C(t) = 0$ for $t \ne 0$, the values of $\xi_t$ and $\xi_{t+s}$ would be uncorrelated for arbitrarily small values of $s$ (and independent, in fact, since the process is Gaussian), a fact that explains the name "purely random process". Obviously, the sample functions of a process with independent values at all instants must be extremely irregular.
In fact, white noise was first correctly described in connection with the theory of generalized functions (distributions), as is done, for example, in Gel'fand and Vilenkin [22], Chapter III. Let us discuss this matter briefly.
We start with the fact that, in every actual measurement of the values of a function $f(t)$, the inertia of the measuring instrument allows us to get only an average value

$$\text{(3.2.2)} \qquad \Phi_f(\varphi) = \int_{-\infty}^{\infty} \varphi(t)\,f(t)\,dt,$$

where $\varphi(t)$ is a function characterizing the measuring instrument. The functional $\Phi_f$ depends linearly and continuously on $\varphi$. It is the generalized function corresponding to $f$.
As a consequence of the smoothing effect of the measuring instrument, we obtain a value for the integral (3.2.2) even when the values of $f$ do not actually exist at individual points. This leads to the following general definition:
Let $K$ denote the space of all infinitely differentiable functions $\varphi(t)$, for $t \in R^1$, that vanish identically outside a finite interval (which in general depends on $\varphi(t)$). A sequence $\varphi_1(t), \varphi_2(t), \dots$ of such functions is said to converge to $\varphi(t) \equiv 0$ if all these functions vanish outside a single bounded region and if all of them and all of their derivatives (in the usual sense) converge uniformly to 0. Every continuous linear functional $\Phi$ defined on the space $K$ is called a generalized function (or a distribution*). The generalized function defined by
$$\Phi(\varphi) = \varphi(t_0) \quad \text{for all } \varphi \in K, \quad t_0 \in R^1 \text{ fixed},$$
is called the Dirac delta function and is denoted by $\delta(t - t_0)$. In contrast with classical functions, generalized functions always have derivatives of every order, which again are generalized functions. By the derivative $\dot{\Phi}$ of $\Phi$, we mean the generalized function defined by
$$\dot{\Phi}(\varphi) = -\Phi(\dot{\varphi}).$$

*Henceforth, we shall not use this term in this sense, so that whenever "distributions" are mentioned, they have the probability-theory meaning.

A generalized stochastic process is now simply a random generalized function in the following sense: to every $\varphi \in K$ is assigned a random variable $\Phi(\varphi)$ (in other words, $\Phi(\varphi)$ is an ordinary stochastic process with parameter set $K$) such that the following two conditions hold:
1. The functional $\Phi$ is linear on $K$ with probability 1; that is, for arbitrary $\varphi$ and $\psi$ in $K$ and arbitrary numbers $\alpha$ and $\beta$, we have
$$\Phi(\alpha\,\varphi + \beta\,\psi) = \alpha\,\Phi(\varphi) + \beta\,\Phi(\psi)$$
with probability 1;
2. $\Phi(\varphi)$ is continuous in the following sense: the convergence of the functions $\varphi_{kj}$ to $\varphi_k$ in the space $K$, for $k = 1, 2, \dots, n$, as $j \to \infty$, implies convergence of the distribution of the vector $(\Phi(\varphi_{1j}), \dots, \Phi(\varphi_{nj}))$ to the distribution of $(\Phi(\varphi_1), \dots, \Phi(\varphi_n))$ in the sense of distribution convergence defined in section 1.4.
For example, a generalized stochastic process corresponds to every ordinary stochastic process with continuous sample functions via formula (3.2.2).
A generalized stochastic process is said to be Gaussian if, for arbitrary linearly independent functions $\varphi_1, \dots, \varphi_n \in K$, the random variable $(\Phi(\varphi_1), \dots, \Phi(\varphi_n))$ is normally distributed. Just as in the classical case, a generalized Gaussian process is uniquely defined by the continuous linear mean-value functional
$$E\,\Phi(\varphi) = m(\varphi)$$
and the continuous bilinear positive-definite covariance functional
$$E\,(\Phi(\varphi) - m(\varphi))(\Phi(\psi) - m(\psi)) = C(\varphi, \psi).$$
One of the important advantages of a generalized stochastic process is the fact that its derivative always exists and is itself a generalized stochastic process. In fact, the derivative $\dot{\Phi}$ of $\Phi$ is the process defined by setting
$$\dot{\Phi}(\varphi) = -\Phi(\dot{\varphi}).$$
The derivative of a Gaussian process with mean $m(\varphi)$ and covariance $C(\varphi, \psi)$ is again a Gaussian process, and it has mean value $\dot{m}(\varphi) = -m(\dot{\varphi})$ and covariance $\dot{C}(\varphi, \psi) = C(\dot{\varphi}, \dot{\psi})$.
As an example, let us look at a Wiener process and its derivative. From the representation
$$\Phi(\varphi) = \int_{-\infty}^{\infty} \varphi(t)\,W_t\,dt$$
(we set $W_t = 0$ for $t < 0$), we conclude immediately that, with $W_t$ regarded as a generalized Gaussian stochastic process, we have
$$m(\varphi) = 0$$
and

$$C(\varphi, \psi) = \int_0^{\infty} \int_0^{\infty} \min(t, s)\,\varphi(t)\,\psi(s)\,dt\,ds.$$

After some elementary manipulations and integration by parts, we get

$$C(\varphi, \psi) = \int_0^{\infty} \tilde{\varphi}(t)\,\tilde{\psi}(t)\,dt,$$

where
$$\tilde{\varphi}(t) = \int_t^{\infty} \varphi(s)\,ds, \qquad \tilde{\psi}(t) = \int_t^{\infty} \psi(s)\,ds.$$

Let us now calculate the derivative of the Wiener process. This is a generalized Gaussian stochastic process with mean value $\dot{m}(\varphi) = 0$ and covariance
$$\dot{C}(\varphi, \psi) = C(\dot{\varphi}, \dot{\psi}) = \int_0^{\infty} \varphi(t)\,\psi(t)\,dt.$$
This formula can be put in the form
$$\dot{C}(\varphi, \psi) = \int_0^{\infty} \int_0^{\infty} \delta(t - s)\,\varphi(t)\,\psi(s)\,dt\,ds.$$
Therefore, the covariance function of the derivative of the Wiener process is the generalized function
$$C(s, t) = \delta(t - s).$$
But this is the covariance function of white noise! Thus, white noise $\xi_t$ is the derivative of the Wiener process $W_t$ when we consider both processes as generalized stochastic processes. This justifies the notation

$$\text{(3.2.3a)} \qquad \xi_t = \dot{W}_t,$$

frequently used in the engineering literature. Of course, we also have conversely

$$\text{(3.2.3b)} \qquad W_t = \int_0^t \xi_s\,ds$$

in the sense of coincidence of the covariance functionals.


By virtue of what we have said, we can now give the definition: a Gaussian white noise $\xi_t$, for $t \in R^1$, is a generalized Gaussian stochastic process $\Phi_\xi$ with mean value 0 and covariance functional

$$\text{(3.2.4)} \qquad C_\xi(\varphi, \psi) = \int_{-\infty}^{\infty} \varphi(t)\,\psi(t)\,dt.$$

From (3.2.4), we conclude that

$$C_\xi(\varphi(t), \psi(t)) = C_\xi(\varphi(t+h), \psi(t+h)), \quad h \in R^1,$$

a consequence of which is that, for arbitrary functions $\varphi_1, \dots, \varphi_n \in K$, the random variable $(\Phi_\xi(\varphi_1(t+h)), \dots, \Phi_\xi(\varphi_n(t+h)))$ has the same distribution for all $h$; that is, white noise is a stationary generalized process. One can show that, up to a factor, the spectral measure of this generalized process is Lebesgue measure, and hence the process has a constant spectral density on the entire real axis.
We also conclude from (3.2.4) that

$$C_\xi(\varphi, \psi) = 0 \quad \text{if} \quad \varphi(t)\,\psi(t) \equiv 0;$$

that is, the random variables $\Phi_\xi(\varphi)$ and $\Phi_\xi(\psi)$ are independent in this case. We say that a generalized stochastic process with this property has at every point independent values. The class of stationary generalized processes with independent values at every point is well known (see Gel'fand and Vilenkin [22]). Roughly speaking, we might say that they are obtained by differentiation of processes with stationary and independent increments. All these processes can serve as models of "noise", that is, of stationary and rapidly fluctuating phenomena. The "noise" is "white" (that is, the spectral density is constant) if the covariance functional has the form (3.2.4). Although in the present book we shall consider Gaussian white noise exclusively, there exist yet other important (non-Gaussian) white noise processes, for example, so-called Poisson white noise, which represents the derivative of a Poisson process (after subtraction of the mean value).
Thus, although a stationary Gaussian process $\xi_t$ with everywhere constant spectral density does not exist in the traditional sense, such a concept nonetheless proves to be a very useful mathematical idealization. Furthermore, we conclude from equations (3.2.3a) and (3.2.3b) that $\xi_t$ is, so to speak, only the derivative of a classical stochastic process and that, therefore, only the smoothing effect of a single integration is needed to return from $\xi_t$ to an ordinary process, namely, $W_t$. This last is also the reason for converting differential equations containing white noise into integral equations.
For the treatment of integrals of the form

$$\int_{-\infty}^{\infty} f(t)\, \xi_t \, dt,$$

where $\xi_t$ is now an arbitrary generalized stochastic process with independent values at every point, we refer to the paper by Dawson [34].

Because of the independence of the values at every point, white noise is appropriate for describing rapidly fluctuating random phenomena for which the correlation between the states at the instants $t$ and $s$ becomes small very rapidly as $|t-s|$ increases. For example, this is the case with the force acting on the particle observed in the case of Brownian motion or for the variation in current in an electrical circuit due to thermal noise.
White noise $\xi_t$ can be approximated by an ordinary stationary Gaussian process $X_t$, for example, one with covariance

$$C(t) = a\, e^{-b|t|} \qquad (a>0,\ b>0).$$

Such a process has spectral density

$$f(\lambda) = \frac{1}{\pi}\, \frac{a\,b}{b^2+\lambda^2}.$$

If we now let $a$ and $b$ approach $\infty$ in such a way that $a/b \to 1/2$, we get

$$f(\lambda) \to \frac{1}{2\pi} \quad \text{for all } \lambda \in R^1$$

and

$$C(t) \to \begin{cases} 0, & t \neq 0, \\ \infty, & t = 0, \end{cases}$$

but

$$\int_{-\infty}^{\infty} C(t) \, dt = \frac{2a}{b} \to 1,$$

so that

$$C(t) \to \delta(t);$$

that is, $X_t$ converges in a certain sense to $\xi_t$.
Let us now look at the indefinite integral

$$Y_t = \int_0^t X_s \, ds.$$

This is again a Gaussian process with $E\,Y_t = 0$ and covariance

$$E\,Y_t\,Y_s = \int_0^t\!\!\int_0^s a\, e^{-b|u-v|} \, du \, dv.$$

Taking the limit as above, we get

$$E\,Y_t\,Y_s \to \min(t,s),$$

that is, the covariance of the one-dimensional Wiener process $W_t$. This is a further heuristic justification of formulas (3.2.3).
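The following minimal numerical sketch (not from the original text; all parameter choices are illustrative) shows this heuristic at work: the stationary Ornstein-Uhlenbeck process has exactly the covariance $a\,e^{-b|t|}$, and as $b \to \infty$ with $a/b = 1/2$ the variance of its indefinite integral $Y_t$ approaches the Wiener value $t$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 2000, 4000
dt = T / n_steps

for b in [4.0, 40.0, 400.0]:
    a = b / 2.0                    # keep a/b = 1/2, so C(t) -> delta(t)
    sigma = np.sqrt(2.0 * a * b)   # stationary variance of the OU process is sigma^2/(2b) = a
    X = rng.normal(0.0, np.sqrt(a), n_paths)   # start in the stationary law
    Y = np.zeros(n_paths)
    for _ in range(n_steps):
        Y += X * dt                # Y_t = integral of X_s ds
        X += -b * X * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)
    print(f"b = {b:6.0f}:  Var Y_T = {Y.var():.3f}   (Wiener value: {T})")
```

The printed variances increase toward $T = 1$, in agreement with $E\,Y_T^2 = (2a/b)\,(T - (1 - e^{-bT})/b) \to T$.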
Now we can define $d$-dimensional (Gaussian) white noise as the derivative (in the generalized-function sense) of the $d$-dimensional Wiener process. It is a stationary Gaussian generalized process with independent values at every point, with expectation vector 0, and with covariance matrix $\delta(t)\,I$. In other words, white noise in $R^d$ is simply a combination of $d$ independent one-dimensional white noise processes. The spectral density (now a matrix!) of such a process is $I/2\pi$.
A $d$-dimensional Gaussian noise process $\eta_t$ with expectation 0 and with covariance matrix

$$E\,\eta_t\,\eta_s' = Q(t)\, \delta(t-s)$$

is treated by various authors (for example, Bucy and Joseph [61] and Jazwinski [66]). Such a process is no longer in general stationary but, as a "delta-correlated" process, it has independent values at every point. We shall see later (see remarks (5.2.4) and (5.4.8)) that we can confine ourselves to the standard case $Q(t) = I$ without loss of generality. We obtain $\eta_t$ from the standard noise process $\xi_t$ by

$$\eta_t = G(t)\, \xi_t,$$

where $G(t)$ is any $(d \times d)$-matrix-valued function such that $G(t)\,G(t)' = Q(t)$.
Chapter 4
Stochastic Integrals

4.1 Introduction
The analysis of stochastic dynamic systems often leads to differential equations of the form

(4.1.1)  $\dot X_t = f(t,X_t) + G(t,X_t)\, \xi_t,$

where we can assume that $\xi_t$ is white noise. Here, $X_t$ and $f$ are $R^d$-valued functions, $G(t,x) = (G_{ij}(t,x))$ is a $d \times m$ matrix, and $\xi_t$ is $m$-dimensional white noise. We saw in section 3.2 that, although $\xi_t$ is not a usual stochastic process, nonetheless the indefinite integral of $\xi_t$ can be identified with the $m$-dimensional Wiener process $W_t$:

$$\int_0^t \xi_s \, ds = W_t,$$

or, in shorter symbolic notation,

$$dW_t = \xi_t \, dt.$$
The solution of a deterministic initial-value problem

$$\dot x_t = f(t,x_t), \qquad x_{t_0} = c,$$

for a continuous function $f(t,x)$ is, as we know, equivalent to the solution of the integral equation

$$x_t = c + \int_{t_0}^t f(s,x_s) \, ds,$$

for which it is possible to find a solution curve by means of the classical iteration procedure.
In the same way, we transform equation (4.1.1) into an integral equation

(4.1.2)  $X_t = c + \int_{t_0}^t f(s,X_s) \, ds + \int_{t_0}^t G(s,X_s)\, \xi_s \, ds.$

Here, $c$ is an arbitrary random variable, which can also degenerate into a constant independent of chance. As a rule, the first integral in the right-hand member of equation (4.1.2) can be understood as the familiar Riemann integral. The second integral is more of a problem. Because of the smoothing effect of the integration, we still hope to be able to interpret integrals of this form for many functions $G(t,x)$ as ordinary random variables, which would spare us the necessity of using generalized stochastic processes. We now formally eliminate the white noise in (4.1.2) by means of the relationship $dW_s = \xi_s \, ds$, writing

(4.1.3)  $\int_{t_0}^t G(s,X_s)\, \xi_s \, ds = \int_{t_0}^t G(s,X_s) \, dW_s,$

so that (4.1.2) takes the form

(4.1.4)  $X_t = c + \int_{t_0}^t f(s,X_s) \, ds + \int_{t_0}^t G(s,X_s) \, dW_s.$

Equation (4.1.4) is also written more briefly in the following differential form:

(4.1.5)  $dX_t = f(t,X_t) \, dt + G(t,X_t) \, dW_t.$

Since, in accordance with section 3.1, almost all sample functions of $W_t$ are of unbounded variation, we cannot in general interpret the integral in the right-hand member of (4.1.4) as an ordinary Riemann-Stieltjes integral. We shall consider this with the example given in the following section.
For fixed (i.e., independent of $\omega$) continuously differentiable functions $g$, we could use the formula for integration by parts to give the following definition:

(4.1.6)  $\int_{t_0}^t g(s) \, dW_s = g(t)\,W_t - g(t_0)\,W_{t_0} - \int_{t_0}^t \dot g(s)\, W_s \, ds.$

The last integral is an ordinary Riemann integral, evaluated for the individual sample functions of $W_t$.
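As a quick sanity check of (4.1.6), the following sketch (not part of the original text; the integrand $g$ and all parameters are illustrative choices) compares the forward approximating sums with the right-hand member of (4.1.6) on a single simulated path.

```python
import numpy as np

rng = np.random.default_rng(1)
t0, t, n = 0.0, 1.0, 100_000
s = np.linspace(t0, t, n + 1)
dW = rng.normal(0.0, np.sqrt((t - t0) / n), n)
W = np.concatenate([[0.0], np.cumsum(dW)])     # one Wiener path with W_{t0} = 0

g, dg = np.cos, lambda x: -np.sin(x)           # a fixed smooth integrand and its derivative

lhs = np.sum(g(s[:-1]) * dW)                   # sum g(t_{i-1}) (W_{t_i} - W_{t_{i-1}})
rhs = g(t) * W[-1] - g(t0) * W[0] - np.sum(dg(s[:-1]) * W[:-1]) * ((t - t0) / n)
print(lhs, rhs)   # the two values agree up to discretization error
```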
In many cases of importance in practice, however, the function $G(t,x)$ in the integral equation (4.1.2) is not independent of $x$. For this general case, K. Ito [42] has given a definition of the integral (4.1.3) that, as we shall see, includes the definition (4.1.6) as a special case.

4.2 An Example
Our task is now to define the integral

$$X_t = X_t(\omega) = \int_{t_0}^t G(s) \, dW_s = \int_{t_0}^t G(s,\omega) \, dW_s(\omega) \qquad (0 \le t_0 \le t, \text{ fixed})$$

for as broad a class of $(d \times m$ matrix$)$-valued random functions $G$ as possible. Here, $W_t$ is an $m$-dimensional Wiener process.
If $G(\cdot,\omega)$ is, for almost all $\omega$, fairly smooth, it cannot, so to speak, penetrate the local irregularities of the sample functions $W_\cdot(\omega)$ and exhaust them. For example, for $G \equiv 1$ and $d = m = 1$, we set

$$\int_{t_0}^t 1 \, dW_s = W_t - W_{t_0},$$

since every approximation of the integral by means of Riemann-Stieltjes sums of the form

$$S_n = \sum_{i=1}^n G(\tau_i)\,(W_{t_i} - W_{t_{i-1}}), \qquad t_0 \le t_1 \le \cdots \le t_n = t, \quad t_{i-1} \le \tau_i \le t_i,$$

leads to this value.


The situation is different when $G$ is about as irregular as $W_t$ itself. To this end, let us look, as an example, at the case $G(t) = W_t$, $d = m = 1$, so that the integral is

$$X_t = \int_{t_0}^t W_s \, dW_s.$$

Formal application of the classical rules for integration by parts yields

(4.2.1)  $\int_{t_0}^t W_s \, dW_s = (W_t^2 - W_{t_0}^2)/2.$

However, this calculation assumes the existence of the integral as an ordinary Riemann-Stieltjes integral; that is, it assumes convergence of the sums

$$S_n = \sum_{i=1}^n W_{\tau_i}\,(W_{t_i} - W_{t_{i-1}})$$

with ever finer partitioning and arbitrary choice of the intermediate points $\tau_i$. Let us now show that the limit of $S_n$ depends on the choice of the intermediate points.
To do this, let us write the sum $S_n$ in the form

$$S_n = W_t^2/2 - W_{t_0}^2/2 - \frac{1}{2} \sum_{i=1}^n (W_{t_i} - W_{t_{i-1}})^2 + \sum_{i=1}^n (W_{\tau_i} - W_{t_{i-1}})^2 + \sum_{i=1}^n (W_{t_i} - W_{\tau_i})(W_{\tau_i} - W_{t_{i-1}}).$$

In accordance with Lemma (3.1.1), with $\delta_n = \max_i (t_i - t_{i-1})$,

$$\operatorname{qm-lim}_{\delta_n \to 0} \sum_{i=1}^n (W_{t_i} - W_{t_{i-1}})^2 = t - t_0.$$

By calculating the first two moments, we can easily show that

$$\operatorname{qm-lim}_{\delta_n \to 0} \sum_{i=1}^n (W_{t_i} - W_{\tau_i})(W_{\tau_i} - W_{t_{i-1}}) = 0.$$

For the remaining sum, we have

(4.2.2)  $E \sum_{i=1}^n (W_{\tau_i} - W_{t_{i-1}})^2 = \sum_{i=1}^n (\tau_i - t_{i-1})$

and

$$V\left( \sum_{i=1}^n (W_{\tau_i} - W_{t_{i-1}})^2 \right) = 2 \sum_{i=1}^n (\tau_i - t_{i-1})^2.$$

Therefore, the convergence of $\{S_n\}$ depends on the behavior of the sums (4.2.2), which can assume any value in the interval $[0, t - t_0]$ with appropriate choice of the $\tau_i$. More precisely,

(4.2.3)  $\operatorname{qm-lim}_{\delta_n \to 0} \left( S_n - \sum_{i=1}^n (\tau_i - t_{i-1}) \right) = (W_t^2 - W_{t_0}^2)/2 - (t - t_0)/2.$

Therefore, in order to obtain a unique definition of the integral, it is necessary to fix specific intermediate points $\tau_i$. Of course, this choice should be such that the corresponding integral is meaningful and has the desired properties. For example, if we choose

$$\tau_i = (1-a)\, t_{i-1} + a\, t_i, \qquad 0 \le a \le 1, \quad i = 1, 2, \ldots, n,$$

we obtain from (4.2.3)

(4.2.4)  $\operatorname{qm-lim}_{\delta_n \to 0} S_n = (W_t^2 - W_{t_0}^2)/2 + (a - 1/2)(t - t_0) = ((a)) \int_{t_0}^t W_s \, dW_s.$

In particular, if we choose $a = 0$, that is, $\tau_i = t_{i-1}$, we obtain the Ito, or stochastic, integral, with which we are concerned almost exclusively in the present book. From (4.2.4),

(4.2.5)  $(\text{Ito}) \int_{t_0}^t W_s \, dW_s = ((0)) \int_{t_0}^t W_s \, dW_s = (W_t^2 - W_{t_0}^2)/2 - (t - t_0)/2.$

Among the integrals of the type (4.2.4), Ito's integral is characterized by the fact that, as a function of the upper limit, it is a martingale. This can be seen as follows: If, for simplicity, we take $t_0 = 0$ and

$$X_t = W_t^2/2 + (a - 1/2)\,t,$$

then, for $t \ge s$, we have, with probability 1,

$$\begin{aligned}
E(X_t \mid X_u,\ u \le s) &= E(W_t^2 \mid W_u^2/2 + (a-1/2)u,\ u \le s)/2 + (a-1/2)\,t \\
&= E\big( E(W_t^2 \mid W_u,\ u \le s) \mid W_u^2/2 + (a-1/2)u,\ u \le s \big)/2 + (a-1/2)\,t \\
&= E\big( E(W_t^2 \mid W_s) \mid W_u^2/2 + (a-1/2)u,\ u \le s \big)/2 + (a-1/2)\,t \\
&= E\big( t - s + W_s^2 \mid W_u^2/2 + (a-1/2)u,\ u \le s \big)/2 + (a-1/2)\,t \\
&= (t-s)/2 + (a-1/2)\,t + W_s^2/2 \\
&= X_s + a\,(t-s),
\end{aligned}$$

where we have used various simple properties of the conditional expectation and of a Wiener process. The process $X_t$ is therefore a martingale, that is,

$$E(X_t \mid X_u,\ u \le s) = X_s \quad \text{with probability 1},$$

if and only if $a = 0$, hence for Ito's choice of intermediate points. Similarly, we have $E\,X_t = a\,t \equiv 0$ if and only if $a = 0$.
It is disconcerting that (4.2.5) does not coincide with the value obtained in (4.2.1) by formal application of the classical rules. This disadvantage could be removed by the choice $a = 1/2$, but this would entail other and more serious disadvantages. The applicability of the rules of the classical Riemann-Stieltjes calculus was also the motivation for the definition of a stochastic integral given by R. L. Stratonovich [48]. In our special case,

$$(\text{Strat}) \int_{t_0}^t W_s \, dW_s = \operatorname{qm-lim}_{\delta_n \to 0} \sum_{i=1}^n \frac{W_{t_{i-1}} + W_{t_i}}{2}\,(W_{t_i} - W_{t_{i-1}}) = ((1/2)) \int_{t_0}^t W_s \, dW_s = (W_t^2 - W_{t_0}^2)/2.$$
We shall return to this in Chapter 10 and shall give a conversion formula.

4.3 Nonanticipating Functions

In the example considered in the preceding section, to every decomposition $t_0 < t_1 < \cdots < t_n = t$ of the interval $[t_0, t]$ and every choice of intermediate points $\tau_i \in [t_{i-1}, t_i]$ there corresponds a step function

$$W_s^{(n)} = \sum_{i=1}^n W_{\tau_i}\, 1_{(t_{i-1},\,t_i]}(s), \qquad t_0 \le s < t, \quad W_t^{(n)} = W_t,$$

approximating the integrand $W_s$. If $\delta_n = \max_i (t_i - t_{i-1}) \to 0$, then, by virtue of the continuity of $W_t$,

$$\operatorname{ac-lim}_{n \to \infty} W_s^{(n)} = W_s$$

uniformly in $[t_0, t]$, regardless of the choice of the intermediate points $\tau_i$.
Although we define the integral of the approximating step function $W_s^{(n)}$ with respect to $W_s$ as the corresponding Riemann-Stieltjes sum

$$\int_{t_0}^t W_s^{(n)} \, dW_s = S_n = \sum_{i=1}^n W_{\tau_i}\,(W_{t_i} - W_{t_{i-1}}),$$

the existence and value of the limit of the $S_n$ depend on the intermediate points $\tau_i$.

A particular property of Ito's choice of intermediate points, namely, $\tau_i = t_{i-1}$, is that the value of the corresponding approximating step function $W_s^{(n)}$ can, at every fixed instant $s \in [t_0, t]$, be obtained from knowledge of the values of $W_u$ from $t_0$ up to the instant $s$. (Actually, knowledge of $W_{t_{i-1}}$ is all we need to be able to determine $W_s^{(n)}$ for $s \in (t_{i-1}, t_i]$.) In other words, $W_s^{(n)}$ is $\mathfrak{W}[t_0, s]$-measurable, where

(4.3.1)  $\mathfrak{W}[t_0, s] = \mathfrak{A}(W_u;\ t_0 \le u \le s).$

We say that $W_s^{(n)}$ is a nonanticipating function of $W_t$. Ito's stochastic integral [42] has been defined for a broad class of these nonanticipating functions.
Therefore, let us look at this concept.
Let $W_t$ denote an $m$-dimensional Wiener process defined on the probability space $(\Omega, \mathfrak{A}, P)$. Suppose that $t_0$ is a fixed nonnegative number, that $\mathfrak{W}[t_0, t]$ is the sigma-algebra (4.3.1), and that

$$\mathfrak{W}_t^+ = \mathfrak{A}(W_s - W_t;\ t \le s < \infty).$$

We recall the intuitive meaning of these two sub-sigma-algebras of $\mathfrak{A}$. Roughly speaking, $\mathfrak{W}[t_0, t]$, for example, contains all those events that are defined by conditions on the course of the process $W_s$ in the interval $[t_0, t]$ (and nowhere else).
Since $W_t$ has independent increments, $\mathfrak{W}[t_0, t]$ and $\mathfrak{W}_t^+$ are independent.

(4.3.2) Definition. Let $t_0$ denote a fixed nonnegative number. A family $\mathfrak{F}_t$, for $t \ge t_0$, of sub-sigma-algebras of $\mathfrak{A}$ is said to be nonanticipating with respect to the $m$-dimensional Wiener process $W_t$ if it has the following three properties:
(a) $\mathfrak{F}_s \subset \mathfrak{F}_t$ ($t_0 \le s \le t$),
(b) $\mathfrak{F}_t \supset \mathfrak{W}[t_0, t]$ ($t \ge t_0$),
(c) $\mathfrak{F}_t$ is independent of $\mathfrak{W}_t^+$ ($t \ge t_0$).
Since $\mathfrak{W}_0^+ = \mathfrak{W}[0, \infty)$ (apart from sets of measure 0), condition (c) means, for example, for $t = 0$, that $\mathfrak{F}_0$ can contain only events that are independent of the entire Wiener process $W_t$ for $t \ge 0$.
(4.3.3) Example. The family

$$\mathfrak{F}_t = \mathfrak{W}[t_0, t]$$

is the smallest possible nonanticipating family of sigma-algebras. However, it is often necessary and desirable to augment $\mathfrak{W}[t_0, t]$ with other events that are independent of $\mathfrak{W}_t^+$ (for example, initial conditions). In the case of stochastic differential equations, we usually take

$$\mathfrak{F}_t = \mathfrak{A}(\mathfrak{W}[t_0, t],\ c),$$

where $c$ is a random variable independent of $\mathfrak{W}_{t_0}^+$.
(4.3.4) Definition. A $(d \times m$ matrix$)$-valued function $G = G(s,\omega)$ defined on $[t_0, t] \times \Omega$ and measurable in $(s,\omega)$ is said to be nonanticipating (with respect to a family $\mathfrak{F}_s$ of nonanticipating sigma-algebras) if $G(s,\cdot)$ is $\mathfrak{F}_s$-measurable for all $s \in [t_0, t]$. We denote by $M_2^{d,m}[t_0, t] = M_2[t_0, t]$ the set of those nonanticipating functions defined on $[t_0, t] \times \Omega$ for which the sample functions $G(\cdot,\omega)$ are with probability 1 in $L_2[t_0, t]$, that is, with probability 1

$$\int_{t_0}^t |G(s,\omega)|^2 \, ds < \infty.$$

Here, the last integral is to be interpreted as the Lebesgue integral (which, for example, coincides with the Riemann integral in the case of continuous functions). We denote by

$$|G| = \Big( \sum_{i,j} G_{ij}^2 \Big)^{1/2} = (\operatorname{tr} G\,G')^{1/2}$$

the norm of the matrix $G$. We have $G \in M_2^{d,m}[t_0, t]$ if and only if $G_{ij} \in M_2^{1,1}[t_0, t]$ for all $i$ and $j$. Furthermore,

$$M_2[t_0, s] \supset M_2[t_0, t], \qquad t_0 \le s \le t.$$
We set

$$M_2 = M_2^{d,m} = \bigcap_{t > t_0} M_2[t_0, t].$$
(4.3.5) Example. Every function $G(t,\omega) \equiv G(t)$ that is independent of $\omega$ is of course always nonanticipating. Such a function belongs to $M_2[t_0, t]$ if and only if it belongs to $L_2[t_0, t]$.
(4.3.6) Example. Since $\mathfrak{F}_s \supset \mathfrak{W}[t_0, s]$, the function $G$ is nonanticipating for every choice of $\mathfrak{F}_s$ if $G(s,\cdot)$ is $\mathfrak{W}[t_0, s]$-measurable, that is,

$$G(s,\omega) = G(s;\ W_r(\omega),\ t_0 \le r \le s), \qquad t_0 \le s \le t.$$

In this case, $G(s,\cdot)$ is thus a functional of the sample functions of $W_r$ in the interval $[t_0, s]$. An example of this is the case $G(s,\cdot) = W_s$ discussed in section 4.2. A more complicated example is

$$G(s,\cdot) = \max_{t_0 \le u \le s} W_u.$$

(4.3.7) Example. For $\mathfrak{F}_s = \mathfrak{W}[t_0, s]$, the function

$$G(s,\cdot) = \max_{t_0 \le u \le 2s} W_u$$

is not $\mathfrak{F}_s$-measurable; that is, $G$ is anticipating.
(4.3.8) Remark. If $G$ is nonanticipating, every measurable function $g(t, G)$ is nonanticipating. Furthermore, $M_2[t_0, t]$ is a linear space.

4.4 Definition of the Stochastic Integral


The purpose of the present section is to define the stochastic integral

$$\int_{t_0}^t G \, dW = \int_{t_0}^t G(s) \, dW_s = \int_{t_0}^t G(s,\omega) \, dW_s(\omega)$$

for arbitrary $t \ge t_0$ and all $G \in M_2^{d,m}[t_0, t] = M_2[t_0, t]$. We shall do this in two steps. In the first step, we define the integral for step functions in $M_2[t_0, t]$. In the second step, we extend this definition to the entire set $M_2[t_0, t]$ by means of an approximation of an arbitrary function with the aid of step functions.
Step 1. A function $G \in M_2[t_0, t]$ is called a step function if there exists a decomposition $t_0 < t_1 < \cdots < t_n = t$ such that $G(s) = G(t_{i-1})$ (note that we omit the variable $\omega$) for all $s \in [t_{i-1}, t_i)$, where $i = 1, \ldots, n$. For such step functions, we define the stochastic integral of $G$ with respect to $W_t$ as the $R^d$-valued random variable
[Fig. 5: A nonanticipating step function.]

(4.4.1)  $\int_{t_0}^t G \, dW = \int_{t_0}^t G(s) \, dW_s = \sum_{i=1}^n G(t_{i-1})\,(W_{t_i} - W_{t_{i-1}}).$
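Definition (4.4.1) is directly computable. The sketch below (not from the original text; all parameter and integrand choices are illustrative) evaluates the sum for $d = m = 1$ with the nonanticipating step function obtained by freezing $W$ at the break points, and checks by Monte Carlo that the result has mean 0 and the second moment predicted by (4.4.4) below.

```python
import numpy as np

rng = np.random.default_rng(3)
t0, t, n, n_paths = 0.0, 1.0, 50, 100_000
grid = np.linspace(t0, t, n + 1)
dW = rng.normal(0.0, np.sqrt((t - t0) / n), (n_paths, n))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

G = W[:, :-1]                  # G(s) = W_{t_{i-1}} on [t_{i-1}, t_i): a nonanticipating step function
I = np.sum(G * dW, axis=1)     # the sum in (4.4.1), one value per path
print("E I            =", I.mean())                                 # ~ 0
print("E |I|^2        =", (I**2).mean())                            # ~ int E|G|^2 ds
print("int E|G|^2 ds  =", np.sum((G**2).mean(axis=0) * np.diff(grid)))
```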
We summarize some of the more important properties of this integral in
(4.4.2) Theorem. Let $t$ denote a number equal to or greater than $t_0$, let $a$ and $b$ denote members of $R^1$, and let $G_1$ and $G_2$ denote step functions belonging to $M_2[t_0, t]$. Then, for the stochastic integral (4.4.1), we have:

a)  $\int_{t_0}^t (a\,G_1 + b\,G_2) \, dW = a \int_{t_0}^t G_1 \, dW + b \int_{t_0}^t G_2 \, dW.$

b)  The components of the integral are given by

$$\Big( \int_{t_0}^t G \, dW \Big)_i = \sum_{k=1}^m \int_{t_0}^t G_{ik}(s) \, dW_s^k, \qquad i = 1, \ldots, d.$$

c)  If $E\,|G(s)| < \infty$ for all $s$ in $[t_0, t]$, we have

$$E\Big( \int_{t_0}^t G \, dW \Big) = 0.$$

d)  If $E\,|G(s)|^2 < \infty$ for all $s \in [t_0, t]$, the following holds for the $d \times d$ covariance matrix of the stochastic integral (4.4.1):

(4.4.3)  $E\Big( \int_{t_0}^t G \, dW \Big) \Big( \int_{t_0}^t G \, dW \Big)' = \int_{t_0}^t E\,G(s)\,G(s)' \, ds,$

and, in particular,
(4.4.4)  $E\left| \int_{t_0}^t G \, dW \right|^2 = \int_{t_0}^t E\,|G(s)|^2 \, ds.$

Proof. a) A linear combination of step functions is a step function with break points the same as the break points of the original functions (or possibly a portion of them). The asserted linearity of the integral follows immediately from the definition (4.4.1).
b) This also follows immediately from the definition.
c) From (4.4.1), we have

$$E\Big( \int_{t_0}^t G \, dW \Big) = \sum_{i=1}^n E\,G(t_{i-1})\, E(W_{t_i} - W_{t_{i-1}}),$$

since $G(t_{i-1})$ is independent of $W_{t_i} - W_{t_{i-1}}$. Since $E(W_{t_i} - W_{t_{i-1}}) = 0$, the assertion follows.
d) From (4.4.1), we have

$$\begin{aligned}
E\Big( \int_{t_0}^t G \, dW \Big) \Big( \int_{t_0}^t G \, dW \Big)' &= \sum_{i=1}^n \sum_{j=1}^n E\big[ G(t_{i-1})\,(W_{t_i} - W_{t_{i-1}})\,(W_{t_j} - W_{t_{j-1}})'\,G(t_{j-1})' \big] \\
&= \sum_{i=1}^n E\big[ G(t_{i-1})\,(W_{t_i} - W_{t_{i-1}})\,(W_{t_i} - W_{t_{i-1}})'\,G(t_{i-1})' \big] \\
&\quad + 2 \sum_{i<j} E\big[ G(t_{i-1})\,(W_{t_i} - W_{t_{i-1}})\,(W_{t_j} - W_{t_{j-1}})'\,G(t_{j-1})' \big] \\
&= S_1 + S_2.
\end{aligned}$$
In particular, the matrix element $c_{kk}^{(i)}$ of the $i$th summand of $S_1$ is

$$c_{kk}^{(i)} = \sum_{p=1}^m \sum_{q=1}^m E\big( G_{kp}(t_{i-1})\,G_{kq}(t_{i-1}) \big)\, E\big( (W_{t_i}^p - W_{t_{i-1}}^p)(W_{t_i}^q - W_{t_{i-1}}^q) \big),$$

where we have used the fact that $G$ is nonanticipating. Now, $W_t - W_s$ has the distribution $\mathfrak{N}(0, |t - s|\,I)$, so that

$$c_{kk}^{(i)} = \sum_{p=1}^m E\big( G_{kp}(t_{i-1})^2 \big)\,(t_i - t_{i-1}) = E\,\big( G(t_{i-1})\,G(t_{i-1})' \big)_{kk}\,(t_i - t_{i-1}).$$

Therefore,
Therefore,
$$S_1 = \sum_{i=1}^n E\,\big( G(t_{i-1})\,G(t_{i-1})' \big)\,(t_i - t_{i-1}) = \int_{t_0}^t E\,G(s)\,G(s)' \, ds.$$

We now treat $S_2$ in the same way and again use the fact that the terms $G_{kp}(t_{i-1})\,(W_{t_i}^p - W_{t_{i-1}}^p)\,G_{kq}(t_{j-1})$ and $W_{t_j}^q - W_{t_{j-1}}^q$ are, for $i < j$, independent. This yields

$$S_2 = 0,$$

the desired outcome. Since $\operatorname{tr}(G\,G') = |G|^2$, equation (4.4.4) follows from (4.4.3) by applying the trace operator. ∎
One should note that we do not assume that E G (s) = 0 in Theorem (4.4.2c).
Rather, the stochastic integral (4.4.1) has expectation 0 in every case. Also, the
form of the covariance matrix (4.4.3) is amazingly simple.
Step 2: Definition of the stochastic integral for arbitrary functions in $M_2[t_0, t]$.
Let us show first that the set of step functions is dense in $M_2[t_0, t]$ in the sense of the following lemma:
(4.4.5) Lemma. For every function $G \in M_2[t_0, t]$, there exists a sequence of step functions $G_n \in M_2[t_0, t]$ such that

$$\operatorname{ac-lim}_{n \to \infty} \int_{t_0}^t |G(s) - G_n(s)|^2 \, ds = 0.$$

Proof. If $G(\cdot,\omega)$ is continuous with probability 1, we can approximate that function uniformly, and hence in the sense of mean square, with the nonanticipating step functions

$$G_n(s) = G\big(t_0 + k\,(t - t_0)/n\big), \qquad t_0 + k\,(t - t_0)/n \le s < t_0 + (k+1)\,(t - t_0)/n, \quad 0 \le k < n.$$

If $G$ is bounded by a constant $c$ independent of $s$ and $\omega$, there exists a sequence $\{G_n\}$ of continuous functions in $M_2[t_0, t]$, for example

$$G_n(s) = n \int_{t_0}^s e^{n(u-s)}\, G(u) \, du,$$

that is uniformly bounded by the constant $c$ and converges, for almost all $s \in [t_0, t]$, to $G$, all this with probability 1. If we apply the theorem of dominated convergence (see section 1.3), it then follows with probability 1 that $G_n$ converges to $G$ in the sense of $L_2[t_0, t]$.
Finally, if $G$ is any function in $M_2[t_0, t]$, we can, for example, by shifting to $G(t,\omega)\,1_{\{|G| \le c\}}$, approximate it to an arbitrary degree of accuracy in the
sense of $L_2[t_0, t]$ with a function in $M_2[t_0, t]$ whose sample functions are bounded by a constant $c$. ∎
Thus, the set of step functions is dense in the sense of convergence in $M_2[t_0, t]$.
If the assertion of Lemma (4.4.5) holds for a function $G \in M_2[t_0, t]$ and a sequence of step functions, the weaker assertion

$$\operatorname{st-lim}_{n \to \infty} \int_{t_0}^t |G(s) - G_n(s)|^2 \, ds = 0$$

of course follows (see section 1.4). We now wish to show that this last assertion implies stochastic convergence of the sequence of integrals

$$\int_{t_0}^t G_n(s) \, dW_s$$

to a specific random variable. For this we shall use the following estimate for the stochastic integral of step functions.
stochastic integral of step functions.
(4.4.6) Lemma. Suppose that $G \in M_2[t_0, t]$ is a step function. Then, for all $N > 0$ and $c > 0$,

$$P\left[ \left| \int_{t_0}^t G(s) \, dW_s \right| > c \right] \le N/c^2 + P\left[ \int_{t_0}^t |G(s)|^2 \, ds > N \right].$$
Proof. Suppose that $G(s) = G(t_{i-1})$ for $t_{i-1} \le s < t_i$, where $t_0 < t_1 < \cdots < t_n = t$. The function

$$G_N(s) = \begin{cases} G(s), & \displaystyle\int_{t_0}^{t_i} |G(u)|^2 \, du \le N, \quad t_{i-1} \le s < t_i, \\[2mm] 0, & \displaystyle\int_{t_0}^{t_i} |G(u)|^2 \, du > N, \quad t_{i-1} \le s < t_i, \end{cases}$$

is a nonanticipating step function and hence belongs to $M_2[t_0, t]$, since $\int_{t_0}^{t_i} |G(s)|^2 \, ds$ is $\mathfrak{F}_{t_{i-1}}$-measurable.
Since

$$\int_{t_0}^t |G_N(s)|^2 \, ds = \sum_{i=1}^n |G_N(t_{i-1})|^2\,(t_i - t_{i-1}) \le N,$$
we have $|G_N(t_{i-1})|^2 \le N/(t_i - t_{i-1})$, so that

$$E\,|G_N(s)|^2 < \infty.$$

From Theorem (4.4.2d), we therefore have

(4.4.7)  $E\left| \int_{t_0}^t G_N(s) \, dW_s \right|^2 = \int_{t_0}^t E\,|G_N(s)|^2 \, ds \le N.$

Finally, $G_N \ne G$ if and only if

$$\int_{t_0}^t |G(s)|^2 \, ds > N,$$

so that

(4.4.8)  $P\left[ \sup_{t_0 \le s \le t} |G_N(s) - G(s)| > 0 \right] = P\left[ \int_{t_0}^t |G(s)|^2 \, ds > N \right].$

If we use equations (4.4.7) and (4.4.8), Theorem (4.4.2a), and the triangle and Chebyshev inequalities, we obtain

$$\begin{aligned}
P\left[ \left| \int_{t_0}^t G(s) \, dW_s \right| > c \right] &\le P\left[ \left| \int_{t_0}^t G_N(s) \, dW_s \right| > c \right] + P\left[ \left| \int_{t_0}^t \big(G(s) - G_N(s)\big) \, dW_s \right| > 0 \right] \\
&\le E\left| \int_{t_0}^t G_N(s) \, dW_s \right|^2 / c^2 + P\left[ \int_{t_0}^t |G(s)|^2 \, ds > N \right] \\
&\le N/c^2 + P\left[ \int_{t_0}^t |G(s)|^2 \, ds > N \right]. \qquad ∎
\end{aligned}$$

(4.4.9) Lemma. Suppose that $G \in M_2[t_0, t]$ and that $G_n \in M_2[t_0, t]$ is a sequence of step functions for which

(4.4.10)  $\operatorname{st-lim}_{n \to \infty} \int_{t_0}^t |G(s) - G_n(s)|^2 \, ds = 0.$

If we define

$$\int_{t_0}^t G_n(s) \, dW_s$$
by equation (4.4.1), then

$$\operatorname{st-lim}_{n \to \infty} \int_{t_0}^t G_n(s) \, dW_s = I(G),$$

where $I(G)$ is a random variable that does not depend on the special choice of the sequence $\{G_n\}$.
Proof. Since

$$\int_{t_0}^t |G_n - G_m|^2 \, ds \le 2 \int_{t_0}^t |G - G_n|^2 \, ds + 2 \int_{t_0}^t |G - G_m|^2 \, ds,$$

it follows from the assumption that

$$\operatorname{st-lim} \int_{t_0}^t |G_n(s) - G_m(s)|^2 \, ds = 0$$

as $n, m \to \infty$. This is the same as saying

$$\lim_{n,m \to \infty} P\left[ \int_{t_0}^t |G_n(s) - G_m(s)|^2 \, ds > \varepsilon \right] = 0 \quad \text{for all } \varepsilon > 0.$$

Therefore, application of Lemma (4.4.6) to $G_n - G_m$ yields

$$\limsup_{n,m \to \infty} P\left[ \left| \int_{t_0}^t G_n(s) \, dW_s - \int_{t_0}^t G_m(s) \, dW_s \right| > \delta \right] \le \varepsilon/\delta^2 + \limsup_{n,m \to \infty} P\left[ \int_{t_0}^t |G_n(s) - G_m(s)|^2 \, ds > \varepsilon \right] = \varepsilon/\delta^2.$$

Since $\varepsilon$ is an arbitrary positive number, we have

$$\lim_{n,m \to \infty} P\left[ \left| \int_{t_0}^t G_n(s) \, dW_s - \int_{t_0}^t G_m(s) \, dW_s \right| > \delta \right] = 0.$$

Since every stochastic Cauchy sequence also converges stochastically, there exists a random variable $I(G)$ such that

$$\int_{t_0}^t G_n(s) \, dW_s \to I(G) \quad \text{(stochastically)}.$$

The limit is almost certainly uniquely determined and independent of the special choice of the sequence $\{G_n\}$ for which (4.4.10) holds. This is true because, if $\{G_n\}$ and $\{\bar G_n\}$ are two such sequences, we can combine them into a single sequence, from which the almost certain coincidence of the corresponding limits follows.
The following definition follows from Lemma (4.4.9).
(4.4.11) Definition. For every $(d \times m$ matrix$)$-valued function $G \in M_2[t_0, t]$, the stochastic integral (or Ito's integral) of $G$ with respect to the $m$-dimensional Wiener process $W_t$ over the interval $[t_0, t]$ is defined as the random variable $I(G)$, which is almost certainly uniquely determined in accordance with Lemma (4.4.9):

$$\int_{t_0}^t G \, dW = \int_{t_0}^t G(s) \, dW_s = \operatorname{st-lim}_{n \to \infty} \int_{t_0}^t G_n \, dW,$$

where $\{G_n\}$ is a sequence of step functions in $M_2[t_0, t]$ that approximates $G$ in the sense of

$$\operatorname{st-lim}_{n \to \infty} \int_{t_0}^t |G(s) - G_n(s)|^2 \, ds = 0.$$

For special functions in $M_2[t_0, t]$, we can give a stronger than merely stochastic approximation of the stochastic integral. Specifically, we can approximate it in mean square, as indicated by the following lemma:
(4.4.12) Lemma. For every function $G \in M_2[t_0, t]$ such that

(4.4.13)  $\int_{t_0}^t E\,|G(s)|^2 \, ds < \infty,$

there exists a sequence $\{G_n\}$ of step functions in $M_2[t_0, t]$ with the same property, so that

$$\lim_{n \to \infty} \int_{t_0}^t E\,|G_n(s) - G(s)|^2 \, ds = 0$$

and

$$\operatorname{qm-lim}_{n \to \infty} \int_{t_0}^t G_n(s) \, dW_s = \int_{t_0}^t G(s) \, dW_s.$$

Proof. According to Lemma (4.4.9), there always exists a sequence $\{\bar G_n\}$ of step functions such that

$$\operatorname{st-lim}_{n \to \infty} \int_{t_0}^t |G - \bar G_n|^2 \, ds = 0.$$

Let

$$g_N(x) = \begin{cases} x, & |x| \le N, \\ N\,x/|x|, & |x| > N. \end{cases}$$

Since $|g_N(x) - g_N(y)| \le |x - y|$, it follows that

$$\int_{t_0}^t |g_N(\bar G_n(s)) - g_N(G(s))|^2 \, ds \le \int_{t_0}^t |G(s) - \bar G_n(s)|^2 \, ds \to 0 \quad \text{(stochastically)}.$$

Since

$$\int_{t_0}^t |g_N(G(s)) - g_N(\bar G_n(s))|^2 \, ds \le 4\,N^2\,t,$$

it follows from the theorem on dominated convergence (with respect to the variable $\omega$) that

$$\int_{t_0}^t E\,|g_N(G) - g_N(\bar G_n)|^2 \, ds \to 0$$

as $n \to \infty$. It follows from the same theorem (now applied to the variable $(s,\omega) \in [t_0, t] \times \Omega$) that

$$\int_{t_0}^t E\,|g_N(G(s)) - G(s)|^2 \, ds \to 0$$

as $N \to \infty$ (by virtue of the inequality $|g_N(G(s)) - G(s)|^2 \le |G(s)|^2$ and the assumption (4.4.13)). Therefore, there exist sequences $\{N_k\}$ and $\{n_k\}$ such that

$$\int_{t_0}^t E\,|G(s) - g_{N_k}(\bar G_{n_k}(s))|^2 \, ds \le 2 \int_{t_0}^t E\,|G(s) - g_{N_k}(G(s))|^2 \, ds + 2 \int_{t_0}^t E\,|g_{N_k}(G(s)) - g_{N_k}(\bar G_{n_k}(s))|^2 \, ds \to 0.$$

Accordingly, we can choose

$$G_k(s) = g_{N_k}(\bar G_{n_k}(s))$$
so as to get a sequence of step functions such that

$$\lim_{k \to \infty} \int_{t_0}^t E\,|G_k(s) - G(s)|^2 \, ds = 0.$$

For this sequence, we have

$$\operatorname{st-lim}_{k \to \infty} \int_{t_0}^t G_k \, dW = \int_{t_0}^t G \, dW.$$

However, by virtue of Theorem (4.4.2d) and what we have just proven, we see that

$$E\left| \int_{t_0}^t G_k \, dW - \int_{t_0}^t G_p \, dW \right|^2 = \int_{t_0}^t E\,|G_k - G_p|^2 \, ds \to 0$$

as $k, p \to \infty$, and hence

$$\operatorname{qm-lim}_{k \to \infty} \int_{t_0}^t G_k \, dW = \int_{t_0}^t G \, dW. \qquad ∎$$
In extending the concept of an integral from step functions to arbitrary functions in $M_2[t_0, t]$, the most important properties are naturally carried over also. We summarize these in
(4.4.14) Theorem. Let $G$, $G_1$, $G_2$, and $G_n$ denote $(d \times m$ matrix$)$-valued functions in $M_2[t_0, t]$ and let $W_t$ denote an $m$-dimensional Wiener process. Then, the stochastic integral defined by (4.4.11) has the following properties:

a)  $\int_{t_0}^t (a\,G_1 + b\,G_2) \, dW = a \int_{t_0}^t G_1 \, dW + b \int_{t_0}^t G_2 \, dW, \qquad a, b \in R^1.$

b)  $\Big( \int_{t_0}^t G \, dW \Big)_i = \sum_{k=1}^m \int_{t_0}^t G_{ik} \, dW^k, \qquad i = 1, \ldots, d.$

c)  For $N > 0$ and $c > 0$,

$$P\left[ \left| \int_{t_0}^t G \, dW \right| > c \right] \le N/c^2 + P\left[ \int_{t_0}^t |G|^2 \, ds > N \right].$$
d)  The relationship

$$\operatorname{st-lim}_{n \to \infty} \int_{t_0}^t |G(s) - G_n(s)|^2 \, ds = 0$$

implies

$$\operatorname{st-lim}_{n \to \infty} \int_{t_0}^t G_n \, dW = \int_{t_0}^t G \, dW$$

(where the $G_n$ need not be step functions).
e)  If

$$\int_{t_0}^t E\,|G(s)|^2 \, ds < \infty,$$

then, for the expectation vector of the stochastic integral and its covariance matrix, we always have, respectively,

$$E\Big( \int_{t_0}^t G \, dW \Big) = 0$$

and

$$E\Big( \int_{t_0}^t G \, dW \Big) \Big( \int_{t_0}^t G \, dW \Big)' = \int_{t_0}^t E\,G\,G' \, ds;$$

hence, in particular,

$$E\left| \int_{t_0}^t G \, dW \right|^2 = \int_{t_0}^t E\,|G|^2 \, ds.$$

Proof. Parts a) and b) follow immediately from parts a) and b) of Theorem (4.4.2) by taking the limit. Similarly, part c) follows from Lemma (4.4.6). Part d) can now be obtained from part c) just as in the proof of Lemma (4.4.9).
For the proof of part e), we use the fact that, in accordance with Lemma (4.4.12), there always exists a sequence of step functions $\{G_n\}$ such that

$$\lim_{n \to \infty} \int_{t_0}^t E\,|G_n - G|^2 \, ds = 0$$

and

$$\operatorname{qm-lim}_{n \to \infty} \int_{t_0}^t G_n \, dW = \int_{t_0}^t G \, dW.$$

Now, using Theorem (4.4.2), parts c) and d), we get from the last equation

$$E\Big( \int_{t_0}^t G \, dW \Big) = 0$$

and

$$E\Big( \int_{t_0}^t G_n \, dW \Big) \Big( \int_{t_0}^t G_n \, dW \Big)' = \int_{t_0}^t E\,G_n\,G_n' \, ds \to \int_{t_0}^t E\,G\,G' \, ds = E\Big( \int_{t_0}^t G \, dW \Big) \Big( \int_{t_0}^t G \, dW \Big)'. \qquad ∎$$

Actual evaluation of the stochastic integrals is another matter. Of course, this problem exists even with ordinary integrals. It is always possible to use the definition, which is of a constructive nature, to obtain an arbitrarily close approximation of any stochastic integral. Ito's theorem, which will be discussed in section 5.3, is an important tool for explicit evaluation of many stochastic integrals.

4.5 Examples and Remarks


The integral

(4.5.1)  $X_t = \int_{t_0}^t W_s \, dW_s,$

which we discussed in section 4.2, can now be evaluated without difficulty, for example, in accordance with
(4.5.2) Corollary. Suppose that $G \in M_2[t_0, t]$ is continuous with probability 1. Then,

$$\int_{t_0}^t G \, dW = \operatorname{st-lim}_{\delta_n \to 0} \sum_{k=1}^n G(t_{k-1})\,(W_{t_k} - W_{t_{k-1}}).$$
Proof. We have

$$\int_{t_0}^t G_n \, dW = \sum_{k=1}^n G(t_{k-1})\,(W_{t_k} - W_{t_{k-1}})$$

for the nonanticipating step functions

(4.5.3)  $G_n(s) = \sum_{k=1}^n G(t_{k-1})\, 1_{[t_{k-1},\,t_k)}(s).$

Therefore, in accordance with Lemma (4.4.9), what we need to show is that

$$\operatorname{st-lim}_{\delta_n \to 0} \int_{t_0}^t |G(s) - G_n(s)|^2 \, ds = 0.$$

This is true by virtue of the continuity of $G(\cdot,\omega)$ with probability 1. ∎


Corollary (4.5.2) tells us that, for almost certainly continuous $G$, the simplest nonanticipating step functions of the form (4.5.3), namely, those obtained from the function itself, can be used for approximation of the stochastic integral. This corollary remains valid if we replace almost certain continuity with stochastic continuity.
If we apply this corollary to (4.5.1), we obtain

$$\int_{t_0}^t W \, dW = \operatorname{st-lim}_{\delta_n \to 0} \sum_{k=1}^n W_{t_{k-1}}\,(W_{t_k} - W_{t_{k-1}}) = (W_t^2 - W_{t_0}^2)/2 - (t - t_0)/2,$$

which we can also get from (4.2.4), for example.
For an $m$-dimensional Wiener process, we similarly obtain

$$\int_{t_0}^t W_s' \, dW_s = (|W_t|^2 - |W_{t_0}|^2)/2 - m\,(t - t_0)/2.$$
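The $m$-dimensional identity above can be checked on a single simulated path with the sums of Corollary (4.5.2). The following sketch is not from the original text; $m$ and the grid size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
m, t, n = 3, 1.0, 200_000
dW = rng.normal(0.0, np.sqrt(t / n), (n, m))
W = np.vstack([np.zeros(m), np.cumsum(dW, axis=0)])   # W_0 = 0, t0 = 0

lhs = np.sum(W[:-1] * dW)              # sum_k sum_i W^k_{t_{i-1}} (W^k_{t_i} - W^k_{t_{i-1}})
rhs = np.sum(W[-1]**2) / 2 - m * t / 2
print(lhs, rhs)                        # agree up to discretization error
```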

(4.5.4) Corollary. If

$$\int_{t_0}^t |G|^2 \, ds = 0 \quad \text{with probability 1},$$

where $G \in M_2[t_0, t]$, then

$$\int_{t_0}^t G \, dW = 0 \quad \text{with probability 1}.$$

Proof. The sequence of step functions $G_n \equiv 0$ approximates $G$ and yields the result. ∎
This last corollary tells us, for example, that the values of a function $G$ can be changed for every $s$ in a fixed set of Lebesgue measure 0 in $[t_0, t]$ without changing the value of its stochastic integral. In particular, this is true for the values of the function at a finite or countable set of points $s$.
(4.5.5) Corollary. If

$$\int_{t_0}^t E\,|G(s)|^2 \, ds < \infty,$$

where $G \in M_2[t_0, t]$, then, for every positive $c$,

$$P\left[ \left| \int_{t_0}^t G \, dW \right| > c \right] \le \int_{t_0}^t E\,|G(s)|^2 \, ds \,/\, c^2.$$

This is simply Chebyshev's inequality. It holds in particular for every function $G \in L_2[t_0, t]$ that is independent of $\omega$. However, in this case we can determine exactly the distribution of the stochastic integral.
(4.5.6) Corollary. If $G$ is independent of $\omega$ and belongs to $L_2[t_0, t]$, it belongs to $M_2[t_0, t]$ for any sub-sigma-algebras $\mathfrak{F}_s \supset \mathfrak{W}[t_0, s]$, and the stochastic integral

$$\int_{t_0}^t G \, dW$$

is a normally distributed $d$-dimensional random variable with distribution

$$\mathfrak{N}\Big( 0,\ \int_{t_0}^t G(s)\,G(s)' \, ds \Big).$$

Proof. If $G$ is independent of $\omega$, we can find a sequence of step functions $G_n$ that are also independent of $\omega$ such that

(4.5.7)  $\int_{t_0}^t |G - G_n|^2 \, ds \to 0$

and (in accordance with (4.4.4))

(4.5.8)  $\operatorname{qm-lim}_{n \to \infty} \int_{t_0}^t G_n \, dW = \int_{t_0}^t G \, dW.$

Now, the random variable

$$\int_{t_0}^t G_n \, dW = \sum_{i=1}^n G_n(t_{i-1})\,(W_{t_i} - W_{t_{i-1}})$$

certainly has distribution $\mathfrak{N}\Big( 0, \int_{t_0}^t G_n\,G_n' \, ds \Big)$. Since, by virtue of (4.5.7),

$$\int_{t_0}^t G_n\,G_n' \, ds \to \int_{t_0}^t G\,G' \, ds,$$

it follows that the limit in (4.5.8) is also normally distributed with the given first and second moments. ∎
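A Monte Carlo sketch of Corollary (4.5.6) for $d = m = 1$ (not from the original text; the integrand $G(s) = s$ and all parameters are illustrative choices): the integral over $[0,1]$ should be distributed $\mathfrak{N}(0, \int_0^1 s^2\,ds) = \mathfrak{N}(0, 1/3)$.

```python
import numpy as np

rng = np.random.default_rng(5)
t, n, n_paths = 1.0, 200, 50_000
s = np.linspace(0.0, t, n + 1)[:-1]          # the break points t_{i-1}
dW = rng.normal(0.0, np.sqrt(t / n), (n_paths, n))
I = dW @ s                                   # sum G(t_{i-1}) (W_{t_i} - W_{t_{i-1}}) per path

print("mean:", I.mean(), "  variance:", I.var(), "  (theory: 0 and 1/3)")
print("P[I <= 0.5]:", (I <= 0.5).mean())     # compare with Phi(0.5 / sqrt(1/3)) ~ 0.807
```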
(4.5.9) Remark. Even when the step functions $G_n$ are such that

$$\operatorname{ac-lim}_{n \to \infty} \int_{t_0}^t |G(s) - G_n(s)|^2 \, ds = 0,$$

this does not in general imply almost certain convergence of the integrals

$$\int_{t_0}^t G_n \, dW \to \int_{t_0}^t G \, dW.$$

Let us now show that formula (4.1.6), which we have used as a definition of a stochastic integral for smooth functions $G$ that are independent of $\omega$, can be proven for considerably more general Ito stochastic integrals. In other words, Ito's integral and the stochastic integral (4.1.6) are consistent and coincide when the latter is defined. Somewhat more generally, we have
(4.5.10) Corollary. Suppose that $G \in M_2^{d,m}[t_0, t]$ and that the variation of $G(\cdot,\omega)$ on $[t_0, t]$ is almost certainly bounded. Then,

(4.5.11)  $\int_{t_0}^t G(s) \, dW_s = G(t)\,W_t - G(t_0)\,W_{t_0} - \int_{t_0}^t (dG)\,W_s,$

where the last integral is the usual Riemann-Stieltjes integral. If, in fact, $G(\cdot,\omega)$ is almost certainly continuously differentiable on $[t_0, t]$ (or, more generally, absolutely continuous) with derivative $\dot G$, then

(4.5.12)  $\int_{t_0}^t G(s) \, dW_s = G(t)\,W_t - G(t_0)\,W_{t_0} - \int_{t_0}^t \dot G(s)\,W_s \, ds.$

Proof. By virtue of the continuity of $W_s$, both integrals in (4.5.11) exist under our assumptions as ordinary Riemann-Stieltjes integrals, and (4.5.11) is the usual rule for integration by parts, which, for continuously differentiable $G$, takes the form (4.5.12). Of course, the stochastic integral of $G$ with respect to $W_s$ coincides with the Riemann-Stieltjes integral if the latter exists. ∎
We note that $G$ in Corollary (4.5.10) is not necessarily independent of $\omega$.
Chapter 5
The Stochastic Integral as a
Stochastic Process, Stochastic
Differentials
5.1 The Stochastic Integral as a Function of the Upper Limit
Let $W_t$ again denote an $m$-dimensional Wiener process. Let $t_0$ denote a fixed nonnegative number. Let $\{\mathfrak{F}_t;\ t \ge t_0\}$ denote a family of nonanticipating sigma-algebras. Let $M_2[t_0, T]$ denote the set of nonanticipating $(d \times m$ matrix$)$-valued functions (see (4.3.4)) for which we have defined the $R^d$-valued stochastic integral

$$\int_{t_0}^T G \, dW = \int_{t_0}^T G(s,\omega) \, dW_s(\omega).$$

Suppose that $G$ belongs to $M_2[t_0, T]$, that $A \subset [t_0, T]$ is a Borel set, and that $1_A$ is its indicator function. Then,

$$G\,1_A \in M_2[t_0, T].$$

We therefore define

$$\int_A G \, dW = \int_{t_0}^T G\,1_A \, dW.$$

In accordance with Theorem (4.4.14a), for any two disjoint sets $A, B \subset [t_0, T]$, we have

$$\int_{A \cup B} G \, dW = \int_A G \, dW + \int_B G \, dW.$$

In particular, for $t_0 \le a \le b \le c \le T$ (by virtue of Corollary (4.5.4), finitely many points do not change the situation),

$$\int_a^c G \, dW = \int_a^b G \, dW + \int_b^c G \, dW.$$

In particular, for every $G$ in $M_2[t_0, T]$,

$$X_t = \int_{t_0}^t G(s) \, dW_s = \int_{t_0}^T G(s)\,1_{[t_0,\,t]} \, dW$$

is an $R^d$-valued stochastic process defined (uniquely up to stochastic equivalence) for all $t \in [t_0, T]$ such that

$$X_{t_0} = 0 \quad \text{almost certainly}.$$

We have

$$X_t - X_s = \int_s^t G(u) \, dW_u, \qquad t_0 \le s \le t \le T.$$

Now if $G$ belongs to $M_2$, that is, to $M_2[t_0, t]$ for all $t \ge t_0$, then $X_t$ is defined for all $t \ge t_0$. Then, all the assertions made in this chapter regarding $X_t$ are valid (if the corresponding assumptions are satisfied) without any upper bound on the time, that is, for arbitrarily large intervals $[t_0, T]$.
We now wish to investigate the process $X_t$ for fixed $G \in M_2[t_0, T]$. For this, we shall always assume that we have chosen a separable version of $X_t$ (see section 1.8), which is always possible.
(5.1.1) Theorem. Let $G$ denote a function in $M_2[t_0, T]$ and suppose that

$$X_t = \int_{t_0}^t G(s) \, dW_s, \qquad t_0 \le t \le T.$$

Then, the following hold:
a) $X_t$ is $\mathfrak{F}_t$-measurable (and hence nonanticipating).
b) If

(5.1.2)  $\int_{t_0}^t E\,|G(s)|^2 \, ds < \infty \quad \text{for all } t \le T,$

then $(X_t, \mathfrak{F}_t)$, for $t \in [t_0, T]$, is an $R^d$-valued martingale; that is, for $t_0 \le s \le t \le T$,

$$E(X_t \mid \mathfrak{F}_s) = X_s.$$

Furthermore, for $t, s \in [t_0, T]$ we have

$$E\,X_t = 0,$$

(5.1.3)  $E\,X_t\,X_s' = \int_{t_0}^{\min(t,s)} E\,G(u)\,G(u)' \, du;$

in particular,

(5.1.3a)  $E\,|X_t|^2 = \int_{t_0}^t E\,|G(u)|^2 \, du$

and, for all $c > 0$ and $t_0 \le a \le b \le T$,

(5.1.4)  $P\left[ \sup_{a \le t \le b} |X_t - X_a| > c \right] \le \int_a^b E\,|G(s)|^2 \, ds \,/\, c^2$

and

(5.1.5)  $E\Big( \sup_{a \le t \le b} |X_t - X_a|^2 \Big) \le 4 \int_a^b E\,|G(s)|^2 \, ds.$

c) $X_t$ has continuous sample functions with probability 1.
d) If, for some natural number $k$,

$$\int_a^t E\,|G(s)|^{2k} \, ds < \infty, \qquad t_0 \le a \le t \le T,$$

then

(5.1.6)  $E\,|X_t - X_a|^{2k} \le \big( k\,(2k-1) \big)^k\, (t - a)^{k-1} \int_a^t E\,|G(s)|^{2k} \, ds.$

Proof. a) The $\mathfrak{F}_t$-measurability of $X_t$ follows from the definition

$$X_t^{(n)} = \int_{t_0}^t G_n \, dW \to \int_{t_0}^t G \, dW = X_t \quad \text{(stochastically)}$$

and the obvious $\mathfrak{F}_t$-measurability of the integrals $X_t^{(n)}$ of the step functions $G_n$.
b) $\mathfrak{F}_t$ is an increasing family of sub-sigma-algebras of $\mathfrak{A}$. By part a) of the present theorem, $X_t$ is $\mathfrak{F}_t$-measurable. By Theorem (4.4.14e), $X_t$ possesses, under the assumption (5.1.2), finite first and second moments. Thus, we need to show (see section 1.9) that, for $t_0 \le s \le t \le T$,

$$E(X_t \mid \mathfrak{F}_s) = X_s$$

or, equivalently,

$$E\Big( X_t - X_s \,\Big|\, \mathfrak{F}_s \Big) = E\Big( \int_s^t G \, dW \,\Big|\, \mathfrak{F}_s \Big) = 0.$$

This certainly holds for step functions, since $\mathfrak{F}_s$ and the increments $W_{t_i} - W_{t_{i-1}}$ are independent and $E(W_{t_i} - W_{t_{i-1}}) = 0$, and hence it holds in general. Similarly, formula (5.1.3) for the covariance matrix of the process $X_t$ is first proven easily for step functions and then in general by taking the limit.
We know now that $(X_t, \mathfrak{F}_t)$ is a martingale. Hence, $(X_t - X_a, \mathfrak{F}_t)$, for $t \ge a$, is also a martingale, and $(|X_t - X_a|^2, \mathfrak{F}_t)$ is a submartingale (see section 1.9). Then, inequality (1.9.1) yields, for every $c > 0$, $t_0 \le a \le b \le T$, and $p = 2$,

(5.1.7)  $P\left[ \sup_{a \le t \le b} |X_t - X_a| > c \right] \le \frac{E\,|X_b - X_a|^2}{c^2}.$

From (5.1.3), we obtain

$$E\,|X_t - X_a|^2 = E\,X_t'X_t - E\,X_t'X_a - E\,X_a'X_t + E\,X_a'X_a = \int_a^t E\,|G(s)|^2 \, ds,$$

so that

$$E\,|X_b - X_a|^2 = \int_a^b E\,|G(s)|^2 \, ds.$$

If we substitute this into (5.1.7), we obtain (5.1.4). The estimate (5.1.5) follows from inequality (1.9.2) for $p = 2$.
c) The continuity of $X_t$ is proven in three steps. Let $T$ denote any finite number.
Step 1. If $G$ is a step function defined on $[t_0, T]$, then

$$X_t = \int_{t_0}^T G\,1_{[t_0,\,t]} \, dW = \sum_{t_i \le t} G(t_{i-1})\,(W_{t_i} - W_{t_{i-1}}) + G\big( \max_{t_i \le t} t_i \big)\,\big( W_t - W_{\max_{t_i \le t} t_i} \big).$$

The continuity of $X_t$ follows from this formula and the continuity of $W_t$.
Step 2. If $G$ is a function in $M_2[t_0, T]$ such that

$$\int_{t_0}^T E\,|G|^2 \, ds < \infty,$$
we choose, on the basis of Lemma (4.4.12), a sequence of step functions $G_n$ such that

$$\lim_{n \to \infty} \int_{t_0}^T E\,|G(s) - G_n(s)|^2 \, ds = 0.$$

By virtue of (5.1.4), if we set

$$X_t^{(n)} = \int_{t_0}^t G_n \, dW,$$

we obtain

$$P\left[ \sup_{t_0 \le t \le T} |X_t - X_t^{(n)}| > c \right] \le \int_{t_0}^T E\,|G - G_n|^2 \, ds \,/\, c^2.$$

If we now choose a zero-approaching sequence $\{c_k\}$ of values of $c$ and a subsequence $\{n_k\}$ of the sequence of natural numbers such that

$$\sum_k \int_{t_0}^T E\,|G - G_{n_k}|^2 \, ds \,/\, c_k^2 < \infty,$$

then

$$\sum_k P\left[ \sup_{t_0 \le t \le T} |X_t - X_t^{(n_k)}| > c_k \right] < \infty.$$

Therefore, by the Borel-Cantelli lemma (see section 1.6), there exists, for almost all $\omega \in \Omega$, a $k_0 = k_0(\omega)$ such that

$$\sup_{t_0 \le t \le T} |X_t(\omega) - X_t^{(n_k)}(\omega)| \le c_k \quad \text{for all } k \ge k_0(\omega).$$

This makes $X_t$, with probability 1, the uniform limit of a sequence of continuous functions, and hence continuous itself.
Step 3. Finally, for arbitrary $G \in M_2[t_0, T]$, we approximate $G$ with a function $G_N$, for $N > 0$, defined by

$$G_N(t) = \begin{cases} G(t) & \text{if } \displaystyle\int_{t_0}^t |G|^2 \, ds \le N, \\[2mm] 0 & \text{if } \displaystyle\int_{t_0}^t |G|^2 \, ds > N. \end{cases}$$

The process

$$X_t^{(N)} = \int_{t_0}^t G_N \, dW$$

is, by virtue of step 2, continuous, and we have

$$X_t(\omega) = X_t^{(N)}(\omega) \quad \text{for all } t \text{ in } [t_0, T]$$

for all $\omega$ in the set

$$A_N = \Big\{ \omega: \int_{t_0}^T |G|^2 \, ds \le N \Big\} \subset \Omega.$$

Now, $P(\Omega - A_N)$ can be made arbitrarily small by making $N$ sufficiently large. Consequently, those $\omega$ at which $X_\cdot(\omega)$ is discontinuous in $[t_0, T]$ have probability 0.
d) For $k = 1$, we have the more precise result

$$E\,|X_t - X_a|^2 = \int_a^t E\,|G(s)|^2 \, ds.$$

For $k = 2$, we prove (5.1.6) again, first for step functions and then for general $G$, by choosing a sequence of step functions $G_n$ with the property

$$\int_a^t E\,|G(s) - G_n(s)|^4 \, ds \to 0$$

(see Gikhman and Skorokhod [5], pp. 385-386). For the proof for general $k$, we refer the reader to Gikhman and Skorokhod [36], pp. 26-27. ∎
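The moment formulas of part b) can be observed by simulation. The following sketch (not from the original text; all parameter choices are illustrative) estimates $E\,X_t$ and $E\,|X_t|^2$ for the process $X_t = \int_0^t W_s \, dW_s$, for which (5.1.3a) gives $E\,|X_t|^2 = \int_0^t E\,W_s^2 \, ds = t^2/2$.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n, n_paths = 1.0, 400, 10_000
dW = rng.normal(0.0, np.sqrt(T / n), (n_paths, n))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(W[:, :-1] * dW, axis=1)], axis=1)

for j in [n // 4, n // 2, n]:
    t = j * T / n
    print(f"t = {t:.2f}:  E X_t = {X[:, j].mean():+.4f}   "
          f"E|X_t|^2 = {(X[:, j]**2).mean():.4f}   (t^2/2 = {t*t/2:.4f})")
```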

5.2 Examples and Remarks


We showed in the example discussed in section 4.2 that only Ito's stochastic integral has the martingale property, from which the simple but nonetheless very useful inequalities (5.1.4) and (5.1.5) follow.
Another consequence of the martingale property is
(5.2.1) Corollary. If

$$\int_{t_0}^T E\,|G(s)|^2 \, ds < \infty,$$

the process

$$X_t = \int_{t_0}^t G \, dW, \qquad t_0 \le t \le T,$$

has orthogonal increments; that is, for $t_0 \le r \le s \le t \le u \le T$,

(5.2.2)  $E\,(X_u - X_t)(X_s - X_r)' = 0.$

Proof. We either use the martingale property or verify (5.2.2) directly by carrying out the multiplication and applying formula (5.1.3) for the covariance matrix:

$$E\,(X_u - X_t)(X_s - X_r)' = \Big( \int_{t_0}^s - \int_{t_0}^r - \int_{t_0}^s + \int_{t_0}^r \Big)\, E\,G(v)\,G(v)' \, dv = 0. \qquad ∎$$

It would have been possible to obtain the orthogonality of the increments more quickly from the following general formula:
(5.2.3) Corollary. Suppose that $G$ and $H$ belong to $M_2^{d,m}[t_0, T]$ and that

$$\int_{t_0}^T E\,|G|^2 \, ds < \infty, \qquad \int_{t_0}^T E\,|H|^2 \, ds < \infty.$$

Then, for any two Borel sets $A, B \subset [t_0, T]$, we have

$$E\Big( \int_A G \, dW \Big) \Big( \int_B H \, dW \Big)' = \int_{A \cap B} E\,G(u)\,H(u)' \, du$$

and, in particular,

$$E\Big( \int_{t_0}^t G \, dW \Big) \Big( \int_{t_0}^s H \, dW \Big)' = \int_{t_0}^{\min(t,s)} E\,G(u)\,H(u)' \, du, \qquad s, t \in [t_0, T].$$

Proof. From the definition in section 5.1, we have

$$\int_A G \, dW = \int_{t_0}^T G\,1_A \, dW, \qquad \int_B H \, dW = \int_{t_0}^T H\,1_B \, dW.$$

Let us look at the individual matrix elements. We have

$$E\Big( \int_{t_0}^T \sum_j 1_A\,G_{ij} \, dW^j \Big) \Big( \int_{t_0}^T \sum_p 1_B\,H_{kp} \, dW^p \Big) = \sum_j \int_{A \cap B} E\,(G_{ij}\,H_{kj}) \, ds$$

(certainly for step functions and hence in general), from which the assertion follows. ∎
(5.2.4) Remark. Although, under the assumption of Corollary (5.2.1), the stochastic integral has orthogonal and hence uncorrelated increments, these increments are not in general independent. The case in which $G \in M_2[t_0, T]$ is independent of $\omega$ constitutes an exception. Then,

(5.2.5)  $X_t = \int_{t_0}^t G \, dW,$

where $G \in M_2[t_0, T]$ is independent of $\omega$, is, according to Corollary (4.5.6), a Gaussian process on the interval $[t_0, T]$ with expectation $E\,X_t = 0$ and with covariance matrix

$$E\,X_t\,X_s' = \int_{t_0}^{\min(s,t)} G(u)\,G(u)' \, du.$$

Since uncorrelatedness of normally distributed random variables implies their independence, $X_t$ is a process with independent increments.
Conversely, all (smooth) $d$-dimensional Gaussian processes with $E\,X_t = 0$ and $X_{t_0} = 0$ that have independent increments in the interval $[t_0, T]$ can be represented as stochastic integrals of the form (5.2.5). Specifically, if the variance

$$E\,X_t\,X_t' = Q(t)$$

is an absolutely continuous (or even a continuously differentiable) function of $t$, we can write

$$E\,X_t\,X_s' = \int_{t_0}^{\min(t,s)} q(u) \, du, \qquad q(s) = \dot Q(s).$$

Now, $Q(t)$ is a nonnegative-definite symmetric $d \times d$ matrix that increases monotonically with $t$; that is,

$$Q(t) - Q(s) \ge 0, \qquad t \ge s.$$

Therefore, $q(s)$ is a nonnegative-definite symmetric $d \times d$ matrix and thus can be written in the form

$$q(s) = G(s)\,G(s)',$$

where $G(s)$ is a $d \times d$ matrix. For example, we can always take

$$G(s) = U(s)\,\Lambda(s)^{1/2}\,U(s)'$$

or

$$G(s) = U(s)\,\Lambda(s)^{1/2}.$$
Here, $\Lambda(s)$ and $U(s)$ satisfy

$$q(s) = U(s)\,\Lambda(s)\,U(s)';$$

that is, $\Lambda(s)$ is the diagonal matrix of the eigenvalues (arranged in increasing order) and $U(s)$ is the orthogonal matrix of (column) eigenvectors of $q(s)$. With the first choice, $G(s)$ is again nonnegative-definite. Now if $W_t$ is a $d$-dimensional Wiener process, then the process

$$Y_t = \int_{t_0}^t G(s) \, dW_s, \qquad t_0 \le t \le T,$$

coincides, with respect to distribution, with the given process $X_t$, as one can immediately verify by calculating the first two moments.
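The factorization $q(s) = G(s)\,G(s)'$ is a routine numerical operation. The sketch below (not from the original text; the matrix $q$ is an arbitrary illustrative nonnegative-definite symmetric choice) builds both versions of $G$ described above from the eigendecomposition $q = U\,\Lambda\,U'$.

```python
import numpy as np

q = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # nonnegative-definite, symmetric

lam, U = np.linalg.eigh(q)              # eigenvalues ascending, orthonormal eigenvectors
G1 = U @ np.diag(np.sqrt(lam)) @ U.T    # first choice: symmetric nonnegative-definite root
G2 = U @ np.diag(np.sqrt(lam))          # second choice

for G in (G1, G2):
    print(np.allclose(G @ G.T, q))      # both satisfy G G' = q
```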
(5.2.6) Remark. The Gaussian process mentioned in Remark (5.2.4) is, in the case $d = m = 1$ and $t_0 = 0$, "essentially" (that is, up to a transformation of the time axis) a piece of the Wiener process. To see this, let us set

$$\tau(t) = \int_0^t G(s)^2 \, ds, \qquad 0 \le t \le T.$$

Then, there exists a Wiener process $\widetilde W_t$ such that

$$X_t = \int_0^t G \, dW = \widetilde W_{\tau(t)}$$

or

(5.2.7)  $X_{\tau^{-1}(t)} = \widetilde W_t,$

as can be checked immediately by calculating the first two moments. For this reason, we call $\tau(t)$ the intrinsic time of $X_t$. Here, $\tau^{-1}(t) = \min\{s:\ \tau(s) = t\}$ is defined for $t \le \tau(T)$.
This consideration can be carried over to the case of arbitrary $G \in M_2^{1,m}[t_0, T]$ if we define the intrinsic time by

$$\tau(t) = \int_{t_0}^t |G(s)|^2 \, ds, \qquad t_0 \le t \le T.$$

Now, $\tau(t)$ is itself a (nonanticipating) random function (see McKean [45], pp. 29-31). From the representation (5.2.7), for example, we immediately derive the law of the iterated logarithm for a one-dimensional stochastic integral

$$X_t = \int_{t_0}^t G \, dW, \qquad G \in M_2^{1,m}[t_0, T],$$
in the form

$$\limsup_{t \downarrow t_0} \frac{X_t}{\sqrt{2\,\tau(t)\,\log\log(1/\tau(t))}} = 1 \quad \text{with probability 1}.$$
(5.2.8) Remark. The property of having infinite length in every finite interval also carries over from the sample functions of $W_t$ to the sample functions of the stochastic integral $X_t$. This follows, as for $W_t$ in section 3.1, from the following more precise result (see Goldstein [37a], Theorem 4.1): Suppose that $G \in M_2^{d,m}[t_0, T]$, $t_0 < t_1 < \cdots < t_n = T < \infty$, and $\delta_n = \max_k (t_k - t_{k-1})$. Then, for

$$X_t = \int_{t_0}^t G \, dW,$$

we have

(5.2.9)  $\operatorname{st-lim}_{\delta_n \to 0} \sum_{k=1}^n (X_{t_k} - X_{t_{k-1}})(X_{t_k} - X_{t_{k-1}})' = \int_{t_0}^T G(s)\,G(s)' \, ds$

and, in particular,

(5.2.10)  $\operatorname{st-lim}_{\delta_n \to 0} \sum_{k=1}^n |X_{t_k} - X_{t_{k-1}}|^2 = \int_{t_0}^T |G(s)|^2 \, ds.$

One should note that random variables appear in general in the right-hand members of (5.2.9) and (5.2.10). Thus, we have the alternative: Either $X_t \equiv 0$ (for all $t \in [t_0, T]$), as will be the case if the right-hand member of (5.2.10) vanishes, or $X_t$ is not of bounded variation on $[t_0, T]$, as will be the case if the right-hand member of (5.2.10) fails to vanish.
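The following single-path sketch (not from the original text; the choice $G = W$ and the grids are illustrative) shows the sums in (5.2.10) approaching the random limit $\int_0^T W_s^2 \, ds$ for the integral $X_t = \int_0^t W \, dW$.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 1.0, 400_000
dW = rng.normal(0.0, np.sqrt(T / n), n)
W = np.concatenate([[0.0], np.cumsum(dW)])
dX = W[:-1] * dW                     # fine-grid increments of X_t = int W dW

for k in [100, 10_000, 400_000]:     # coarser to finer partitions
    step = n // k
    Xk = np.concatenate([[0.0], np.cumsum(dX)])[::step]   # X sampled on the partition
    print(f"{k:7d} intervals: sum |dX|^2 = {np.sum(np.diff(Xk)**2):.4f}")
print("int W^2 ds =", np.sum(W[:-1]**2) * (T / n))         # the random right-hand member
```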

5.3 Stochastic Differentials. Ito's Theorem.


The relationship

$$X_t(\omega) = \int_{t_0}^t G(s,\omega) \, dW_s(\omega)$$

can also be written as

$$dX_t = G(t) \, dW_t.$$

This is a special so-called stochastic differential. We shall now define and investigate such differentials. To do this, let us look at a somewhat more general stochastic process of the form

(5.3.1)  $X_t(\omega) = X_{t_0}(\omega) + \int_{t_0}^t f(s,\omega) \, ds + \int_{t_0}^t G(s,\omega) \, dW_s(\omega).$

Here, we again assume the usual situation: $W_t$ is an $m$-dimensional Wiener process, $\mathfrak{F}_t$ is the accompanying family of sigma-algebras with events independent of $\mathfrak{W}_t^+$, and $G$ is a $(d \times m$ matrix$)$-valued function in $M_2^{d,m}[t_0, T] = M_2[t_0, T]$. Then, the stochastic integral in (5.3.1) is completely defined for $t_0 \le t \le T$.
As regards $X_{t_0}$ and $f$, we now make the following assumptions:
a) $X_{t_0}$ is an $\mathfrak{F}_{t_0}$-measurable random variable (and as such independent of $\mathfrak{W}_{t_0}^+$ and hence of $W_t - W_{t_0}$ for $t \ge t_0$). In particular, this is the case when $X_{t_0}$ is not random.
b) The function $f$ is an $R^d$-valued function measurable in $(s,\omega)$ and nonanticipating; that is, $f(t,\cdot)$ is $\mathfrak{F}_t$-measurable for all $t \in [t_0, T]$, and we have with probability 1

$$\int_{t_0}^T |f(s,\omega)| \, ds < \infty.$$

We interpret the last integral, like the corresponding integral of $f$ in (5.3.1), as the usual (Lebesgue or possibly Riemann) integral of the sample functions $f(\cdot,\omega)$.
(5.3.2) Remark. Both integrals in (5.3.1) are continuous functions of the upper limit (the integral of $f$ is in fact absolutely continuous!), so that the process $X_t$ is an $R^d$-valued process that, with probability 1, has continuous sample functions. Furthermore, $X_t$ is $\mathfrak{F}_t$-measurable and hence nonanticipating, and we have, for every $s$ such that $t_0 \le s \le t \le T$,

$$X_t = X_s + \int_s^t f(u) \, du + \int_s^t G(u) \, dW_u.$$

Stochastic differentials are simply a more compact symbolic notation for relationships of the form (5.3.1).
(5.3.3) Definition. We shall say that a stochastic process $X_t$ defined by equation (5.3.1) possesses the stochastic differential $f(t)\,dt + G(t)\,dW_t$, and we shall write

(5.3.4)  $dX_t = f(t)\,dt + G(t)\,dW_t = f\,dt + G\,dW.$
(5.3.5) Remark. In the shift from (5.3.1) to the stochastic differential (5.3.4), the initial value $X_{t_0}$ disappears, so that we can get from (5.3.4) only the differences

$$X_t - X_s = \int_s^t f(u) \, du + \int_s^t G(u) \, dW_u,$$

and we must always specify $X_{t_0}$ when it is nonzero.

(5.3.6) Example. Suppose that $d = m = 1$, that $t_0 = 0$, and that $T$ is an arbitrary positive number. The differential notation of

$$\int_0^t W_s \, dW_s = W_t^2/2 - t/2$$

is

(5.3.7)  $d(W_t^2) = dt + 2\,W_t \, dW_t.$

If we construct the differential of $W_t^2$ formally, using Taylor's theorem, we obtain

$$d(W_t^2) = 2\,W_t \, dW_t + (dW_t)^2.$$

Comparison with (5.3.7) shows that, in the case of the stochastic differential of $W_t^2$, we must regard the first two terms as first-order terms and must replace $(dW_t)^2$ with $dt$ (see remark (3.1.9)).
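The replacement of $(dW_t)^2$ by $dt$ reflects the quadratic variation of the Wiener process (Lemma (3.1.1)). A one-path sketch of that fact (not from the original text; grid sizes are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
t, n = 1.0, 2**20
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(t / n), n))])

for k in [2**6, 2**12, 2**20]:           # number of partition intervals
    Wk = W[::n // k]                      # the path sampled on a coarser grid
    print(f"{k:8d} intervals: sum (dW)^2 = {np.sum(np.diff(Wk)**2):.4f}")
# the sums of squared increments approach the interval length t = 1
```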
The overall explanation of the phenomenon is to be found in the following theorem of Ito [42]. It says, in the language of stochastic differentials, that smooth functions of processes defined by (5.3.1) are themselves processes of this type. Here is Ito's theorem in its most general form:
(5.3.8) Ito's Theorem. Let $u = u(t,x)$ denote a continuous function defined on $[t_0, T] \times R^d$ with values in $R^k$ and with the continuous partial derivatives ($k$-vectors!)

$$\frac{\partial}{\partial t}\,u(t,x) = u_t, \qquad \frac{\partial}{\partial x_i}\,u(t,x) = u_{x_i}, \qquad x = (x_1, \ldots, x_d)', \qquad \frac{\partial^2}{\partial x_i\,\partial x_j}\,u(t,x) = u_{x_i x_j}, \qquad i, j \le d.$$

If the $d$-dimensional stochastic process $X_t$ is defined on $[t_0, T]$ by the stochastic differential

$$dX_t = f(t) \, dt + G(t) \, dW_t, \qquad W_t \ m\text{-dimensional},$$

then the $k$-dimensional process

$$Y_t = u(t, X_t)$$
defined on $[t_0, T]$ with initial value $Y_{t_0} = u(t_0, X_{t_0})$ also possesses a stochastic differential with respect to the same Wiener process $W_t$, and we have

(5.3.9a)  $dY_t = \Big( u_t(t, X_t) + u_x(t, X_t)\,f(t) + \frac{1}{2} \sum_{i,j} u_{x_i x_j}(t, X_t)\,\big( G(t)\,G(t)' \big)_{ij} \Big)\, dt + u_x(t, X_t)\,G(t) \, dW_t.$

Here, $u_x = (u_{x_1}, \ldots, u_{x_d})$ is a $k \times d$ matrix and $u_{x_i x_j}$ is a $k$-dimensional column vector.
(5.3.10) Remark. The double summation in (5.3.9a) can also be written as follows:

$$\sum_{i=1}^d \sum_{j=1}^d u_{x_i x_j}\,(G\,G')_{ij} = \operatorname{tr}(u_{xx}\,G\,G') = \operatorname{tr}(G\,G'\,u_{xx}),$$

where $u_{xx} = (u_{x_i x_j})$ is a $d \times d$ matrix whose elements are $k$-vectors. Then, (5.3.9a) takes the form

(5.3.9b)  $dY_t = u_t \, dt + u_x \, dX_t + \frac{1}{2} \operatorname{tr}(G\,G'\,u_{xx}) \, dt.$
Let us now specialize Theorem (5.3.8) to the case $k = m = 1$, which is important for many applications:
(5.3.11) Ito's Theorem for $k = m = 1$. Let $u = u(t, x_1, \ldots, x_d)$ denote a continuous function defined on $[t_0, T] \times R^d$ with continuous partial derivatives $u_t$, $u_{x_i}$, and $u_{x_i x_j}$ for $i, j \le d$. Furthermore, suppose that $d$ one-dimensional stochastic processes $X_i(t)$ are defined on $[t_0, T]$ by the stochastic differentials

$$dX_i(t) = f_i(t) \, dt + G_i(t) \, dW_t, \qquad i = 1, 2, \ldots, d,$$

with respect to the same one-dimensional Wiener process. Then, the process

$$Y_t = u(t, X_1(t), \ldots, X_d(t))$$

also possesses a stochastic differential on $[t_0, T]$, and

$$dY_t = u_t \, dt + \sum_{i=1}^d u_{x_i} \, dX_i + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d u_{x_i x_j} \, dX_i \, dX_j.$$

Here, the product $dX_i\,dX_j$ can be calculated from the following multiplication table:

$$\begin{array}{c|cc}
\times & dW & dt \\ \hline
dW & dt & 0 \\
dt & 0 & 0
\end{array}$$

In other words,

$$dX_i \, dX_j = G_i\,G_j \, dt, \qquad i, j \le d,$$

and

$$dY_t = \Big( u_t + \sum_{i=1}^d u_{x_i}\,f_i + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d u_{x_i x_j}\,G_i\,G_j \Big)\, dt + \Big( \sum_{i=1}^d u_{x_i}\,G_i \Big)\, dW_t.$$

We single out the even more special case $k = m = d = 1$ as
(5.3.12) Corollary (Ito's theorem for $k = m = d = 1$). Let $u = u(t,x)$ denote a scalar continuous function defined on $[t_0, T] \times R^1$ with continuous partial derivatives $u_t$, $u_x$, and $u_{xx}$. If $X_t$ is a process defined on $[t_0, T]$ with stochastic differential

$$dX_t = f \, dt + G \, dW_t,$$

where $f$, $G$, and $W_t$ are scalar functions, then $Y_t = u(t, X_t)$ possesses on $[t_0, T]$ the stochastic differential

$$dY_t = \Big( u_t(t, X_t) + u_x(t, X_t)\,f(t) + \frac{1}{2}\,u_{xx}(t, X_t)\,G(t)^2 \Big)\, dt + u_x(t, X_t)\,G(t) \, dW_t.$$
Before we give a proof of Theorem (5.3.8), let us make clear its scope and use-
fulness with various examples.

5.4 Examples and Remarks in Connection with Ito's Theorem


The noteworthy feature, as compared with the usual differential, is the extra term in (5.3.9) formed from the second derivatives $u_{x_i x_j}$. This term is the most frequent cause of errors in purely formal manipulation of stochastic differential equations.
(5.4.1) Example. Theorem (5.3.11) yields in the case

$$u = x_1\,x_2$$

the following result: if

$$dX_1(t) = f_1(t) \, dt + G_1(t) \, dW_t, \qquad dX_2(t) = f_2(t) \, dt + G_2(t) \, dW_t,$$

then

$$d\big( X_1(t)\,X_2(t) \big) = X_1(t) \, dX_2(t) + X_2(t) \, dX_1(t) + G_1(t)\,G_2(t) \, dt = (X_1\,f_2 + X_2\,f_1 + G_1\,G_2) \, dt + (X_1\,G_2 + X_2\,G_1) \, dW_t.$$
This is the rule for integration of stochastic integrals by parts. In integral form, it is

$$X_1(t)\,X_2(t) = X_1(t_0)\,X_2(t_0) + \int_{t_0}^t X_1 \, dX_2 + \int_{t_0}^t X_2 \, dX_1 + \int_{t_0}^t G_1\,G_2 \, ds, \qquad t_0 \le t \le T.$$

In comparison with the corresponding formulas for ordinary integrals or differentials, there is the extra term

$$G_1\,G_2\,(dW)^2 = G_1\,G_2 \, dt.$$

The choice $X_1(t) = t$, $X_2(t) = W_t$ yields

$$d(t\,W_t) = W_t \, dt + t \, dW_t,$$

and the choice $X_1(t) = X_2(t) = W_t$ yields the familiar result

$$d(W_t^2) = dt + 2\,W_t \, dW_t.$$
(5.4.2) Example. We shall study, in particular, smooth functions of the Wiener process itself. For the scalar situation, Corollary (5.3.12) yields, with $X_t = W_t$ and $0 \le t < \infty$,

$$du(t, W_t) = \Big( u_t(t, W_t) + \frac{1}{2}\,u_{xx}(t, W_t) \Big)\, dt + u_x(t, W_t) \, dW_t.$$

In the special case in which $u = u(x)$ is independent of $t$ and twice continuously differentiable with respect to $x$, we obtain

(5.4.3a)  $du(W_t) = u'(W_t) \, dW_t + \frac{1}{2}\,u''(W_t) \, dt$

or, what amounts to the same thing,

(5.4.3b)  $u(W_t) = u(W_0) + \int_0^t u'(W_s) \, dW_s + \frac{1}{2} \int_0^t u''(W_s) \, ds.$

The most interesting term in (5.4.3b) is the stochastic integral with respect to $W_s$, for which we have thus found an expression containing only an ordinary integral.
Formulas (5.4.3a) and (5.4.3b) bring out sharply the essential characteristic of the calculus of stochastic integrals, specifically, the presence of an extra first-order term in the differential of smooth functions of a Wiener process $W_t$. Equation (5.4.3b) is sometimes called the "fundamental theorem of the calculus of (Ito's) stochastic integrals."
(5.4.4) Example. For $u(x) = x^n$, where $n = 1, 2, \ldots$, and $t \ge 0$, formula (5.4.3a) yields

$$d(W_t^n) = n\,W_t^{n-1} \, dW_t + \frac{n\,(n-1)}{2}\,W_t^{n-2} \, dt.$$
(5.4.5) Example. Let us look again at the one-dimensional case $d = m = 1$. We begin with the process

$$X_t = X_{t_0} - \frac{1}{2} \int_{t_0}^t G(s)^2 \, ds + \int_{t_0}^t G(s) \, dW_s, \qquad G \in M_2^{1,1}[t_0, T],$$

and calculate the stochastic differential for the process

$$Y_t = e^{X_t}.$$

For $u(x) = e^x$, Corollary (5.3.12) yields

$$dY_t = e^{X_t}\,G(t) \, dW_t,$$

or

(5.4.6)  $dY_t = Y_t\,G(t) \, dW_t, \qquad Y_{t_0} = e^{X_{t_0}} = c > 0.$

This is a stochastic differential equation for the process $Y_t$ with initial condition $Y_{t_0} > 0$. From the above-given derivation of the equation, we know that the process

$$Y_t = Y_{t_0} \exp\Big( -\frac{1}{2} \int_{t_0}^t G(s)^2 \, ds + \int_{t_0}^t G(s) \, dW_s \Big)$$

satisfies this equation for $t \in [t_0, T]$.


For $G \equiv 1$ and $t_0 = 0$, we have the result that the equation

(5.4.7)  $dY_t = Y_t \, dW_t, \qquad Y_0 = 1,$

has the solution

$$Y_t = \exp(W_t - t/2), \qquad t \ge 0.$$

If we interpret equation (5.4.7) as an ordinary differential equation for continuously differentiable functions, we obtain $Y_t = c\,\exp W_t$ as its solution. Therefore, we can say that the role of the usual exponential function in the calculus of stochastic differentials is taken over by the function $\exp(W_t - t/2)$.
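The following sketch (not from the original text; step counts are illustrative) integrates equation (5.4.7) by the elementary Euler scheme $Y_{k+1} = Y_k + Y_k\,\Delta W_k$ and shows that the result tracks $\exp(W_t - t/2)$ rather than the naive guess $\exp(W_t)$.

```python
import numpy as np

rng = np.random.default_rng(9)
t, n = 1.0, 100_000
dW = rng.normal(0.0, np.sqrt(t / n), n)
W = np.cumsum(dW)

Y = 1.0
for dw in dW:
    Y += Y * dw                          # one Euler step for dY = Y dW
print("Euler solution Y_1       :", Y)
print("exp(W_1 - 1/2)           :", np.exp(W[-1] - t / 2))
print("exp(W_1) (naive guess)   :", np.exp(W[-1]))
```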
(5.4.8) Remark. Let us return to the general equation (5.3.1). If $X_{t_0}$ has a normal distribution or if it is a constant, and if the functions $f$ and $G$ are independent of $\omega$, we can generalize Remark (5.2.4) as follows: The process

$$X_t(\omega) = X_{t_0}(\omega) + \int_{t_0}^t f(s) \, ds + \int_{t_0}^t G(s) \, dW_s(\omega), \qquad t_0 \le t \le T,$$

is a $d$-dimensional Gaussian process with independent increments, with expectation

$$E\,X_t = E\,X_{t_0} + \int_{t_0}^t f \, ds$$

and with covariance matrix

$$E\,(X_t - E\,X_t)(X_s - E\,X_s)' = \operatorname{Cov}(X_{t_0}, X_{t_0}') + \int_{t_0}^{\min(t,s)} G(u)\,G(u)' \, du.$$

Conversely, all smooth $d$-dimensional Gaussian processes with independent increments can be represented in this form. Specifically, if

$$E\,(X_t - X_{t_0}) = F(t), \qquad t_0 \le t \le T,$$

is, for example, continuously differentiable, we can write

$$E\,(X_t - X_{t_0}) = \int_{t_0}^t f(s) \, ds, \qquad f(s) = \dot F(s).$$

The process

$$Y_t = X_t - X_{t_0} - \int_{t_0}^t f(s) \, ds$$

satisfies the assumptions $Y_{t_0} = 0$ and $E\,Y_t = 0$ made in remark (5.2.4). Therefore, if the $d \times d$ matrix

$$E\,Y_t\,Y_t' = Q(t)$$

is, for example, continuously differentiable, there exists at least one $d \times d$ matrix $G$ such that

$$Q(t) = \int_{t_0}^t q(s) \, ds = \int_{t_0}^t G(s)\,G(s)' \, ds, \qquad q(s) = \dot Q(s).$$

Then, if $W_t$ is a $d$-dimensional Wiener process such that $X_{t_0}$ and $W_t - W_{t_0}$ are, for $t \ge t_0$, independent, the process

$$Z_t = X_{t_0} + \int_{t_0}^t f(s) \, ds + \int_{t_0}^t G(s) \, dW_s, \qquad t_0 \le t \le T,$$

coincides distributionwise with $X_t$. It should be noted that $W_t$ can in general be $m$-dimensional (where $m \ne d$). It is only necessary to choose a $d \times m$ matrix $G$ such that $q(t) = G(t)\,G(t)'$.
Some authors (see, for example, Bucy and Joseph [61]) have analyzed stochastic differentials in terms of an $m$-dimensional Gaussian process $V_t$ with independent increments, $V_0 = 0$, $E\,V_t = 0$, and covariance

$$E\,V_t\,V_s' = \int_0^{\min(t,s)} q(u) \, du.$$

However, by virtue of what was said above, we can always represent $V_t$ in the form

$$dV_t = G_0(t) \, dW_t, \qquad G_0(t)\,G_0(t)' = q(t).$$

Therefore,

$$dX_t = f \, dt + G \, dV_t = f \, dt + G\,G_0 \, dW_t;$$

that is, we can confine ourselves to differentials with respect to $W_t$.
(5.4.9) Example. $u(x) = |x|^2 = x'x$ yields, with (5.3.9b), for $k = 1$ and arbitrary $d$ and $m$,

$$d|X_t|^2 = 2\,X_t' \, dX_t + |G(t)|^2 \, dt.$$

5.5 Proof of Ito's Theorem


Since the proof of Theorem (5.3.8) differs from that of Corollary (5.3.12) only
in the more complicated notation, we shall confine ourselves to the proof of the
corollary. We can in fact simplify the situation still further without changing the
basic idea of the proof.
Itd's differential formula is a short notation for an integral expression for the
process Y, =u (t, Xt). It will be sufficient to prove the formula for step func-
tions f and G. As usual, the general case can be obtained by a passage to the limit.
Since the domain of definition of a step function is decomposed into finitely
many intervals in each of which it is constant (as a function of t), we can confine
ourselves to the case of constant f (t, w) f (w) and G (t, w) =G (w).
Thus, our initial process X, has the form

Xt = X" +f (t-to)+G (Wt-W,o), to=t =T,


where X,o, f, and G are random variables. The 5,o-measurability of X,o, f, and G
implies that they are independent of W,-Wto for t? to (though otherwise ar-
bitrary).
The process Y, has the form
Y, = u (t, X,) - u (t, Xb+f (t-to)+G (W,-Wt,))
with

$$Y_{t_0} = u(t_0, X_{t_0}).$$

Suppose that $t_0 < t_1 < \cdots < t_n = t \le T$. Then,

(5.5.1)  $Y_t - Y_{t_0} = \sum_{k=1}^n \big( u(t_k, X_{t_k}) - u(t_{k-1}, X_{t_{k-1}}) \big).$

With our assumptions on $u(t,x)$, Taylor's formula yields

(5.5.2)  $\begin{aligned} u(t_k, X_{t_k}) - u(t_{k-1}, X_{t_{k-1}}) &= u_t\big( t_{k-1} + \Delta_k\,(t_k - t_{k-1}),\ X_{t_{k-1}} \big)\,(t_k - t_{k-1}) \\ &\quad + u_x(t_{k-1}, X_{t_{k-1}})\,(X_{t_k} - X_{t_{k-1}}) \\ &\quad + \frac{1}{2}\,u_{xx}\big( t_{k-1},\ X_{t_{k-1}} + \bar\Delta_k\,(X_{t_k} - X_{t_{k-1}}) \big)\,(X_{t_k} - X_{t_{k-1}})^2, \end{aligned}$

where $0 \le \Delta_k, \bar\Delta_k \le 1$. In view of the continuity of $X_t$, $u_t$, and $u_{xx}$, we see that there exist random variables $\alpha_n$ and $\gamma_n$ that converge with probability 1 to 0 as

$$\delta_n = \max_{1 \le k \le n} (t_k - t_{k-1}) \to 0$$

and that satisfy the inequalities

$$\max_{1 \le k \le n} \big| u_t\big( t_{k-1} + \Delta_k\,(t_k - t_{k-1}),\ X_{t_{k-1}} \big) - u_t(t_{k-1}, X_{t_{k-1}}) \big| \le \alpha_n$$

and

$$\max_{1 \le k \le n} \big| u_{xx}\big( t_{k-1},\ X_{t_{k-1}} + \bar\Delta_k\,(X_{t_k} - X_{t_{k-1}}) \big) - u_{xx}(t_{k-1}, X_{t_{k-1}}) \big| \le \gamma_n.$$
Consequently, if we substitute (5.5.2) into (5.5.1) and replace $\Delta_k$ and $\bar\Delta_k$ with 0, then, since

$$\sum_{k=1}^n (t_k - t_{k-1}) = t - t_0$$

and

$$\operatorname{st-lim}_{\delta_n \to 0} \sum_{k=1}^n (X_{t_k} - X_{t_{k-1}})^2 = G^2\,(t - t_0),$$

this does not change the limit of $Y_t - Y_{t_0}$ in (5.5.1) as $\delta_n \to 0$. What we need to show, then, is that

$$\operatorname{st-lim}_{\delta_n \to 0} \sum_{k=1}^n \Big[ u_t(t_{k-1}, X_{t_{k-1}})\,(t_k - t_{k-1}) + u_x(t_{k-1}, X_{t_{k-1}})\,(X_{t_k} - X_{t_{k-1}}) + \frac{1}{2}\,u_{xx}(t_{k-1}, X_{t_{k-1}})\,(X_{t_k} - X_{t_{k-1}})^2 \Big] =$$
98 5. The Stochastic Integral as a Stochastic Process, Stochastic Differentials

= i (Ut (s, X,)+u2 (s, X,) / +I uxs(s,X,)GI ds+i ux(s,X,) GdW,.


90
2 96

By virtue of the continuity assumptions, we have
$$\operatorname*{ac-lim}_{\delta_n \to 0}\,\sum_{k=1}^n u_t(t_{k-1}, X_{t_{k-1}})(t_k - t_{k-1}) = \int_{t_0}^t u_t(s, X_s)\,ds$$
and
$$\operatorname*{st-lim}_{\delta_n \to 0}\,\sum_{k=1}^n u_x(t_{k-1}, X_{t_{k-1}})(X_{t_k} - X_{t_{k-1}}) = \int_{t_0}^t u_x(s, X_s)\,f\,ds + \int_{t_0}^t u_x(s, X_s)\,G\,dW_s.$$
We still need to take care of the sum
$$\sum_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(X_{t_k} - X_{t_{k-1}})^2 = f^2 \sum_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(t_k - t_{k-1})^2$$
$$+\ 2\,f\,G \sum_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(t_k - t_{k-1})(W_{t_k} - W_{t_{k-1}})$$
$$+\ G^2 \sum_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(W_{t_k} - W_{t_{k-1}})^2.$$
Since the first two sums on the right-hand side converge, by virtue of the continuity of $u_{xx}$ and $W_t$, to 0 with probability 1, it remains only to show that
$$(5.5.3)\qquad \operatorname*{st-lim}_{\delta_n \to 0}\,\sum_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(W_{t_k} - W_{t_{k-1}})^2 = \int_{t_0}^t u_{xx}(s, X_s)\,ds.$$
Since
$$\operatorname*{ac-lim}_{\delta_n \to 0}\,\sum_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(t_k - t_{k-1}) = \int_{t_0}^t u_{xx}(s, X_s)\,ds,$$
(5.5.3) reduces to
$$\operatorname*{st-lim}_{\delta_n \to 0}\,S_n = 0$$
with
$$S_n = \sum_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})\bigl((W_{t_k} - W_{t_{k-1}})^2 - (t_k - t_{k-1})\bigr).$$
We now eliminate large values of $u_{xx}$ by a truncation technique. For positive $N$, let us define
$$I_k^N(\omega) = \begin{cases} 1, & \text{if } |X_{t_i}| \le N \text{ for all } i \le k, \\ 0, & \text{otherwise}, \end{cases}$$
$$e_k = (W_{t_k} - W_{t_{k-1}})^2 - (t_k - t_{k-1}),$$
and
$$S_n^N = \sum_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})\,I_{k-1}^N\,e_k.$$
Since $E\,e_k = 0$ and $E\,e_k^2 = 2\,(t_k - t_{k-1})^2$, and since the $e_k$ are independent of each other and of $u_{xx}(t_{k-1}, X_{t_{k-1}})\,I_{k-1}^N$, we have
$$E\,S_n^N = 0$$
and
$$E\,(S_n^N)^2 = \sum_{k=1}^n E\bigl(u_{xx}(t_{k-1}, X_{t_{k-1}})\,I_{k-1}^N\bigr)^2\,E\,e_k^2 \le 2 \max_{s \le t,\ |y| \le N} |u_{xx}(s, y)|^2 \sum_{k=1}^n (t_k - t_{k-1})^2 \to 0 \qquad (\delta_n \to 0).$$
For every fixed $N > 0$, we therefore have
$$\operatorname*{qm-lim}_{\delta_n \to 0}\,S_n^N = \operatorname*{st-lim}_{\delta_n \to 0}\,S_n^N = 0.$$

The error resulting from the truncation is
$$(5.5.4)\qquad P[S_n \ne S_n^N] \le P\Bigl[\max_{t_0 \le s \le t}\,|X_s| > N\Bigr].$$
Now, the quantity
$$\max_{t_0 \le s \le t}\,|X_s| = \max_{t_0 \le s \le t}\,\bigl|X_{t_0} + f\,(s - t_0) + G\,(W_s - W_{t_0})\bigr| \le |X_{t_0}| + |f|\,(t - t_0) + |G|\,\max_{t_0 \le s \le t}\,|W_s - W_{t_0}|$$
is an almost certainly finite random variable, so that the right-hand member of (5.5.4) can be made arbitrarily small by choosing $N$ sufficiently great. Since
$$P[|S_n| > \varepsilon] \le P[|S_n^N| > \varepsilon] + P[S_n \ne S_n^N],$$
we have
$$\operatorname*{st-lim}_{\delta_n \to 0}\,S_n = 0. \qquad \blacksquare$$
Chapter 6
Stochastic Differential Equations,
Existence and Uniqueness
of Solutions
6.1 Definition and Examples
Let us look at a stochastic differential of the form
$$(6.1.1a)\qquad dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, \qquad X_{t_0} = c, \quad t_0 \le t \le T < \infty,$$
or, in integral form,
$$(6.1.1b)\qquad X_t = c + \int_{t_0}^t f(s, X_s)\,ds + \int_{t_0}^t G(s, X_s)\,dW_s, \qquad t_0 \le t \le T < \infty,$$
where $X_t$ is an $R^d$-valued stochastic process (for the time being, assumed known) defined on $[t_0, T]$ and $W_t$ is an $m$-dimensional Wiener process. The $R^d$-valued function $f$ and the ($d \times m$ matrix)-valued function $G$ are assumed to be defined and measurable on $[t_0, T] \times R^d$. For fixed $(t, x)$, suppose that $f(t, x)$ and $G(t, x)$ are independent of $\omega \in \Omega$, i.e., that the random parameter $\omega$ appears only indirectly in the coefficients in equation (6.1.1) in the form $f(t, X_t(\omega))$ and $G(t, X_t(\omega))$. For a generalization, see remark (6.1.5).
Here, the process $X_t$, which we assume known, must of course be constructed in such a way that, after it is substituted into (6.1.1a), the right-hand member becomes a stochastic differential in the sense of section 5.3. In particular, $X_t$ must not anticipate (see remark (5.3.2)); that is, it must be $\mathfrak{F}_t$-measurable.
Equations (6.1.1) can also be interpreted as the defining equations for an unknown stochastic process $X_t$ with given initial value $X_{t_0} = c$. With regard to the accompanying family of sigma-algebras $\mathfrak{F}_t$, we make once and for all the following
(6.1.2) Convention. For the purpose of treating stochastic differential equations on the interval $[t_0, T]$, it is always sufficient to choose for $\mathfrak{F}_t$ the smallest sigma-algebra with respect to which the initial value $c$ and the random variables $W_s$, for $s \le t$, are measurable, specifically,
$$\mathfrak{F}_t = \sigma(c;\ W_s,\ t_0 \le s \le t).$$
By definition, $\mathfrak{F}_t$ must, for all $t \ge t_0$, be independent of the future increments $W_{t+s} - W_t$, $s \ge 0$. For $t = t_0$, this means in particular that the initial value $c$ and the Wiener process $W_t - W_{t_0}$ must be statistically independent.
In particular, if $c$ is with probability 1 constant, the independence of $c$ and $W_t - W_{t_0}$ is trivially satisfied. In this case, we have (except for events of probability 0)
$$\mathfrak{F}_t = \sigma(W_s,\ t_0 \le s \le t).$$
Of course, $\mathfrak{F}_t$ can always be augmented with all events that are independent of $c$ and $W_s - W_{t_0}$, $t_0 \le s \le t$. When $\mathfrak{F}_t$ is not specifically identified, we shall always assume the above choice to have been made.
(6.1.3) Definition. An equation of the form (6.1.1a) is called an (Itô) stochastic differential equation. The random variable $c$ is called the initial value at the instant $t_0$. Here, (6.1.1a), together with the initial value, is only a symbolic way of writing the stochastic integral equation (6.1.1b). A stochastic process $X_t$ is called a solution of equation (6.1.1a) or (6.1.1b) on the interval $[t_0, T]$ if it has the following properties:
a) $X_t$ is $\mathfrak{F}_t$-measurable, that is, nonanticipating for $t \in [t_0, T]$.
b) The functions $\bar f(t, \omega) = f(t, X_t(\omega))$ and $\bar G(t, \omega) = G(t, X_t(\omega))$ (nonanticipating in accordance with a)) are such that, with probability 1,
$$\int_{t_0}^T |\bar f(s, \omega)|\,ds < \infty$$
and
$$\int_{t_0}^T |\bar G(s, \omega)|^2\,ds < \infty$$
(that is, $\bar G$ belongs to the class $M^{d \times m}[t_0, T]$). Then, in accordance with section 5.3, the right-hand member of (6.1.1a) is meaningful.
c) Equation (6.1.1b) holds for every $t \in [t_0, T]$ with probability 1.

(6.1.4) Remark. We therefore have the following situation: There are, on the one hand, the fixed functions $f$ and $G$ that determine the "system" and, on the other hand, the two independent random elements $c$ and $W_\cdot$. For almost every choice of $c(\omega)$ and almost every Wiener sample function $W_\cdot(\omega)$, we obtain via $f$ and $G$, in the case of a unique solution of (6.1.1), the sample function $X_\cdot(\omega)$ of a new process defined on $[t_0, T]$ that satisfies (6.1.1). In accordance with (6.1.3a) and our definition of $\mathfrak{F}_t$, $X_t$ is a functional of $c$ and $W_s$ for $s \le t$; that is, there exists a function $g$ (uniquely determined by $f$ and $G$ alone) such that
$$X_t = g(c;\ W_s,\ s \le t).$$
Thus, (6.1.1) can be interpreted as a formula (in general very complicated), determined by the functions $f$ and $G$, with the aid of which the process $X_\cdot$ can be constructed from $c$ and $W_\cdot$. Here, only $c(\omega)$ and the values of $W_s(\omega)$, for $s \le t$, are used for the construction of the value $X_t(\omega)$.

Fig. 6. $X_t$ as a function of $c$ and $W_s$ for $s \le t$.

(6.1.5) Remark. Stochastic differential equations of the form
$$dY_t = f(t, Y_t, W_t)\,dt + G(t, Y_t, W_t)\,dW_t, \qquad Y_{t_0} = c,$$
can be converted to the type (6.1.1a) by adding the equation
$$dW_t = dW_t$$
and shifting to the $(d + m)$-dimensional state vector
$$X_t = \begin{pmatrix} Y_t \\ W_t \end{pmatrix}$$
and hence to the equation
$$dX_t = \begin{pmatrix} f \\ 0 \end{pmatrix} dt + \begin{pmatrix} G \\ I \end{pmatrix} dW_t, \qquad X_{t_0} = \begin{pmatrix} c \\ 0 \end{pmatrix}.$$
The coefficients $f$ and $G$ can depend on $\omega$ in a more general manner as long as they are nonanticipating. For further details, see Gikhman and Skorokhod [36], pp. 50-53.
(6.1.6) Remark. An $n$th-order stochastic differential equation of the form
$$Y_t^{(n)} = f\bigl(t, Y_t, \dot Y_t, \dots, Y_t^{(n-1)}\bigr) + G\bigl(t, Y_t, \dot Y_t, \dots, Y_t^{(n-1)}\bigr)\,\xi_t,$$
with initial values $Y_{t_0}^{(i)} = c_i$ for $i = 0, 1, \dots, n - 1$, where $Y_t$ is $R^d$-valued and $\xi_t$ is an $m$-dimensional white noise, can be converted in the usual manner, by
$$dX_t = d \begin{pmatrix} Y_t \\ \dot Y_t \\ \vdots \\ Y_t^{(n-1)} \end{pmatrix} = \begin{pmatrix} \dot Y_t \\ \vdots \\ Y_t^{(n-1)} \\ f\bigl(t, Y_t, \dots, Y_t^{(n-1)}\bigr) \end{pmatrix} dt + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ G\bigl(t, Y_t, \dots, Y_t^{(n-1)}\bigr) \end{pmatrix} dW_t,$$
into a first-order stochastic differential equation of the type (6.1.1a) for the $R^{dn}$-valued process $X_t$ with initial value $X_{t_0} = (c_0, \dots, c_{n-1})'$. We shall consider the case $n = 2$ in detail in section 7.1 (example (7.1.6)).
(6.1.7) Remark. Equation (6.1.1b) is equivalent to
$$X_t - X_s = \int_s^t f(u, X_u)\,du + \int_s^t G(u, X_u)\,dW_u, \qquad X_{t_0} = c,$$
where $t_0 \le s \le t \le T$. From this it follows that, if $X_t(t_0, c)$ satisfies equation (6.1.1b), then the "semigroup property"
$$X_t(t_0, c) = X_t\bigl(s, X_s(t_0, c)\bigr), \qquad t_0 \le s \le t \le T,$$
holds.
(6.1.8) Remark. If $X_t$ is a solution of (6.1.1), then every process stochastically equivalent to $X_t$ is also a solution. Specifically, if, for every fixed $t \in [t_0, T]$,
$$\tilde X_t = X_t$$
with probability 1 (where the exceptional set must belong to $\mathfrak{F}_t$), then, by virtue of the tacitly assumed separability of all processes, we have, for almost all $\omega$,
$$\tilde X_\cdot(\omega) = X_\cdot(\omega) \quad \text{on } [t_0, T].$$
This implies
$$\int_{t_0}^t f(s, \tilde X_s)\,ds = \int_{t_0}^t f(s, X_s)\,ds$$
and, by virtue of Corollary (4.5.4),
$$\int_{t_0}^t G(s, \tilde X_s)\,dW_s = \int_{t_0}^t G(s, X_s)\,dW_s.$$
Substitution of a solution (which for the moment we assume to exist) into the right-hand member of (6.1.1b) yields, in accordance with remark (5.3.2), a continuous function of $t$, which, being a solution, is at the same time almost certainly equal to the left-hand member. It follows that, for every solution of (6.1.1), there exists a stochastically equivalent solution with almost certainly continuous sample functions. Therefore, we shall always consider continuous solutions of stochastic differential equations.
(6.1.9) Example. If $G \equiv 0$, the fluctuational term in (6.1.1) disappears. We interpret (6.1.1) as the ordinary differential equation
$$\dot X_t = f(t, X_t), \qquad t_0 \le t \le T,$$
with initial condition $X_{t_0} = c$. A random influence can show up only in the initial value $c$.
(6.1.10) Example. If the functions $f(t, x) = f(t)$ and $G(t, x) = G(t)$ are independent of $x \in R^d$ and if $f \in L^1[t_0, T]$ and $G \in L^2[t_0, T]$, then
$$dX_t = f(t)\,dt + G(t)\,dW_t$$
is a stochastic differential whose coefficients are independent of $X_t$ and hence independent of $\omega$. Therefore, on $[t_0, T]$, the unique solution of (6.1.1) is
$$X_t(\omega) = c(\omega) + \int_{t_0}^t f(s)\,ds + \int_{t_0}^t G(s)\,dW_s(\omega).$$
In accordance with remark (5.4.8), the process $X_t$ is, in the case of normally distributed or constant $c$, a $d$-dimensional continuous Gaussian process with independent increments, with expectation
$$E\,X_t = E\,c + \int_{t_0}^t f(s)\,ds$$
and with covariance matrix ($X_{t_0} = c$ and $X_t - X_{t_0}$ are independent!)
$$E\,(X_t - E\,X_t)(X_s - E\,X_s)' = \operatorname{Cov}(c, c') + \int_{t_0}^{\min(t, s)} G(u)\,G(u)'\,du,$$
for $s, t \in [t_0, T]$.


(6.1.11) Example. The example (5.4.5) shows that, if we choose $d = m = 1$ and a function $g \in L^2[t_0, T]$, the process
$$X_t = \exp\Bigl(-\tfrac12 \int_{t_0}^t g(s)^2\,ds + \int_{t_0}^t g(s)\,dW_s\Bigr)$$
is a solution of the equation
$$dX_t = g(t)\,X_t\,dW_t, \qquad X_{t_0} = 1.$$
In this case, $f(t, x) = 0$ and $G(t, x) = g(t)\,x$. The uniqueness of the solution for all functions $g$ that are bounded on $[t_0, T]$ follows from section 6.2. For the special case $g \equiv 1$, the equation
$$dX_t = X_t\,dW_t, \qquad X_{t_0} = 1,$$
has, for every interval $[t_0, T] \subset [0, \infty)$, the solution
$$X_t = \exp\bigl(W_t - W_{t_0} - (t - t_0)/2\bigr), \qquad t \ge t_0.$$
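The closed-form solution for $g \equiv 1$ can be checked numerically. The following sketch (added for illustration, not part of the original text; it assumes Python with NumPy and uses a simple Euler-Maruyama discretization) compares $\exp(W_t - W_{t_0} - (t - t_0)/2)$ with a direct discretization of $dX_t = X_t\,dW_t$ along the same simulated Wiener path:

    import numpy as np

    rng = np.random.default_rng(0)
    t0, T, n = 0.0, 1.0, 10_000
    dt = (T - t0) / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)  # Wiener increments
    W = np.cumsum(dW)                          # W_t - W_t0 on the grid

    # Euler-Maruyama discretization of dX = X dW, X_t0 = 1
    X = np.empty(n + 1)
    X[0] = 1.0
    for k in range(n):
        X[k + 1] = X[k] + X[k] * dW[k]

    # closed-form solution along the same path
    t = t0 + dt * np.arange(1, n + 1)
    X_exact = np.exp(W - (t - t0) / 2)

    print(np.max(np.abs(X[1:] - X_exact)))  # tends to 0 as dt -> 0

The discrepancy printed at the end is pure discretization error; it shrinks (at the usual strong order 1/2) as the step size decreases.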



6.2 Existence and Uniqueness of a Solution


To ensure existence and uniqueness of a solution of an ordinary differential equation
$$(6.2.1)\qquad \dot X_t = f(t, X_t), \qquad X_{t_0} = c,$$
on the interval $[t_0, T]$, we usually assume that $f(t, x)$ satisfies a so-called Lipschitz condition in $x$ and boundedness with respect to $t$ for every $x$. These conditions ensure that the Picard-Lindelöf iteration procedure
$$X_t^{(n)} = c + \int_{t_0}^t f(s, X_s^{(n-1)})\,ds, \qquad X_t^{(0)} = c,$$
will converge to a solution of the integral equation
$$X_t = c + \int_{t_0}^t f(s, X_s)\,ds,$$
which is equivalent to (6.2.1). Since an ordinary differential equation is a special case (with $G \equiv 0$) of a stochastic differential equation, and since we wish to obtain a solution of a stochastic differential equation by means of a similar iteration procedure, our sufficient conditions are modeled after the classical ones. We have the following theorem, which we shall refer to in what follows as the existence-and-uniqueness theorem.
(6.2.2) Theorem. Suppose that we have a stochastic differential equation
$$(6.2.3)\qquad dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, \qquad X_{t_0} = c, \quad t_0 \le t \le T < \infty,$$
where $W_t$ is an $R^m$-valued Wiener process and $c$ is a random variable independent of $W_t - W_{t_0}$ for $t \ge t_0$. Suppose that the $R^d$-valued function $f(t, x)$ and the ($d \times m$ matrix)-valued function $G(t, x)$ are defined and measurable on $[t_0, T] \times R^d$ and have the following properties: There exists a constant $K > 0$ such that
a) (Lipschitz condition) for all $t \in [t_0, T]$, $x \in R^d$, $y \in R^d$,
$$(6.2.4)\qquad |f(t, x) - f(t, y)| + |G(t, x) - G(t, y)| \le K\,|x - y|$$
$(|G|^2 = \operatorname{tr} G\,G')$;
b) (restriction on growth) for all $t \in [t_0, T]$ and $x \in R^d$,
$$(6.2.5)\qquad |f(t, x)|^2 + |G(t, x)|^2 \le K^2\,(1 + |x|^2).$$
Then, equation (6.2.3) has on $[t_0, T]$ a unique $R^d$-valued solution $X_t$, continuous with probability 1, that satisfies the initial condition $X_{t_0} = c$; that is, if $X_t$ and $Y_t$ are continuous solutions of (6.2.3) with the same initial value $c$, then
$$P\Bigl[\sup_{t_0 \le t \le T}\,|X_t - Y_t| > 0\Bigr] = 0.$$
Proof: First we shall prove the uniqueness; then, with the aid of an iteration procedure, we shall prove that a solution exists.
a) Uniqueness. Let $X_t$ and $Y_t$ denote two continuous solutions of (6.2.3). We would like to show that
$$E\,|X_t - Y_t|^2 = 0 \quad \text{for all } t \in [t_0, T].$$
However, since the second moments of $X_t$ and $Y_t$ are not necessarily finite, we must again work with a truncation procedure. Suppose that, for $N > 0$ and $t \in [t_0, T]$,
$$I_N(t) = \begin{cases} 1, & \text{if } |X_s| \le N \text{ and } |Y_s| \le N \text{ for } t_0 \le s \le t, \\ 0, & \text{otherwise.} \end{cases}$$
Since
$$I_N(t) = I_N(t)\,I_N(s), \qquad s \le t,$$
we have
$$(6.2.6)\qquad I_N(t)\,(X_t - Y_t) = I_N(t)\,\Bigl[\int_{t_0}^t I_N(s)\,\bigl(f(s, X_s) - f(s, Y_s)\bigr)\,ds + \int_{t_0}^t I_N(s)\,\bigl(G(s, X_s) - G(s, Y_s)\bigr)\,dW_s\Bigr].$$
The last integral is meaningful since $I_N(s)$ is nonanticipating (and, by virtue of the assumption on $X_t$ and $Y_t$, so are $G(s, X_s)$ and $G(s, Y_s)$ [see remark (4.3.8)]). For $s \in [t_0, t]$, the Lipschitz condition (6.2.4) implies
$$I_N(s)\,\bigl(|f(s, X_s) - f(s, Y_s)| + |G(s, X_s) - G(s, Y_s)|\bigr) \le K\,I_N(s)\,|X_s - Y_s| \le 2\,K\,N.$$
Therefore, the second moments of the two integrals in (6.2.6) exist. By virtue of the inequality $|x + y|^2 \le 2\,(|x|^2 + |y|^2)$, Schwarz's inequality, and formula (5.1.3a), we obtain from (6.2.6)
$$E\,I_N(t)\,|X_t - Y_t|^2 \le 2\,E\,\Bigl|\int_{t_0}^t I_N(s)\,(f(s, X_s) - f(s, Y_s))\,ds\Bigr|^2 + 2\,E\,\Bigl|\int_{t_0}^t I_N(s)\,(G(s, X_s) - G(s, Y_s))\,dW_s\Bigr|^2$$
$$\le 2\,(T - t_0) \int_{t_0}^t E\,I_N(s)\,|f(s, X_s) - f(s, Y_s)|^2\,ds + 2 \int_{t_0}^t E\,I_N(s)\,|G(s, X_s) - G(s, Y_s)|^2\,ds.$$

We now use condition (6.2.4) for the integrands and obtain, for $L = 2\,(T - t_0 + 1)\,K^2$,
$$(6.2.7)\qquad E\,I_N(t)\,|X_t - Y_t|^2 \le L \int_{t_0}^t E\,I_N(s)\,|X_s - Y_s|^2\,ds.$$
From this we would like to conclude that
$$(6.2.8)\qquad E\,I_N(t)\,|X_t - Y_t|^2 = 0, \qquad t \in [t_0, T],$$
which we can, in fact, do with the aid of the following lemma of Bellman and Gronwall: If $g \ge 0$ and $h$ are integrable on $[t_0, T]$ and if
$$g(t) \le L \int_{t_0}^t g(s)\,ds + h(t), \qquad t_0 \le t \le T,$$
for $L > 0$, then
$$g(t) \le h(t) + L \int_{t_0}^t e^{L\,(t - s)}\,h(s)\,ds, \qquad t_0 \le t \le T.$$
Proof can be found, for example, in Gikhman and Skorokhod [5], p. 393. The choices $h(t) \equiv 0$ and
$$g(t) = E\,I_N(t)\,|X_t - Y_t|^2$$
yield (6.2.8); that is,
$$I_N(t)\,X_t = I_N(t)\,Y_t \quad \text{with probability 1}$$
for every fixed $t \in [t_0, T]$. The inequality
$$P\bigl[I_N(t) \ne 1 \text{ in } [t_0, T]\bigr] \le P\Bigl[\sup_{t_0 \le t \le T}\,|X_t| > N\Bigr] + P\Bigl[\sup_{t_0 \le t \le T}\,|Y_t| > N\Bigr]$$
and the continuity (hence boundedness) of $X_t$ and $Y_t$ imply that we can make the right-hand member of the last inequality arbitrarily small by taking $N$ sufficiently great. Therefore,
$$X_t = Y_t$$
with probability 1 for every fixed $t \in [t_0, T]$ and hence for a countable dense set $M$ in $[t_0, T]$. Now $X_t$ and $Y_t$ are assumed to be almost certainly continuous, so that coincidence in $M$ implies coincidence throughout the entire interval $[t_0, T]$ and hence
$$P\Bigl[\sup_{t_0 \le t \le T}\,|X_t - Y_t| > 0\Bigr] = 0.$$
This proves the uniqueness of a continuous solution (with only satisfaction of the Lipschitz condition (6.2.4) assumed).
b) Existence. We first treat the case
$$E\,|c|^2 < \infty.$$
Let us begin an iteration procedure with $X_t^{(0)} = c$ and let us define, for $n \ge 1$ and $t \in [t_0, T]$,
$$(6.2.9)\qquad X_t^{(n)} = c + \int_{t_0}^t f(s, X_s^{(n-1)})\,ds + \int_{t_0}^t G(s, X_s^{(n-1)})\,dW_s.$$
If $X^{(n-1)}$ is nonanticipating and continuous, then, by virtue of the assumption (6.2.5), the right-hand member of (6.2.9) is a stochastic differential and hence, in accordance with remark (5.3.2), defines a nonanticipating continuous process $X_t^{(n)}$ on $[t_0, T]$. Now, $X_t^{(0)}$ is nonanticipating and continuous, and with it so are all the processes $X_t^{(n)}$ for $n \ge 1$.
Let us now show that $X_t^{(n)}$ converges uniformly on $[t_0, T]$ to a solution $X_t$ of equation (6.2.3) (in the sense of definition (6.1.3)). The assumption $E\,|c|^2 < \infty$ implies
$$\sup_{t_0 \le t \le T} E\,|X_t^{(0)}|^2 < \infty.$$
That this holds also for the subsequent processes $X_t^{(n)}$ can be seen as follows: by virtue of the inequality $|x + y + z|^2 \le 3\,(|x|^2 + |y|^2 + |z|^2)$ and (6.2.5), it follows from (6.2.9) that
$$E\,|X_t^{(n)}|^2 \le 3\,E\,|c|^2 + 3\,(T - t_0) \int_{t_0}^t K^2\,(1 + E\,|X_s^{(n-1)}|^2)\,ds + 3 \int_{t_0}^t K^2\,(1 + E\,|X_s^{(n-1)}|^2)\,ds$$
$$\le 3\,E\,|c|^2 + 3\,(T - t_0 + 1)\,K^2\,(T - t_0)\,\Bigl(1 + \sup_{t_0 \le s \le T} E\,|X_s^{(n-1)}|^2\Bigr),$$
so that
$$(6.2.10)\qquad \sup_{t_0 \le t \le T} E\,|X_t^{(n)}|^2 < \infty \quad \text{for all } n \ge 1,$$
since this is true for $n = 0$.
Since $E\,|c|^2 < \infty$, equation (6.2.7) for $X^{(n+1)} - X^{(n)}$ can now be derived without $I_N(t)$; that is,
$$E\,|X_t^{(n+1)} - X_t^{(n)}|^2 \le L \int_{t_0}^t E\,|X_s^{(n)} - X_s^{(n-1)}|^2\,ds,$$
with the same constant $L = 2\,(T - t_0 + 1)\,K^2$. If we carry out an iteration on this inequality, using the familiar Cauchy formula
$$\int_{t_0}^t dt_{n-1} \int_{t_0}^{t_{n-1}} dt_{n-2} \cdots \int_{t_0}^{t_1} g(s)\,ds = \int_{t_0}^t g(s)\,\frac{(t - s)^{n-1}}{(n - 1)!}\,ds,$$
we get
$$(6.2.11)\qquad E\,|X_t^{(n+1)} - X_t^{(n)}|^2 \le L^n \int_{t_0}^t \frac{(t - s)^{n-1}}{(n - 1)!}\,E\,|X_s^{(1)} - X_s^{(0)}|^2\,ds.$$
Now, under the assumption (6.2.5),
$$E\,|X_t^{(1)} - X_t^{(0)}|^2 \le 2\,(T - t_0 + 1)\,K^2 \int_{t_0}^t (1 + E\,|c|^2)\,ds \le L\,(T - t_0)\,(1 + E\,|c|^2) = C.$$
Therefore, it follows from (6.2.11) that
$$(6.2.12)\qquad \sup_{t_0 \le t \le T} E\,|X_t^{(n+1)} - X_t^{(n)}|^2 \le C\,\bigl(L\,(T - t_0)\bigr)^n/n!, \qquad n \ge 0.$$
To prove the uniform convergence of $X_t^{(n)}$ itself, we need to find an estimate for
$$d_n = \sup_{t_0 \le t \le T}\,|X_t^{(n+1)} - X_t^{(n)}|.$$
It follows from (6.2.9) that
$$d_n \le \int_{t_0}^T \bigl|f(s, X_s^{(n)}) - f(s, X_s^{(n-1)})\bigr|\,ds + \sup_{t_0 \le t \le T}\,\Bigl|\int_{t_0}^t \bigl(G(s, X_s^{(n)}) - G(s, X_s^{(n-1)})\bigr)\,dW_s\Bigr|.$$
If we now use inequality (5.1.5) and the Lipschitz condition, we get
$$E\,d_n^2 \le 2\,(T - t_0)\,K^2 \int_{t_0}^T E\,|X_s^{(n)} - X_s^{(n-1)}|^2\,ds + 2 \cdot 4\,K^2 \int_{t_0}^T E\,|X_s^{(n)} - X_s^{(n-1)}|^2\,ds.$$
Then, with (6.2.12) for $n \ge 0$,
$$E\,d_n^2 \le \bigl(2\,(T - t_0) + 8\bigr)\,K^2\,(T - t_0)\,C\,\bigl(L\,(T - t_0)\bigr)^{n-1}/(n - 1)! = C_1\,\bigl(L\,(T - t_0)\bigr)^{n-1}/(n - 1)!.$$
By virtue of the Borel-Cantelli lemma and Weierstrass's convergence criterion, convergence of the series
$$\sum_{n=1}^\infty P\,[d_n > n^{-2}] \le C_1 \sum_{n=1}^\infty \bigl(L\,(T - t_0)\bigr)^{n-1}\,n^4/(n - 1)!$$
implies that
$$\operatorname*{ac-lim}_{n \to \infty}\,\Bigl(X_t^{(0)} + \sum_{i=1}^n \bigl(X_t^{(i)} - X_t^{(i-1)}\bigr)\Bigr) = \operatorname*{ac-lim}_{n \to \infty}\,X_t^{(n)} = X_t$$
uniformly on $[t_0, T]$. Since $X_t$ is the limit of a sequence of nonanticipating functions and the uniform limit of a sequence of continuous functions, it is itself nonanticipating and continuous. Because of the restriction on the growth of $f$ and $G$ and the continuity of $X_t$, the right-hand member of (6.2.3) becomes, when we substitute $X_t$ into it, a meaningful stochastic differential. Therefore, it remains to show that $X_t$ satisfies the equation
$$(6.2.13)\qquad X_t = c + \int_{t_0}^t f(s, X_s)\,ds + \int_{t_0}^t G(s, X_s)\,dW_s$$
for all $t \in [t_0, T]$. Since $X_{t_0}^{(n)} = c$ (for $n \ge 0$), this is obvious for $t = t_0$. For $t \in (t_0, T]$, we take the limit in equation (6.2.9). By virtue of (6.2.4) and the uniform convergence of $X_t^{(n)}$, we have, with probability 1,
$$\Bigl|\int_{t_0}^t f(s, X_s^{(n)})\,ds - \int_{t_0}^t f(s, X_s)\,ds\Bigr| \le K \int_{t_0}^t |X_s^{(n)} - X_s|\,ds \to 0$$
and
$$\int_{t_0}^t \bigl|G(s, X_s^{(n)}) - G(s, X_s)\bigr|^2\,ds \le K^2 \int_{t_0}^t |X_s^{(n)} - X_s|^2\,ds \to 0,$$
so that
$$\operatorname*{ac-lim}_{n \to \infty} \int_{t_0}^t f(s, X_s^{(n)})\,ds = \int_{t_0}^t f(s, X_s)\,ds$$
and, by Theorem (4.4.14d),
$$\operatorname*{st-lim}_{n \to \infty} \int_{t_0}^t G(s, X_s^{(n)})\,dW_s = \int_{t_0}^t G(s, X_s)\,dW_s.$$
Equation (6.2.13) then follows. Thus, $X_t$ is a solution of equation (6.2.3). The case of a general initial condition $c$ is reduced to the case $E\,|c|^2 < \infty$ by defining
$$c_N = \begin{cases} c, & \text{if } |c| \le N, \\ 0, & \text{otherwise}, \end{cases}$$
and taking the limit as $N \to \infty$. For more details, see, for example, Gikhman and Skorokhod [5], pp. 395-397. $\blacksquare$
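For illustration (a sketch, not from the original text; Python with NumPy is assumed), the iteration (6.2.9) can be carried out on a fixed sampled Wiener path, with the integrals replaced by left-point Riemann and Itô sums. Here it is done for the scalar equation $dX_t = -X_t\,dt + dW_t$, $X_0 = c = 1$, whose coefficients obviously satisfy (6.2.4) and (6.2.5):

    import numpy as np

    rng = np.random.default_rng(1)
    n, T = 2_000, 1.0
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)

    f = lambda x: -x                 # drift, Lipschitz with K = 1
    G = lambda x: np.ones_like(x)    # constant diffusion

    X = np.full(n + 1, 1.0)          # X^(0) = c on the grid t_k = k dt
    for it in range(8):              # Picard iterations (6.2.9)
        drift = np.concatenate(([0.0], np.cumsum(f(X[:-1]) * dt)))
        noise = np.concatenate(([0.0], np.cumsum(G(X[:-1]) * dW)))
        X_new = 1.0 + drift + noise
        print(it, np.max(np.abs(X_new - X)))  # sup-distance of successive iterates
        X = X_new

The printed sup-distances between successive iterates die off rapidly, mirroring the factorial bound (6.2.12).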

6.3 Supplements to the Existence-and-Uniqueness Theorem


The method used for proving Theorem (6.2.2) consists basically in the Picard-Lindelöf iteration procedure, with the Borel-Cantelli lemma forming the bridge from estimation of the error to convergence of the iteration procedure.
The Lipschitz condition (6.2.4) guarantees that $f(t, x)$ and $G(t, x)$ do not change faster with change in $x$ than does the function $x$ itself. This implies in particular the continuity of $f(t, \cdot)$ and $G(t, \cdot)$ for all $t \in [t_0, T]$. Thus, functions that are discontinuous with respect to $x$, and even continuous functions of the type
$$f(t, x) = |x|^\alpha, \qquad 0 < \alpha < 1,$$
are excluded as coefficients. It is known that the classical problem ($d = 1$)
$$X_t = \int_{t_0}^t |X_s|^\alpha\,ds$$
has, for $\alpha \ge 1$, only the solution $X_t \equiv 0$, but, for $0 < \alpha < 1$, it has also the solution
$$X_t = \bigl(\beta\,(t - t_0)\bigr)^{1/\beta}, \qquad \beta = 1 - \alpha.$$
I. V. Girsanov has shown [37] that the equation
$$X_t = \int_0^t |X_s|^\alpha\,dW_s \qquad (d = m = 1)$$
has exactly one nonanticipating solution for $\alpha \ge 1/2$ but infinitely many for $0 < \alpha < 1/2$.
In order to be able to admit functions like $\sin x^2$ (which become steeper and steeper with increasing $x$) as coefficients, we need
(6.3.1) Corollary. The existence-and-uniqueness theorem remains valid if we replace the Lipschitz condition with the more general condition that, for every $N > 0$, there exist a constant $K_N$ such that, for all $t \in [t_0, T]$, $|x| \le N$, and $|y| \le N$,
$$(6.3.2)\qquad |f(t, x) - f(t, y)| + |G(t, x) - G(t, y)| \le K_N\,|x - y|.$$
To prove this, we again use a truncation procedure, as we have done several times already, and then take the limit as $N \to \infty$. For more details, see Gikhman and Skorokhod [36], pp. 45-47.
A convenient sufficient condition for satisfaction of a Lipschitz condition follows from the mean-value theorem of differential calculus. According to this theorem, for a scalar function $g(x)$, where $x \in R^d$, with partial derivatives $g_{x_i}$, we have
$$g(b) - g(a) = g_x\bigl(a + \vartheta\,(b - a)\bigr)'\,(b - a), \qquad 0 < \vartheta < 1, \quad a, b \in R^d,$$
where $g_x = (g_{x_1}, \dots, g_{x_d})'$. If $|g_x|$ is bounded on $R^d$ by a constant $C$, then, for all $x$ and $y$ in $R^d$,
$$|g(y) - g(x)| \le \sup_{z \in R^d} |g_x(z)|\,|y - x| \le C\,|y - x|,$$
which is a Lipschitz condition. When we apply this to each component of $f$ and $G$, we get
(6.3.3) Corollary. For the Lipschitz condition in the existence-and-uniqueness theorem (or its generalization (6.3.2)) to be satisfied, it is sufficient that the functions $f(t, x)$ and $G(t, x)$ have continuous partial derivatives of first order with respect to the components of $x$ for every $t \in [t_0, T]$ and that these be bounded on $[t_0, T] \times R^d$ (or, in the case of the generalization, on $[t_0, T] \times \{|x| \le N\}$).
Let us now discuss the meaning of the second assumption (namely, inequality (6.2.5)) in the existence-and-uniqueness theorem. This assumption bounds $f$ and $G$ uniformly with respect to $t \in [t_0, T]$ and allows at most linear growth of these functions with respect to $x$. If this condition is violated, we get the effect (familiar from the study of ordinary differential equations) of an "explosion" of the solution. Let us illustrate this with the scalar ordinary-differential-equation initial-value problem
$$dX_t = X_t^2\,dt, \qquad X_0 = c.$$
The solution is
$$X_t = \begin{cases} 0, & \text{if } c = 0, \\ (1/c - t)^{-1}, & \text{if } c \ne 0. \end{cases}$$
Thus, the trajectory $X_t$ is defined for $c > 0$ only on the interval $[0, 1/c)$. At $t = 1/c$, a so-called explosion takes place. For given $[0, T]$, there always exist initial values, namely, those for which $c \ge 1/T$, for which the solution $X_t$ is not defined throughout the entire interval $[0, T]$. The restriction on the growth of $f$ and $G$ guarantees that, with probability 1, the solution $X_t$ does not explode in the interval $[t_0, T]$, whatever the initial value $X_{t_0} = c$. For further remarks about explosions, see McKean [45] and also (6.3.6)-(6.3.8).
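The explosion is easy to observe numerically (an illustrative sketch, not from the original text): integrating $\dot X_t = X_t^2$, $X_0 = c = 2$, with small Euler steps, the trajectory leaves any prescribed bound close to the explosion time $1/c = 0.5$:

    # Euler integration of dX/dt = X^2, X_0 = c; explosion expected at t = 1/c
    c, dt = 2.0, 1e-5
    x, t = c, 0.0
    while x < 1e8 and t < 1.0:
        x += x * x * dt
        t += dt
    print(t, 1.0 / c)  # escape time approaches 1/c = 0.5 as dt decreases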
(6.3.4) Remarks concerning global solutions. If the functions $f$ and $G$ are defined on $[t_0, \infty) \times R^d$ and if the assumptions of the existence-and-uniqueness theorem hold on every finite subinterval $[t_0, T]$ of $[t_0, \infty)$, then the equation
$$X_t = c + \int_{t_0}^t f(s, X_s)\,ds + \int_{t_0}^t G(s, X_s)\,dW_s$$
has a unique solution $X_t$ defined on the entire half-line $[t_0, \infty)$. Such a solution is called a global solution. The assumptions listed are satisfied, in particular, in the following special case:
(6.3.5) Corollary. Consider the autonomous stochastic differential equation
$$dX_t = f(X_t)\,dt + G(X_t)\,dW_t, \qquad X_{t_0} = c.$$
(By autonomous is meant that $f(t, x) \equiv f(x)$ and $G(t, x) \equiv G(x)$, where $f(x) \in R^d$ and $G(x)$ is a $d \times m$ matrix.) For every initial value $c$ that is independent of the increments $W_t - W_{t_0}$ of the $m$-dimensional Wiener process for $t \ge t_0$, this equation has exactly one continuous global solution $X_t$ on the entire interval $[t_0, \infty)$ such that $X_{t_0} = c$, provided only the following Lipschitz condition is satisfied: there exists a positive constant $K$ such that, for all $x, y \in R^d$,
$$|f(x) - f(y)| + |G(x) - G(y)| \le K\,|x - y|.$$
The restriction on the growth of $f$ and $G$ follows from this global Lipschitz condition (fix $y = y_0$).
(6.3.6) Example. Suppose that $d = m = 1$. Consider the autonomous equation
$$dX_t = -\tfrac12\,e^{-2 X_t}\,dt + e^{-X_t}\,dW_t.$$
The coefficients in this equation do not satisfy any Lipschitz condition (or any growth restriction) for $x < 0$. Therefore, we must allow for possible explosions of the sample functions of $X_t$. The function
$$X_t = \log\bigl(W_t - W_{t_0} + e^c\bigr)$$
is a unique local solution on the interval $[t_0, \eta)$ with instant of explosion
$$\eta = \inf\{t:\ W_t - W_{t_0} = -e^c\} > 0.$$
One can verify this formally with the aid of Itô's theorem. The existence and uniqueness are consequences of the following theorem of McKean ([45], p. 54):
(6.3.7) Theorem. Suppose that $d = m = 1$. The autonomous stochastic differential equation
$$dX_t = f(X_t)\,dt + G(X_t)\,dW_t, \qquad X_{t_0} = c,$$
where $f$ and $G$ are continuously differentiable functions, has a unique local solution that is defined up to a (random) explosion time $\eta$ with $t_0 < \eta \le \infty$. If $\eta < \infty$, then $X_{\eta - 0} = -\infty$ or $+\infty$.
(6.3.8) Remark. In accordance with Corollary (6.3.5), the satisfaction of a Lipschitz condition (in fact, the growth restriction) on $f$ and $G$ implies that $\eta = \infty$ with probability 1.
A second sufficient condition for non-occurrence of an explosion in the case of continuously differentiable $G$ is that $f \equiv 0$, that is, that the systematic part be absent. This follows from application of a sensitive test for explosion discovered by W. Feller. (This test, as well as a $d$-dimensional analogue discovered by Khas'minskiy, can be found in McKean [45], p. 65.) It should be emphasized that, for $d \ge 2$, the condition $f \equiv 0$ is no longer in general sufficient to preclude an explosion (see McKean [45], p. 106 (problem 3)) if the growth of $G$ is not restricted.
(6.3.9) Remark. The above-cited example of Girsanov,
$$dX_t = |X_t|^\alpha\,dW_t, \qquad d = m = 1, \quad c = 0,$$
suggests that the Lipschitz condition for $G$ might possibly be replaced, in the scalar autonomous case, with the so-called Hölder condition with exponent $\alpha \ge 1/2$:
$$|G(x) - G(y)| \le K\,|x - y|^\alpha, \qquad x, y \in R^1, \quad \alpha \ge 1/2, \quad K \ge 0.$$
This is in fact the case (see, for example, W. J. Anderson [30], p. 76).
(6.3.10) Remark. By definition, the solution $X_t$ of a stochastic differential equation (as well as $X_t - X_u$, where $t \ge u \ge t_0$) is nonanticipating and hence is, for every $t \in [t_0, T]$, statistically independent of $W_s - W_t$ for $s > t$. We emphasize once again the functional point of view and refer to remark (6.1.4). By virtue of the construction of the solution by means of an iteration procedure, $X_t$ is not only $\mathfrak{F}_t$-measurable but measurable with respect to the sigma-algebra generated by $c$ and $W_s - W_{t_0}$ for $t_0 \le s \le t$. The function $g$ mentioned in remark (6.1.4) therefore depends only on $c$ and $W_s - W_{t_0}$, for $t_0 \le s \le t$:
$$X_t(\omega) = g\bigl(c(\omega);\ W_s(\omega) - W_{t_0}(\omega),\ t_0 \le s \le t\bigr).$$
For illustration, consider the example (6.3.12). More generally, we have the result that $X_t$ is a function of $X_s$ and $W_u - W_s$ for $s \le u \le t$, i.e.,
$$X_t = g_t\bigl(X_s;\ W_u - W_s,\ s \le u \le t\bigr).$$
(6.3.11) Example. Let us return to example (6.1.11) and consider, for $d = m = 1$, the stochastic differential equation
$$(6.3.12)\qquad dX_t = g(t)\,X_t\,dW_t, \qquad X_{t_0} = c, \quad t_0 \le t \le T.$$
Thus, $f(t, x) = 0$ and $G(t, x) = g(t)\,x$. The assumptions of the existence-and-uniqueness theorem are satisfied if $g(t)$ is measurable and bounded on $[t_0, T]$. Therefore a unique solution exists. We assert that this solution is
$$X_t = c\,\exp\Bigl(-\tfrac12 \int_{t_0}^t g(s)^2\,ds + \int_{t_0}^t g(s)\,dW_s\Bigr).$$
In any case, $X_{t_0} = c$. If $c = 0$, then $X_t \equiv 0$ is obviously a solution. Let us now suppose that $c > 0$. If we set
$$Y_t = \log c - \tfrac12 \int_{t_0}^t g(s)^2\,ds + \int_{t_0}^t g(s)\,dW_s$$
and evaluate the stochastic differential of $X_t = \exp Y_t$ with the aid of Itô's theorem, we get equation (6.3.12). If $c < 0$, we consider the process $-X_t$ and obtain the same result.
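Spelled out, the computation just referred to runs as follows: $dY_t = -\tfrac12\,g(t)^2\,dt + g(t)\,dW_t$, so that $(dY_t)^2 = g(t)^2\,dt$, and Itô's theorem applied to $X_t = e^{Y_t}$ gives
$$dX_t = e^{Y_t}\,dY_t + \tfrac12\,e^{Y_t}\,(dY_t)^2 = X_t\,\Bigl(-\tfrac12\,g(t)^2\,dt + g(t)\,dW_t\Bigr) + \tfrac12\,X_t\,g(t)^2\,dt = g(t)\,X_t\,dW_t,$$
which is (6.3.12).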
Other examples will be found, in particular, in Chapter 8.
Chapter 7
Properties of the Solutions of
Stochastic Differential Equations

7.1 The Moments of the Solutions


In this section, we shall assume that the conditions of the existence-and-uniqueness theorem (6.2.2) are satisfied, and we shall investigate the moments $E\,|X_t|^k$ of the solution of
$$(7.1.1)\qquad dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, \qquad X_{t_0} = c,$$
on $[t_0, T]$. These moments do not in general have to exist, but their existence carries over from the initial value $c$ to all values $X_t$. More precisely, we have
(7.1.2) Theorem. Suppose that the assumptions of the existence-and-uniqueness theorem are satisfied and that
$$E\,|c|^{2n} < \infty,$$
where $n$ is a positive integer. Then, for the solution $X_t$ of the stochastic differential equation (7.1.1) on $[t_0, T]$, where $T < \infty$,
$$(7.1.3)\qquad E\,|X_t|^{2n} \le \bigl(1 + E\,|c|^{2n}\bigr)\,e^{C\,(t - t_0)}$$
and
$$(7.1.4)\qquad E\,|X_t - c|^{2n} \le D\,\bigl(1 + E\,|c|^{2n}\bigr)\,(t - t_0)^n\,e^{C\,(t - t_0)},$$
where $C = 2\,n\,(2\,n + 1)\,K^2$ and $D$ are constants (dependent only on $n$, $K$, and $T - t_0$).
Proof. In accordance with Itô's theorem, $|X_t|^{2n}$ has a stochastic differential with the following integral form:
$$|X_t|^{2n} = |c|^{2n} + \int_{t_0}^t 2\,n\,|X_s|^{2n-2}\,X_s'\,f(s, X_s)\,ds + \int_{t_0}^t 2\,n\,|X_s|^{2n-2}\,X_s'\,G(s, X_s)\,dW_s$$
$$+ \int_{t_0}^t n\,|X_s|^{2n-2}\,|G(s, X_s)|^2\,ds + \int_{t_0}^t 2\,n\,(n - 1)\,|X_s|^{2n-4}\,|X_s'\,G(s, X_s)|^2\,ds$$
(where the last term is absent if $n = 1$). That $E\,|X_t|^{2n}$ exists if $E\,|c|^{2n} < \infty$ follows from step b) in the proof of Theorem (6.2.2). We take the expectation on both sides of the last equation and, keeping the relationship
$$E\,\Bigl(\int_{t_0}^t 2\,n\,|X_s|^{2n-2}\,X_s'\,G(s, X_s)\,dW_s\Bigr) = 0$$
(see (4.5.14)) and equation (6.2.5) in mind, we obtain
$$E\,|X_t|^{2n} \le E\,|c|^{2n} + \int_{t_0}^t E\,\Bigl(2\,n\,|X_s|^{2n-2}\,X_s'\,f(s, X_s) + n\,|X_s|^{2n-2}\,|G(s, X_s)|^2 + 2\,n\,(n - 1)\,|X_s|^{2n-4}\,|X_s'\,G(s, X_s)|^2\Bigr)\,ds$$
$$\le E\,|c|^{2n} + (2\,n + 1)\,n\,K^2 \int_{t_0}^t E\,(1 + |X_s|^2)\,|X_s|^{2n-2}\,ds.$$
Since $(1 + |x|^2)\,|x|^{2n-2} \le 1 + 2\,|x|^{2n}$, we have
$$E\,|X_t|^{2n} \le E\,|c|^{2n} + (2\,n + 1)\,n\,K^2\,(t - t_0) + 2\,n\,(2\,n + 1)\,K^2 \int_{t_0}^t E\,|X_s|^{2n}\,ds.$$
Therefore, by virtue of the Bellman-Gronwall lemma already used in the proof of Theorem (6.2.2), we have
$$E\,|X_t|^{2n} \le h(t) + 2\,n\,(2\,n + 1)\,K^2 \int_{t_0}^t \exp\bigl(2\,n\,(2\,n + 1)\,K^2\,(t - s)\bigr)\,h(s)\,ds,$$
where
$$h(t) = E\,|c|^{2n} + (2\,n + 1)\,n\,K^2\,(t - t_0),$$
from which (7.1.3) follows.
Inequality (7.1.4) is obtained in a similar manner. We shall treat only the case $n = 1$ and refer the reader to Gikhman and Skorokhod [36], pp. 49-50, for general $n$ (though in the scalar case).
Since $|a + b|^2 \le 2\,(|a|^2 + |b|^2)$, we have
$$E\,|X_t - c|^2 \le 2\,E\,\Bigl|\int_{t_0}^t f(s, X_s)\,ds\Bigr|^2 + 2\,E\,\Bigl|\int_{t_0}^t G(s, X_s)\,dW_s\Bigr|^2 \le 2\,(T - t_0) \int_{t_0}^t E\,|f(s, X_s)|^2\,ds + 2 \int_{t_0}^t E\,|G(s, X_s)|^2\,ds.$$
The growth restriction (6.2.5) yields, with $L = 2\,(T - t_0 + 1)\,K^2$,
$$E\,|X_t - c|^2 \le L \int_{t_0}^t (1 + E\,|X_s|^2)\,ds,$$
and, with the result (7.1.3),
$$E\,|X_t - c|^2 \le L \int_{t_0}^t \bigl(1 + (1 + E\,|c|^2)\,e^{C\,(s - t_0)}\bigr)\,ds = L\,(t - t_0)\,\Bigl(1 + (1 + E\,|c|^2)\,\bigl(e^{C\,(t - t_0)} - 1\bigr)/C\,(t - t_0)\Bigr)$$
$$\le L\,(t - t_0)\,\bigl(1 + (1 + E\,|c|^2)\,e^{C\,(t - t_0)}\bigr) \le D\,(1 + E\,|c|^2)\,(t - t_0)\,e^{C\,(t - t_0)}$$
with $D = 2\,L$. $\blacksquare$


(7.1.5) Remark. By virtue of remark (6.1.7), $X_t = X_t(t_0, c)$ is also a solution of the same stochastic differential equation on every subinterval $[s, T]$, where $t_0 \le s$, with the initial condition $X_s = X_s(t_0, c)$. Therefore, in (7.1.4), we can replace $c$ with $X_s$ and $t_0$ with $s$. Then, by virtue of (7.1.3), we get the inequality
$$E\,|X_t - X_s|^{2n} \le C_1\,|t - s|^n, \qquad t, s \in [t_0, T],$$
where $C_1$ depends only on $n$, $K$, $T - t_0$, and $E\,|c|^{2n}$. For $n = 1$, it then follows under the assumption $E\,|c|^2 < \infty$ that
$$\lim_{s \to t} E\,|X_t - X_s|^2 = 0;$$
that is, the solution $X_t$ is mean-square-continuous at every point of the interval $[t_0, T]$ (but this does not imply mean-square differentiability [see also section 7.2]).
Of great importance are the functions $E\,X_t$ and $K(s, t) = E\,X_s\,X_t'$, which are meaningful for $E\,|c|^2 < \infty$, although these do not in the general (nonlinear) case satisfy any simple equation. For example,
$$m_t = E\,X_t = E\,c + \int_{t_0}^t E\,f(s, X_s)\,ds, \qquad t_0 \le t \le T,$$
but $E\,f(s, X_s)$ cannot in general be expressed as a function of $m_s$. A similar situation (see example (5.4.9)) holds for
$$\operatorname{tr} K(t, t) = E\,|X_t|^2 = E\,|c|^2 + \int_{t_0}^t 2\,E\,\bigl(X_s'\,f(s, X_s)\bigr)\,ds + \int_{t_0}^t E\,|G(s, X_s)|^2\,ds.$$
Closed expressions can be obtained for $m_t$ and $K(s, t)$ in the linear case (see Chapter 8). For the case $d = 2$, $m = 1$, the following is an example:
(7.1.6) Example (of a second-order stochastic differential equation). J. Goldstein [37a] has investigated the scalar second-order differential equation
$$\ddot Y_t = f(t, Y_t, \dot Y_t) + G(t, Y_t, \dot Y_t)\,\xi_t, \qquad t_0 \le t \le T,$$
with initial conditions $Y_{t_0} = c_0$, $\dot Y_{t_0} = c_1$, which is disturbed by a scalar white noise $\xi_t$. Using remark (6.1.6), we convert this equation into a stochastic differential equation for the two-dimensional process $X_t = (Y_t, \dot Y_t)'$:
$$dX_t = d \begin{pmatrix} Y_t \\ \dot Y_t \end{pmatrix} = \begin{pmatrix} \dot Y_t \\ f(t, Y_t, \dot Y_t) \end{pmatrix} dt + \begin{pmatrix} 0 \\ G(t, Y_t, \dot Y_t) \end{pmatrix} dW_t.$$
The existence-and-uniqueness theorem (6.2.2) ensures the existence of a unique solution if the original coefficients $f$ and $G$ satisfy the Lipschitz and boundedness conditions.
The sample functions of the process $Y_t$ are almost certainly differentiable with derivative $\dot Y_t$, although $\dot Y_t$ is not in general of bounded variation or differentiable (see section 7.2). If $Y_t$ is interpreted as the position of a particle, this particle possesses a velocity though in general (for $G \ne 0$) no acceleration. Compare Chapter 8 for linear $f$ and $G$.
For the evaluation of the first two moments of $Y_t$ and $\dot Y_t$ (whose existence is ensured, by virtue of (7.1.2), under the assumptions $E\,c_0^2 < \infty$ and $E\,c_1^2 < \infty$), we write for brevity $f_0(t) = f(t, Y_t, \dot Y_t)$ and $G_0(t) = G(t, Y_t, \dot Y_t)$. Then,
$$E\,Y_t = E\,c_0 + \int_{t_0}^t E\,\dot Y_s\,ds, \qquad E\,\dot Y_t = E\,c_1 + \int_{t_0}^t E\,f_0(s)\,ds,$$
and, for the variances (see Goldstein [37a], pp. 48-49),
$$V(Y_t) = V(c_0) + V\Bigl((t - t_0)\,c_1 + \int_{t_0}^t (t - s)\,f_0(s)\,ds\Bigr) + \int_{t_0}^t (t - s)^2\,E\,G_0(s)^2\,ds,$$
$$V(\dot Y_t) = V(c_1) + V\Bigl(\int_{t_0}^t f_0(s)\,ds\Bigr) + \int_{t_0}^t E\,G_0(s)^2\,ds.$$
In particular, if $|G(t, x^1, x^2)| \ge a > 0$ on $[t_0, T] \times R^2$, the last two equations imply
$$V(Y_t) \ge a^2\,(t - t_0)^3/3, \qquad V(\dot Y_t) \ge a^2\,(t - t_0).$$
In the case $T = \infty$, the variances of both components must therefore approach $\infty$. A consequence of this is that neither $Y_t$ nor $\dot Y_t$ can remain indefinitely in a bounded set.
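The cubic growth of $V(Y_t)$ can be seen numerically (a sketch added for illustration, not part of the original text; NumPy assumed). Take $f \equiv 0$, $G \equiv a$ constant, and $c_0 = c_1 = 0$; then $V(Y_t) = a^2\,(t - t_0)^3/3$ exactly, and a Monte Carlo estimate over discretized paths reproduces it:

    import numpy as np

    rng = np.random.default_rng(2)
    a, T, n, paths = 1.0, 1.0, 1_000, 20_000
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))

    # dY = V dt, dV = a dW   (f = 0, G = a, c0 = c1 = 0)
    V = np.cumsum(a * dW, axis=1)    # velocity paths
    Y = np.cumsum(V, axis=1) * dt    # position paths (Euler sum)

    print(Y[:, -1].var(), a**2 * T**3 / 3)  # both approximately 1/3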

7.2 Analytical Properties of the Solutions

Again, we assume that the conditions of the existence-and-uniqueness theorem (6.2.2) are satisfied. Let us investigate the sample functions of the solution process $X_t$ of
$$(7.2.1)\qquad X_t = c + \int_{t_0}^t f(s, X_s)\,ds + \int_{t_0}^t G(s, X_s)\,dW_s$$
as $R^d$-valued functions of $t$ on the interval $[t_0, T]$, where $T < \infty$.
We know from the existence-and-uniqueness theorem that almost all sample functions $X_\cdot(\omega)$ are continuous functions on $[t_0, T]$. The results already obtained regarding other properties of $X_t$ allow us to state qualitatively that, as long as $G$ does not vanish, the properties of $W_t$ (unbounded variation, nondifferentiability, the local log log law [see section 3.1]) carry over to $X_t$. Here, the presence of the systematic term $f(t, X_t)\,dt$ plays no role since
$$X_t^{(1)} = \int_{t_0}^t f(s, X_s)\,ds$$
has absolutely continuous (that is, almost everywhere [$\lambda$] differentiable and of bounded variation) and, in the case of continuous $f$, continuously differentiable sample functions such that
$$\dot X_t^{(1)} = f(t, X_t).$$
On the other hand, the fluctuational part
$$X_t^{(2)} = \int_{t_0}^t G(s, X_s)\,dW_s$$
reflects the above-mentioned irregularities no matter how smooth the function $G$ may be, as long as it does not vanish (see remark (5.2.8)). Only at those points such that $G(t, x) = 0$ can we hope for smooth (for example, differentiable) sample functions of $X_t$.
We now cite a few new results that provide a justification for these qualitative remarks.
(7.2.2) Theorem (Goldstein [37a], p. 31). If $X_t$ is the solution of equation (7.2.1), $t_0 < t_1 < \dots < t_n = T$ is a partition of the interval $[t_0, T]$, and $\delta_n = \max_k\,(t_k - t_{k-1})$, then
$$\operatorname*{st-lim}_{\delta_n \to 0}\,\sum_{k=1}^n (X_{t_k} - X_{t_{k-1}})(X_{t_k} - X_{t_{k-1}})' = \int_{t_0}^T G(s, X_s)\,G(s, X_s)'\,ds$$
and, in particular,
$$(7.2.3)\qquad \operatorname*{st-lim}_{\delta_n \to 0}\,\sum_{k=1}^n |X_{t_k} - X_{t_{k-1}}|^2 = \int_{t_0}^T |G(s, X_s)|^2\,ds.$$
(7.2.4) Corollary. If, for some $p \in \{1, 2, \dots, d\}$, the inequality
$$\sum_{i=1}^m |G_{pi}(t, X_t)|^2 > 0$$
holds at almost all [$\lambda$] points $t \in [t_0, T]$ with probability 1, then the $p$th component of $X_t$ is almost certainly of unbounded variation in every subinterval of $[t_0, T]$.
Proof. The assertions follow immediately from the inequality
$$\sum_{k=1}^n |X_{t_k}^p - X_{t_{k-1}}^p|^2 \le \max_{1 \le k \le n}\,|X_{t_k}^p - X_{t_{k-1}}^p|\ \sum_{k=1}^n |X_{t_k}^p - X_{t_{k-1}}^p|,$$
the continuity of $X_t$, and Theorem (7.2.2). $\blacksquare$


With regard to the law of the iterated logarithm for $X_t$, one can find only results for the autonomous case in McKean [45], p. 96, and Anderson [30], pp. 51-57. However, Theorem 3.2.1 of Anderson ([30], p. 51) can be applied directly to the proof of the following assertion:
(7.2.5) Theorem. If $X_t$ is the solution of equation (7.2.1) and if $f$ and $G$ are continuous with respect to $t$ for $t \in [t_0, T]$, then, for every fixed $t \in [t_0, T]$,
$$\limsup_{h \downarrow 0}\,\frac{|X_{t+h} - X_t|}{\sqrt{2\,h\,\log\log(1/h)}} = \lambda(t, X_t)^{1/2}$$
with probability 1, where $\lambda(t, x)$ is the greatest eigenvalue of the (necessarily nonnegative-definite) matrix $G(t, x)\,G(t, x)'$.
More precisely, we can show that the cluster points of $(X_{t+h} - X_t)/(2\,h\,\log\log(1/h))^{1/2}$ as $h \downarrow 0$ are almost certainly all points of that ellipsoid in $R^d$ whose principal axes have the directions of the eigenvectors, and whose lengths are the roots of the corresponding eigenvalues, of $G(t, X_t)\,G(t, X_t)'$. Therefore, this ellipsoid depends on $t$ and $X_t$ (and hence also on $\omega$) (see Arnold [30a]).
Just as in section 3.1 for $W_t$, we can again conclude from the last remark that $X_\cdot(\omega)$ is nondifferentiable at the instant $t$ provided $G(t, X_t(\omega)) \ne 0$.
The possibility of smooth behavior on the part of $X_t$ occurs only in the case $G(t, X_t) = 0$. In this connection, we cite a result of Anderson ([30], p. 59):
(7.2.6) Theorem. If $X_t$ is the solution of equation (7.2.1), if $f$ and $G$ are continuous for $t \in [t_0, T]$, and if the initial value $c$ is almost certainly constant, then, in the case $G(t_0, c) = 0$,
$$\lim_{t \downarrow t_0}\,\frac{X_t - c}{t - t_0} = f(t_0, c)$$
with probability 1. Thus, differentiability of the solution of a stochastic differential equation is the exception and nondifferentiability is the rule. The formal differential equation
$$\dot X_t = f(t, X_t) + G(t, X_t)\,\xi_t,$$
where $\xi_t$ is an $m$-dimensional white noise, cannot therefore as a rule be interpreted as an ordinary differential equation for the function $X_t$.

7.3 Dependence of the Solutions on Parameters and Initial Values


The value $X_t(\omega)$ of the solution trajectory of the stochastic differential equation
$$dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, \qquad X_{t_0} = c, \quad t_0 \le t \le T,$$
is, in accordance with remark (6.3.10), a function (determined uniquely by $f$ and $G$) of the two (independent) random elements $c(\omega)$ and $W_s(\omega) - W_{t_0}(\omega)$, for $t_0 \le s \le t$:
$$X_t(\omega) = g\bigl(c(\omega);\ W_s(\omega) - W_{t_0}(\omega),\ t_0 \le s \le t\bigr).$$
Just as with ordinary differential equations, we are interested again here in the manner in which the function $g$ depends on the initial value $c$ and on any parameters that may appear in $f$ and $G$.
In this section, we shall give two theorems, for the proofs of which we refer to Gikhman and Skorokhod [36] or Skorokhod [47].
(7.3.1) Theorem. Let $X_t(p)$ denote the solution of the stochastic differential equation
$$dX_t = f(p, t, X_t)\,dt + G(p, t, X_t)\,dW_t, \qquad X_{t_0} = c(p),$$
on the interval $[t_0, T]$, for $T < \infty$. Here, $p$ is a parameter, and the functions $f(p, t, x)$ and $G(p, t, x)$ satisfy, for all $p$, the conditions of the existence-and-uniqueness theorem. Suppose also that the following conditions are satisfied:
a) $\operatorname*{st-lim}_{p \to p_0}\,c(p) = c(p_0)$;
b) for every $N > 0$,
$$\lim_{p \to p_0}\ \sup_{t \in [t_0, T],\ |x| \le N} \bigl(|f(p, t, x) - f(p_0, t, x)| + |G(p, t, x) - G(p_0, t, x)|\bigr) = 0;$$
c) there exists a constant $K$ independent of $p$ such that
$$|f(p, t, x)|^2 + |G(p, t, x)|^2 \le K^2\,(1 + |x|^2).$$
Then,
$$\operatorname*{st-lim}_{p \to p_0}\ \sup_{t_0 \le t \le T}\,|X_t(p) - X_t(p_0)| = 0.$$

(7.3.2) Remark. If the functions $f$ and $G$ are independent of $p$, Theorem (7.3.1) implies the stochastically continuous dependence of the solution of a stochastic differential equation on the initial value $c$.
(7.3.3) Examples. a) Suppose that
$$dX_t(\varepsilon) = \varepsilon\,f\bigl(t, X_t(\varepsilon)\bigr)\,dt + dW_t,$$
where $\varepsilon$ is a small parameter. The solution for $\varepsilon = 0$ is
$$X_t(0) = c(0) + W_t - W_{t_0}.$$
If
$$\operatorname*{st-lim}_{\varepsilon \to 0}\,c(\varepsilon) = c(0),$$
we also have
$$\operatorname*{st-lim}_{\varepsilon \to 0}\ \sup_{t_0 \le t \le T}\,|X_t(\varepsilon) - X_t(0)| = 0.$$
b) An analogous conclusion may be drawn in the case
$$dX_t(\varepsilon) = f\bigl(t, X_t(\varepsilon)\bigr)\,dt + \varepsilon\,dW_t,$$
where $X_t(0)$ is the solution of the ordinary differential equation $\dot X_t = f(t, X_t)$ with a possibly random initial value.
We shall now investigate the special case of a constant initial value, though at an arbitrary instant $s \in [t_0, t]$. Let $X_t(s, x)$ denote the solution of the equation
$$(7.3.4)\qquad X_t(s, x) = x + \int_s^t f\bigl(u, X_u(s, x)\bigr)\,du + \int_s^t G\bigl(u, X_u(s, x)\bigr)\,dW_u,$$
where $t_0 \le s \le t \le T$, which satisfies the initial condition $X_s(s, x) = x \in R^d$.


(7.3.5) Definition. A stochastic process $X_t$ of the real parameter $t$ is said to be mean-square-differentiable at the point $t_1$, with random variable $Y_{t_1}$ as its derivative, if the second moments of $X_t$ and $Y_{t_1}$ exist and if
$$\lim_{h \to 0}\,E\,\bigl|(X_{t_1 + h} - X_{t_1})/h - Y_{t_1}\bigr|^2 = 0.$$
This concept of a derivative is used when one investigates the dependence of the solution $X_t(s, x)$ of equation (7.3.4) on the parameter $x$.
(7.3.6) Theorem. Suppose that the coefficients $f(t, x)$ and $G(t, x)$ of equation (7.3.4) are continuous with respect to $(t, x)$ and that they have bounded continuous first and second partial derivatives with respect to the $x_i$, where $x = (x_1, \dots, x_d)'$. Then, for fixed $t \in [s, T]$, the solution $X_t(s, x)$ is mean-square-continuous with respect to $(s, x)$ and is twice mean-square-differentiable with respect to the $x_i$. The derivatives
$$\frac{\partial}{\partial x_i}\,X_t(s, x), \qquad \frac{\partial^2}{\partial x_i\,\partial x_j}\,X_t(s, x)$$
are, as functions of $x$, mean-square-continuous, and they satisfy the stochastic differential equations that one obtains from (7.3.4) by partial differentiation, so that, for example, for $Y_t = \partial X_t(s, x)/\partial x_i$,
$$Y_t = e_i + \int_s^t f_x\bigl(u, X_u(s, x)\bigr)\,Y_u\,du + \int_s^t G_x\bigl(u, X_u(s, x)\bigr)\,Y_u\,dW_u.$$
Here, $e_i \in R^d$ is the unit vector in the $x_i$ direction, $f_x$ is the $d \times d$ matrix with column vectors $f_{x_i}$, and $G_x = (G_{x_1}, \dots, G_{x_d})$, where the $G_{x_i}$ are $d \times m$ matrices.
(7.3.7) Remark. For later use, we assert that, if the conditions of Theorem (7.3.6) are satisfied and if $g(x)$ is a bounded continuous function in $R^d$ with bounded continuous first and second partial derivatives, then, for fixed $t \in [t_0, T]$, the function
$$u(s, x) = E\,g\bigl(X_t(s, x)\bigr)$$
is continuous with respect to $(s, x) \in [t_0, t] \times R^d$, is bounded, and has continuous bounded first and second partial derivatives with respect to the $x_i$ and a continuous bounded derivative with respect to $s$.
Chapter 8
Linear Stochastic Differential
Equations

8.1 Introduction
For the differential equations of the type that we are studying in this book, namely,
$$\dot X_t = f(t, X_t) + G(t, X_t)\,\xi_t,$$
where $\xi_t$ is a Gaussian white noise, the right-hand member represents a linear function of the disturbance $\xi_t$. On the other hand, the functions $f$ and $G$ are in general nonlinear functions of the state $X_t$ of the system.
Just as with ordinary differential equations, a much more complete theory can be developed in the stochastic case when the coefficient functions $f(t, x)$ and $G(t, x)$ are linear functions of $x$, especially when $G$ is independent of $x$.
(8.1.1) Definition. A stochastic differential equation
$$dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t$$
for the $d$-dimensional process $X_t$ on the interval $[t_0, T]$ is said to be linear if the functions $f(t, x)$ and $G(t, x)$ are linear functions of $x \in R^d$ on $[t_0, T] \times R^d$, in other words, if
$$f(t, x) = A(t)\,x + a(t),$$
where $A(t)$ is ($d \times d$)-matrix-valued and $a(t)$ is $R^d$-valued, and if
$$G(t, x) = \bigl(B_1(t)\,x + b_1(t),\ \dots,\ B_m(t)\,x + b_m(t)\bigr),$$
where $B_k(t)$ is ($d \times d$)-matrix-valued and $b_k(t)$ is $R^d$-valued. Thus, a linear stochastic differential equation has the form
$$dX_t = \bigl(A(t)\,X_t + a(t)\bigr)\,dt + \sum_{i=1}^m \bigl(B_i(t)\,X_t + b_i(t)\bigr)\,dW_t^i,$$
where $W_t = (W_t^1, \dots, W_t^m)'$. It is said to be homogeneous if $a(t) \equiv b_1(t) \equiv \dots \equiv b_m(t) \equiv 0$. It is said to be linear in the narrow sense if $B_1(t) \equiv \dots \equiv B_m(t) \equiv 0$.
(8.1.2) Remark. We obtain linear stochastic differential equations if, in ordinary linear differential equations of the form
$$\dot X_t = A(t)\,X_t + a(t),$$
the coefficients $A(t)$ and $a(t)$ are "noisy" and/or a disturbance term independent of the state $X_t$ of the system is added to the right-hand member of this equation. Then, the equation takes the form
$$(8.1.3)\qquad \dot X_t = \bigl(A(t) + \tilde A(t)\bigr)\,X_t + \bigl(a(t) + \tilde a(t)\bigr).$$
Here, $\tilde A(t)$ is a $d \times d$ matrix and $\tilde a(t)$ is a $d$-dimensional vector whose elements are (possibly) correlated Gaussian noise processes with time-varying intensity such that
$$E\,\tilde A_{ij}(t)\,\tilde A_{kp}(s) = C_{ij,kp}(t)\,\delta(t - s),$$
$$E\,\tilde A_{ij}(t)\,\tilde a_k(s) = D_{ij,k}(t)\,\delta(t - s),$$
and
$$E\,\tilde a_i(t)\,\tilde a_k(s) = E_{ik}(t)\,\delta(t - s).$$
As usual, we make the substitution
$$\bigl(\tilde A(t),\ \tilde a(t)\bigr)\,dt = dY_t$$
(the elements of $\tilde A(t)$ and $\tilde a(t)$ are assumed written columnwise successively as a ($d^2 + d$)-dimensional column vector), where $Y_t$ is an $m$-dimensional Gaussian process with independent increments such that $m = d^2 + d$ and $E\,Y_t = 0$ and with $m \times m$ covariance matrix
$$E\,Y_t\,Y_s' = \int_{t_0}^{\min(t, s)} Q(u)\,du,$$
where
$$Q(t) = \begin{pmatrix} C(t) & D(t) \\ D(t)' & E(t) \end{pmatrix}.$$
By virtue of remark (5.2.4), such a process (we assume integrability of $Q(t)$ over the interval $[t_0, T]$) can be represented as
$$Y_t = \int_{t_0}^t \sqrt{Q(u)}\,dW_u,$$
that is,
$$dY_t = \sqrt{Q(t)}\,dW_t,$$
where $W_t$ is now an $m$-dimensional Wiener process (with independent components). Instead of $\sqrt{Q}$, any $m \times p$ matrix $G$ such that $G\,G' = Q$ can be used. This is important if $Q$ is a singular matrix, because we can then choose $p < m$. If we decompose $\sqrt{Q}$ in the form
$$\sqrt{Q} = \begin{pmatrix} \tilde B_1(t) & \cdots & \tilde B_m(t) \\ b_1(t) & \cdots & b_m(t) \end{pmatrix}, \qquad b_k \in R^d, \quad \tilde B_k \in R^{d^2},$$
equation (8.1.3) acquires the desired differential form
$$dX_t = \bigl(A(t)\,X_t + a(t)\bigr)\,dt + \sum_{i=1}^m \bigl(B_i(t)\,X_t + b_i(t)\bigr)\,dW_t^i,$$
where
$$B_i(t) = \bigl(B_i^{kp}(t)\bigr) = \bigl(\tilde B_i^{(p-1)d + k}(t)\bigr), \qquad 1 \le k, p \le d.$$
Thus, $B_i$ is the vector $\tilde B_i$ arranged as a $d \times d$ matrix.
(8.1.4) Example. On the basis of remark (6.1.6), let us rewrite the $n$th-order scalar differential equation
$$Y_t^{(n)} + \bigl(b_1(t) + \xi_1(t)\bigr)\,Y_t^{(n-1)} + \dots + \bigl(b_n(t) + \xi_n(t)\bigr)\,Y_t + \bigl(b_{n+1}(t) + \xi_{n+1}(t)\bigr) = 0,$$
where the $\xi_i(t)$ are in general correlated Gaussian noise processes with covariance
$$E\,\xi_i(t)\,\xi_j(s) = Q_{ij}(t)\,\delta(t - s),$$
as a first-order differential equation for
$$X_t = \begin{pmatrix} X_t^1 \\ \vdots \\ X_t^n \end{pmatrix} = \begin{pmatrix} Y_t \\ \vdots \\ Y_t^{(n-1)} \end{pmatrix}, \qquad d = n,$$
specifically,
$$\dot X_t^i = X_t^{i+1}, \qquad i = 1, \dots, n - 1,$$
$$\dot X_t^n = -\sum_{k=1}^n \bigl(b_k(t) + \xi_k(t)\bigr)\,X_t^{n+1-k} - \bigl(b_{n+1}(t) + \xi_{n+1}(t)\bigr),$$
and finally, in accordance with (8.1.2), as a linear stochastic differential equation in $X_t$ of the form
$$dX_t^i = X_t^{i+1}\,dt, \qquad i = 1, \dots, n - 1,$$
$$dX_t^n = -\Bigl(\sum_{k=1}^n b_k(t)\,X_t^{n+1-k} + b_{n+1}(t)\Bigr)\,dt - \sum_{p=1}^{n+1} \Bigl(\sum_{k=1}^n G_{kp}(t)\,X_t^{n+1-k} + G_{n+1,p}(t)\Bigr)\,dW_t^p,$$
where $G$ is an $(n + 1) \times (n + 1)$ matrix such that $G\,G' = Q$; hence, $m = n + 1$.
The following theorem is an immediate consequence of the existence-and-uniqueness Theorem (6.2.2).
(8.1.5) Theorem. The linear stochastic differential equation
$$dX_t = \bigl(A(t)\,X_t + a(t)\bigr)\,dt + \sum_{i=1}^m \bigl(B_i(t)\,X_t + b_i(t)\bigr)\,dW_t^i$$
has, for every initial value $X_{t_0} = c$ that is independent of $W_t - W_{t_0}$ (where $t \ge t_0$), a unique continuous solution throughout the interval $[t_0, T]$, provided only the functions $A(t)$, $a(t)$, $B_i(t)$, and $b_i(t)$ are measurable and bounded on that interval. If this assumption holds in every subinterval of $[t_0, \infty)$, there exists a unique global solution (i.e., defined for all $t \in [t_0, \infty)$).
(8.1.6) Corollary. A global solution always exists for the autonomous linear differential equation
$$dX_t = (A\,X_t + a)\,dt + \sum_{i=1}^m (B_i\,X_t + b_i)\,dW_t^i, \qquad X_{t_0} = c$$
(with coefficients $A$, $a$, $B_i$, and $b_i$ independent of $t$).
We now wish, if possible, to get a closed and explicit expression for this solution
and to investigate it.

8.2 Linear Equations in the Narrow Sense


In this section, we shall investigate those equations that are obtained from a de-
terministic linear system
I, = A (t) X,+a (t),
(where A (t) is a d X d matrix and X, and a (t) are vectors with components in
Rd) by the addition of a fluctuational term

B (t) E
(where B (t) is a d x m matrix and E, is an m-dimensional white noise) that is in-
dependent of the state of the system; that is, we shall investigate equations of
8.2 Linear Equations in the Narrow Sense 129

the form
(8.2.1) dX, = (A (t) X,+a (t)) dt+B (t) dW,.
Here, we have combined the in vectors b; appearing in definition (8.1.1) into a
single d x in matrix B = (bt, ... , b.). If the functions A (t), a (t), and B (t) are
measurable and bounded on [to, T] (as we shall assume to be the case in what
follows), there exists, by virtue of Theorem (8.1.5), for every initial value X,o=c
a unique solution.
Let us review a few familiar items regarding deterministic linear systems (B (t)
0) (see, for example, Bucy and Joseph [61) , p. 5).
The matrix 0 (t) = 0 (t, to) of solutions of the homogeneous equation
JIC,=A(t)X,
with unit vectors c=e; in thex;-direction as initial value, in other words, the so-
lution of the matrix equation
o (t) = A (t) 0 (t), 0 (to) =1,
is called the fundamental matrix of the system
X, _= A (t) X, +a (t).
The solution with initial value X,o = c can be represented with the aid of 0 (t) in
the following form:

X,=0(t)(c+f 0(s)-I a (s)ds


90

If, for example, A (t) __ A is independent oft, then


0
(t) = CA ('-to) = Z Aft (t - to)-/n!.
n=0
Therefore,
I
X, = e't('-'o) c + i e40-')a (s) ds.
to

With this knowledge, we can now easily determine the solution of the "nonho-
mogeneous" equation (8.2.1):
(8.2.2) Theorem. The linear (in the narrow sense) stochastic differential equation
$$dX_t = \bigl(A(t)\,X_t + a(t)\bigr)\,dt + B(t)\,dW_t, \qquad X_{t_0} = c,$$
has on $[t_0, T]$ the solution
$$(8.2.3)\qquad X_t = \Phi(t)\,\Bigl(c + \int_{t_0}^t \Phi(s)^{-1}\,a(s)\,ds + \int_{t_0}^t \Phi(s)^{-1}\,B(s)\,dW_s\Bigr).$$
Here, $\Phi(t)$ is the fundamental matrix of the deterministic equation $\dot X_t = A(t)\,X_t$.
Proof. If we set
$$Y_t = c + \int_{t_0}^t \Phi(s)^{-1}\,a(s)\,ds + \int_{t_0}^t \Phi(s)^{-1}\,B(s)\,dW_s,$$
then $Y_t$ has the stochastic differential
$$dY_t = \Phi(t)^{-1}\,\bigl(a(t)\,dt + B(t)\,dW_t\bigr).$$
Then, in accordance with Theorem (5.3.8), the process
$$X_t = \Phi(t)\,Y_t$$
has the stochastic differential
$$dX_t = \dot \Phi(t)\,Y_t\,dt + \Phi(t)\,dY_t = A(t)\,\Phi(t)\,Y_t\,dt + a(t)\,dt + B(t)\,dW_t = \bigl(A(t)\,X_t + a(t)\bigr)\,dt + B(t)\,dW_t. \qquad \blacksquare$$
In the above representation of the solution, we see clearly that the value $X_t$ is a functional (uniquely determined by the coefficients $A(t)$, $a(t)$, and $B(t)$) of $c$ and $W_s - W_{t_0}$ for $t_0 \le s \le t$ (see remark (6.3.10)).
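A numerical sketch of formula (8.2.3) (added for illustration, not part of the original text; it assumes Python with NumPy and SciPy, whose scipy.linalg.expm computes the matrix exponential): for constant $A$, $a \equiv 0$, and constant $B$, so that $\Phi(t) = e^{A\,(t - t_0)}$, the Itô integral is approximated by its left-point sum, and the result agrees pathwise with an Euler-Maruyama discretization of the equation itself:

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(3)
    A = np.array([[0.0, 1.0], [-1.0, -0.5]])  # d = 2
    B = np.array([[0.0], [1.0]])              # m = 1
    c = np.array([1.0, 0.0])
    T, n = 1.0, 5_000
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=(n, 1))

    # formula (8.2.3): X_T = e^{AT} (c + sum_k e^{-A t_k} B dW_k), with t_0 = 0
    M, Einv = np.eye(2), expm(-A * dt)
    stoch = np.zeros(2)
    for k in range(n):
        stoch += M @ (B @ dW[k])   # left-point Ito sum, M = Phi(t_k)^{-1}
        M = M @ Einv
    X_formula = expm(A * T) @ (c + stoch)

    # Euler-Maruyama discretization of dX = A X dt + B dW
    X = c.copy()
    for k in range(n):
        X = X + A @ X * dt + B @ dW[k]

    print(X_formula, X)  # the two agree up to the discretization error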
We mention in particular the following special cases:
(8.2.4) Corollary. If the matrix $A(t) \equiv A$ in equation (8.2.1) is independent of $t$, then
$$X_t = e^{A\,(t - t_0)}\,c + \int_{t_0}^t e^{A\,(t - s)}\,\bigl(a(s)\,ds + B(s)\,dW_s\bigr).$$
(8.2.5) Corollary. For $d = 1$ (but $m$ arbitrary),
$$\Phi(t) = \exp\Bigl(\int_{t_0}^t A(s)\,ds\Bigr)$$
and hence
$$X_t = \exp\Bigl(\int_{t_0}^t A(s)\,ds\Bigr)\,\Bigl(c + \int_{t_0}^t \exp\Bigl(-\int_{t_0}^s A(u)\,du\Bigr)\,\bigl(a(s)\,ds + B(s)\,dW_s\bigr)\Bigr).$$
In accordance with (7.1.2), the solution $X_t$ has moments of second order if $E\,|c|^2 < \infty$. In our special case, the first two moments of $X_t$ can easily be calculated from the explicit form of the solution:
(8.2.6) Theorem. For the solution $X_t$ of the linear stochastic differential equation
$$dX_t = \bigl(A(t)\,X_t + a(t)\bigr)\,dt + B(t)\,dW_t, \qquad X_{t_0} = c,$$
we have, under the assumption $E\,|c|^2 < \infty$:
a)
$$m_t = E\,X_t = \Phi(t)\,\Bigl(E\,c + \int_{t_0}^t \Phi(s)^{-1}\,a(s)\,ds\Bigr).$$
Therefore, $m_t$ is the solution of the deterministic linear differential equation
$$\dot m_t = A(t)\,m_t + a(t), \qquad m_{t_0} = E\,c.$$
b)
$$K(s, t) = E\,(X_s - E\,X_s)(X_t - E\,X_t)'$$
is given by
$$(8.2.7)\qquad K(s, t) = \Phi(s)\,\Bigl(E\,(c - E\,c)(c - E\,c)' + \int_{t_0}^{\min(s, t)} \Phi(u)^{-1}\,B(u)\,B(u)'\,\bigl(\Phi(u)^{-1}\bigr)'\,du\Bigr)\,\Phi(t)'.$$
In particular, the covariance matrix of the components of $X_t$,
$$K(t) = K(t, t) = E\,(X_t - E\,X_t)(X_t - E\,X_t)',$$
is the unique symmetric nonnegative-definite solution of the matrix equation
$$(8.2.8)\qquad \dot K(t) = A(t)\,K(t) + K(t)\,A(t)' + B(t)\,B(t)'$$
with the initial value $K(t_0) = E\,(c - E\,c)(c - E\,c)'$.
Proof. a) If we take the expectation on both sides of (8.2.3), we get the formula for $m_t$. Differentiation of this expression with respect to $t$ yields
$$\dot m_t = A(t)\,m_t + a(t),$$
which we also get immediately from the (integral form of the) stochastic differential equation by taking the expectation.
b) The formula for $K(s, t)$ follows also from (8.2.3) if we keep in mind the independence of $c$ and
$$\int_{t_0}^t \Phi(s)^{-1}\,B(s)\,dW_s.$$
In particular, if we differentiate $K(t, t)$ with respect to $t$, we get the differential equation for $K$, which we could also have obtained directly from the stochastic differential equation. The differential equation (8.2.8) for $K(t) = K(t)'$ satisfies the Lipschitz and boundedness conditions on $[t_0, T]$, so that a unique solution exists. Equation (8.2.8) therefore represents (in view of the symmetry of $K$) a system of $d\,(d + 1)/2$ linear equations.
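Since (8.2.8) and $\dot m_t = A(t)\,m_t + a(t)$ are ordinary differential equations, the first two moments can be computed without simulating any paths. A minimal sketch (not from the original text; forward Euler, NumPy assumed, constant coefficients and a constant initial value $c$):

    import numpy as np

    A = np.array([[0.0, 1.0], [-1.0, -0.5]])
    a = np.array([0.0, 1.0])
    B = np.array([[0.0], [1.0]])
    m = np.array([1.0, 0.0])   # m_t0 = E c
    K = np.zeros((2, 2))       # K(t0) = 0 for a constant c

    dt, n = 1e-4, 10_000       # integrate up to t - t0 = 1
    for _ in range(n):
        m = m + (A @ m + a) * dt                  # dm/dt = A m + a
        K = K + (A @ K + K @ A.T + B @ B.T) * dt  # equation (8.2.8)
    print(m, K)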
(8.2.9) Remark. Of particular interest is the behavior of
$$E\,|X_t - E\,X_t|^2 = E\,|X_t|^2 - |m_t|^2 = \operatorname{tr} K(t) = \sum_{i=1}^d K_{ii}(t).$$
By using the relationship $\operatorname{tr} A\,A' = |A|^2$, we obtain from the formula for $K(s, t)$
$$\operatorname{tr} K(t) = E\,|\Phi(t)\,(c - E\,c)|^2 + \int_{t_0}^t |\Phi(t)\,\Phi(s)^{-1}\,B(s)|^2\,ds.$$
In formula (8.2.3), the solution $X_t$ is represented as the sum of three statistically independent terms, the second of which is completely independent of $\omega$ and the third of which is, in accordance with Corollary (4.5.6), normally distributed. Therefore, the process $Y_t$ in
$$X_t = \Phi(t)\,c + Y_t$$
is always a Gaussian process, independent of $\Phi(t)\,c$, with independent increments and with distribution
$$\mathfrak{N}\Bigl(\Phi(t) \int_{t_0}^t \Phi(s)^{-1}\,a(s)\,ds,\ \int_{t_0}^t \Phi(t)\,\Phi(s)^{-1}\,B(s)\,B(s)'\,\bigl(\Phi(s)^{-1}\bigr)'\,\Phi(t)'\,ds\Bigr).$$
The process $X_t$ is itself Gaussian if and only if the initial value $c$ is normally distributed (or constant). We state this important special case as
(8.2.10) Theorem. The solution (8.2.3) of the linear equation
$$dX_t = \bigl(A(t)\,X_t + a(t)\bigr)\,dt + B(t)\,dW_t, \qquad X_{t_0} = c,$$
is a Gaussian stochastic process if and only if $c$ is normally distributed or constant. The mean value $m_t$ and the covariance matrix $E\,(X_s - m_s)(X_t - m_t)'$ are given in Theorem (8.2.6). The process $X_t$ has independent increments if and only if $c$ is constant or $A(t) \equiv 0$ (that is, $\Phi(t) \equiv I$).
Now that we know that the process is Gaussian in the case of normally distributed $c$, the question arises as to when it is stationary. A necessary and sufficient condition for this is
$$m_t \equiv \text{const}, \qquad K(s, t) = \bar K(s - t).$$
These conditions are certainly satisfied if
$$E\,c = 0, \qquad a(t) \equiv 0$$
(in this case, $m_t \equiv 0$) and
$$A(t) \equiv A, \qquad B(t) \equiv B$$
(that is, the original equation is autonomous and the solution $X_t$ exists on $[t_0, \infty)$), and if, furthermore, by virtue of (8.2.7),
$$(8.2.11)\qquad A\,K(0) + K(0)\,A' = -B\,B'$$
and
$$K(0) = E\,c\,c'.$$
The matrix equation (8.2.11) has a nonnegative-definite solution $K(0)$, namely,
$$K(0) = \int_0^\infty e^{A t}\,B\,B'\,e^{A' t}\,dt,$$
if the deterministic equation $\dot X_t = A\,X_t$ is asymptotically stable (that is, if all the eigenvalues of $A$ have negative real parts [see example (11.1.4) and Bucy and Joseph [61], p. 9]). Furthermore, from formula (8.2.7) we get, for $t = s$,
$$e^{-A\,(s - t_0)}\,K(0)\,e^{-A'\,(s - t_0)} = K(0) + \int_{t_0}^s e^{-A\,(u - t_0)}\,B\,B'\,e^{-A'\,(u - t_0)}\,du,$$
so that $K(s, s) \equiv K(0)$. Therefore, for general $t, s \ge t_0$,
$$\bar K(s - t) = K(s, t) = \begin{cases} e^{A\,(s - t)}\,K(0), & s \ge t, \\ K(0)\,e^{A'\,(t - s)}, & s \le t. \end{cases}$$
We write this result as
(8.2.12) Theorem. The solution of the equation
$$dX_t = \bigl(A(t)\,X_t + a(t)\bigr)\,dt + B(t)\,dW_t, \qquad X_{t_0} = c,$$
is a stationary Gaussian process if $A(t) \equiv A$, $a(t) \equiv 0$, $B(t) \equiv B$, the eigenvalues of $A$ have negative real parts, and $c$ is $\mathfrak{N}(0, K)$-distributed, where $K$ is the solution
$$K = \int_0^\infty e^{A t}\,B\,B'\,e^{A' t}\,dt$$
of the equation $A\,K + K\,A' = -B\,B'$. Then, for the process $X_t$,
$$E\,X_t \equiv 0$$
and
$$E\,X_s\,X_t' = \begin{cases} e^{A\,(s - t)}\,K, & s \ge t \ge t_0, \\ K\,e^{A'\,(t - s)}, & t \ge s \ge t_0. \end{cases}$$
Obviously, under the above conditions, the process $X_t$ is stationary in the wide sense, with the above first and second moments, even when $c$ is not normally distributed but $E\,c = 0$ and $E\,c\,c' = K$.
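In numerical work, the stationary covariance is usually obtained directly from the matrix equation $A\,K + K\,A' = -B\,B'$ rather than from the integral. A sketch (added for illustration, not part of the original text; it assumes SciPy, whose scipy.linalg.solve_continuous_lyapunov(A, Q) solves $A\,X + X\,A' = Q$):

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    A = np.array([[0.0, 1.0], [-1.0, -0.5]])  # eigenvalues with negative real parts
    B = np.array([[0.0], [1.0]])
    K = solve_continuous_lyapunov(A, -B @ B.T)  # A K + K A' = -B B'

    print(A @ K + K @ A.T + B @ B.T)  # residual, approximately the zero matrix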

8.3 The Ornstein-Uhlenbeck-Process


We shall now investigate the historically oldest example of a stochastic differential equation. For the Brownian motion of a particle under the influence of friction but no other force field, the so-called Langevin equation
$$\dot X_t = -\alpha\,X_t + \sigma\,\xi_t,$$
where $\alpha > 0$ and $\sigma$ are constants, has been derived in many ways (see, for example, Uhlenbeck and Ornstein [49] or Wang and Uhlenbeck [50]). Here, $X_t$ is one of the three scalar velocity components of the particle, and $\xi_t$ is a scalar white noise. The corresponding stochastic differential equation (here, $d = m = 1$, and we set $t_0 = 0$)
$$dX_t = -\alpha\,X_t\,dt + \sigma\,dW_t, \qquad X_0 = c,$$
is linear in the narrow sense and autonomous. Therefore, in accordance with Corollary (8.2.4), its unique solution is
$$X_t = e^{-\alpha t}\,c + \sigma \int_0^t e^{-\alpha\,(t - s)}\,dW_s.$$
In accordance with (8.2.6), $X_t$ has, in the case $E\,c^2 < \infty$, mean value
$$m_t = E\,X_t = e^{-\alpha t}\,E\,c$$
and covariance
$$K(s, t) = E\,(X_s - m_s)(X_t - m_t) = e^{-\alpha\,(t + s)}\,\bigl(\operatorname{Var}(c) + \sigma^2\,(e^{2\,\alpha \min(t, s)} - 1)/2\,\alpha\bigr).$$
In particular,
$$K(t, t) = \operatorname{Var}(X_t) = e^{-2\,\alpha t}\,\operatorname{Var}(c) + \sigma^2\,(1 - e^{-2\,\alpha t})/2\,\alpha.$$
For arbitrary $c$,
$$\operatorname*{ac-lim}_{t \to \infty}\,e^{-\alpha t}\,c = 0,$$
so that the distribution of $X_t$ approaches $\mathfrak{N}(0, \sigma^2/2\,\alpha)$ for arbitrary $c$ as $t \to \infty$.
For normally distributed or constant $c$, the solution $X_t$ is a Gaussian process, the so-called Ornstein-Uhlenbeck velocity process. If we begin with an $\mathfrak{N}(0, \sigma^2/2\,\alpha)$-distributed $c$, then $X_t$ is a stationary Gaussian process (sometimes called colored noise) such that $E\,X_t \equiv 0$ and
$$E\,X_s\,X_t = e^{-\alpha\,|t - s|}\,\sigma^2/2\,\alpha,$$
which we can also get from Theorem (8.2.12).
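Because the transition law between grid points is exactly the Gaussian distribution computed above, the Ornstein-Uhlenbeck velocity process can be sampled on a grid without discretization error (a sketch, not from the original text; NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(4)
    alpha, sigma = 1.0, 0.7
    dt, n = 0.01, 100_000

    # exact update: X_{t+dt} = e^{-alpha dt} X_t + N(0, sigma^2 (1 - e^{-2 alpha dt})/(2 alpha))
    rho = np.exp(-alpha * dt)
    s = sigma * np.sqrt((1.0 - rho**2) / (2.0 * alpha))

    X = np.empty(n + 1)
    X[0] = rng.normal(0.0, sigma / np.sqrt(2.0 * alpha))  # stationary start
    for k in range(n):
        X[k + 1] = rho * X[k] + s * rng.normal()

    print(X.var(), sigma**2 / (2.0 * alpha))  # both near the stationary variance

The update follows from the explicit solution: conditioned on $X_t$, the value $X_{t+\Delta}$ is Gaussian with mean $e^{-\alpha \Delta}\,X_t$ and variance $\sigma^2\,(1 - e^{-2\,\alpha \Delta})/2\,\alpha$.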
By integration of the velocity X we obtain the position
e

Y,=Y0+f X,ds
0

of the particle. If c and Y_0 are normally distributed or constant, Y_t is, together with X_t, a Gaussian process, the so-called Ornstein-Uhlenbeck (position) process. Of course, we can treat X_t and Y_t simultaneously by combining their equations into the single equation
$$ d\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \begin{pmatrix} -\alpha & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} X_t \\ Y_t \end{pmatrix} dt + \begin{pmatrix} \sigma \\ 0 \end{pmatrix} dW_t, \qquad \begin{pmatrix} X_0 \\ Y_0 \end{pmatrix} = \begin{pmatrix} c \\ Y_0 \end{pmatrix}. $$
We have
$$ EY_t = EY_0 + (1 - e^{-\alpha t})\,Ec/\alpha $$
and
$$ E(Y_s - EY_s)(Y_t - EY_t) = \operatorname{Var}(Y_0) + 2D\min(s,t) + \frac{D}{\alpha}\big( 2e^{-\alpha t} + 2e^{-\alpha s} - e^{-\alpha|t-s|} - e^{-\alpha(t+s)} - 2 \big), $$
where, as is customary, we have set
$$ D = \sigma^2/2\alpha^2. $$
If we now let α approach ∞ in such a way that D remains constant, we obtain
$$ EY_t \to EY_0, $$
$$ E(Y_s - EY_s)(Y_t - EY_t) \to \operatorname{Var}(Y_0) + 2D\min(s,t), $$
that is, all finite-dimensional distributions of the Ornstein-Uhlenbeck process Y_t converge to the distributions of the Gaussian process
$$ Y_t^{(0)} = Y_0 + \sqrt{2D}\,W_t. $$

But this is the Wiener process that starts at Y_0, multiplied by √(2D). In this sense, the Wiener process approximates the Ornstein-Uhlenbeck process. The sample functions of Y_t possess a derivative, namely, X_t. In the Ornstein-Uhlenbeck theory of Brownian motion, the particle therefore possesses a continuous velocity (but no acceleration), which ceases to exist when we shift to Y_t^{(0)}.
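A short simulation sketch may make this concrete (added for illustration only; the constants α, σ and the step size are arbitrary choices). It advances the velocity by its exact Gaussian one-step transition and accumulates the position as a Riemann sum, then compares the sample variances with the formulas above.

```python
# Illustrative sketch: Ornstein-Uhlenbeck velocity X_t via its exact one-step
# transition X_{t+h} = e^{-alpha h} X_t + N(0, sigma^2 (1 - e^{-2 alpha h})/(2 alpha)),
# and the position Y_t = Y_0 + integral of X_s ds, approximated by summation.
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma = 2.0, 1.0            # illustrative constants, alpha > 0
h, n_steps, n_paths = 0.01, 5000, 2000

X = rng.normal(0.0, np.sqrt(sigma**2 / (2*alpha)), size=n_paths)  # stationary start
Y = np.zeros(n_paths)
decay = np.exp(-alpha*h)
step_std = np.sqrt(sigma**2 * (1 - np.exp(-2*alpha*h)) / (2*alpha))

for _ in range(n_steps):
    Y += X * h                                  # running integral of the velocity
    X = decay * X + step_std * rng.normal(size=n_paths)

D = sigma**2 / (2*alpha**2)
print(X.var(), sigma**2 / (2*alpha))            # stationary variance of X_t
print(Y.var(), 2*D*n_steps*h)                   # ~ 2 D t for large t (Wiener limit)
```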
(8.3.1) Remark. An analogous electrical problem leads formally to the same Langevin equation. Let X_t denote the current in an inductance-resistance circuit. Then,
$$ L \dot X_t + R X_t = \varepsilon_t, $$
where ε_t is a rapidly fluctuating electromotive force generated by the thermal noise and again idealizable as a "white noise".
(8.3.2) Remark. The Langevin equation for the position X_t of a Brownian particle in an external force field K is
$$ \ddot X_t = -\alpha \dot X_t + K(t, X_t) + \sigma \xi_t. $$
By setting V_t = Ẋ_t, we obtain from this equation the system
$$ d\begin{pmatrix} V_t \\ X_t \end{pmatrix} = \begin{pmatrix} -\alpha V_t + K(t, X_t) \\ V_t \end{pmatrix} dt + \begin{pmatrix} \sigma \\ 0 \end{pmatrix} dW_t. $$
We can find a closed solution in the case of a harmonic oscillator, for which K(t, x) = -ω² x, so that the corresponding equation is linear (see Chandrasekhar [32], pp. 27-30).

8.4 The General Scalar Linear Equation


As preparation for the study of the vector-valued case, let us investigate first the case d = 1 (with m arbitrary) of the equation
$$ (8.4.1) \qquad dX_t = (A(t)X_t + a(t))\,dt + \sum_{i=1}^m (B_i(t)X_t + b_i(t))\,dW_t^i, \qquad X_{t_0} = c. $$
All the quantities in this equation (except W_t ∈ R^m) are scalar functions. Suppose that the coefficients A, a, B_i, and b_i are measurable and bounded on the interval [t_0, T], so that there always exists a unique solution X_t, which we shall now determine explicitly.
(8.4.2) Theorem. Equation (8.4.1) has the solution
$$ X_t = \Phi_t \left( c + \int_{t_0}^t \Phi_s^{-1} \Big( a(s) - \sum_{i=1}^m B_i(s) b_i(s) \Big)\,ds + \sum_{i=1}^m \int_{t_0}^t \Phi_s^{-1} b_i(s)\,dW_s^i \right), $$
where
$$ \Phi_t = \exp\left( \int_{t_0}^t \Big( A(s) - \sum_{i=1}^m B_i(s)^2/2 \Big)\,ds + \sum_{i=1}^m \int_{t_0}^t B_i(s)\,dW_s^i \right) $$
is the solution of the homogeneous equation
$$ d\Phi_t = A(t)\Phi_t\,dt + \sum_{i=1}^m B_i(t)\Phi_t\,dW_t^i $$
with initial value Φ_{t_0} = 1.
Proof. Let us use Itô's theorem to show that this process has the stochastic differential (8.4.1). If we set Φ_t = exp Y_t, where Y_t denotes the exponent in the formula for Φ_t, and
$$ Z_t = c + \int_{t_0}^t e^{-Y_s} \Big( a(s) - \sum_{i=1}^m B_i(s) b_i(s) \Big)\,ds + \sum_{i=1}^m \int_{t_0}^t e^{-Y_s} b_i(s)\,dW_s^i, $$
we get
$$ X_t = u(Y_t, Z_t), $$
where u is defined by
$$ u(x, y) = e^x y. $$
Application of formula (5.3.9b) yields
$$ dX_t = X_t\,dY_t + e^{Y_t}\,dZ_t + \frac{1}{2} \sum_{i=1}^m \big( X_t B_i(t)^2 + 2 B_i(t) b_i(t) \big)\,dt $$
$$ = X_t \Big( A(t) - \sum_{i=1}^m B_i(t)^2/2 \Big)\,dt + X_t \sum_{i=1}^m B_i(t)\,dW_t^i + \Big( a(t) - \sum_{i=1}^m B_i(t) b_i(t) \Big)\,dt + \sum_{i=1}^m b_i(t)\,dW_t^i + \sum_{i=1}^m \big( X_t B_i(t)^2/2 + B_i(t) b_i(t) \big)\,dt $$
$$ = (A(t) X_t + a(t))\,dt + \sum_{i=1}^m (B_i(t) X_t + b_i(t))\,dW_t^i. \qquad \blacksquare $$

Example (6.3.11) is a special case of this result. If c has a normal distribution, X_t is a Gaussian process only when B_1(t) ≡ ... ≡ B_m(t) ≡ 0 (that is, when the equation is linear in the narrow sense). Then, Theorem (8.4.2) reduces to Corollary (8.2.5).
We single out two special cases:
138 & Linear Stochastic Differential Equations

(8.4.3) Corollary. Suppose that d = 1. Then, the solution
a) of the homogeneous equation
$$ dX_t = A(t)X_t\,dt + \sum_{i=1}^m B_i(t)X_t\,dW_t^i, \qquad X_{t_0} = c, $$
is
$$ X_t = c \exp\left( \int_{t_0}^t \Big( A(s) - \sum_{i=1}^m B_i(s)^2/2 \Big)\,ds + \sum_{i=1}^m \int_{t_0}^t B_i(s)\,dW_s^i \right); $$
b) of the homogeneous autonomous equation (A(t) ≡ A, B_i(t) ≡ B_i)
$$ dX_t = A X_t\,dt + \sum_{i=1}^m B_i X_t\,dW_t^i, \qquad X_{t_0} = c, $$
is
$$ X_t = c \exp\left( \Big( A - \sum_{i=1}^m B_i^2/2 \Big)(t - t_0) + \sum_{i=1}^m B_i (W_t^i - W_{t_0}^i) \right). $$
We note that, by virtue of the law of large numbers for W_t, we have in the last case, for arbitrary c,
$$ \text{ac-}\lim_{t \to \infty} X_t = 0 $$
provided
$$ A < \sum_{i=1}^m B_i^2/2. $$
In general, in the homogeneous case, X_t has, for all t ∈ [t_0, T], the same sign as c.
Let us now calculate the moments of X_t. For this we use
(8.4.4) Lemma. If X is N(a, σ²)-distributed, then, for every p > 0,
$$ E (e^X)^p = e^{pa + p^2\sigma^2/2}. $$
Proof.
$$ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\big( p x - (x - a)^2/2\sigma^2 \big)\,dx = \exp\big( pa + p^2\sigma^2/2 \big). \qquad \blacksquare $$

(8.4.5) Theorem. The solution X_t of the scalar linear stochastic differential equation (8.4.1) has, for all t ∈ [t_0, T], a pth-order moment if and only if E|c|^p < ∞. In particular,

a)
$$ m_t = EX_t = \varphi_t \left( Ec + \int_{t_0}^t \varphi_s^{-1} a(s)\,ds \right), $$
where
$$ \varphi_t = \exp\left( \int_{t_0}^t A(s)\,ds \right); $$
that is, m_t is the solution of the ordinary differential equation
$$ \dot m_t = A(t)\,m_t + a(t), \qquad m_{t_0} = Ec. $$
b) P(t) = EX_t² is the unique solution of the ordinary linear differential equation
$$ (8.4.6) \qquad \dot P(t) = \Big( 2A(t) + \sum_{i=1}^m B_i(t)^2 \Big) P(t) + 2 m_t \Big( a(t) + \sum_{i=1}^m B_i(t) b_i(t) \Big) + \sum_{i=1}^m b_i(t)^2 $$
with initial condition
$$ P(t_0) = Ec^2, $$
where m_t = EX_t.
c) In the homogeneous case a(t) ≡ 0, b_i(t) ≡ 0 (for i = 1, ..., m), for the pth absolute moment of the solution
$$ X_t = c \exp\left( \int_{t_0}^t \Big( A(s) - \sum_{i=1}^m B_i(s)^2/2 \Big)\,ds + \sum_{i=1}^m \int_{t_0}^t B_i(s)\,dW_s^i \right) $$
we have
$$ (8.4.7) \qquad E|X_t|^p = E|c|^p \exp\left( p \int_{t_0}^t \Big( A(s) - \sum_{i=1}^m B_i(s)^2/2 \Big)\,ds + \frac{p^2}{2} \int_{t_0}^t \sum_{i=1}^m B_i(s)^2\,ds \right). $$
Proof. The homogeneous equation has the solution
$$ X_t = c\,\Phi_t, $$
where c and Φ_t are statistically independent and Φ_t has moments of every order. Therefore,
$$ E|X_t|^p = E|c|^p\,E\Phi_t^p $$
is finite if and only if E|c|^p is finite. In the nonhomogeneous case, we add to the solution of the homogeneous equation only terms with finite moments of every order, so that it is a matter only of E|c|^p. We obtain the form of m_t immediately from (8.4.2) by using Lemma (8.4.4) with p = 1, and we obtain the differential equation for m_t either by differentiating this result or directly from the integral form of equation (8.4.1).
From example (5.4.9), we have
$$ dX_t^2 = 2X_t (A(t)X_t + a(t))\,dt + 2X_t \sum_{i=1}^m (B_i(t)X_t + b_i(t))\,dW_t^i + \sum_{i=1}^m (B_i(t)X_t + b_i(t))^2\,dt. $$
If we take the expectation on both sides of the integral form of this equation, we get
$$ P(t) = Ec^2 + \int_{t_0}^t \Big( 2A(s) P(s) + \sum_{i=1}^m B_i(s)^2 P(s) \Big)\,ds + \int_{t_0}^t 2 m_s \Big( a(s) + \sum_{i=1}^m B_i(s) b_i(s) \Big)\,ds + \int_{t_0}^t \sum_{i=1}^m b_i(s)^2\,ds, $$
and, after differentiation, equation (8.4.6).


Finally, (8.4.7) is a consequence of the independence of c and Φ_t and of Lemma (8.4.4). ∎
(8.4.8) Example. For the homogeneous autonomous equation
$$ dX_t = A X_t\,dt + \sum_{i=1}^m B_i X_t\,dW_t^i, \qquad X_{t_0} = c, $$
we have
$$ E|X_t|^p = E|c|^p \exp\left( p \Big( A - \sum_{i=1}^m B_i^2/2 \Big)(t - t_0) + \frac{p^2}{2} \sum_{i=1}^m B_i^2\,(t - t_0) \right). $$
Therefore,
$$ \lim_{t \to \infty} E|X_t|^p = \begin{cases} 0 \\ +\infty \end{cases} $$
according as
$$ A < (1 - p) \sum_{i=1}^m B_i^2/2 \qquad \text{or} \qquad A > (1 - p) \sum_{i=1}^m B_i^2/2. $$
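Formula (8.4.7) is easily checked by simulation. The following sketch (an added illustration with arbitrarily chosen A, B, c, p; not part of the original text) compares a Monte Carlo estimate of E|X_t|^p with the closed form for m = 1 and t_0 = 0.

```python
# Illustrative Monte Carlo check of (8.4.7) for m = 1:
# E|X_t|^p = E|c|^p exp(p (A - B^2/2) t + p^2 B^2 t / 2).
import numpy as np

rng = np.random.default_rng(1)
A, B, c, p, t = -0.5, 1.0, 1.0, 2.0, 1.0

W = rng.normal(0.0, np.sqrt(t), size=200_000)   # W_t ~ N(0, t)
X = c * np.exp((A - B**2/2)*t + B*W)            # explicit solution of (8.4.3b)
empirical = np.mean(np.abs(X)**p)
theoretical = abs(c)**p * np.exp(p*(A - B**2/2)*t + 0.5*p**2*B**2*t)
print(empirical, theoretical)                   # agree up to sampling error
```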

8.5 The General Vector Linear Equation


We now return to the general linear stochastic differential equation for an R^d-valued process X_t:
$$ (8.5.1) \qquad dX_t = (A(t)X_t + a(t))\,dt + \sum_{i=1}^m (B_i(t)X_t + b_i(t))\,dW_t^i, $$
where A(t) and B_i(t) are d×d matrices, a(t) and b_i(t) are R^d-valued functions, and W_t = (W_t^1, ..., W_t^m)' is an m-dimensional Wiener process. By Theorem (8.1.5), there exists, for every initial value c that is independent of W_t - W_{t_0} for t ∈ [t_0, T], a unique solution of (8.5.1) on the interval [t_0, T], provided the coefficients A, a, B_i, and b_i are measurable bounded functions on that interval, as we shall always assume.
We now model the general solution of (8.5.1) after the scalar case d = 1, so that
it will include the case treated in section 8.2 as a special case.
(8.5.2) Theorem. The linear stochastic differential equation (8.5.1) with initial value X_{t_0} = c has on [t_0, T] the solution
$$ (8.5.3) \qquad X_t = \Phi_t \left( c + \int_{t_0}^t \Phi_s^{-1}\,dY_s \right). $$
Here,
$$ dY_t = \Big( a(t) - \sum_{i=1}^m B_i(t) b_i(t) \Big)\,dt + \sum_{i=1}^m b_i(t)\,dW_t^i, $$
and the d×d matrix Φ_t is the fundamental matrix of the corresponding homogeneous equation, that is, the solution of the matrix stochastic differential equation
$$ d\Phi_t = A(t)\Phi_t\,dt + \sum_{i=1}^m B_i(t)\Phi_t\,dW_t^i $$
with initial value
$$ \Phi_{t_0} = I. $$
Proof. Since a unique solution is known to exist, it will be sufficient to verify directly that the given X_t satisfies equation (8.5.1). To do this, we set
$$ Z_t = c + \int_{t_0}^t \Phi_s^{-1}\,dY_s, \qquad dZ_t = \Phi_t^{-1}\,dY_t, $$
and use Itô's theorem to calculate the stochastic differential of
$$ X_t = \Phi_t Z_t. $$
Application of example (5.4.1) to each component of X_t yields
$$ dX_t = \Phi_t\,dZ_t + (d\Phi_t) Z_t + \Big( \sum_{i=1}^m B_i(t)\,\Phi_t\,\Phi_t^{-1}\,b_i(t) \Big)\,dt $$
$$ = dY_t + A(t)X_t\,dt + \sum_{i=1}^m B_i(t)X_t\,dW_t^i + \sum_{i=1}^m B_i(t) b_i(t)\,dt $$
$$ = (a(t) + A(t)X_t)\,dt + \sum_{i=1}^m (B_i(t)X_t + b_i(t))\,dW_t^i. $$
The initial value is
$$ X_{t_0} = \Phi_{t_0} Z_{t_0} = I c = c. \qquad \blacksquare $$
(8.5.4) Remark. In complete analogy with ordinary differential equations, the general solution (8.5.3) can be represented as a sum as follows:
$$ X_t = \Phi_t c + \Phi_t \int_{t_0}^t \Phi_s^{-1}\,dY_s. $$
Here, the first term on the right is the general solution of the corresponding homogeneous equation (which here, in contrast with Theorem (8.2.2), is in general also a stochastic process), and the second term is the particular solution of the nonhomogeneous equation corresponding to the initial value X_{t_0} = 0. For d = 1, Φ_t is given explicitly in Theorem (8.4.2). For B_i(t) ≡ 0, Theorem (8.5.2) reduces to Theorem (8.2.2).
Let us look again at the first-order ordinary differential equations that the first
two moments of the solution must satisfy.
(8.5.5) Theorem. For the solution (8.5.3) of the linear stochastic differential equation (8.5.1), we have, under the assumption E|c|² < ∞:
a) EX_t = m_t is the unique solution of the equation
$$ \dot m_t = A(t)\,m_t + a(t), \qquad m_{t_0} = Ec. $$
b) EX_tX_t' = P(t) is the unique nonnegative-definite symmetric solution of the equation
$$ (8.5.6) \qquad \dot P(t) = A(t)P(t) + P(t)A(t)' + a(t)m_t' + m_t a(t)' + \sum_{i=1}^m \big( B_i(t)P(t)B_i(t)' + B_i(t)m_t b_i(t)' + b_i(t)m_t'B_i(t)' + b_i(t)b_i(t)' \big) $$
with initial value
$$ P(t_0) = E\,cc'. $$
Proof. Part a) follows when we take the expectation on both sides of the integral form of (8.5.1). Part b) follows in the same way from
$$ d(X_tX_t') = X_t\,dX_t' + (dX_t)X_t' + \sum_{i=1}^m (B_i(t)X_t + b_i(t))(X_t'B_i(t)' + b_i(t)')\,dt $$
$$ = \Big( X_tX_t'A(t)' + X_t a(t)' + A(t)X_tX_t' + a(t)X_t' + \sum_{i=1}^m \big( B_i(t)X_tX_t'B_i(t)' + B_i(t)X_t b_i(t)' + b_i(t)X_t'B_i(t)' + b_i(t)b_i(t)' \big) \Big)\,dt $$
$$ \quad + \sum_{i=1}^m \big( X_tX_t'B_i(t)' + X_t b_i(t)' + B_i(t)X_tX_t' + b_i(t)X_t' \big)\,dW_t^i. $$
This formula is obtained from Itô's theorem. Both equations have unique solutions on the interval [t_0, T] since the right-hand members satisfy the boundedness and Lipschitz conditions. Since
$$ P(t) = P(t)', $$
(8.5.6) represents a system of d(d+1)/2 linear equations. The solution P(t), being the second-moment matrix of X_t, is of course nonnegative-definite. ∎
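The moment equations of Theorem (8.5.5) are ordinary differential equations and can be integrated numerically. The following sketch (an added illustration; the coefficient matrices are arbitrary choices, and a plain Euler step stands in for a proper ODE solver) integrates them for an autonomous example with m = 1.

```python
# Illustrative sketch: integrating the moment equations of Theorem (8.5.5)
# for dX = (A X + a) dt + (B1 X + b1) dW^1 with a simple Euler scheme.
import numpy as np

A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
B1 = np.array([[0.0, 0.0], [0.2, 0.0]])
a  = np.zeros(2)
b1 = np.array([0.0, 0.3])

m = np.array([1.0, 0.0])          # m(t0) = Ec (here c is constant)
P = np.outer(m, m)                # P(t0) = E cc'
h, n_steps = 1e-3, 10_000

for _ in range(n_steps):
    dm = A @ m + a
    dP = (A @ P + P @ A.T + np.outer(a, m) + np.outer(m, a)
          + B1 @ P @ B1.T + np.outer(B1 @ m, b1) + np.outer(b1, B1 @ m)
          + np.outer(b1, b1))
    m, P = m + h*dm, P + h*dP

print(m)                 # EX_t at t = t0 + 10
print(np.trace(P))       # E|X_t|^2, cf. Remark (8.5.8)
```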
(8.5.7) Remark. The function m_t ≡ EX_t is independent of the fluctuational part (that is, independent of the B_i and the b_i) of equation (8.5.1).
(8.5.8) Remark. We have
$$ E|X_t|^2 = \operatorname{tr} P(t) = \sum_{i=1}^d P_{ii}(t). $$
However, the differential equation for E|X_t|², which follows from (8.5.6), contains in general in its right-hand member all other elements of the matrix P(t).
(8.5.9) Remark. Even if the homogeneous equation
$$ dX_t = A(t)X_t\,dt + \sum_{i=1}^m B_i(t)X_t\,dW_t^i $$
corresponding to (8.5.1) is autonomous, that is, if A(t) ≡ A and B_i(t) ≡ B_i are independent of t, the corresponding fundamental matrix Φ_t nonetheless cannot in general be given explicitly. Only when, for example, the matrices A, B_1, ..., B_m commute, that is, when
$$ A B_i = B_i A, \qquad B_i B_j = B_j B_i, \qquad \text{all } i, j, $$
does the fundamental matrix assume the form
$$ \Phi_t = \exp\left( \Big( A - \sum_{i=1}^m B_i^2/2 \Big)(t - t_0) + \sum_{i=1}^m B_i (W_t^i - W_{t_0}^i) \right). $$
To show this, we set
$$ dY_t = \Big( A - \sum_{i=1}^m B_i^2/2 \Big)\,dt + \sum_{i=1}^m B_i\,dW_t^i, \qquad Y_{t_0} = 0, $$
and calculate the stochastic differential of
$$ \Phi_t = \exp(Y_t). $$
By virtue of the commutativity of the participating matrices, we have
$$ d\Phi_t = \exp(Y_t)\,dY_t + \tfrac{1}{2} \exp(Y_t)\,(dY_t)^2 = \Phi_t\,dY_t + \tfrac{1}{2} \Phi_t \Big( \sum_{i=1}^m B_i^2 \Big)\,dt = A\Phi_t\,dt + \sum_{i=1}^m B_i\Phi_t\,dW_t^i, $$
that is, Φ_t satisfies the homogeneous equation.
Chapter 9
The Solutions of Stochastic Differential Equations as Markov and Diffusion Processes

9.1 Introduction
In the preceding three chapters, we have constructed and examined, in a sort of "stochastic analysis", the solutions of stochastic differential equations. This puts us in a position to calculate explicitly, for given sample functions of the initial value and of the Wiener process, an arbitrarily accurate solution trajectory, for example, with the aid of the iteration procedure used in the proof of Theorem (6.2.2).
On the other hand, the solution X_t is a stochastic process on the interval [t_0, T] and, as such, it can be regarded as a set of compatible finite-dimensional distributions
$$ P[X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n] = P_{t_1, \ldots, t_n}(B_1, \ldots, B_n). $$
In accordance with Theorem (2.2.5), for the important class of Markov processes all these distributions can be obtained from the initial probability
$$ P[X_{t_0} \in B] = P_{t_0}(B) $$
and the transition probability
$$ P[X_t \in B \mid X_s = x] = P(s, x, t, B), \qquad t_0 \le s < t \le T. $$
Specifically,
$$ P[X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n] = \int_{R^d} \int_{B_1} \cdots \int_{B_{n-1}} P(t_{n-1}, x_{n-1}, t_n, B_n)\,P(t_{n-2}, x_{n-2}, t_{n-1}, dx_{n-1}) \cdots P(t_1, x_1, t_2, dx_2)\,P(t_0, x_0, t_1, dx_1)\,P_{t_0}(dx_0), $$
$$ t_0 < t_1 < \cdots < t_n \le T, \qquad B_i \in \mathfrak{B}^d. $$

Stochastic differential equations owe their significance and expanding study not least to the fact that, as we shall show, their solutions are Markov processes. Therefore, we have for them the powerful analytical tools developed for Markov processes at our disposal. The keystone of the Markov property of the solution processes is the fact that the white noise ξ_t appearing in the formal version
$$ \dot X_t = f(t, X_t) + G(t, X_t)\,\xi_t $$
of the stochastic differential equation
$$ dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t $$
is a process with independent values at every point.
Furthermore, in many cases X_t is in fact a diffusion process whose drift vector and diffusion matrix can be read in the simplest conceivable manner from the equation (see section 9.3).

9.2 The Solutions as Markov Processes


Consider the stochastic differential equation
$$ (9.2.1) \qquad dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, \qquad X_{t_0} = c, $$
on the interval [t_0, T]. Here, X_t and f assume values in R^d, G is (d×m matrix)-valued, and W_t is an R^m-valued Wiener process. The initial value c is an arbitrary random variable independent of W_t - W_{t_0} for t ≥ t_0.
Together with (9.2.1), let us consider the same equation, but now on the interval [s, T], for t_0 ≤ s ≤ T and with the fixed initial value X_s = x ∈ R^d, hence the equivalent integral equation
$$ (9.2.2) \qquad X_t = x + \int_s^t f(u, X_u)\,du + \int_s^t G(u, X_u)\,dW_u, \qquad t_0 \le s \le t \le T. $$
(9.2.3) Theorem. If equation (9.2.1) satisfies the conditions of the existence-and-uniqueness theorem (6.2.2), the solution X_t of the equation for arbitrary initial values is a Markov process on the interval [t_0, T] whose initial probability distribution at the instant t_0 is the distribution of c and whose transition probabilities are given by
$$ P(s, x, t, B) = P(X_t \in B \mid X_s = x) = P[X_t(s, x) \in B], $$
where X_t(s, x) is the (existent and unique) solution of equation (9.2.2).
Proof. Let (Ω, 𝔄, P) denote the basic probability space on which c and W_t, for t ≥ 0, are defined. As usual, let 𝔉_t ⊂ 𝔄 denote the sigma-algebra generated by c and W_s for s ≤ t, which is independent of the sigma-algebra 𝔚_t generated by W_s - W_t for s ≥ t. We need also to consider the sigma-algebras
$$ \mathfrak{A}_t = \mathfrak{A}(X_u,\ u \le t), $$
which contain the "history" of the process X_t up to the instant t. We need to prove the Markov property for X_t (see definition (2.1.1)): For t_0 ≤ s < t ≤ T and all B ∈ 𝔅^d, we have
$$ (9.2.4) \qquad P(X_t \in B \mid \mathfrak{A}_s) = P(X_t \in B \mid X_s), $$
almost certainly [P]. Now, X_t is 𝔉_t-measurable (that is, nonanticipating), so that
$$ \mathfrak{A}_s \subset \mathfrak{F}_s. $$
The validity of (9.2.4) therefore follows from the stronger equation
$$ (9.2.5) \qquad P(X_t \in B \mid \mathfrak{F}_s) = P(X_t \in B \mid X_s) $$
by virtue of (1.7.1).
Furthermore, instead of (9.2.5), it will be sufficient to prove the following: For every scalar bounded measurable function h(x, ω) defined on R^d × Ω for which h(x, ·) is, for every fixed x, a random variable independent of 𝔉_s, we have
$$ (9.2.6) \qquad E(h(X_s, \omega) \mid \mathfrak{F}_s) = E(h(X_s, \omega) \mid X_s) = H(X_s) $$
with H(x) = E h(x, ω). This is true because, if we choose
$$ h(x, \omega) = I_B(X_t(s, x, \omega)), $$
where X_t(s, x, ω) = X_t(s, x) is the solution of equation (9.2.2) and I_B is the indicator function of the set B, then h(x, ·) is independent of 𝔉_s, since, by virtue of the constant initial point at the instant s, the function X_t(s, x) is 𝔚_s-measurable. By virtue of remark (6.1.7), we have
$$ X_t = X_t(t_0, c) = X_t(s, X_s(t_0, c)) = X_t(s, X_s), $$
so that
$$ h(X_s, \omega) = I_B(X_t). $$
Therefore, in this case, equation (9.2.6) yields
$$ P(X_t \in B \mid \mathfrak{F}_s) = P(X_t \in B \mid X_s) = P[X_t(s, x) \in B]\big|_{x = X_s}, $$
from which, by virtue of (9.2.5), not only the Markov property (9.2.4) but also the asserted form
$$ P(s, x, t, B) = P[X_t(s, x) \in B] $$
of the transition probabilities follows.
Thus, it remains to prove (9.2.6). We shall do this for the set of functions of the form
$$ h(x, \omega) = \sum_{j=1}^n Y_j(x)\,Z_j(\omega), \qquad Z_j \text{ independent of } \mathfrak{F}_s, $$

which is dense in the set of bounded measurable functions h under consideration.


For functions of this form, we have

E (h (X Y, (X,) E (Zj) = H (X,)

with H (x) = E h (x, w). Since X, is of course 21(X,)-measurable and Z; is inde-


pendent of X,, we have in addition
n

Y, (X,) E (Z1) = E (h (X,, w)IX,),


c=i

which proves (9.2.6).


Since X,0 = c, we have, for the initial probability of X, ,
P,a(B)=P[X,0EB]=PIcEB]. 1
(9.2.7) Remark. Whereas the initial probability of the process X_t is identical to the distribution of c, the transition probabilities do not depend on c but are completely determined by the coefficients f and G, which can be read from equation (9.2.1), and hence by the "system". Furthermore, the transition probabilities of X_t acquire, by virtue of the formula
$$ P(X_t \in B \mid X_s = x) = P[X_t(s, x) \in B], $$
not only in a heuristic but also in a strictly mathematical way, the following significance: The (conditional) probability of the event [X_t ∈ B] under the condition X_s = x is equal to the absolute probability of the event [X_t(s, x) ∈ B], where the process X_t(s, x) begins at the instant s with probability 1 at x.
Theorem (9.2.3) can be successfully used for practical calculation of the transition probabilities only when we know explicitly the solution X_t(s, x) of equation (9.2.2) or at least its distribution. Better methods for calculating the function P(s, x, t, B) or the density p(s, x, t, y) will be presented in section 9.4.
In accordance with definition (2.2.9), we call a Markov process homogeneous if its transition probabilities are stationary, that is, if the condition
$$ P(s+u, x, t+u, B) = P(s, x, t, B), \qquad 0 \le u \le T - t, $$
is satisfied identically. In this case, the function P(s, x, t, B) = P(t_0, x, t_0 + (t-s), B) = P(t-s, x, B) is therefore a function only of x ∈ R^d, t-s ∈ [0, T-t_0], and B ∈ 𝔅^d.
(9.2.8) Theorem. Suppose that the conditions of the existence-and-uniqueness theorem (6.2.2) are satisfied for equation (9.2.1). If the coefficients f(t, x) = f(x) and G(t, x) = G(x) are independent of t on the interval [t_0, T], then the solution X_t is, for arbitrary initial values c, a homogeneous Markov process with the (stationary) transition probabilities
$$ P(X_t \in B \mid X_{t_0} = x) = P(t - t_0, x, B) = P[X_t(t_0, x) \in B], $$
where X_t(t_0, x) is the solution of equation (9.2.1) with initial value X_{t_0} = x. In particular, the solution of an autonomous equation
$$ dX_t = f(X_t)\,dt + G(X_t)\,dW_t, \qquad t \ge t_0, $$
is a homogeneous Markov process defined for all t ≥ t_0.
The assertions of this theorem are intuitively so clear that we refrain from giving a proof.
(9.2.12) Example. In accordance with Theorem (8.2.2), the linear (in the narrow sense) stochastic differential equation
$$ dX_t = (A(t)X_t + a(t))\,dt + B(t)\,dW_t, \qquad X_{t_0} = c, \qquad t_0 \le t \le T, $$
has, corresponding to the initial value x at the instant s, the solution
$$ X_t(s, x) = \Phi(t, s) \left( x + \int_s^t \Phi(u, s)^{-1} a(u)\,du + \int_s^t \Phi(u, s)^{-1} B(u)\,dW_u \right), $$
where Φ(t, s) is the solution of the homogeneous matrix equation
$$ \frac{d}{dt}\,\Phi(t, s) = A(t)\,\Phi(t, s), \qquad \Phi(s, s) = I. $$
From Theorems (8.2.6) and (8.2.10), the transition probability P(s, x, t, ·) of X_t is a d-dimensional normal distribution
$$ P(s, x, t, \cdot) = N(m_t(s, x),\ K_t(s, x)), $$
with expectation vector
$$ \int_{R^d} y\,P(s, x, t, dy) = E X_t(s, x) = m_t(s, x) = \Phi(t, s) \left( x + \int_s^t \Phi(u, s)^{-1} a(u)\,du \right) $$
and d×d covariance matrix
$$ \int_{R^d} (y - m_t(s, x))(y - m_t(s, x))'\,P(s, x, t, dy) = E(X_t(s, x) - m_t(s, x))(X_t(s, x) - m_t(s, x))' = K_t(s, x) = \Phi(t, s) \left( \int_s^t \Phi(u, s)^{-1} B(u)\,B(u)' (\Phi(u, s)^{-1})'\,du \right) \Phi(t, s)'. $$

In accordance with Theorem (8.2.10), for Gaussian or constant c the process X_t is itself a Gaussian process, frequently known as the Gauss-Markov process.
In the autonomous case A(t) ≡ A, a(t) ≡ a, B(t) ≡ B, the transition probability of the now homogeneous Markov process X_t is, by virtue of the relationship
$$ \Phi(t, s) = e^{A(t-s)}, $$
specialized to
$$ P(s, x, s+t, \cdot) = P(t, x, \cdot) = N(m_{s+t}(s, x),\ K_{s+t}(s, x)) $$
with the functions
$$ m_{s+t}(s, x) = e^{At} \left( x + \Big( \int_0^t e^{-Au}\,du \Big) a \right) $$
and
$$ K_{s+t}(s, x) = \int_0^t e^{A(t-u)} B B' e^{A'(t-u)}\,du $$
depending only on t and x. In the case in which A is nonsingular and A and B commute (that is, A B = B A), these expressions simplify yet further:
$$ m_{s+t}(s, x) = e^{At}(x + A^{-1}a) - A^{-1}a $$
and
$$ K_{s+t}(s, x) = B (A + A')^{-1} \big( e^{(A+A')t} - I \big) B'. $$
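For matrices A and B that do not commute, the parameters can still be evaluated numerically from the integral formulas. The following sketch (an added illustration only; A, B, a, x are arbitrary choices) uses the matrix exponential and a rectangle-rule quadrature.

```python
# Illustrative sketch: mean m_{s+t}(s, x) and covariance K_{s+t}(s, x) of the
# normal transition probability for dX = (A X + a) dt + B dW (autonomous case).
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -0.3]])
B = np.array([[0.0], [1.0]])
a = np.array([0.5, 0.0])
x = np.array([1.0, 0.0])
t = 2.0

us = np.linspace(0.0, t, 2001)
du = us[1] - us[0]
m = expm(A*t) @ (x + sum(expm(-A*u) @ a for u in us) * du)
K = sum(expm(A*(t-u)) @ B @ B.T @ expm(A.T*(t-u)) for u in us) * du
print(m)   # mean vector of P(s, x, s+t, .)
print(K)   # covariance matrix of P(s, x, s+t, .)
```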


(9.2.13) Example. In accordance with Corollary (8.4.3), the scalar linear homogeneous stochastic differential equation (d = 1, arbitrary m)
$$ dX_t = A(t)X_t\,dt + \sum_{i=1}^m B_i(t)X_t\,dW_t^i, \qquad X_{t_0} = c, \qquad t_0 \le t \le T, $$
with initial value x at the instant s has the solution
$$ X_t(s, x) = x \exp\left( \int_s^t \Big( A(u) - \sum_{i=1}^m B_i(u)^2/2 \Big)\,du + \sum_{i=1}^m \int_s^t B_i(u)\,dW_u^i \right). $$
Since X_t(s, 0) = 0, we have P(s, 0, t, ·) = δ_0 and, since X_t(s, -x) = -X_t(s, x), we have P(s, -x, t, B) = P(s, x, t, -B). Therefore, we can confine ourselves to positive x. Then, X_t(s, x) is also positive on [s, T] and we have, for y > 0,

$$ P(s, x, t, (0, y]) = P[X_t(s, x) \le y] = P\left[ \sum_{i=1}^m \int_s^t B_i(u)\,dW_u^i \le \log\frac{y}{x} - \int_s^t \Big( A(u) - \sum_{i=1}^m B_i(u)^2/2 \Big)\,du \right] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^z e^{-u^2/2\sigma^2}\,du, $$
where
$$ z = \log\frac{y}{x} - \int_s^t \Big( A(u) - \sum_{i=1}^m B_i(u)^2/2 \Big)\,du $$
and
$$ \sigma^2 = \int_s^t \sum_{i=1}^m B_i(u)^2\,du. $$

In the case of an autonomous equation (A(t) ≡ A, B_i(t) ≡ B_i), the transition probability is stationary with the parameters
$$ z = \log\frac{y}{x} - \Big( A - \sum_{i=1}^m B_i^2/2 \Big)(t - s) $$
and
$$ \sigma^2 = \sum_{i=1}^m B_i^2\,(t - s). $$
The moments of P(s, x, t, ·) can be obtained from Theorem (8.4.5c).
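In the autonomous scalar case m = 1, this transition probability is simply a lognormal distribution function and is trivial to evaluate. A small sketch (added for illustration; the parameter values are arbitrary):

```python
# Illustrative sketch: P(s, x, t, (0, y]) for dX = A X dt + B X dW, x, y > 0.
from math import log, sqrt
from statistics import NormalDist

def transition_cdf(x, y, A, B, dt):
    """P[X_t(s, x) <= y] for the autonomous case, dt = t - s."""
    z = log(y / x) - (A - B**2 / 2) * dt
    sigma = sqrt(B**2 * dt)
    return NormalDist().cdf(z / sigma)

print(transition_cdf(x=1.0, y=1.5, A=0.1, B=0.4, dt=2.0))
```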


(9.2.14) Remark. When is the solution X_t of (9.2.1) a stationary Markov process? In accordance with remark (2.2.11), X_t must first of all be homogeneous, which is the case for the solution of an autonomous equation. For the existence of a stationary distribution P̄, that is, a distribution with the property
$$ \bar P(B) = \int_{R^d} P(t, x, B)\,d\bar P(x), \qquad B \in \mathfrak{B}^d, \quad t \ge 0, $$
there exist analytical conditions (see Prohorov and Rozanov [15], pp. 272-274, Khas'minskiy [65], p. 119ff, and Ito and Nisio [43]). In many cases, the density p̄(x) of the stationary distribution can be obtained from the stationary form of the forward equation (see section 9.4):
$$ -\sum_{i=1}^d \frac{\partial}{\partial x_i} \big( f_i(x)\,\bar p(x) \big) + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2}{\partial x_i\,\partial x_j} \big( (G(x)G(x)')_{ij}\,\bar p(x) \big) = 0. $$
See also Theorem (8.2.12).

9.3 The Solutions as Diffusion Processes


According to definition (2.5.1), diffusion processes are Markov processes with continuous sample functions whose transition probabilities P(s, x, t, B) have certain infinitesimal properties as t ↓ s.
The solutions of stochastic differential equations are Markov processes with continuous sample functions. When are these solutions diffusion processes? We would suppose that the coefficients f and G determining the transition probabilities need to satisfy additional conditions. Then, how are the drift and diffusion coefficients determined from the equation?
For simple cases, we can verify directly that the solution is a diffusion process. For example, if f(t, x) ≡ f_0 and G(t, x) ≡ G_0, then the solution of
$$ dX_t = f_0\,dt + G_0\,dW_t, \qquad X_{t_0} = c, $$
is the process
$$ X_t = c + f_0(t - t_0) + G_0(W_t - W_{t_0}), $$
for whose transition probabilities we have, in accordance with example (9.2.12),
$$ P(s, x, t, \cdot) = N(x + f_0(t-s),\ G_0 G_0'(t-s)). $$
It follows that
$$ E_{s,x}(X_t - x) = f_0(t-s), $$
$$ E_{s,x}(X_t - x)(X_t - x)' = G_0 G_0'(t-s) + f_0 f_0'(t-s)^2, $$
where we have used the notation introduced in section 2.5:
$$ E_{s,x}\,g(X_t) = \int_{R^d} g(y)\,P(s, x, t, dy). $$
In accordance with remark (2.5.2), X_t is a d-dimensional diffusion process with drift vector f_0 and diffusion matrix G_0 G_0'.
More generally, we have
(9.3.1) Theorem. Suppose that the conditions of the existence-and-uniqueness theorem (6.2.2) are satisfied for the stochastic differential equation
$$ dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, \qquad X_{t_0} = c, \qquad t_0 \le t \le T, $$
where X_t and f(t, x) belong to R^d, W_t belongs to R^m, and G(t, x) is a d×m matrix. If, in addition, the functions f and G are continuous with respect to t, the solution X_t is a d-dimensional diffusion process on [t_0, T] with drift vector f(t, x) and diffusion matrix
$$ B(t, x) = G(t, x)\,G(t, x)'. $$
The limit relationships in definition (2.5.1) hold uniformly in s ∈ [t_0, T], where T < ∞. In particular, the solution of an autonomous stochastic differential equation is always a homogeneous diffusion process on [t_0, ∞).
Proof. From remark (7.1.5), we have
$$ E_{s,x}|X_t - X_s|^4 = E|X_t(s, x) - x|^4 \le C_1 (t-s)^2, \qquad t_0 \le s \le t \le T, $$
so that
$$ \lim_{t \downarrow s}\ (t-s)^{-1} \int_{R^d} |y - x|^4\,P(s, x, t, dy) = 0. $$
To prove that X_t is a diffusion process with the given drift and diffusion coefficients, it is therefore sufficient, by virtue of remark (2.5.2), to show that
$$ (9.3.2) \qquad E(X_t(s, x) - x) = f(s, x)(t-s) + o(t-s) $$
and
$$ (9.3.3) \qquad E(X_t(s, x) - x)(X_t(s, x) - x)' = G(s, x)\,G(s, x)'(t-s) + o(t-s). $$
In particular, the asserted uniformity is obtained from the fact that o(t-s) is independent of s.
We begin with equation (9.3.2) and write the left-hand side in the following form:
$$ E X_t(s, x) - x = \int_s^t f(u, x)\,du + \int_s^t E\big( f(u, X_u(s, x)) - f(u, x) \big)\,du. $$
Using the Schwarz inequality, the Lipschitz condition (with constant K), and inequality (7.1.4), we obtain for c = x and n = 1
$$ \left| \int_s^t E\big( f(u, X_u(s, x)) - f(u, x) \big)\,du \right| \le \int_s^t E\big| f(u, X_u(s, x)) - f(u, x) \big|\,du $$
$$ \le (t-s)^{1/2} \left( \int_s^t E\big| f(u, X_u(s, x)) - f(u, x) \big|^2\,du \right)^{1/2} \le K (t-s)^{1/2} \left( \int_s^t E|X_u(s, x) - x|^2\,du \right)^{1/2} = (t-s)^{3/2}\,O(1). $$
The continuity of f(·, x) now implies that
$$ \int_s^t f(u, x)\,du = f(s, x)(t-s) + \int_s^t \big( f(u, x) - f(s, x) \big)\,du = f(s, x)(t-s) + o(t-s), $$
so that
$$ E X_t(s, x) - x = f(s, x)(t-s) + o(t-s), $$
which is equation (9.3.2).
Equation (9.3.3) is proven in a completely analogous manner. ∎
(9.3.4) Example. The (existing and unique) solution of the linear stochastic differential equation
$$ dX_t = (a(t) + A(t)X_t)\,dt + \sum_{i=1}^m (B_i(t)X_t + b_i(t))\,dW_t^i, $$
for t_0 ≤ t ≤ T, is certainly a diffusion process if the functions a(t), A(t), B_i(t), and b_i(t) are continuous on the interval [t_0, T]. In particular, the solution of the autonomous linear equation is always a homogeneous diffusion process (see example (9.2.12)). The drift vector of X_t is
$$ f(t, x) = a(t) + A(t)\,x $$
and the diffusion matrix is
$$ B(t, x) = \sum_{i=1}^m (B_i(t)x + b_i(t))(x'B_i(t)' + b_i(t)') = \sum_{i=1}^m \big( B_i x x' B_i' + B_i x b_i' + b_i x' B_i' + b_i b_i' \big). $$
(9.3.5) Remark. We shall now discuss the opposite question as to when a given diffusion process is the solution of a stochastic differential equation. In other words, if X_t is a d-dimensional diffusion process on the interval [t_0, T], do there exist a Wiener process W_t, such that X_{t_0} and W_t - W_{t_0} are statistically independent, and functions f and G, such that the sample functions of X_t can be obtained by means of the equation
$$ dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t $$
from the sample functions of W_t? After all, in accordance with Remark (6.1.4), this equation represents a transformation that maps W.(ω) and X_{t_0}(ω) into X.(ω). Sufficient conditions for this can be found, for example, in Prohorov and Rozanov [15], pp. 261-262, or Gikhman and Skorokhod [36], p. 70.
If, for a given diffusion process X_t with drift vector f(t, x) and diffusion matrix B(t, x), we wish to find a stochastic differential equation whose solution coincides with X_t only in the initial distribution P_{t_0} and the transition probabilities P(s, x, t, B) (and hence in all finite-dimensional distributions), in other words, if we wish to reproduce not the given realizations of X_t but only their distributions, we proceed as follows: We choose a probability space (Ω, 𝔄, P) on which an m-dimensional Wiener process W_t and a random variable c, independent of W_t - W_{t_0} for t ≥ t_0 and with distribution P_{t_0}, can be defined, and we consider the stochastic differential equation
$$ (9.3.6) \qquad dY_t = f(t, Y_t)\,dt + G(t, Y_t)\,dW_t, \qquad Y_{t_0} = c, \qquad t_0 \le t \le T. $$
Here, G(t, x) is a d×m matrix with the property
$$ (9.3.7) \qquad B(t, x) = G(t, x)\,G(t, x)'. $$
Now, there are various possibilities for decomposing a given symmetric nonnegative-definite d×d matrix B(t, x) in the form (9.3.7), so that the coefficient G in equation (9.3.6) is not uniquely determined. If we represent B in the form
$$ B = U \Lambda U' $$
(where Λ is the diagonal matrix of the eigenvalues λ_i ≥ 0 (arranged in increasing order) and U is the orthogonal d×d matrix of the column eigenvectors u_i of B [see remark (5.2.4)]), then the choice d = m and
$$ G = U \Lambda^{1/2} U' = B^{1/2} $$
yields again a symmetric nonnegative-definite matrix G, while
$$ G = U \Lambda^{1/2} = (\sqrt{\lambda_1}\,u_1, \ldots, \sqrt{\lambda_d}\,u_d) $$
has the advantage of pointing the column vectors of G in the direction of the eigenvectors of B. If k of the λ_i are identically equal to 0, we can go on to the d×m matrix
$$ G = (\sqrt{\lambda_{k+1}}\,u_{k+1}, \ldots, \sqrt{\lambda_d}\,u_d), \qquad m = d - k. $$
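Both decompositions are immediate to compute from the eigendecomposition of B. A small sketch (added for illustration; the matrix B is an arbitrary choice):

```python
# Illustrative sketch: two factorizations B = G G' of a fixed symmetric
# nonnegative-definite diffusion matrix B, as described above.
import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eigh(B)                 # eigenvalues in increasing order
G_sym = U @ np.diag(np.sqrt(lam)) @ U.T    # G = B^{1/2}, the symmetric choice
G_eig = U @ np.diag(np.sqrt(lam))          # columns along the eigenvectors of B

print(np.allclose(G_sym @ G_sym.T, B))    # True
print(np.allclose(G_eig @ G_eig.T, B))    # True
```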
This lack of uniqueness is not important, however, if the given diffusion process is uniquely determined by its parameters f(t, x) and B(t, x) (in the sense of unique determination of the transition probabilities P(s, x, t, B) by f and B). For this nontrivial property (we have obtained f and B from the first and second moments of P(s, x, t, B) only!) we have given a sufficient condition in remark (2.6.5). If the given process is uniquely determined by f and B, then all equations of the form (9.3.6) in which G is chosen on the basis of (9.3.7) and which satisfy the assumptions of Theorem (9.3.1) lead to the same diffusion process. In particular, for homogeneous diffusion processes, an autonomous equation can always be found as a dynamic model.
Summing up, we may say that the solutions of stochastic differential equations and diffusion processes represent essentially the same classes of processes despite their completely different definitions.

9.4 Transition Probabilities


In Theorem (9.2.3), we pointed out that the transition probabilities P(s, x, t, B) of the solution X_t of the equation
$$ (9.4.1) \qquad dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, \qquad X_{t_0} = c, $$
reduce to the ordinary probability distributions of the solution X_t(s, x) that begins at x at the instant s:
$$ P(s, x, t, B) = P[X_t(s, x) \in B], \qquad t_0 \le s \le t \le T, \quad x \in R^d, \quad B \in \mathfrak{B}^d. $$
In the notation of Chapter 2,
$$ E_{s,x}\,g(t, X_t) = \int_{R^d} g(t, y)\,P(s, x, t, dy). $$
Therefore,
$$ E_{s,x}\,g(t, X_t) = E\,g(t, X_t(s, x)), $$
and, in particular,
$$ E_{s,x}\,g(X_t) = E\,g(X_t(s, x)). $$


A method of determining the transition probabilities P(s, x, t, B) of X_t without having to solve equation (9.4.1) is the subject of the present section. Equation (9.4.1) represents the law of development for the state X_t of the stochastic dynamic system under consideration. If we turn to the law of development for the functions P(s, x, t, B), what we are doing is shifting from a stochastic differential equation to a second-order partial differential equation.
From here on, we shall assume that the conditions of Theorem (9.3.1) are satisfied, so that X_t is a diffusion process. The differential operator (2.6.1) corresponding to the process X_t is
$$ (9.4.2) \qquad \mathfrak{D} = \sum_{i=1}^d f_i(s, x)\,\frac{\partial}{\partial x_i} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d b_{ij}(s, x)\,\frac{\partial^2}{\partial x_i\,\partial x_j}, $$
$$ B(s, x) = (b_{ij}(s, x)) = G(s, x)\,G(s, x)', \qquad f = (f_1, \ldots, f_d)', $$

where the derivatives are evaluated at the point (s, x). The infinitesimal operator A, which uniquely determines the transition probability of X_t, is defined in accordance with (2.4.10) as the uniform limit
$$ (9.4.3) \qquad A g(s, x) = \lim_{t \downarrow 0} \frac{E\,g(s+t,\,X_{s+t}(s, x)) - g(s, x)}{t}, $$
where g(s, x) is a bounded measurable function defined on [t_0, T] × R^d.
For the moment, we neglect the requirement of uniformity of the limit and seek to ascertain when at least the pointwise limit in (9.4.3) exists. If we call the result L, we have
$$ (9.4.4) \qquad L g = \frac{\partial g}{\partial s} + \mathfrak{D} g, $$
which holds for all functions g defined on [t_0, T] × R^d that have continuous first partial derivatives with respect to s and continuous second partial derivatives with respect to the components of x and that, together with their derivatives, do not, as functions of x, increase faster than some fixed power of |x|. This is easily seen from (9.4.3) by replacing g(t+s, X_{t+s}(s, x)) in that expression with the stochastic integral of Itô's theorem, namely (for brevity, we write X_{t+s} for X_{t+s}(s, x)),
$$ g(t+s, X_{t+s}) = g(s, x) + \int_s^{t+s} g_s(u, X_u)\,du + \sum_{i=1}^d \int_s^{t+s} f_i(u, X_u)\,g_{x_i}(u, X_u)\,du $$
$$ \quad + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \int_s^{t+s} b_{ij}(u, X_u)\,g_{x_i x_j}(u, X_u)\,du + \sum_{i=1}^d \sum_{k=1}^m \int_s^{t+s} g_{x_i}(u, X_u)\,G_{ik}(u, X_u)\,dW_u^k, $$
and taking the limit. If g depends only on x, we have in the homogeneous case, in place of (9.4.4),
$$ L = \mathfrak{D}. $$
The limit in (9.4.3) exists and is uniform if g has the above-listed properties and vanishes identically outside a bounded subset of [t_0, T] × R^d. Such functions, therefore, belong to the domain of definition D_A of A. For them,
$$ A g = \frac{\partial g}{\partial s} + \mathfrak{D} g, $$
and, in the case of homogeneous processes,
$$ A g = \mathfrak{D} g. $$

Thus, we have found the form of the infinitesimal operator of the solution of the stochastic differential equation (9.4.1). In this case, the solution is uniquely determined by f and G.
In remark (2.6.5), we showed how the transition probabilities of a diffusion process can be found in theory from knowledge of the functions
$$ u(s, x) = E\,g(X_t(s, x)), $$
where t is fixed and g ranges over a set of functions that is dense in the space C(R^d) of continuous bounded functions defined on R^d. For given g, we can calculate u(s, x) from Kolmogorov's backward equation. This equation is valid here under the following assumptions:
(9.4.4) Theorem. Suppose that the assumptions of Theorem (9.3.1) are satisfied for equation (9.4.1). Suppose also that the coefficients f and G have continuous bounded first and second partial derivatives with respect to the components of x. Then, if g(x) is a continuous bounded function with continuous bounded first and second partial derivatives, the function
$$ u(s, x) = E\,g(X_t(s, x)), \qquad t_0 \le s \le t \le T, \quad x \in R^d, $$
and its first and second partial derivatives with respect to x and its first derivative with respect to s are continuous and bounded. Also, the backward equation
$$ (9.4.5) \qquad \frac{\partial u(s, x)}{\partial s} + \mathfrak{D}\,u(s, x) = 0, $$
where 𝔇 is the differential operator (9.4.2), with the end condition
$$ \lim_{s \uparrow t} u(s, x) = g(x), $$
is valid.
The proof of these assertions follows, on the basis of remark (7.3.7), from Theorem (2.6.3).
Instead of solving the backward equation (9.4.5) for a set of end values g that is dense in C(R^d), we can confine our attention to the family
$$ g(x) = e^{i\lambda' x}, \qquad \lambda \in R^d. $$
We obtain
$$ u(s, x) = E \exp(i\lambda' X_t(s, x)), $$
that is, the characteristic function of X_t(s, x), which uniquely determines the probability distribution of X_t(s, x), namely, P(s, x, t, ·).
If P(s, x, t, B) has a density p(s, x, t, y), we can get equations for p(s, x, t, y) itself from Theorems (2.6.6) and (2.6.9). The density is a fundamental solution of the backward equation; that is, for fixed t and y and for s < t,
$$ (9.4.6) \qquad \frac{\partial}{\partial s}\,p(s, x, t, y) + \sum_{i=1}^d f_i(s, x)\,\frac{\partial}{\partial x_i}\,p(s, x, t, y) + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d b_{ij}(s, x)\,\frac{\partial^2}{\partial x_i\,\partial x_j}\,p(s, x, t, y) = 0, $$
$$ \lim_{s \uparrow t} p(s, x, t, y) = \delta(y - x), $$
just as it is a fundamental solution of the forward or Fokker-Planck equation; that is, for s and x fixed and t > s,
$$ (9.4.7) \qquad \frac{\partial}{\partial t}\,p(s, x, t, y) + \sum_{i=1}^d \frac{\partial}{\partial y_i} \big( f_i(t, y)\,p(s, x, t, y) \big) - \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2}{\partial y_i\,\partial y_j} \big( b_{ij}(t, y)\,p(s, x, t, y) \big) = 0, $$
$$ \lim_{t \downarrow s} p(s, x, t, y) = \delta(y - x). $$
However, these laws of development for p are valid only under certain assumptions regarding the coefficients f and B = GG' (see section 2.6).
We refer to a theorem of Gikhman and Skorokhod ([36], pp. 96-99) that, for the scalar case, gives sufficient conditions for existence of a density with certain analytical properties.
(9.4.8) Example. In accordance with section 8.2, the scalar autonomous linear stochastic differential equation (d = m = 1)
$$ dX_t = (A X_t + a)\,dt + b\,dW_t, \qquad X_0 = c, \qquad t \ge 0, $$
has the solution
$$ X_t = c\,e^{At} + \frac{a}{A}\,(e^{At} - 1) + b \int_0^t e^{A(t-s)}\,dW_s. $$
Special cases are the Ornstein-Uhlenbeck process (a = 0), the deterministic linear equation with random initial value (b = 0), and the Wiener process (A = a = 0, b = 1, c = 0). For b = 0, X_t has a density if c has a density, but the transition probabilities degenerate to
$$ P(s, x, t, \cdot) = \delta_z, \qquad z = x\,e^{A(t-s)} + \frac{a}{A}\,(e^{A(t-s)} - 1). $$
For b ≠ 0, there exists a density p(s, x, t, y) so smooth that it can be obtained from the backward equation
$$ \frac{\partial}{\partial s}\,p(s, x, t, y) + (Ax + a)\,\frac{\partial}{\partial x}\,p(s, x, t, y) + \frac{1}{2}\,b^2\,\frac{\partial^2}{\partial x^2}\,p(s, x, t, y) = 0 $$
or from the forward equation
$$ \frac{\partial}{\partial t}\,p(s, x, t, y) + \frac{\partial}{\partial y} \big( (Ay + a)\,p(s, x, t, y) \big) - \frac{1}{2}\,\frac{\partial^2}{\partial y^2} \big( b^2\,p(s, x, t, y) \big) = 0 $$
as the fundamental solution. As boundary condition, we assume that p and its partial derivatives with respect to x and y vanish as |x| → ∞ and |y| → ∞, respectively. We know from example (9.2.12) that
$$ p(s, x, t, y) = (2\pi K_t(s, x))^{-1/2} \exp\big( -(y - m_t(s, x))^2 / 2 K_t(s, x) \big), $$
where
$$ m_t(s, x) = x\,e^{A(t-s)} + \frac{a}{A}\,(e^{A(t-s)} - 1) $$
and
$$ K_t(s, x) = \frac{b^2}{2A}\,(e^{2A(t-s)} - 1); $$
that is, p(s, x, t, y) is the density of N(m_t(s, x), K_t(s, x)). In the special case A = 0, we have
$$ m_t(s, x) = x + a(t-s), \qquad K_t(s, x) = b^2(t-s). $$
We observe that p(s, x, t, y) depends only on t - s. Thus, X_t is a homogeneous diffusion process, as has to be the case for the solution of an autonomous equation.
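That this Gaussian density really satisfies the forward equation can be verified numerically. The following sketch (an added illustration only; the constants are arbitrary, and finite differences stand in for exact derivatives) evaluates the residual of (9.4.7) on a grid.

```python
# Illustrative sketch: finite-difference check that the transition density of
# dX = (A X + a) dt + b dW satisfies p_t + ((A y + a) p)_y - (b^2/2) p_yy = 0.
import numpy as np

A, a, b, x, s = -1.0, 0.5, 0.8, 1.0, 0.0

def p(t, y):
    m = x*np.exp(A*(t-s)) + (a/A)*(np.exp(A*(t-s)) - 1.0)
    K = (b**2/(2*A))*(np.exp(2*A*(t-s)) - 1.0)
    return np.exp(-(y-m)**2/(2*K)) / np.sqrt(2*np.pi*K)

t, y = 1.0, np.linspace(-3.0, 3.0, 601)
dt, dy = 1e-5, y[1]-y[0]

p_t    = (p(t+dt, y) - p(t-dt, y)) / (2*dt)          # time derivative
flux_y = np.gradient((A*y + a) * p(t, y), dy)        # ((A y + a) p)_y
p_yy   = np.gradient(np.gradient(p(t, y), dy), dy)   # second y-derivative

residual = p_t + flux_y - 0.5*b**2*p_yy
print(np.max(np.abs(residual)))   # small, up to discretization error
```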
The backward and forward equations (9.4.6) and (9.4.7) have up to now been explicitly solved only in a few simple cases (see, for example, Uhlenbeck and Ornstein [49], Wang and Uhlenbeck [50], or Bharucha-Reid [19]). These solutions have usually been found by taking the Fourier or Laplace transform of p.
(9.4.9) Example. The forward equation for the general linear equation
$$ dX_t = (a(t) + A(t)X_t)\,dt + \sum_{i=1}^m (B_i(t)X_t + b_i(t))\,dW_t^i $$
with drift vector
$$ f(t, x) = a(t) + A(t)\,x, \qquad a = (a_1, \ldots, a_d)', \quad A = (A_{ij}), $$
and diffusion matrix
$$ B(t, x) = \sum_{i=1}^m \big( B_i x x' B_i' + B_i x b_i' + b_i x' B_i' + b_i b_i' \big) $$
(see example (9.3.4)) becomes, for the density p(s, x, t, y) and t > s,
$$ \frac{\partial}{\partial t}\,p(s, x, t, y) + \sum_{i=1}^d a_i(t)\,\frac{\partial}{\partial y_i}\,p(s, x, t, y) + \sum_{i=1}^d \sum_{j=1}^d A_{ij}(t)\,\frac{\partial}{\partial y_i}\,(y_j\,p) - \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2}{\partial y_i\,\partial y_j} \big( b_{ij}(t, y)\,p(s, x, t, y) \big) = 0, $$
with the initial condition
$$ \lim_{t \downarrow s} p(s, x, t, y) = \delta(y - x). $$
For the case B_1(t) ≡ ... ≡ B_m(t) ≡ 0, in the notation
$$ G(t) = (b_1(t), \ldots, b_m(t)), \qquad p_t = \frac{\partial p}{\partial t}, \qquad p_y = \Big( \frac{\partial p}{\partial y_1}, \ldots, \frac{\partial p}{\partial y_d} \Big)', \qquad p_{yy} = \Big( \frac{\partial^2 p}{\partial y_i\,\partial y_j} \Big), $$
we obtain, more specially,
$$ (9.4.10) \qquad p_t + a(t)'\,p_y + p\,\operatorname{tr}(A(t)) + p_y'\,A(t)\,y - \tfrac{1}{2}\operatorname{tr}\big( G(t)\,G(t)'\,p_{yy} \big) = 0. $$
We know from example (9.2.12) that the solution of this equation is the density of a normal distribution whose parameters m_t(s, x) and K_t(s, x) are given in (9.2.12). The dynamic development of these parameters is characterized by the differential equations of Theorem (8.2.6).
(9.4.11) Remark. The distribution of an R^p-valued functional g(X_t) dependent only on the state X_t at the instant t can always be obtained via the characteristic function
$$ u(s, x) = E\,e^{i\lambda' g(X_t(s, x))}, \qquad \lambda \in R^p, $$
from the backward equation (9.4.5) with the end condition
$$ u(t, x) = e^{i\lambda' g(x)}. $$
Many interesting quantities (for example, the time of first entry into or the length of stay in specific regions) depend, however, on the overall course of a trajectory in a time interval. But, in certain cases, one can give a differential equation for their characteristic function. For example, if g(x) is R^p-valued and h(t, x) is R^q-valued, put
$$ V_{\lambda,\mu}(s, x) = E_{s,x} \exp\left( i\lambda'\,g(X_t) + i\mu' \int_s^t h(u, X_u)\,du \right) $$
for s ≤ t, λ ∈ R^p, and μ ∈ R^q. Under certain conditions on f, G, g, and h (see Gikhman and Skorokhod [5], p. 414, or [36], p. 302), we have the equation
$$ \frac{\partial}{\partial s}\,V_{\lambda,\mu}(s, x) + \mathfrak{D}\,V_{\lambda,\mu}(s, x) + i\mu'\,h(s, x)\,V_{\lambda,\mu}(s, x) = 0 $$
with the end condition
$$ \lim_{s \uparrow t} V_{\lambda,\mu}(s, x) = e^{i\lambda' g(x)}. $$
Here, 𝔇 is the operator (9.4.2).
(9.4.12) Remark. Diffusion processes whose state X_t can assume only values in a subset of R^d, the boundary of which can be absorbing, reflecting, or elastic, have been discussed, for example, by Gikhman and Skorokhod [5], Prohorov and Rozanov [15], Dynkin [21], Ito and McKean [26], and Mandl [28].
Chapter 10
Questions of Modelling and Approximation

10.1 The Shift From a Real to a Markov Process


Many continuous dynamic systems under the influence of a random disturbance can be represented by an ordinary differential equation (in general nonlinear) of the form
$$ (10.1.1) \qquad \dot X_t = f(t, X_t, Y_t), \qquad t \ge t_0, \qquad X_{t_0} = c. $$
Here, X_t is the d-dimensional state vector and Y_t is an m-dimensional disturbance process whose probability-theoretic characteristics (finite-dimensional distributions) we assume to be given. We also assume that the distribution of the initial value c is given. Now if Y_t is a stochastic process with sufficiently smooth (e.g. continuous) sample functions, (10.1.1) can be considered as an ordinary differential equation for the sample functions X.(ω) of the state of the system.
However, the solution process X_t is a Markov process only if, for known X_t = x, the value of X_{t+h}, given approximately by
$$ X_{t+h} = x + f(t, x, Y_t)\,h, $$
is independent of what took place prior to the instant t, that is, only if Y_t is a process with statistically independent values at every point. This is the case, for example, when Y_t does not depend at all on chance but is equal to a fixed function, especially if Y_t ≡ 0, so that Y_t does not appear at all. Then, (10.1.1) degenerates into a deterministic equation.
However, if Y_t is a nondegenerate stochastic process with independent values at every point, for example, a stationary Gaussian process such that E Y_t = 0 and
$$ E\,Y_t Y_s' = \begin{cases} 0 & \text{for } t \ne s, \\ I & \text{for } t = s, \end{cases} $$
the sample functions of Y_t are extremely irregular; for example, they are discontinuous everywhere and unbounded on every interval, and their graphs are "point clouds" dense in [t_0, ∞) × R^m. If we approximate such a process with a sequence Y_t^{(n)} of continuous stationary Gaussian processes such that E Y_t^{(n)} ≡ 0 and E Y_t^{(n)} Y_s^{(n)'} = e^{-n|t-s|} I, we obtain
$$ E\left| \int_{t_0}^t Y_s^{(n)}\,ds \right|^2 \to 0 \qquad (n \to \infty). $$
Thus, the disturbing action of the process Y_t^{(n)} has, on the (timewise) average, less and less effect on the left-hand member of (10.1.1) as n → ∞, since the variance E|Y_t^{(n)}|² = m (that is, the average energy of the disturbance) remains finite.
Thus, we are led inexorably to rapidly fluctuating processes with infinite energy and hence to generalized stochastic processes with independent values at every point (in the sense of section 3.2). So-called "delta-correlated" Gaussian processes are examples of this.
We now need to confine ourselves, for a meaningful theory, to functions f(t, x, y) that are linear in y, so that (10.1.1) takes the form
$$ \dot X_t = f(t, X_t) + G(t, X_t)\,Y_t. $$
For it is still possible to define the product of an ordinary and a generalized function, whereas the square of a generalized function (for example, δ(t)δ(t)) is in general no longer defined.
In accordance with the remark at the end of section 3.2, we can without loss of generality confine ourselves to white noise ξ_t as the prototype of a "delta-correlated" Gaussian noise process, thus considering only equations of the form
$$ (10.1.2) \qquad \dot X_t = f(t, X_t) + G(t, X_t)\,\xi_t, $$
for which we have, with the aid of the stochastic integral, developed a precise mathematical theory. The solution of (10.1.2) is a Markov process and hence belongs to a class of processes for the analysis of which efficient mathematical methods exist, though it has, on the other hand, the disadvantage that its sample functions are not smooth functions (see section 7.2). This last fact is typical of Markov processes, since the Markov property, formulated in a negative way, states that, for a known present, it is forbidden to transmit information from the past into the future. The "jagged" behavior of X_t arises from this.
Now, physically realizable processes are always smooth processes and hence are at best only approximately Markov processes. In equation (10.1.2), for example, we would have in actuality not the white noise ξ_t but only an approximately "delta-correlated" process ζ_t, hence not a white but a "colored" noise in a general sense.
Let us look at the scalar equation
$$ (10.1.3) \qquad \dot X_t = f(t, X_t) + G(t, X_t)\,\zeta_t, $$
where ζ_t is the stationary Ornstein-Uhlenbeck process, hence the solution of the stochastic differential equation
$$ \dot\zeta_t = -\alpha\,\zeta_t + \sigma\,\xi_t $$
(see section 8.3) with α > 0 and N(0, σ²/2α)-distributed initial value. The process ζ_t has the covariance function
$$ E\,\zeta_t \zeta_s = e^{-\alpha|t-s|}\,\sigma^2/2\alpha. $$
Therefore, in accordance with section 3.2, it becomes an approximation of white noise as α → ∞ and σ → ∞ in such a way that σ²/2α² = D → 1/2. By virtue of the continuity of ζ_t, equation (10.1.3) can be regarded as an ordinary differential equation. Hence, for sufficiently smooth f and G, it provides a process X_t which is now differentiable, though not a Markov process. However, we can regain the Markov property by shifting to a two-dimensional state space and considering the vector process (X_t, ζ_t). The replacement of ξ_t with a stationary Gaussian process ζ_t possessing analytic sample functions is discussed in Stratonovich [76], pp. 125-126. The basic rule is as follows: For every newly attained derivative of X_t, we need (in the scalar case) to add a component to the accompanying Markov process.
The question as to what stochastic differential equation must now be chosen in order to describe adequately a given physically realizable process is the question of modelling. A controversy has arisen in this connection (see Gray and Caughey [39] and McShane [46]) since different authors have obtained different solutions for apparently identical problems. These discrepancies arise, as we shall see, not from errors in the mathematical calculation but from a general discontinuity of the relationship between differential equations for stochastic processes and their solutions. Let us clarify this with an example.
(10.1.4) Example. By virtue of Corollary (8.4.3a), the solution of the scalar stochastic differential equation
$$ \dot X_t = A(t)X_t + B(t)X_t\,\xi_t, \qquad X_{t_0} = c \in L^2, $$
where ξ_t is a scalar white noise, is, after we have written it in the form
$$ dX_t = A(t)X_t\,dt + B(t)X_t\,dW_t, \qquad X_{t_0} = c, $$
the process
$$ X_t = c \exp\left( \int_{t_0}^t \big( A(s) - B(s)^2/2 \big)\,ds + \int_{t_0}^t B(s)\,dW_s \right). $$
Let us now replace in the original equation the white noise ξ_t with a sequence of physically realizable continuous Gaussian stationary processes {ξ_t^{(n)}} such that E ξ_t^{(n)} ≡ 0 and E ξ_t^{(n)} ξ_s^{(n)} = C_n(t-s), where
$$ \lim_{n \to \infty} C_n(t) = \delta(t). $$

Then,
$$ \dot Y_t^{(n)} = A(t)\,Y_t^{(n)} + B(t)\,Y_t^{(n)}\,\xi_t^{(n)}, \qquad Y_{t_0}^{(n)} = c, $$
is an ordinary differential equation and hence has the solution
$$ Y_t^{(n)} = c \exp\left( \int_{t_0}^t A(s)\,ds + \int_{t_0}^t B(s)\,\xi_s^{(n)}\,ds \right). $$
The process
$$ Z_t^{(n)} = \int_{t_0}^t B(s)\,\xi_s^{(n)}\,ds $$
is a Gaussian process with mean 0 and with covariance
$$ E\,Z_t^{(n)} Z_s^{(n)} = \int_{t_0}^t \int_{t_0}^s B(u)\,B(v)\,C_n(u-v)\,du\,dv \to \int_{t_0}^{\min(t,s)} B(u)^2\,du, $$
that is, Y_t^{(n)} converges in mean square to a process whose distributions coincide with the distributions of the process
$$ Y_t = c \exp\left( \int_{t_0}^t A(s)\,ds + \int_{t_0}^t B(s)\,dW_s \right). $$
Therefore, the processes X_t and Y_t are quite different for B(t) ≢ 0. From Corollary (8.4.3a), Y_t is the solution of the stochastic differential equation
$$ dY_t = \big( A(t) + B(t)^2/2 \big)\,Y_t\,dt + B(t)\,Y_t\,dW_t, \qquad Y_{t_0} = c. $$

To get X_t, we made the limiting shift to the white noise ξ_t in the original equation and solved the equation as a stochastic differential equation. In contrast, we obtained Y_t by solving the ordinary differential equation disturbed by ξ^{(n)} and then made the shift to ξ_t in the solution. Obviously, this leads to different results. Both processes (though not the processes Y_t^{(n)}) are Markov processes, since they satisfy (different!) stochastic differential equations. Which of these is the "correct" process (in the sense of giving the better description of the basic system) can in general be decided only pragmatically.
If we denote by L(g) the solution of an equation g and by g(ξ_t) or g(ξ_t^{(n)}) the stochastic differential equation
$$ \dot X_t = f + G\,\xi_t, \qquad \xi_t \text{ a white noise}, $$

or its approximating ordinary differential equation
$$ \dot Y_t = f + G\,\xi_t^{(n)}, \qquad \xi_t^{(n)} \text{ a continuous stationary Gaussian process}, \quad E\,\xi_t^{(n)} \equiv 0, \quad E\,\xi_t^{(n)} \xi_s^{(n)} = C_n(t-s), $$
respectively, then example (10.1.4) shows that, in general,
$$ L(g(\xi_t)) \ne \lim_{C_n \to \delta} L(g(\xi_t^{(n)})). $$
The question now arises as to whether we can modify the definition of the stochastic integral in such a way that equality will hold in the last relationship. This is in fact possible, by means of the definition (already mentioned in section 4.2) of Stratonovich's time-symmetric stochastic integral [48].

10.2 Stratonovich's Stochastic Integral


In section 4.2, we pointed out that, in the attempt to evaluate the integral
$$ \int_{t_0}^t W_s\,dW_s $$
as the limiting value of the approximating sums
$$ S_n = \sum_{i=1}^n W_{\tau_i}\,(W_{t_i} - W_{t_{i-1}}), \qquad t_0 \le t_1 \le \cdots \le t_n = t, \qquad t_{i-1} \le \tau_i \le t_i, $$
the result depends very much on the choice of intermediate points τ_i. Itô's choice τ_i = t_{i-1}, on which we have based our exposition exclusively up to now, led to the value
$$ \int_{t_0}^t W_s\,dW_s = \text{qm-}\lim_{\delta_n \to 0} S_n = (W_t^2 - W_{t_0}^2)/2 - (t - t_0)/2, $$
where
$$ \delta_n = \max_i\,(t_i - t_{i-1}), $$
and, in general, to a concept of an integral that is not symmetric with respect to the variable t, since the increments dW_t "point into the future".
the variable t since the increments d IV, "point into the future".
However, it is just this lack of symmetry that leads to the simple formulas for the first two moments of the integral (see Theorem (4.4.14e)) and to the martingale property (see Theorem (5.1.1b)). Furthermore, a stochastic differential equation
$$ dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t $$
explained on this basis yields, under the assumptions of Theorem (9.3.1), a diffusion process as solution. The intuitive significance of the coefficients f and G is explained by regarding f as the drift and GG' as the diffusion matrix of that process. A disadvantage is that the calculus valid for stochastic differential equations, which operates in accordance with Itô's theorem, deviates from the familiar one.
This disadvantage (together with all the advantages of Itô's integral that we have mentioned) is removed by Stratonovich's definition [48], which yields for the special case considered at the beginning
$$ (S)\int_{t_0}^t W_s\,dW_s = \text{qm-}\lim_{\delta_n \to 0} \sum_{i=1}^n \frac{W_{t_{i-1}} + W_{t_i}}{2}\,(W_{t_i} - W_{t_{i-1}}) = (W_t^2 - W_{t_0}^2)/2, $$
hence a value that we can also obtain by formal integration by parts.
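The difference between the two choices of intermediate points is easily seen numerically. The following sketch (an added illustration; step count and seed are arbitrary) evaluates both approximating sums on one simulated path with t_0 = 0.

```python
# Illustrative sketch: approximating sums for the integral of W dW with Ito's
# left endpoint versus Stratonovich's symmetric (midpoint) evaluation.
import numpy as np

rng = np.random.default_rng(3)
t, n = 1.0, 100_000
dW = rng.normal(0.0, np.sqrt(t/n), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))       # W_0 = 0

ito   = np.sum(W[:-1] * dW)                      # evaluate at W_{t_{i-1}}
strat = np.sum(0.5*(W[:-1] + W[1:]) * dW)        # evaluate at the average

print(ito,   (W[-1]**2 - t)/2)    # ~ (W_t^2 - t)/2
print(strat,  W[-1]**2 / 2)       # ~ W_t^2 / 2
```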
Somewhat more generally, let us define
$$ (10.2.1) \qquad (S)\int_{t_0}^t H(s, W_s)\,dW_s = \text{qm-}\lim_{\delta_n \to 0} \sum_{i=1}^n H\Big( t_{i-1},\ \frac{W_{t_{i-1}} + W_{t_i}}{2} \Big)\,(W_{t_i} - W_{t_{i-1}}). $$
Here, W_t is an m-dimensional Wiener process and H(t, x) is a (d×m matrix)-valued function that is continuous with respect to t, that has first-order partial derivatives H_{x_i} with respect to the m components x_i of x, and that satisfies the condition
$$ \int_{t_0}^t E\,|H(s, W_s)|^2\,ds < \infty. $$
It follows from Theorem (10.2.5) that the limit in (10.2.1) exists. The result is connected with Itô's integral, which is defined in the present case by
$$ \int_{t_0}^t H(s, W_s)\,dW_s = \text{qm-}\lim_{\delta_n \to 0} \sum_{i=1}^n H(t_{i-1}, W_{t_{i-1}})\,(W_{t_i} - W_{t_{i-1}}), $$
as shown in the following equation:
$$ (10.2.2) \qquad (S)\int_{t_0}^t H(s, W_s)\,dW_s = \int_{t_0}^t H(s, W_s)\,dW_s + \frac{1}{2} \sum_{k=1}^m \int_{t_0}^t \big( H_{x_k}(s, W_s) \big)_{\cdot k}\,ds. $$
Here, the d-vector (H_{x_k})_{·k} is the kth column of the d×m matrix H_{x_k}.

To build a theory of stochastic differential equations on the basis of Stratonovich's idea, we use the following general
(10.2.3) Definition. Let Y_t denote an m-dimensional diffusion process on the interval [t_0, T] for which relations b) and c) in definition (2.5.1) hold for ε = ∞ and whose drift vector a(t, x) and diffusion matrix B(t, x), together with the derivatives ∂B(t, x)/∂x_j, for j = 1, ..., m, are continuous in both arguments. Suppose also that H(t, x) is a (d×m matrix)-valued function that is continuous in t, that has continuous partial derivatives ∂H(t, x)/∂x_j, and that satisfies, for t ∈ [t_0, T], the conditions
$$ \int_{t_0}^t E\,|H(s, Y_s)\,a(s, Y_s)|\,ds < \infty $$
and
$$ \int_{t_0}^t E\,|H(s, Y_s)\,B(s, Y_s)\,H(s, Y_s)'|\,ds < \infty. $$
Then, the limiting value
$$ (10.2.4) \qquad (S)\int_{t_0}^t H(s, Y_s)\,dY_s = \text{qm-}\lim_{\delta_n \to 0} \sum_{i=1}^n H\Big( t_{i-1},\ \frac{Y_{t_{i-1}} + Y_{t_i}}{2} \Big)\,(Y_{t_i} - Y_{t_{i-1}}), $$
where t_0 ≤ t_1 < ... < t_n = t is a partition of the interval [t_0, t] and δ_n = max (t_i - t_{i-1}), is called the stochastic integral in the sense of Stratonovich.
This integral is connected with Itô's integral as follows:
(10.2.5) Theorem. The limit in (10.2.4) exists under the conditions mentioned in definition (10.2.3). It is connected with Itô's stochastic integral, defined here by
$$ (10.2.6) \qquad \int_{t_0}^t H(s, Y_s)\,dY_s = \text{qm-}\lim_{\delta_n \to 0} \sum_{i=1}^n H(t_{i-1}, Y_{t_{i-1}})\,(Y_{t_i} - Y_{t_{i-1}}), $$
by the formula
$$ (S)\int_{t_0}^t H(s, Y_s)\,dY_s = \int_{t_0}^t H(s, Y_s)\,dY_s + \frac{1}{2} \sum_{j=1}^m \sum_{k=1}^m \int_{t_0}^t \big( H_{x_k}(s, Y_s) \big)_{\cdot j}\,b_{kj}(s, Y_s)\,ds. $$
Here, the d-vector (H_{x_k})_{·j} is the jth column of the d×m matrix H_{x_k} = (∂H_{ij}/∂x_k).
To prove this, we consider the difference between the sums in (10.2.4) and (10.2.6):

$$ \sum_{i=1}^n \left( H\Big( t_{i-1},\ \frac{Y_{t_{i-1}} + Y_{t_i}}{2} \Big) - H(t_{i-1}, Y_{t_{i-1}}) \right)(Y_{t_i} - Y_{t_{i-1}}). $$
We then apply the mean-value theorem to the terms H(t_{i-1}, (Y_{t_{i-1}} + Y_{t_i})/2). For details, see Stratonovich [48].
(10.2.7) Remark. For d = m = 1, the conversion formula is
$$ (S)\int_{t_0}^t H(s, Y_s)\,dY_s = \int_{t_0}^t H(s, Y_s)\,dY_s + \frac{1}{2} \int_{t_0}^t \frac{\partial H(s, Y_s)}{\partial x}\,B(s, Y_s)\,ds. $$
We can see from this, in particular, that the stochastic integrals of Itô and Stratonovich coincide when H(t, x) = H(t) is independent of x.
We can now define the stochastic differential equations of the customary form in terms of the integral
$$ (S)\int_{t_0}^t (0,\ G(s, X_s))\,d\begin{pmatrix} X_s \\ W_s \end{pmatrix} = (S)\int_{t_0}^t G(s, X_s)\,dW_s $$
(where X_t is d-dimensional, W_t is an m-dimensional Wiener process, and G(t, x) is (d×m matrix)-valued) and the equation
$$ X_t = X_{t_0} + \int_{t_0}^t f(s, X_s)\,ds + (S)\int_{t_0}^t G(s, X_s)\,dW_s, \qquad t_0 \le t \le T. $$
We again write symbolically
$$ (S)\ dX_t = f(t, X_t)\,dt + G(t, X_t)\,dW_t, $$
where the parenthesized S on the left means that the stochastic integral on the basis of which the differential notation is defined is to be understood in the sense of Stratonovich.
For our case, where H = (0, G) and Y_t = (X_t, W_t)', the conversion formula of Theorem (10.2.5) yields
$$ (S)\int_{t_0}^t G(s, X_s)\,dW_s = \int_{t_0}^t G(s, X_s)\,dW_s + \frac{1}{2} \sum_{j=1}^m \sum_{k=1}^d \int_{t_0}^t \big( G_{x_k}(s, X_s) \big)_{\cdot j}\,G_{kj}(s, X_s)\,ds $$
(see Stratonovich [48]), so that the Itô equation
$$ dX_t = \left( f(t, X_t) + \frac{1}{2} \sum_{j=1}^m \sum_{k=1}^d \big( G_{x_k}(t, X_t) \big)_{\cdot j}\,G_{kj}(t, X_t) \right) dt + G(t, X_t)\,dW_t $$

corresponds to the above-defined Stratonovich differential equation in the sense


of coincidence of solutions. Conversely, the Stratonovich equation

(S) dX, = (J-1 2] (Gxk).1 GkJ) dt+G dW,


2 j-1 k=t

corresponds to the Ito equation


(10.2.8) dX, = J dt+G dW,
Thus, whether we think of a given formal equation (10.2.8) in the sense of Ito
or Stratonovich, we arrive at the same solution as long as G (t, x) - G (t) is inde-
pendent of x, as in the case of an equation that is linear in the narrow sense (see
section 8.2). In general, we obtain two distinct Markov processes as solutions,
which differ in the systematic (drift) behavior but not in the fluctuational be-
havior. This last means, in particular, that the sample functions of the solution of a
Stratonovich equation are also in general functions that are not differentiable
or of bounded variation.
(10.2.9) Example. The formal scalar linear equation
$$ (?)\ dX_t = A(t)X_t\,dt + B(t)X_t\,dW_t, \qquad X_{t_0} = c \in L^2, $$
taken as an Itô equation has the solution
$$ X_t = c \exp\left( \int_{t_0}^t \big( A(s) - B(s)^2/2 \big)\,ds + \int_{t_0}^t B(s)\,dW_s \right), $$
but taken as a Stratonovich equation has the solution
$$ Y_t = c \exp\left( \int_{t_0}^t A(s)\,ds + \int_{t_0}^t B(s)\,dW_s \right). $$
One should compare example (10.1.4). We can also obtain Y_t by formal solution of the original equation. In general, the two processes have quite different global properties. For example, if we set A(t) ≡ A, B(t) ≡ B, and t_0 = 0, then
$$ X_t = c\,e^{(A - B^2/2)t + BW_t} \to 0 \quad \text{(almost certainly)} $$
as t → ∞ if and only if A < B²/2, whereas
$$ Y_t = c\,e^{At + BW_t} \to 0 \quad \text{(almost certainly)} $$
as t → ∞ if and only if A < 0.


It can be shown in general (see Stratonovich [48] ) that the Stratonovich integral
and the Stratonovich differential defined in terms of it satisfy all the formal rules
of an ordinary integral or differential (integration by parts, change of variable,
172 10. Questions of Modelling and Approximation

chain rule) and hence in this respect can "more easily" be manipulated than the
Ito integral or differential. Unfortunately, the-price we have to pay for this is the
loss of all the advantages of It6's integral that were mentioned earlier. However,
the conversion formulas that we have given enable us at all times to shift from
one type of integral to the other.
The system-theoretic significance of Stratonovich equations consists in the fact
that, in many cases, they present themselves automatically when one approximatess
a white noise or a Wiener process with smoother processes, solves the approxima-
ting equation, and in the solution shifts back to the white noise. Comparison of
(10.1.4) and (10.2.9) shows this immediately. In the following section, we shall
discuss this matter in greater detail.

10.3 Approximation of Stochastic Differential Equations


Suppose that

(10.3.1)  \dot X_t = f(t, X_t) + G(t, X_t)\, \zeta_t, \qquad t_0 \le t \le T, \quad X_{t_0} = c,

is a formal differential equation for the d-dimensional state of a dynamic system.
In this equation, let us assume that f(t, x) ∈ R^d, that G(t, x) is (d × m matrix)-
valued, and that the m-dimensional process ζ_t is a "rapidly fluctuating" station-
ary Gaussian process that is independent of c and has expectation 0. How should
equation (10.3.1) be interpreted?
If we know or can assume that ζ_t is exactly or nearly equal to a white noise
process ξ_t, then (10.3.1) should be interpreted as the Ito differential equation

(10.3.2)  dX_t = f(t, X_t)\, dt + G(t, X_t)\, dW_t, \qquad X_{t_0} = c.

The same is true if (10.3.1) serves as an approximation or a limit of a discrete
(timewise nonsymmetric) problem

\frac{X_{t_{k+1}} - X_{t_k}}{t_{k+1} - t_k} = f(t_k, X_{t_k}) + G(t_k, X_{t_k})\, \xi_{t_k},

where the ξ_{t_k} are Gaussian, independent, and identically distributed.
On the other hand, if ζ_t is a continuous process and only an approximation of
the white noise (for example, with a delta-like covariance function C(t)), we
can treat (10.3.1) as an ordinary differential equation and solve it by the classi-
cal procedures. The solution is not a Markov process, but under certain condi-
tions (see Gray [38] or Clark [33]) it converges in mean square, as

C(t) \to \delta(t)
or

\operatorname{qm-lim} \int_{t_0}^{t} \zeta_s\, ds = W_t,

to a Markov process, which is now the solution of the corresponding Stratonovich
equation

(10.3.3)  (S)\, dX_t = f(t, X_t)\, dt + G(t, X_t)\, dW_t, \qquad X_{t_0} = c.
One should again compare examples (10.1.4) and (10.2.9).
We emphasize once again that the solutions of equations (10.3.2) and (10.3.3)
coincide when G(t, x) ≡ G(t) is independent of x.
Suppose that t_0 < t_1 < ... < t_n = T < ∞ is a decomposition of [t_0, T] and that
δ_n = max_k (t_{k+1} - t_k). Khas'minskiy ([65], pp. 218-220) points out that, as
δ_n → 0, the sequence of "Cauchy polygonal lines" X^{(n)}_t defined by

X^{(n)}_{t_0} = c \in L^2,
X^{(n)}_{t_{k+1}} = X^{(n)}_{t_k} + f(t_k, X^{(n)}_{t_k})\,(t_{k+1} - t_k) + G(t_k, X^{(n)}_{t_k})\,(W_{t_{k+1}} - W_{t_k}),

and by linear interpolation between the partition points converges in mean square
to the solution of the Ito equation (10.3.2). If we instead set

X^{(n)}_{t_{k+1}} = X^{(n)}_{t_k} + f(t_k, X^{(n)}_{t_k})\,(t_{k+1} - t_k) + G\Bigl(t_k, \frac{X^{(n)}_{t_k} + X^{(n)}_{t_{k+1}}}{2}\Bigr)\,(W_{t_{k+1}} - W_{t_k}),

this leads to the solution of Stratonovich's equation (10.3.3).
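A minimal sketch of the two recursions for a scalar example follows (the drift,
diffusion, partition, and seed are illustrative choices; the implicit midpoint value
in the second recursion is resolved here by a short fixed-point iteration):

```python
import numpy as np

# Sketch of the two "Cauchy polygon" recursions for a scalar example
# (f, G, grid and seed are illustrative choices).
rng = np.random.default_rng(1)
f = lambda t, x: -x                  # drift
G = lambda t, x: np.sin(x) + 1.5     # x-dependent diffusion, so the limits differ
t0, T, n = 0.0, 1.0, 4000
dt = (T - t0) / n
dW = rng.normal(0.0, np.sqrt(dt), n)

x_ito, x_strat = 0.5, 0.5
for k in range(n):
    t = t0 + k * dt
    # forward (Ito) polygon
    x_ito = x_ito + f(t, x_ito) * dt + G(t, x_ito) * dW[k]
    # midpoint (Stratonovich) polygon: resolve the implicit midpoint value
    y = x_strat
    for _ in range(10):              # fixed-point iteration for X_{t_{k+1}}
        y = x_strat + f(t, x_strat) * dt + G(t, 0.5 * (x_strat + y)) * dW[k]
    x_strat = y

print("Ito polygon          :", x_ito)
print("Stratonovich polygon :", x_strat)
```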


Since a stochastic differential equation of the form (10.3.2) is merely a short way
of writing the integral equation

X_t = c + \int_{t_0}^{t} f(s, X_s)\, ds + \int_{t_0}^{t} G(s, X_s)\, dW_s, \qquad t_0 \le t \le T,

it seems natural to try to construct an approximation for X_t by replacing W_t in
the second integral with a smooth integrator, so that the integral can be regarded
as an ordinary Riemann-Stieltjes integral and can be evaluated. We again see that
a smoothing prior to evaluation of the stochastic integral leads, when we take
the limit in the result, to the solution of the formally identical Stratonovich equa-
tion.
For example, we may approximate the trajectories W_t(ω) with a sequence
W^{(n)}_t(ω) of continuous functions of bounded variation and piecewise-continu-
ous derivatives, so that, for all a ∈ [t_0, T] and almost all ω ∈ Ω,

\sup_{t_0 \le t \le a} |W^{(n)}_t(\omega)| \le C(\omega) \qquad (\text{for all } n \ge n_0(\omega))

and

\sup_{t_0 \le t \le a} |W^{(n)}_t(\omega) - W_t(\omega)| \to 0.
This is the case, for example, for the polygonal approximation at the points
t_0 < t_1 < ... < t_n = a:

W^{(n)}_t = W_{t_k} + \frac{t - t_k}{t_{k+1} - t_k}\,(W_{t_{k+1}} - W_{t_k}), \qquad t_k \le t \le t_{k+1},

with

\delta_n = \max_k\,(t_{k+1} - t_k) \to 0.

Fig. 7: Polygonal approximation of the sample functions of a Wiener process.

In the equation

X^{(n)}_t = c + \int_{t_0}^{t} f(s, X^{(n)}_s)\, ds + \int_{t_0}^{t} G(s, X^{(n)}_s)\, dW^{(n)}_s,

the last integral is an ordinary Riemann-Stieltjes integral for the individual tra-
jectories. Under certain conditions on the functions f and G, the sequence of
the X^{(n)}_t then converges with probability 1, uniformly on [t_0, a], to the solution
of the Stratonovich equation (10.3.3); that is,

\operatorname{ac-lim}_{n \to \infty} \Bigl(\sup_{t_0 \le t \le a} |X^{(n)}_t - X_t|\Bigr) = 0,

where X_t satisfies (10.3.3).
In this connection, we cite a result of Wong and Zakai [52] for the scalar case.
(10.3.4) Theorem. Suppose that d = m = 1 and that {W^{(n)}} is a sequence of
approximations of the Wiener process W_t with the properties mentioned above.
Suppose that the functions f(t, x) and G(t, x) are continuous functions de-
fined on [t_0, T] × R^1 and that G has continuous partial derivatives G_t and G_x.
Suppose that the functions f, G, and G_x G satisfy a Lipschitz condition in x (see
(6.2.4)). Suppose that

G(t, x) \ge \alpha > 0 \qquad (\text{or } G(t, x) \le -\alpha < 0)

and

|G_t(t, x)| \le \beta\, G(t, x)^2.

Suppose finally that the initial value c is independent of W_t - W_{t_0} for t ∈ [t_0, T]
and that X^{(n)}_t is the solution of the equation

X^{(n)}_t = c + \int_{t_0}^{t} f(s, X^{(n)}_s)\, ds + \int_{t_0}^{t} G(s, X^{(n)}_s)\, dW^{(n)}_s, \qquad t_0 \le t \le T.

Then, for every finite a ∈ [t_0, T],

\operatorname{ac-lim}_{n \to \infty} \Bigl(\sup_{t_0 \le t \le a} |X^{(n)}_t - X_t|\Bigr) = 0,

where X_t is the unique solution of the Stratonovich equation

(S)\, dX_t = f(t, X_t)\, dt + G(t, X_t)\, dW_t,

or of the equivalent Ito equation

dX_t = \Bigl(f(t, X_t) + \frac{1}{2}\, G_x(t, X_t)\, G(t, X_t)\Bigr)\, dt + G(t, X_t)\, dW_t,

with initial value X_{t_0} = c.


Thus, as a rule of thumb, one arrives at a Stratonovich equation when one shifts
to a white noise or Wiener process in the result after evaluation of an ordinary
integral.
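As a numerical illustration of the theorem (the functions f and G, the grids, and
the seed below are illustrative choices), one can drive the integral equation with a
polygonal W^{(n)} (so that only ordinary calculus is used) and compare the result
with the Euler scheme for the Ito equation carrying the correction term (1/2) G_x G:

```python
import numpy as np

# Numerical sketch of Theorem (10.3.4): the equation driven by a polygonal
# approximation of a fixed Wiener path is compared with the Ito equation
# that carries the correction term (1/2) G_x G.  All choices are illustrative.
rng = np.random.default_rng(2)
f = lambda x: -0.5 * x
G = lambda x: 1.0 + 0.3 * np.sin(x)
Gx = lambda x: 0.3 * np.cos(x)

T, n_coarse, sub = 1.0, 1000, 20          # coarse grid defines W^{(n)}
dt = T / n_coarse
dW = rng.normal(0.0, np.sqrt(dt), n_coarse)

# ODE driven by W^{(n)}: on each coarse interval the slope of W^{(n)} is
# dW[k]/dt, and the resulting ODE is integrated with small RK2 steps.
x = 0.0
h = dt / sub
for k in range(n_coarse):
    w_dot = dW[k] / dt
    rhs = lambda y, w=w_dot: f(y) + G(y) * w
    for _ in range(sub):
        x = x + h * rhs(x + 0.5 * h * rhs(x))   # explicit midpoint (RK2)

# Euler-Maruyama for the equivalent Ito equation on the same increments
y = 0.0
for k in range(n_coarse):
    y = y + (f(y) + 0.5 * Gx(y) * G(y)) * dt + G(y) * dW[k]

print("smoothed-noise ODE limit     ~", x)
print("Ito equation with correction ~", y)
```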
Chapter 11
Stability of Stochastic Dynamic Systems

11.1 Stability of Deterministic Systems


Crudely speaking, stability of a dynamic system means insensitivity of the state
of the system to small changes in the initial state or the parameters of the sys-
tem. The trajectories of a stable system that are "close" to each other at a spe-
cific instant should therefore remain close to each other at all subsequent instants.
For the theory of stability of deterministic systems, especially those systems
that are described by ordinary differential equations, we refer to the mono-
graphs by Hahn [64] and Bhatia and Szegö [58]. We mention here only a few
basic facts.
Suppose that

(11.1.1)  \dot X_t = f(t, X_t), \qquad X_{t_0} = c, \quad t \ge t_0,

is the ordinary differential equation for a d-dimensional state vector X_t. We as-
sume that, for every initial value c ∈ R^d, there exists a global solution (that is,
one defined on [t_0, ∞)) X_t(c) (as is guaranteed by the Lipschitz and bounded-
ness conditions) and that f(·, x) is continuous. Furthermore, suppose that

f(t, 0) = 0 \quad \text{for all } t \ge t_0,

so that (11.1.1) has the solution X_t ≡ 0 corresponding to the initial condition
c = 0. We shall refer to this solution as the equilibrium position. The equilibrium
position is said to be stable if, for every ε > 0, there exists a δ = δ(ε, t_0) > 0 such
that

\sup_{t_0 \le t < \infty} |X_t(c)| \le \varepsilon

whenever |c| ≤ δ. Otherwise, it is said to be unstable. The equilibrium position is
said to be asymptotically stable if it is stable and if

\lim_{t \to \infty} X_t(c) = 0

for all c in some neighborhood of x = 0.



The definition of stability contains t_0 as a parameter. However, if there is a
δ(ε, t_0) > 0 satisfying the definition, then there also exists a δ(ε, t_1) > 0 for
every t_1 > t_0.
The definition of stability just given leads to difficulties in that only in special
cases can we solve equation (11.1.1) explicitly. As far back as 1892, A. M. Lya-
punov developed a method (the so-called direct or second method) for deter-
mining stability without actually solving equation (11.1.1).
(11.1.2) Definition. A continuous scalar function v(x) defined on a spherical
neighborhood of the zero point

U_h = \{x : |x| \le h\} \subset R^d, \qquad h > 0,

is said to be positive-definite (in the sense of Lyapunov) if

v(0) = 0, \qquad v(x) > 0 \quad (\text{for all } x \ne 0).

A continuous function v(t, x) defined on [t_0, ∞) × U_h is said to be positive-defi-
nite if v(t, 0) = 0 and there exists a positive-definite function w(x) such that

v(t, x) \ge w(x) \quad \text{for all } t \ge t_0.

A function v is said to be negative-definite if -v is positive-definite. A continuous
nonnegative function v(t, x) is said to be decrescent (in the Russian literature, it is
said to have an arbitrarily small upper bound) if there exists a positive-definite
function u(x) such that

v(t, x) \le u(x) \quad \text{for all } t \ge t_0.

It is said to be radially unbounded if

\inf_{t \ge t_0} v(t, x) \to \infty \qquad (|x| \to \infty).

Every positive-definite function v(x) that is independent of t is also decrescent.
If X_t is a solution of (11.1.1) and v(t, x) is a positive-definite function with con-
tinuous first partial derivatives with respect to t and to the components x_i of x,
then

V_t = v(t, X_t)

represents a function of t whose derivative is, by virtue of (11.1.1),

\dot V_t = \frac{\partial v}{\partial t} + \sum_{i=1}^{d} \frac{\partial v}{\partial x_i}\, f_i(t, X_t).

If \dot V_t ≤ 0, then X_t varies in such a way that the values of V_t do not increase;
that is, the "distance" of X_t from the equilibrium position, measured by v(t, X_t),
does not increase. This elementary consideration leads to the following sufficient
criteria discovered by Lyapunov.
(11.1.3) Theorem. a) If there exists a positive-definite function v(t, x) with
continuous first partial derivatives such that the derivative formed along the tra-
jectories of

\dot X_t = f(t, X_t), \qquad t \ge t_0, \quad f(t, 0) \equiv 0,

satisfies the inequality

\frac{\partial v}{\partial t} + \sum_{i=1}^{d} \frac{\partial v}{\partial x_i}\, f_i(t, x) \le 0

in a half-cylinder

\{(t, x) : t \ge t_0,\ |x| < h\},

then the equilibrium position of the differential equation is stable.
b) If there exists a positive-definite decrescent function v(t, x) such that \dot v(t, x)
is negative-definite, then the equilibrium position is asymptotically stable.
A function v(t, x) that satisfies the stability conditions of Theorem (11.1.3) is
said to be a Lyapunov function corresponding to the differential equation in
question.
(11.1.4) Example. The linear autonomous equation

\dot X_t = A X_t, \qquad t \ge t_0, \quad X_{t_0} = c,

has the solution

X_t = e^{A(t - t_0)}\, c.

Let λ_1, ..., λ_d denote the eigenvalues of A. Then the equilibrium position is as-
ymptotically stable if and only if

(11.1.5)  \operatorname{Re}(\lambda_i) < 0, \qquad i = 1, \ldots, d.

If at least one eigenvalue has a positive real part, it is unstable. If some of the real
parts vanish, the equilibrium position is stable (though not asymptotically stable)
provided the elementary divisors corresponding to the eigenvalues with vanishing
real parts are all simple. If any of these elementary divisors are of higher order, the
equilibrium position is unstable. One can decide whether (11.1.5) holds either by
using the criteria of Routh, Hurwitz, and others (see Hahn [64]) or by checking
whether the equation

(11.1.6)  A'P + PA = -Q

has, for some positive-definite Q, a positive-definite matrix P as its solution. Then
we can choose the Lyapunov function

v(x) = x'Px > 0,

for which

\dot v(x) = 2\,(Px)'Ax = x'PAx + x'A'Px = -x'Qx < 0.
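A minimal computational sketch of this test (the matrix A below is an illustrative
choice; SciPy's continuous Lyapunov solver is used for (11.1.6)):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Sketch of the test (11.1.6): pick Q = I, solve A'P + PA = -Q, and check
# that P is positive-definite.  The matrix A is an illustrative choice.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
Q = np.eye(2)

# solve_continuous_lyapunov(M, R) solves M X + X M' = R, so M = A' gives
# A'P + PA = -Q.
P = solve_continuous_lyapunov(A.T, -Q)

print("P =", P)
print("P positive-definite:", np.all(np.linalg.eigvalsh(P) > 0))
print("Re(eigenvalues of A):", np.linalg.eigvals(A).real)
```

For this A, whose eigenvalues are -1 and -2, the solver returns a positive-definite
P, in agreement with criterion (11.1.5).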

11.2 The Basic Ideas of Stochastic Stability Theory


When we try to carry over the principles of deterministic Lyapunov stability
theory to the stochastic case, we encounter two problems:
1. What is a suitable definition of stability?
2. What is the corresponding definition of a Lyapunov function? With what
must we replace the inequality \dot v ≤ 0 in order to obtain assertions of the type
of Theorem (11.1.3)?
Of the numerous attempts along these lines in recent years, the one that has re-
ceived general acceptance is that of Bucy [59], who recognized in 1965 that a
stochastic Lyapunov function should have the supermartingale property, which
corresponds to a quite strict stability concept. For a detailed account and further
references, we refer the reader to the books by Kushner [72], Bucy and Joseph
[61], Morozan [73], and, above all, the profound work by Khas'minskiy [65],
which we shall essentially follow in the present book.
In all that follows, we shall make the
(11.2.1) Assumptions. Suppose that

(11.2.2)  dX_t = f(t, X_t)\, dt + G(t, X_t)\, dW_t, \qquad X_{t_0} = c, \quad t \ge t_0,

is a stochastic differential equation that satisfies the assumptions of the exis-
tence-and-uniqueness theorem (6.2.1) and has coefficients continuous with
respect to t. Then, in accordance with Theorem (9.3.1), corresponding to every
c that is independent of W_· there exists a unique global solution X_t = X_t(c) on
[t_0, ∞), which represents a d-dimensional diffusion process with drift vector
f(t, x) and diffusion matrix B(t, x) = G(t, x) G(t, x)'. Furthermore, let us as-
sume once and for all that c is with probability 1 a constant. Then, Theorem
(7.1.2) implies the existence of all moments E|X_t|^k for k > 0 and also

P[X_t \in B \mid X_{t_0} = c] = P[X_t(c) \in B].

The solution beginning at the instant s ≥ t_0 at the point x will be denoted by
X_t(s, x). Finally, we assume

f(t, 0) \equiv 0, \qquad G(t, 0) \equiv 0 \quad \text{for all } t \ge t_0,

so that the equilibrium position X_t(0) ≡ 0 is the unique solution of the differ-
ential equation with initial value c = 0. The case in which the equilibrium posi-
tion is a solution of the undisturbed equation only, the disturbance acting even at
x = 0 (that is, f(t, 0) ≡ 0 but G(t, 0) ≢ 0), is considered in remark (11.2.19).
To clarify the basic idea of stochastic stability theory, we make first of all the
following observation:
Let X_t denote the solution of equation (11.2.2) and let v(t, x) denote a positive-
definite function defined everywhere on [t_0, ∞) × R^d that has continuous partial
derivatives v_t, v_{x_i}, and v_{x_i x_j}. Then, the process

V_t = v(t, X_t)

has, in accordance with Ito's theorem, a stochastic differential. If we denote by
L the extension of the infinitesimal operator A of X_t to all functions that are con-
tinuously differentiable with respect to t and twice continuously differentiable
with respect to the x_i (see section 9.4), that is,

(11.2.3)  L = \frac{\partial}{\partial t} + \sum_{i=1}^{d} f_i(t, x)\, \frac{\partial}{\partial x_i} + \frac{1}{2} \sum_{i=1}^{d} \sum_{j=1}^{d} \bigl(G(t, x)\, G(t, x)'\bigr)_{ij}\, \frac{\partial^2}{\partial x_i\, \partial x_j},

then, in accordance with formula (5.3.9a),

(11.2.4)  dV_t = \bigl(L\, v(t, X_t)\bigr)\, dt + \sum_{i=1}^{d} \sum_{j=1}^{m} v_{x_i}(t, X_t)\, G_{ij}(t, X_t)\, dW^j_t.

Now, a stable system should have the property that V_t does not increase, that is,
dV_t ≤ 0. This would mean that the ordinary definition of stability holds for each
single trajectory X_·(ω). However, because of the presence of the fluctuational
term in (11.2.4), the condition dV_t ≤ 0 can be satisfied only in degenerate cases.
Therefore, it makes sense to require instead that X_t not run "uphill" on the aver-
age, that is,

E(dV_t) \le 0.
Since

E(dV_t) = E\bigl(L\, v(t, X_t)\, dt\bigr),

this requirement will be satisfied if

(11.2.5)  L\, v(t, x) \le 0 \quad \text{for all } t \ge t_0,\ x \in R^d.

This is the stochastic analogue of the requirement \dot v ≤ 0 in the deterministic
case, and it reduces to that case if G vanishes. We shall refer to the function v(t, x)
used here as a Lyapunov function corresponding to the stochastic differential
equation (11.2.2).
For what stability concept is (11.2.5) a sufficient condition? In this connection,
we recall that, in accordance with Theorem (5.1.1b), the second integral in

V_t = v(t_0, c) + \int_{t_0}^{t} L\, v(s, X_s)\, ds + \sum_{i=1}^{d} \sum_{j=1}^{m} \int_{t_0}^{t} v_{x_i}(s, X_s)\, G_{ij}(s, X_s)\, dW^j_s

is a martingale. Here, the accompanying family \mathfrak{F}_t of sigma-algebras is the one de-
fined in (6.1.2). Therefore, for t ≥ s, we have by virtue of (11.2.5)
E(V_t - V_s \mid \mathfrak{F}_s) = E\Bigl(\int_s^t L\, v(u, X_u)\, du \Bigm| \mathfrak{F}_s\Bigr) \le 0,

or

E(V_t \mid \mathfrak{F}_s) \le V_s,

that is, under condition (11.2.5), V_t is a (positive) supermartingale. For every in-
terval [a, b] ⊂ [t_0, ∞), the supermartingale inequality yields

P\Bigl[\sup_{a \le t \le b} v(t, X_t) \ge \varepsilon\Bigr] \le \frac{1}{\varepsilon}\, E\, v(a, X_a)

and, in particular, for a = t_0 and b → ∞ (the initial value c is constant!),

P\Bigl[\sup_{t_0 \le t < \infty} v(t, X_t) \ge \varepsilon\Bigr] \le \frac{1}{\varepsilon}\, v(t_0, c) \quad \text{for all } \varepsilon > 0,\ c \in R^d.

Now, v(t, x) is positive-definite. To simplify the analysis, we exclude Lyapunov
functions that become arbitrarily small for large |x| by requiring that v(t, x) ≥
w(x) → ∞ as |x| → ∞, that is, by requiring that v be radially unbounded. Then,
for arbitrary positive ε_1 and ε_2, there exist an ε = ε(ε_1) and (subsequently) a
δ = δ(ε_1, ε_2, t_0) such that

v(t, x) \ge \varepsilon \quad \text{if } |x| \ge \varepsilon_1,

and

v(t_0, c)/\varepsilon \le \varepsilon_2 \quad \text{if } |c| \le \delta,

so that, with X_t = X_t(c),

P\Bigl[\sup_{t_0 \le t < \infty} |X_t(c)| \ge \varepsilon_1\Bigr] \le \varepsilon_2 \quad \text{for } |c| \le \delta.

Therefore, for arbitrary positive ε_1 and ε_2, there exists a positive δ such that all
the trajectories with starting point c, where |c| ≤ δ, range, for all time, in an ε_1-
neighborhood of the zero point, with the exception of a set of trajectories that
occurs with probability not exceeding ε_2. This is the εδ-formulation of the
following stability definition:
(11.2.6) Definition. Suppose that the assumptions (11.2.1) are satisfied. Then
the equilibrium position is said to be stochastically stable ("stable in probability"
in Khas'minskiy's terminology [65] or "stable with probability 1" in Kushner's
terminology [72]) if, for every ε > 0,

\lim_{c \to 0} P\Bigl[\sup_{t_0 \le t < \infty} |X_t(c)| \ge \varepsilon\Bigr] = 0.

Otherwise, it is said to be stochastically unstable. The equilibrium position is said
to be stochastically asymptotically stable if it is stochastically stable and

\lim_{c \to 0} P\Bigl[\lim_{t \to \infty} X_t(c) = 0\Bigr] = 1.

The equilibrium position is said to be stochastically asymptotically stable in the
large if it is stochastically stable and

P\Bigl[\lim_{t \to \infty} X_t(c) = 0\Bigr] = 1

for all c ∈ R^d.
It should be noted that, in the case G(t, x) ≡ 0, these definitions reduce to the
corresponding deterministic definitions.
Thus, we have shown that the existence of a positive-definite function v(t, x)
with the property (11.2.5) implies stochastic stability.
For asymptotic stability, we must exclude the case L v(t, x) = 0 (because then
there is no average trend in the direction x = 0) and replace (11.2.5) with the more
stringent condition

(11.2.7)  L\, v(t, x) \text{ is negative-definite.}

Suppose also that v is decrescent and, together with -L v, radially unbounded.
Now, V_t is, by virtue of (11.2.7), again a positive supermartingale. Therefore, in
accordance with section 1.9, there exists

\operatorname{ac-lim}_{t \to \infty} V_t = V_\infty \ge 0.

Here, the limit V_∞(c) may depend on the initial point c. If V_∞(c) were at least equal
to some positive c_1 on an ω-set B_c with positive probability, we would have, for
these ω,

v(t, X_t) \ge c_2 > 0 \quad \text{for all } t \ge \tau(\omega),

and, by virtue of the decrescence of v,

|X_t| \ge c_3 > 0 \quad \text{for all } t \ge \tau(\omega).

The assumption (11.2.7) and the radial unboundedness of -L v imply the
existence of a positive c_4 such that

L\, v(t, x) \le -c_4 \quad \text{for } |x| \ge c_3.

By virtue of (11.2.4),

0 \le V_t = v(t_0, c) + \int_{t_0}^{t} L\, v(s, X_s)\, ds + M_t,

where M_t is a scalar martingale such that E M_t = 0 and, by virtue of (11.2.7),
M_t \ge -v(t_0, c). The supermartingale inequality yields

P\Bigl[\sup_{t \ge t_0} M_t \ge \varepsilon\Bigr] \le \frac{v(t_0, c)}{\varepsilon}.

For all ω ∈ B_c ∩ [\sup M_t < ε], we have, for t ≥ τ(ω),

0 \le V_t \le v(t_0, c) - c_4\,(t - \tau(\omega)) + \varepsilon.

When we let t approach ∞, this leads to a contradiction; that is, B_c ∩ [\sup M_t < ε]
has probability 0. Therefore,

P(B_c) = P\bigl(B_c \cap [\sup M_t \ge \varepsilon]\bigr) \le \frac{v(t_0, c)}{\varepsilon}

and, finally,

P\Bigl[\lim_{t \to \infty} v(t, X_t) = 0\Bigr] \ge 1 - \frac{v(t_0, c)}{\varepsilon}.

By virtue of the positive-definiteness of v, we therefore have

P\Bigl[\lim_{t \to \infty} X_t(c) = 0\Bigr] \ge 1 - \frac{v(t_0, c)}{\varepsilon} \to 1 \qquad (c \to 0).

Thus, the equilibrium position is stochastically asymptotically stable.


This proves a special case of the following general theorem (see Khas'minskiy
[65]):
(11.2.8) Theorem. Suppose that the assumptions (11.2.1) are satisfied.
a) Suppose that there exists a positive-definite function v(t, x), defined on a half-
cylinder [t_0, ∞) × U_h, U_h = {x : |x| ≤ h}, where h > 0, that is everywhere, with
the possible exception of the point x = 0, continuously differentiable with respect
to t and twice continuously differentiable with respect to the components x_i of x.
Furthermore, suppose

L\, v(t, x) \le 0, \qquad t \ge t_0, \quad 0 < |x| \le h,

where

L = \frac{\partial}{\partial t} + \sum_{i=1}^{d} f_i(t, x)\, \frac{\partial}{\partial x_i} + \frac{1}{2} \sum_{i=1}^{d} \sum_{j=1}^{d} \bigl(G(t, x)\, G(t, x)'\bigr)_{ij}\, \frac{\partial^2}{\partial x_i\, \partial x_j}.

Then, the equilibrium position of equation (11.2.2) is stochastically stable.
b) If, in addition, v(t, x) is decrescent and L v(t, x) is negative-definite, then the
equilibrium position is stochastically asymptotically stable. In both cases,

P\Bigl[\sup_{t \ge s} |X_t(s, x)| \ge \varepsilon\Bigr] \le \frac{v(s, x)}{\inf_{t \ge s,\ |y| \ge \varepsilon} v(t, y)}, \qquad \varepsilon > 0, \quad s \ge t_0.

c) If the assumptions of part b) hold for a radially unbounded function v(t, x)
defined everywhere on [t_0, ∞) × R^d, then the equilibrium position is stochasti-
cally asymptotically stable in the large.

(11.2.9) Remark. A sufficient condition for negative-definiteness of L v is the
existence of a constant k > 0 such that

L\, v(t, x) \le -k\, v(t, x).

(11.2.10) Remark. For an autonomous equation

dX_t = f(X_t)\, dt + G(X_t)\, dW_t, \qquad f(0) = 0, \quad G(0) = 0,

it is sufficient to consider a function v(t, x) ≡ v(x) that is independent of t. Khas'-
minskiy [65] has shown that the existence of a Lyapunov function v(x) is also a
necessary condition for stochastic stability as long as the disturbance is "nonde-
generate", that is, as long as

(11.2.11)  y'\, G(x)\, G(x)'\, y \ge m(x)\, |y|^2 \quad \text{for all } y \in R^d,\ x \in U_h,

where m(x) is positive-definite. Under condition (11.2.11), stochastic stability
implies stochastic asymptotic stability (see Khas'minskiy [65], p. 213).
We state the following criterion for stochastic instability (Khas'minskiy [65]):
(11.2.12) Theorem. Suppose that the assumptions (11.2.1) are satisfied. Sup-
pose that there exists a function v(t, x) defined on [t_0, ∞) × {0 < |x| ≤ h} that
has continuous partial derivatives with respect to t and continuous second partial
derivatives with respect to the x_i. Then, if

\lim_{x \to 0}\, \inf_{t \ge t_0} v(t, x) = \infty

and

(11.2.13)  \sup_{t \ge t_0,\ \varepsilon \le |x| \le h} L\, v(t, x) < 0 \quad \text{for all } 0 < \varepsilon < h,

or, instead of (11.2.13), only L v(t, x) ≤ 0 (for x ≠ 0) and, in addition,

(11.2.14)  y'\, G(t, x)\, G(t, x)'\, y \ge m(x)\, |y|^2 \quad \text{for all } y \in R^d,\ |x| \le h,\ t \ge t_0, \quad m \text{ positive-definite},

then the equilibrium position of (11.2.2) is stochastically unstable. Furthermore,

P\Bigl[\sup_{t \ge t_0} |X_t(c)| < h\Bigr] = 0 \quad \text{for all } c \in U_h.
(11.2.15) Example. The solution of the scalar linear autonomous homogeneous
differential equation (d = m = 1)

(11.2.16)  dX_t = A X_t\, dt + B X_t\, dW_t, \qquad X_{t_0} = c,

is, in accordance with Corollary (8.4.3b),

X_t(c) = c \exp\bigl((A - B^2/2)\,(t - t_0) + B\,(W_t - W_{t_0})\bigr).

From this we conclude the following: a) for no positive ε does there exist a
positive δ such that all the sample functions of the bundle of sample functions
originating at c ≠ 0, |c| ≤ δ, remain in an ε-neighborhood of x = 0.
b) Since

\operatorname{ac-lim}_{t \to \infty} \frac{W_t - W_{t_0}}{t - t_0} = 0,

the equilibrium position is stochastically asymptotically stable in the large for

A < B^2/2

and stochastically unstable for

A \ge B^2/2.

Let us derive the same result with the aid of Lyapunov's technique. For X_t, the
operator L takes the form

L = \frac{\partial}{\partial t} + A x\, \frac{\partial}{\partial x} + \frac{B^2 x^2}{2}\, \frac{\partial^2}{\partial x^2}.

A trial with

v(x) = |x|^r, \qquad r > 0,

leads, for x ≠ 0, to

L\,(|x|^r) = \Bigl(A + \frac{1}{2}\, B^2\,(r - 1)\Bigr)\, r\, |x|^r.

As long as A < B^2/2, we can choose r such that 0 < r < 1 - 2A/B^2 and hence
satisfy the condition

L\, v \le -k\, v,

which, according to remark (11.2.9), is sufficient for stochastic asymptotic sta-
bility. Since v(x) = |x|^r is radially unbounded, we have, in accordance with
Theorem (11.2.8c), stochastic asymptotic stability in the large. As an estimate,
we have

P\Bigl[\sup_{t_0 \le t < \infty} |X_t(c)| \ge \varepsilon\Bigr] \le \frac{|c|^r}{\varepsilon^r}.

If A ≥ B^2/2, we choose v(x) = -log |x| and we obtain L v = -A + B^2/2 ≤ 0.
Since condition (11.2.14) is satisfied for B ≠ 0 with m(x) = B^2 |x|^2, stochastic
instability is ensured by Theorem (11.2.12). For B = 0, the system is determinis-
tic, and we see from example (11.1.4) that we have asymptotic stability for A < 0,
simple stability for A = 0, and instability for A > 0.
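The boundary A = B^2/2 is easy to observe numerically. The following sketch
(parameter values, horizon, and sample sizes are illustrative) evaluates the closed-
form solution pathwise via the exponent (A - B^2/2)t + B W_t:

```python
import numpy as np

# Illustrative check of example (11.2.15): X_t = c exp((A - B**2/2) t + B W_t),
# so the almost-sure behaviour is governed by the sign of A - B**2/2.
rng = np.random.default_rng(3)
B, T, paths = 1.0, 200.0, 10_000
W_T = rng.normal(0.0, np.sqrt(T), paths)

for A in (0.3, 0.7):                 # A < B^2/2 = 0.5 and A > B^2/2
    exponent = (A - B**2 / 2) * T + B * W_T
    frac_small = np.mean(np.exp(exponent) < 1e-3)
    print(f"A = {A}: fraction of paths with |X_T| < 1e-3 :", frac_small)
```

For A = 0.3 nearly all paths have decayed by time T, while for A = 0.7 almost
none have, in agreement with the criterion A < B^2/2.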
(11.2.17) Remark. In example (11.2.15), the "undisturbed" system (B = 0) is
therefore stable only for A < 0, and the addition of a "disturbance term" does
not change this property. However, the system can be stabilized even in the case
A > 0 by adding a disturbance term of strength (= diffusion coefficient) B^2 x^2,
provided we choose B^2/2 > A.
Now, equation (11.2.16) is an abstraction of

\dot X_t = (A + B\, \eta_t)\, X_t,

where η_t is a Gaussian stationary process with E η_t = 0 and a delta-like covariance
function. If we shift from η_t to the white noise ξ_t only after we have found the
solution

X_t = c\, e^{A(t - t_0) + B \int_{t_0}^{t} \eta_s\, ds},

we obtain

X_t = c\, e^{A(t - t_0) + B\,(W_t - W_{t_0})}.

In accordance with section 10.2, this is the solution of equation (11.2.16), now
interpreted as the Stratonovich equation, or of the equivalent Ito equation

dX_t = (A + B^2/2)\, X_t\, dt + B X_t\, dW_t, \qquad X_{t_0} = c,

whose equilibrium position is now stochastically asymptotically stable for arbi-
trary B ≠ 0 if A < 0 and stochastically unstable if A ≥ 0.
With either interpretation, an asymptotically stable undisturbed system (A < 0,
B = 0) remains stochastically asymptotically stable upon the addition of arbitrarily
strong disturbances (only ordinary stability, A = 0, is destroyed). This property
disappears with Ito equations for d ≥ 3 and with Stratonovich equations for
d ≥ 2 (see Khas'minskiy [65], p. 222).
(11.2.18) Remark. The use of Lyapunov's method depends on knowledge of
Lyapunov functions v(t, x). As in the deterministic case, there are a number of
techniques that can be used to find suitable functions. For example, one can
seek a positive-definite solution of the equation L v = 0 or of the inequality
L v ≤ 0 (see Kushner [72], pp. 55-76, for a number of examples). The choice

v(x) = x'\, C\, x,

where C is a positive-definite matrix, leads to the goal if

L\, v = 2\, f(t, x)'\, C\, x + \operatorname{tr}\bigl(G(t, x)\, G(t, x)'\, C\bigr) \le 0

in some neighborhood of x = 0 for t ≥ t_0.


(11.2.19) Remark. If the functions f and G in equation (11.2.2) are such that
f(t, 0) ≡ 0 but G(t, 0) ≢ 0, the equilibrium position is a solution of the undis-
turbed system but no longer of the disturbed system. However, we can in prin-
ciple still apply our definitions of stability (11.2.6). Suppose that, for arbitrary d
and m,

dX_t = A X_t\, dt + B(t)\, dW_t, \qquad X_{t_0} = c,

where the undisturbed system is asymptotically stable (see example (11.1.4)). Ac-
cording to Corollary (8.2.4), the solution of this equation is

X_t = e^{A(t - t_0)}\, c + \int_{t_0}^{t} e^{A(t - s)}\, B(s)\, dW_s.

By assumption, e^{A(t - t_0)} c → 0 as t → ∞ for all c ∈ R^d. The second term is nor-
mally distributed with mean 0 and

E\,\Bigl|\int_{t_0}^{t} e^{A(t - s)}\, B(s)\, dW_s\Bigr|^2 = \int_{t_0}^{t} |e^{A(t - s)}\, B(s)|^2\, ds

(see remark (8.2.9)).
Let λ_i denote the eigenvalues of A and let us set

-\gamma = \max_i \operatorname{Re}(\lambda_i) < 0.

Then,

|e^{A(t - s)}\, B(s)|^2 \le |B(s)|^2\, e^{-2\gamma(t - s)},

so that

\int_{t_0}^{t} |e^{A(t - s)}\, B(s)|^2\, ds \le \int_{t_0}^{t} e^{-2\gamma(t - s)}\, |B(s)|^2\, ds
  \le e^{-\gamma t} \int_{t_0}^{t/2} |B(s)|^2\, ds + \int_{t/2}^{t} |B(s)|^2\, ds \to 0,

this last holding under the condition

(11.2.20)  \int_{t_0}^{\infty} |B(s)|^2\, ds < \infty.

This means that

\operatorname{qm-lim}_{t \to \infty} X_t = 0.

Moreover, by virtue of a theorem of Khas'minskiy ([65], pp. 307-310), for every
initial value c, we have under the condition (11.2.20)

P\Bigl[\lim_{t \to \infty} X_t = 0\Bigr] = 1.

11.3 Stability of the Moments


The definition of stochastic stability involves the overall behavior of the sample
functions X_·(ω) on the interval [t_0, ∞). For given ε > 0 and starting point c, we
group together all those sample functions that remain for all t in an ε-neighbor-
hood of the zero point x = 0. The probability of this set must, in the case of a
stable system, be close to 1 provided |c| is small.
In many cases, the following concept (historically, the older one) is more feasi-
ble: For fixed t, we take the average over all possible values of X_t or of a func-
tion g(X_t); that is, we form E g(X_t) = ḡ(t) and examine the so-obtained deter-
ministic function ḡ for its behavior on the interval [t_0, ∞). We shall treat the
cases g(x) = x, g(x) = |x|^p (for p > 0), and g(x) = x x'.
(11.3.1) Definition. Suppose that the stochastic differential equation

(11.3.2)  dX_t = f(t, X_t)\, dt + G(t, X_t)\, dW_t, \qquad X_{t_0} = c \ (\text{fixed}),

satisfies the assumptions (11.2.1). The equilibrium position is said to be stable in
pth mean (where p > 0) if, for every ε > 0, there exists a δ > 0 such that

\sup_{t_0 \le t < \infty} E\, |X_t(c)|^p \le \varepsilon \quad \text{for } |c| \le \delta.

The equilibrium position is said to be asymptotically stable in pth mean if it is
stable in pth mean and if

\lim_{t \to \infty} E\, |X_t(c)|^p = 0 \quad \text{for all } c \text{ in a neighborhood of } x = 0.

It is said to be exponentially stable in pth mean if there exist positive constants
c_1 and c_2 such that, for all sufficiently small c,

E\, |X_t(c)|^p \le c_1\, |c|^p\, e^{-c_2 (t - t_0)}.

In the case p = 1 or p = 2, we speak of (asymptotic or exponential) stability in
mean or in mean square, respectively. The equilibrium position is said to possess
a stable expectation value m_t = E X_t(c) or a stable second moment P(t) =
E X_t(c) X_t(c)' if, for every ε > 0, there exists a δ > 0 such that

\sup_{t_0 \le t < \infty} |E\, X_t(c)| \le \varepsilon \quad \text{for all } |c| \le \delta,

or

\sup_{t_0 \le t < \infty} |E\, X_t(c)\, X_t(c)'| \le \varepsilon \quad \text{for all } |c| \le \delta,

respectively.
(11.3.3) Remark. a) Since the function (E |X|^p)^{1/p} is monotonically increasing
in p > 0, (asymptotic, exponential) stability in pth mean implies the same
stability in qth mean (for all 0 < q ≤ p). In particular, stability in mean square
implies stability in mean.
b) It follows from the Chebyshev inequality that

\sup_{t_0 \le t < \infty} P\bigl[|X_t(c)| \ge \varepsilon\bigr] \le \frac{1}{\varepsilon^p}\, \sup_{t_0 \le t < \infty} E\, |X_t(c)|^p,

that is, stability in pth mean implies

\lim_{c \to 0}\, \sup_{t_0 \le t < \infty} P\bigl[|X_t(c)| \ge \varepsilon\bigr] = 0 \quad \text{for all } \varepsilon > 0.

This is called weak stochastic stability. It is obviously weaker than stochastic sta-
bility

\lim_{c \to 0} P\Bigl[\sup_{t_0 \le t < \infty} |X_t(c)| \ge \varepsilon\Bigr] = 0 \quad \text{for all } \varepsilon > 0.

For an autonomous linear equation, the two concepts are, however, equivalent
according to Khas'minskiy ([65], p. 296).
c) Stability in mean square is equivalent to stability of the second moment. This
is true because, on the one hand, for P(t) = E X_t(c) X_t(c)' = (E X^i_t X^j_t),

|P(t)| \le E\, |X_t(c)|^2,

and, on the other hand,

E\, |X_t(c)|^2 = \operatorname{tr} P(t) = \sum_{i=1}^{d} P_{ii}(t).

d) Since |E X| ≤ E |X|, stability in mean implies stability of the expectation
value E X_t = m_t. However, the converse does not in general hold! Therefore,
stability of the expectation value is a necessary condition for stability in mean
square.
e) For the covariance matrix K(t) = E(X_t - E X_t)(X_t - E X_t)', we have

K(t) = P(t) - m_t\, m_t'.

From a combination of c) and d), we see that stability in mean square implies
stability of K(t). This last is identical to stability of E |X_t - m_t|^2.
(11.3.4) Example. We have already, in example (11.2.15), investigated the
stability behavior of the scalar linear autonomous homogeneous differential
equation

dX_t = A X_t\, dt + B X_t\, dW_t, \qquad X_{t_0} = c.

For the moments, we obtain from example (8.4.8)

E\, |X_t(c)|^p = |c|^p\, e^{(p(A - B^2/2) + p^2 B^2/2)(t - t_0)}.

From this we see that the equilibrium position is exponentially stable in pth
mean if and only if

A < (1 - p)\, B^2/2.

Thus, the points of stochastic asymptotic stability of the region A < B^2/2 are just
those for which exponential stability in pth mean holds for all sufficiently small
p > 0. This holds true in general for autonomous linear equations (see remark
(11.4.17)).
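A quick numerical confirmation of the moment exponent (parameter values are
illustrative; Monte Carlo, so agreement is only approximate):

```python
import numpy as np

# Sketch: estimate E|X_t|^p for dX = A X dt + B X dW (with c = 1) via the
# closed-form path representation, and compare with the exponent
# p (A - B^2/2) + p^2 B^2 / 2 from example (8.4.8).  Values are illustrative.
rng = np.random.default_rng(4)
A, B, t, p, n = 0.2, 1.0, 2.0, 1.5, 1_000_000
W_t = rng.normal(0.0, np.sqrt(t), n)
X_t = np.exp((A - B**2 / 2) * t + B * W_t)

mc = np.mean(np.abs(X_t) ** p)
exact = np.exp((p * (A - B**2 / 2) + p**2 * B**2 / 2) * t)
print("Monte Carlo E|X_t|^p    :", mc)
print("closed form             :", exact)
print("exp. stable in pth mean :", A < (1 - p) * B**2 / 2)
```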
Conversely, from the fact that the second moment E |X_t|^2 approaches 0 expo-
nentially, we can conclude that the equilibrium position is stochastically stable
(see Kozin [70], Gikhman and Skorokhod [36], p. 331). We cite more generally
(Khas'minskiy [65], pp. 232-237):
(11.3.5) Theorem. A sufficient condition for exponential stability in pth mean
is the existence of a function v(t, x) that is defined and continuous on [t_0, ∞) ×
R^d, that for all x ≠ 0 is once continuously differentiable with respect to t and
twice continuously differentiable with respect to the x_i, and that satisfies the
inequalities

c_1\, |x|^p \le v(t, x) \le c_2\, |x|^p

and

L\, v(t, x) \le -c_3\, |x|^p

for certain positive constants c_1, c_2, and c_3.
Then there exist a positive constant c_4 and an almost certainly finite random
variable K(c), depending on c ∈ R^d, such that

|X_t(c)| \le K(c)\, e^{-c_4 (t - t_0)} \quad \text{for all } t \ge t_0,

for almost all sample functions starting at c.

11.4 Linear Equations

For the linear homogeneous equation (see section 8.5)

(11.4.1)  dX_t = A(t)\, X_t\, dt + \sum_{i=1}^{m} B_i(t)\, X_t\, dW^i_t, \qquad X_{t_0} = c, \quad t \ge t_0,

considerably more precise stability assertions can be made for the equilibrium
position X_t ≡ 0. In accordance with our assumptions (11.2.1), the d × d matrices
A(t), B_1(t), ..., B_m(t) are continuous functions on the interval t ≥ t_0. Suppose
again that c ∈ R^d is a constant.
Stability of the first and second moments acquires considerable significance for
(11.4.1) in that simple ordinary differential equations can now be written for
them. In accordance with Theorem (8.5.5):
a) E X_t = m_t is the unique solution of the equation

(11.4.2)  \dot m_t = A(t)\, m_t, \qquad m_{t_0} = c.

b) E X_t X_t' = P(t) is the unique nonnegative-definite symmetric solution of the
equation

(11.4.3)  \dot P(t) = A(t)\, P(t) + P(t)\, A(t)' + \sum_{i=1}^{m} B_i(t)\, P(t)\, B_i(t)', \qquad P(t_0) = E\, c\, c'.

Therefore, stability of the expectation value is, for a linear equation, equivalent
to stability of the undisturbed deterministic equation (11.4.2). Also, stability of
the second moments of X_t (which, in accordance with section 11.3, is identical
to stability in mean square) is equivalent to stability of the deterministic matrix
equation (11.4.3).
Since

P_{ij}(t) = E\, X^i_t X^j_t = E\, X^j_t X^i_t = P_{ji}(t),

so that P(t) is symmetric, (11.4.3) represents a system of d(d+1)/2 linear equa-
tions.
Let us group the d(d+1)/2 elements P_{ij}(t), where i ≥ j, in such a way as to
form a vector p(t). Then, (11.4.3) can be written in the form

(11.4.4)  \dot p(t) = \mathfrak{A}(t)\, p(t), \qquad p(t_0) = E\, \hat c,

where \hat c is the [d(d+1)/2]-dimensional vector corresponding to the matrix c c'.
Therefore, mean-square stability of (11.4.1) is identical to ordinary stability of
the equilibrium position p ≡ 0 of the linear equation (11.4.4). This last condition
is especially easy to check if (11.4.1) is autonomous, so that (11.4.4) becomes a
system with constant coefficients. The reader should compare the criteria of
Theorem (11.4.11) and Corollary (11.4.14). We recall that exponential stability
of (11.4.4) implies stochastic stability of (11.4.1).
(11.4.5) Example. The second-order scalar differential equation with constant
but "noisy" coefficients

\ddot Y_t + (b_0 + b\, \xi^1_t)\, \dot Y_t + (a_0 + a\, \xi^2_t)\, Y_t = 0, \qquad t \ge 0,

where ξ^1_t and ξ^2_t are uncorrelated scalar white noise processes, is equivalent to
the linear stochastic differential equation

dX_t = \begin{pmatrix} 0 & 1 \\ -a_0 & -b_0 \end{pmatrix} X_t\, dt + \begin{pmatrix} 0 & 0 \\ 0 & -b \end{pmatrix} X_t\, dW^1_t + \begin{pmatrix} 0 & 0 \\ -a & 0 \end{pmatrix} X_t\, dW^2_t.

Here, we have again set

X_t = \begin{pmatrix} Y_t \\ \dot Y_t \end{pmatrix}

(see example (8.1.4)). The differential equation (11.4.2) for the expectation value
m_t is

\dot m_t = \begin{pmatrix} 0 & 1 \\ -a_0 & -b_0 \end{pmatrix} m_t,

and it is asymptotically stable if and only if both a_0 and b_0 are positive. Equation
(11.4.3) for the 2 × 2 matrix P(t) of the second moments yields

\dot P_{11} = 2\, P_{12},
\dot P_{12} = -a_0\, P_{11} - b_0\, P_{12} + P_{22},
\dot P_{22} = a^2\, P_{11} - 2\, a_0\, P_{12} + (b^2 - 2\, b_0)\, P_{22}.

The 3 × 3 matrix \mathfrak{A} in equation (11.4.4) is

\mathfrak{A} = \begin{pmatrix} 0 & 2 & 0 \\ -a_0 & -b_0 & 1 \\ a^2 & -2 a_0 & b^2 - 2 b_0 \end{pmatrix},

and its characteristic equation is

-\det(\mathfrak{A} - \lambda I) = \lambda^3 + \lambda^2\,(3 b_0 - b^2) + \lambda\,(4 a_0 + 2 b_0^2 - b_0 b^2) + 2\,(2 a_0 b_0 - a_0 b^2 - a^2) = 0.

The real parts of its roots are all negative if and only if

b^2 < 2\, b_0, \qquad a^2 < (2\, b_0 - b^2)\, a_0

(by the Routh-Hurwitz criterion). Therefore, for fixed a_0 and b_0, the intensity
of the disturbance must not exceed a certain value if the (exponential) stability
in mean square is not to be destroyed. For the nth-order scalar differential equa-
tion with disturbed constant coefficients, compare Theorem (11.5.2).
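A small computational sketch of this check (the coefficient values are illustrative
choices):

```python
import numpy as np

# Sketch for example (11.4.5): build the 3x3 moment matrix and compare the
# eigenvalue test with the Routh-Hurwitz conditions derived in the text.
a0, b0, a, b = 2.0, 1.0, 0.5, 0.8

A_frak = np.array([[0.0,    2.0,      0.0],
                   [-a0,   -b0,       1.0],
                   [a**2,  -2.0 * a0, b**2 - 2.0 * b0]])

eigs = np.linalg.eigvals(A_frak)
print("max Re(eigenvalue) :", eigs.real.max())
print("Routh-Hurwitz test :", b**2 < 2 * b0 and a**2 < (2 * b0 - b**2) * a0)
```

Both tests agree, as they must; the eigenvalue computation is simply the numerical
counterpart of the Routh-Hurwitz inequalities.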
(11.4.6) Example. Gikhman ([36] and [63], pp. 320-328) has investigated
second-order scalar equations of the form

(11.4.7)  \ddot Y_t + \bigl(a(t) + b(t)\, \eta_t\bigr)\, Y_t = 0, \qquad t \ge t_0,

where η_t may be a quite general disturbance process (the derivative of a martin-
gale). The stability properties of Y_t are closely connected with those of the un-
disturbed equation

\ddot z(t) + a(t)\, z(t) = 0.

For example, if η_t is a white noise, if, for every solution z(t) of the last equation,
z(t) → 0 and

\int_{t_0}^{\infty} z(t)^2\, dt < \infty,

and if b(t) is bounded, then E Y_t^2 → 0 as t → ∞, uniformly for all initial values
such that |Y_{t_0}| + |\dot Y_{t_0}| ≤ R. Furthermore, the equilibrium position of equation
(11.4.7) is then stochastically stable.
(11.4.8) Example. The general homogeneous equation (11.4.1) for the case
d = m = 1,

dX_t = A(t)\, X_t\, dt + B(t)\, X_t\, dW_t, \qquad X_{t_0} = c,

has, by Corollary (8.4.3), the solution

X_t = c \exp\Bigl(\int_{t_0}^{t} \bigl(A(s) - B(s)^2/2\bigr)\, ds + \int_{t_0}^{t} B(s)\, dW_s\Bigr).

From Theorem (8.4.5), we have

E\, X_t = c \exp\Bigl(\int_{t_0}^{t} A(s)\, ds\Bigr)

and

E\, |X_t|^p = |c|^p \exp\Bigl(p \int_{t_0}^{t} A(s)\, ds + \frac{p\,(p-1)}{2} \int_{t_0}^{t} B(s)^2\, ds\Bigr).

From all this, we conclude that the equilibrium position is asymptotically stable
(resp. stable, resp. unstable) in pth mean if and only if

\lim_{t \to \infty} \int_{t_0}^{t} \Bigl(p\, A(s) + \frac{p\,(p-1)}{2}\, B(s)^2\Bigr)\, ds

is -∞ (resp. is less than ∞, resp. is +∞). In particular, this yields criteria for the
first and second moments. Similarly, the equilibrium position is stochastically
asymptotically stable (resp. stochastically stable, resp. stochastically unstable) if
and only if

\limsup_{t \to \infty} \Bigl(\int_{t_0}^{t} \bigl(A(s) - B(s)^2/2\bigr)\, ds + \int_{t_0}^{t} B(s)\, dW_s\Bigr)

is -∞ almost certainly (resp. is less than ∞ almost certainly, resp. is +∞ with
positive probability).


Now, in accordance with remark (5.2.6), we can write

\int_{t_0}^{t} B(s)\, dW_s = \tilde W_{\tau(t)}, \qquad \tau(t) = \int_{t_0}^{t} B(s)^2\, ds,

where \tilde W_t is again a Wiener process. In the case

\tau(\infty) = \int_{t_0}^{\infty} B(s)^2\, ds < \infty,

we therefore have as a criterion

\lim_{t \to \infty} \int_{t_0}^{t} A(s)\, ds = \begin{cases} -\infty \\ < \infty \\ +\infty \end{cases}

(the behavior of the undisturbed equation), characterizing stochastic asymptotic
stability (resp. stability, resp. instability).
In the case

\tau(\infty) = \int_{t_0}^{\infty} B(s)^2\, ds = \infty,

let us consider the quantity

I(t) = \frac{\int_{t_0}^{t} \bigl(A(s) - B(s)^2/2\bigr)\, ds}{\sqrt{2\, \tau(t)\, \log \log \tau(t)}}.

By virtue of the law of the iterated logarithm for \tilde W_t, we see that a sufficient
condition for stochastic asymptotic stability (resp. stochastic instability) is that

\limsup_{t \to \infty} I(t) < -1 \qquad \bigl(\text{resp. } \liminf_{t \to \infty} I(t) > -1\bigr).
(11.4.9) Remark. For the solution X_t(c) of (11.4.1), we have X_t(αc) = α X_t(c)
for all α ∈ R^1. Therefore,

P\Bigl[\lim_{t \to \infty} X_t(c) = 0\Bigr] = p_c = \text{const} \quad \text{for all } c \in R^d.

In the case of stochastic asymptotic stability of the equilibrium position, p_c → 1
as c → 0. This is compatible only with p_c ≡ 1, hence with

\operatorname{ac-lim}_{t \to \infty} X_t(c) = 0 \quad \text{for all } c \in R^d;

that is, if the equilibrium position of a linear equation is stochastically asymptot-
ically stable, it must automatically be stochastically asymptotically stable in the
large.

The extension L (introduced in (11.2.3)) of the infinitesimal operator A of X_t as-
sumes, for the linear equation (11.4.1), the following form:

(11.4.10)  L = \frac{\partial}{\partial t} + \bigl(A(t)\, x\bigr)'\, \frac{\partial}{\partial x} + \frac{1}{2} \sum_{i=1}^{m} \Bigl(\bigl(B_i(t)\, x\bigr)'\, \frac{\partial}{\partial x}\Bigr)^2,

where, for y ∈ R^d,

y'\, \frac{\partial}{\partial x} = \sum_{i=1}^{d} y_i\, \frac{\partial}{\partial x_i}, \qquad
\Bigl(y'\, \frac{\partial}{\partial x}\Bigr)^2 = \sum_{j=1}^{d} \sum_{k=1}^{d} y_j\, y_k\, \frac{\partial^2}{\partial x_j\, \partial x_k}.
We now cite a criterion for exponential stability in mean square; for the proof,
which covers general p > 0, we refer the reader to Khas'minskiy [65], p. 247.
(11.4.11) Theorem. Suppose that the functions A(t) and B_i(t) in equation
(11.4.1) are bounded on [t_0, ∞). Then a necessary (resp. sufficient) condition
for exponential stability in mean square is that, for every (resp. some) symmet-
ric positive-definite continuous bounded d × d matrix C(t) such that x' C(t) x ≥
k_1 |x|^2 (where k_1 > 0), the matrix differential equation

(11.4.12)  \dot D(t) + A(t)'\, D(t) + D(t)\, A(t) + \sum_{i=1}^{m} B_i(t)'\, D(t)\, B_i(t) = -C(t)

have as its solution a matrix D(t) with the same properties as C(t).
(11.4.13) Remark. When B_i(t) ≡ 0 (for all i), Theorem (11.4.11) reduces to a
criterion (well known in the study of deterministic equations) for ordinary ex-
ponential stability (see Bucy and Joseph [61], p. 11). Equation (11.4.12), which
has a structural similarity to (11.4.3), can be written, with the quadratic forms
w(t, x) = x' C(t) x and v(t, x) = x' D(t) x and the operator L, in the form

L\, v(t, x) = -w(t, x);

that is, v(t, x) can be used as a Lyapunov function. Theorem (11.2.8c) again yields
stochastic asymptotic stability in the large as a consequence of the exponential
stability of the second moments.
(11.4.14) Corollary. If equation (11.4.1) is autonomous, a necessary (resp. suf-
ficient) condition for exponential stability in mean square is that, for every (resp.
for some) symmetric positive-definite matrix C, the matrix equation

(11.4.15)  A'\, D + D\, A + \sum_{i=1}^{m} B_i'\, D\, B_i = -C

have a symmetric positive-definite solution D. If this is the case, we also have sto-
chastic asymptotic stability in the large.
For verification of this criterion, we begin with any positive-definite matrix C (for
example, C = I), calculate D from the system (11.4.15) of d(d+1)/2 linear equa-
tions, and check to see whether the D found is positive-definite or not.
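A computational sketch of this verification, obtained by vectorizing (11.4.15)
with Kronecker products (the matrices A and B_1 below are illustrative choices):

```python
import numpy as np

# Sketch of the verification of Corollary (11.4.14): solve
#   A'D + DA + sum_i B_i' D B_i = -C
# by vectorization (column-major vec, so vec(XYZ) = (Z' kron X) vec(Y)),
# then check that D is positive-definite.  A and B_1 are illustrative.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
Bs = [np.array([[0.0, 0.0],
                [0.3, 0.0]])]
d = A.shape[0]
C = np.eye(d)

I = np.eye(d)
M = np.kron(I, A.T) + np.kron(A.T, I)     # vec(A'D) + vec(DA)
for B in Bs:
    M += np.kron(B.T, B.T)                # vec(B'DB)

vecD = np.linalg.solve(M, -C.flatten(order="F"))
D = vecD.reshape(d, d, order="F")
D = 0.5 * (D + D.T)                        # symmetrize against round-off
print("D =", D)
print("D positive-definite:", np.all(np.linalg.eigvalsh(D) > 0))
```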

(11.4.16) Example. If, for the autonomous equation

dX_t = A X_t\, dt + \sum_{i=1}^{m} B_i X_t\, dW^i_t,

we have A + A' = c_1 I and B_i = d_i I, then D = I is a solution of (11.4.15) for
C = -(c_1 + \sum_i d_i^2)\, I. This last matrix is positive-definite if and only if

c_1 + \sum_{i=1}^{m} d_i^2 < 0.
(11.4.17) Remark. Exponential stability in pth mean (for p > 0) implies sto-
chastic asymptotic stability of the equilibrium position of the linear equation
(11.4.1). Conversely, we have for the autonomous linear equation (see Khas'-
minskiy [65], pp. 253-257): If the equilibrium position is stochastically as-
ymptotically stable, it is asymptotically stable in pth mean for all sufficiently
small p > 0. Just as in the deterministic case, this last fact always implies ex-
ponential stability in pth mean.

11.5 The Disturbed nth-Order Linear Equation


The equilibrium position of the deterministic nth-order linear equation with con-
stant coefficients

y^{(n)} + b_1\, y^{(n-1)} + \cdots + b_n\, y = 0

is, according to the Routh-Hurwitz criterion, asymptotically stable if and only if

\Delta_1 = b_1 > 0, \qquad
\Delta_2 = \begin{vmatrix} b_1 & b_3 \\ 1 & b_2 \end{vmatrix} > 0, \qquad \ldots, \qquad
\Delta_n = \begin{vmatrix} b_1 & b_3 & b_5 & \cdots & 0 \\ 1 & b_2 & b_4 & \cdots & 0 \\ 0 & b_1 & b_3 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & \cdots & \cdots & & b_n \end{vmatrix} > 0.

The corresponding equation with noisy coefficients

Y^{(n)}_t + \bigl(b_1 + \xi_1(t)\bigr)\, Y^{(n-1)}_t + \cdots + \bigl(b_n + \xi_n(t)\bigr)\, Y_t = 0,

where ξ_1(t), ..., ξ_n(t) are in general correlated Gaussian white noise processes
with

E\, \xi_i(t)\, \xi_j(s) = q_{ij}\, \delta(t - s),
is now rewritten, just as in example (8.1.4), as a stochastic first-order differential
equation for the n-dimensional process

X_t = \begin{pmatrix} X^1_t \\ \vdots \\ X^n_t \end{pmatrix} = \begin{pmatrix} Y_t \\ \vdots \\ Y^{(n-1)}_t \end{pmatrix}.

We obtain

(11.5.1)  dX^i_t = X^{i+1}_t\, dt, \qquad i = 1, \ldots, n-1,
          dX^n_t = -\sum_{i=1}^{n} b_i\, X^{n+1-i}_t\, dt - \sum_{i=1}^{n} \sum_{j=1}^{n} G_{ij}\, X^{n+1-i}_t\, dW^j_t,

with an n × n matrix G such that G G' = Q = (q_{ij}).
In accordance with the criteria of section 11.4 (the Routh-Hurwitz criterion for
equation (11.4.4) or Corollary (11.4.14)), to prove the asymptotic (= exponential)
stability in mean square of (11.5.1), we must treat a system of n(n+1)/2 linear
equations. We now cite a criterion of Khas'minskiy ([65], pp. 286-292) that op-
erates with only n + 1 decisions.
(11.5.2) Theorem. The equilibrium position of (11.5.1) is asymptotically stable
in mean square if and only if Δ_1 > 0, ..., Δ_n > 0 (the Δ_i are the Routh-Hurwitz
determinants mentioned above; their positiveness implies stability of the undis-
turbed equation) and 2Δ_n > Δ, where

\Delta = \begin{vmatrix} q^{(0)} & q^{(1)} & q^{(2)} & \cdots & q^{(n-1)} \\ 1 & b_2 & b_4 & \cdots & 0 \\ 0 & b_1 & b_3 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & b_n \end{vmatrix},
\qquad q^{(n-k-1)} = \sum_{i+j=2(n-k)} q_{ij}\, (-1)^{i+1}.
For n = 2, Theorem (11.5.2) provides the conditions

b_1 > 0, \qquad b_2 > 0, \qquad 2\, b_1 b_2 > q_{11}\, b_2 + q_{22}

(see example (11.4.5)). For n = 3,

b_1 > 0, \qquad b_3 > 0, \qquad b_1 b_2 > b_3,

2\,(b_1 b_2 - b_3)\, b_3 > q_{11}\, b_2 b_3 + q_{33}\, b_1 + b_3\,(q_{22} - 2\, q_{13}).

In the case of uncorrelated disturbances (q_{ij} = 0 for i ≠ j), we have

\Delta = \begin{vmatrix} q_{11} & -q_{22} & \cdots & (-1)^{n-1} q_{nn} \\ 1 & b_2 & \cdots & 0 \\ 0 & b_1 & \cdots & 0 \\ \vdots & & & \vdots \end{vmatrix}.
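A sketch of the n + 1 decisions for the uncorrelated case (the coefficients b_i and
intensities q_ii below are illustrative choices; Δ is built exactly as above, as the
Hurwitz matrix with its first row replaced):

```python
import numpy as np

# Sketch of Theorem (11.5.2) for uncorrelated noise (q_ij = 0 for i != j).
b = [1.0, 2.0, 0.5]                  # b_1 ... b_n (illustrative)
q = [0.1, 0.2, 0.05]                 # q_11 ... q_nn (illustrative)
n = len(b)

def b_coef(k):                       # b_0 = 1, b_k = 0 outside 0..n
    return 1.0 if k == 0 else (b[k - 1] if 1 <= k <= n else 0.0)

H = np.array([[b_coef(2 * j - i) for j in range(1, n + 1)]
              for i in range(1, n + 1)])          # Hurwitz matrix

deltas = [np.linalg.det(H[:k, :k]) for k in range(1, n + 1)]

D = H.copy()
D[0, :] = [(-1) ** k * q[k] for k in range(n)]    # row (q11, -q22, q33, ...)
Delta = np.linalg.det(D)

print("Routh-Hurwitz minors:", deltas)
print("mean-square stable  :",
      all(d > 0 for d in deltas) and 2 * deltas[-1] > Delta)
```

For the values above, the minors are (1, 1.5, 0.75) and Δ = 0.25, so the condition
2Δ_3 > Δ holds, in agreement with the n = 3 inequality displayed earlier.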

11.6 Proof of Stability by Linearization


Just as in the deterministic case, stability of a nonlinear equation is in general dif-
ficult to prove. However, the proof is facilitated by the fact that the equation
linearized by means of a Taylor expansion usually exhibits, in a neighborhood
of the equilibrium position, the same stability behavior as the original equation.
We mention the following theorem of Khas'minskiy ([65], p. 299):
(11.6.1) Theorem. Suppose that

(11.6.2)  dX_t = f(t, X_t)\, dt + G(t, X_t)\, dW_t, \qquad X_{t_0} = c,

is a stochastic differential equation for which the assumptions (11.2.1) hold. Let
us suppose that

|f(t, x) - A(t)\, x| = o(|x|)

and

|G(t, x) - (B_1(t)\, x, \ldots, B_m(t)\, x)| = o(|x|),

uniformly in t ≥ t_0 as |x| → 0. Here, A(t) and B_i(t) denote d × d matrices that are
bounded functions of t. Consider the linear equation

(11.6.3)  dX_t = A(t)\, X_t\, dt + \sum_{i=1}^{m} B_i(t)\, X_t\, dW^i_t, \qquad X_{t_0} = c.

a) Suppose that, for the equilibrium position of (11.6.3),

\lim_{x \to 0} P\Bigl[\sup_{s \le t < \infty} |X_t(s, x)| > \varepsilon\Bigr] = 0 \quad \text{for all } \varepsilon > 0,

uniformly in s, and that P[\lim_{t \to \infty} X_t(s, x) = 0] = 1 (uniform stochastic
asymptotic stability in the large). Then the equilibrium position of (11.6.2) is
stochastically asymptotically stable.

b) If the matrices A and Bi are independent oft, then stochastic asymptotic sta-
bility of the equilibrium position of the linearized equation (11.6.3) implies sto-
chastic asymptotic stability of the original equation (11.6.2).

11.7 An Example From Satellite Dynamics


The study of the influence of a rapidly fluctuating density of the earth's atmo-
sphere on the motion of a satellite in a circular orbit leads to the equation (we
follow Sagirow [75])

(11.7.1a)  \ddot Y_t + B\,(1 + A\, \xi_t)\, \dot Y_t + (1 + A\, \xi_t) \sin Y_t - C \sin(2 Y_t) = 0,

where B and C are positive numbers and ξ_t is a scalar white noise. The equivalent
stochastic differential equation (with d = 2 and m = 1) for X_t = (Y_t, \dot Y_t)' is

(11.7.1b)  dX_t = \begin{pmatrix} X^2_t \\ -\sin X^1_t + C \sin 2X^1_t - B X^2_t \end{pmatrix} dt + \begin{pmatrix} 0 \\ -A\,(\sin X^1_t + B X^2_t) \end{pmatrix} dW_t.

Replacement of sin y (resp. sin 2y) with y (resp. 2y) yields the linearized equa-
tion with constant coefficients

(11.7.2a)  \ddot Y_t + B\,(1 + A\, \xi_t)\, \dot Y_t + (1 - 2C + A\, \xi_t)\, Y_t = 0

in the first form and

(11.7.2b)  dX_t = \begin{pmatrix} 0 & 1 \\ 2C - 1 & -B \end{pmatrix} X_t\, dt + \begin{pmatrix} 0 & 0 \\ -A & -AB \end{pmatrix} X_t\, dW_t

in the second. A necessary and sufficient condition for asymptotic (and hence ex-
ponential) stability in mean square of the linearized equation (11.7.2) is, accord-
ing to the criterion (11.5.2) with

Q = \begin{pmatrix} B^2 A^2 & B A^2 \\ B A^2 & A^2 \end{pmatrix}

and with d = n = 2, the following:

B > 0, \qquad 1 - 2C > 0, \qquad 2B\,(1 - 2C) > B^2 A^2\,(1 - 2C) + A^2.

The conditions B > 0 and 1 - 2C > 0 ensure asymptotic stability of the undis-
turbed system, as a necessary condition. The last condition yields the inequality

A^2 < \frac{2B\,(1 - 2C)}{B^2\,(1 - 2C) + 1}

for the intensity of the disturbance. Therefore, under these conditions, the equi-
librium position of (11.7.2), and hence, by Theorem (11.6.1), the equilibrium posi-
tion of the nonlinear equation, is stochastically asymptotically stable.
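A small sketch of this bound (the values of B and C are illustrative), checking
mean-square stability directly on the vectorized second-moment equation for
(11.7.2b):

```python
import numpy as np

# Sketch for the satellite example: mean-square stability of (11.7.2b) via
# dP/dt = A P + P A' + B1 P B1', vectorized with Kronecker products, compared
# with the closed-form bound on A^2.  B, C and the A values are illustrative.
B, C = 1.0, 0.2
bound = 2 * B * (1 - 2 * C) / (B**2 * (1 - 2 * C) + 1)

for A in (0.5 * np.sqrt(bound), 1.5 * np.sqrt(bound)):
    A_mat = np.array([[0.0, 1.0],
                      [2 * C - 1.0, -B]])
    B1 = np.array([[0.0, 0.0],
                   [-A, -A * B]])
    I = np.eye(2)
    M = np.kron(I, A_mat) + np.kron(A_mat, I) + np.kron(B1, B1)
    ms_stable = np.linalg.eigvals(M).real.max() < 0
    print(f"A^2 = {A**2:.3f} (bound {bound:.3f}): mean-square stable = {ms_stable}")
```

The eigenvalue test changes sign exactly as A^2 crosses the bound, as the criterion
asserts.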

We now seek to use Lyapunov's technique to carry these assertions over to the
nonlinear original equation. The operator L assumes for (11.7.1) the following
form:

L = \frac{\partial}{\partial t} + x_2\, \frac{\partial}{\partial x_1} + \bigl(-\sin x_1 + C \sin 2x_1 - B x_2\bigr)\, \frac{\partial}{\partial x_2} + \frac{A^2}{2}\, \bigl(\sin x_1 + B x_2\bigr)^2\, \frac{\partial^2}{\partial x_2^2}.

For the Lyapunov function, we try an expression consisting of a quadratic form
and integrals of the nonlinear components:

v(t, x) \equiv v(x) = a x_1^2 + b x_1 x_2 + x_2^2 + d \int_0^{x_1} \sin y\, dy + e \int_0^{x_1} \sin 2y\, dy
  = a x_1^2 + b x_1 x_2 + x_2^2 + 2d\, \Bigl(\sin \frac{x_1}{2}\Bigr)^2 + e\, (\sin x_1)^2.

This yields

L\, v(x) = (2a - bB)\, x_1 x_2 - (2B - b - A^2 B^2)\, x_2^2
  + \bigl(d - 2 + 2 A^2 B + (4C + 2e) \cos x_1\bigr)\, x_2 \sin x_1
  - \Bigl(b - 2bC \cos x_1 - A^2\, \frac{\sin x_1}{x_1}\Bigr)\, x_1 \sin x_1.

To convert this into a negative-definite function, we set 2a - bB = 0, 4C + 2e = 0,
and d - 2 + 2A^2 B = 0. We obtain

v(x) = \frac{bB}{2}\, x_1^2 + b x_1 x_2 + x_2^2 + 4\, \Bigl(\sin \frac{x_1}{2}\Bigr)^2\, \Bigl(1 - B A^2 - 2C\, \Bigl(\cos \frac{x_1}{2}\Bigr)^2\Bigr)

and

L\, v(x) = -(2B - b - A^2 B^2)\, x_2^2 - \Bigl(b - 2bC \cos x_1 - A^2\, \frac{\sin x_1}{x_1}\Bigr)\, x_1 \sin x_1.

For v to be positive-definite, we must have B > 0, 0 < b < 2B, 1 - 2C > 0, and
A^2 < (1 - 2C)/B. Furthermore, we can get the following estimate for L v:

L\, v(x) \le -(2B - b - A^2 B^2)\, x_2^2 - (b - 2bC - A^2)\, x_1 \sin x_1.

This last is negative-definite if A^2 < (2B - b)/B^2 and A^2 < b\,(1 - 2C), and hence
if

A^2 < \min\bigl((2B - b)/B^2,\ b\,(1 - 2C)\bigr).

If we choose b in the interval 0 < b < 2B in such a way that the minimum is as
large as possible, we get

b = \frac{2B}{1 + B^2\,(1 - 2C)}

and hence, for the intensity of the disturbance,

A^2 < \frac{2B\,(1 - 2C)}{B^2\,(1 - 2C) + 1}.

Theorem (11.2.8b) ensures, under this condition, the stochastic asymptotic sta-
bility of the equilibrium position of (11.7.1). This result is identical to the re-
sult obtained by linearization.
Chapter 12
Optimal Filtering of a Disturbed Signal

12.1 Description of the Problem


If the state X_t of a stochastic dynamic system is used to make a decision (see
Chapter 13), we naturally assume that this state is exactly known to us. Such
an assumption, however, is unrealistic in many practical problems. What can be
observed, on the basis of technical or economic considerations, is a process Z_t that
in some way depends on the previous behavior of X_t and, in addition, is disturbed.
The question then arises as to how one can, in an optimal way (i.e., with the smal-
lest possible error), draw a conclusion from the observation of Z_t as to the
true state X_t of the system in question. For the case of stationary linear sys-
tems, this problem was solved in the 1940's independently by N. Wiener [79]
and A. N. Kolmogorov [69] (see also, for example, Gikhman and Skorokhod [5]
or Prohorov and Rozanov [15]).
Nonlinear systems and the case of observations made only during a finite interval
have been treated since 1960 by R. E. Kalman, R. S. Bucy, R. L. Stratonovich,
and H. J. Kushner. We shall now discuss their results, confining ourselves to the
basic ideas. For additional results and references to the literature, we mention
Bucy and Joseph [61] and the detailed book by Jazwinski [66].
The basis of our studies is the following class of models:
Suppose that the d-dimensional state X_t of a dynamic system (the so-called signal
process) is described by a stochastic differential equation

(12.1.1)  dX_t = f(t, X_t)\, dt + G(t, X_t)\, dW_t, \qquad X_{t_0} = c \in L^2, \quad t \ge t_0.

Here, f(t, x) is again a d-dimensional vector, G(t, x) is a d × m matrix, and W_t is
an m-dimensional Wiener process whose "derivative" is therefore a white noise,
that is, a generalized stationary Gaussian process with mean value 0 and covari-
ance matrix I δ(t - s). As usual, we reduce the case (often encountered in the
literature) in which the derivative of W_t is a stationary process with covariance
matrix Q(t) δ(t - s) to (12.1.1) by shifting from G to G \sqrt{Q}.

Suppose that the observed process Z_t (the quantity being measured) is p-dimen-
sional and that it is a disturbed functional of X_t of the form

(12.1.2)  dZ_t = h(t, X_t)\, dt + R(t)\, dV_t, \qquad Z_{t_0} = b, \quad t \ge t_0.

Here, h(t, x) is a p-vector, R(t) is a p × q matrix, and V_t is a q-dimensional Wien-
er process. We assume that the four random objects W_·, V_·, c, and b are inde-
pendent. For correlated noise processes W_t and V_t, one should compare, for ex-
ample, Kalman [67] and Kushner [71]. Of course, X_t and Z_t are dependent in
any case.
We also assume that equation (12.1.1) satisfies the assumptions of the existence-
and-uniqueness theorem (6.2.1), so that there exists a global solution on the inter-
val [t_0, ∞). Equation (12.1.2) should be understood as meaning that, when we
substitute the solution X_t of (12.1.1) into the argument of h, the right-hand
member of (12.1.2) becomes an ordinary stochastic differential in the sense of
section 5.3, since Z_t does not appear in that member. To be sure that (12.1.2) is
always meaningful, we must require that

\int_{t_0}^{t} |R(s)|^2\, ds < \infty \quad \text{for all } t > t_0,

and, for example,

|h(t, x)| \le C\,(1 + |x|^r) \quad \text{for all } t > t_0,\ x \in R^d,

for some r > 0.
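For concreteness, here is a simulation sketch of such a signal-observation pair
(scalar, with illustrative choices of f, G, h, R, initial values, and seed):

```python
import numpy as np

# Sketch of the model (12.1.1)-(12.1.2) for d = m = p = q = 1:
#   signal      dX = f(t,X) dt + G(t,X) dW
#   observation dZ = h(t,X) dt + R(t) dV
# with independent W, V.  All concrete functions/values are illustrative.
rng = np.random.default_rng(5)
f = lambda t, x: -x
G = lambda t, x: 0.5
h = lambda t, x: x
R = lambda t: 0.2

t0, T, n = 0.0, 5.0, 5000
dt = (T - t0) / n
X = np.empty(n + 1); Z = np.empty(n + 1)
X[0], Z[0] = 1.0, 0.0                     # c and b
dW = rng.normal(0.0, np.sqrt(dt), n)      # increments of W
dV = rng.normal(0.0, np.sqrt(dt), n)      # increments of V (independent)

for k in range(n):
    t = t0 + k * dt
    X[k + 1] = X[k] + f(t, X[k]) * dt + G(t, X[k]) * dW[k]
    Z[k + 1] = Z[k] + h(t, X[k]) * dt + R(t) * dV[k]

print("final state X_T       :", X[-1])
print("final observation Z_T :", Z[-1])
```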
Now, the filtering problem can be formulated as follows:
(12.1.3) Problem. Consider the observed values Z_s, where t_0 ≤ s ≤ t. For this
piece of a function we write Z[t_0, t]. Suppose that t_1 ≥ t_0. Let us construct a d-
dimensional random variable \hat X_{t_1} as a measurable functional of Z[t_0, t] such
that, for every other measurable functional F(Z[t_0, t]) with range in R^d such
that

E\, F(Z[t_0, t]) = E\, X_{t_1} = E\, \hat X_{t_1} \quad \text{(unbiasedness)},

the inequality

(12.1.4)  E\bigl(y'\,(X_{t_1} - \hat X_{t_1})\bigr)^2 \le E\bigl(y'\,(X_{t_1} - F(Z[t_0, t]))\bigr)^2 \quad \text{for all } y \in R^d

holds. The quantity \hat X_{t_1} is called the (optimal) estimate of X_{t_1} for the given obser-
vation Z[t_0, t]. For t_1 < t, we speak of interpolation; for t_1 = t, we speak of
smoothing or filtering; and for t_1 > t, we speak of extrapolation or prediction. By
virtue of the optimality criterion (12.1.4), we speak of the method of minimum
variance.
Fig. 8: The scheme of optimal filtering.

(12.1.5) Remark. It should be noted that the value \hat X_{t_1}(ω) may depend on the
total trajectory Z_·(ω) of the process observed in the time interval [t_0, t], and it
must, conversely, be uniquely determined by this piece of trajectory. Measura-
bility of the functional F(Z[t_0, t]) has the following meaning: by virtue of re-
mark (5.3.2), the sample functions Z_·(ω) of the observed process are, with
probability 1, continuous functions. If we neglect the ω-exceptional set of proba-
bility 0 (or if we take, for example, Z_·(ω) ≡ 0 on that set), then the mapping
Z[t_0, t] that assigns to ω ∈ Ω the corresponding piece of the trajectory Z_s(ω)
for t_0 ≤ s ≤ t is a mapping of Ω into the space C([t_0, t]) of continuous
functions on the interval [t_0, t]:

(12.1.6)  Z[t_0, t] : \Omega \to C([t_0, t]).

The sigma-algebra \mathfrak{A} is defined in Ω. If we choose in C([t_0, t]) the sigma-algebra
\mathfrak{C}([t_0, t]) generated by the "spheres"

\bigl\{\varphi \in C([t_0, t]) : \sup_{t_0 \le s \le t} |\varphi(s) - \psi(s)| < \varepsilon\bigr\}, \qquad \psi \in C([t_0, t]),\ \varepsilon > 0

(ε-neighborhoods of the fixed continuous function ψ), then the mapping (12.1.6)
is \mathfrak{A}-\mathfrak{C}([t_0, t])-measurable. A functional that is measurable in the sense of the
above-given formulation of the problem is a mapping

F : C([t_0, t]) \to R^d

that is measurable with respect to \mathfrak{C}([t_0, t]) and the Borel sigma-algebra \mathfrak{B}^d in
R^d. Then, of course, the combined mapping

F(Z[t_0, t]) : \Omega \to R^d

is a d-dimensional random variable that depends on ω only through Z[t_0, t]. We
now seek a measurable F_0 : C([t_0, t]) → R^d such that

\hat X_{t_1} = F_0(Z[t_0, t])

has a second moment and expectation value E X_{t_1}, and possesses the mini-
mality property (12.1.4).
(12.1.7) Remark. The condition (12.1.4) is equivalent to the following condi-
tion: For every nonnegative-definite symmetric matrix C, we have the inequality

E\, |X_{t_1} - \hat X_{t_1}|_C^2 \le E\, |X_{t_1} - F(Z[t_0, t])|_C^2,

with the abbreviation x' C x = |x|_C^2 for x ∈ R^d. The equivalence with (12.1.4)
is seen from the spectral decomposition of C:

C = \sum_{i=1}^{d} \lambda_i\, (u_i\, u_i'), \qquad \lambda_i \text{ and } u_i \text{ the eigenvalues and eigenvectors of } C,

the relationship

|x|_C^2 = \sum_{i=1}^{d} \lambda_i\, (u_i'\, x)^2,

and the choice of the special matrix C = y y'. We mention that the requirement

E\, |X_{t_1} - \hat X_{t_1}|^2 \le E\, |X_{t_1} - F(Z[t_0, t])|^2

is, for d > 1, weaker than (12.1.4). This last also includes the nondiagonal ele-
ments of the matrix E\,(X_{t_1} - \hat X_{t_1})(X_{t_1} - \hat X_{t_1})' in the comparison.

12.2 The Conditional Expectation as Optimal Estimate


The existence and uniqueness of the optimal estimate are relatively easy to prove.
(12.2.1) Theorem. The unique solution (up to a set of probability 0) of the
problem (12.1.3) is

\hat X_{t_1} = E\bigl(X_{t_1} \mid Z[t_0, t]\bigr).

Therefore (see section 1.7), there exists a measurable function F_0 defined on
C([t_0, t]), unique on the set of possible trajectories Z[t_0, t], such that

\hat X_{t_1} = F_0(Z[t_0, t]).

Proof. For an arbitrary estimate F(Z[t_0, t]),

E\bigl(y'\,(X_{t_1} - F(Z[t_0, t]))\bigr)^2 = E\bigl(y'\,(X_{t_1} - \hat X_{t_1})\bigr)^2 + E\bigl(y'\,(\hat X_{t_1} - F)\bigr)^2
  + 2\, y'\, E\,(X_{t_1} - \hat X_{t_1})(\hat X_{t_1} - F)'\, y.

On the other hand, in accordance with section 1.7, we have

E\,(X_{t_1} - \hat X_{t_1})(\hat X_{t_1} - F)' = E\Bigl(E\bigl((X_{t_1} - \hat X_{t_1})(\hat X_{t_1} - F)' \mid Z[t_0, t]\bigr)\Bigr)
  = E\Bigl(E\bigl((X_{t_1} - \hat X_{t_1}) \mid Z[t_0, t]\bigr)\,(\hat X_{t_1} - F)'\Bigr)
  = 0,

if we set

\hat X_{t_1} = E\bigl(X_{t_1} \mid Z[t_0, t]\bigr).

Then, for this \hat X_{t_1},

E\bigl(y'\,(X_{t_1} - \hat X_{t_1})\bigr)^2 \le E\bigl(y'\,(X_{t_1} - F)\bigr)^2,

so that the conditional expectation is a solution of the problem. The uniqueness
follows from the geometrical fact that E(X_{t_1} | Z[t_0, t]) in the Hilbert space
L^2(Ω, \mathfrak{A}, P) is the unique orthogonal projection of X_{t_1} onto the linear sub-
space of those elements that are measurable with respect to the sigma-algebra
generated by Z[t_0, t] (see Krickeberg [7], pp. 124-125).
For the value of the functional F_0 in Theorem (12.2.1) at the "point" φ ∈ C([t_0,
t]), we write suggestively

F_0(\varphi) = E\bigl(X_{t_1} \mid Z[t_0, t] = \varphi\bigr), \qquad \varphi \in C([t_0, t]).

In particular, if t = t_0, we have, by virtue of the independence of Z_{t_0} = b and X_{t_1},

\hat X_{t_1} = E\bigl(X_{t_1} \mid Z_{t_0}\bigr) = E\, X_{t_1} = \text{const}.

From here on, we shall concern ourselves only with the filtering problem (t_1 = t).
Theoretically, it is completely solved by (12.2.1), a fact that, however, does not
satisfy us. What we are seeking is an algorithm with the aid of which we can cal-
culate the numerical value of the optimal estimate \hat X_t of the state X_t of the sys-
tem from the numerical observations Z_s for t_0 ≤ s ≤ t. (By virtue of the continui-
ty of Z_·, it will be sufficient to make observations in (t_0, t).) This calculation
may, apart from the observation itself, use only the system parameters f, G, h,
and R and the initial distribution P_c of X_{t_0} = c.

12.3 The Kalman-Bucy Filter


We assume the same situation as in section 12.1. Throughout, \hat X_t will denote the
conditional expectation. If X_t has a conditional density p_t(x | Z[t_0, t]) under the
condition Z[t_0, t], then, in accordance with section 1.7,

\hat X_t = E\bigl(X_t \mid Z[t_0, t]\bigr) = \int_{R^d} x\, p_t(x \mid Z[t_0, t])\, dx,

or, more generally, for g(x) with g(X_t) ∈ L^1,

\widehat{g(X_t)} = E\bigl(g(X_t) \mid Z[t_0, t]\bigr) = \int_{R^d} g(x)\, p_t(x \mid Z[t_0, t])\, dx.

Therefore, if we know the conditional density p_t(x | Z[t_0, t]), we can easily find
the optimal estimate \hat X_t by means of an ordinary integration.
(12.3.1) Theorem (Bucy's representation theorem). Suppose that we are given
equations (12.1.1) and (12.1.2) with the assumptions stated in section 12.1. We
also assume that R(t) R(t)' is positive-definite, that the finite-dimensional dis-
tributions of the process X_t have densities, and that

E \exp\Bigl((t - t_0)\, \sup_{t_0 \le s \le t} |h(s, X_s)|^2_{(R(s) R(s)')^{-1}}\Bigr) < \infty.
12.3 The Katman-Bucy Filter 207

Here, we have set |x|²_A = x' A x. Then, the conditional distribution P(X_t ∈ B | Z[t0, t]) has a density p_t(x | Z[t0, t]) that, for a fixed observation Z[t0, t], is given by

(12.3.2) p_t(x | Z[t0, t]) = E(e^Q | X_t = x) p_t(x) / E e^Q.

Here, p_t(x) is the density of X_t and

(12.3.3) Q = Q(X[t0, t], Z[t0, t]) = −(1/2) ∫_{t0}^t h(s, X_s)' (R(s) R(s)')⁻¹ h(s, X_s) ds + ∫_{t0}^t h(s, X_s)' (R(s) R(s)')⁻¹ dZ_s.

Formula (12.3.2) was conjectured by Bucy [60] and proven by Mortensen [74] (see also Bucy and Joseph [61]). For fixed Z[t0, t], the integral with respect to Z_s in (12.3.3) is, in complete analogy with the integral with respect to W_t, defined by the approximation of the integrand by means of step functions and the choice of the left end-points of a decomposition as the intermediate points.
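Concretely, the left-endpoint construction can be sketched as follows (an illustrative toy computation with assumed paths for X_s and Z_s, not an algorithm from the text); the essential point is that the integrand is frozen at the left end of each subinterval:

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed toy paths on a grid of [0, 1]; take h(s, x) = x for simplicity
    t = np.linspace(0.0, 1.0, 1001)
    dt = np.diff(t)
    X = np.cumsum(np.r_[0.0, rng.normal(0.0, np.sqrt(dt))])        # a sample path of X
    dZ = 0.5 * X[:-1] * dt + 0.2 * rng.normal(0.0, np.sqrt(dt))    # increments of Z

    # Ito-type sum: the integrand is evaluated at the LEFT endpoint of each subinterval
    integral = np.sum(X[:-1] * dZ)
    print("left-endpoint approximation of the integral:", integral)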
For useless observations (h ≡ 0 or (R(t) R(t)')⁻¹ ≡ 0), we obtain from Theorem (12.3.1) Q ≡ 0 and hence

p_t(x | Z[t0, t]) = p_t(x),

so that

X̂_t = E X_t.

Similarly, for t = t0, we always have

p_{t0}(x | Z_{t0}) = p_{t0}(x) (density of the initial value c),

that is,

X̂_{t0} = E c.
By applying Ito's theorem to the representation (12.3.2), we obtain (see Bucy and Joseph [61]) the so-called fundamental equation of filtering theory:

(12.3.4) dp_t(x | Z[t0, t]) = 𝔇*(p_t(x | Z[t0, t])) dt + (h(t, x) − ĥ(t))' (R(t) R(t)')⁻¹ (dZ_t − ĥ(t) dt) p_t(x | Z[t0, t]).

Here, 𝔇* is the adjoint operator (forward operator) to the differential operator 𝔇 of the process X_t,

𝔇* g = −Σ_{i=1}^d ∂/∂x_i (f_i g) + (1/2) Σ_{i=1}^d Σ_{j=1}^d ∂²/(∂x_i ∂x_j) ((G G')_{ij} g),

and

ĥ(t) = E(h(t, X_t) | Z[t0, t]) = ∫_{R^d} h(t, x) p_t(x | Z[t0, t]) dx.

Equation (12.3.4) shows the possibility, at least in theory, of calculating the conditional density p_t(x | Z[t0, t]), beginning with the initial value p_{t0}(x), with progressive observation of Z_s. Here, p_t(x | Z[t0, t]) depends only on the two systems defined by the functions f, G, h, and R, on the density of the initial value, and on the observations up to the instant t. Therefore, (12.3.4) can be regarded as the dynamic equation for the optimal filter.
Equation (12.3.4) yields, upon integration with respect to x, equations for the moments of the conditional density, in particular, for the optimal estimate X̂_t:

dX̂_t = f̂ dt + ((x h')^ − X̂_t ĥ') (R(t) R(t)')⁻¹ (dZ_t − ĥ dt),

where, for any function g of the state, ĝ (also written (g)^) denotes the conditional expectation E(g(t, X_t) | Z[t0, t]); the initial value is X̂_{t0} = E c. Also of interest is the estimation error, that is, the conditional covariance matrix

P(t | Z[t0, t]) = E((X_t − X̂_t)(X_t − X̂_t)' | Z[t0, t]).

For this, we get from (12.3.4) (see Jazwinski [66], p. 184)

d(P(t | Z[t0, t]))_{ij} = ((x_i f_j)^ − x̂_i f̂_j + (f_i x_j)^ − f̂_i x̂_j + ((G G')_{ij})^
− ((x_i h)^ − x̂_i ĥ)' (R R')⁻¹ ((x_j h)^ − x̂_j ĥ)) dt
+ ((x_i x_j h)^ − (x_i x_j)^ ĥ − x̂_i (x_j h)^ − x̂_j (x_i h)^ + 2 x̂_i x̂_j ĥ)' (R R')⁻¹ (dZ_t − ĥ dt),

with the initial value P(t0 | Z_{t0}) = E c c'.
In the following section, we shall specialize these equations to linear problems.

12.4 Optimal Filters for Linear Systems


Suppose that a stochastic differential equation for a d-dimensional signal process X_t is linear (in the narrow sense), that is, of the form

dX_t = A(t) X_t dt + B(t) dW_t,  X_{t0} = c,  t ≥ t0,

where A(t) is a d×d matrix, B(t) is a d×m matrix, and W_t is an m-dimensional Wiener process. Suppose that an observed p-dimensional process Z_t is described by an equation that is linear in X_t:

dZ_t = H(t) X_t dt + R(t) dV_t,  Z_{t0} = b,  t ≥ t0,

where H(t) is a p×d matrix, R(t) is a p×q matrix such that R(t) R(t)' is positive-definite, and V_t is a q-dimensional Wiener process. Suppose that the matrix functions A(t), B(t), H(t), R(t), and (R(t) R(t)')⁻¹ are bounded on every bounded subinterval of [t0, ∞). We again assume the independence of W_s, V_s, c, and b. If c and b are normally distributed or constant, then, in accordance with Theorem (8.2.10), X_t and hence Z_t are Gaussian processes. Therefore, all the conditional distributions are normal distributions. In particular, we have
(12.4.1) Theorem (Kalman-Bucy filter for linear systems). In the linear case, the conditional density p_t(x | Z[t0, t]) of X_t under the condition that Z[t0, t] was observed is the density of a normal distribution with mean

X̂_t = E(X_t | Z[t0, t])

and covariance matrix

P(t) = E((X_t − X̂_t)(X_t − X̂_t)' | Z[t0, t]) = E(X_t − X̂_t)(X_t − X̂_t)'.

The dynamic equations for these parameters are

dX̂_t = A X̂_t dt + P H' (R R')⁻¹ (dZ_t − H X̂_t dt),  X̂_{t0} = E c,

Ṗ = A P + P A' + B B' − P H' (R R')⁻¹ H P,  P(t0) = E c c'.

In particular, the matrix P = P(t) is independent of the observation Z[t0, t].
For various proofs of this theorem, see Jazwinski ([66], pp. 218ff).
In the case of a useless or missing observation (H(t) ≡ 0 or (R(t) R(t)')⁻¹ ≡ 0), the equations for X̂_t and P(t) reduce to the equations for the mean E X_t and the covariance matrix K(t) of the process X_t (Theorem (8.2.6)).
Since the coefficient of dZ_t in the equation for X̂_t is deterministic, the corresponding integral can be evaluated without any precautions (arbitrary intermediate points!).
The equation for P(t) is a (matrix) Riccati equation. It has been carefully investigated by Bucy and Joseph [61]. In particular, in spite of the quadratic term, there exists a global solution when one starts with a nonnegative-definite initial value P(t0). Since P(t) is independent of the observation, the estimation error can be evaluated in advance.
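The algorithmic content of Theorem (12.4.1) is easy to exhibit in a simulation. The sketch below (a scalar model; all numerical values are assumed here for illustration) propagates the signal, the observation, and the filter equations with a simple Euler scheme; note that P(t) is computed without reference to the data, exactly as remarked above:

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed scalar system: dX = aX dt + b dW, observation dZ = hX dt + r dV
    a, b, h, r = -1.0, 0.5, 1.0, 0.2
    T, n = 5.0, 5000
    dt = T / n

    x = 1.0                 # known constant initial value c (an assumption)
    xhat, P = 1.0, 0.0      # filter start: Xhat = E c, P(t0) = 0 for constant c
    for _ in range(n):
        dZ = h * x * dt + r * rng.normal(0.0, np.sqrt(dt))   # observation increment
        x += a * x * dt + b * rng.normal(0.0, np.sqrt(dt))   # signal step
        K = P * h / r**2                                     # gain P H'(R R')^{-1}
        xhat += a * xhat * dt + K * (dZ - h * xhat * dt)     # filter equation
        P += (2 * a * P + b**2 - P**2 * h**2 / r**2) * dt    # Riccati equation

    print(f"X_T = {x:.3f}, estimate = {xhat:.3f}, P(T) = {P:.4f}")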
(12.4.2) Remark. For a linear model, the optimal estimate X̂_t is identical to the optimal linear estimate

X̂_t = ∫_{t0}^t D(t, s) dZ_s.

The d×p weight matrix D(t, s) is obtained from the so-called Wiener-Hopf equation

D(t, u) R(u) R(u)' + ∫_{t0}^t D(t, s) H(s) E(X_s X_u') H(u)' ds = E(X_t X_u') H(u)'

(see Bucy and Joseph [61], p. 53, or Gikhman and Skorokhod [5], p. 229).
(12.4.3) Example. Suppose that the signal process is undisturbed (B(t) ≡ 0) and that it starts with a random 𝔑(0, E c c')-distributed initial value with positive-definite E c c'. The (unconditional) distribution of X_t (before observation!) is, by Theorem (8.2.10),

𝔑(0, Φ(t) E c c' Φ(t)'),

where Φ(t) is the fundamental matrix of Ẋ = A(t) X. The conditional distribution of X_t after observation of Z[t0, t] is, as one can verify by substitution into the equations of Theorem (12.4.1), a normal distribution 𝔑(X̂_t, P(t)) with

X̂_t = P(t) (Φ(t)')⁻¹ ∫_{t0}^t Φ(s)' H(s)' (R(s) R(s)')⁻¹ dZ_s

(so that X̂_t is a linear estimate in the sense of Remark (12.4.2)) and

P(t)⁻¹ = (Φ(t)')⁻¹ (E c c')⁻¹ Φ(t)⁻¹ + (Φ(t)')⁻¹ (∫_{t0}^t Φ(s)' H(s)' (R(s) R(s)')⁻¹ H(s) Φ(s) ds) Φ(t)⁻¹.

Since the second summand in P(t)⁻¹ is positive-definite, the error covariance matrix P(t) is "smaller" than the original covariance matrix of X_t.
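In the scalar case the example can be verified numerically. The following sketch (assumed scalar data; here Φ(t) = e^{at}) integrates the Riccati equation of Theorem (12.4.1) with B = 0 and compares the result with the closed form for P(t) just derived:

    import numpy as np

    # Assumed scalar data: A = a, H = h, R = r, E cc' = P0, and B = 0
    a, h, r, P0 = -0.3, 1.0, 0.5, 2.0
    T, n = 3.0, 30000
    dt = T / n

    P = P0
    for _ in range(n):                    # Euler step for P' = 2aP - P^2 h^2 / r^2
        P += (2 * a * P - P**2 * h**2 / r**2) * dt

    phi2 = np.exp(2 * a * T)              # Phi(T)^2 with Phi(t) = e^{at}
    P_closed = phi2 / (1 / P0 + (h**2 / r**2) * (phi2 - 1) / (2 * a))
    print(f"integrated P(T) = {P:.6f}, closed form = {P_closed:.6f}")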
Chapter 13
Optimal Control of Stochastic
Dynamic Systems
13.1 Bellman's Equation
The analytical difficulties that arise with a mathematically rigorous treatment of
stochastic control problems are so numerous that, in this brief survey, we must
confine ourselves on didactic grounds to a more qualitative and intuitive treat-
ment.
As in the case of stability theory, there is again here a well-developed theory for deterministic systems, which one can study, for example, in the works by Athans and Falb [56], Strauss [78], or Kalman, Falb, and Arbib [68]. It is again a matter of replacing, in the shift to the stochastic case, the first-order derivatives at the corresponding places with the infinitesimal operator of the corresponding process. For a more detailed treatment of optimal control of stochastic systems, we refer to the books and survey articles by Aoki [55], Kushner [72], Stratonovich [77], Bucy and Joseph [61], Mandl [28], Khas'minskiy [65], Fleming [62], and Wonham [80] and to the literature cited in those works.
Let us now consider a system described by the stochastic differential equation

(13.1.1) dX_t = f(t, X_t, u(t, X_t)) dt + G(t, X_t, u(t, X_t)) dW_t,  X_{t0} = c,  t ≥ t0,

where, as usual, X_t, f(t, x, u), and c assume values in R^d, G(t, x, u) is (d×m)-matrix-valued, and W_t is an m-dimensional Wiener process. The new variable u in the arguments of f and G varies in some R^p, and the functions f(t, x, ·) and G(t, x, ·) are assumed to be sufficiently smooth. The function u(t, x) in equation (13.1.1) is a control function belonging to a set U of admissible control functions. We shall confine ourselves here to so-called Markov control functions, which depend only on t and on the state X_t at the instant t (and not, for example, on the values of X_s for s < t). The system (13.1.1) is also called a "plant".
If we substitute a fixed control function u ∈ U in (13.1.1), we get a stochastic differential equation of the usual form. Now, the set U must be narrowed down by boundedness and analytical conditions, which we shall not further specify, in such a way that existence and uniqueness of a solution X_t = X_t^u, which now depends on u ∈ U, are ensured for the differential equation. The solution that starts at x at the instant s will be denoted by X_t^u(s, x) = X_t(s, x). We shall also write E_{s,x} g(X_t) = E g(X_t(s, x)).
Suppose that the costs arising from the choice of control function u up to an instant T < ∞ are, in the case of a start at x at the instant s,

(13.1.2) V^u(s, x) = E_{s,x} (∫_s^T k(r, X_r, u(r, X_r)) dr + M(T, X_T)).

Here, we shall confine ourselves to fixed-time control. In general, T in (13.1.2) is replaced with a random instant τ at which the process reaches a specified target set. The functions k and M have respectively nonnegative and real values. The integral in (13.1.2) represents the running costs, and the second term in the large parentheses represents the one-time cost for a stop at X_T at the instant T.
We now seek the optimal control function, that is, the control function u* ∈ U that minimizes the costs:

V(s, x) = V^{u*}(s, x) = min_{u ∈ U} V^u(s, x).

Fig. 9: Scheme of the optimal control. (The disturbance W acts on the plant (13.1.1); the state X_t is fed back into the optimal controller, the optimizer for (13.1.2).)

In accordance with Bellman's optimality principle (the principle of dynamic programming, Bellman [57]), a control function defined on the interval [t0, T] is optimal if and only if it is optimal on every subinterval of the form [s, T], where t0 ≤ s < T. Here, x = X_s(t0, c) is chosen as the initial value at the instant s. In complete analogy with the deterministic case, we then get the result that the minimal costs V(s, x) satisfy Bellman's equation

(13.1.3) 0 = min_u (L^u V(s, x) + k(s, x, u)),  t0 ≤ s ≤ T,

with the end condition V(T, x) = M(T, x). Here,

L^u = ∂/∂s + Σ_{i=1}^d f_i(s, x, u) ∂/∂x_i + (1/2) Σ_{i=1}^d Σ_{j=1}^d (G(s, x, u) G(s, x, u)')_{ij} ∂²/(∂x_i ∂x_j).

The u in the expression L^u is treated as a parameter. In equation (13.1.3),

Lu V (s, x)+k (s, x, u) is, for given V and fixed (s, x), a function of u E Ri',
whose minimum is sought. The position u* of this minimum depends on (s, x);
thus, u*=u* (s, x). If V (s, x) is equal to the optimal costs and if the function
u* (s, x) resulting from the search for the minimum is an admissible control
function, it is also an optimal control function. Then,
Lu'(s,x) V (s, x)+k (s, x, u* (s, x)) = min (Lu V (s, x)+k (s, x, u)) = 0.
The following steps therefore yield (under certain conditions) both the optimal control function and the minimum costs:
1. For fixed V, we determine the point u = ū(s, x; V) at which L^u V(s, x) + k(s, x, u) attains its minimum.
2. We substitute the function ū for the parameter u in L^u V(s, x) + k(s, x, u) and solve the partial differential equation

L^{ū} V(s, x) + k(s, x, ū(s, x; V)) = 0,  t0 ≤ s ≤ T,

with the end condition V(T, x) = M(T, x). The solution V(s, x) yields the minimum costs.
3. The function V(s, x) is inserted into the function ū determined in step 1. This yields the optimal control function u* = u*(s, x) = ū(s, x; V(s, x)).
We shall illustrate this in the next section for the linear case and a quadratic
"criterion" (13.1.2).

13.2 Linear Systems

In equation (13.1.1), suppose that f(t, x, u) is linear in x and u and that G(t, x, u) = G(t) depends only on t. We get

dX_t = A(t) X_t dt + B(t) u(t, X_t) dt + G(t) dW_t,  t ≥ t0,

where A(t) is a d×d matrix, B(t) is a d×p matrix, and G(t) is a d×m matrix. For the functions appearing in the cost functional (13.1.2), we choose

k(t, x, u) = x' C(t) x + u' D(t) u

(where C(t) is symmetric and nonnegative-definite and D(t) is symmetric and positive-definite) and

M(T, x) = x' F(T) x + a(T)' x + b(T).

We have

L^u = ∂/∂s + (A(t) x + B(t) u)' ∂/∂x + (1/2) Σ_{i=1}^d Σ_{j=1}^d (G(t) G(t)')_{ij} ∂²/(∂x_i ∂x_j),

so that

L^u V(s, x) = ∂V/∂s + (A(t) x)' V_x + (B(t) u)' V_x + (1/2) tr(G(t) G(t)' V_{xx}),

where V_x = (V_{x_1}, ..., V_{x_d})' and V_{xx} = (V_{x_i x_j}).

First, we determine the minimizing u from Bellman's equation

(13.2.1) ∂V/∂s + (A(s) x)' V_x + (B(s) u)' V_x + (1/2) tr(G(s) G(s)' V_{xx}) + x' C(s) x + u' D(s) u = min.

The quadratic function (B(s) u)' V_x + u' D(s) u assumes its minimum when

(13.2.2) ū(s, x; V) = −(1/2) D(s)⁻¹ B(s)' V_x.

When we substitute this into (13.2.1), we get the partial differential equation for the minimum costs V(s, x), for t0 ≤ s ≤ T:

(13.2.3) ∂V/∂s + (1/2) tr(G G' V_{xx}) + V_x' A x − (1/4) V_x' B D⁻¹ B' V_x + x' C x = 0,

with the end condition V(T, x) = x' F(T) x + a(T)' x + b(T).
To solve (13.2.3), let us try

V(s, x) = x' Q(s) x + q(s)' x + p(s),

where Q(s) is symmetric and nonnegative-definite. When we substitute this into (13.2.3) and equate the coefficients, we get, for t0 ≤ s ≤ T, the ordinary (coupled) differential equations for Q(s), q(s), and p(s):

Q̇(s) + A' Q + Q A + C − Q B D⁻¹ B' Q = 0,  Q(T) = F(T),
(13.2.4) q̇(s) + (A' − Q B D⁻¹ B') q = 0,  q(T) = a(T),
ṗ(s) + tr(G G' Q) − (1/4) q' B D⁻¹ B' q = 0,  p(T) = b(T).

These must be solved backwards in time, beginning with T.
Since V_x = 2 Q x + q, the optimal control function u* is now obtained from (13.2.2):

u*(s, x) = −(1/2) D(s)⁻¹ B(s)' (2 Q(s) x + q(s)).

For a(T) = 0, we have q(s) ≡ 0 and hence

(13.2.5) u*(s, x) = −D(s)⁻¹ B(s)' Q(s) x.
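Numerically, the backward integration of (13.2.4) is straightforward. The sketch below (the constant matrices are assumed example data, not values from the text) integrates the Riccati equation for Q(s) backwards from Q(T) = F(T) with a classical Runge-Kutta step and evaluates the feedback gain appearing in (13.2.5):

    import numpy as np

    # Assumed example data for a 2-dimensional plant with scalar control
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    C = np.eye(2)             # running state cost x'Cx
    D = np.array([[1.0]])     # running control cost u'Du
    F = np.zeros((2, 2))      # end condition Q(T) = F(T)
    T, n = 5.0, 5000
    dt = T / n

    def qdot(Q):
        # Right-hand side of (13.2.4): Q' = -(A'Q + QA + C - Q B D^{-1} B' Q)
        return -(A.T @ Q + Q @ A + C - Q @ B @ np.linalg.solve(D, B.T) @ Q)

    Q = F.copy()
    for _ in range(n):        # classical Runge-Kutta step, backwards from T
        k1 = qdot(Q)
        k2 = qdot(Q - 0.5 * dt * k1)
        k3 = qdot(Q - 0.5 * dt * k2)
        k4 = qdot(Q - dt * k3)
        Q -= dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

    gain = np.linalg.solve(D, B.T @ Q)    # u*(s, x) = -gain @ x, as in (13.2.5)
    print("Q(0) =\n", Q)
    print("feedback gain D^{-1} B' Q(0) =", gain)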

13.3 Control on the Basis of Filtered Observations


A control function u(t, X_t) = U_t depends on the state X_t of the system, which we have up to now assumed to be known exactly. However, in many cases, only noisy observations of a function of X_t are possible, so that the control and filtering must be combined. In the linear case, this yields

(13.3.1) dX_t = A(t) X_t dt + B(t) U_t dt + G(t) dW_t,  X_{t0} = c  𝔑(0, E c c')-distributed,

(13.3.2) dZ_t = H(t) X_t dt + R(t) dV_t,

where X_t, U_t, Z_t, W_t, and V_t assume values in Euclidean spaces of arbitrary dimensions, W_t, V_t, and c are independent, and R(t) R(t)' is positive-definite. If we fix the cost functional (13.1.2) by

k(t, x, u) = x' C(t) x + u' D(t) u,
M(T, x) = x' F(T) x,

where C(t) and F(t) are symmetric and nonnegative-definite and D(t) is symmetric and positive-definite, then the control and filtering can be separated from each other. In the following discussion, we shall follow Bucy and Joseph ([61], pp. 96-102).
Since instead of X_t we know only the observations Z_t, our control functions u are allowed to depend not on X_t but only on the observations Z[t0, t]; that is, we are considering functionals of the form

U_t = u(t, Z[t0, t]).
Also, the cost functional must now be considered under the condition of a certain observation:

V^u = E((X_T^u)' F(T) X_T^u + ∫_s^T ((X_r^u)' C(r) X_r^u + U_r' D(r) U_r) dr | Z[t0, s]).

Here, X̂_s = E(X_s^u | Z[t0, s]) and X_t^u is a solution of (13.3.1). Then, there exists an optimal control function, namely,

(13.3.3) U_t* = −D(t)⁻¹ B(t)' Q(t) X̂_t.
Here, Q(t) is the symmetric solution of

(13.3.4) Q̇(t) + A' Q + Q A − Q B D⁻¹ B' Q + C(t) = 0

on the interval [t0, T] with the end condition Q(T) = F(T) (see equation (13.2.4)), and X̂_t = E(X_t | Z[t0, t]) is the solution of

(13.3.5) dX̂_t = A X̂_t dt + B U_t* dt + P H' (R R')⁻¹ (dZ_t − H X̂_t dt)

on the interval [t0, T] with the initial value X̂_{t0} = 0. Finally, P(t) = E((X_t − X̂_t)(X_t − X̂_t)' | Z[t0, t]) is the error covariance matrix, which is independent of the control function and of the observation and which satisfies the equation

Ṗ(t) = A P + P A' − P H' (R R')⁻¹ H P + G G',  t0 ≤ t ≤ T,  P(t0) = E c c'.

The minimum costs arising from use of the control function U_t* with the estimated starting point X̂_s in [s, T] are then

V(s, X̂_s) = X̂_s' Q(s) X̂_s + ∫_s^T tr(P H' (R R')⁻¹ H P Q) dr + tr F(T) P(T) + ∫_s^T tr(C(r) P(r)) dr.

These equations are said to contain the so-called separation principle:

Fig. 10: Separation of filtering and control. (The disturbance W acts on the plant and the disturbance V on the observation; the observation Z_t is passed through the filter, and the control U_t is computed from the estimate.)

The combined filtering and control problem can obviously be broken into the following problems:
1. Filtering: determination of the optimal estimate X̂_t of X_t on the basis of the observation Z[t0, t] from equation (13.3.5).
2. Determination of the optimal control function u* = u*(t, X_t) for the deterministic problem (G ≡ 0). We get

u* = −D(t)⁻¹ B(t)' Q(t) X_t,

where Q is the solution of equation (13.3.4) (cf. (13.2.5)). The optimal control function for the stochastic problem is then simply

U_t* = u*(t, X̂_t),

and hence is in fact a functional of the observations Z[t0, t].
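The separation principle translates directly into a simulation in which the controller sees only the estimate. The sketch below (a scalar example with assumed parameter values) solves (13.3.4) backwards, then runs the plant (13.3.1), the filter (13.3.5), and the control law (13.3.3) forward together:

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed scalar data: dX = (aX + bU) dt + g dW, dZ = hX dt + r dV,
    # costs k = C x^2 + D u^2 and M = F x^2
    a, b, g, h, r = 0.5, 1.0, 0.3, 1.0, 0.2
    Ccost, Dcost, Fcost = 1.0, 0.1, 0.0
    T, n = 4.0, 4000
    dt = T / n

    # Backward pass: Riccati equation (13.3.4) with Q(T) = F
    Q = np.empty(n + 1)
    Q[n] = Fcost
    for i in range(n, 0, -1):
        Q[i - 1] = Q[i] + dt * (2 * a * Q[i] + Ccost - Q[i] ** 2 * b ** 2 / Dcost)

    x = rng.normal(0.0, 1.0)   # initial state, N(0, E cc') with E cc' = 1 assumed
    xhat, P = 0.0, 1.0         # filter start: Xhat_{t0} = 0, P(t0) = E cc'
    for i in range(n):
        u = -(b / Dcost) * Q[i] * xhat        # control (13.3.3): uses the estimate only
        dZ = h * x * dt + r * rng.normal(0.0, np.sqrt(dt))
        x += (a * x + b * u) * dt + g * rng.normal(0.0, np.sqrt(dt))
        K = P * h / r ** 2                    # filter gain P H'(R R')^{-1}
        xhat += (a * xhat + b * u) * dt + K * (dZ - h * xhat * dt)  # equation (13.3.5)
        P += (2 * a * P + g ** 2 - P ** 2 * h ** 2 / r ** 2) * dt   # error covariance

    print(f"final state {x:.3f}, estimate {xhat:.3f}, P(T) = {P:.4f}")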
Bibliography

1. Selection of Texts on Probability Theory and the Theory of Stochastic Processes

[1] Bauer, H., Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie I, Berlin, de Gruyter, 1964 (Sammlung Göschen, Bd. 1216/1216a).
[2] Bauer, H., Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie, Berlin, de Gruyter, 1968.
[3] Doob, J., Stochastic Processes, New York, Wiley, 1960.
[4] Feller, W., An Introduction to Probability Theory and its Applications, Vols. 1, 2, New York, Wiley, 1955/1966.
[5] Gikhman, I. I., and Skorokhod, A. V., Introduction to the Theory of Random Processes, Philadelphia, W. B. Saunders, 1969 (translation from Russian).
[6] Hinderer, K., Grundbegriffe der Maßtheorie und der Wahrscheinlichkeitstheorie, Ausarbeitung einer Vorlesung an der Universität Hamburg, 1968/1969.
[7] Krickeberg, K., Wahrscheinlichkeitstheorie, Stuttgart, Teubner, 1963.
[8] Lamperti, J., Probability, New York, Benjamin, 1966.
[9] Loève, M., Probability Theory, Princeton, Van Nostrand, 1953.
[10] Meyer, P. A., Probability and Potentials, Waltham, Mass., Blaisdell, 1966.
[11] Morgenstern, D., Einführung in die Wahrscheinlichkeitsrechnung und mathematische Statistik, Berlin, Göttingen, and Heidelberg, Springer, 1964.
[12] Neveu, J., Mathematical Foundations of the Calculus of Probability, San Francisco, Holden-Day, 1965.
[13] Papoulis, A., Probability, Random Variables, and Stochastic Processes, New York, McGraw-Hill, 1965.
[14] Prabhu, N. V., Stochastic Processes, New York, Macmillan, 1965.
[15] Prohorov, Yu. V., and Rozanov, Yu. A., Probability Theory, Berlin, Heidelberg, and New York, Springer, 1969 (translation from Russian).
[16] Renyi, A., Foundations of Probability, San Francisco, Holden-Day, 1970.
[17] Richter, H., Wahrscheinlichkeitstheorie, Berlin, Göttingen, and Heidelberg, Springer, 1956.

2. Markov and Diffusion Processes, Wiener Processes, White Noise

See the corresponding sections in [2], [3], [4], [5], [7], [8], [9], [10], [12], [13], [14], [15], [45].

[18] Bauer, H., Markoffsche Prozesse, Ausarbeitung einer Vorlesung an der Universität Hamburg, 1963.
[19] Bharucha-Reid, A. T., Elements of the Theory of Markov Processes and their Applications, New York, McGraw-Hill, 1960.
[20] Dynkin, E. B., Theory of Markov Processes, Englewood Cliffs, New Jersey, Prentice-Hall, 1961 (translation from Russian).
[21] Dynkin, E. B., Markov Processes, Vols. 1, 2, Berlin, Göttingen, and Heidelberg, Springer, 1965 (translation from Russian).
[22] Gelfand, I. M., and Vilenkin, N. J., Generalized Functions, Vol. 4, New York, Academic Press, 1961 (translation from Russian).
[23] Hunt, G. A., Martingales et processus de Markov, Paris, Dunod, 1966.
[24] Ito, K., Lectures on Stochastic Processes, Bombay, Tata Institute of Fundamental Research, 1961.
[25] Ito, K., Stochastic Processes, Aarhus, Universitet, Matematisk Institut, 1969 (Lecture Notes Series, No. 16).
[26] Ito, K., and McKean, H. P., Diffusion Processes and their Sample Paths, Berlin, Heidelberg, and New York, Springer, 1965.
[27] Levy, P., Processus stochastiques et mouvement brownien, Gauthier-Villars, 1948.
[28] Mandl, P., Analytical Treatment of One-Dimensional Markov Processes, Berlin, Heidelberg, and New York, Springer, 1968.
[29] Nelson, E., Dynamical Theories of Brownian Motion, Princeton University Press, 1967.

3. Stochastic Differential Equations

See the corresponding sections in [3], [5], [21], [25], [29], [61], [65], [66], [72], [80].

[30] Anderson, W. J., Local Behaviour of Solutions of Stochastic Integral Equations, Ph.D. thesis, Montreal, McGill University, 1969.
[30a] Arnold, L., "The loglog law for multidimensional stochastic integrals and diffusion processes," Bull. Austral. Math. Soc., 5 (1971), pp. 351-356.
[31] Bharucha-Reid, A. T., Random Integral Equations, New York, Academic Press (in press).
[32] Chandrasekhar, S., "Stochastic problems in physics and astronomy," Rev. Mod. Phys., 15 (1943), pp. 1-89 (contained in [51]).
[33] Clark, J. M. C., The Representation of Nonlinear Stochastic Systems with Application to Filtering, Ph.D. thesis, London, Imperial College, 1966.
[34] Dawson, D. A., "Generalized stochastic integrals and equations," Trans. Amer. Math. Soc., 147 (1970), pp. 473-506.
[35] Doob, J. L., "The Brownian movement and stochastic equations," Ann. Math., 43 (1942), pp. 351-369 (contained in [51]).
[36] Gikhman, I. I., and Skorokhod, A. V., Stochastische Differentialgleichungen, Berlin, Akademie-Verlag, 1971 (translation from Russian).
[37] Girsanov, I. V., Primer neyedinstvennosti resheniya stokhasticheskogo uravneniya K. Ito (An example of nonuniqueness of the solution of K. Ito's stochastic equation), Teoriya veroyatnostey i yeye primeneniya, 7 (1962), pp. 336-342.
[37a] Goldstein, J. A., "Second order Ito processes," Nagoya Math. J., 36 (1969), pp. 27-63.
[38] Gray, A. H., Stability and Related Problems in Randomly Excited Systems, doctoral thesis, Pasadena, California Institute of Technology, 1964.
[39] Gray, A. H., and Caughey, T. K., "A controversy in problems involving random parametric excitation," J. Math. and Phys., 44 (1965), pp. 288-296.
[40] Ito, K., "Stochastic differential equations in a differentiable manifold," Nagoya Math. J., 1 (1950), pp. 35-47.
[41] Ito, K., "On a formula concerning stochastic differentials," Nagoya Math. J., 3 (1951), pp. 55-65.
[42] Ito, K., On Stochastic Differential Equations, New York, Amer. Math. Soc., 1951 (Memoirs Amer. Math. Soc., No. 4).
[43] Ito, K., and Nisio, M., "On stationary solutions of stochastic differential equations," Journ. of Math. of Kyoto Univ., 4 (1964), pp. 1-79.
[44] Langevin, P., "Sur la théorie du mouvement brownien," C. R. Acad. Sci. Paris, 146 (1908), pp. 530-533.
[45] McKean, H. P., Stochastic Integrals, New York, Academic Press, 1969.
[46] McShane, E. J., "Toward a stochastic calculus," Proc. Nat. Acad. Sci. USA, 63 (1969), pp. 275-280 and pp. 1084-1087.
[47] Skorokhod, A. V., Studies in the Theory of Random Processes, Reading, Mass., Addison-Wesley, 1965 (translation from Russian).
[48] Stratonovich, R. L., "A new representation for stochastic integrals and equations," SIAM J. Control, 4 (1966), pp. 362-371.
[49] Uhlenbeck, G. E., and Ornstein, L. S., "On the theory of Brownian motion," Phys. Rev., 36 (1930), pp. 823-841 (contained in [51]).
[50] Wang, M. C., and Uhlenbeck, G. E., "On the theory of Brownian motion II," Rev. Mod. Phys., 17 (1945), pp. 323-342 (contained in [51]).
[51] Wax, N. (ed.), Selected Papers on Noise and Stochastic Processes, New York, Dover, 1954 (contains [32], [35], [49], [50]).
[52] Wong, E., and Zakai, M., "On the convergence of ordinary integrals to stochastic integrals," Ann. Math. Statist., 36 (1965), pp. 1560-1564.
[53] Wong, E., and Zakai, M., "The oscillation of stochastic integrals," Z. Wahrscheinlichkeitstheorie verw. Geb., 4 (1965), pp. 103-112.
[54] Wong, E., and Zakai, M., "Riemann-Stieltjes approximation of stochastic integrals," Z. Wahrscheinlichkeitstheorie verw. Geb., 12 (1969), pp. 87-97.

4. Stability, Filtering, Control

[55] Aoki, M., Optimization of Stochastic Systems, New York, Academic Press, 1967.
[56] Athans, M., and Falb, P. L., Optimal Control: An Introduction to the Theory and its Applications, New York, McGraw-Hill, 1966.
[57] Bellman, R., Dynamic Programming, Princeton University Press, 1957.
[58] Bhatia, N. P., and Szegö, G. P., Stability Theory of Dynamical Systems, Berlin, Heidelberg, and New York, Springer, 1970.
[59] Bucy, R. S., "Stability and positive supermartingales," J. Differential Equ., 1 (1965), pp. 151-155.
[60] Bucy, R. S., "Nonlinear filtering theory," IEEE Trans. Automatic Control, 10 (1965), p. 198.
[61] Bucy, R. S., and Joseph, P. D., Filtering for Stochastic Processes with Applications to Guidance, New York, Interscience Publ., 1968.
[62] Fleming, W. H., "Optimal continuous-parameter stochastic control," SIAM Review, 11 (1969), pp. 470-509.
[63] Gikhman, I. I., "On the stability of the solutions of stochastic differential equations" (in Russian), Predel'nyye teoremy i statisticheskiye vyvody, Tashkent, 1966, pp. 14-45.
[64] Hahn, W., Stability of Motion, Berlin, Heidelberg, and New York, Springer, 1967.
[65] Khas'minskiy, R. Z., Ustoychivost' sistem differentsial'nykh uravneniy pri sluchaynykh vozmushcheniyakh (Stability of systems of differential equations in the presence of random disturbances), Moscow, Nauka, 1969.
[66] Jazwinski, A. H., Stochastic Processes and Filtering Theory, New York, Academic Press, 1970.
[67] Kalman, R. E., "New methods in Wiener filtering theory," Proc. First Sympos. on Engin. Appl. of Random Function Theory and Probability (ed. by J. L. Bogdanoff and F. Kozin), New York, Wiley, 1963, pp. 270-388.
[68] Kalman, R. E., Falb, P. L., and Arbib, M. A., Topics in Mathematical System Theory, New York, McGraw-Hill, 1969.
[69] Kolmogorov, A. N., Interpolirovaniye i ekstrapolirovaniye statsionarnykh sluchaynykh posledovatel'nostey (Interpolation and extrapolation of stationary random sequences), Izvestiya Akad. nauk (seriya matematicheskaya), 5 (1941), pp. 3-14.
[70] Kozin, F., "On almost sure asymptotic sample properties of diffusion processes defined by stochastic differential equations," Journ. of Math. of Kyoto Univ., 4 (1965), pp. 515-528.
[71] Kushner, H. J., "On the differential equations satisfied by conditional probability densities of Markov processes," SIAM J. Control, 2 (1964), pp. 106-119.
[72] Kushner, H. J., Stochastic Stability and Control, New York, Academic Press, 1967.
[73] Morozan, T., Stabilitatea sistemelor cu parametri aleatori (Stability of systems with random parameters), Bucharest, Editura Academiei Republicii Socialiste Romania, 1969.
[74] Mortensen, R. E., Optimal Control of Continuous-Time Stochastic Systems, Ph.D. thesis (engineering), Berkeley, Univ. of California, 1966.
[75] Sagirow, P., Stochastic Methods in the Dynamics of Satellites, Lecture Notes, Udine, CISM, 1970.
[76] Stratonovich, R. L., Topics in the Theory of Random Noise, Vol. 1, New York, Gordon and Breach, 1963 (translation from Russian).
[77] Stratonovich, R. L., Conditional Markov Processes and Their Application to the Theory of Optimal Control, New York, American Elsevier, 1968 (translation from Russian).
[78] Strauss, A., An Introduction to Optimal Control Theory, Berlin, Heidelberg, and New York, Springer, 1968 (Lecture Notes in Operations Research and Mathematical Economics, Vol. 3).
[79] Wiener, N., Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, Cambridge, Mass., M.I.T. Press, 1949.
[80] Wonham, W. M., "Random differential equations in control theory," in A. T. Bharucha-Reid (ed.), Probabilistic Methods in Applied Mathematics, Vol. 2, New York, Academic Press, 1970, pp. 131-212.
Subject Index

Absolutely continuous, 6
Anderson, W. J., 114, 122
Approximation:
  by Cauchy polygonal lines, 173
  of Ornstein-Uhlenbeck process, 136
  of stochastic differential equations, 172
  of white noise, 55
  of Wiener process, 174 (Fig. 7)
Asymptotic stability, 176
  in pth mean, 188
  stochastic, 181
  in the large, 182
Autonomous stochastic differential equation, 113
  linear, 128
  solution as homogeneous diffusion process, 153
  solution as homogeneous Markov process, 148
  transition probability, 159

Backward equation, 42, 158
  fundamental solution of, 43
Bellman, R.:
  equation, 212
  optimality principle, 212
Bellman-Gronwall lemma, 107
Borel sets, 2
Borel-Cantelli lemma, 17, 110
Brownian motion, xi, 39, 134
  of harmonic oscillator, 136
  in Ornstein-Uhlenbeck theory, 136
Bucy, R. S., 179
  representation theorem, 206
  (see also Kalman-Bucy filter)

Cantelli (see Borel-Cantelli lemma)
Cauchy polygonal lines, 173
Central limit theorem, 17
Chapman-Kolmogorov equation, 29
Characteristic function:
  of normal distribution, 10
  of random variable, 10
  of transition probability, 158
Chebyshev's inequality, 10
Colored noise, 135, 164
Conditional density, 20, 206
  of normal distribution, 21
Conditional distribution, 20
Conditional expectation, 18
  as optimal estimate, 205
Conditional probability, 18, 19
Control:
  on basis of filtered observations, 215
  fixed-time, 212
  for linear systems, 213
  optimal, 211
  separation principle with, 216
Control functions, 211
  admissible, 211
  optimal, 212
Convergence:
  almost certain, 13
  in distribution, 13
  dominated, 11
  in mean square, 13
  monotonic, 11
  in pth mean, 13
  with probability 1, 13
  in quadratic mean, 13
  stochastic, 13, 70
  of supermartingales, 26
Costs of control, 212
Covariance, 10
Covariance matrix, 10
  of solution of linear equation, 131, 143
Cylinder sets, 14

Decrescent function, 177
Delta-correlated process, 56, 164
Density:
  conditional, 20, 206
  of a distribution, 6
  of a measure, 12
  of normal distribution, 6
  of transition probability, 31, 43, 158
Deterministic motion, 34
Differential equations, stochastic (see Stochastic differential equations)
Differential operators:
  𝔇, 41, 156, 180
  𝔇*, 207
Differentials, stochastic, 88
Diffusion matrix, 40
  of the solution of a linear equation, 155
  of the solution of a stochastic differential equation, 152
Diffusion processes, 39
  coefficients of, 40
  diffusion matrix of, 40
  drift vector of, 40
  homogeneous, 43, 153
  infinitesimal operator of, 41, 158
  as solution of stochastic differential equation, 152, 154, 156
Distribution, 5
  absolutely continuous, 6
  convergence in, 13
  finite-dimensional, 22
  of a functional, 161
  invariant, 33
  marginal, 6
  normal (see under Normal)
  (see also Generalized function)
Distribution function, 5
  joint, 5
Dominated convergence, theorem on, 11
Drift vector, 40, 153
  of solution of linear equation, 155
Dynamic programming, 212
Dynamic systems, 163
  optimal control of, 211
  stability of, 176
  stochastic, 57

Elementary events, 1
Equilibrium position, 176, 179
Estimate:
  conditional expectation as optimal, 205
  optimal, 203
  optimal linear, 209
Estimation error, 208
Events, 1
Expectation, 8
  conditional, 18
  of solution of linear equation, 131, 138, 142
  stable, 188
  of stochastic integral, 74, 80
Explosion, 112
Exponential stability in pth mean, 188
Extrapolation, 203

Filter, Kalman-Bucy, 206
  for linear systems, 209
Filtering, optimal, 202
  estimation error in, 208
Filtering problem, 202, 206
Filtering theory, fundamental equation of, 207
Fixed-time control, 212
Fokker-Planck equation, 44, 159
Forward equation, 44, 159, 160
Frequency interpretation of probability theory, 4
Fubini's theorem, 15
Fundamental equation of filtering theory, 207
Fundamental matrix, 129, 141
Fundamental theorem of Kolmogorov, 22

Gauss-Markov process, 150
Gaussian distribution, 6
Gaussian stochastic process, 25, 132
  generalized, 52
  with independent increments, 86, 95, 132
Gaussian white noise, 50
Gel'fand, I. M., 51
Generalized function, 51
Gikhman, I. I., 84, 111, 112, 117, 122
Girsanov's example, 111
Global solutions, 113
Goldstein, J. A., 88, 119, 121
Gronwall, lemma of Bellman and, 107
Growth, restriction on, 105, 113

Hölder condition, 114
Hölder's inequality, 9
Homogeneous diffusion process, 43, 153
Homogeneous Markov process, 33, 148

Independent events, 15
Independent increments:
  of solution of linear stochastic differential equation, 132
  of stochastic integral, 86, 95
  of Wiener process, 46
Independent random variables, 16
Independent sigma-algebras, 16
Indicator function, 2
Inequality:
  Chebyshev's, 10
  Hölder's, 9
  Minkowski's, 9
  Schwarz's, 9
  supermartingale, 26
  triangle, 9
Infinitesimal operator, 36
  of diffusion process, 41, 158
  extension of, 180
  of Wiener process, 37
Instability, 176
  stochastic, 181
Integrability, 8
Integral of a measurable function, 7
  Ito's, 60, 71
  Lebesgue, 11
  Lebesgue-Stieltjes, 8
  Riemann-Stieltjes, 8, 59, 78
  stochastic, 71
  Stratonovich's, 61, 69
Interpolation, 203
Intrinsic time, 87
Invariant distribution, 33
Iterated logarithm, law of, 17
  for solution of stochastic differential equation, 121
  for stochastic integral, 87
  for Wiener process, 46
Ito's stochastic differential equation (see Stochastic differential equations)
Ito's stochastic integral, 60, 71
  connection with Stratonovich's integral, 169
Ito's theorem:
  proof, 96
  statement, 90

Jazwinski, A. H., 202

Kalman-Bucy filter, 206
  for linear systems, 209
Khas'minskiy, R. Z., 179, 183, 189, 198
  criterion of, 197
Kolmogorov, A. N., 202
  criterion of, 24
  equation of Chapman and, 30
  fundamental theorem of, 22
Kushner, H. J., 179

L-operator, 180
  for linear equations, 195
Langevin equation, xi, 134, 136
Law of iterated logarithm, 17
  for solution of stochastic differential equation, 121
  for stochastic integral, 87
  for Wiener process, 46
Law of large numbers:
  strong, 17
  for Wiener process, 46
Lebesgue integral, 11
Lebesgue measure, 5
Lebesgue-Stieltjes integral, 8
Lemma:
  Bellman-Gronwall, 107
  Borel-Cantelli, 17, 110
Limit theorems, 17
Linear stochastic differential equations, 125
  autonomous, 128, 159
  backward, 158
  forward, 159
  general vector, 141
  homogeneous, 125
  in narrow sense, 125, 129
  moments of scalar, 138
  solution of as diffusion process, 153
  solution of as Gaussian process, 132
  solution of as stationary process, 133, 135
  stability of, 190
  transition probabilities of solution of, 156
Lipschitz condition, 105, 111
  sufficient condition for satisfaction of, 112
Local solution, 113
Lp spaces, 9
Lyapunov's direct method, 177
Lyapunov function, 178, 180
  as supermartingale, 181

Marginal distribution, 6
Markov process, 28
  homogeneous, 33, 148
  invariant distribution of, 34
  shift from real to, 163
  solution of stochastic differential equation as, 146
  stationary, 33, 151
Markov property, 27, 28
  negative formulation of, 164
Martingales, 25
  stochastic integral as, 80
McKean, H. P., 87, 113, 121
Mean square:
  convergence in, 13
  stability in, 188
Mean-square continuity, 24
  of solution, 118
Mean-square differentiability, 124
Measurable mapping, 2
Measurable set, 1
Measurable space, 1
Measure, 3
  finite, 4
  Lebesgue, 5
  probability, 4
  sigma-finite, 4
Measure space, 4
Minkowski's inequality, 9
Modelling, 165
Moments, 10
  of normal distribution, 10
  of solutions, 116, 138
  stability of, 188
Monotonic convergence, 11

Negative-definite function, 177
Noise:
  colored, 135, 164
  thermal, 136
  white, 50, 136, 164
Nonanticipating family of sub-sigma-algebras, 63
Nonanticipating function, 63, 64, 101
Norm, xv
Normal distribution, 6
  characteristic function of, 10
  conditional density of, 20
  density of, 6
  moments of, 10
  equivalence of independence and uncorrelatedness with, 16

Observed process, 203
Optimal control, 211
Optimal control function, 212
Optimal filters, 202
  for linear systems, 208
Optimality principle, Bellman's, 212
Ornstein-Uhlenbeck process, 134, 165

Path, 22
Plant, 211, 212
Poisson white noise, 54
Positive-definite function, 177
Positive-definite matrix, xv
Probability, 4
  conditional, 18
  frequency interpretation of, 4
Probability space, 4
Product probability, 14
Product sigma-algebra, 14
pth mean, convergence in, 13
pth mean, stability in, 188

Quadratic mean, convergence in, 13

Radial unboundedness, 177
Radon-Nikodym theorem, 12
Random variable, 1
Realization, 22
Representation theorem of Bucy, 206
Restriction on growth, 105, 113
Riccati equation, 209
Routh-Hurwitz criterion, 196, 197

Sagirow, P., v, 199
Sample function, 22
Satellite dynamics, 199
Schwarz's inequality, 9
Separable processes, 23
Separation principle, 216
Sigma-algebra, 1
Sigma-additivity, 4
Signal process, 202
Skorokhod, A. V., 84, 111, 112, 117, 122
Smoothing, 203
Solution of a stochastic differential equation, 101
  analytical properties of, 120
  continuity of, 103
  dependence on parameters and initial values, 122
  as diffusion process, 152, 156
  existence of, 105
  functionals of trajectories of, 161
  as Gaussian process, 132
  general vector linear equation, 141
  global, 113
  as homogeneous diffusion process, 153
  as homogeneous Markov process, 149
  law of iterated logarithm for, 121
  local, 113
  as Markov process, 146
  moments of, 116, 138
  nondifferentiability of, 122
  nonunique solution of, 124
  polygonal-line approximation of, 173
  as stationary Gaussian process, 133
  as stationary Markov process, 151
  transition probability of, 146
  unbounded variation of, 121
  uniqueness of, 105
Spectral density, 24
  of white noise, 50
Spectral distribution function, 24
Stability, 176, 188
  of autonomous equation, 184
  of disturbed linear equation, 196
  of expectation, 188
  of linear autonomous equation, 178, 189, 196
  of linear equation, 190
  of linearized equation, 198
  of moments, 188
  stochastic, 181
  stochastic asymptotic, 181
  in the large, 182, 195
  of Stratonovich equation, 186
  of undisturbed equation, 185, 186, 197
Stability criteria:
  of Khas'minskiy, 197
  of Routh-Hurwitz, 196, 197
Standard deviation, 10
Stationarity, 24
Stationary distribution, 34, 151
Stationary Gaussian process, 133, 135
Stationary Markov process, 33, 151
Stationary transition probability, 33, 149
Stochastic convergence, 13, 70
Stochastic differential, 89
Stochastic differential equations, 94, 101
  approximation of, 172
  autonomous, 113
  continuous solutions of, 103, 156
  diffusion process as solution of, 41, 154
  existence and uniqueness of solution of, 105
  global solutions of, 113
  linear, 125
  local solution of, 113
  ordinary differential equations as special case of, 103
  second-order, 119
  in the sense of Stratonovich, 170, 173, 174, 175
  solution of, 101
Stochastic equivalence, 23
Stochastic integral, 71
  continuity of, 81
  integration of by parts, 93
  Ito's, 60, 71, 170
  martingale property of, 80
  of step function, 64
  Stratonovich's, 61, 169
  unbounded variation of, 88
Stochastic processes, 21
  with continuous sample functions, 24
  equivalence of, 23
  finite-dimensional distributions of, 22
  Gaussian, 25, 132, 133
  generalized, 52
  with independent increments, 46, 86, 95, 132
  parameter set of, 21
  sample function of, 22
  separable, 23
  state space of, 21
  stationary, 24
  version of, 23
Stochastic stability, 181
  weak, 189
Stratonovich's stochastic integral, 61, 167
  connection with Ito's integral, 169
  system-theoretic significance of, 172
Submartingales, 25
Supermartingale property of Lyapunov functions, 179, 181
Supermartingales, 25
  convergence theorem on, 26
  inequalities involving, 26, 181, 182

Theorem:
  Bucy's representation, 206
  on dominated convergence, 11
  Fubini's, 15
  Ito's, 90
  Kolmogorov's fundamental, 22
  on monotonic convergence, 11
  Radon-Nikodym, 12
  Wong-Zakai, 174
Trajectory, 22
Transformation theorem, 8
Transition operators, 37
Transition probability, 31
  continuous, 37
  density of, 31, 43, 158, 159
  of Gauss-Markov process, 150
  stationary, 33, 149
Triangle inequality, 9

Uhlenbeck (see Ornstein-Uhlenbeck process)
Uncorrelatedness, 10

Variance, 10

White noise, 50, 164
  approximation of, 55
  d-dimensional, 56
  as derivative of Wiener process, 53
  Gaussian, 50
  as limit of stationary process, 55
  Poisson, 54
  as process with independent values, 54
  as stationary generalized process, 54
Wiener process, 35, 45
  diffusion matrix of, 41
  drift vector of, 41
  independent increments of, 46
  law of iterated logarithm applied to, 46
  nondifferentiability of sample functions of, 48
  polygonal approximation of sample function of, 174
  rotation-invariance of, 47
  strong law of large numbers applied to, 46
  transition density of as solution of backward or forward equations, 43
  unbounded variation of, 48
  white noise as derivative of, 53
Wiener-Hopf equation, 209
Wong-Zakai theorem, 174
