Stochastic Differential Equations:
Theory and Applications
LUDWIG ARNOLD
A WILEY-INTERSCIENCE PUBLICATION
provement. For this I wish to express my gratitude to him. I also thank sincerely
Hede Schneider of Stuttgart and Dominique Gaillochet of Montreal for typing
the manuscript. Finally, I wish to express my thanks to the Centre de Recherches
Mathématiques of the University of Montreal, where I was able to complete the
manuscript during my stay there in the academic year 1970-1971.
Ludwig Arnold
Montreal, Quebec
March 1971
Contents
Introduction  xi
1.2 Probability and distribution functions  3
1.3 Integration theory, expectation  7
1.4 Convergence concepts  12
1.5 Products of probability spaces, independence  14
1.6 Limit theorems  17
1.7 Conditional expectations and conditional probabilities  18
1.8 Stochastic processes  21
1.9 Martingales  25
with random functions A (t) and B (t) as coefficients and with random initial
value c or
α = 6πaη/m, where a is the radius of the (spherical) particle, m is its mass, and
η is the viscosity of the surrounding fluid. On the other hand, the term σξ_t represents
the force exerted on the particle by the molecular collisions. Since under
normal conditions the particle uniformly undergoes about 10^21 molecular collisions
per second from all directions, σξ_t is indeed a rapidly varying fluctuational
term, which can be idealized as "white noise." If we normalize ξ_t so that
its covariance is the delta function, then σ² = 2αkT/m (where k is Boltzmann's
constant and T is the absolute temperature of the surrounding fluid). The same
equation (b) arises formally for the current in an electric circuit. This time, ξ_t
represents the thermal noise. Of course, (b) is a special case of equation (a), the
right-hand member of which is decomposed as the sum of a systematic part f
and a fluctuational part G ξ_t.
In model (b) of Brownian motion, one can calculate explicitly the probability
distributions of X_t, even though ξ_t is not a random function in the usual sense.
As a matter of fact, every process X_t with these distributions (Ornstein–Uhlenbeck
process) has sample functions that, with probability 1, are nondifferentiable, so that
(b) and, more generally, (a) cannot be regarded as ordinary differential equations.
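The behavior of model (b) is easy to explore numerically. The following sketch (not from the book; parameter values and the NumPy dependency are my own choices) time-steps the Langevin equation dX_t = −αX_t dt + σ dW_t with the simplest explicit scheme and compares the sample variance against the stationary variance σ²/(2α) of the Ornstein–Uhlenbeck process:

```python
import numpy as np

# Illustrative sketch: Euler-Maruyama for dX = -alpha*X dt + sigma*dW.
# alpha, sigma, dt, and path counts are arbitrary choices, not from the book.
rng = np.random.default_rng(0)
alpha, sigma = 2.0, 1.0
dt, n_steps, n_paths = 1e-3, 5000, 2000

X = np.zeros(n_paths)                            # all particles start at x = 0
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)   # independent Wiener increments
    X = X - alpha * X * dt + sigma * dW          # one Euler-Maruyama step

stationary_var = sigma**2 / (2 * alpha)          # long-run variance of the OU process
print(X.var(), stationary_var)
```

After t = 5 time units the process is essentially in its stationary regime, so the two printed numbers should be close; the agreement is only a Monte Carlo approximation.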
For a mathematically rigorous treatment of equations of type (a), a new theory
is necessary. This is the subject of the present book. It turns out that, whereas
"white noise" is only a generalized stochastic process, the indefinite integral

(c) W_t = ∫₀ᵗ ξ_s ds

can nonetheless be identified with the Wiener process. This is a Gaussian stochastic
process with continuous (but nowhere differentiable) sample functions, with mean
E W_t = 0 and with covariance E W_t W_s = min(t, s).
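These two moment properties can be checked empirically. A minimal sketch (my own, assuming only that W_t is built by summing independent Gaussian increments, in the spirit of (c)):

```python
import numpy as np

# Build 50000 discrete Wiener paths on [0, 1] from independent increments.
rng = np.random.default_rng(1)
dt, n_steps, n_paths = 0.01, 100, 50000
increments = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
W = np.cumsum(increments, axis=1)        # W[:, k] approximates W at time (k+1)*dt

t_idx, s_idx = 99, 39                    # grid times t = 1.0 and s = 0.4
mean_est = np.mean(W[:, t_idx])          # estimate of E W_t (should be near 0)
cov_est = np.mean(W[:, t_idx] * W[:, s_idx])   # estimate of E W_t W_s
print(mean_est, cov_est)                 # cov_est should be near min(1.0, 0.4) = 0.4
```

The sample covariance approaches min(t, s) as the number of paths grows; the grid and sample sizes above are arbitrary.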
If we write (c) symbolically as

dW_t = ξ_t dt,

(a) can be put in the differential form

(d) dX_t = f(t, X_t) dt + G(t, X_t) dW_t,  X_{t₀} = c.
This is a stochastic differential equation (Itô's) for the process X_t. It should be
understood as an abbreviation for the integral equation

(e) X_t = c + ∫_{t₀}^t f(s, X_s) ds + ∫_{t₀}^t G(s, X_s) dW_s.

Since the sample functions of W_t are with probability 1 continuous though not of
bounded variation in any interval, the second integral in (e) cannot, even for
smooth G, be regarded in general as an ordinary Riemann–Stieltjes integral with
respect to the sample functions of W_t, because the value depends on the intermediate
points in the approximating sums. In 1951, Itô [42] defined integrals of the form
Y_t = ∫_{t₀}^t G(s) dW_s
Probability theory deals with mathematical models of trials whose outcomes depend
on chance. We group together the possible outcomes (the elementary events)
in a set Ω with typical element ω ∈ Ω. If the trial is the tossing of a coin, then Ω =
{heads, tails}; for the throwing of a pair of (distinguishable) dice, Ω = {(i, j): 1 ≤ i,
j ≤ 6}; for the life length of a light bulb, Ω = (0, ∞); in the observation of water
level from time t₁ to time t₂, Ω is the set of all real functions (or perhaps all continuous
functions) defined on the interval [t₁, t₂]. An observable event A is a subset
of Ω, which we indicate by writing A ⊂ Ω. (In the dice example, A might be
{(i, j): i + j = an even number}, and in the light bulb example, A might be {ω:
ω ≥ c}.)
On the other hand, not every subset of Ω is in general an observable or interesting
event. Let 𝔄 denote the set of observable events for a single trial. Of course,
𝔄 must include the certain event Ω, the impossible event ∅ (the empty set) and,
for every event A, its complement Aᶜ. Furthermore, given two events A and B in
𝔄, the union A ∪ B and the intersection A ∩ B also belong to 𝔄; thus, 𝔄 is an
algebra of events. In many practical problems, one must be able to make countable
unions and intersections in 𝔄. To do this, it is sufficient to assume that

⋃_{n=1}^∞ A_n ∈ 𝔄

when A_n ∈ 𝔄 for n ≥ 1. An algebra 𝔄 of events with this property is called a sigma-algebra.
Henceforth, we shall deal with sigma-algebras exclusively. In the
terminology of measure theory, which is parallel to the terminology of probability
theory, the elements of 𝔄 are called measurable sets and the pair (Ω, 𝔄) is
called a measurable space. Two events A and B are said to be incompatible if
they are disjoint, that is, if A ∩ B = ∅. If A is a subset of B, indicated by writing
A ⊂ B (where A = B is allowed), we say that A implies B.

Let ℭ denote a family of subsets of Ω. Then, there exists in Ω a smallest sigma-algebra
𝔄(ℭ) that contains all sets belonging to ℭ. This 𝔄(ℭ) is called the sigma-algebra
generated by ℭ.
2 1. Fundamentals of Probability Theory
Let (Ω, 𝔄) and (Ω′, 𝔄′) denote measurable spaces. A mapping X: Ω → Ω′ that
assigns to every ω ∈ Ω a member ω′ = X(ω) of Ω′ is said to be (𝔄-𝔄′)-measurable
(and is called an Ω′-valued random variable on (Ω, 𝔄)) if the preimages of
measurable sets in Ω′ are measurable sets in Ω, that is, if, for A′ ∈ 𝔄′,

{ω: X(ω) ∈ A′} = [X(ω) ∈ A′] = X⁻¹(A′) ∈ 𝔄.

The set 𝔄(X) of preimages of measurable sets is itself a sigma-algebra in Ω and is
the smallest sigma-algebra with respect to which X is measurable. It is called the
sigma-algebra generated by X in Ω.
Suppose that Ω′ is the d-dimensional Euclidean space R^d with distance function

|x − y| = (Σ_{k=1}^d |x_k − y_k|²)^{1/2}.

In this special case, we shall always choose as the sigma-algebra 𝔄′ of events the
sigma-algebra 𝔅^d of Borel sets in R^d generated by the d-dimensional intervals
a_k < x_k ≤ b_k for k = 1, 2, …, d. Borel sets include in particular all closed and all
open sets and all kinds of d-dimensional intervals (half-open, unbounded, etc.).
Although there are "many" non-Borel sets, it is not easy to exhibit specific examples
of them. If Ω′ is a subset of R^d (examples for d = 1 are [0, ∞), [0, 1], {0,
1, 2, …}, etc.), then we always choose 𝔄′ = 𝔅^d(Ω′) = {A′ = B ∩ Ω′: B ∈ 𝔅^d}.
A (d × m)-matrix-valued function is measurable if and only if all of its entries are
measurable. The indicator function of a set A is

1_A(ω) = 1 for ω ∈ A,  0 for ω ∉ A.

b) μ(∅) = 0,
P(A) = Σ_{(i,j)∈A} p_{ij}
If the dice are independent (see section 1.5), the p_{ij} have the form

p_{ij} = p_i q_j,  Σ_{i=1}^6 p_i = Σ_{j=1}^6 q_j = 1.

Finally, if the dice are "fair", then p_i = q_j = 1/6, so that p_{ij} = 1/36.
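The product form of p_{ij} is easy to verify exhaustively. A small sketch (mine, using exact rational arithmetic) builds the fair-dice distribution and evaluates the event "i + j even" from the earlier example:

```python
from fractions import Fraction

# Fair dice: p_i = q_j = 1/6, so p_ij = p_i * q_j = 1/36 for every pair (i, j).
p = {i: Fraction(1, 6) for i in range(1, 7)}   # first die
q = {j: Fraction(1, 6) for j in range(1, 7)}   # second die
p_ij = {(i, j): p[i] * q[j] for i in p for j in q}

total = sum(p_ij.values())                      # a probability must sum to 1
P_even = sum(pr for (i, j), pr in p_ij.items() if (i + j) % 2 == 0)
print(total, P_even)                            # 1 and 1/2
```

Exact fractions avoid any floating-point question: every pair gets mass 1/36, and the even-sum event collects exactly 18 of the 36 pairs.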
Of all nonnormed measures, we are interested here only in Lebesgue measure λ,
defined on the set 𝔅^d of Borel sets in R^d. This measure assigns to every d-dimensional
interval its "length":

λ({x: a_k < x_k ≤ b_k}) = Π_{k=1}^d (b_k − a_k).

Thus, in the case of simple sets, it corresponds to their elementary geometric
content, and it can be carried over in an unambiguous way from intervals to all
Borel sets. Since λ(R^d) = ∞, Lebesgue measure is not finite. However, since

λ({x: −n ≤ x_k < n}) = (2n)^d < ∞,  n = 1, 2, …,

it is a sigma-finite measure. Every countable set (for example, the set of points
with rational coordinates) has Lebesgue measure 0.
Let (Ω, 𝔄, μ) denote a measure space and let E(ω) denote a proposition regarding
the elements ω of Ω. Then, we shall write E [μ] to mean that E is true for all
ω in Ω except possibly for those ω in some set N (belonging to 𝔄) such that
μ(N) = 0. If μ is a probability P, we shall say "almost certainly [P]" or, since
P(Ω ∖ N) = 1, "with probability 1".
Now, suppose that (Ω, 𝔄, P) is a probability space, that (Ω′, 𝔄′) is a measurable
space, and that X is a random variable with values in Ω′. The function X
maps the probability P onto the measurable space of the images by

P_X(A′) = P(X⁻¹(A′)) = P{ω: X(ω) ∈ A′} = P{X ∈ A′} for all A′ ∈ 𝔄′.

The function P_X is called the distribution of X. It contains the information
needed for probability-theoretic examination of X. For an R^d-valued random
variable, the distribution P_X is uniquely defined on 𝔅^d by its distribution function

F(x) = F(x₁, …, x_d) = P{ω: X₁(ω) ≤ x₁, …, X_d(ω) ≤ x_d} = P{X ≤ x},

which shows how likely it is that X will assume values to the "left" of the point
x ∈ R^d. The function F(x) is a convenient tool for describing the distribution
of X inasmuch as it is not a set function but an ordinary point function defined
on R^d. It is also called the joint distribution function of the d scalar random
variables X₁, …, X_d, which are the components of X. For d = 1, F(x) is an
Step 2. Now, let X ≥ 0 be any measurable function. There exists an increasing
sequence {X_n} of nonnegative measurable step functions such that

lim_{n→∞} X_n(ω) = X(ω) for all ω ∈ Ω

and

lim_{n→∞} ∫_Ω X_n dP = c ≤ ∞.

We then define

∫_Ω X dP = c.
Step 3. Now, let X denote any measurable function. We decompose it into positive
and negative parts, X = X⁺ − X⁻, and define

∫_Ω X dP = ∫_Ω X⁺ dP − ∫_Ω X⁻ dP,
E(X) = E X = ∫_Ω X dP.

For example, for every A ∈ 𝔄, we have E 1_A = P(A).

For R^d- and matrix-valued random variables, we define the expectation componentwise:

E X = ∫_Ω X dP = (E X₁, …, E X_d)′

and

E A = ∫_Ω A dP = (E A_{ij}),  A = (A_{ij})_{d×m}.

In terms of the distribution function,

E X = ∫_{R^d} x dF(x),

and, for an integrable absolutely continuous random variable with density f (see
the Radon–Nikodym theorem below),

E X = (∫ x_i f_i(x_i) dx_i)_{i=1,…,d},

where f_i is the density of X_i.
Here, these integrals are in general Lebesgue integrals, which we shall treat at
the end of this section.

Suppose that, for p ≥ 1,

ℒ^p = ℒ^p(Ω, 𝔄, P) = {X: X is an R^d-valued random variable, E|X|^p < ∞}.

We have ℒ^p ⊂ ℒ^q (where p ≥ q) and ℒ^p is a linear space. If in ℒ^p we shift to the
set L^p of equivalence classes of random variables that coincide with probability 1
and if we set

‖X‖_p = (E|X|^p)^{1/p},

then L^p is a Banach space with respect to this norm. In fact, L² is a Hilbert
space with scalar product

(X, Y) = E X′Y = Σ_{i=1}^d E X_i Y_i.

In L¹, we have |E X| ≤ E|X| and

E(α X + β Y) = α E X + β E Y (linearity).

For d = 1, we have

E X ≤ E Y for X ≤ Y (monotonicity).

In addition, we have Hölder's inequality (for p = q = 2, Schwarz's inequality)

|(X, Y)| ≤ ‖X‖_p ‖Y‖_q  (p > 1, 1/p + 1/q = 1, X ∈ L^p, Y ∈ L^q),

and Minkowski's inequality (the triangle inequality in L^p)

‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p  (p ≥ 1, X, Y ∈ L^p),
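Both inequalities can be sanity-checked on a finite probability space, where E Z is just a weighted sum over the outcomes. A small sketch of mine, with equal weights and arbitrary scalar random variables:

```python
import numpy as np

# Five equally likely outcomes; X and Y are arbitrary real random variables.
w = np.full(5, 0.2)
X = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
Y = np.array([0.3, 1.5, -2.5, 0.7, 2.0])

def norm(Z, p):
    # ||Z||_p = (E |Z|^p)^(1/p) on this finite space
    return (w @ np.abs(Z)**p) ** (1 / p)

p, q = 3.0, 1.5                        # conjugate exponents: 1/3 + 1/1.5 = 1
inner = w @ (X * Y)                    # (X, Y) = E XY for d = 1
holder_ok = abs(inner) <= norm(X, p) * norm(Y, q)
minkowski_ok = norm(X + Y, p) <= norm(X, p) + norm(Y, p)
print(holder_ok, minkowski_ok)
```

The inequalities hold for any choice of the vectors and weights; rerunning with other values is a useful way to get a feeling for when they are nearly tight (proportional random variables).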
is called the covariance matrix of the Rd -valued random variables X and Y. For
Cov (X, X), we write simply Cov(X). The characteristic function of a random
variable X or of its distribution function F is
This last relation can also serve as a definition of the normal distribution, provided C
is nonnegative-definite (and the distribution can therefore be concentrated on a
certain linear subspace of R^d whose dimension is equal to the rank of C). For
d = 1, 𝔑(m, σ²) has central moments

E(X − m)^n = 0,  n ≥ 1, odd,
E(X − m)^n = 1·3·5⋯(n−1) σ^n,  n ≥ 2, even.
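The even-moment formula can be verified by direct numerical integration against the 𝔑(m, σ²) density; the following sketch (mine, with arbitrary m and σ) uses a plain trapezoidal rule:

```python
import numpy as np

m, sigma = 1.0, 1.5
x = np.linspace(m - 12 * sigma, m + 12 * sigma, 200001)
pdf = np.exp(-(x - m)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def integrate(f):
    # trapezoidal rule on the fixed grid above
    return float(np.sum((f[:-1] + f[1:]) * np.diff(x)) / 2)

def odd_factorial(n):
    # 1*3*5*...*(n-1) for even n
    out = 1
    for k in range(1, n, 2):
        out *= k
    return out

for n in (1, 2, 3, 4, 6):
    exact = 0.0 if n % 2 else odd_factorial(n) * sigma**n
    approx = integrate((x - m)**n * pdf)
    assert abs(approx - exact) < 1e-3    # odd moments vanish, even match the formula
```

For n = 4 this recovers the familiar E(X − m)⁴ = 3σ⁴; the tolerance only absorbs quadrature error on the chosen grid.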
(see section 1.4), and |X_n| ≤ Y ∈ L^p. Then, X ∈ L^p, ‖X_n − X‖_p → 0, and

lim_{n→∞} E X_n = E X.

This theorem holds, in particular, when almost certain convergence of X_n to X
rather than stochastic convergence obtains.
An integration theory can be developed for an arbitrary measure space (Ω, 𝔄, μ)
in exactly the same way as in the case μ(Ω) = 1. However, the relation ℒ^p ⊂ ℒ^q
for p > q is false in the case μ(Ω) = ∞. The only infinite measure space that we
consider here is the space (R^d, 𝔅^d, λ). The resulting integral is called the
Lebesgue integral. We write

∫_{R^d} f dλ = ∫_{R^d} f(x) dx

and

∫_B f(x) dx = ∫_{R^d} f 1_B dλ.

In the case d = 1 and B = [a, b], (−∞, b), R¹, etc., the notations

∫_a^b f(x) dx,  ∫_{−∞}^b f(x) dx,  ∫_{−∞}^∞ f(x) dx

are the usual ones. The Lebesgue integral is more general than the familiar Riemann
integral defined in terms of upper and lower sums, inasmuch as it is defined
for more functions. Every bounded Riemann-integrable (for example, every
continuous) function defined on a bounded interval is also Lebesgue-integrable
and the two integrals coincide. The same holds for nonnegative integrands and
for improper Riemann integrals. All the integrals over R^d or subsets of it that
we have been considering are basically Lebesgue integrals though, for the most
part, they might, by virtue of the smoothness of the integrands, be regarded as
Riemann integrals.
Let ν and μ denote any two measures on (Ω, 𝔄). The measure ν is said to be μ-continuous
if ν(N) = 0 whenever μ(N) = 0. Also, ν is said to have μ-density
f ≥ 0 whenever

ν(A) = ∫_A f dμ for all A ∈ 𝔄.

We then write

f = dν/dμ.

Radon–Nikodym theorem: Let ν and μ denote two measures defined on (Ω, 𝔄),
and suppose that μ is sigma-finite. Then, ν is μ-continuous if and only if ν has a
μ-density. This density is uniquely defined [μ]. We then have

∫_Ω X dν = ∫_Ω X (dν/dμ) dμ

as long as one side of this equation is meaningful.

If (Ω, 𝔄) = (R^d, 𝔅^d) and μ = λ, we also speak of Lebesgue-continuity. If ν = P
is a Lebesgue-continuous probability on (R^d, 𝔅^d) with distribution function F,
then there exists a density f ≥ 0, uniquely defined [λ], such that

f(x₁, …, x_d) = ∂^d F/(∂x₁ ⋯ ∂x_d)(x₁, …, x_d).
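For d = 1 the relation reduces to f = F′. A quick numerical sketch (mine, using the exponential distribution as an arbitrary concrete case) differentiates F by central differences and compares with the known density:

```python
import numpy as np

lam = 2.0                                    # arbitrary rate parameter
F = lambda x: 1.0 - np.exp(-lam * x)         # distribution function on [0, inf)
f = lambda x: lam * np.exp(-lam * x)         # its Lebesgue density

x = np.linspace(0.05, 5.0, 500)
h = 1e-6
f_numeric = (F(x + h) - F(x - h)) / (2 * h)  # dF/dx by central differences
max_err = float(np.max(np.abs(f_numeric - f(x))))
print(max_err)                               # should be very small
```

Any other absolutely continuous distribution would do equally well; the central-difference step size is a numerical convenience, not a statement about the theory.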
sequence of the X_n(ω) ∈ R^d converges in the usual sense to X(ω) ∈ R^d, then X_n
is said to converge almost certainly [P] or with probability 1 to X. We write

ac-lim_{n→∞} X_n = X.

If

lim_{n→∞} ∫_{R^d} g(x) dF_n(x) = ∫_{R^d} g(x) dF(x)

for every real-valued continuous bounded function g defined on R^d, the sequence
{X_n} is said to converge in distribution to X. This is the case if and only
if

lim_{n→∞} F_n(x) = F(x)

at every point at which F is continuous, or

lim_{n→∞} φ_n(t) = φ(t) for all t ∈ R^d,

where φ_n and φ are the characteristic functions.
These convergence concepts are related as follows: convergence in q-th mean
implies convergence in p-th mean for p < q; convergence in p-th mean implies
stochastic convergence; almost certain convergence also implies stochastic
convergence; and stochastic convergence implies convergence in distribution.

Almost certain convergence follows, for example, from

Σ_{n=1}^∞ E|X_n − X|^p < ∞ for some p > 0.
Let {X_n} denote a sequence of R^d-valued random variables with distributions
𝔑(m_n, C_n). This sequence converges in distribution if and only if

m_n → m,  C_n → C.

The limit distribution is 𝔑(m, C). This follows from consideration of the characteristic
functions.
Fubini's theorem, written for the case n = 2, is as follows: Let X denote a nonnegative
or (P₁ × P₂)-integrable, (𝔄₁ × 𝔄₂ − 𝔅¹)-measurable scalar function defined
on Ω₁ × Ω₂. Then,

∫_{Ω₁×Ω₂} X d(P₁ × P₂) = ∫_{Ω₁} (∫_{Ω₂} X(ω₁, ω₂) dP₂(ω₂)) dP₁(ω₁)
                      = ∫_{Ω₂} (∫_{Ω₁} X(ω₁, ω₂) dP₁(ω₁)) dP₂(ω₂).
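On finite product spaces the three integrals are finite sums, so the theorem can be checked exactly. A small sketch of mine, with arbitrary weights and an arbitrary integrand:

```python
from fractions import Fraction
from itertools import product

# Two finite probability spaces and a scalar function X(w1, w2).
P1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}
P2 = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
X = lambda w1, w2: w1 * w2 + w1 - 2 * w2   # arbitrary integrand

# Integral over the product space, and both iterated integrals.
total = sum(X(a, b) * P1[a] * P2[b] for a, b in product(P1, P2))
iter12 = sum(P1[a] * sum(X(a, b) * P2[b] for b in P2) for a in P1)
iter21 = sum(P2[b] * sum(X(a, b) * P1[a] for a in P1) for b in P2)
print(total, iter12, iter21)               # all three are -1/2
```

Exact rationals make the equality of the three sums literal rather than approximate.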
Ω = ⨉_{i∈I} Ω_i,  𝔄 = ⨉_{i∈I} 𝔄_i.
Again, 𝔄 is also the smallest sigma-algebra with respect to which every projection
p_i defined by p_i(ω) = ω_i is measurable. Then, there exists exactly one
product probability P defined on (Ω, 𝔄), that is, one probability P for which

P(⨉_{i∈I} A_i) = Π_{i∈I} P_i(A_i)

holds for every possible choice of events A_i ∈ 𝔄_i. Finally, random variables
X₁, …, X_n (whose ranges may differ for different values of the subscript) are
said to be independent if the sigma-algebras 𝔄(X₁), …, 𝔄(X_n) generated by
them are independent.

Any family of events (sigma-algebras, random variables) is said to be independent
if the events belonging to every finite subfamily of that family are independent
in the sense of the definition given above.
In a product probability space, the projections are independent random variables.
For a given sequence F₁, F₂, … of d-dimensional distribution functions,
we can therefore construct an explicit sequence X₁, X₂, … of independent
random variables such that the distribution function of X_i is F_i. We do so by
choosing Ω = (R^d)^∞ = the set of all sequences ω = (ω₁, ω₂, …) with elements
ω_i ∈ R^d, 𝔄 = (𝔅^d)^∞, P = P_{F₁} × P_{F₂} × ⋯, and X_i(ω) = p_i(ω) = ω_i.
Let X₁ denote an R^d-valued and X₂ an R^m-valued random variable. Let F(x₁, x₂)
denote their joint distribution function and let F₁(x₁) = F(x₁, ∞) and F₂(x₂) =
F(∞, x₂) denote the marginal distribution functions of X₁ and X₂, respectively.
Then, X₁ and X₂ are independent if and only if, for every x₁ ∈ R^d and x₂ ∈ R^m,

F(x₁, x₂) = F₁(x₁) F₂(x₂).

The sum

S_n = Σ_{i=1}^n X_i

has distribution
lim sup_{n→∞} (S_n − na)/√(2n log log n) = σ

and

lim inf_{n→∞} (S_n − na)/√(2n log log n) = −σ

with probability 1. (Here, "log" denotes the natural logarithm.)
All versions of the central limit theorem assert that the sum of a large number of
independent random variables has, under quite general conditions, an approximately
normal distribution. In our special case of identically distributed summands,
it is as follows: Suppose that 0 < V(X_n) = σ² < ∞ and that E(X_n) = a.
Then, (S_n − an)/(σ√n) tends in distribution to 𝔑(0, 1); that is, for all x ∈ R¹,

lim_{n→∞} P[(S_n − an)/(σ√n) ≤ x] = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy.
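The statement is easy to see in simulation. The following sketch (mine; uniform summands and all sample sizes are arbitrary choices) normalizes sums of i.i.d. uniform(0, 1) variables and compares an empirical probability with the standard normal value Φ(1) ≈ 0.8413:

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_trials = 400, 20000
a, sigma = 0.5, np.sqrt(1 / 12)          # mean and std of uniform(0, 1)

S = rng.random((n_trials, n)).sum(axis=1)
Z = (S - n * a) / (sigma * np.sqrt(n))   # normalized sums, approx N(0, 1)

frac = np.mean(Z <= 1.0)                 # empirical P[Z <= 1]
print(frac)                              # should be near Phi(1) ~ 0.8413
```

The agreement at n = 400 is already close; only the rate of approach, not the limit, depends on the distribution of the summands.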
P(A | B) = P(A ∩ B)/P(B).

However, we frequently encounter a whole family of conditions, each of which
has probability 0. Then we need the following more general concept of conditional
expectation.
Let X ∈ L¹(Ω, 𝔄, P) denote an R^d-valued random variable and let ℭ ⊂ 𝔄 denote
a sub-sigma-algebra of 𝔄. The probability space (Ω, ℭ, P) is a coarsening
of the original one, and X is, in general, no longer ℭ-measurable. We seek now a
ℭ-measurable coarsening Y of X that assumes, on the average, the same values
as X, that is, an integrable random variable Y such that

Y is ℭ-measurable,
∫_C Y dP = ∫_C X dP for all C ∈ ℭ.

According to the Radon–Nikodym theorem, there exists such a Y, and it is almost
certainly unique. It is called the conditional expectation of X under the
condition ℭ. We write

Y = E(X | ℭ).
As a special case, we consider a sub-sigma-algebra ℭ whose elements are arbitrary
unions of countably many "atoms" {A_n} such that

⋃_{n=1}^∞ A_n = Ω,  A_i ∩ A_j = ∅ for i ≠ j.

The quantity E(X | ℭ) is constant on the sets A_n. For P(A_n) > 0,

E(X | ℭ)(ω) = E(X | A_n) = (1/P(A_n)) ∫_{A_n} X dP for all ω ∈ A_n.
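The atomic case can be computed directly. A sketch of mine with one fair die, ℭ generated by the atoms "even" and "odd", and X(ω) = ω²:

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}     # one fair die
X = lambda w: w * w
atoms = [{2, 4, 6}, {1, 3, 5}]                   # atoms of C: even and odd outcomes

def cond_exp(w):
    A = next(a for a in atoms if w in a)         # the atom containing w
    # E(X | C)(w) = (1 / P(A)) * sum over A of X dP
    return sum(X(v) * P[v] for v in A) / sum(P[v] for v in A)

print(cond_exp(2), cond_exp(3))                  # 56/3 on evens, 35/3 on odds
law_total = sum(cond_exp(w) * P[w] for w in P)
assert law_total == sum(X(w) * P[w] for w in P)  # E(E(X|C)) = E(X)
```

The final assertion is the defining averaging property, specialized to C = Ω.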
Conditional expectation has the following properties:

a) ℭ = {∅, Ω} ⇒ E(X|ℭ) = E(X),
b) X ≥ 0 ⇒ E(X|ℭ) ≥ 0,
c) X ℭ-measurable ⇒ E(X|ℭ) = X,
d) X = const = a ⇒ E(X|ℭ) = a,
e) X, Y ∈ L¹ ⇒ E(aX + bY|ℭ) = a E(X|ℭ) + b E(Y|ℭ),
f) X ≤ Y ⇒ E(X|ℭ) ≤ E(Y|ℭ),
g) X ℭ-measurable, X, XY ∈ L¹ ⇒ E(XY|ℭ) = X E(Y|ℭ), in particular
   E(E(X|ℭ)Y|ℭ) = E(X|ℭ) E(Y|ℭ),
h) X, ℭ independent ⇒ E(X|ℭ) = E(X).

For later use, we point out in particular that, for ℭ₁ ⊂ ℭ₂ ⊂ 𝔄,

(1.7.1) E(E(X|ℭ₂)|ℭ₁) = E(E(X|ℭ₁)|ℭ₂) = E(X|ℭ₁).
The conditional probability P(A | ℭ) of an event A under the condition ℭ ⊂ 𝔄 is
defined by

P(A | ℭ) = E(1_A | ℭ).

Being a conditional expectation, the conditional probability is a ℭ-measurable
function on Ω. In particular, for a ℭ generated by at most countably many
atoms {A_n},

E(g(X) | ℭ) = ∫_{R^d} g(x) p(ω, dx).
h(y) = E(X | Y = y)

In the special case of the conditional probability P(X ∈ B | Y), we go on to the
conditional distribution p(ω, B), for which there is now an almost certainly
unique q such that

p(ω, B) = q(Y(ω), B).

We also write

q(y, B) = P(X ∈ B | Y = y),

which is measurable with respect to y and, for fixed y, is a probability with
respect to B. For g(X) ∈ L¹,

P[X ∈ B] = ∫ P(X ∈ B | Y = y) dP_Y(y),

where P_Y is the distribution of Y.

If q(y, ·) = P(X ∈ · | Y = y) has a density h(x, y) in R^d, this density is called
the conditional density of X under the condition Y = y.
f₂(y) = ∫_{R^d} f(x, y) dx,

h(x, y) = f(x, y)/f₂(y).

For all integrable functions g(x), we have, in accordance with formula (1.7.2),

E(g(X) | Y = y) = ∫_{R^d} g(x) f(x, y) dx / f₂(y).

If X and Y are independent, then f(x, y) = f₁(x) f₂(y), so that h(x, y) = f₁(x).

Suppose that the joint distribution of X and Y is a normal distribution 𝔑(m, C).
Let us write E X = m_x, E Y = m_y, and
The example cited in section 1.1 of the measurement of water level during an interval
of time [t₀, T] and the description of the position of a particle subject to
Brownian motion as a function of time make it necessary to consider simultaneously
a family of random variables that depend on a continuous parameter
(time).

More generally, let I denote an arbitrary nonempty index set and let (Ω, 𝔄, P)
denote a probability space. Then, a family {X_t; t ∈ I} of R^d-valued random variables
is called a stochastic process (random process, random function) with parameter
set (index set) I and state space R^d.
If I is finite, we are dealing simply with finitely many random variables. In the
case I = {…, −1, 0, 1, …} or {1, 2, …}, we speak of a random sequence or a
time series. It is preferable to reserve the term "process" for I uncountably infinite.

In what follows, I is always an interval [t₀, T], where t₀ < T, of the real axis R¹.
We interpret the parameter t as time. We wish to admit the cases t₀ = −∞ and
T = ∞. Then, [t₀, T] is interpreted as (−∞, T], [t₀, ∞), or (−∞, ∞).
If {X_t; t ∈ [t₀, T]} is a stochastic process, then X_t(·) is, for every fixed t ∈ [t₀, T],
an R^d-valued random variable, whereas, for every fixed ω ∈ Ω (hence for every
observation), X_·(ω) is an R^d-valued function defined on [t₀, T], hence an element
of the product space (R^d)^{[t₀, T]}. It is called a sample function (realization,
trajectory, path) of the stochastic process.
The finite-dimensional distributions of a stochastic process {X_t; t ∈ [t₀, T]} are
given by

F_{t₁,…,t_n}(x₁, …, x_n) = P{X_{t₁} ≤ x₁, …, X_{t_n} ≤ x_n},

where t and t_i belong to [t₀, T], x and x_i belong to R^d (the symbol ≤ applies to
the components), and n ≥ 1.

Obviously, this system of distribution functions satisfies the following two conditions:

a) Condition of symmetry: if {i₁, …, i_n} is a permutation of the numbers 1, …,
n, then, for arbitrary instants and n ≥ 1,

F_{t_{i₁},…,t_{i_n}}(x_{i₁}, …, x_{i_n}) = F_{t₁,…,t_n}(x₁, …, x_n).

b) Condition of compatibility: for m < n,

F_{t₁,…,t_m,t_{m+1},…,t_n}(x₁, …, x_m, ∞, …, ∞) = F_{t₁,…,t_m}(x₁, …, x_m).
In many practical cases, we are given not a family of random variables defined on
a probability space but a family of distributions P_{t₁,…,t_n}(B₁, …, B_n) or their
distribution functions F_{t₁,…,t_n}(x₁, …, x_n) which satisfy the symmetry and compatibility
conditions. That these two concepts are equivalent is seen from the following
theorem:

(1.8.1) Kolmogorov's fundamental theorem. For every family of distribution
functions that satisfy the symmetry and compatibility conditions, there exists a
probability space (Ω, 𝔄, P) and a stochastic process {X_t; t ∈ [t₀, T]} defined on
it that possesses the given distributions as finite-dimensional distributions.
In particular, if we start off with given distributions, we shall always assume the
following choice to have been made:

Ω = (R^d)^{[t₀, T]} = set of all R^d-valued functions ω = ω(·) defined on [t₀, T],

𝔄 = (𝔅^d)^{[t₀, T]} = product sigma-algebra generated by the cylinder sets,

X_t(ω) = ω(t) = projection of ω onto the "t-axis", that is, the value of the
function ω at the point t.

Now, the probability P on (Ω, 𝔄) is not (as in section 1.5 for independent X_t)
simply the product probability but is determined on the cylinder sets by

P[X_{t₁} ∈ B₁, …, X_{t_n} ∈ B_n] = P_{t₁,…,t_n}(B₁, …, B_n)

and can be continued in a unique manner to all of 𝔄. Henceforth, this canonical
choice, for which the elementary events coincide with the sample functions, will
be the one made.
Also, for {X_t; t ∈ [t₀, T]} we shall write briefly X_t or X(t), usually omitting the
variable ω.
Two stochastic processes X_t and X̃_t defined on the same probability space are
said to be (stochastically) equivalent if, for every t ∈ [t₀, T], we have X_t = X̃_t
with probability 1. Then, X̃_t is called a version of X_t, and vice versa. The finite-dimensional
distributions of X_t and X̃_t coincide. However, since the set N_t of exceptional
values of ω for which X_t ≠ X̃_t depends in general on t, the sample
functions of equivalent processes can have quite different analytical properties. For
example, for Ω = [t₀, T] = [0, 1] and P = λ, the processes X_t(ω) ≡ 0 and

X̃_t(ω) = 0 for ω ≠ t,  1 for ω = t,
are equivalent. Sets of the form

{ω: X_t(ω) ∈ A for all t ∈ (a, b)} (in general not measurable)

differ, if at all, only on a subset of N. If we arrange for all subsets of the sets of
measure zero to belong to 𝔄 (which is always possible), the second set will also
belong to 𝔄 and will possess the same probability as the first.
How can we tell from the finite-dimensional distributions of a process whether
this process has continuous sample functions or not? The following criterion of Kolmogorov
asserts that only the two-dimensional distributions are needed for
this: Let a, b, and c denote positive numbers such that, for t and s in [t₀, T],

(1.8.2) E|X_t − X_s|^a ≤ c|t − s|^{1+b}.

Then there exists a version of X_t all of whose sample functions are continuous.
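For the Wiener process, W_t − W_s is 𝔑(0, t − s), so E|W_t − W_s|⁴ = 3|t − s|², and (1.8.2) holds with a = 4, b = 1, c = 3. A quick Monte Carlo check of the fourth-moment identity (my sketch; the time gap and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
h = 0.3                                      # the time gap |t - s|
incr = rng.normal(0.0, np.sqrt(h), 500000)   # samples of W_t - W_s ~ N(0, h)
fourth = np.mean(incr**4)                    # estimate of E|W_t - W_s|^4
print(fourth, 3 * h**2)                      # the two should be close
```

This is exactly the even-moment formula of section 1.3 applied with σ² = t − s; the criterion then yields the familiar continuity of Brownian paths.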
1.9 Martingales
Let (Ω, 𝔄, P) denote a probability space, let {X_t; t ∈ [t₀, T]} denote an R^d-valued
stochastic process defined on (Ω, 𝔄, P), and let {𝔄_t}_{t∈[t₀, T]} denote an increasing
family of sub-sigma-algebras of 𝔄, that is, one having the property

𝔄_s ⊂ 𝔄_t for t₀ ≤ s ≤ t ≤ T.

If X_t is 𝔄_t-measurable and integrable for all t, then the pair {X_t, 𝔄_t}_{t∈[t₀, T]} is
called a martingale if

E(X_t | 𝔄_s) = X_s almost certainly

for all s and t in [t₀, T], where s ≤ t. If X_t is a real-valued process and if we replace
the equality sign in the last formula with ≤ or ≥, what we have is a supermartingale
or a submartingale, respectively. In particular, if

𝔄_t = 𝔄([t₀, t]) = 𝔄(X_s; t₀ ≤ s ≤ t),

that is, the history of the process X_t prior to the instant t, is chosen as a condition,
then X_t is called a martingale (or a supermartingale or submartingale).
Martingales are an abstract presentation of the concept of "fair game" and they
constitute one of the most important tools in the theory of stochastic processes.
Sample functions of a (separable) martingale have no discontinuities of the second
kind; that is, they have, at worst, jumps.
Let X_t and Y_t denote two martingales with respect to the same monotonic
family 𝔄_t. Then, A X_t + B Y_t (where A and B are fixed p × d matrices) is a
martingale and, in particular, X_t − X_{t₀} is a martingale. Furthermore, for every
martingale X_t, the process |X_t|^p (where p ≥ 1) is a submartingale whenever
X_t ∈ L^p. For a real-valued martingale X_t, X_t⁺ = max(X_t, 0) and X_t⁻ = max
of the real axis R¹. For our purposes, it will be sufficient in all cases to assume

[t₀, T] ⊂ [0, ∞) = R₊.

Thus, 0 ≤ t₀ < ∞. Here, we admit T = ∞, in which case [t₀, T] should be interpreted
as [t₀, ∞). For {X_t; t ∈ [t₀, T]} we shall write simply X_t. We shall refer to
the index t as the "time". We shall always assume that the state space R^d is
endowed with the sigma-algebra 𝔅^d of Borel sets.
The process X_t is defined on a certain probability space (Ω, 𝔄, P). A sample
function X_·(ω) is therefore an R^d-valued function defined on the interval
[t₀, T]. We emphasize again that we always assume that the choice

Ω = (R^d)^{[t₀, T]}

is made, where (R^d)^{[t₀, T]} is the space of all R^d-valued functions defined on the
interval [t₀, T],

𝔄 = (𝔅^d)^{[t₀, T]}

is the product sigma-algebra generated by the Borel sets in R^d, and X_t(ω) = ω(t)
for all ω ∈ Ω. Then, P is the probability uniquely defined (according to Kolmogorov's
fundamental theorem (1.8.1)) by the finite-dimensional distributions of
the process X_t on (Ω, 𝔄). If we have further information regarding the analytical
properties of the sample functions, we can choose for Ω certain subspaces of
(R^d)^{[t₀, T]} (for example, the space of all continuous functions).
Suppose that, for t₀ ≤ t₁ ≤ t₂ ≤ T,

𝔄([t₁, t₂]) = 𝔄(X_t, t₁ ≤ t ≤ t₂)

is the smallest sub-sigma-algebra of 𝔄 with respect to which all the random variables
X_t, for t₁ ≤ t ≤ t₂, are measurable. In terms of our picture of time, 𝔄([t₁, t₂])
contains the "history" of the process X_t from time t₁ to time t₂, that is, those
events that are determined by the conditions imposed on the course of the process
X_t during the interval [t₁, t₂] and at no other time. 𝔄([t₁, t₂]) is generated
by the cylinder events
{ω: X_{s₁}(ω) ∈ B₁, …, X_{s_n}(ω) ∈ B_n},  s_i ∈ [t₁, t₂],  B_i ∈ 𝔅^d.

Fig. 1: Cylinder event.
holds with probability 1. We summarize the various equivalent clarifying formulations
of the Markov property in

(2.1.3) Theorem. Each of the following conditions is equivalent to the Markov
property:

Let X_t, for t₀ ≤ t ≤ T, denote a Markov process. In accordance with what was
said in section 1.7, there exists a conditional distribution q(X_s, B) = P(s, X_s, t, B)
corresponding to the conditional probability P(X_t ∈ B | X_s). The function
P(s, x, t, B) is a function of the four arguments s, t ∈ [t₀, T] (with s ≤ t),
Fig. 2: The Chapman–Kolmogorov equation.

P(X_t ∈ B | 𝔄([t₀, s]))
= E(P(X_t ∈ B | 𝔄([t₀, u])) | 𝔄([t₀, s]))
= E(P(X_t ∈ B | X_u) | 𝔄([t₀, s]))
= E(P(u, X_u, t, B) | X_s)
= ∫_{R^d} P(u, y, t, B) P(s, X_s, u, dy).
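For a finite state space the Chapman–Kolmogorov equation reduces to matrix multiplication: P(s, x, t, {y}) is the (x, y) entry of a product of one-step matrices. A sketch of mine with an arbitrary three-state homogeneous chain:

```python
import numpy as np

# Arbitrary one-step transition matrix of a 3-state Markov chain (rows sum to 1).
P_step = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.5, 0.3],
                   [0.0, 0.4, 0.6]])

def P(s, t):
    # Transition matrix from integer time s to time t (homogeneous chain).
    return np.linalg.matrix_power(P_step, t - s)

lhs = P(0, 5)
rhs = P(0, 2) @ P(2, 5)
assert np.allclose(lhs, rhs)             # Chapman-Kolmogorov: P(s,t) = P(s,u) P(u,t)
assert np.allclose(P(3, 3), np.eye(3))   # P(s, x, s, B) = 1_B(x)
```

The second assertion anticipates property (2.2.1e) below: at zero elapsed time the transition probability is the unit mass at the starting state.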
P(s, x, s, B) = 1_B(x) = 1 for x ∈ B,  0 for x ∉ B.

This last statement follows from the fact that

P(X_s ∈ B | X_s) = 1_{[X_s ∈ B]}

for [P]-almost all values X_s = x.
(2.2.3) Definition. A function P(s, x, t, B) with the properties (2.2.1b–e)
(where (2.2.2) is satisfied for all x ∈ R^d) is called a transition probability (transition
function). If X_t is a Markov process and P(s, x, t, B) is a transition probability
such that (2.2.1a) is satisfied, then P(s, x, t, B) is called a transition probability
of the Markov process X_t. Then, for fixed s, t ∈ [t₀, T] such that s ≤ t, it
is uniquely defined as a function of x and B, with the possible exception of a set
N of values of x (independent of B) such that P[X_s ∈ N] = 0.

We shall also use the notation

P(s, x, t, B) = P(X_t ∈ B | X_s = x),

which is the probability that the observed process will be in the set B at time t if
at time s, where s ≤ t, it was in the state x. Here, the number P(X_t ∈ B | X_s = x)
is completely defined by the equation above, even though the condition [X_s = x]
may have probability 0 (as it does for most of the processes examined in the present
book).
Fig. 3: The transition probability P(s, x, t, B).
(2.2.4) Remarks. (a) If the probability P(s, x, t, ·) has a density, that is, if, for
all s, t ∈ [t₀, T], where s < t (for s = t, existence of a density is impossible by virtue
of (2.2.1e)), all x ∈ R^d, and all B ∈ 𝔅^d, we have a representation of
P(s, x, t, B) as the integral of a density over B.

If P_{t₀}(A) = P[X_{t₀} ∈ A] denotes the initial distribution, then, for the finite-dimensional
distributions

P(X_{t₁} ∈ B₁, …, X_{t_n} ∈ B_n),  t₀ ≤ t₁ < ⋯ < t_n ≤ T,  B_i ∈ 𝔅^d,

we have

(2.2.6)  P[X_{t₁} ∈ B₁, …, X_{t_n} ∈ B_n]
= ∫_{R^d} ∫_{B₁} ⋯ ∫_{B_{n−1}} P(t_{n−1}, x_{n−1}, t_n, B_n) P(t_{n−2}, x_{n−2}, t_{n−1}, dx_{n−1}) ⋯ P(t₀, x₀, t₁, dx₁) P_{t₀}(dx₀),

and hence, in particular,
space (Ω, 𝔄, P) and a Markov process X_t (where t ∈ [t₀, T]) defined on it which
has transition probability P(s, x, t, B) and for which X_{t₀} has the distribution P_{t₀}.

To prove this, we use equation (2.2.6) to construct from P(s, x, t, B) and P_{t₀}
consistent finite-dimensional distributions and from them, in accordance with
Kolmogorov's fundamental theorem (1.8.1), the desired process. Here, we can always
use for Ω, X_t, and 𝔄 the special choice discussed in section 2.1.
(2.2.8) Definition. A Markov process X_t for t ∈ [t₀, T] is said to be homogeneous
(with respect to time) if its transition probability P(s, x, t, B) is stationary,
that is, if the condition

P(s + u, x, t + u, B) = P(s, x, t, B)

is identically satisfied for t₀ ≤ s ≤ t ≤ T and t₀ ≤ s + u ≤ t + u ≤ T. In this case,
the transition probability is thus a function only of x, t − s, and B. Hence, we
can write it in the form

P(t − s, x, B) = P(s, x, t, B),  0 ≤ t − s ≤ T − t₀.

Therefore, P(t, x, B) is the probability of transition from x to B in time t, regardless
of the actual position of the interval of length t on the time axis. For
homogeneous processes, the Chapman–Kolmogorov equation becomes

P(s + t, x, B) = ∫_{R^d} P(t, y, B) P(s, x, dy).
Furthermore, if there exists an invariant P°, we have, for arbitrary initial distri-
butions and T = oo,
lim_{t→∞} P[X_t ∈ B] = P°(B)

for all B ∈ 𝔅^d whose boundary has P°-measure 0; that is, the invariant distribu-
tion is a stationary limit distribution and is in fact independent of the initial dis-
tribution. There are probabilistic and analytical conditions under which a homo-
geneous transition function P (t, x, B) admits an invariant distribution (see
Prohorov and Rozanov [15], p. 272, or Khas'minskiy [65], p. 99). Compare
Theorem (8.2.12) and remark (9.2.14).
2.3 Examples
By Theorem (2.2.7) an initial and a transition probability fix a Markov process.
In the following examples, we shall assume these probabilities given.
(2.3.1) Example: Deterministic motion. Suppose that to every pair (s, t),
where t_0 ≤ s ≤ t ≤ T, is assigned a measurable mapping G_{s,t} of R^d into itself
such that, for all x ∈ R^d,

G_{s,s}(x) = x

and

G_{t,u}(G_{s,t}(x)) = G_{s,u}(x), s ≤ t ≤ u.

Such a family of mappings arises, for example, as the general solution x_t = G_{s,t}(x) of an ordinary differential equation ẋ_t = f(t, x_t)
with initial condition x_s = x (and f is such that there exists a unique solution on
the interval [s, T]). The corresponding transition probability is

P(s, x, t, B) = δ_{G_{s,t}(x)}(B) = I_B(G_{s,t}(x)).

Property (2.2.1b) is obvious, (2.2.1c) follows from the measurability of the map-
ping G_{s,t}, (2.2.1e) follows from the property G_{s,s}(x) = x, and the Chapman-Kolmogorov equation follows from

∫_{R^d} δ_{G_{t,u}(y)}(B) δ_{G_{s,t}(x)}(dy) = δ_{G_{t,u}(G_{s,t}(x))}(B).
A nontrivial stochastic effect can be achieved only by the choice of the initial
probability.
(2.3.3) Example: Whether a process is Markovian depends essentially on
the choice of the state space. Whereas the solution x_t of the first-order differen-
tial equation

ẋ_t = f(t, x_t), x_{t_0} = c, t_0 ≤ t ≤ T,

is a Markov process (and this also holds for a differential equation of order n of the form

y_t^{(n)} = f(t, y_t, ẏ_t, ..., y_t^{(n−1)})

when it is rewritten as a first-order system for the R^n-valued state (y_t, ẏ_t, ..., y_t^{(n−1)})),
∫_{R^d} n(s, x, z) n(t, z, y) dz = n(s+t, x, y).

As a rule, the initial probability P_0 is taken equal to δ_0; that is, W_0 = 0. Since

n(t, x+z, y+z) = n(t, x, y) for all z ∈ R^d,

we are dealing with a spacewise as well as a timewise homogeneous process. We
shall examine it in greater detail in section 3.1. With the criterion (1.8.2), we
shall be able later to show easily that W_t can be chosen in such a way that it
possesses continuous sample functions with probability 1. Henceforth, we shall
assume that W_t was so chosen. The process W_t is a mathematical model of the
Brownian motion of a free particle in the absence of friction.
(2.4.1) T_t g(x) = E_x g(X_t) = ∫_{R^d} g(y) P(t, x, dy).

The operators T_t map the space B(R^d) of bounded measurable functions
into itself; they are linear, positive, and continuous, and they have the norm
‖T_t‖ = 1. The operator T_0 is the identity operator, and

T_{s+t} = T_s T_t = T_t T_s, s, t, s+t ∈ [0, T−t_0].

In particular, in the case T = ∞ the T_t constitute a commutative one-parameter
semigroup, the so-called semigroup of Markov transition operators.
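As a numerical illustration, the semigroup relation T_{s+t} = T_s T_t can be checked for the one-dimensional Wiener process, using the test function g(x) = cos x (a convenient choice made here, since then T_t g(x) = e^{−t/2} cos x in closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

def T(t, g, x, n=200_000):
    # (T_t g)(x) = E g(x + W_t) for the 1-d Wiener process, W_t ~ N(0, t),
    # estimated by plain Monte Carlo
    z = rng.standard_normal(n)
    return g(x + np.sqrt(t) * z).mean()

g = np.cos  # for this g, T_t g(x) = exp(-t/2) cos(x) in closed form

def Tg(t):
    # closed form of T_t g, used on the right side to avoid nested Monte Carlo
    return lambda x: np.exp(-t / 2) * np.cos(x)

x, s, t = 0.7, 0.3, 0.5
lhs = T(s + t, g, x)   # T_{s+t} g (x)
rhs = T(s, Tg(t), x)   # T_s (T_t g) (x)
exact = np.exp(-(s + t) / 2) * np.cos(x)
```

Both estimates agree with the exact value to within sampling error, illustrating the semigroup property.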
(2.4.3) Examples. In the case of deterministic motion generated by an autono-
mous ordinary differential equation ẋ_t = f(x_t), we have

T_t g(x) = g(x_t(0, x)),

where x_t(0, x) is the solution with the initial value x_0 = x. For the Wiener pro-
cess W_t, we have

T_t g(x) = (2πt)^{−d/2} ∫_{R^d} g(y) e^{−|y−x|²/2t} dy.
Rd
T,g(x)=g(x+vt)
and
38 2. Markov Processes and Diffusion Processes
g(x+v t)-g(x)
lim
t
I
i=1
vi
ag (x)
axi
The existence and the required uniformity of the limit are guaranteed on the
domain D_A = set of all bounded uniformly continuous functions with bounded
uniformly continuous first partial derivatives.
(2.4.7) Example. For the d-dimensional Wiener process W_t we must calculate

A g(x) = lim_{t↓0} (T_t g(x) − g(x))/t = lim_{t↓0} (E g(x + √t z) − g(x))/t

in accordance with (2.4.3), where z is an N(0, I)-distributed random variable. For this we use Taylor's theorem, which, for every
twice continuously partially differentiable function g, yields

g(x + √t z) − g(x) = √t Σ_{i=1}^d z_i g_{x_i}(x) + (t/2) Σ_{i=1}^d Σ_{j=1}^d z_i z_j g_{x_i x_j}(x)
+ (t/2) Σ_{i=1}^d Σ_{j=1}^d z_i z_j (g_{x_i x_j}(x̄) − g_{x_i x_j}(x)),

where x̄ is a point between x and x + √t z. When we substitute this into the ex-
pression given above for A g(x), we get

A g(x) = ½ Σ_{i=1}^d ∂²g(x)/∂x_i².
0 < t ≤ T − s,
where the limit means the uniform limit in (s, x) ∈ [t_0, T] × R^d. Once again, sto-
chastically continuous transition probabilities P(s, x, t, B) (in particular, the tran-
sition probabilities of Markov processes with sample functions that are continuous
from the right) are uniquely defined by A.
Frequently, it is sufficient to determine the action of A on only those functions
g that are independent of s, that is, to consider

(2.4.9) A g(s, x) = lim_{t↓0} (T_t g(s, x) − g(x))/t, g ∈ B(R^d).
For homogeneous processes, this reduces to (2.4.5); in general, however, T_t g is, like
A g, a function of s and x.
2.5 Diffusion Processes

Diffusion processes are special cases of Markov processes with continuous sample
functions which serve as probability-theoretic models of physical diffusion phenom-
ena. The simplest and oldest example is the motion of very small particles, such
as grains of pollen in a fluid, the so-called Brownian motion. The Wiener process
W_t of example (2.3.4) is a mathematical model of this timewise homogeneous
phenomenon in a homogeneous medium (see section 12.1 and Wax [51]).
Besides the original significance of the diffusion process, there is another one,
which is emphasized in this book, namely, the description of technical systems
subject to "white noise". Also, continuous models for random-walk problems
lead to diffusion processes.
Depending on the classification of methods (see the Introduction), there are two
basically different approaches to the class of diffusion processes. On the one hand,
one can define them in terms of the conditions on the transition probabilities
P (s, x, t, B), which is what we shall do in the present section. On the other
hand, one can study the state X_t itself and its variation with respect to time. This
leads to a stochastic differential equation for X_t. As we shall see in Chapter
9, the two approaches lead essentially to the same class of processes.
(2.5.1) Definition. A Markov process X_t, for t_0 ≤ t ≤ T, with values in R^d and
almost certainly continuous sample functions is called a diffusion process if its tran-
sition probability P(s, x, t, B) satisfies the following three conditions for every
s ∈ [t_0, T), x ∈ R^d, and ε > 0:

a) lim_{t↓s} (1/(t−s)) ∫_{|y−x|>ε} P(s, x, t, dy) = 0;

b) there exists an R^d-valued function f(s, x) such that

lim_{t↓s} (1/(t−s)) ∫_{|y−x|≤ε} (y − x) P(s, x, t, dy) = f(s, x);

c) there exists a d × d matrix-valued function B(s, x) such that

lim_{t↓s} (1/(t−s)) ∫_{|y−x|≤ε} (y − x)(y − x)' P(s, x, t, dy) = B(s, x).

The function f is called the drift vector and B the diffusion matrix. If, for some δ > 0,

lim_{t↓s} (1/(t−s)) ∫_{R^d} |y − x|^{2+δ} P(s, x, t, dy) = 0,

then, since

∫_{|y−x|>ε} P(s, x, t, dy) ≤ ε^{−(2+δ)} ∫_{R^d} |y − x|^{2+δ} P(s, x, t, dy),

condition a) is satisfied, and in b) and c) the integration may be extended over all of R^d.
(2.6.2) A g(s, x) = lim_{t↓0} (1/t) ∫_{R^d} (g(s+t, y) − g(s, x)) P(s, x, s+t, dy).

For a diffusion process with coefficients f and B, this yields, for sufficiently smooth functions g,

A = ∂/∂s + 𝒟

or, for time-independent functions and the homogeneous case,

A = 𝒟.
Therefore, the diffusion process is uniquely determined in this case by f and B.
Furthermore, we see that the first derivatives in (2.6.1) arise as a result of systematic
drift and the second derivatives as a result of the local irregular "chaotic" fluc-
tuational motions.
In the next chapter, we shall take a purely probabilistic route to the construction
of a diffusion process for a given operator. A purely analytical approach yields
(2.6.3) Theorem. Let X_t, for t_0 ≤ t ≤ T, denote a d-dimensional diffusion pro-
cess with continuous coefficients f(s, x) and B(s, x) for which the limit relations in defi-
nition (2.5.1) hold uniformly in s ∈ [t_0, T). Let g(x) denote a continuous
bounded scalar function such that

u(s, x) = ∫_{R^d} g(y) P(s, x, t, dy),

for s < t, where t is fixed, and x ∈ R^d, is continuous and bounded, as are its deri-
vatives ∂u/∂x_i and ∂²u/∂x_i ∂x_j for 1 ≤ i, j ≤ d. Then, u(s, x) is differentiable
with respect to s and satisfies Kolmogorov's backward equation

(2.6.4) ∂u/∂s + 𝒟u = 0,

where 𝒟 is the operator (2.6.1), with the end condition

lim_{s↑t} u(s, x) = g(x).
The equation is called "backward" because it involves differentiation with respect to the
backward time arguments s and x, in contrast with the forward equation (see
Theorem (2.6.9)), in which the transition density p(s, x, t, y) is differentiated
with respect to t and y.
(2.6.5) Remark. Theoretically, the backward equation (2.6.4) enables us to
determine the transition probability P(s, x, t, ·). This transition probability is
uniquely defined if we know all the integrals

u(s, x) = ∫_{R^d} g(y) P(s, x, t, dy),

where g ranges over a set of functions that is dense in the space C(R^d) of con-
tinuous bounded functions. If the solution of (2.6.4) is unique for these func-
tions g, we can, for known f and B, calculate u(s, x) from it and then calculate
P(s, x, t, ·).
(2.6.6) Theorem. Suppose that the assumptions of Theorem (2.6.3) regarding
X_t hold. If P(s, x, t, ·) has a density p(s, x, t, y) that is continuous with re-
spect to s and if the derivatives ∂p/∂x_i and ∂²p/∂x_i ∂x_j exist and are continuous
with respect to s, then p is a so-called fundamental solution of the backward
equation

∂p/∂s + 𝒟p = 0;

that is, it satisfies the end condition

lim_{s↑t} p(s, x, t, y) = δ(x − y).

For the Wiener process, for example, the backward equation reads

∂p/∂s + ½ Σ_{i=1}^d ∂²p/∂x_i² = 0.
(2.6.8) Remark. If X_t is a homogeneous process, then the coefficients f(s, x) =
f(x) and B(s, x) = B(x) (and hence the operators) are independent of s. Since
P(s, x, t, B) = P(t − s, x, B), the sign of ∂p/∂s changes in the backward equa-
tion, for example, for the density p(s, x, y); that is,

−∂p/∂s + 𝒟p = 0.
(2.6.9) Theorem. Let X_t, for t_0 ≤ t ≤ T, denote a d-dimensional diffusion pro-
cess for which the limit relationships in definition (2.5.1) hold uniformly in s
and x and which possesses a transition density p(s, x, t, y). If the derivatives
∂p/∂t, ∂(f_i(t, y) p)/∂y_i, and ∂²(b_ij(t, y) p)/∂y_i ∂y_j exist and are continuous func-
tions, then, for fixed s and x such that s ≤ t, this transition density p(s, x, t, y)
is a fundamental solution of Kolmogorov's forward or Fokker-Planck equa-
tion

(2.6.10) ∂p/∂t + Σ_{i=1}^d ∂(f_i(t, y) p)/∂y_i − ½ Σ_{i=1}^d Σ_{j=1}^d ∂²(b_ij(t, y) p)/∂y_i ∂y_j = 0.
For proof, we again refer to Gikhman and Skorokhod [5], p. 375.
If we define the distribution of X_{t_0} in terms of the initial probability P_{t_0}, we ob-
tain from p(s, x, t, y) the probability density p(t, y) of X_t itself:

p(t, y) = ∫_{R^d} p(t_0, x, t, y) P_{t_0}(dx).

If we apply the integration with respect to P_{t_0}(dx) to (2.6.10), we see that
p(t, y) also satisfies the forward equation.
(2.6.11) Example. For the Wiener process, the forward equation for the homo-
geneous transition density

p(t, x, y) = (2πt)^{−d/2} e^{−|y−x|²/2t}

becomes

∂p/∂t = ½ Σ_{i=1}^d ∂²p/∂y_i²,

which in this case is identical to the backward equation with x replaced
by y.
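The example can be verified directly. The following sketch checks, by central finite differences, that the one-dimensional transition density satisfies the heat equation ∂p/∂t = ½ ∂²p/∂y²:

```python
import numpy as np

def p(t, x, y):
    # transition density of the 1-d Wiener process
    return np.exp(-(y - x) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

t, x, y, h = 1.0, 0.0, 0.5, 1e-3
# central finite differences in t and y
dp_dt = (p(t + h, x, y) - p(t - h, x, y)) / (2 * h)
d2p_dy2 = (p(t, x, y + h) - 2 * p(t, x, y) + p(t, x, y - h)) / h ** 2
residual = dp_dt - 0.5 * d2p_dy2   # forward (heat) equation: should be ~ 0
```

The residual is of the order of the O(h²) truncation error of the finite differences, while each term itself is of order 10⁻¹.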
Chapter 3
Wiener Process and White Noise
3.1 Wiener Process

Let us now summarize the most important properties of the d-dimensional Wiener
process W_t defined in section 2.3. This process, a mathematical model of Brownian
motion of a free particle with friction neglected, is a spacewise and timewise homo-
geneous diffusion process with drift vector f = 0 and diffusion matrix B ≡ I. A
clear understanding of this process is especially important since it proves to be the
fundamental building block for all (smooth) diffusion processes. We have already
seen this in the heuristic derivation of the stochastic differential equation of sec-
tion 2.5.
Since W_t is a Markov process, all the distributions of W_t are defined, in accord-
ance with (2.2.6), by the initial condition

W_0 = 0

and the stationary transition density

p(t, x, y) = n(t, x, y) = (2πt)^{−d/2} exp(−|y−x|²/2t).

From this we get the density p(t, y) of W_t itself:

p(t, y) = n(t, y) = n(t, 0, y) = (2πt)^{−d/2} exp(−|y|²/2t).

This is the density of the d-dimensional normal distribution N(0, tI). The n-
dimensional distribution of W_t, P[W_{t_1} ∈ B_1, ..., W_{t_n} ∈ B_n], where 0 < t_1 < ...
< t_n, then has, according to formula (2.2.6), the density

(3.1.1) n(t_1, 0, x_1) n(t_2−t_1, x_1, x_2) ... n(t_n−t_{n−1}, x_{n−1}, x_n).
Since E|W_t − W_s|⁴ = (d² + 2d)(t − s)², it follows immediately from Kolmo-
gorov's criterion (see section 1.8) that for these finite-dimensional distributions
there exists a stochastic process with continuous sample functions. Since

n(t, x, y) = Π_{i=1}^d (2πt)^{−1/2} exp(−(y_i − x_i)²/2t),

the d components of W_t are independent one-dimensional Wiener processes; likewise, the increments W_{t_1}, W_{t_2} − W_{t_1}, ..., W_{t_n} − W_{t_{n−1}}
are independent. Here, W_t − W_s (for s < t) has the distribution N(0, (t − s)I),
which depends only on t − s; that is, the increments are stationary. The last two
assertions follow immediately from (3.1.1) when we remember that, in our case,
the value of n(s, x, t, y) = n(t − s, x, y) depends only on y − x and t − s.
In fact, a Wiener process can be defined as a process with independent and sta-
tionary N(0, (t − s)I)-distributed increments W_t − W_s, with initial value W_0 = 0,
and with almost certainly continuous sample functions.
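This characterization translates directly into a simulation recipe: sample independent N(0, dt) increments on a grid and form their cumulative sums. The sketch below does this and checks the moments E W_t² = t and E W_s W_t = min(s, t):

```python
import numpy as np

rng = np.random.default_rng(42)

def wiener_paths(n_paths, n_steps, t_end):
    # build W on a uniform grid from independent, stationary N(0, dt) increments
    dt = t_end / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.hstack([np.zeros((n_paths, 1)), dW.cumsum(axis=1)])  # W_0 = 0

W = wiener_paths(20_000, 100, 2.0)
var_end = W[:, -1].var()                    # Var W_2 should be ~ 2
cov_mid_end = (W[:, 50] * W[:, -1]).mean()  # E W_1 W_2 = min(1, 2) = 1
```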
The fact that W_t is a process with independent and stationary increments makes
it possible to apply the limit theorems for sums of independent identically dis-
tributed random variables. This provides valuable information regarding the
order of magnitude of the sample functions of W_t. The strong law of large
numbers states that

lim_{t→∞} W_t/t = 0 with probability 1.

The true order of magnitude of the sample functions follows from the law of the
iterated logarithm: for d = 1 (that is, for each individual component of a d-
dimensional Wiener process),

limsup_{t→∞} W_t/√(2t log log t) = 1

and

liminf_{t→∞} W_t/√(2t log log t) = −1,
both with probability 1. This means that, for every ε > 0 and for almost every
sample function W_t(ω), there exists an instant t_0(ω) subsequent to which we
always have
−(1 + ε)√(2t log log t) < W_t(ω) < (1 + ε)√(2t log log t).

On the other hand, the bounds (1 − ε)√(2t log log t) and −(1 − ε)√(2t log log t)
(for 0 < ε < 1) are exceeded in every neighborhood of ∞ for almost every
sample function.
For a d-dimensional Wiener process, we have in general (for each component)

liminf_{t↓0} W_t/√(2t log log(1/t)) = −1

and

limsup_{t↓0} W_t/√(2t log log(1/t)) = 1

for almost all sample functions. One consequence of this is that every component of
the sample function W_t has, with probability 1, infinitely many zeros in every interval of the form (0, ε)
with ε > 0, and these zeros cluster about the point t = 0. This
behavior is exhibited at every point s ≥ 0 because, by Lemma
(3.1.3), part c), when W_t is a Wiener process, W_{s+t} − W_s (for fixed s and non-
negative t) is also a Wiener process (independent, in fact, of W_t for t ≤ s).
Almost all sample functions of a Wiener process are continuous though, in accordance
with a theorem of N. Wiener, nowhere differentiable. Proof of this as-
sertion and of most of the previous assertions regarding W_t can be found,
for example, in McKean [45]. For fixed t, the nondifferentiability can be made
clear as follows: The distribution of the difference quotient (W_{t+h} − W_t)/h is
N(0, (1/|h|)I). As h → 0, this normal distribution diverges, so that, for every
bounded measurable set B,

P[(W_{t+h} − W_t)/h ∈ B] → 0.

Therefore, the difference quotient cannot converge with positive probability to a
finite random variable.
We can get more precise information from the law of the iterated logarithm. For
d = 1 (hence for every individual component of a many-dimensional process), we
obtain for almost every sample function and arbitrary ε in the interval 0 < ε < 1,
as h ↓ 0,

(W_{t+h} − W_t)/h ≥ (1 − ε)√(2 log log(1/h)/h) infinitely often

and, simultaneously,
(3.1.5) qm-lim_{δ_n→0} Σ_{k=1}^n (W_{t_k} − W_{t_{k−1}})(W_{t_k} − W_{t_{k−1}})' = (t − s) I

and

(3.1.6) qm-lim_{δ_n→0} Σ_{k=1}^n |W_{t_k} − W_{t_{k−1}}|² = d(t − s),

where s = t_0 < t_1 < ... < t_n = t is a partition of [s, t] with mesh δ_n = max_k (t_k − t_{k−1}). If δ_n approaches 0 so fast that Σ_n δ_n < ∞, then convergence occurs in (3.1.5) and
(3.1.6) also with probability 1.
Proof. Let W_t^i, for i = 1, ..., d, denote the ith component of W_t. If

S_n = Σ_{k=1}^n (W_{t_k}^i − W_{t_{k−1}}^i)(W_{t_k}^j − W_{t_{k−1}}^j),

then

E(S_n) = (t − s) δ_ij

and

E(S_n − (t − s) δ_ij)² ≤ 2(t − s) δ_n → 0 (δ_n → 0),

which proves (3.1.5). If we apply the trace operator to both sides of (3.1.5), we
obtain (3.1.6). If Σ_n δ_n < ∞, then Σ_n E(S_n − (t − s) δ_ij)² < ∞, which, by virtue of section 1.4,
is sufficient for almost certain convergence in (3.1.5). ∎
Let us look at the decomposition of [s, t] with the intermediate points t_k^{(n)} = s +
(t − s) k/2^n, for k = 0, 1, ..., 2^n and n = 1, 2, ... . Since δ_n = (t − s) 2^{−n} and
Σ_n δ_n < ∞, the left-hand member of the inequality

Σ_{k=1}^{2^n} |W_{t_k} − W_{t_{k−1}}|² ≤ (max_k |W_{t_k} − W_{t_{k−1}}|) Σ_{k=1}^{2^n} |W_{t_k} − W_{t_{k−1}}|

converges, for almost every sample function, to the finite random variable d(t − s)
as n → ∞. The almost certain continuity of the sample functions implies that
max_k |W_{t_k} − W_{t_{k−1}}| → 0 with probability 1, and consequently

Σ_{k=1}^{2^n} |W_{t_k} − W_{t_{k−1}}| → ∞

with probability 1; that is, almost all sample functions of W_t are of unbounded
variation in every finite interval.

Fig. 4: Sample function of the Wiener process.
(3.1.7) Remark. Equations (3.1.5) and (3.1.6) serve as motivation for the sym-
bolic notation (frequently used, especially in the case d = 1)

(dW_t)(dW_t)' = I dt

and, for d = 1,

(dW_t)² = dt.
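The content of (3.1.5), (3.1.6), and the unbounded variation of the sample paths can be seen in a short simulation: as the partition of [0, 1] is refined, the sum of squared increments stabilizes near t, while the sum of absolute increments grows without bound (of order √(2nt/π)):

```python
import numpy as np

rng = np.random.default_rng(1)

t = 1.0
qvs, tvs = [], []
for n in (10, 100, 10_000):
    dW = rng.normal(0.0, np.sqrt(t / n), size=n)   # increments over [0, t]
    qvs.append(np.sum(dW ** 2))     # quadratic variation: -> t
    tvs.append(np.sum(np.abs(dW)))  # total variation of the polygonal path: -> infinity
```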
3.2 White Noise

C(0) = E ξ_t² = ∫_{−∞}^{∞} f(λ) dλ = ∞.

Since C(t) = 0 for t ≠ 0, the values of ξ_t and ξ_{t+s} would be uncorrelated for ar-
bitrarily small values of s (and independent, in fact, since the process is Gaussian),
a fact that explains the name "purely random process". Obviously, the sample
functions of a process with independent values at all instants must be extremely
irregular.
In fact, white noise was first correctly described in connection with the theory
of generalized functions (distributions), as is done, for example, in Gel'fand and
Vilenkin [22] , Chapter III. Let us discuss this matter briefly.
We start with the fact that, in every actual measurement of the values of a func-
tion f(t), the inertia of the measuring instrument allows us to get only an aver-
age value

(3.2.2) Φ_f(φ) = ∫_{−∞}^{∞} φ(t) f(t) dt,

where φ is a weight function characterizing the instrument.
*Henceforth, we shall not use this term in this sense, so that whenever "distributions" are
mentioned, they have the probability-theory meaning.
E(Φ(φ) − m(φ))(Φ(ψ) − m(ψ)) = C(φ, ψ).
One of the important advantages of a generalized stochastic process is the fact
that its derivative always exists and is itself a generalized stochastic process. In
fact, the derivative Φ̇ of Φ is the process defined by setting

Φ̇(φ) = −Φ(φ̇).

The derivative of a Gaussian process with mean m(φ) and covariance C(φ, ψ) is
again a Gaussian process, and it has mean value ṁ(φ) = −m(φ̇) and covariance
Ċ(φ, ψ) = C(φ̇, ψ̇).
As an example, let us look at a Wiener process and its derivative. From the re-
presentation

Φ(φ) = ∫_{−∞}^{∞} φ(t) W_t dt

(we set W_t = 0 for t < 0), we conclude immediately that, with W_t regarded as a
generalized Gaussian stochastic process, we have

m(φ) = 0

and

C(φ, ψ) = ∫_0^∞ ∫_0^∞ min(s, t) φ(s) ψ(t) ds dt,

where min(s, t) = E W_s W_t is the ordinary covariance function of the Wiener process.
Let us now calculate the derivative of the Wiener process. This is a generalized
Gaussian stochastic process with mean value ṁ(φ) = 0 and covariance

Ċ(φ, ψ) = C(φ̇, ψ̇) = ∫_0^∞ ∫_0^∞ min(s, t) φ̇(s) ψ̇(t) ds dt
= ∫_0^∞ φ(t) ψ(t) dt
= ∫_0^∞ ∫_0^∞ δ(t − s) φ(s) ψ(t) ds dt.

Therefore, the covariance function of the derivative of the Wiener process is the
generalized function

C(s, t) = δ(t − s).
But this is the covariance function of white noise! Thus, white noise ξ_t is the
derivative of the Wiener process W_t when we consider both processes as general-
ized stochastic processes. This justifies the notation

(3.2.3a) ξ_t = Ẇ_t,

(3.2.3b) W_t = ∫_0^t ξ_s ds,
a consequence of which is that, for arbitrary functions φ_1, ..., φ_n ∈ K, the ran-
dom vector (Φ_ξ(φ_1(· + h)), ..., Φ_ξ(φ_n(· + h))) has the same distribution for
all h; that is, white noise is a stationary generalized process. One can show that,
up to a factor, the spectral measure of this generalized process is Lebesgue mea-
sure and hence the process has a constant spectral density on the entire real axis.
We also conclude from (3.2.4) that white noise has independent values at every point.
Because of the independence of the values at every point, white noise is appro-
priate for describing rapidly fluctuating random phenomena, for which the cor-
relation between the state at the instant t and the state at the instant s becomes
small very rapidly as |t − s| increases. For example, this is the case with the
force acting on the particle observed in Brownian motion or for the
variation in current in an electric circuit due to thermal noise.
White noise ξ_t can be approximated by an ordinary stationary Gaussian process
X_t, for example, one with covariance

C(t) = a e^{−b|t|} (a > 0, b > 0).

Such a process has spectral density

f(λ) = (1/π) ab/(b² + λ²).

If we now let a and b approach ∞ in such a way that a/b → 1/2, we get

C(t) → 0 for t ≠ 0, C(0) → ∞,

but

∫_{−∞}^{∞} C(t) dt = 2a/b → 1,

so that

C(t) → δ(t);

that is, X_t converges in a certain sense to ξ_t.
Let us now look at the indefinite integral

Y_t = ∫_0^t X_s ds.

Its covariance is

E Y_t Y_s = ∫_0^t ∫_0^s a e^{−b|u−v|} du dv.

Taking the limit as above, we get

E Y_t Y_s → min(t, s),
that is, the covariance of the one-dimensional Wiener process W_t. This is a further
heuristic justification of formulas (3.2.3).
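This limit is easy to check numerically: the sketch below evaluates the double integral for E Y_t Y_s with a = b/2 by the midpoint rule and lets b grow:

```python
import numpy as np

def cov_Y(t, s, a, b, n=2000):
    # E Y_t Y_s = integral over [0,t] x [0,s] of a*exp(-b|u-v|), midpoint rule
    u = (np.arange(n) + 0.5) * (t / n)
    v = (np.arange(n) + 0.5) * (s / n)
    K = a * np.exp(-b * np.abs(u[:, None] - v[None, :]))
    return K.sum() * (t / n) * (s / n)

t, s = 2.0, 1.0
vals = [cov_Y(t, s, b / 2.0, b) for b in (1.0, 10.0, 100.0)]
# vals should approach min(t, s) = 1 as b -> infinity with a/b = 1/2
```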
Now, we can define the d-dimensional (Gaussian) white noise as the derivative (in
the generalized-function sense) of the d-dimensional Wiener process. It is a sta-
tionary Gaussian generalized process with independent values at every point, with
expectation vector 0, and with covariance matrix δ(t)I. In other words, white
noise in R^d is simply a combination of d independent one-dimensional white
noise processes. The spectral density (now a matrix!) of such a process is I/2π.
A d-dimensional Gaussian noise process η_t with expectation 0 and with covariance
matrix

E η_t η_s' = Q(t) δ(t − s)

is treated by various authors (for example, Bucy and Joseph [61] and Jazwinski
[66]). Such a process is no longer in general stationary but, as a "delta-correlated"
process, it has independent values at every point. We shall see later (see remarks
(5.2.4) and (5.4.8)) that we can confine ourselves to the standard case Q(t) ≡ I
without loss of generality. We obtain η_t from the standard noise process ξ_t by

η_t = G(t) ξ_t,

where G(t) is any (d × d)-matrix-valued function such that G(t) G(t)' = Q(t).
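In computation, such a factor G(t) is conveniently obtained from the Cholesky decomposition of Q(t). The matrix Q below is a hypothetical constant example; the discrete vectors ξ stand in for samples of the standard noise:

```python
import numpy as np

rng = np.random.default_rng(7)

Q = np.array([[2.0, 0.6],
              [0.6, 1.0]])   # hypothetical intensity matrix Q (constant in t)
G = np.linalg.cholesky(Q)    # lower triangular factor with G @ G.T == Q

xi = rng.standard_normal((2, 100_000))   # i.i.d. standard normal "noise" samples
eta = G @ xi                             # correlated noise with intensity Q
Q_hat = eta @ eta.T / eta.shape[1]       # sample covariance, should be close to Q
```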
Chapter 4
Stochastic Integrals
4.1 Introduction
The analysis of stochastic dynamic systems often leads to differential equations
of the form
(4.1.1) Ẋ_t = f(t, X_t) + G(t, X_t) ξ_t,

where we can assume that ξ_t is white noise. Here, X_t and f are R^d-valued func-
tions, G(t, x) = (G_ij(t, x)) is a d × m matrix, and ξ_t is m-dimensional white noise.
We saw in section 3.2 that, although ξ_t is not a usual stochastic process, nonethe-
less the indefinite integral of ξ_t can be identified with the m-dimensional Wiener
process W_t:

W_t = ∫_0^t ξ_s ds.

In the deterministic case G ≡ 0, equation (4.1.1) is equivalent to the integral equation

x_t = c + ∫_{t_0}^t f(s, x_s) ds,

for which it is possible to find a solution curve by means of the classical iteration
procedure.
In the same way, we transform equation (4.1.1) into an integral equation:

(4.1.2) X_t = c + ∫_{t_0}^t f(s, X_s) ds + ∫_{t_0}^t G(s, X_s) ξ_s ds.

Here, c is an arbitrary random variable, which can also degenerate into a con-
stant independent of chance. As a rule, the first integral in the right-hand mem-
ber of equation (4.1.2) can be understood as the familiar Riemann integral. The
second integral is more of a problem. Because of the smoothing effect of the in-
tegration, we still hope to be able to interpret integrals of this form for many
functions G(t, x) as ordinary random variables, which would spare us the neces-
sity of using generalized stochastic processes. We now formally eliminate the white
noise in (4.1.2) by means of the relationship dW_s = ξ_s ds, writing

(4.1.4) X_t = c + ∫_{t_0}^t f(s, X_s) ds + ∫_{t_0}^t G(s, X_s) dW_s.

Equation (4.1.4) is also written more briefly in the following differential form:

(4.1.5) dX_t = f(t, X_t) dt + G(t, X_t) dW_t.
Since, in accordance with section 3.1, almost all sample functions of W_t are of un-
bounded variation, we cannot in general interpret the integral in the right-hand
member of (4.1.4) as an ordinary Riemann-Stieltjes integral. We shall consider
this with the example given in the following section.
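Equation (4.1.5) also suggests the simplest numerical scheme, the Euler (Euler-Maruyama) discretization, in which dW_t is replaced by an N(0, Δt) increment per step. The sketch below uses the hypothetical scalar test equation dX = −X dt + dW, whose solution at t = 1 with X_0 = 1 has known mean e^{−1} and variance (1 − e^{−2})/2:

```python
import numpy as np

rng = np.random.default_rng(3)

def euler_maruyama(f, g, x0, t0, T, n_steps, n_paths):
    # simplest discretization of dX = f(t,X) dt + g(t,X) dW (scalar case):
    # one Gaussian increment dW ~ N(0, dt) per step
    dt = (T - t0) / n_steps
    X = np.full(n_paths, float(x0))
    t = t0
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X + f(t, X) * dt + g(t, X) * dW
        t += dt
    return X

X1 = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 1.0, 0.0, 1.0, 500, 20_000)
mean_err = abs(X1.mean() - np.exp(-1.0))
var_err = abs(X1.var() - (1.0 - np.exp(-2.0)) / 2.0)
```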
For fixed (i.e., independent of ω) continuously differentiable functions g, we
could use the formula for integration by parts to give the following definition:

(4.1.6) ∫_{t_0}^t g(s) dW_s = g(t) W_t − g(t_0) W_{t_0} − ∫_{t_0}^t ġ(s) W_s ds.

The last integral is an ordinary Riemann integral, evaluated for the individual
sample functions of W_t.
In many cases of importance in practice, however, the function G (t, x) in the
integral equation (4.1.2) is not independent of x. For this general case, K. Itô
[42] has given a definition of the integral (4.1.3) that, as we shall see, includes
the definition (4.1.6) as a special case.
4.2 An Example
Our task is now to define the integral

∫_{t_0}^t G(s) dW_s.

For G ≡ 1 we naturally require

∫_{t_0}^t 1 dW_s = W_t − W_{t_0}.

As a less trivial example, consider

X_t = ∫_{t_0}^t W_s dW_s.

If the classical rules of integration applied, we would expect

(4.2.1) ∫_{t_0}^t W_s dW_s = (W_t² − W_{t_0}²)/2.
We examine the limit of the approximating sums

S_n = Σ_{i=1}^n W_{τ_i}(W_{t_i} − W_{t_{i−1}})

with ever finer partitioning t_0 < t_1 < ... < t_n = t and arbitrary choice of the intermediate points τ_i ∈ [t_{i−1}, t_i]. Let
us now show that the limit of S_n depends on the choice of the intermediate
points.
To do this, let us write the sum S_n in the form

S_n = W_t²/2 − W_{t_0}²/2 − ½ Σ_{i=1}^n (W_{t_i} − W_{t_{i−1}})²
+ Σ_{i=1}^n (W_{τ_i} − W_{t_{i−1}})² + Σ_{i=1}^n (W_{t_i} − W_{τ_i})(W_{τ_i} − W_{t_{i−1}}).

By (3.1.5), the first sum on the right converges in quadratic mean to (t − t_0)/2 times 2, and the last sum converges to 0 in quadratic mean; furthermore,

E Σ_{i=1}^n (W_{τ_i} − W_{t_{i−1}})² = Σ_{i=1}^n (τ_i − t_{i−1})

and

Var Σ_{i=1}^n (W_{τ_i} − W_{t_{i−1}})² = 2 Σ_{i=1}^n (τ_i − t_{i−1})² → 0.
Therefore, the convergence of {S_n} depends on the behavior of the sums

(4.2.2) Σ_{i=1}^n (τ_i − t_{i−1}),

which can assume any value in the interval [0, t − t_0] with appropriate choice of
the τ_i. More precisely,

(4.2.3) qm-lim_{δ_n→0} (S_n − Σ_{i=1}^n (τ_i − t_{i−1})) = (W_t² − W_{t_0}²)/2 − (t − t_0)/2.
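The dependence on the intermediate points is easy to see in simulation. With τ_i = t_{i−1} one obtains (up to terms that vanish with the mesh) the Itô value (W_t² − t)/2 on [0, 1], while the symmetrized (trapezoidal) choice telescopes exactly to W_t²/2; the means of the two sums differ by (t − t_0)/2, as in (4.2.3):

```python
import numpy as np

rng = np.random.default_rng(5)

t, n, n_paths = 1.0, 2000, 2000
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
W = np.hstack([np.zeros((n_paths, 1)), dW.cumsum(axis=1)])   # W_0 = 0

ito = np.sum(W[:, :-1] * dW, axis=1)                   # tau_i = t_{i-1}
sym = np.sum(0.5 * (W[:, :-1] + W[:, 1:]) * dW, axis=1)  # averaged endpoints

gap = (sym - ito).mean()   # should be ~ (t - t_0)/2 = 0.5
ito_vs_closed_form = np.abs(ito - (W[:, -1] ** 2 / 2 - t / 2)).max()
```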
Among integrals of the type (4.2.4), Itô's integral is characterized by the fact that,
as a function of the upper limit, it is a martingale. This can be seen as follows: If, for
simplicity, we take t_0 = 0 and set

X_t = W_t²/2 + (a − 1/2) t,
t_0 < t_1 < ... < t_n = t of the interval [t_0, t] and every choice of intermediate
points τ_i ∈ [t_{i−1}, t_i] there corresponds a step function W_s^{(n)} with

∫_{t_0}^t W_s^{(n)} dW_s = S_n = Σ_{i=1}^n W_{τ_i}(W_{t_i} − W_{t_{i−1}});

the existence and value of the limit of the S_n depend on the intermediate points
τ_i.
(4.3.2) Definition. Let t_0 denote a fixed nonnegative number. A family 𝔉_t, for
t ≥ t_0, of sub-sigma-algebras of 𝔄 is said to be nonanticipating with respect to the
m-dimensional Wiener process W_t if it has the following three properties:

(a) 𝔉_s ⊆ 𝔉_t (t_0 ≤ s ≤ t),
(b) 𝔉_t ⊇ 𝔚[t_0, t] (t ≥ t_0),
(c) 𝔉_t is independent of 𝔚_t⁺ (t ≥ t_0).

Since 𝔚_0⁺ = 𝔚[0, ∞) (apart from sets of measure 0), condition (c) means, for
example, for t = 0, that 𝔉_0 can contain only events that are independent of the
entire Wiener process W_t for t ≥ 0.
(4.3.3) Example. The family

𝔉_t = 𝔚[t_0, t]

is the smallest possible nonanticipating family of sigma-algebras. However, it is
often necessary and desirable to augment 𝔚[t_0, t] with other events that are in-
dependent of 𝔚_t⁺ (for example, initial conditions). In the case of stochastic dif-
ferential equations, we usually take

𝔉_t = 𝔄(𝔚[t_0, t], c),

the sigma-algebra generated by 𝔚[t_0, t] and a random variable c independent of 𝔚_{t_0}⁺.
(4.3.4) Definition. A (d × m matrix)-valued function G = G(s, ω) defined on
[t_0, t] × Ω and measurable in (s, ω) is said to be nonanticipating (with respect to
a family 𝔉_s of nonanticipating sigma-algebras) if G(s, ·) is 𝔉_s-measurable for all
s ∈ [t_0, t]. We denote by M_2^{d×m}[t_0, t] = M_2[t_0, t] the set of those nonanticipa-
ting functions defined on [t_0, t] × Ω for which the sample functions G(·, ω)
are with probability 1 in L²[t_0, t], that is, with probability 1,

∫_{t_0}^t |G(s, ω)|² ds < ∞.

Here, the last integral is to be interpreted as the Lebesgue integral (which, for
example, coincides with the Riemann integral in the case of continuous func-
tions). We denote by

|G| = (Σ_{i,j} G_ij²)^{1/2} = (tr G G')^{1/2}

the norm of the matrix G. We have G ∈ M_2^{d×m}[t_0, t] if and only if G_ij ∈ M_2^{1×1}
[t_0, t] for all i and j. Furthermore,

M_2[t_0, s] ⊇ M_2[t_0, t], t_0 ≤ s ≤ t.
We set, for a step function G ∈ M_2[t_0, t] with jump points t_0 < t_1 < ... < t_n = t
(that is, G(s, ω) = G(t_{i−1}, ω) for t_{i−1} ≤ s < t_i),

(4.4.1) ∫_{t_0}^t G(s) dW_s = Σ_{i=1}^n G(t_{i−1})(W_{t_i} − W_{t_{i−1}}).

Fig. 5: A nonanticipating step function.
(4.4.2) Theorem. For step functions G, G_1, G_2 ∈ M_2[t_0, t] and a, b ∈ R¹:

a) ∫_{t_0}^t (a G_1 + b G_2) dW = a ∫_{t_0}^t G_1 dW + b ∫_{t_0}^t G_2 dW.

b) ∫_{t_0}^t G dW is the d-dimensional random vector whose ith component is

Σ_{k=1}^m ∫_{t_0}^t G_ik(s) dW_s^k,

where W_t^k denotes the kth component of W_t.

c) The stochastic integral has expectation

E(∫_{t_0}^t G dW) = 0.
d) If E|G(s)|² < ∞ for all s ∈ [t_0, t], the following holds for the d × d co-
variance matrix of the stochastic integral (4.4.1):

(4.4.3) E(∫_{t_0}^t G dW)(∫_{t_0}^t G dW)' = ∫_{t_0}^t E G(s) G(s)' ds

and, in particular,

(4.4.4) E|∫_{t_0}^t G dW|² = ∫_{t_0}^t E|G(s)|² ds.

Proof of d). Writing out E(∫_{t_0}^t G dW)(∫_{t_0}^t G dW)' by means of the defining sum (4.4.1), we obtain a double sum over the indices i and j, which we split into the diagonal part (i = j) and the off-diagonal part (i ≠ j):

E(∫_{t_0}^t G dW)(∫_{t_0}^t G dW)' = S_1 + S_2.
In particular, the (k, k') matrix element in the ith summand of S_1 is
We now treat S_2 in the same way and again use the fact that the terms
G_kp(t_{i−1})(W_{t_i}^p − W_{t_{i−1}}^p) G_{k'q}(t_{j−1}) and W_{t_j}^q − W_{t_{j−1}}^q are, for i < j, independent.
This yields

S_2 = 0,

the desired outcome. Since tr(G G') = |G|², equation (4.4.4) follows from (4.4.3)
by applying the trace operator. ∎
One should note that we do not assume that E G (s) = 0 in Theorem (4.4.2c).
Rather, the stochastic integral (4.4.1) has expectation 0 in every case. Also, the
form of the covariance matrix (4.4.3) is amazingly simple.
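Both properties can be checked by Monte Carlo for a concrete nonanticipating step function; as a hypothetical example, take G(s) = sign(W_{t_{i−1}}) on [t_{i−1}, t_i), for which |G| ≡ 1 and hence ∫ E|G|² ds = t:

```python
import numpy as np

rng = np.random.default_rng(11)

n, n_paths, t = 50, 100_000, 1.0
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
# W at the left endpoint of each subinterval, so G below is nonanticipating
W_left = np.hstack([np.zeros((n_paths, 1)), dW[:, :-1].cumsum(axis=1)])

G = np.sign(W_left)
G[G == 0] = 1.0              # make |G| = 1 identically
I = np.sum(G * dW, axis=1)   # the defining sum (4.4.1)

mean_I = I.mean()   # ~ 0   (expectation property (4.4.2c))
var_I = I.var()     # ~ t = 1 (covariance formula (4.4.4))
```

Note that E G(s) ≠ 0 on the first subinterval, yet the integral still has expectation 0, as remarked above.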
Step 2: Definition of the stochastic integral for arbitrary functions in M_2[t_0, t].
Let us show first that the set of step functions is dense in M_2[t_0, t] in the sense
of the following lemma:

(4.4.5) Lemma. For every function G ∈ M_2[t_0, t], there exists a sequence of step
functions G_n ∈ M_2[t_0, t] such that

lim_{n→∞} ∫_{t_0}^t |G(s) − G_n(s)|² ds = 0 with probability 1.
that is uniformly bounded by the constant c and converges, for almost all
s ∈ [t_0, t], to G — all this with probability 1. If we apply the theorem of
dominated convergence (see section 1.3), it then follows with
probability 1 that G_n converges to G in the sense of L²[t_0, t].
Finally, if G is an arbitrary function in M_2[t_0, t], we can, for example, by passing to the
truncated functions G(t, ω) I_{[|G(t, ω)| ≤ N]}, approximate it to an arbitrary degree of accuracy in the
of course follows (see section 1.4). We now wish to show that this last assertion
implies stochastic convergence of the sequence of integrals

∫_{t_0}^t G_n(s) dW_s

to a specific random variable. For this we shall use the following estimate for the
stochastic integral of step functions.

(4.4.6) Lemma. Suppose that G ∈ M_2[t_0, t] is a step function. Then, for all
N > 0 and c > 0,

P[|∫_{t_0}^t G(s) dW_s| > c] ≤ N/c² + P[∫_{t_0}^t |G(s)|² ds > N].
Proof. Suppose that G(s) = G(t_{i−1}) for t_{i−1} ≤ s < t_i, where t_0 < t_1 < ... < t_n = t.
The truncated function

G_N(s) = G(s) if ∫_{t_0}^{t_i} |G(u)|² du ≤ N, t_{i−1} ≤ s < t_i,
G_N(s) = 0 if ∫_{t_0}^{t_i} |G(u)|² du > N, t_{i−1} ≤ s < t_i,

is again a nonanticipating step function, since ∫_{t_0}^{t_i} |G(u)|² du is 𝔉_{t_{i−1}}-measurable.
Since E ∫_{t_0}^t |G_N(s)|² ds ≤ N, Theorem (4.4.2d) yields

(4.4.7) E|∫_{t_0}^t G_N(s) dW_s|² = ∫_{t_0}^t E|G_N(s)|² ds ≤ N.
If we use equations (4.4.7) and (4.4.8), Theorem (4.4.2a), and the triangle and
Chebyshev inequalities, we obtain

P[|∫_{t_0}^t G(s) dW_s| > c] ≤ (1/c²) E|∫_{t_0}^t G_N(s) dW_s|² + P[∫_{t_0}^t |G(s)|² ds > N]
≤ N/c² + P[∫_{t_0}^t |G(s)|² ds > N]. ∎
(4.4.9) Lemma. If we define

I_n(G) = ∫_{t_0}^t G_n(s) dW_s

for a sequence {G_n} of step functions chosen in accordance with Lemma (4.4.5), then the sequence {I_n(G)} converges stochastically,

st-lim_{n→∞} I_n(G) = I(G),

where I(G) is a random variable that does not depend on the special choice of
the sequence {G_n}.
Proof. Since

∫_{t_0}^t |G_n − G_m|² ds ≤ 2 ∫_{t_0}^t |G − G_n|² ds + 2 ∫_{t_0}^t |G − G_m|² ds → 0 (n, m → ∞)

with probability 1, an application of Lemma (4.4.6) to G_n − G_m shows that {I_n(G)} is a stochastic Cauchy sequence.
Since every stochastic Cauchy sequence also converges stochastically, there ex-
ists a random variable I(G) such that

st-lim_{n→∞} I_n(G) = I(G).

The limit is almost certainly uniquely determined and independent of the special
choice of the sequence {G_n} for which (4.4.10) holds. This is true because, if {G_n}
and {G̃_n} are two such sequences, we can combine them into a single sequence,
from which the almost certain coincidence of the corresponding limits follows.
The following definition follows from Lemma (4.4.9).

(4.4.11) Definition. For every (d × m matrix)-valued function G ∈ M_2[t_0, t],
the stochastic integral (or Itô integral) of G with respect to the m-dimensional
Wiener process W_t over the interval [t_0, t] is defined as the random variable
I(G), which is almost certainly uniquely determined in accordance with Lemma
(4.4.9):

∫_{t_0}^t G(s) dW_s = I(G) = st-lim_{n→∞} ∫_{t_0}^t G_n(s) dW_s.
For special functions in M_2[t_0, t], we can give a stronger than merely stochastic
approximation of the stochastic integral. Specifically, we can approximate it in
mean square, as indicated by the following lemma:

(4.4.12) Lemma. For every function G ∈ M_2[t_0, t] such that

(4.4.13) ∫_{t_0}^t E|G(s)|² ds < ∞,

there exists a sequence {G_n} of step functions in M_2[t_0, t] with the same prop-
erty, so that

lim_{n→∞} ∫_{t_0}^t E|G(s) − G_n(s)|² ds = 0

and

qm-lim_{n→∞} ∫_{t_0}^t G_n(s) dW_s = ∫_{t_0}^t G(s) dW_s.
Proof. According to Lemma (4.4.9), there always exists a sequence {G̃_n} of step
functions such that

st-lim_{n→∞} ∫_{t_0}^t |G − G̃_n|² ds = 0.
Let

g_N(x) = x for |x| ≤ N, g_N(x) = N x/|x| for |x| > N.

Since |g_N(x) − g_N(y)| ≤ |x − y|, it follows from the theorem of dominated convergence that, for fixed N,

∫_{t_0}^t E|g_N(G) − g_N(G̃_n)|² ds → 0

as n → ∞. It follows from the same theorem (now applied to the variable
(s, ω) ∈ [t_0, t] × Ω) that

∫_{t_0}^t E|g_N(G(s)) − G(s)|² ds → 0

as N → ∞ (by virtue of the inequality |g_N(G(s)) − G(s)|² ≤ |G(s)|² and the as-
sumption (4.4.13)). Therefore, there exist sequences {N_k} and {n_k} such that

∫_{t_0}^t E|g_{N_k}(G̃_{n_k}) − G|² ds ≤ 2 ∫_{t_0}^t E|g_{N_k}(G̃_{n_k}) − g_{N_k}(G)|² ds
+ 2 ∫_{t_0}^t E|g_{N_k}(G) − G|² ds → 0.

Accordingly, we can choose

G_k(s) = g_{N_k}(G̃_{n_k}(s))

and obtain

lim_{k→∞} ∫_{t_0}^t E|G_k(s) − G(s)|² ds = 0.
However, by virtue of Theorem (4.4.2d) and what we have just proven, we see
that

E|∫_{t_0}^t G_k dW − ∫_{t_0}^t G_p dW|² = ∫_{t_0}^t E|G_k − G_p|² ds → 0

as k, p → ∞, so that the integrals of the G_k converge in quadratic mean, and the limit coincides with the stochastic integral ∫_{t_0}^t G dW. ∎
The properties established in Theorem (4.4.2) for step functions carry over to arbitrary functions G, G_1, G_2 ∈ M_2[t_0, t]:

a) ∫_{t_0}^t (a G_1 + b G_2) dW = a ∫_{t_0}^t G_1 dW + b ∫_{t_0}^t G_2 dW, a, b ∈ R¹.

b) ∫_{t_0}^t G dW is the d-dimensional random vector whose ith component is

Σ_{k=1}^m ∫_{t_0}^t G_ik dW^k.
d) The relationship

st-lim_{n→∞} ∫_{t_0}^t |G − G_n|² ds = 0

implies

st-lim_{n→∞} ∫_{t_0}^t G_n dW = ∫_{t_0}^t G dW.

Furthermore, if

∫_{t_0}^t E|G(s)|² ds < ∞,

then, for the expectation vector of the stochastic integral and its covariance ma-
trix, we always have respectively

E(∫_{t_0}^t G dW) = 0

and

E(∫_{t_0}^t G dW)(∫_{t_0}^t G dW)' = ∫_{t_0}^t E G G' ds;

hence, in particular,

E|∫_{t_0}^t G dW|² = ∫_{t_0}^t E|G|² ds.
Proof. By Lemma (4.4.12), there exists a sequence {G_n} of step functions such that

lim_{n→∞} ∫_{t_0}^t E|G_n − G|² ds = 0

and

qm-lim_{n→∞} ∫_{t_0}^t G_n dW = ∫_{t_0}^t G dW.

Now using Theorem (4.4.2), parts c) and d), we get from the last equation

E(∫_{t_0}^t G dW) = 0

and

E(∫_{t_0}^t G dW)(∫_{t_0}^t G dW)' = lim_{n→∞} ∫_{t_0}^t E G_n G_n' ds = ∫_{t_0}^t E G G' ds. ∎
Actual evaluation of the stochastic integrals is another matter. Of course, this
problem exists even with ordinary integrals. It is always possible to use the defi-
nition, which is of a constructive nature, to obtain an arbitrarily close approxi-
mation of any stochastic integral. Itô's theorem, which will be discussed in
section 5.3, is an important tool for explicit evaluation of many stochastic
integrals.
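The constructive definition can be carried out numerically. The following sketch (not part of the text; it assumes NumPy, and all names and grid sizes are illustrative) approximates ∫₀^T W dW by the left-endpoint sums of its definition and compares the result, on the same sample path, with the closed-form value (W_T² − T)/2 evaluated below in this section:

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 1.0, 200_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)       # Wiener increments
W = np.concatenate(([0.0], np.cumsum(dW)))      # W_0 = 0, W at grid points

# Left-endpoint (Ito) approximating sum: sum_k W_{t_{k-1}} (W_{t_k} - W_{t_{k-1}})
ito_sum = np.sum(W[:-1] * dW)

# Closed form for the same path: W_T^2/2 - T/2
closed = W[-1] ** 2 / 2 - T / 2

err = abs(ito_sum - closed)
```

The discrepancy comes entirely from the sum of squared increments differing from T, and it shrinks as the partition is refined.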
The integral

(4.5.1) X_t = ∫_{t₀}^t W_s dW_s,

which we discussed in section 4.2, can now be evaluated without difficulty, for example, in accordance with

(4.5.2) Corollary. Suppose that G ∈ M₂[t₀, t] is continuous with probability 1. Then, for partitions t₀ < t₁ < ... < tₙ = t with δₙ = max_k (t_k − t_{k-1}),

∫_{t₀}^t G dW = st-lim_{δₙ→0} Σ_{k=1}^n G(t_{k-1})(W_{t_k} − W_{t_{k-1}}).
Proof. For the approximating step functions Gₙ of Lemma (4.4.12) we have

∫_{t₀}^t Gₙ dW = Σ_{k=1}^n G(t_{k-1})(W_{t_k} − W_{t_{k-1}}). ∎

In particular,

∫_{t₀}^t W_s dW_s = (W_t² − W_{t₀}²)/2 − (t − t₀)/2,

which we can get from (4.2.4), for example. For an m-dimensional Wiener process, we similarly obtain

∫_{t₀}^t W_s' dW_s = (|W_t|² − |W_{t₀}|²)/2 − m (t − t₀)/2.
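A Monte Carlo check of the first two moments of ∫₀^T W dW follows from the evaluation above: per path the integral equals (W_T² − T)/2, so its mean should vanish and its second moment should equal ∫₀^T E W_s² ds = T²/2. A minimal sketch (NumPy assumed; not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

T, n_paths = 1.0, 400_000
W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)  # W_T ~ N(0, T), t0 = 0

X = (W_T ** 2 - T) / 2.0   # value of int_0^T W dW on each path

mean_X = X.mean()          # theory: 0
var_X = X.var()            # theory: T^2 / 2
```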
(4.5.4) Corollary. If G(s, ω) = 0 for (Lebesgue-) almost all s ∈ [t₀, t] with probability 1, then

∫_{t₀}^t G dW = 0 with probability 1.

This last corollary tells us, for example, that the values of a function G can be changed for every s in a fixed set of Lebesgue measure 0 in [t₀, t] without changing the value of its stochastic integral. In particular, this is true for the values of the function at a finite or countable set of points s.
(4.5.5) Corollary. If G ∈ M₂[t₀, t] is independent of ω and

∫_{t₀}^t E|G(s)|² ds < ∞,

then the stochastic integral

(4.5.6) ∫_{t₀}^t G dW

is normally distributed with mean 0 and covariance matrix ∫_{t₀}^t G G' ds.

Proof. Choose nonrandom step functions Gₙ such that

(4.5.7) ∫_{t₀}^t |G − Gₙ|² ds → 0.

Each integral

(4.5.8) ∫_{t₀}^t Gₙ dW = Σ_k Gₙ(t_{k-1})(W_{t_k} − W_{t_{k-1}})

is normally distributed with mean 0 and covariance ∫_{t₀}^t Gₙ Gₙ' ds. Since

∫_{t₀}^t Gₙ Gₙ' ds → ∫_{t₀}^t G G' ds,

it follows that the limit in (4.5.8) is also normally distributed with the given first and second moments.¹
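For a nonrandom integrand the normality asserted in the corollary can be observed directly. The sketch below (illustrative; NumPy assumed, and the integrand G(s) = s is an arbitrary choice) simulates ∫₀¹ s dW_s, whose distribution should be N(0, ∫₀¹ s² ds) = N(0, 1/3); the sample variance and skewness are checked against that:

```python
import numpy as np

rng = np.random.default_rng(2)

n, n_paths = 400, 10_000
dt = 1.0 / n
s = np.arange(n) * dt                      # left endpoints of the partition
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
I = (s * dW).sum(axis=1)                   # one value of int_0^1 s dW per path

var_I = I.var()                            # theory: 1/3
skew = ((I - I.mean()) ** 3).mean() / I.std() ** 3   # ~0 for a Gaussian
```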
(4.5.9) Remark. Even when the step functions Gₙ are such that (4.5.7) holds with probability 1, this does not in general imply almost certain convergence of the integrals

∫_{t₀}^t Gₙ dW → ∫_{t₀}^t G dW.
Let us now show that formula (4.1.6), which we have used as a definition of a stochastic integral for smooth functions G that are independent of ω, can be proven for considerably more general Itô stochastic integrals. In other words, Itô's integral and the stochastic integral (4.1.6) are consistent and coincide when the latter is defined. Somewhat more generally, we have
(4.5.10) Corollary. Suppose that G ∈ M₂^{d×m}[t₀, t] and that the variation of G(·, ω) on [t₀, t] is almost certainly bounded. Then,

(4.5.11) ∫_{t₀}^t G(s) dW_s = G(t) W_t − G(t₀) W_{t₀} − ∫_{t₀}^t (dG(s)) W_s,

where the last integral is the usual Riemann-Stieltjes integral. If, in fact, G(·, ω) is almost certainly continuously differentiable on [t₀, t] (or, more generally, absolutely continuous) with derivative Ġ, then

(4.5.12) ∫_{t₀}^t G(s) dW_s = G(t) W_t − G(t₀) W_{t₀} − ∫_{t₀}^t Ġ(s) W_s ds.

Proof. By virtue of the continuity of W_s, both integrals in (4.5.11) exist under our assumptions as ordinary Riemann-Stieltjes integrals, and (4.5.11) is the usual rule for integration by parts, which, for continuously differentiable G, takes the form (4.5.12). Of course, the stochastic integral of G with respect to W_s coincides with the Riemann-Stieltjes integral if the latter exists. ∎

We note that G in Corollary (4.5.10) is not necessarily independent of ω.
Chapter 5
The Stochastic Integral as a
Stochastic Process, Stochastic
Differentials
5.1 The Stochastic Integral as a Function of the Upper Limit
Let W_t again denote an m-dimensional Wiener process, let t₀ denote a fixed nonnegative number, let {𝔉_t; t ≥ t₀} denote a family of nonanticipating sigma-algebras, and let M₂[t₀, T] denote the set of nonanticipating (d × m matrix)-valued functions (see (4.3.4)) for which we have defined the R^d-valued stochastic integral

∫_{t₀}^T G dW = ∫_{t₀}^T G(s, ω) dW_s(ω).

Suppose that G belongs to M₂[t₀, T], that A ⊂ [t₀, T] is a Borel set, and that 1_A is its indicator function. Then,

G 1_A ∈ M₂[t₀, T].

We therefore define

∫_A G dW = ∫_{t₀}^T G 1_A dW.

In accordance with Theorem (4.4.14a), for any two disjoint sets A, B ⊂ [t₀, T], we have

∫_{A∪B} G dW = ∫_A G dW + ∫_B G dW.

In particular, for t₀ ≤ a ≤ b ≤ c ≤ T (by virtue of Corollary (4.5.4), finitely many points do not change the situation),

∫_a^c G dW = ∫_a^b G dW + ∫_b^c G dW.
For G ∈ M₂[t₀, T] we set

X_t = ∫_{t₀}^t G dW, t ∈ [t₀, T].

Now if G belongs to M₂, that is, to M₂[t₀, t] for all t ≥ t₀, then X_t is defined for all t ≥ t₀, and all the assertions made in this chapter regarding X_t are valid (if the corresponding assumptions are satisfied) without any upper bound on the time, that is, for arbitrarily large intervals [t₀, T].

We now wish to investigate the process X_t for fixed G ∈ M₂[t₀, T]. For this, we shall always assume that we have chosen a separable version of X_t (see section 1.8), which is always possible.
(5.1.1) Theorem. Let G denote a function in M₂[t₀, T] and suppose that

∫_{t₀}^T E|G(s)|² ds < ∞.

Then (X_t, 𝔉_t), for t ∈ [t₀, T], is an R^d-valued martingale; that is, for t₀ ≤ s ≤ t ≤ T,

E(X_t | 𝔉_s) = X_s.

In particular,

E X_t = 0

and

(5.1.3) E X_t X_s' = ∫_{t₀}^{min(t,s)} E G(u) G(u)' du;

in particular,

(5.1.3a) E|X_t|² = ∫_{t₀}^t E|G(u)|² du.
Further, for every c > 0 and t₀ ≤ a ≤ b ≤ T,

(5.1.4) P[sup_{a≤t≤b} |X_t − X_a| > c] ≤ ∫_a^b E|G(s)|² ds / c²,

and if

∫_a^t E|G(s)|^{2k} ds < ∞, t₀ ≤ a ≤ t ≤ T,

then

(5.1.6) E|X_t − X_a|^{2k} ≤ (k(2k−1))^k (t − a)^{k−1} ∫_a^t E|G(s)|^{2k} ds.
a
and the obvious 5,. measurability of the integral X;a) of the step functions
G,,.
E(X,lif,)=X,
82 5. The Stochastic Integral as a Stochastic Process, Stochastic Differentials
or, equivalently,
E(X,-X,E0 GdW15)=0.
This certainly holds for step functions since ` , and 9Z are independent
and E (W,,- W,,_,) = 0 and hence in general. Similarly, formula (5.1.3)
for the covariance matrix of the process X, is first proven easily for step
functions and then in general by taking the limit.
We know now that (X_t, 𝔉_t) is a martingale. Hence, (X_t − X_a, 𝔉_t), for t ≥ a, is also a martingale and (|X_t − X_a|², 𝔉_t) is a submartingale (see section 1.9). Then, inequality (1.9.1) yields, for every c > 0, t₀ ≤ a ≤ b ≤ T, and p = 2,

P[sup_{a≤t≤b} |X_t − X_a| > c] ≤ E|X_b − X_a|²/c² = ∫_a^b E|G(s)|² ds / c²,

so that (5.1.4) holds.

To prove the continuity of the sample functions, we choose on the basis of Lemma (4.4.12) a sequence of step functions Gₙ such that

lim_{n→∞} ∫_{t₀}^T E|G(s) − Gₙ(s)|² ds = 0.

For the integrals

X_t^{(n)} = ∫_{t₀}^t Gₙ dW,

which have almost certainly continuous sample functions, we obtain from (5.1.4)

P[sup_{t₀≤t≤T} |X_t − X_t^{(n)}| > c] ≤ ∫_{t₀}^T E|G − Gₙ|² ds / c².

Choosing cₖ → 0 and a subsequence {nₖ} appropriately, we obtain, for almost all ω,

sup_{t₀≤t≤T} |X_t(ω) − X_t^{(n_k)}(ω)| ≤ cₖ for all k ≥ k₀(ω).
This makes Xt, with probability 1, the uniform limit of a sequence of continuous
functions, and hence continuous itself.
Step 3. Finally, for arbitrary G ∈ M₂[t₀, T], we approximate G with a function G_N, for N > 0, defined by

G_N(t) = G(t) if ∫_{t₀}^t |G|² ds ≤ N, G_N(t) = 0 if ∫_{t₀}^t |G|² ds > N.

The process

X_t^{(N)} = ∫_{t₀}^t G_N dW

is continuous by what has just been proven, and it coincides with X_t on the set where ∫_{t₀}^T |G|² ds ≤ N; letting N → ∞ yields the almost certain continuity of X_t. For k = 1, inequality (5.1.6) reduces to the equation

E|X_t − X_a|² = ∫_a^t E|G(s)|² ds.

For k = 2, we prove (5.1.6) again, first for step functions and then for general G, by choosing a sequence of step functions Gₙ with the property

∫ E|G(s) − Gₙ(s)|⁴ ds → 0

(see Gikhman and Skorokhod [5], pp. 385-386). For the proof for general k, we refer the reader to Gikhman and Skorokhod [36], pp. 26-27. ∎
5.2 Examples and Notes

(5.2.1) Corollary. If

∫_{t₀}^T E|G(s)|² ds < ∞,

the process X_t = ∫_{t₀}^t G dW has orthogonal increments; that is, for t₀ ≤ u ≤ v ≤ s ≤ t ≤ T,

E(X_t − X_s)(X_v − X_u)' = ∫_{[s,t]∩[u,v]} E G(w) G(w)' dw = 0. ∎
It would have been possible to obtain the orthogonality of the increments more
quickly from the following general formula:
(5.2.3) Corollary. Suppose that G and H belong to M₂^{d×m}[t₀, T] and that

∫_{t₀}^T E|G|² ds < ∞, ∫_{t₀}^T E|H|² ds < ∞.

Then

E(∫_{t₀}^T G dW)(∫_{t₀}^T H dW)' = ∫_{t₀}^T E G(u) H(u)' du

and, in particular,

E(∫_{t₀}^t G dW)(∫_{t₀}^s H dW)' = ∫_{t₀}^{min(t,s)} E G(u) H(u)' du, s, t ∈ [t₀, T].

Proof. The first formula is proven by passage to the limit (certainly for step functions and hence in general), from which the assertion follows. ∎
(5.2.4) Remark. Although, under the assumption of Corollary (5.2.1), the stochastic integral has orthogonal and hence uncorrelated increments, these increments are not in general independent. The case in which G ∈ M₂[t₀, T] is independent of ω constitutes an exception. Then,

(5.2.5) X_t = ∫_{t₀}^t G dW

is a Gaussian process with independent increments. Conversely (with suitable matrices U(s) and Λ(s)), the process (5.2.5) with

G(s) = U(s) Λ(s)^{1/2}

coincides, with respect to distribution, with a given process of this type, as one can immediately verify by calculating the first two moments.

(5.2.6) Remark. The Gaussian process mentioned in Remark (5.2.4) is, in the case d = m = 1 and t₀ = 0, "essentially" (that is, up to a transformation of the time axis) a piece of the Wiener process. To see this, let us set

τ(t) = ∫_0^t G(s)² ds.

Then there is a Wiener process W̃_t such that

X_t = ∫_0^t G dW = W̃_{τ(t)},

or

(5.2.7) X_{τ⁻¹(t)} = W̃_t,

as can be checked immediately by calculating the first two moments. For this reason, we call τ(t) the intrinsic time of X_t. Here, τ⁻¹(t) = min{s : τ(s) = t} is defined for t ≤ τ(T).
This consideration can be carried over to the case of arbitrary G ∈ M₂[t₀, T] if we define the intrinsic time by

τ(t) = ∫_{t₀}^t |G(s)|² ds, t₀ ≤ t ≤ T.

Now, τ(t) is itself a (nonanticipating) random function (see McKean [45], pp. 29-31). From the representation (5.2.7), for example, we immediately derive the law of the iterated logarithm for a one-dimensional stochastic integral in the form

lim sup X_t / (2 τ(t) log log τ(t))^{1/2} = 1 with probability 1 (as τ(t) → ∞).
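The intrinsic-time representation can be checked on moments: for deterministic G the integral X_t = ∫₀^t G dW should be distributed like the Wiener process at time τ(t) = ∫₀^t G(s)² ds, i.e. N(0, τ(t)). A hedged numerical sketch (NumPy assumed; the choice G(s) = 1 + s is arbitrary, giving τ(1) = 7/3):

```python
import numpy as np

rng = np.random.default_rng(3)

n, n_paths = 400, 20_000
dt = 1.0 / n
s = np.arange(n) * dt
G = 1.0 + s
tau_1 = ((1.0 + 1.0) ** 3 - 1.0) / 3.0      # tau(1) = int_0^1 (1+s)^2 ds = 7/3

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
X1 = (G * dW).sum(axis=1)                   # X_1 = int_0^1 (1+s) dW per path

var_X1 = X1.var()                           # should be close to tau(1)
```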
(5.2.8) Remark. The property of having infinite length in every finite interval also carries over from sample functions of W_t to sample functions of the stochastic integral X_t. This follows, as for W_t in section 3.1, from the following more precise result (see Goldstein [37a], Theorem 4.1): Suppose that G ∈ M₂^{d×m}[t₀, T], t₀ < t₁ < ... < tₙ = T < ∞, δₙ = max_k (t_k − t_{k-1}). Then for

X_t = ∫_{t₀}^t G dW

we have

(5.2.9) st-lim_{δₙ→0} Σ_{k=1}^n (X_{t_k} − X_{t_{k-1}})(X_{t_k} − X_{t_{k-1}})' = ∫_{t₀}^T G(s) G(s)' ds

and, in particular,

(5.2.10) st-lim_{δₙ→0} Σ_{k=1}^n |X_{t_k} − X_{t_{k-1}}|² = ∫_{t₀}^T |G(s)|² ds.

One should note that random variables appear in general on the right-hand sides of (5.2.9) and (5.2.10). Thus, we have the alternative: either X_t ≡ 0 (for all t ∈ [t₀, T]), as will be the case if the right-hand member of (5.2.10) vanishes, or X_t is not of bounded variation on [t₀, T], as will be the case if the right-hand member of (5.2.10) fails to vanish.
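Formula (5.2.10) is easy to observe on a single simulated path: the sum of squared increments of X_t = ∫₀^t (1+s) dW_s over a fine partition should be close to ∫₀¹ (1+s)² ds = 7/3, even though the path itself is of unbounded variation. A sketch under these assumptions (NumPy; the integrand is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 1_000_000
dt = 1.0 / n
s = np.arange(n) * dt
dX = (1.0 + s) * rng.normal(0.0, np.sqrt(dt), size=n)  # increments of X

qv = np.sum(dX ** 2)        # quadratic variation over the partition
target = 7.0 / 3.0          # int_0^1 (1+s)^2 ds
```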
5.3 Stochastic Differentials. Itô's Theorem

This is a special so-called stochastic differential. We shall now define and investigate such differentials. To do this, let us look at a somewhat more general stochastic process of the form

(5.3.1) X_t = X_{t₀} + ∫_{t₀}^t f(s) ds + ∫_{t₀}^t G(s) dW_s, t ∈ [t₀, T].
Stochastic differentials are simply a more compact symbolic notation for rela-
tionships of the form (5.3.1).
(5.3.3) Definition. We shall say that a stochastic process X_t defined by equation (5.3.1) possesses the stochastic differential f(t) dt + G(t) dW_t, and we shall write

(5.3.4) dX_t = f(t) dt + G(t) dW_t,

or, for short, dX = f dt + G dW.
(5.3.5) Remark. In the shift from (5.3.1) to the stochastic differential (5.3.4), the initial value X_{t₀} disappears, so that we can get from (5.3.4) only the differences X_t − X_s of the process. For example, the relationship

∫_0^t W_s dW_s = W_t²/2 − t/2

written in the language of stochastic differentials is

d(W_t²) = 2 W_t dW_t + dt.

Comparison with (5.3.7) shows that, in the case of the stochastic differential of W_t², we must regard the first two terms as first-order terms and must replace (dW_t)² with dt (see Remark (3.1.9)).
The overall explanation of the phenomenon is to be found in the following theorem of Itô [42]. It says, in the language of stochastic differentials, that smooth functions of processes defined by (5.3.1) are themselves processes of this type. Here is Itô's theorem in its most general form:

(5.3.8) Itô's Theorem. Let u = u(t, x) denote a continuous function defined on [t₀, T] × R^d with values in R^k and with the continuous partial derivatives (k-vectors!)

(∂/∂t) u(t, x) = u_t,
(∂/∂x_i) u(t, x) = u_{x_i}, x = (x₁, ..., x_d)',
(∂²/∂x_i ∂x_j) u(t, x) = u_{x_i x_j}, i, j ≤ d.

Suppose that X_t possesses the stochastic differential (5.3.4). Then, the process Y_t = u(t, X_t), defined on [t₀, T] with initial value Y_{t₀} = u(t₀, X_{t₀}), also possesses a stochastic differential with respect to the same Wiener process W_t, and we have

(5.3.9a) dY_t = (u_t(t, X_t) + u_x(t, X_t) f(t) + ½ Σ_{i,j≤d} u_{x_i x_j}(t, X_t) (G(t) G(t)')_{ij}) dt + u_x(t, X_t) G(t) dW_t.

Formally, (5.3.9a) is obtained from a second-order Taylor expansion of u by means of the multiplication table

×    | dW_t  dt
dW_t | dt    0
dt   | 0     0

(for an m-dimensional Wiener process, dW_t^i dW_t^j = δ_{ij} dt and dW_t^i dt = 0). In other words,

dX_i dX_j = (G G')_{ij} dt, i, j ≤ d,

and, in the one-dimensional case d = m = 1,

dY_t = (u_t(t, X_t) + u_x(t, X_t) f(t) + ½ u_{xx}(t, X_t) G(t)²) dt + u_x(t, X_t) G(t) dW_t.
Before we give a proof of Theorem (5.3.8), let us make clear its scope and use-
fulness with various examples.
If u = u(x) is twice continuously differentiable, Itô's theorem applied to Y_t = u(W_t) yields

(5.4.3a) du(W_t) = u'(W_t) dW_t + ½ u''(W_t) dt.

In integral form, it is

(5.4.3b) ∫_{t₀}^t u'(W_s) dW_s = u(W_t) − u(W_{t₀}) − ½ ∫_{t₀}^t u''(W_s) ds.

The most interesting term in (5.4.3b) is the stochastic integral with respect to W_t, for which we have thus found an expression containing only an ordinary integral.

Formulas (5.4.3a) and (5.4.3b) bring out sharply the essential characteristic of the calculus of stochastic integrals, specifically, the presence of an extra first-order term in the differential of smooth functions of a Wiener process W_t. Equation (5.4.3b) is sometimes called the "fundamental theorem of the calculus of (Itô's) stochastic integrals."
(5.4.4) Example. For u(x) = xⁿ, where n = 1, 2, ..., and t ≥ 0, formula (5.4.3a) yields

d(W_tⁿ) = n W_t^{n−1} dW_t + (n(n−1)/2) W_t^{n−2} dt.
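Taking expectations in this differential kills the dW term and leaves an ODE for the moments mₙ(t) = E W_tⁿ, namely mₙ(t) = (n(n−1)/2) ∫₀^t m_{n−2}(s) ds. The following sketch (not from the text; plain Python) iterates this recursion for even n and recovers the known coefficients E W_t^{2k} = (2k−1)!! t^k:

```python
# Taking expectations in  d(W^n) = n W^{n-1} dW + (n(n-1)/2) W^{n-2} dt
# gives  m_n(t) = (n(n-1)/2) * int_0^t m_{n-2}(s) ds,  m_n(t) = E W_t^n.
# For even n this yields m_{2k}(t) = c_k * t^k with c_k = (2k-1)!!.

def even_moment_coeff(k: int) -> float:
    """Coefficient c_k in E W_t^{2k} = c_k * t^k, via the Ito recursion."""
    c = 1.0                      # m_0(t) = 1
    for j in range(1, k + 1):
        n = 2 * j
        # integrating c * s^{j-1} gives t^j / j
        c = (n * (n - 1) / 2) * c / j
    return c
```

The recursion reproduces E W_t² = t, E W_t⁴ = 3t², E W_t⁶ = 15t³, and so on.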
(5.4.5) Example. Let us look again at the one-dimensional case d = m = 1. We begin with the process

X_t = X_{t₀} + ∫_{t₀}^t f(s) ds + ∫_{t₀}^t G(s) dW_s,

for which

E X_t = E X_{t₀} + ∫_{t₀}^t E f(s) ds,

so that

Y_t = X_t − X_{t₀} − ∫_{t₀}^t f(s) ds = ∫_{t₀}^t G(s) dW_s.

Some authors (see, for example, Bucy and Joseph [61]) have analyzed stochastic differentials in terms of an m-dimensional Gaussian process V_t with independent increments, V₀ = 0, E V_t = 0, and covariance

E V_t V_s' = ∫_0^{min(t,s)} q(u) du.

However, by virtue of what was said above, we can always represent V_t in the form

V_t = ∫_0^t Q(u) dW_u with Q(u) Q(u)' = q(u).
5.5 Proof of Itô's Theorem

We sketch the proof in the one-dimensional case for step functions f and G, so that on the interval in question f and G are constants. We have Y_t = u(t, X_t) with

Y_{t₀} = u(t₀, X_{t₀}).

Suppose that t₀ < t₁ < ... < tₙ = t ≤ T. Then,

(5.5.1) Y_t − Y_{t₀} = Σ_{k=1}^n (u(t_k, X_{t_k}) − u(t_{k-1}, X_{t_{k-1}})),

where each summand is expanded by Taylor's theorem about (t_{k-1}, X_{t_{k-1}}), with intermediate points t_{k-1} + θ_k(t_k − t_{k-1}) and X_{t_{k-1}} + θ̄_k(X_{t_k} − X_{t_{k-1}}), where 0 < θ_k and θ̄_k < 1. In view of the continuity of X, u, and u_{xx}, we see that there exist random variables αₙ and βₙ that converge with probability 1 to 0 as

δₙ = max_{1≤k≤n} (t_k − t_{k-1}) → 0

and that satisfy the inequalities

max_{1≤k≤n} |u_t(t_{k-1} + θ_k(t_k − t_{k-1}), X_{t_{k-1}}) − u_t(t_{k-1}, X_{t_{k-1}})| ≤ αₙ

and the analogous inequality, with βₙ, for u_{xx}. Since

Σ_{k=1}^n (t_k − t_{k-1}) = t − t₀

and

st-lim_{δₙ→0} Σ_{k=1}^n (X_{t_k} − X_{t_{k-1}})² = G² (t − t₀),

this does not change the limit of Y_t − Y_{t₀} in (5.5.1) as δₙ → 0. What we need to show, then, is that

ac-lim_{δₙ→0} Σ_{k=1}^n u_t(t_{k-1}, X_{t_{k-1}})(t_k − t_{k-1}) = ∫_{t₀}^t u_t(s, X_s) ds

and

st-lim_{δₙ→0} Σ_{k=1}^n u_x(t_{k-1}, X_{t_{k-1}})(X_{t_k} − X_{t_{k-1}}) = ∫_{t₀}^t u_x(s, X_s) f ds + ∫_{t₀}^t u_x(s, X_s) G dW_s.
We still need to take care of the sum

½ Σ_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(X_{t_k} − X_{t_{k-1}})²
= ½ Σ_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}}) (f²(t_k − t_{k-1})² + 2 f G (t_k − t_{k-1})(W_{t_k} − W_{t_{k-1}}) + G²(W_{t_k} − W_{t_{k-1}})²).

Since the first two sums in the right-hand member converge, by virtue of the continuity of u_{xx} and W, to 0 with probability 1, it remains only to show that

(5.5.3) st-lim_{δₙ→0} Σ_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(W_{t_k} − W_{t_{k-1}})² = ∫_{t₀}^t u_{xx}(s, X_s) ds.

Since

ac-lim_{δₙ→0} Σ_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})(t_k − t_{k-1}) = ∫_{t₀}^t u_{xx}(s, X_s) ds,

(5.5.3) reduces to

st-lim_{δₙ→0} Sₙ = 0

with

Sₙ = Σ_{k=1}^n u_{xx}(t_{k-1}, X_{t_{k-1}})((W_{t_k} − W_{t_{k-1}})² − (t_k − t_{k-1})).
For bounded u_{xx}, the independence of W_{t_k} − W_{t_{k-1}} of 𝔉_{t_{k-1}} yields

E Sₙ = 0

and

E Sₙ² = Σ_{k=1}^n E(u_{xx}(t_{k-1}, X_{t_{k-1}}))² E((W_{t_k} − W_{t_{k-1}})² − (t_k − t_{k-1}))² = 2 Σ_{k=1}^n E(u_{xx}(t_{k-1}, X_{t_{k-1}}))² (t_k − t_{k-1})² → 0 (δₙ → 0),

so that

qm-lim_{δₙ→0} Sₙ = 0.

For general u_{xx}, truncation at a level N > 0 and passage to the limit N → ∞ yield st-lim_{δₙ→0} Sₙ = 0, which completes the proof. ∎
Chapter 6
Stochastic Differential Equations, Existence and Uniqueness of Solutions

6.1 Definition and Examples

We study the stochastic differential equation

(6.1.1a) dX_t = f(t, X_t) dt + G(t, X_t) dW_t, X_{t₀} = c,

or, equivalently, its integral form

(6.1.1b) X_t = c + ∫_{t₀}^t f(s, X_s) ds + ∫_{t₀}^t G(s, X_s) dW_s, t₀ ≤ t ≤ T < ∞,

where X_t is an R^d-valued stochastic process (for the time being, assumed known) defined on [t₀, T] and W_t is an m-dimensional Wiener process. The R^d-valued function f and the (d × m matrix)-valued function G are assumed to be defined and measurable on [t₀, T] × R^d. For fixed (t, x), suppose that f(t, x) and G(t, x) are independent of ω ∈ Ω, i.e., that the random parameter ω appears only indirectly in the coefficients in equation (6.1.1) in the form f(t, X_t(ω)) and G(t, X_t(ω)). For a generalization, see Remark (6.1.5).

Here, the process X_t, which we assume known, must of course be constructed in such a way that, after it is substituted into (6.1.1a), the right-hand member will become a stochastic differential in the sense of section 5.3. In particular, X_t must not anticipate (see Remark (5.3.2)); that is, it must be 𝔉_t-measurable.

Equations (6.1.1) can also be interpreted as the defining equations for an unknown stochastic process X_t with given initial value X_{t₀} = c. With regard to the accompanying family of sigma-algebras 𝔉_t, we make once and for all the following

(6.1.2) Convention. For the purpose of treating stochastic differential equations on the interval [t₀, T], it is always sufficient to choose for 𝔉_t the smallest sigma-algebra with respect to which the initial value c and the random variables W_s, for s ≤ t, are measurable, specifically,

𝔉_t = σ(c; W_s, s ≤ t).
(6.1.3) Definition. A process X_t is called a solution of (6.1.1) on [t₀, T] if:

a) X_t is nonanticipating, that is, 𝔉_t-measurable for every t ∈ [t₀, T];

b) with probability 1,

∫_{t₀}^T |f(s, X_s(ω))| ds < ∞

and

∫_{t₀}^T |G(s, X_s(ω))|² ds < ∞

(that is, G(·, X.) belongs to M₂^{d×m}[t₀, T]), so that, in accordance with section 5.3, the right-hand member of (6.1.1a) is meaningful;

c) equation (6.1.1b) holds for every t ∈ [t₀, T] with probability 1.
(6.1.4) Remark. We therefore have the following situation: There are, on the one hand, the fixed functions f and G that determine the "system" and, on the other hand, the two independent random elements c and W.. For almost every choice of c(ω) and almost every Wiener sample function W.(ω), we obtain via f and G, in the case of a unique solution of (6.1.1), the sample function X.(ω) of a new process defined on [t₀, T] that satisfies (6.1.1). In accordance with (6.1.3a) and our definition of 𝔉_t, X_t is a functional of c and W_s for s ≤ t; that is, there exists a function g (uniquely determined by f and G alone) such that

X_t = g(c; W_s, s ≤ t).

Thus, (6.1.1) can be interpreted as a formula (in general very complicated), determined by the functions f and G, with the aid of which the process X. can be constructed from c and W.. Here, only c(ω) and the values of W_s(ω), for s ≤ t, are used for construction of the value X_t(ω).
Fig. 6.1. X_t as a function of c and W_s for s ≤ t.
(6.1.6) Example. An n-th-order stochastic differential equation

Y_t^{(n)} = f(t, Y_t, Ẏ_t, ..., Y_t^{(n−1)}) + G(t, Y_t, Ẏ_t, ..., Y_t^{(n−1)}) ξ_t

(ξ_t a scalar white noise) can be transformed, by means of the state vector X_t = (Y_t, Ẏ_t, ..., Y_t^{(n−1)})' with

dX_t = d(Y_t, ..., Y_t^{(n−1)})' = (Ẏ_t, ..., Y_t^{(n−1)}, f(t, Y_t, ..., Y_t^{(n−1)}))' dt + (0, ..., 0, G(t, Y_t, ..., Y_t^{(n−1)}))' dW_t,

into a first-order stochastic differential equation of the type (6.1.1a) for the R^{dn}-valued process X_t with initial value X_{t₀} = (c₀, ..., c_{n−1})'. We shall consider the case n = 2 in detail in section 7.1 (example (7.1.6)).

(6.1.7) Remark. Equation (6.1.1b) is equivalent to

X_t − X_s = ∫_s^t f(u, X_u) du + ∫_s^t G(u, X_u) dW_u, X_{t₀} = c,

where t₀ ≤ s ≤ t ≤ T. From this it follows that, if X_t(t₀, c) satisfies equation (6.1.1b), then the "semigroup property"

X_t(t₀, c) = X_t(s, X_s(t₀, c)), t₀ ≤ s ≤ t ≤ T,

holds.
(6.1.8) Remark. If X_t is a solution of (6.1.1), then every process stochastically equivalent to X_t is also a solution. Specifically, if, for every fixed t ∈ [t₀, T],

X̃_t = X_t

with probability 1 (where the exceptional set must belong to 𝔉_t), then, by virtue of the tacitly assumed separability of all processes, we have, for almost all ω,

X̃.(ω) = X.(ω) in [t₀, T].

This implies

∫_{t₀}^t f(s, X̃_s) ds = ∫_{t₀}^t f(s, X_s) ds

and the analogous equation for the stochastic integrals.

Substitution of a solution (which for the moment we assume to exist) into the right-hand member of (6.1.1b) yields, in accordance with Remark (5.3.2), a continuous function of t, which, being a solution, is at the same time almost certainly equal to the left-hand member. It follows that, for every solution of (6.1.1), there exists a stochastically equivalent solution with almost certainly continuous sample functions. Therefore, we shall always consider continuous solutions of stochastic differential equations.

(6.1.9) Example. If G ≡ 0, the fluctuational term in (6.1.1) disappears. We interpret (6.1.1) as the ordinary differential equation

Ẋ_t = f(t, X_t)

with initial condition X_{t₀} = c. A random influence can show up only in the initial value c.
(6.1.10) Example. If the functions f(t, x) = f(t) and G(t, x) = G(t) are independent of x ∈ R^d and if f ∈ L₁[t₀, T] and G ∈ L₂[t₀, T], then

dX_t = f(t) dt + G(t) dW_t

is a stochastic differential whose coefficients are independent of X_t and hence independent of ω. Therefore, in [t₀, T], the unique solution of (6.1.1) is

X_t = c + ∫_{t₀}^t f(s) ds + ∫_{t₀}^t G(s) dW_s.

In accordance with Remark (5.4.8), the process X_t is, in the case of normally distributed or constant c, a d-dimensional continuous Gaussian process with independent increments, with expectation

E X_t = E c + ∫_{t₀}^t f(s) ds.

In this case, f(t, x) ≡ 0 and G(t, x) = g(t) x. The uniqueness of the solution for all functions g that are bounded on [t₀, T] follows from section 6.2. For the special case g ≡ 1, the equation

dX_t = X_t dW_t, X_{t₀} = 1,

has, for every interval [t₀, T] ⊂ [0, ∞), the solution

X_t = exp(W_t − W_{t₀} − (t − t₀)/2).
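The solution exp(W_t − W_{t₀} − (t − t₀)/2) can be checked numerically: its expectation should remain 1 for all t, and the elementary Euler scheme X_{k+1} = X_k + X_k ΔW_k applied directly to dX = X dW should track it path by path. A minimal sketch (NumPy assumed, t₀ = 0; step and sample counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

t, n, n_paths = 1.0, 1000, 5000
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
W = dW.cumsum(axis=1)

X_exact = np.exp(W[:, -1] - t / 2)     # candidate solution at time t

X_em = np.ones(n_paths)                # Euler-Maruyama for dX = X dW, X_0 = 1
for k in range(n):
    X_em = X_em + X_em * dW[:, k]

mean_exact = X_exact.mean()            # theory: 1 for all t
rms_gap = np.sqrt(np.mean((X_em - X_exact) ** 2))
```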
Proof. First we shall prove the uniqueness; then, with the aid of an iteration procedure, we shall prove that a solution exists.

a) Uniqueness. Let X_t and Y_t denote two continuous solutions of (6.2.3). We would like to show that

E|X_t − Y_t|² = 0 for all t ∈ [t₀, T].

However, since the second moments of X_t and Y_t are not necessarily finite, we must again work with a truncation procedure. Suppose that, for N > 0 and t ∈ [t₀, T], I_N(t) denotes the indicator function of the set {|X_s| ≤ N and |Y_s| ≤ N for all s ∈ [t₀, t]}. Then, by the Lipschitz condition (6.2.4),

I_N(s)(|f(s, X_s) − f(s, Y_s)| + |G(s, X_s) − G(s, Y_s)|) ≤ K I_N(s)|X_s − Y_s| ≤ 2 K N.

Therefore, the second moments of the two integrals in (6.2.6) exist. By virtue of the inequality |x + y|² ≤ 2(|x|² + |y|²), Schwarz's inequality, and formula (5.1.3a), we obtain from (6.2.6)

E I_N(t)|X_t − Y_t|² ≤ 2 (T − t₀) ∫_{t₀}^t E I_N(s)|f(s, X_s) − f(s, Y_s)|² ds + 2 ∫_{t₀}^t E I_N(s)|G(s, X_s) − G(s, Y_s)|² ds.

We now use condition (6.2.4) for the integrands and we obtain, for L = 2(T − t₀ + 1)K²,

(6.2.7) E I_N(t)|X_t − Y_t|² ≤ L ∫_{t₀}^t E I_N(s)|X_s − Y_s|² ds.
We now use the Bellman-Gronwall lemma: if g is nonnegative and g(t) ≤ h(t) + L ∫_{t₀}^t g(s) ds on [t₀, T], then g(t) ≤ h(t) + L ∫_{t₀}^t e^{L(t−s)} h(s) ds. A proof can be found, for example, in Gikhman and Skorokhod [5], p. 393. The choices h(t) ≡ 0 and

g(t) = E I_N(t)|X_t − Y_t|²

yield (6.2.8); that is,

I_N(t) X_t = I_N(t) Y_t with probability 1

for every fixed t ∈ [t₀, T]. The inequality

P[I_N(t) ≠ 1 in [t₀, T]] ≤ P[sup_{t₀≤t≤T} |X_t| > N] + P[sup_{t₀≤t≤T} |Y_t| > N]

and the continuity (hence boundedness) of X_t and Y_t imply that we can make the right-hand member of the last inequality arbitrarily small by taking N sufficiently great. Therefore,

X_t = Y_t

with probability 1 for every fixed t ∈ [t₀, T] and hence for a countable dense set M in [t₀, T]. Now X_t and Y_t are assumed to be almost certainly continuous, so that coincidence in M implies coincidence throughout the entire interval [t₀, T] and hence

P[sup_{t₀≤t≤T} |X_t − Y_t| > 0] = 0.

This proves the uniqueness of a continuous solution, with only the Lipschitz condition (6.2.4) assumed.
b) Existence. We first treat the case E|c|² < ∞. Let us begin an iteration procedure with X_t^{(0)} = c and let us define, for n ≥ 1 and t ∈ [t₀, T],

(6.2.9) X_t^{(n)} = c + ∫_{t₀}^t f(s, X_s^{(n−1)}) ds + ∫_{t₀}^t G(s, X_s^{(n−1)}) dW_s.

The second moment of X_t^{(0)} is finite. That this holds also for the subsequent processes X_t^{(n)} can be seen as follows: by virtue of the inequality |x + y + z|² ≤ 3(|x|² + |y|² + |z|²) and (6.2.5), it follows from (6.2.9) that

E|X_t^{(n)}|² ≤ 3 E|c|² + 3 (T − t₀) ∫_{t₀}^t K²(1 + E|X_s^{(n−1)}|²) ds + 3 ∫_{t₀}^t K²(1 + E|X_s^{(n−1)}|²) ds,

so that sup_{t₀≤t≤T} E|X_t^{(n)}|² < ∞ for every n.
Since E|c|² < ∞, equation (6.2.7) for X^{(n+1)} − X^{(n)} can now be derived without I_N(t); that is,

E|X_t^{(n+1)} − X_t^{(n)}|² ≤ L ∫_{t₀}^t E|X_s^{(n)} − X_s^{(n−1)}|² ds.

By iteration, using the relationship

∫_{t₀}^t ∫_{t₀}^{t_{n−1}} ... ∫_{t₀}^{t_1} g(s) ds dt_1 ... dt_{n−1} = ∫_{t₀}^t g(s) (t − s)^{n−1}/(n−1)! ds,

we get

(6.2.11) E|X_t^{(n+1)} − X_t^{(n)}|² ≤ Lⁿ ∫_{t₀}^t ((t − s)^{n−1}/(n−1)!) E|X_s^{(1)} − X_s^{(0)}|² ds.

Now, under the assumption (6.2.5),

E|X_t^{(1)} − X_t^{(0)}|² ≤ L (T − t₀)(1 + E|c|²) = C.

Therefore, it follows from (6.2.11) that

(6.2.12) sup_{t₀≤t≤T} E|X_t^{(n+1)} − X_t^{(n)}|² ≤ C (L(T − t₀))ⁿ/n!, n ≥ 0.
To prove the uniform convergence of X_t^{(n)} itself, we need to find an estimate for

dₙ = sup_{t₀≤t≤T} |X_t^{(n+1)} − X_t^{(n)}|.

It follows from (6.2.9) that

dₙ ≤ ∫_{t₀}^T |f(s, X_s^{(n)}) − f(s, X_s^{(n−1)})| ds + sup_{t₀≤t≤T} |∫_{t₀}^t (G(s, X_s^{(n)}) − G(s, X_s^{(n−1)})) dW_s|,

and, by Schwarz's inequality, the Lipschitz condition (6.2.4), inequality (5.1.4), and (6.2.12),

E dₙ² ≤ C₁ (L(T − t₀))^{n−1}/(n−1)!.

By Chebyshev's inequality and the Borel-Cantelli lemma, this implies that

ac-lim_{n→∞} (X_t^{(0)} + Σ_{i=1}^n (X_t^{(i)} − X_t^{(i−1)})) = ac-lim_{n→∞} X_t^{(n)} = X_t

uniformly for all t ∈ [t₀, T]; the limit X_t is therefore continuous. We must still show that X_t satisfies (6.2.3). Since X_{t₀}^{(n)} = c for all n ≥ 0, this is obvious for t = t₀. For t ∈ (t₀, T], we take the limit in equation (6.2.9). By virtue of (6.2.4) and the uniform convergence of X_t^{(n)}, we have with probability 1

∫_{t₀}^t f(s, X_s^{(n)}) ds → ∫_{t₀}^t f(s, X_s) ds

and

∫_{t₀}^t G(s, X_s^{(n)}) dW_s → ∫_{t₀}^t G(s, X_s) dW_s,

so that X_t is in fact a solution. The case of arbitrary c is handled by replacing c with

c_N = c if |c| ≤ N, c_N = 0 otherwise,

and taking the limit as N → ∞. For more details, see, for example, Gikhman and Skorokhod [5], pp. 395-397. ∎
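The iteration (6.2.9) can also be carried out numerically on a fixed Wiener path. The sketch below (illustrative; NumPy assumed, integrals replaced by left-endpoint sums) applies it to the Lipschitz test equation dX = −X dt + dW with c = 1; the successive sup-distances between iterates should decay roughly like (L(T − t₀))ⁿ/n!, in the spirit of (6.2.12):

```python
import numpy as np

rng = np.random.default_rng(6)

# Picard iteration X^{(n+1)}_t = c + int_0^t f(X^{(n)}_s) ds + int_0^t G dW_s
# for dX = -X dt + dW  (f(x) = -x, G = 1) on one fixed path.
n_steps, T, c = 2000, 1.0, 1.0
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
stoch = np.concatenate(([0.0], np.cumsum(dW)))   # int_0^t G dW with G = 1

X = np.full(n_steps + 1, c)                      # X^{(0)} = c
gaps = []
for _ in range(8):
    drift = np.concatenate(([0.0], np.cumsum(-X[:-1] * dt)))  # int f(X) ds
    X_new = c + drift + stoch
    gaps.append(float(np.max(np.abs(X_new - X))))
    X = X_new
```

The nested-integral bound guarantees gaps[n] ≤ gaps[0]·Tⁿ/n!, so eight iterations already bring the change below visual resolution.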
The equation

X_t = ∫_{t₀}^t |X_s|^α ds

has, for α > 1, only the solution X_t ≡ 0; but, for 0 < α < 1, it also has the solution

X_t = (β(t − t₀))^{1/β}, where β = 1 − α.

I. V. Girsanov has shown [37] that the equation

X_t = ∫_0^t |X_s|^α dW_s (d = m = 1)

has exactly one nonanticipating solution for α ≥ 1/2 but infinitely many for 0 < α < 1/2.
6.3 Supplements to the Existence-and-Uniqueness Theorem

In order to be able to admit functions like sin x² (which become steeper and steeper with increasing x) as coefficients, we need the generalization (6.3.2) of the existence-and-uniqueness theorem, in which the requirement (6.2.4) is imposed only locally: for every N > 0 there is a constant K_N such that

|f(t, x) − f(t, y)| + |G(t, x) − G(t, y)| ≤ K_N |x − y| for |x| ≤ N, |y| ≤ N,

which is a Lipschitz condition on every bounded set. When we apply this to each component of f and G, we get

(6.3.3) Corollary. For the Lipschitz condition in the existence-and-uniqueness theorem (or its generalization (6.3.2)) to be satisfied, it is sufficient that the functions f(t, x) and G(t, x) have continuous partial derivatives of first order with respect to the components of x for every t ∈ [t₀, T] and that these be bounded on [t₀, T] × R^d (or, in the case of the generalization, on [t₀, T] × {|x| ≤ N}).

Let us now discuss the meaning of the second assumption (namely, inequality (6.2.5)) in the existence-and-uniqueness theorem. This assumption bounds f and G uniformly with respect to t ∈ [t₀, T] and allows at most linear growth of these functions with respect to x. If this condition is violated, we get the effect (familiar from the study of ordinary differential equations) of an "explosion" of the solution. Let us illustrate this with the scalar ordinary initial-value problem

dX_t = X_t² dt, X₀ = c.

The solution is

X_t = 0 if c = 0, X_t = (1/c − t)⁻¹ if c ≠ 0.

Thus, for c > 0, the trajectory X_t is defined only on the interval [0, 1/c). At t = 1/c, a so-called explosion takes place. For a given interval [0, T], there always exist initial values, namely those with c ≥ 1/T, for which the solution X_t is not defined throughout the entire interval [0, T]. The restriction on the growth of f and G guarantees that, with probability 1, the solution X_t does not explode in the interval [t₀, T], whatever the initial value X_{t₀} = c. For further remarks about explosions, see McKean [45] and also (6.3.6)-(6.3.8).
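The explosion is easy to see numerically. The sketch below (plain Python, not from the text) applies the elementary Euler scheme to dX = X² dt with c = 1; shortly before the blow-up time 1/c = 1 the scheme still tracks the exact solution X_t = (1/c − t)⁻¹, while pushing t past 1 produces arbitrarily large values:

```python
# Euler steps for the ODE dX = X^2 dt, X_0 = c; exact solution (1/c - t)^{-1}
# blows up at t = 1/c.
def euler_x_squared(c: float, t_end: float, n: int) -> float:
    """Euler approximation of X at t_end (grows without bound near blow-up)."""
    x, dt = c, t_end / n
    for _ in range(n):
        x += x * x * dt
    return x

inside = euler_x_squared(1.0, 0.9, 100_000)   # before t = 1/c = 1
exact_inside = 1.0 / (1.0 - 0.9)              # exact value 10
```

Since the exact solution is increasing and convex, the Euler polygon lies just below it; refining the step closes the gap.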
(6.3.4) Remark concerning global solutions. If the functions f and G are defined on [t₀, ∞) × R^d and if the assumptions of the existence-and-uniqueness theorem hold on every finite subinterval [t₀, T] of [t₀, ∞), then the equation

dX_t = f(t, X_t) dt + G(t, X_t) dW_t, X_{t₀} = c,

has a unique solution X_t defined on the entire half-line [t₀, ∞). Such a solution is called a global solution. The assumptions listed are satisfied, in particular, in the following special case:
(6.3.5) Corollary. Consider the autonomous stochastic differential equation

dX_t = f(X_t) dt + G(X_t) dW_t, X_{t₀} = c.

(By autonomous is meant that f(t, x) = f(x) and G(t, x) = G(x), where f(x) ∈ R^d and G(x) is a d × m matrix.) For every initial value c that is independent of the m-dimensional Wiener process W_t − W_{t₀} for t ≥ t₀, this equation has exactly one continuous global solution X_t on the entire interval [t₀, ∞) such that X_{t₀} = c, provided only the following Lipschitz condition is satisfied: there exists a positive constant K such that, for all x, y ∈ R^d,

|f(x) − f(y)| + |G(x) − G(y)| ≤ K |x − y|.

The restriction on the growth of f and G follows from this global Lipschitz condition (we fix y = y₀).
(6.3.6) Example. Suppose that d = m = 1. If (for c > 0) we consider the corresponding autonomous equation and evaluate the stochastic differential of X_t = exp Y_t with the aid of Itô's theorem, we get equation (6.3.12). If c < 0, we consider the process −X_t and obtain the same result.
Other examples will be found, in particular, in Chapter 8.
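A globally Lipschitz autonomous equation as in Corollary (6.3.5) can be integrated numerically by the Euler-Maruyama scheme. The following generic sketch (not the book's construction; NumPy assumed, function and parameter names illustrative) is checked on dX = −X dt + dW, for which E X_t = c e^{−t} and Var X_t = (1 − e^{−2t})/2:

```python
import numpy as np

rng = np.random.default_rng(7)

def euler_maruyama(f, g, c, t_end, n_steps, n_paths, rng):
    """Euler-Maruyama scheme for the autonomous scalar SDE dX = f(X) dt + g(X) dW."""
    dt = t_end / n_steps
    x = np.full(n_paths, float(c))
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + f(x) * dt + g(x) * dw
    return x

# Test equation dX = -X dt + dW (globally Lipschitz coefficients):
x_T = euler_maruyama(lambda x: -x, lambda x: np.ones_like(x), 1.0, 4.0, 2000, 20_000, rng)
mean_T = float(x_T.mean())   # theory: exp(-4) ~ 0.018
var_T = float(x_T.var())     # theory: (1 - exp(-8))/2 ~ 0.500
```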
Chapter 7
Properties of the Solutions of
Stochastic Differential Equations
7.1 The Moments of the Solutions

Suppose that the assumptions of the existence-and-uniqueness theorem are satisfied and that

E|c|^{2n} < ∞,

where n is a positive integer. Then, for the solution X_t of the stochastic differential equation (7.1.1) on [t₀, T], where T < ∞,

(7.1.3) E|X_t|^{2n} ≤ (1 + E|c|^{2n}) e^{C(t−t₀)}

and

(7.1.4) E|X_t − c|^{2n} ≤ D (1 + E|c|^{2n}) (t − t₀)ⁿ e^{C(t−t₀)}.

Proof. Itô's theorem, applied to u(x) = |x|^{2n}, yields

|X_t|^{2n} = |c|^{2n} + ∫_{t₀}^t 2n |X_s|^{2n−2} X_s' G(s, X_s) dW_s + ∫_{t₀}^t (2n |X_s|^{2n−2} X_s' f(s, X_s) + n |X_s|^{2n−2} |G(s, X_s)|² + 2n(n−1) |X_s|^{2n−4} |X_s' G(s, X_s)|²) ds
(where the last term is absent if n = 1). That E|X_t|^{2n} exists if E|c|^{2n} < ∞ follows from step b) in the proof of Theorem (6.2.2). We take the expectation on both sides of the last equation and, keeping the growth restriction (6.2.5) in mind, obtain

E|X_t|^{2n} ≤ E|c|^{2n} + (2n+1) n K² ∫_{t₀}^t E(1 + |X_s|²)|X_s|^{2n−2} ds
≤ E|c|^{2n} + (2n+1) n K² (t − t₀) + 2n(2n+1) K² ∫_{t₀}^t E|X_s|^{2n} ds,

where we have used |X_s|^{2n−2}(1 + |X_s|²) ≤ 1 + 2|X_s|^{2n}. The Bellman-Gronwall lemma now yields

E|X_t|^{2n} ≤ h(t) + 2n(2n+1) K² ∫_{t₀}^t exp(2n(2n+1) K² (t − s)) h(s) ds,

where

h(t) = E|c|^{2n} + (2n+1) n K² (t − t₀),

from which (7.1.3) follows.

Inequality (7.1.4) is obtained in a similar manner. We shall treat only the case n = 1 and refer the reader to Gikhman and Skorokhod [36], pp. 49-50, for general n (though in the scalar case). Since |a + b|² ≤ 2(|a|² + |b|²), we have

E|X_t − c|² ≤ 2 E|∫_{t₀}^t f(s, X_s) ds|² + 2 E|∫_{t₀}^t G(s, X_s) dW_s|²
t t
<2(T-to) J Elf (s, X,)12ds+2 jEIG(s,X,)12ds.
to 90
EIX,-cl2<L J (1+EIX.I2)ds,
$o
lim E IX,-X,I2 = 0,
t+t
that is, the solution X, is mean-square-continuous at every point of the interval
[to, TI (but this does not imply mean-square differentiability [see also section
7.21).
Of great importance are the functions E X, and K (s, t) = E X, X;, which are
meaningful for E IcI2 < oo although these do not in the general (nonlinear) case
satisfy any simple equation. For example,
t
m,=EX,=Ec+jEf(s,X,)ds, to <_ t:_5 T,
98
but E f (s, X,) cannot in general be expressed as a function of m,. A similar situ-
ation (see example (5.4.9)) holds for
For the second-order case (n = 2 in example (6.1.6)) we have

dX_t = d(Y_t, Ẏ_t)' = (Ẏ_t, f(t, Y_t, Ẏ_t))' dt + (0, G(t, Y_t, Ẏ_t))' dW_t,

so that

E Y_t = E c₀ + ∫_{t₀}^t E Ẏ_s ds,

E Ẏ_t = E c₁ + ∫_{t₀}^t E f(s, Y_s, Ẏ_s) ds,

and, for the variances, corresponding integral relations (see Goldstein [37a], pp. 48-49). In particular, if |G(t, x¹, x²)| ≥ α > 0 in [t₀, T] × R², these relations imply that, in the case T = ∞, the variances of both components must approach ∞. A consequence of this is that neither Y_t nor Ẏ_t can remain indefinitely in a bounded set.

In the deterministic case G ≡ 0, the solution has absolutely continuous (that is, almost everywhere differentiable and of bounded variation) and, in the case of continuous f, continuously differentiable sample functions such that Ẋ_t = f(t, X_t) almost everywhere. We now cite a few new results that provide a justification for these qualitative remarks.
(7.2.2) Theorem (Goldstein [37a], p. 31). If X_t is the solution of equation (7.2.1), t₀ < t₁ < ... < tₙ = T is a partition of the interval [t₀, T], and δₙ = max_k (t_k − t_{k-1}), then

st-lim_{δₙ→0} Σ_{k=1}^n (X_{t_k} − X_{t_{k-1}})(X_{t_k} − X_{t_{k-1}})' = ∫_{t₀}^T G(s, X_s) G(s, X_s)' ds,

and, in particular,

(7.2.3) st-lim_{δₙ→0} Σ_{k=1}^n |X_{t_k} − X_{t_{k-1}}|² = ∫_{t₀}^T |G(s, X_s)|² ds.
(7.2.4) Corollary. If, for some p ∈ {1, 2, ..., d}, the inequality

Σ_{k=1}^m G_{pk}(t, X_t)² > 0

holds at almost all [λ] points t ∈ [t₀, T] with probability 1, then the p-th component of X_t is almost certainly of unbounded variation in every subinterval of [t₀, T].

Proof. The assertion follows immediately from (7.2.3), the continuity of X_t, and the inequality

Σ_{k=1}^n |X^p_{t_k} − X^p_{t_{k-1}}|² ≤ (max_{1≤k≤n} |X^p_{t_k} − X^p_{t_{k-1}}|) Σ_{k=1}^n |X^p_{t_k} − X^p_{t_{k-1}}|. ∎

The local fluctuation of X_t is described by an ellipsoid whose semiaxes are the square roots of the corresponding eigenvalues of G(t, X_t) G(t, X_t)'. Therefore, this ellipsoid depends on t and X_t (and hence also on ω) (see Arnold [30a]).

Just as in section 3.1 for W_t, we can again conclude from the last remark that X.(ω) is nondifferentiable at the instant t provided G(t, X_t(ω)) ≠ 0. The possibility of smooth behavior on the part of X_t occurs only in the case G(t, X_t) = 0. In this connection, we cite a result of Anderson ([30], p. 59):
(7.2.6) Theorem. If X_t is the solution of equation (7.2.1), if f and G are continuous for t ∈ [t₀, T], and if the initial value c is almost certainly constant, then, in the case G(t₀, c) = 0,

lim_{t→t₀} (X_t − c)/(t − t₀) = f(t₀, c)

with probability 1. Thus, differentiability of the solution of a stochastic differential equation is the exception, and nondifferentiability is the rule. The formal differential equation

Ẋ_t = f(t, X_t) + G(t, X_t) ξ_t,

where ξ_t is an m-dimensional white noise, cannot therefore as a rule be interpreted as an ordinary differential equation for the function X_t.
Suppose that the functions f(p, t, x) and G(p, t, x) depend on a parameter p and satisfy, for all p, the conditions of the existence-and-uniqueness theorem. Suppose also that the following conditions are satisfied:

a) st-lim_{p→p₀} c(p) = c(p₀),

together with the corresponding convergence of the coefficients. Then we also have

st-lim_{p→p₀} sup_{t₀≤t≤T} |X_t(p) − X_t(p₀)| = 0.

b) An analogous conclusion may be drawn in the case

dX_t(ε) = f(t, X_t(ε)) dt + ε dW_t,

where X_t(0) is the solution of the ordinary differential equation Ẋ_t = f(t, X_t) with a possibly random initial value; as ε → 0,

st-lim_{ε→0} sup_{t₀≤t≤T} |X_t(ε) − X_t(0)| = 0.

We shall now investigate the special case of a constant initial value, though at an arbitrary instant s ∈ [t₀, t]. Let X_t(s, x) denote the solution of equation (7.2.1) with initial condition X_s = x; one then studies the partial derivatives

(∂/∂x_i) X_t(s, x), (∂²/∂x_i ∂x_j) X_t(s, x)

with respect to the initial point x.
Chapter 8
Linear Stochastic Differential Equations

8.1 Introduction

For the differential equations of the type that we are studying in this book, namely,

Ẋ_t = f(t, X_t) + G(t, X_t) ξ_t,

where ξ_t is a Gaussian white noise, the right-hand member represents a linear function of the disturbance ξ_t. On the other hand, the functions f and G are in general nonlinear functions of the state X_t of the system.
Just as with ordinary differential equations, a much more complete theory can be developed in the stochastic case when the coefficient functions f(t, x) and G(t, x) are linear functions of x, especially when G is independent of x.

(8.1.1) Definition. A stochastic differential equation

dX_t = f(t, X_t) dt + G(t, X_t) dW_t

for the d-dimensional process X_t on the interval [t₀, T] is said to be linear if the functions f(t, x) and G(t, x) are linear functions of x ∈ R^d on [t₀, T] × R^d; in other words, if

f(t, x) = A(t) x + a(t),

where A(t) is (d × d)-matrix-valued and a(t) is R^d-valued, and if

G(t, x) = (B₁(t) x + b₁(t), ..., B_m(t) x + b_m(t)),

where the B_k(t) are (d × d)-matrix-valued and the b_k(t) are R^d-valued. Thus, a linear differential equation has the form

dX_t = (A(t) X_t + a(t)) dt + Σ_{k=1}^m (B_k(t) X_t + b_k(t)) dW_t^k.
dX_t = (A(t) X_t + a(t)) dt + Σ_{k=1}^m (B_k(t) X_t + b_k(t)) dW_t^k,

where the driving Gaussian noise has covariance matrix Q(t).
By virtue of remark (5.2.4), such a process (we assume integrability of Q(t) over
the interval [t_0, T]) can be represented as

Y_t = ∫_{t_0}^t Q(s)^{1/2} dW_s,

that is,

dY_t = Q(t)^{1/2} dW_t,
where the ξ_i(t) are in general correlated Gaussian noise processes with covariance
matrix Q(t). An nth-order scalar linear equation

X_t^{(n)} = Σ_{k=1}^n (b_k(t) + ξ_k(t)) X_t^{(k−1)} + (b_{n+1}(t) + ξ_{n+1}(t))

is also covered: setting Y_t = (X_t, Ẋ_t, ..., X_t^{(n−1)})' and d = n, we obtain a
first-order system; specifically,

dX_t^i = X_t^{i+1} dt,  i = 1, ..., n−1,
128 8. Linear Stochastic Differential Equations
(8.1.5) Theorem. The linear stochastic differential equation

dX_t = (A(t) X_t + a(t)) dt + Σ_{k=1}^m (B_k(t) X_t + b_k(t)) dW_t^k

has, for every initial value X_{t_0} = c that is independent of W_t − W_{t_0} (where t ≥ t_0),
a unique continuous solution throughout the interval [t_0, T], provided only the
functions A(t), a(t), B_i(t), and b_i(t) are measurable and bounded on that inter-
val. If this assumption holds on every subinterval of [t_0, ∞), there exists a unique
global solution (i.e., one defined for all t ∈ [t_0, ∞)).
(8.1.6) Corollary. A global solution always exists for the autonomous linear dif-
ferential equation
dX_t = (A X_t + a) dt + Σ_{i=1}^m (B_i X_t + b_i) dW_t^i,  X_{t_0} = c

(with coefficients A, a, B_i, and b_i independent of t).
We now wish, if possible, to get a closed and explicit expression for this solution
and to investigate it.
We first treat the case of an additive disturbance B(t) ξ_t
(where B(t) is a d × m matrix and ξ_t is an m-dimensional white noise) that is in-
dependent of the state of the system; that is, we shall investigate equations of
8.2 Linear Equations in the Narrow Sense 129
the form
(8.2.1)  dX_t = (A(t) X_t + a(t)) dt + B(t) dW_t.

Here, we have combined the m vectors b_i appearing in definition (8.1.1) into a
single d × m matrix B = (b_1, ..., b_m). If the functions A(t), a(t), and B(t) are
measurable and bounded on [t_0, T] (as we shall assume to be the case in what
follows), there exists, by virtue of Theorem (8.1.5), for every initial value X_{t_0} = c,
a unique solution.
Let us review a few familiar items regarding deterministic linear systems (B(t) ≡ 0)
(see, for example, Bucy and Joseph [6], p. 5).
The matrix Φ(t) = Φ(t, t_0) of solutions of the homogeneous equation

Ẋ_t = A(t) X_t

with the unit vectors c = e_i in the x_i-direction as initial values, in other words, the so-
lution of the matrix equation

Φ̇(t) = A(t) Φ(t),  Φ(t_0) = I,

is called the fundamental matrix of the system

Ẋ_t = A(t) X_t + a(t).

The solution with initial value X_{t_0} = c can be represented with the aid of Φ(t) in
the following form:

X_t = Φ(t) (c + ∫_{t_0}^t Φ(s)^{-1} a(s) ds).
With this knowledge, we can now easily determine the solution of the "nonho-
mogeneous" equation (8.2.1):
(8.2.2) Theorem. The linear (in the narrow sense) stochastic differential equa-
tion (8.2.1) with initial value X_{t_0} = c has on [t_0, T] the solution

(8.2.3)  X_t = Φ(t) (c + ∫_{t_0}^t Φ(s)^{-1} a(s) ds + ∫_{t_0}^t Φ(s)^{-1} B(s) dW_s).

Indeed, writing X_t = Φ(t) Y_t, where Y_t denotes the parenthesized process, we obtain

dX_t = Φ̇(t) Y_t dt + Φ(t) dY_t
     = A(t) Φ(t) Y_t dt + a(t) dt + B(t) dW_t
     = (A(t) X_t + a(t)) dt + B(t) dW_t.
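The representation (8.2.3) can be checked numerically. The sketch below (scalar case, with illustrative constants not taken from the text) builds an Euler–Maruyama path of dX_t = (AX_t + a) dt + b dW_t and evaluates the explicit solution with the same Brownian increments; the two agree to O(Δt):

```python
import numpy as np

# Compare an Euler-Maruyama path of dX = (A*X + a) dt + b dW with the
# explicit solution X_t = Phi(t)(c + int Phi(s)^{-1} a ds + int Phi(s)^{-1} b dW_s),
# where Phi(t) = exp(A t), using the same Brownian increments.
rng = np.random.default_rng(0)
A, a, b, c = -1.0, 0.5, 0.3, 2.0     # illustrative constants
T, n = 1.0, 20000
dt = T / n
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(dt), n)

x = np.empty(n + 1); x[0] = c        # Euler-Maruyama recursion
for k in range(n):
    x[k + 1] = x[k] + (A * x[k] + a) * dt + b * dW[k]

phi_inv = np.exp(-A * t[:-1])        # discretize the two integrals in (8.2.3)
drift_int = np.cumsum(phi_inv * a * dt)
noise_int = np.cumsum(phi_inv * b * dW)
x_exact = np.exp(A * t) * np.concatenate(([c], c + drift_int + noise_int))

err = np.max(np.abs(x - x_exact))
print(err)  # small: both discretizations agree to O(dt)
```

Because the noise enters additively, the Euler scheme here is strongly convergent of order 1, so the discrepancy shrinks linearly with the step size.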
In the above representation of the solution, we see clearly that the value X_t is a
functional (uniquely determined by the coefficients A(t), a(t), and B(t)) of c
and W_s − W_{t_0} for t_0 ≤ s ≤ t (see remark (6.3.10)).
We mention in particular the following special cases:
(8.2.4) Corollary. If the matrix A(t) ≡ A in equation (8.2.1) is independent of
t, then Φ(t) = e^{A(t − t_0)}, and hence

X_t = exp(∫_{t_0}^t A(s) ds) (c + ∫_{t_0}^t exp(−∫_{t_0}^s A(u) du) (a(s) ds + B(s) dW_s)).
equation for K, which we could have obtained directly from the stochastic differ-
ential equation. The differential equation for K(t) = K(t)' satisfies the Lipschitz
and boundedness conditions on [t_0, T], so that a unique solution exists.
Equation (8.2.8) therefore represents (in view of the symmetry of K) a system of
d(d + 1)/2 linear equations.
(8.2.9) Remark. Of particular interest is the behavior of

E|X_t − EX_t|^2 = E|X_t|^2 − |m_t|^2 = tr K(t) = Σ_{i=1}^d K_{ii}(t).

By using the relationship tr A A' = |A|^2, we obtain from the formula for K(s, t)

tr K(t) = E |Φ(t)(c − Ec)|^2 + ∫_{t_0}^t |Φ(t) Φ(s)^{-1} B(s)|^2 ds.
The process X_t is itself Gaussian if and only if the initial value c is normally dis-
tributed (or constant). We record this important special case:
(8.2.10) Theorem. The solution (8.2.3) of the linear equation

dX_t = (A(t) X_t + a(t)) dt + B(t) dW_t,  X_{t_0} = c,

is a Gaussian stochastic process if and only if c is normally distributed or con-
stant. The mean value m_t and the covariance matrix K(s, t) = E(X_s − m_s)(X_t − m_t)' are
given in Theorem (8.2.6). The process X_t has independent increments if and only
if c is constant or A(t) ≡ 0 (that is, Φ(t) ≡ I).
Now that we know the process is Gaussian in the case of normally distributed c,
the question arises as to when it is stationary. A necessary and sufficient condi-
tion for this is
m_t = const,  K(s, t) = K(s − t).
K(0) = E cc'.

The matrix equation (8.2.11) has a nonnegative-definite solution K(0), namely,

K(0) = ∫_0^∞ e^{At} B B' e^{A't} dt,

and then

K(s, t) = e^{A(s−t)} K,   s ≥ t ≥ t_0,
K(s, t) = K e^{A'(t−s)},  t ≥ s ≥ t_0.
Obviously, under the above conditions, the process X, is stationary in the wide
sense with the above first and second moments even when c is not normally dis-
tributed but E c = 0 and E c c'= K.
In accordance with (8.2.6), X_t has, in the case E c^2 < ∞, mean value

m_t = E X_t = e^{−at} E c

and covariance

K(s, t) = E(X_s − m_s)(X_t − m_t) = e^{−a(s+t)} (Var(c) + σ^2 (e^{2a min(t, s)} − 1)/2a).

In particular,

K(t, t) = Var(X_t) = e^{−2at} Var(c) + σ^2 (1 − e^{−2at})/2a.
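The variance formula above can be checked by simulation. The sketch below (parameter values are illustrative, not from the text) propagates many Ornstein–Uhlenbeck paths with the exact Gaussian one-step transition and compares the sample variance with e^{−2at} Var(c) + σ²(1 − e^{−2at})/2a:

```python
import numpy as np

# Monte Carlo check of the Ornstein-Uhlenbeck variance
# K(t,t) = e^{-2at} Var(c) + sigma^2 (1 - e^{-2at}) / (2a).
rng = np.random.default_rng(1)
a, sigma = 1.5, 0.8                   # illustrative constants
dt, n_steps, n_paths = 0.01, 300, 100_000
x = rng.normal(0.0, 1.0, n_paths)     # c ~ N(0, 1), so Var(c) = 1
decay = np.exp(-a * dt)
step_sd = np.sqrt(sigma**2 * (1 - np.exp(-2 * a * dt)) / (2 * a))
for _ in range(n_steps):              # exact Gaussian transition per step
    x = decay * x + step_sd * rng.normal(0.0, 1.0, n_paths)
t = n_steps * dt
predicted = np.exp(-2 * a * t) * 1.0 + sigma**2 * (1 - np.exp(-2 * a * t)) / (2 * a)
print(x.var(), predicted)  # agree up to Monte Carlo error
```

For large t the variance settles at the stationary value σ²/2a, as the formula predicts.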
8.3 The Ornstein-Uhlenbeck Process 135
For arbitrary c,

ac-lim_{t→∞} e^{−at} c = 0,

and the displacement

Y_t = Y_0 + ∫_0^t X_s ds

has mean value

E Y_t = E Y_0 + (1 − e^{−at}) E c / a,

so that

E Y_t → E Y_0,
E(Y_s − E Y_s)(Y_t − E Y_t) → Var(Y_0) + 2 D min(s, t);

that is, all finite-dimensional distributions of the Ornstein-Uhlenbeck process Y_t
converge to the distributions of the Gaussian process

Y_t^{(0)} = Y_0 + √(2D) W_t.
But this is the Wiener process that starts at Y_0, multiplied by √(2D). In this sense,
the Wiener process approximates the Ornstein-Uhlenbeck process. The sample
functions of Y_t possess a derivative, namely X_t. In the Ornstein-Uhlenbeck theory
of Brownian motion, the particle therefore possesses a continuous velocity (but no
acceleration), which ceases to exist when we shift to the Wiener model.
(8.3.1) Remark. An analogous electrical problem leads formally to the same
Langevin equation. Let X_t denote the current in an inductance-resistance circuit.
Then,

L Ẋ_t + R X_t = σ ξ_t,

where ξ_t is a rapidly fluctuating electromotive force generated by the thermal
noise and again idealizable as a "white noise".
(8.3.2) Remark. The Langevin equation for the position X_t of a Brownian parti-
cle in an external force field K(t, x) is

Ẍ_t + β Ẋ_t = K(t, X_t) + σ ξ_t.

By setting V_t = Ẋ_t, we obtain from this equation the system

dV_t = (−β V_t + K(t, X_t)) dt + σ dW_t,
dX_t = V_t dt.

We can find a closed solution in the case of a harmonic oscillator, for which
K(t, x) = −ν^2 x, so that the corresponding equation is linear (see Chandrasekhar
[32], pp. 27-30).
(8.4.1)  dX_t = (A(t) X_t + a(t)) dt + Σ_{i=1}^m (B_i(t) X_t + b_i(t)) dW_t^i,  X_{t_0} = c.

All the quantities in this equation (except W_t ∈ R^m) are scalar functions. Suppose
that the coefficients A, a, B_i, and b_i are measurable and bounded on the interval
[t_0, T], so that there always exists a unique solution X_t, which we shall now de-
termine explicitly.
(8.4.2) Theorem. Equation (8.4.1) has the solution

X_t = Φ_t (c + ∫_{t_0}^t Φ_s^{-1} (a(s) − Σ_{i=1}^m B_i(s) b_i(s)) ds + Σ_{i=1}^m ∫_{t_0}^t Φ_s^{-1} b_i(s) dW_s^i),
8.4 The General Scalar Linear Equation 137
where

Φ_t = exp(∫_{t_0}^t (A(s) − Σ_{i=1}^m B_i(s)^2/2) ds + Σ_{i=1}^m ∫_{t_0}^t B_i(s) dW_s^i).

In the proof, we write

X_t = u(Y_t, Z_t),

where u is defined by

u(x, y) = e^x y.
Application of formula (5.3.9b) then yields the assertion.
We note that, by virtue of the law of large numbers for W_t, we have in the last
case for arbitrary c

ac-lim_{t→∞} X_t = 0

provided

A < Σ_{i=1}^m B_i^2/2.
In general, in the homogeneous case, X_t has, for all t ∈ [t_0, T], the same sign as c.
Let us now calculate the moments of X_t. For this we use
(8.4.4) Lemma. If X is 𝔑(a, σ^2)-distributed, then, for every p > 0,

E (e^X)^p = e^{pa + p^2 σ^2/2}.

Proof.

(1/√(2π) σ) ∫_{−∞}^{∞} exp(px − (x − a)^2/2σ^2) dx = exp(pa + p^2 σ^2/2).
a)

m_t = E X_t = Φ̄_t (E c + ∫_{t_0}^t Φ̄_s^{-1} a(s) ds),

where

Φ̄_t = exp(∫_{t_0}^t A(s) ds),
(the fluctuation terms contribute Σ_{i=1}^m b_i(t)^2 to the second-moment equation).
In the homogeneous case we have

E |X_t|^p = E |c|^p · E Φ_t^p,
which is finite if and only if E|c|^p is finite. In the nonhomogeneous case, we add to the
solution of the homogeneous equation only terms with finite moments of every
order, so that it is a matter only of E|c|^p. We obtain the form of m_t immediately
from (8.4.2) by using Lemma (8.4.4) with p = 1, and we obtain the differential
equation for m_t either by differentiating this result or directly from the integral
form of equation (8.4.1).
From example (5.4.9), we have
If we take the expectation on both sides of the integral form of this equation, we
get
For the autonomous homogeneous equation

dX_t = A X_t dt + Σ_{i=1}^m B_i X_t dW_t^i,  X_{t_0} = c,

the moments can be written down explicitly.
8.5 The General Vector Linear Equation

We now consider the general linear equation

(8.5.1)  dX_t = (A(t) X_t + a(t)) dt + Σ_{i=1}^m (B_i(t) X_t + b_i(t)) dW_t^i,

where A(t) and B_i(t) are d × d matrices, a(t) and b_i(t) are R^d-valued functions,
and W_t = (W_t^1, ..., W_t^m)' is an m-dimensional Wiener process. By Theorem (8.1.5),
there exists, for every initial value c that is independent of W_t − W_{t_0} for t ∈ [t_0,
T], a unique solution of (8.5.1) on the interval [t_0, T], provided the coefficients
A, a, B_i, and b_i are measurable bounded functions on that interval, as we shall
always assume.
We now model the general solution of (8.5.1) after the scalar case d = 1, so that
it will include the case treated in section 8.2 as a special case.
(8.5.2) Theorem. The linear stochastic differential equation (8.5.1) with initial
value X_{t_0} = c has on [t_0, T] the solution

(8.5.3)  X_t = Φ_t (c + ∫_{t_0}^t Φ_s^{-1} (a(s) − Σ_{i=1}^m B_i(s) b_i(s)) ds + Σ_{i=1}^m ∫_{t_0}^t Φ_s^{-1} b_i(s) dW_s^i).

Here, Φ_t is the fundamental matrix of the corresponding homogeneous equation,
and X_t = Φ_t Z_t with

Z_t = c + ∫_{t_0}^t Φ_s^{-1} dY_s,  dY_t = (a(t) − Σ_{i=1}^m B_i(t) b_i(t)) dt + Σ_{i=1}^m b_i(t) dW_t^i.

An application of Ito's theorem to Φ_t Z_t yields

dX_t = dY_t + A(t) X_t dt + Σ_{i=1}^m B_i(t) X_t dW_t^i + Σ_{i=1}^m B_i(t) b_i(t) dt
     = (a(t) + A(t) X_t) dt + Σ_{i=1}^m (B_i(t) X_t + b_i(t)) dW_t^i.
Here, the first term on the right is the general solution of the corresponding homo-
geneous equation (which here, in contrast with Theorem (8.2.2), is in general also a
stochastic process), and the second term is the particular solution of the nonho-
mogeneous equation corresponding to the initial value X_{t_0} = 0. For d = 1, Φ_t is
given explicitly in Theorem (8.4.2). For B_i(t) ≡ 0, Theorem (8.5.2) reduces
to Theorem (8.2.2).
Let us look again at the first-order ordinary differential equations that the first
two moments of the solution must satisfy.
(8.5.5) Theorem. For the solution (8.5.3) of the linear stochastic differential
equation (8.5.1), we have under the assumption E Ic12 <00,
a) E X_t = m_t is the unique solution of the equation

ṁ_t = A(t) m_t + a(t),  m_{t_0} = E c.

b) E X_t X_t' = P(t) is the unique nonnegative-definite symmetric solution of the
equation
8.5 The General Vector Linear Equation 143
(8.5.6)  Ṗ(t) = A(t) P(t) + P(t) A(t)' + a(t) m_t' + m_t a(t)'
         + Σ_{i=1}^m (B_i(t) P(t) B_i(t)' + B_i(t) m_t b_i(t)' + b_i(t) m_t' B_i(t)' + b_i(t) b_i(t)'),
         P(t_0) = E cc'.

For the proof, one uses

d(X_t X_t') = X_t dX_t' + (dX_t) X_t' + Σ_{i=1}^m (B_i(t) X_t + b_i(t)) (X_t' B_i(t)' + b_i(t)') dt
            + Σ_{i=1}^m (X_t X_t' B_i(t)' + X_t b_i(t)' + B_i(t) X_t X_t' + b_i(t) X_t') dW_t^i.
This formula is obtained from Ito's theorem. Both equations have unique solutions
on the interval [t_0, T] since the right-hand members satisfy the boundedness and
Lipschitz conditions. Since

P(t) = P(t)',

(8.5.6) represents a system of d(d + 1)/2 linear equations. The solution P(t), be-
ing the second-moment matrix of X_t, is of course nonnegative-definite.
(8.5.7) Remark. The function m_t = E X_t is independent of the fluctuational
part (that is, independent of the B_i and the b_i) of equation (8.5.1).
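Remark (8.5.7) can be illustrated numerically: the mean of the solution satisfies the deterministic equation ṁ = Am + a whatever the noise coefficients are. The sketch below (scalar case, illustrative constants not from the text) simulates an equation with strong multiplicative noise and compares the sample mean with the ODE solution:

```python
import numpy as np

# The mean of dX = (A X + a) dt + (B X + b) dW solves m' = A m + a,
# independently of B and b; check by Euler-Maruyama Monte Carlo.
rng = np.random.default_rng(3)
A, a, B, b = -0.8, 0.4, 0.9, 0.2      # strong noise on purpose; illustrative
T, n, paths = 1.0, 500, 50_000
dt = T / n
x = np.full(paths, 1.0)               # constant initial value c = 1
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt), paths)
    x = x + (A * x + a) * dt + (B * x + b) * dW
m_ode = np.exp(A * T) * 1.0 + (a / A) * (np.exp(A * T) - 1.0)
print(x.mean(), m_ode)  # close: the noise terms do not shift the mean
```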
(8.5.8) Remark. We have

E|X_t|^2 = tr P(t) = Σ_{i=1}^d P_{ii}(t).

However, the differential equation for E|X_t|^2, which follows from (8.5.6), con-
tains in general in its right-hand member all other elements of the matrix P(t).
(8.5.9) Remark. For the homogeneous equation, Ito's theorem yields

dΦ_t = A(t) Φ_t dt + Σ_{i=1}^m B_i(t) Φ_t dW_t^i;

that is, Φ_t satisfies the homogeneous equation.
Chapter 9
The Solutions of Stochastic Differential
Equations as Markov and
Diffusion Processes
9.1 Introduction
In the preceding three chapters, we have constructed and examined in a sort of
"stochastic analysis" the solutions of stochastic differential equations. This puts
us in a position to calculate explicitly, for given sample functions of the initial value
and of the Wiener process, an arbitrarily accurate solution trajectory, for exam-
ple, with the aid of the iteration procedure used in the proof of Theorem (6.2.2).
On the other hand, the solution X_t is a stochastic process on the interval [t_0, T]
and, as such, it can be regarded as a set of compatible finite-dimensional distri-
butions

P(X_{t_1} ∈ B_1, ..., X_{t_n} ∈ B_n) = P_{t_1, ..., t_n}(B_1, ..., B_n).

In accordance with Theorem (2.2.5), for the important class of Markov processes
all these distributions can be obtained from the initial probability

P(X_{t_0} ∈ B) = P_{t_0}(B)

and the transition probability

P(X_t ∈ B | X_s = x) = P(s, x, t, B),  t_0 ≤ s < t ≤ T.

Specifically,

P(X_{t_1} ∈ B_1, ..., X_{t_n} ∈ B_n) = ∫_{R^d} ∫_{B_1} ... ∫_{B_{n-1}} P(t_{n-1}, x_{n-1}, t_n, B_n)
  · P(t_{n-2}, x_{n-2}, t_{n-1}, dx_{n-1}) ... P(t_1, x_1, t_2, dx_2) P(t_0, x_0, t_1, dx_1) P_{t_0}(dx_0),

t_0 < t_1 < ... < t_n ≤ T,  B_i ∈ 𝔅^d.
Stochastic differential equations owe their significance and expanding study not
least to the fact that, as we shall show, their solutions are Markov processes. There-
fore, we have for them the powerful analytical tools developed for Markov pro-
cesses at our disposal. The keystone of the Markov property of the solution pro-
cesses is the fact that a white noise ξ_t appearing in the form

Ẋ_t = f(t, X_t) + G(t, X_t) ξ_t

of the stochastic differential equation

dX_t = f(t, X_t) dt + G(t, X_t) dW_t
is a process with independent values at every point.
Furthermore, in many cases X, is in fact a diffusion process whose drift vector and
diffusion matrix can be read in the simplest conceivable manner from the equation
(see section 9.3).
The validity of (9.2.4) therefore follows from the stronger equation

(9.2.5)  P(X_t ∈ B | 𝔉_s) = P(X_t ∈ B | X_s)

by virtue of (1.7.1).
Furthermore, instead of (9.2.5), it will be sufficient to prove the following: For
every scalar bounded measurable function h(x, ω) defined on R^d × Ω for which
h(x, ·) is, for every fixed x, a random variable independent of 𝔉_s, we have

(9.2.6)  E(h(X_s, ω) | 𝔉_s) = E(h(X_s, ω) | X_s) = H(X_s)

with H(x) = E h(x, ω). This is true because, if we choose
h(x, ω) = I_B(X_t(s, x)(ω)).

For the linear equation of section 8.2, for example, the transition probability
P(s, x, t, ·) has the mean vector

∫_{R^d} y P(s, x, t, dy) = E X_t(s, x) = m_t(s, x)

and the covariance matrix

K_t(s, x) = Φ(t, s) ∫_s^t Φ(u, s)^{-1} B(u) B(u)' (Φ(u, s)^{-1})' du Φ(t, s)'.
For the homogeneous scalar linear equation, we obtain

P(s, x, t, (0, y)) = P[X_t(s, x) ≤ y]
  = P[Σ_{i=1}^m ∫_s^t B_i(u) dW_u^i ≤ log(y/x) − ∫_s^t (A(u) − Σ_{i=1}^m B_i(u)^2/2) du]
  = (√(2π) σ)^{-1} ∫_{−∞}^z e^{−u^2/2σ^2} du,

where

z = log(y/x) − ∫_s^t (A(u) − Σ_{i=1}^m B_i(u)^2/2) du

and

σ^2 = Σ_{i=1}^m ∫_s^t B_i(u)^2 du.

In the autonomous case,

z = log(y/x) − (A − Σ_{i=1}^m B_i^2/2)(t − s)

and

σ^2 = Σ_{i=1}^m B_i^2 (t − s).
E_{s,x} g(X_t) = ∫_{R^d} g(y) P(s, x, t, dy),

and

lim_{t↓s} (t − s)^{-1} ∫_{R^d} |y − x|^4 P(s, x, t, dy) = 0.
To prove that X_t is a diffusion process with given drift and diffusion coefficients,
it is therefore sufficient, by virtue of remark (2.5.2), to show that

(9.3.2)  E(X_t(s, x) − x) = f(s, x)(t − s) + o(t − s)

and

(9.3.3)  E(X_t(s, x) − x)(X_t(s, x) − x)' = G(s, x) G(s, x)' (t − s) + o(t − s).

Using the Schwarz inequality, the Lipschitz condition, and inequality (7.1.4), we
obtain for c = x and n = 1

|E ∫_s^t (f(u, X_u(s, x)) − f(u, x)) du| ≤ ∫_s^t E |f(u, X_u(s, x)) − f(u, x)| du = (t − s)^{3/2} O(1).
from the sample functions of W_t? After all, in accordance with Remark (6.1.4), this
equation represents a transformation that maps W_·(ω) and X_{t_0}(ω) into X_·(ω).
Sufficient conditions for this can be found, for example, in Prohorov and Ro-
zanov [15], pp. 261-262, or Gikhman and Skorokhod [36], p. 70.
If, for a given diffusion process X_t with drift vector f(t, x) and diffusion matrix
B(t, x), we wish to find a stochastic differential equation whose solution coin-
cides with X_t only in the initial distribution P_{t_0} and the transition probabilities
P(s, x, t, B) (and hence in all finite-dimensional distributions), in other words,
if we wish to reproduce not the given realizations of X_t but only their distri-
butions, we proceed as follows: We choose a probability space (Ω, 𝔄, P) on
which an m-dimensional Wiener process W_t and a random variable c independent
of W_t − W_{t_0} for t ≥ t_0 with distribution P_{t_0} can be defined, and we consider the
stochastic differential equation

(9.3.6)  dY_t = f(t, Y_t) dt + G(t, Y_t) dW_t,  Y_{t_0} = c,  t_0 ≤ t ≤ T.

Here, G(t, x) is a d × m matrix with the property

(9.3.7)  B(t, x) = G(t, x) G(t, x)'.
Now, there are various possibilities for decomposing a given symmetric nonnega-
tive-definite d × d matrix B(t, x) in the form (9.3.7), so that the coefficient G in
equation (9.3.6) is not uniquely determined. If we represent B in the form

B = U Λ U'

(where Λ is the diagonal matrix of the eigenvalues λ_i ≥ 0 (arranged in increasing
order) and U is the orthogonal d × d matrix of the column eigenvectors u_i of B
[see remark (5.2.4)]), then the choice d = m and

G = U Λ^{1/2} U' = B^{1/2}

yields again a symmetric nonnegative-definite matrix G, while

G = (√λ_1 u_1, ..., √λ_d u_d)

has the advantage of pointing the column vectors of G in the direction of the eigen-
vectors of B. If k of the λ_i are identically equal to 0, we can go on to the d × m
matrix

G = (√λ_{k+1} u_{k+1}, ..., √λ_d u_d),  m = d − k.
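Both factorizations are easy to compute. The sketch below (one illustrative matrix B, not from the text) builds the symmetric square root G = UΛ^{1/2}U' and the eigenvector-aligned factor G = (√λ_i u_i) and verifies GG' = B for each:

```python
import numpy as np

# Two factorizations B = G G' as in (9.3.7), from the eigendecomposition
# B = U Lambda U' of a symmetric nonnegative-definite matrix.
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # illustrative diffusion matrix
lam, U = np.linalg.eigh(B)            # eigenvalues in increasing order

# Choice 1: symmetric square root G = U Lambda^{1/2} U' = B^{1/2}
G1 = U @ np.diag(np.sqrt(lam)) @ U.T

# Choice 2: columns along the eigenvectors, G = (sqrt(lam_i) u_i)
G2 = U * np.sqrt(lam)                 # scales column i of U by sqrt(lam_i)

print(np.allclose(G1 @ G1.T, B), np.allclose(G2 @ G2.T, B))  # True True
```

`numpy.linalg.eigh` returns the eigenvalues in ascending order, matching the convention in the text.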
This lack of uniqueness is not important, however, if the given diffusion process
is uniquely determined by its parameters f(t, x) and B(t, x) (in the sense of
unique determination of the transition probabilities P(s, x, t, B) by f and B).
For this nontrivial property (we have obtained f and B from the first and second
moments of P(s, x, t, B) only!) we have given a sufficient condition in remark
(2.6.5). If the given process is uniquely determined by f and B, then all equations
of the form (9.3.6) in which G is chosen on the basis of (9.3.7) and which satisfy
the assumptions of Theorem (9.3.1) lead to the same diffusion process. In parti-
cular, for homogeneous diffusion processes, an autonomous equation can always
be found as a dynamic model.
Summing up, we may say that the solutions of stochastic differential equations
and diffusion processes represent essentially the same classes of processes despite
their completely different definitions.
E_{s,x} g(t, X_t) = ∫_{R^d} g(t, y) P(s, x, t, dy).

Therefore,

E_{s,x} g(t, X_t) = E g(t, X_t(s, x)),

and, in particular, we obtain the differential operator

(9.4.2)  𝔇 = Σ_{i=1}^d f_i(s, x) ∂/∂x_i + (1/2) Σ_{i=1}^d Σ_{j=1}^d b_{ij}(s, x) ∂^2/(∂x_i ∂x_j),

B(s, x) = (b_{ij}(s, x)) = G(s, x) G(s, x)',  f = (f_1, ..., f_d),
9.4 Transition Probabilities 157
where the derivatives are evaluated at the point (s, x). The infinitesimal operator
A, which uniquely determines the transition probability of X_t, is defined in ac-
cordance with (2.4.10) as the uniform limit

(9.4.3)  A g(s, x) = lim_{t↓0} (E_{s,x} g(s + t, X_{s+t}) − g(s, x))/t,

and one obtains

(9.4.4)  A g = L g = ∂g/∂s + 𝔇 g,

which holds for all functions g defined on [t_0, T] × R^d that have continuous first
partial derivatives with respect to t and continuous second partial derivatives with
respect to the components of x and that, together with their derivatives, do not,
as a function of x, increase faster than some fixed power of |x|. This is easily seen
from (9.4.3) by replacing g(t + s, X_{t+s}(s, x)) in that expression with the sto-
chastic integral of Ito's theorem, namely (for brevity, we write X_{t+s} for X_{t+s}(s, x)),

g(t + s, X_{t+s}) = g(s, x) + ∫_s^{t+s} g_s(u, X_u) du + Σ_{i=1}^d ∫_s^{t+s} f_i(u, X_u) g_{x_i}(u, X_u) du
  + (1/2) Σ_{i=1}^d Σ_{j=1}^d ∫_s^{t+s} b_{ij}(u, X_u) g_{x_i x_j}(u, X_u) du
  + Σ_{i=1}^d Σ_{k=1}^m ∫_s^{t+s} g_{x_i}(u, X_u) G_{ik}(u, X_u) dW_u^k,

and taking the limit. If g depends only on x, we have in the homogeneous case,
in place of (9.4.4),

L = 𝔇.

The limit in (9.4.3) exists and is uniform if g has the above-listed properties and
vanishes identically outside a bounded subset of [t_0, T] × R^d. Such functions,
therefore, belong to the domain of definition D_A of A. For them,

A g = ∂g/∂s + 𝔇 g,

and, in the case of homogeneous processes,

A g = 𝔇 g.
Thus, we have found the form of the infinitesimal operator of the solution of the
stochastic differential equation (9.4.1). In this case, the solution is uniquely de-
termined by f and G.
In remark (2.6.5), we showed how the transition probabilities of a diffusion pro-
cess can be found in theory from knowledge of the functions

u(s, x) = E g(X_t(s, x)),

where t is fixed and g ranges over a set of functions that is dense in the space
C(R^d) of continuous bounded functions defined on R^d. For given g, we can
calculate u(s, x) from Kolmogorov's backward equation. This equation is valid
here under the following assumptions:
(9.4.4) Theorem. Suppose that the assumptions of Theorem (9.3.1) are satis-
fied for equation (9.4.1). Suppose also that the coefficients f and G have con-
tinuous bounded first and second partial derivatives with respect to the compon-
ents of x. Then, if g(x) is a continuous bounded function with continuous
bounded first and second partial derivatives, the function

u(s, x) = E g(X_t(s, x)),  t_0 ≤ s ≤ t ≤ T,  x ∈ R^d,

and its first and second partial derivatives with respect to x and its first deriva-
tive with respect to s are continuous and bounded. Also, the backward equation

(9.4.5)  ∂u(s, x)/∂s = −𝔇 u(s, x),

where 𝔇 is the differential operator (9.4.2), with the end condition

lim_{s↑t} u(s, x) = g(x),

is valid.
The proof of these assertions follows, on the basis of remark (7.3.7), from
Theorem (2.6.3).
Instead of solving the backward equation (9.4.5) for a set of end values g that is
dense in C(R^d), we can confine our attention to the family

g(x) = e^{iλ'x},  λ ∈ R^d.

We obtain

u(s, x) = E exp(i λ' X_t(s, x)),

that is, the characteristic function of X_t(s, x), which determines uniquely the
probability distribution of X_t(s, x), namely, P(s, x, t, ·).
If P(s, x, t, B) has a density p(s, x, t, y), we can get equations for p(s, x, t, y)
itself from Theorems (2.6.6) and (2.6.9). The density is a fundamental solution of
the backward equation; that is, for fixed t and y and for s < t,
(9.4.6)  ∂/∂s p(s, x, t, y) + Σ_{i=1}^d f_i(s, x) ∂/∂x_i p(s, x, t, y)
         + (1/2) Σ_{i=1}^d Σ_{j=1}^d b_{ij}(s, x) ∂^2/(∂x_i ∂x_j) p(s, x, t, y) = 0.
However, these laws of development for p are valid only under certain assump-
tions regarding the coefficients f and B = G G' (see section 2.6).
We refer to a theorem of Gikhman and Skorokhod ([36], pp. 96-99) that, for
the scalar case, gives sufficient conditions for the existence of a density with certain
analytical properties.
(9.4.8) Example. In accordance with section 8.2, the scalar autonomous linear
stochastic differential equation (d = m = 1)

dX_t = (A X_t + a) dt + b dW_t,  X_0 = c,  t ≥ 0,

has the solution

X_t = e^{At} (c + ∫_0^t e^{−As} (a ds + b dW_s)).

Special cases are the Ornstein-Uhlenbeck process (a = 0), the deterministic linear
equation with random initial value (b = 0), and the Wiener process (A = a = 0, b =
1, c = 0). For b = 0, X_t has a density if c has a density, but the transition proba-
bilities degenerate to the point mass

P(s, x, t, ·) = δ_y(·),  y = x e^{A(t−s)} + a (e^{A(t−s)} − 1)/A.
For b ≠ 0, the density p(s, x, t, y) is obtained from the backward equation

∂/∂s p(s, x, t, y) + (A x + a) ∂/∂x p(s, x, t, y) + (1/2) b^2 ∂^2/∂x^2 p(s, x, t, y) = 0

or from the forward equation

∂/∂t p(s, x, t, y) + ∂/∂y ((A y + a) p(s, x, t, y)) − (1/2) ∂^2/∂y^2 (b^2 p(s, x, t, y)) = 0
as the fundamental solution. As boundary condition, we assume that p and its
partial derivatives with respect to x and y vanish as |x| → ∞ and |y| → ∞,
respectively. We know from example (9.2.12) that

p(s, x, t, y) = (2π K_t(s, x))^{−1/2} exp(−(y − m_t(s, x))^2 / 2 K_t(s, x)),

where

m_t(s, x) = x e^{A(t−s)} + a (e^{A(t−s)} − 1)/A,
K_t(s, x) = b^2 (e^{2A(t−s)} − 1)/2A;

that is, p(s, x, t, y) is the density of 𝔑(m_t(s, x), K_t(s, x)). In the special case
A = 0, we have

m_t(s, x) = x + a(t − s),  K_t(s, x) = b^2 (t − s).
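The Gaussian transition parameters can be verified by simulation. The sketch below (illustrative constants, not from the text) runs Euler–Maruyama paths of dX = (Ax + a) dt + b dW from a fixed starting point x and compares the sample mean and variance with m_t(s, x) and K_t(s, x):

```python
import numpy as np

# Check the transition moments of dX = (A x + a) dt + b dW against
# m_t(s,x) = x e^{A(t-s)} + a (e^{A(t-s)} - 1)/A and
# K_t(s,x) = b^2 (e^{2A(t-s)} - 1)/(2A).
rng = np.random.default_rng(4)
A, a, b = -0.5, 1.0, 0.7               # illustrative constants
x0, T, n, paths = 2.0, 1.0, 500, 50_000
dt = T / n
x = np.full(paths, x0)                 # fixed initial state x
for _ in range(n):
    x = x + (A * x + a) * dt + b * rng.normal(0.0, np.sqrt(dt), paths)
m = x0 * np.exp(A * T) + a * (np.exp(A * T) - 1.0) / A
K = b**2 * (np.exp(2 * A * T) - 1.0) / (2 * A)
print(x.mean(), m)   # sample mean close to m_t(s, x)
print(x.var(), K)    # sample variance close to K_t(s, x)
```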
The forward equation corresponding to the general linear equation

dX_t = (a(t) + A(t) X_t) dt + Σ_{i=1}^m (B_i(t) X_t + b_i(t)) dW_t^i,

with drift vector

f(t, x) = a(t) + A(t) x

and diffusion matrix

B(t, x) = Σ_{i=1}^m (B_i x x' B_i' + B_i x b_i' + b_i x' B_i' + b_i b_i')

(see example (9.3.4)), becomes, for the density p(s, x, t, y) and t > s,

∂/∂t p(s, x, t, y) + Σ_{i=1}^d a_i(t) ∂/∂y_i p(s, x, t, y) + Σ_{i=1}^d Σ_{j=1}^d A_{ij}(t) ∂/∂y_i (y_j p)
  − (1/2) Σ_{i=1}^d Σ_{j=1}^d ∂^2/(∂y_i ∂y_j) (b_{ij}(t, y) p(s, x, t, y)) = 0.
We know from example (9.2.12) that the solution of this equation is the density
of a normal distribution whose parameters m_t(s, x) and K_t(s, x) are given in
(9.2.12). The dynamic development of these parameters is characterized by the
differential equations of Theorem (8.2.6).
(9.4.11) Remark. The distribution of an R^p-valued functional g(X_t) depen-
dent only on the state X_t at the instant t can always be obtained via the char-
acteristic function

u(s, x) = E e^{iλ' g(X_t(s, x))},  λ ∈ R^p,

which satisfies the backward equation with the end condition

u(t, x) = e^{iλ' g(x)}.

Many interesting quantities (for example, the time of first entry into or the length of
stay in specific regions) depend, however, on the overall course of a trajectory in a
time interval. But, in certain cases, one can give a differential equation for their charac-
teristic function. For example, if g(x) is R^p-valued and h(t, x) is R^q-valued, put
Thus, the disturbing action of the process Y_t^{(n)} has, on the (timewise) average,
less and less effect on the left-hand member of (10.1.1) as n → ∞, since the
variance E|Y_t^{(n)}|^2 = d (that is, the average energy of the disturbance) remains
finite.
Thus, we are led inexorably to rapidly fluctuating processes with infinite energy
and hence to generalized stochastic processes with independent values at every
point (in the sense of section 3.2). So-called "delta-correlated" Gaussian pro-
cesses are examples of this.
We now need to confine ourselves, for a meaningful theory, to functions
f(t, x, y) that are linear in y, so that (10.1.1) has the form

Ẋ_t = f(t, X_t) + G(t, X_t) Y_t.

Let us now replace in the original equation the white noise ξ_t with a sequence of
physically realizable continuous Gaussian stationary processes {ξ_t^{(n)}} such that
E ξ_t^{(n)} ≡ 0 and E ξ_t^{(n)} ξ_s^{(n)} = C_n(t − s), where

lim_{n→∞} C_n(t) = δ(t).
166 10. Questions of Modelling and Approximation
Then,

Ẋ_t^{(n)} = A(t) X_t^{(n)} + B(t) X_t^{(n)} ξ_t^{(n)},  X_{t_0}^{(n)} = c,

is an ordinary differential equation and hence has the solution

X_t^{(n)} = c exp(∫_{t_0}^t A(s) ds + ∫_{t_0}^t B(s) ξ_s^{(n)} ds).

The process

Z_t^{(n)} = ∫_{t_0}^t B(s) ξ_s^{(n)} ds

satisfies

E Z_t^{(n)} Z_s^{(n)} = ∫_{t_0}^t ∫_{t_0}^s B(u) B(v) C_n(u − v) du dv → ∫_{t_0}^{min(t, s)} B(u)^2 du;

that is, Z_t^{(n)} converges in mean square to a process whose distributions coincide
with the distributions of the process

∫_{t_0}^t B(s) dW_s.
Therefore, the processes X_t and Y_t are quite different for B(t) ≢ 0. From Corol-
lary (8.4.3a), Y_t is the solution of the stochastic differential equation

dY_t = (A(t) + B(t)^2/2) Y_t dt + B(t) Y_t dW_t,  Y_{t_0} = c.

To get X_t, we made the limiting shift to the white noise ξ_t in the original equation
and solved the equation as a stochastic differential equation. In contrast, we ob-
tained Y_t by solving the ordinary differential equation disturbed by ξ^{(n)} and then
made this shift to ξ_t in the solution. Obviously, this leads to different results.
Both processes (though not the processes Y_t^{(n)}) are Markov processes since they
satisfy (different!) stochastic differential equations. Which of these is the "cor-
rect" process (in the sense of giving the better description of the basic system)
can in general be decided only pragmatically.
If we denote by L(E) the solution of an equation E, and by E(ξ_t) or E(ξ_t^{(n)}) the
stochastic differential equation

Ẋ_t = f + G ξ_t  (ξ_t a white noise)
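The discrepancy between the two limit processes can be made concrete with their explicit solutions (scalar autonomous case, illustrative constants not from the text): the Ito solution is X_t = c exp((A − B²/2)t + BW_t), while the limit of smoothed equations is Y_t = c exp(At + BW_t), so the means differ by the factor e^{B²t/2}:

```python
import numpy as np

# Means of the two limit processes for multiplicative noise:
#   Ito interpretation:          X_t = c exp((A - B^2/2) t + B W_t),  E X_t = c e^{A t}
#   limit of smoothed equations: Y_t = c exp(A t + B W_t),  E Y_t = c e^{(A + B^2/2) t}
rng = np.random.default_rng(5)
A, B, c, t = 0.1, 1.0, 1.0, 1.0       # illustrative constants
W = rng.normal(0.0, np.sqrt(t), 1_000_000)   # samples of W_t ~ N(0, t)
X = c * np.exp((A - B**2 / 2) * t + B * W)
Y = c * np.exp(A * t + B * W)
print(X.mean(), np.exp(A * t))               # Ito mean
print(Y.mean(), np.exp((A + B**2 / 2) * t))  # mean of the smoothed-noise limit
```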
10.2 Stratonovich's Stochastic Integral 167
The question now arises as to whether we can modify the definition of the sto-
chastic integral in such a way that equality will hold in the last relationship. This
is in fact possible-by means of the definition (already mentioned in section 4.2)
of Stratonovich's time-symmetric stochastic integral [48] .
If we form, for the integral

∫_{t_0}^t W_s dW_s,

approximating sums Σ_i W_{τ_i} (W_{t_i} − W_{t_{i−1}}),
the result depends very much on the choice of the intermediate points τ_i ∈ [t_{i−1}, t_i]. Ito's choice,
τ_i = t_{i−1}, on which we have based our exposition exclusively up to now, led to the
value

qm-lim_{δ_n→0} Σ_i W_{t_{i−1}} (W_{t_i} − W_{t_{i−1}}) = (W_t^2 − W_{t_0}^2)/2 − (t − t_0)/2,

where

δ_n = max_i (t_i − t_{i−1}).
The stochastic differential equation

dX_t = f(t, X_t) dt + G(t, X_t) dW_t

explained on this basis yields, under the assumptions of Theorem (9.3.1), a dif-
fusion process as solution. The intuitive significance of the coefficients f and G
is explained by regarding f as the drift and G G' as the diffusion matrix of that
process. A disadvantage is that the calculus valid for stochastic differential equa-
tions, which operates in accordance with Ito's theorem, deviates from the
familiar one.
This disadvantage (together with all the advantages of Ito's integral that we have
mentioned) is removed by Stratonovich's definition [48], which yields for the
special case considered at the beginning the value

(W_t^2 − W_{t_0}^2)/2,

hence a value that we can also obtain by formal integration by parts.
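The two values can be reproduced numerically by evaluating the approximating sums with the two choices of intermediate value along one simulated Wiener path (step count is an illustrative choice):

```python
import numpy as np

# Riemann-Stieltjes sums for int_0^t W dW with two intermediate choices:
# the left endpoint W_{t_{i-1}} (Ito) and the average (W_{t_{i-1}} + W_{t_i})/2
# (Stratonovich). The Stratonovich sum telescopes to W_t^2/2 exactly.
rng = np.random.default_rng(6)
t, n = 1.0, 200_000
dW = rng.normal(0.0, np.sqrt(t / n), n)
W = np.concatenate(([0.0], np.cumsum(dW)))     # Wiener path at the partition
ito = np.sum(W[:-1] * dW)                      # -> (W_t^2 - t)/2
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)    # -> W_t^2 / 2
print(ito, (W[-1]**2 - t) / 2)
print(strat, W[-1]**2 / 2)
```

The Ito sum falls short of the Stratonovich sum by ½ Σ (ΔW_i)² ≈ t/2, which is exactly the Ito correction term.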
Somewhat more generally, let us define, for a function H(s, x) with

∫_{t_0}^t E |H(s, W_s)|^2 ds < ∞,

the corresponding integral of H(s, W_s) with the intermediate values
H(t_{i−1}, (W_{t_{i−1}} + W_{t_i})/2).
It follows from Theorem (10.2.5) that the limit in (10.2.1) exists. The result is
connected with Ito's integral, which is defined in the present case by

∫_{t_0}^t H(s, W_s) dW_s = qm-lim_{δ_n→0} Σ_{i=1}^n H(t_{i−1}, W_{t_{i−1}}) (W_{t_i} − W_{t_{i−1}}).
For a process Y_t and a function H(s, y) satisfying

∫_{t_0}^t E |H(s, Y_s) a(s, Y_s)| ds < ∞

and

∫_{t_0}^t E |H_x(s, Y_s) B(s, Y_s) H(s, Y_s)'| ds < ∞,

the limit

(10.2.4)  qm-lim_{δ_n→0} Σ_{i=1}^n H(t_{i−1}, (Y_{t_{i−1}} + Y_{t_i})/2) (Y_{t_i} − Y_{t_{i−1}}),

where t_0 < t_1 < ... < t_n = t is a partition of the interval [t_0, t] and δ_n = max_i
(t_i − t_{i−1}), is called the stochastic integral in the sense of Stratonovich.
This integral is connected with Ito's integral as follows:
(10.2.5) Theorem. The limit in (10.2.4) exists under the conditions mentioned
in definition (10.2.3). It is connected with Ito's stochastic integral, defined here
by (10.2.6).
Here, the d-vector (H_{x_k})_{·j} is the jth column of the d × m matrix H_{x_k} = (∂H_{ij}/∂x_k).
To prove this, we consider the difference between the sums in (10.2.4) and
(10.2.6):

Σ_{i=1}^n (H(t_{i−1}, (Y_{t_{i−1}} + Y_{t_i})/2) − H(t_{i−1}, Y_{t_{i−1}})) (Y_{t_i} − Y_{t_{i−1}}).

We then apply the mean-value theorem to the terms H(t_{i−1}, (Y_{t_{i−1}} + Y_{t_i})/2). For
details, see Stratonovich [48].
(10.2.7) Remark. For d = m = 1, when X_t solves dX_t = f dt + G dW_t, the conversion formula is

(Stratonovich) ∫_{t_0}^t H(s, X_s) dW_s = (Ito) ∫_{t_0}^t H(s, X_s) dW_s + (1/2) ∫_{t_0}^t H_x(s, X_s) G(s, X_s) ds.
chain rule) and hence in this respect can "more easily" be manipulated than the
Ito integral or differential. Unfortunately, the price we have to pay for this is the
loss of all the advantages of Ito's integral that were mentioned earlier. However,
the conversion formulas that we have given enable us at all times to shift from
one type of integral to the other.
The system-theoretic significance of Stratonovich equations consists in the fact
that, in many cases, they present themselves automatically when one approximates
a white noise or a Wiener process with smoother processes, solves the approxima-
ting equation, and in the solution shifts back to the white noise. Comparison of
(10.1.4) and (10.2.9) shows this immediately. In the following section, we shall
discuss this matter in greater detail.
(X_{t_{k+1}} − X_{t_k}) / (t_{k+1} − t_k) = f(t_k, X_{t_k}) + G(t_k, X_{t_k}) ξ_{t_k},

where the ξ_{t_k} are Gaussian, independent, and identically distributed.
10.3 Approximation of Stochastic Differential Equations 173
On the other hand, if ζ_t is a continuous process and only an approximation of
the white noise (for example, with the delta-like covariance function C(t)), we
can treat (10.3.1) as an ordinary differential equation and solve it by the classi-
cal procedures. The solution is not a Markov process, but under certain condi-
tions (see Gray [38] or Clark [33]) it converges in mean square, as

C(t) → δ(t)

or

qm-lim ∫_{t_0}^t ζ_s ds = W_t,
and by linear interpolation between the partition points converges in mean square
to the solution of the Ito equation (10.3.2). If we set
X_{t_{k+1}}^{(n)} = X_{t_k}^{(n)} + f(t_k, X_{t_k}^{(n)}) (t_{k+1} − t_k) + G(t_k, X_{t_k}^{(n)}) (W_{t_{k+1}} − W_{t_k}),
and

sup_{t ≤ a} |W_t^{(n)}(ω) − W_t(ω)| → 0.

This is the case, for example, for the polygonal approximation at the points
t_0 < t_1 < ... < t_n = a:

W_t^{(n)} = W_{t_k} + (W_{t_{k+1}} − W_{t_k}) (t − t_k)/(t_{k+1} − t_k),  t_k ≤ t ≤ t_{k+1},

and

δ_n = max_k (t_{k+1} − t_k) → 0.
Fig. 7. Polygonal approximation of the sample functions of a Wiener process.
In the equation

dX_t^{(n)} = f(t, X_t^{(n)}) dt + G(t, X_t^{(n)}) dW_t^{(n)},

the last integral is an ordinary Riemann-Stieltjes integral for the individual tra-
jectories. Under certain conditions on the functions f and G, the sequence of
the X_t^{(n)} then converges with probability 1, uniformly on [t_0, a], to the solution
of the Stratonovich equation (10.3.3); that is,

ac-lim_{n→∞} (sup_{t_0 ≤ t ≤ a} |X_t^{(n)} − X_t|) = 0,

where X_t satisfies (10.3.3).
In this connection, we cite a result of Wong and Zakai [52] for the scalar case.
(10.3.4) Theorem. Suppose that d = m = 1 and that {W^{(n)}} is a sequence of
approximations of the Wiener process W_t with the properties mentioned above.
Suppose that the functions f(t, x) and G(t, x) are continuous functions de-
fined on [t_0, T] × R^1 and that G has continuous partial derivatives G_t and G_x.
Suppose that the functions f, G, and G G_x satisfy a Lipschitz condition in x (see
(6.2.4)). Suppose that

G(t, x) ≥ α > 0  (or ≤ −α < 0)

and

|G_t(t, x)| ≤ β G(t, x)^2.

Suppose, finally, that the initial value c is independent of W_t − W_{t_0} for t ∈ [t_0, T]
and that X_t^{(n)} is the solution of the equation
If V̇_t ≤ 0, then X_t varies in such a way that the values of V_t do not increase; that is, the "distance" of X_t from the equilibrium point, measured by v(t, X_t), does not increase. This elementary consideration leads to the following sufficient criteria discovered by Lyapunov.
178 11. Stability of Stochastic Dynamic Systems

(11.1.3) Theorem. a) If there exists a positive-definite function v(t, x) with continuous first partial derivatives such that the derivative formed along the trajectories of

Ẋ_t = f(t, X_t), t ≥ t_0, f(t, 0) ≡ 0,

satisfies the inequality

v̇ = ∂v/∂t + Σ_{i=1}^d f_i(t, x) ∂v/∂x_i ≤ 0

in a half-cylinder

{(t, x): t ≥ t_0, |x| < h},

then the equilibrium position of the differential equation is stable.
b) If there exists a positive-definite decrescent function v(t, x) such that v̇(t, x) is negative-definite, then the equilibrium position is asymptotically stable.
A function v (t, x) that satisfies the stability conditions of Theorem (11.1.3) is
said to be a Lyapunov function corresponding to the differential equation in
question.
(11.1.4) Example. The linear autonomous equation

Ẋ_t = A X_t, t ≥ t_0, X_{t_0} = c,

has the solution

X_t = e^{A(t−t_0)} c.

Let λ_1, …, λ_d denote the eigenvalues of A. Then the equilibrium position is asymptotically stable if and only if

(11.1.5) Re(λ_i) < 0, i = 1, …, d.

If at least one eigenvalue has a positive real part, it is unstable. If some of the real parts vanish, the equilibrium position is stable (though not asymptotically stable) provided the elementary divisors corresponding to the eigenvalues with vanishing real parts are all simple. If any of the elementary divisors are of higher order, the equilibrium position is unstable. One can check whether (11.1.5) holds by means of the criteria of Routh, Hurwitz, and others (see Hahn [64]), or one can check whether the equation

(11.1.6) A'P + P A = −Q

has, for some positive-definite Q, a positive-definite matrix P as its solution. Then we can choose the Lyapunov function

v(x) = x'P x > 0,

for which

v̇(x) = 2(Px)'Ax = x'PAx + x'A'Px = −x'Qx < 0.
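Condition (11.1.6) is easy to check numerically. The following sketch (the matrix A and the choice Q = I are illustrative assumptions, not from the text) solves A'P + PA = −Q by vectorizing the Lyapunov equation into an ordinary linear system:

```python
import numpy as np

# Stability test for X' = A X via equation (11.1.6): A'P + P A = -Q.
# Vectorizing row-wise, (A' (x) I + I (x) A') vec(P) = -vec(Q) is a
# plain linear system for the entries of P.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])      # eigenvalues -1, -2: stable
Q = np.eye(2)                     # any positive-definite Q will do
d = A.shape[0]
K = np.kron(np.eye(d), A.T) + np.kron(A.T, np.eye(d))
P = np.linalg.solve(K, -Q.flatten()).reshape(d, d)

print(np.allclose(A.T @ P + P @ A, -Q))   # the equation is satisfied
print(np.linalg.eigvalsh(P))              # all positive: v(x) = x'Px works
```

Positive-definiteness of the computed P confirms (11.1.5) without computing the eigenvalues of A directly.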
11.2 The Basic Ideas of Stochastic Stability Theory 179
(11.2.3) L = ∂/∂t + Σ_{i=1}^d f_i(t, x) ∂/∂x_i + (1/2) Σ_{i=1}^d Σ_{j=1}^d (G(t, x) G(t, x)')_{ij} ∂²/∂x_i ∂x_j;

then, in accordance with formula (5.3.9a),

(11.2.4) dV_t = (L v(t, X_t)) dt + Σ_{i=1}^d Σ_{j=1}^m v_{x_i}(t, X_t) G_{ij}(t, X_t) dW_t^j.
Now, a stable system should have the property that V_t does not increase, that is, dV_t ≤ 0. This would mean that the ordinary definition of stability holds for each single trajectory X_t(ω). However, because of the presence of the fluctuational term in (11.2.4), the condition dV_t ≤ 0 can be satisfied only in degenerate cases. Therefore, it makes sense to require instead that X_t not run "uphill" on the average, that is,

E(dV_t) ≤ 0.

Since

E(dV_t) = E(L v(t, X_t) dt),

this requirement will be satisfied if

(11.2.5) L v(t, x) ≤ 0 for all t ≥ t_0, x ∈ R^d.

This is the stochastic analogue of the requirement that v̇ ≤ 0 in the deterministic case, and it reduces to that case if G vanishes. We shall refer to the function v(t, x) used here as a Lyapunov function corresponding to the stochastic differential equation (11.2.2).
For what stability concept is (11.2.5) a sufficient condition? In this connection, we remember that, in accordance with Theorem (5.1.1b), the second integral in

V_t = v(t_0, c) + ∫_{t_0}^t L v(s, X_s) ds + ∫_{t_0}^t Σ_{i=1}^d Σ_{j=1}^m v_{x_i}(s, X_s) G_{ij}(s, X_s) dW_s^j

is a martingale. Here, the accompanying family F_t of sigma-algebras is the one defined in (6.1.2). Therefore, for t ≥ s, we have by virtue of (11.2.5)

11.2 The Basic Ideas of Stochastic Stability Theory 181

E(V_t | F_s) ≤ V_s;

that is, under condition (11.2.5), V_t is a (positive) supermartingale. For every interval [a, b] ⊂ [t_0, ∞), the supermartingale inequality yields
P[ sup_{a≤t≤b} V_t ≥ ε ] ≤ E V_a / ε,

and, by the supermartingale convergence theorem, V_t converges almost certainly as t → ∞. Here, the limit V(c) may depend on the initial point c. If V(c) were at least equal to some positive c_1 on an ω-set B_c with positive probability, we would have, for these ω,

v(t, X_t) ≥ c_2 > 0 for all t ≥ τ(ω),

and, by virtue of the decrescence of v,

|X_t| ≥ c_3 > 0 for all t ≥ τ(ω).

The assumption (11.2.7) and the radial unboundedness of −Lv imply the existence of a positive c_4 such that

L v(t, x) ≤ −c_4 for |x| ≥ c_3.

By virtue of (11.2.4), the martingale part M_t of V_t satisfies

P[ sup_{t≥t_0} M_t ≥ ε ] ≤ v(t_0, c)/ε.

For all ω ∈ B_c ∩ [sup M_t < ε], we have, for t ≥ τ(ω),

0 < V_t ≤ v(t_0, c) − c_4 (t − τ(ω)) + ε.

When we let t approach ∞, this leads to a contradiction; that is, B_c ∩ [sup M_t < ε] has probability 0. Therefore,

P(B_c) = P(B_c ∩ [sup M_t ≥ ε]) ≤ v(t_0, c)/ε

and, finally,

P[ lim_{t→∞} X_t(c) = 0 ] ≥ 1 − v(t_0, c)/ε → 1 as c → 0.
continuously differentiable with respect to t and twice continuously differentiable with respect to the components x_i of x. Furthermore,

L v(t, x) ≤ 0, t ≥ t_0, 0 < |x| ≤ h,

where

L = ∂/∂t + Σ_{i=1}^d f_i(t, x) ∂/∂x_i + (1/2) Σ_{i=1}^d Σ_{j=1}^d (G(t, x) G(t, x)')_{ij} ∂²/∂x_i ∂x_j.

there exists a positive δ such that all the sample functions of the bundle of sample functions originating at c ≠ 0, |c| ≤ δ, remain in an ε-neighborhood of x = 0.
b) Since

ac-lim_{t→∞} (W_t − W_{t_0})/(t − t_0) = 0,

the equilibrium position is stochastically asymptotically stable in the large for

A < B²/2

and stochastically unstable for

A ≥ B²/2.
Let us derive the same result with the aid of Lyapunov's technique. For X_t, the operator L takes the form

L = ∂/∂t + A x ∂/∂x + (B² x²/2) ∂²/∂x².

A trial with v(x) = |x|^r, where r > 0, leads, for x ≠ 0, to

L v(x) = r (A − (1 − r) B²/2) |x|^r.

As long as A < B²/2, we can choose r such that 0 < r < 1 − 2A/B² and hence satisfy the condition

L v ≤ −k v,

which, according to remark (11.2.9), is sufficient for stochastic asymptotic stability. Since v(x) = |x|^r is radially unbounded, we have, in accordance with Theorem (11.2.8c), stochastic asymptotic stability in the large. As an estimate, we have, from the formal solution

X_t = c exp( A(t − t_0) + B ∫_{t_0}^t ξ_s ds ),

the explicit form

X_t = c exp( A(t − t_0) + B(W_t − W_{t_0}) ).

In accordance with section (10.2), this is the solution of equation (11.2.16), now interpreted as the Stratonovich equation, or of the equivalent Ito equation

dX_t = (A + B²/2) X_t dt + B X_t dW_t, X_{t_0} = c,

whose equilibrium position is now stochastically asymptotically stable for arbitrary B ≠ 0 if A < 0 and stochastically unstable if A ≥ 0.
With either interpretation, an asymptotically stable undisturbed system (A < 0, B = 0) remains stochastically asymptotically stable upon addition of arbitrarily strong disturbances (only ordinary stability, A = 0, is destroyed). This property disappears with Ito equations for d ≥ 3 and with Stratonovich equations for d ≥ 2 (see Khas'minskiy [65], p. 222).
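The dominance of the sample behavior over the moments can be seen in a small simulation (an illustration, not from the text; all parameter values are arbitrary choices): for 0 < A < B²/2, almost every path of the Ito equation decays, even though E X_t = c e^{At} grows without bound. The sketch uses the explicit solution X_t = c exp((A − B²/2)t + B W_t):

```python
import numpy as np

# Ito equation dX = A X dt + B X dW with 0 < A < B^2/2: almost all
# paths decay, while the mean c * exp(A t) explodes.
rng = np.random.default_rng(1)
A, B, c, T, n, paths = 0.3, 1.0, 1.0, 200.0, 2000, 200

t = np.linspace(0, T, n + 1)
dW = rng.normal(0, np.sqrt(T / n), (paths, n))
W = np.concatenate([np.zeros((paths, 1)), np.cumsum(dW, axis=1)], axis=1)
X = c * np.exp((A - B**2 / 2) * t + B * W)    # exact sample paths

frac_decayed = np.mean(X[:, -1] < 1e-3)   # fraction of paths near 0
mean_growth = c * np.exp(A * T)           # exact E X_T = c e^{A T}
print(frac_decayed, mean_growth)          # nearly all paths decayed
```

Nearly all of the 200 simulated paths end near zero, while the exact mean at T = 200 is astronomically large: the expectation is carried by ever rarer, ever larger excursions.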
(11.2.18) Remark. The use of Lyapunov's method depends on knowledge of Lyapunov functions v(t, x). As in the deterministic case, there are a number of techniques that can be used to find suitable functions. For example, one can seek a positive-definite solution of the equation L v = 0 or of the inequality L v ≤ 0 (see Kushner [72], pp. 55-76, for a number of examples). The choice

v(x) = x'Cx,

where C is a positive-definite matrix, leads to the goal if

L v = 2 f(t, x)'Cx + tr(G(t, x) G(t, x)'C) ≤ 0
where the undisturbed system is asymptotically stable (see example (11.1.4)). According to Corollary (8.2.4), the solution of this equation is

X_t = e^{A(t−t_0)} c + ∫_{t_0}^t e^{A(t−s)} B(s) dW_s.

By assumption, e^{A(t−t_0)} c → 0 as t → ∞ for all c ∈ R^d. The second term is normally distributed with mean 0 and with

E | ∫_{t_0}^t e^{A(t−s)} B(s) dW_s |² = ∫_{t_0}^t |e^{A(t−s)} B(s)|² ds ≤ N² ∫_{t_0}^t e^{−2γ(t−s)} |B(s)|² ds → 0,

this last holding under the condition

(11.2.20) ∫_{t_0}^∞ |B(s)|² ds < ∞.

Under this condition, therefore,

qm-lim_{t→∞} X_t = 0

and

P[ lim_{t→∞} X_t = 0 ] = 1.
p(t) = E|X_t(c)|²,

c_1 |x|^p ≤ v(t, x) ≤ c_2 |x|^p and L v(t, x) ≤ −c_3 |x|^p

for certain positive constants c_1, c_2, and c_3. Then there exists a positive constant c_4 and an almost certainly finite random variable K(c), dependent on c ∈ R^d, such that

|X_t(c)| ≤ K(c) e^{−c_4 (t−t_0)} for all t ≥ t_0,

for almost all sample functions starting at c.
Considerably more precise stability assertions can be made for the equilibrium position X_t = 0. In accordance with our assumptions (11.2.1), the d×d matrices A(t), B_1(t), …, B_m(t) are continuous functions on the interval t ≥ t_0. Suppose again that c ∈ R^d is a constant.

Stability of the first and second moments acquires considerable significance for (11.4.1) in that simple ordinary differential equations can now be written for them. In accordance with Theorem (8.5.5), we have
11.4 Linear Equations 191
(11.4.5) Example. The second-order equation

Ÿ_t + (b_0 + b ξ_t') Ẏ_t + (a_0 + a ξ_t) Y_t = 0, t ≥ t_0,

where ξ_t and ξ_t' are uncorrelated scalar white noise processes, is equivalent to the linear stochastic differential equation

dX_t = [0 1; −a_0 −b_0] X_t dt + [0 0; −a 0] X_t dW_t + [0 0; 0 −b] X_t dW_t'.

Here, we have again set

X_t = (Y_t, Ẏ_t)'

(see example (8.1.4)). The differential equation (11.4.2) for the expectation value m_t is

ṁ_t = [0 1; −a_0 −b_0] m_t,
and it is asymptotically stable if and only if both a_0 and b_0 are positive. Equation (11.4.3) for the 2×2 matrix P(t) of the second moments yields

Ṗ_11 = 2 P_12,
Ṗ_12 = −a_0 P_11 − b_0 P_12 + P_22,
Ṗ_22 = a² P_11 − 2 a_0 P_12 + (b² − 2 b_0) P_22.

The 3×3 matrix 𝔄 in equation (11.4.4) is

𝔄 = [0 2 0; −a_0 −b_0 1; a² −2a_0 b²−2b_0],

and the real parts of the roots of its characteristic equation are negative if and only if

b² < 2 b_0, a² < (2 b_0 − b²) a_0

(by the Routh-Hurwitz criterion). Therefore, for fixed a_0 and b_0, the intensity of the disturbance must not exceed a certain value if the (exponential) stability in mean square is not to be destroyed. For the nth-order scalar differential equation with disturbed constant coefficients, compare Theorem (11.5.2).
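The equivalence between the eigenvalue test on the 3×3 moment matrix and the two Routh-Hurwitz inequalities can be spot-checked numerically; in the following sketch the parameter values are illustrative choices, not from the text:

```python
import numpy as np

# Mean-square stability of Y'' + (b0 + b xi') Y' + (a0 + a xi) Y = 0:
# spectral test on the 3x3 moment matrix vs. the Routh-Hurwitz
# conditions b^2 < 2 b0 and a^2 < (2 b0 - b^2) a0.
def ms_stable_eigs(a0, b0, a, b):
    M = np.array([[0.0, 2.0, 0.0],
                  [-a0, -b0, 1.0],
                  [a**2, -2.0 * a0, b**2 - 2.0 * b0]])
    return max(np.linalg.eigvals(M).real) < 0

def ms_stable_rh(a0, b0, a, b):
    return b**2 < 2 * b0 and a**2 < (2 * b0 - b**2) * a0

for a0, b0, a, b in [(1, 1, 0.5, 0.5),    # weak noise: stable
                     (1, 1, 1.5, 0.5),    # a too large: unstable
                     (1, 0.1, 0.1, 0.5)]: # b^2 > 2 b0: unstable
    print(ms_stable_eigs(a0, b0, a, b), ms_stable_rh(a0, b0, a, b))
```

Both tests agree in each case, as the criterion asserts.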
(11.4.6) Example. Gikhman ([36] and [63], pp. 320-328) has investigated second-order scalar equations of the form

(11.4.7) Ÿ_t + (a(t) + b(t) η̇_t) Y_t = 0, t ≥ t_0,

where η̇_t is a general disturbance process (the derivative of a martingale). The stability properties of Y_t are closely connected with those of the undisturbed equation

ẍ(t) + a(t) x(t) = 0.
For example, if η̇_t is a white noise, if, for every solution z(t) of the last equation, z(t) → 0 and

∫_{t_0}^∞ z(t)² dt < ∞,

and if b(t) is bounded, then E Y_t² → 0 as t → ∞, uniformly for all initial values such that |Y_{t_0}| + |Ẏ_{t_0}| ≤ R. Furthermore, the equilibrium position of equation (11.4.7) is then stochastically stable.
(11.4.8) Example. The general homogeneous equation (11.4.1) for the case d = m = 1,

dX_t = A(t) X_t dt + B(t) X_t dW_t, X_{t_0} = c,

has, by Corollary (8.4.3), the solution

X_t = c exp( ∫_{t_0}^t (A(s) − B(s)²/2) ds + ∫_{t_0}^t B(s) dW_s ).

Therefore,

E X_t = c exp( ∫_{t_0}^t A(s) ds )

and

E|X_t|^p = |c|^p exp( p ∫_{t_0}^t A(s) ds + (p(p−1)/2) ∫_{t_0}^t B(s)² ds ).

From all this, we conclude that the equilibrium position is asymptotically stable (resp. stable, resp. unstable) in pth mean if and only if

lim_{t→∞} ∫_{t_0}^t ( p A(s) + (p(p−1)/2) B(s)² ) ds

is −∞ (resp. is less than ∞, resp. is +∞). In particular, this yields criteria for the first and second moments. Similarly, the equilibrium position is stochastically asymptotically stable (resp. stochastically stable, resp. stochastically unstable) if and only if

lim_{t→∞} ∫_{t_0}^t (A(s) − B(s)²/2) ds

is −∞ (resp. less than ∞, resp. +∞). With

τ(t) = ∫_{t_0}^t B(s)² ds, I(t) = ∫_{t_0}^t (A(s) − B(s)²/2) ds / √(2 τ(t) log log τ(t)),

by virtue of the law of the iterated logarithm for W_{τ(t)}, we see that a sufficient condition for stochastic asymptotic stability (resp. stochastic instability) is that

lim sup I(t) < −1 (resp. lim inf I(t) > −1).
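For constant coefficients A and B, the integrals in these criteria reduce to sign checks on two exponents, which the following sketch (parameter values illustrative) evaluates; it exhibits the case in which the path behavior and the moment behavior disagree:

```python
# For constant A, B in dX = A X dt + B X dW (d = m = 1), the criteria
# of example (11.4.8) reduce to sign checks:
#   pth-mean stability   <=>  p*A + p*(p-1)/2 * B^2 < 0,
#   sample (a.s.) stability  <=>  A - B^2/2 < 0.
def pth_mean_exponent(A, B, p):
    return p * A + 0.5 * p * (p - 1) * B**2

def sample_exponent(A, B):
    return A - B**2 / 2

A, B = 0.3, 1.0
print(sample_exponent(A, B))         # -0.2: paths tend to 0 a.s.
print(pth_mean_exponent(A, B, 1))    # 0.3: the mean grows
print(pth_mean_exponent(A, B, 2))    # 1.6: the 2nd moment grows
```

With A = 0.3 and B = 1, the equilibrium is stochastically asymptotically stable yet unstable in every pth mean with p ≥ 1.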
(11.4.9) Remark. For the solution X_t(c) of (11.4.1), we have X_t(αc) = α X_t(c) for all α ∈ R¹. Therefore,

P[ lim_{t→∞} X_t(c) = 0 ] = p_c = const, for all c ∈ R^d.

In the case of stochastic asymptotic stability of the equilibrium position, p_c → 1 as c → 0. This is compatible only with p_c = 1, hence with

ac-lim_{t→∞} X_t(c) = 0 for all c ∈ R^d,
Δ_1 = b_1 > 0,

Δ_2 = | b_1 b_3 ; 1 b_2 | > 0,

…,

Δ_n = | b_1 b_3 b_5 … ; 1 b_2 b_4 … ; 0 b_1 b_3 … ; … | > 0.
with

E ξ_i(t) ξ_j(s) = Q_ij δ(t − s),

is now rewritten, just as in example (8.1.4), as a stochastic first-order differential equation for the n-dimensional process

X_t = (Y_t, Ẏ_t, …, Y_t^{(n−1)})'.

We obtain

dX_t^i = X_t^{i+1} dt, i = 1, …, n − 1,

(11.5.1) dX_t^n = −Σ_{i=1}^n b_i X_t^{n+1−i} dt − Σ_{i=1}^n Σ_{j=1}^n G_ij X_t^{n+1−i} dW_t^j,

with an n × n matrix G such that G G' = Q.
In accordance with the criteria of section 11.4 (the Routh-Hurwitz criterion for equation (11.4.4) or Corollary (11.4.14)), to prove the asymptotic (= exponential) stability in mean square of (11.5.1), we must treat a system of n(n+1)/2 linear equations. We now cite a criterion of Khas'minskiy ([65], pp. 286-292) that operates with only n + 1 decisions.

(11.5.2) Theorem. The equilibrium position of (11.5.1) is asymptotically stable in mean square if and only if Δ_1 > 0, …, Δ_n > 0 (the Δ_i are the Routh-Hurwitz determinants mentioned above; their positiveness implies stability of the undisturbed equation) and a further determinantal condition holds, formed with the matrix

A = | b_1 b_3 … 0 ; 1 b_2 … 0 ; 0 b_1 b_3 … 0 ; … ; 0 0 0 … b_n |

and the quantities

q_k = Σ_{i+j=2(n−k)} (−1)^{j+1} (…) Q_ij.

For n = 2, Theorem (11.5.2) provides the conditions

b_1 > 0, b_2 > 0, 2 b_1 b_2 > Q_11 b_2 + Q_22

(see example (11.4.5)). For n = 3,
b) If the matrices A and B_i are independent of t, then stochastic asymptotic stability of the equilibrium position of the linearized equation (11.6.3) implies stochastic asymptotic stability of the original equation (11.6.2).

(11.7.1b) dX_t = ( X_t² ; −sin X_t¹ + C sin 2X_t¹ − B X_t² ) dt − A ( 0 ; sin X_t¹ + B X_t² ) dW_t.

Replacement of sin y (resp. sin 2y) with y (resp. 2y) yields the linearized equation with constant coefficients

(11.7.2a) Ÿ_t + B(1 + A ξ_t) Ẏ_t + (1 − 2C + A ξ_t) Y_t = 0

in the first case and

in the second. A necessary and sufficient condition for asymptotic (and hence exponential) stability in mean square of the linearized equation (11.7.2) is, according to the criterion (11.5.2) with

Q = [ B²A² BA² ; BA² A² ]

and with d = n = 2, the following:

B > 0, 1 − 2C > 0, 2B(1 − 2C) > B²A²(1 − 2C) + A².

The conditions B > 0 and 1 − 2C > 0 ensure asymptotic stability of the undisturbed system as a necessary condition. The last condition yields the inequality

A² < 2B(1 − 2C) / (B²(1 − 2C) + 1)

for the intensity of the disturbance. Therefore, under these conditions, the equilibrium position of (11.7.2), and hence, by Theorem (11.6.1), the equilibrium position of the nonlinear equation, is stochastically asymptotically stable.
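The admissible noise intensity can be computed directly from this inequality. In the following sketch (the values of B and C are illustrative assumptions) the bound on A² is evaluated and checked against the full mean-square condition:

```python
import math

# Bound on the noise intensity A^2 for the linearized equation
# (11.7.2a): mean-square stability holds iff B > 0, 1 - 2C > 0 and
# 2B(1-2C) > B^2 A^2 (1-2C) + A^2, i.e. A^2 < 2B(1-2C)/(B^2(1-2C)+1).
def a2_bound(B, C):
    return 2 * B * (1 - 2 * C) / (B**2 * (1 - 2 * C) + 1)

def ms_stable(A, B, C):
    return B > 0 and 1 - 2 * C > 0 and \
        2 * B * (1 - 2 * C) > B**2 * A**2 * (1 - 2 * C) + A**2

B, C = 1.0, 0.1
bound = a2_bound(B, C)                         # = 1.6/1.8 = 0.888...
print(bound)
print(ms_stable(math.sqrt(0.9 * bound), B, C))   # True: below the bound
print(ms_stable(math.sqrt(1.1 * bound), B, C))   # False: above the bound
```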
We now seek to use Lyapunov's technique to carry these assertions over to the nonlinear original equation. The operator L assumes for (11.7.1) the following form:

L = ∂/∂t + x₂ ∂/∂x₁ + (−sin x₁ + C sin 2x₁ − B x₂) ∂/∂x₂ + (A²/2)(sin x₁ + B x₂)² ∂²/∂x₂².
For the Lyapunov function, we try an expression consisting of a quadratic form and integrals of the nonlinear components:

v(x) = a x₁² + b x₁ x₂ + x₂² + 2d ∫₀^{x₁} sin z dz + e (sin x₁)².

This yields

L v(x) = (2a − bB) x₁ x₂ − (2B − b − A²B²) x₂² + (d − 2 + 2A²B + (4C + 2e) cos x₁) x₂ sin x₁ + (…) x₁ sin x₁

and, with the cross terms removed by a suitable choice of a, d, and e,

L v(x) = −(2B − b − A²B²) x₂² − (b − 2bC cos x₁ − A² (…)) x₁ sin x₁.
b = 2B / (1 + B²(1 − 2C)),

and hence, for the intensity of the disturbance,

A² < 2B(1 − 2C) / (B²(1 − 2C) + 1).

Theorem (11.2.8b) ensures, under this condition, the stochastic asymptotic stability of the equilibrium position of (11.7.1). This result is identical to the result obtained by linearization.
Chapter 12
Optimal Filtering of a Disturbed
Signal
Suppose that the observed process Z_t (the quantity being measured) is p-dimensional and that it is a disturbed functional of X_t of the form

(12.1.2) dZ_t = h(t, X_t) dt + R(t) dV_t, Z_{t_0} = b, t ≥ t_0,

∫_{t_0}^t |R(s)|² ds < ∞ for all t > t_0.
(ε-neighborhoods of the fixed continuous function y₁), then the mapping (12.1.6) is 𝔄-𝔅([t_0, t₁])-measurable. A functional that is measurable in the sense of the above-given formulation of the problem is a mapping

F: C([t_0, t₁]) → R^d

that is measurable with respect to 𝔅([t_0, t₁]) and the Borel sigma-algebra 𝔅^d in R^d. Then, of course, the composite mapping

F(Z[t_0, t]): Ω → R^d

is a d-dimensional random variable that depends on ω only through Z[t_0, t]. We now seek a measurable F_0: C([t_0, t₁]) → R^d such that

X̂_{t₁} = F_0(Z[t_0, t₁])

has a second moment and expectation value E X_{t₁} and possesses the minimality property (12.1.4).
(12.1.7) Remark. The condition (12.1.4) is equivalent to the following condition: For every nonnegative-definite symmetric matrix C, we have the inequality

E|X_{t₁} − X̂_{t₁}|²_C ≤ E|X_{t₁} − F(Z[t_0, t₁])|²_C,

with the abbreviation x'Cx = |x|²_C for x ∈ R^d. The equivalence with (12.1.4) is seen from the spectral decomposition of C:

C = Σ_{i=1}^d λ_i u_i u_i' (λ_i and u_i are the eigenvalues and eigenvectors of C),

the relationship

|x|²_C = Σ_{i=1}^d λ_i (u_i' x)²,

and the choice of the special matrix C = y y'. We mention that the requirement

E|X_{t₁} − X̂_{t₁}|² ≤ E|X_{t₁} − F(Z[t_0, t₁])|²

is, for d > 1, weaker than (12.1.4). The latter also includes the off-diagonal members of the matrix E(X_{t₁} − X̂_{t₁})(X_{t₁} − X̂_{t₁})' in the comparison.

12.2 The Conditional Expectation as Optimal Estimate 205

E(X_{t₁} − X̂_{t₁})(X̂_{t₁} − F)' = E( E( (X_{t₁} − X̂_{t₁})(X̂_{t₁} − F)' | Z[t_0, t₁] ) )
= E( E( (X_{t₁} − X̂_{t₁}) | Z[t_0, t₁] ) (X̂_{t₁} − F)' )
= 0,

if we set

X̂_{t₁} = E(X_{t₁} | Z[t_0, t₁]).

Then, for this X̂_{t₁},

E( y'(X_{t₁} − X̂_{t₁}) )² ≤ E( y'(X_{t₁} − F) )²,
Therefore, if we know the conditional density p_t(x | Z[t_0, t]), we can easily find the optimal estimate X̂_t by means of an ordinary integration.
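That the conditional expectation is the minimum-mean-square estimate can be illustrated in a jointly Gaussian toy model (an illustration of this fact, not an example from the text; rho and the sample size are arbitrary choices): with unit variances and correlation rho, E(X|Z) = rho·Z, and any other linear coefficient produces a larger mean-square error.

```python
import numpy as np

# X and Z jointly Gaussian with unit variances and correlation rho:
# the conditional mean E(X|Z) = rho * Z beats every other estimator a*Z.
rng = np.random.default_rng(2)
rho, n = 0.7, 200_000
Z = rng.normal(size=n)
X = rho * Z + np.sqrt(1 - rho**2) * rng.normal(size=n)

mse = lambda a: np.mean((X - a * Z) ** 2)
print(mse(rho), mse(0.3), mse(1.0))   # mse(rho) is the smallest
```

The empirical error at a = rho is close to the theoretical minimum 1 − rho² = 0.51.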
(12.3.1) Theorem (Bucy's representation theorem). Suppose that we are given equations (12.1.1) and (12.1.2) with the assumptions stated in section 12.1. We also assume that R(t)R(t)' is positive-definite, that the finite-dimensional distributions of the process X_t have densities, and that

E exp( (t − t_0) sup_{t_0≤s≤t} |h(s, X_s)|²_{(R(s)R(s)')⁻¹} ) < ∞.

12.3 The Kalman-Bucy Filter 207

Here, we have set |x|²_A = x'Ax. Then the conditional distribution P(X_t ∈ B | Z[t_0, t]) has a density p_t(x | Z[t_0, t]) that, for a fixed observation Z[t_0, t], is given by

(12.3.2) p_t(x | Z[t_0, t]) = E(e^Q | X_t = x) p_t(x) / E e^Q.

Here, p_t(x) is the density of X_t and

Formula (12.3.2) was conjectured by Bucy [60] and proved by Mortensen [74] (see also Bucy and Joseph [61]). For a fixed Z[t_0, t], the integral with respect to Z_t in (12.3.3) is, in complete analogy with the integral with respect to W_t, defined by approximation of the integrand by means of step functions and the choice of the left-hand end-points of a decomposition as the intermediate points. For useless observations (h ≡ 0 or (R(t)R(t)')⁻¹ ≡ 0), we obtain from Theorem (12.3.1) Q ≡ 0 and hence
ℒ* g = −Σ_{i=1}^d ∂/∂x_i (f_i g) + (1/2) Σ_{i=1}^d Σ_{j=1}^d ∂²/∂x_i ∂x_j ((G G')_{ij} g)

and
Equation (12.3.4) shows the possibility, at least in theory, of calculating the conditional density p_t(x | Z[t_0, t]), beginning with the initial value p_{t_0}(x), with progressive observation of Z_t. Here, p_t(x | Z[t_0, t]) depends only on the two systems defined by the functions f, G, h, and R, on the density of the initial value, and on the observations up to the instant t. Therefore, (12.3.4) can be regarded as the dynamic equation of the optimal filter.

Equation (12.3.4) yields, upon integration with respect to x, equations for the moments of the conditional density, in particular for the optimal estimate X̂_t:

dX̂_t = f̂(t, x) dt + ( x̂h(t, x)' − X̂_t ĥ(t, x)' ) (R(t)R(t)')⁻¹ (dZ_t − ĥ(t, x) dt),

where ĝ(t, x) denotes the conditional expectation of g(t, X_t) given Z[t_0, t], with the initial value X̂_{t_0} = E c. Also of interest is the estimation error, that is, the conditional covariance matrix

P(t | Z[t_0, t]) = E( (X_t − X̂_t)(X_t − X̂_t)' | Z[t_0, t] ).

For this, we get from (12.3.4) (see Jazwinski [66], p. 184)
matrix functions A(t), B(t), H(t), R(t), and (R(t)R(t)')⁻¹ are bounded on every bounded subinterval of [t_0, ∞). We again assume the independence of W_t, V_t, c, and b. If c and b are normally distributed or constant, then, in accordance with Theorem (8.2.10), X_t and hence Z_t are Gaussian processes. Therefore, all the conditional distributions are normal distributions. In particular, we have

(12.4.1) Theorem (Kalman-Bucy filter for linear systems). In the linear case, the conditional density p_t(x | Z[t_0, t]) of X_t, under the condition that Z[t_0, t] was observed, is the density of a normal distribution with mean

X̂_t = E(X_t | Z[t_0, t])

and covariance matrix

P(t) = E( (X_t − X̂_t)(X_t − X̂_t)' | Z[t_0, t] ) = E(X_t − X̂_t)(X_t − X̂_t)'.

The dynamic equations for these parameters are

X̂_t = ∫_{t_0}^t D(t, s) dZ_s

= E(X_t X_u') H(u)'

(see Bucy and Joseph [61], p. 53, or Gikhman and Skorokhod [5], p. 229).
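For the scalar linear case, the filter of Theorem (12.4.1) can be sketched by an Euler scheme. In the following sketch the dynamics dX = aX dt + b dW, dZ = hX dt + r dV and all symbol names are assumptions of the sketch, not the book's notation; the gain is P h/r² and the error variance obeys the Riccati equation Ṗ = 2aP − P²h²/r² + b²:

```python
import numpy as np

# Euler sketch of the scalar Kalman-Bucy filter.
rng = np.random.default_rng(3)
a, b, h, r = -1.0, 1.0, 1.0, 1.0
T, n = 20.0, 20_000
dt = T / n

x, xhat, P = 2.0, 0.0, 4.0       # true state, estimate, error variance
for _ in range(n):
    dW, dV = rng.normal(0, np.sqrt(dt), 2)
    dZ = h * x * dt + r * dV                     # observation increment
    x += a * x * dt + b * dW                     # signal dynamics
    xhat += a * xhat * dt + (P * h / r**2) * (dZ - h * xhat * dt)
    P += (2 * a * P - P**2 * h**2 / r**2 + b**2) * dt   # Riccati (noise-free)

# Stationary variance: root of 2aP - P^2 h^2/r^2 + b^2 = 0.
P_inf = (a + np.sqrt(a**2 + b**2 * h**2 / r**2)) * r**2 / h**2
print(P, P_inf)   # P approaches the stationary variance sqrt(2) - 1
```

The variance recursion is deterministic and independent of the observations, which is the property exploited in section 13.3 when filtering and control are separated.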
(12.4.3) Example. Suppose that the signal process is undisturbed (B(t) ≡ 0) and that it starts with a random N(0, E c c')-distributed initial value with positive-definite E c c'. The (unconditional) distribution of X_t (before observation!) is, by Theorem (8.2.10),

N(0, Φ(t) E c c' Φ(t)'),

where Φ(t) is the fundamental matrix of Ẋ = A(t)X. The conditional distribution of X_t after observation of Z[t_0, t] is, as one can verify by substitution into the equations of Theorem (12.4.1), a normal distribution N(X̂_t, P(t)) with

X̂_t = P(t) (Φ(t)')⁻¹ ∫_{t_0}^t Φ(s)' H(s)' (R(s)R(s)')⁻¹ dZ_s

and

P(t) =

Since the second summand in P(t) is positive-definite, the error covariance matrix P(t) is "smaller" than the original covariance matrix of X_t.
Chapter 13
Optimal Control of Stochastic
Dynamic Systems
13.1 Bellman's Equation
The analytical difficulties that arise with a mathematically rigorous treatment of
stochastic control problems are so numerous that, in this brief survey, we must
confine ourselves on didactic grounds to a more qualitative and intuitive treat-
ment.
As in the case of stability theory, there is again here a well-developed theory for deterministic systems, which one can study, for example, in the works by Athans and Falb [56], Strauss [78], or Kalman, Falb, and Arbib [68]. It is again a matter of replacing, in the shift to the stochastic case, the first-order derivatives at the corresponding places with the infinitesimal operator of the corresponding process. For a more detailed treatment of optimal control of stochastic systems, we refer to the books and survey articles by Aoki [55], Kushner [72], Stratonovich [77], Bucy and Joseph [61], Mandl [28], Khas'minskiy [65], Fleming [62], and Wonham [80] and to the literature cited in those works.
Let us now consider a system described by the stochastic differential equation

(13.1.1) dX_t = f(t, X_t, u(t, X_t)) dt + G(t, X_t, u(t, X_t)) dW_t, X_{t_0} = c, t ≥ t_0,

where, as usual, X_t, f(t, x, u), and c assume values in R^d, G(t, x, u) is (d × m)-matrix-valued, and W_t is an m-dimensional Wiener process. The new variable u in the arguments of f and G varies in some R^p, and the functions f(t, x, ·) and G(t, x, ·) are assumed to be sufficiently smooth. The function u(t, x) in equation (13.1.1) is a control function belonging to a set U of admissible control functions. We shall confine ourselves here to so-called Markov control functions, which depend only on t and on the state X_t at the instant t (and not, for example, on the values of X_s for s < t). The system (13.1.1) is also called a "plant".

If we substitute a fixed control function u ∈ U in (13.1.1), we get a stochastic differential equation of the usual form. Now, the set U must be narrowed down by boundedness and analytical conditions, which we shall not further specify, in such a way that existence and uniqueness of a solution X_t = X_t^u, which now depends on u ∈ U, are ensured for the differential equation. The solution that starts at x at the instant s will be denoted by X_t(s, x) = X_t^u(s, x). We shall also write E_{s,x} g(X_t) = E g(X_t(s, x)).
Suppose that the costs arising from the choice of control function u up to an instant T < ∞ are, in the case of a start at x at the instant s,

(13.1.2) V^u(s, x) = E_{s,x}( ∫_s^T k(r, X_r, u(r, X_r)) dr + M(T, X_T) ).

Here, we shall confine ourselves to fixed-time control. In general, T in (13.1.2) is replaced with a random instant τ at which the process reaches a specified target set. The functions k and M have respectively nonnegative and real values. The integral in (13.1.2) represents the running costs, and the second term in the large parentheses represents the one-time cost for a stop at X_T at the instant T.

We now seek the optimal control function, that is, the control function u* ∈ U that minimizes the costs:

V(s, x) = V^{u*}(s, x) = min_{u∈U} V^u(s, x).
Fig. 9: Scheme of the optimal control: the disturbance W drives the plant (13.1.1), whose state X_t is fed to the optimal controller (optimizer for (13.1.2)).
L^u = ∂/∂s + Σ_{i=1}^d f_i(s, x, u) ∂/∂x_i + (1/2) Σ_{i=1}^d Σ_{j=1}^d (G(s, x, u) G(s, x, u)')_{ij} ∂²/∂x_i ∂x_j.
L^u V(s, x) + k(s, x, u) is, for given V and fixed (s, x), a function of u ∈ R^p whose minimum is sought. The position u* of this minimum depends on (s, x); thus, u* = u*(s, x). If V(s, x) is equal to the optimal costs and if the function u*(s, x) resulting from the search for the minimum is an admissible control function, then it is also an optimal control function. Then,

L^{u*(s,x)} V(s, x) + k(s, x, u*(s, x)) = min_u ( L^u V(s, x) + k(s, x, u) ) = 0.

The following steps therefore yield (under certain conditions) both the optimal control function and the minimum costs:

1. For fixed V, we determine the point u = ū(s, x; V) at which L^u V(s, x) + k(s, x, u) attains its minimum.
2. We substitute the function ū for the parameter u in L^u V(s, x) + k(s, x, u) and solve the partial differential equation

L^ū V(s, x) + k(s, x, ū(s, x; V)) = 0, t_0 ≤ s ≤ T,

with the end condition V(T, x) = M(T, x). The solution V(s, x) yields the minimum costs.
3. The function V(s, x) is inserted into the function ū determined in step 1. This yields the optimal control function u* = u*(s, x) = ū(s, x; V(s, x)).
We shall illustrate this in the next section for the linear case and a quadratic
"criterion" (13.1.2).
L^u = ∂/∂s + (A(s)x + B(s)u)' ∂/∂x + (1/2) Σ_{i=1}^d Σ_{j=1}^d (G(s)G(s)')_{ij} ∂²/∂x_i ∂x_j,

so that

L^u V(s, x) = ∂V/∂s + (A(s)x)' V_x + (B(s)u)' V_x + (1/2) tr(G(s) G(s)' V_{xx}),

and Bellman's equation becomes

(13.2.1) ∂V/∂s + (A(s)x)' V_x + (B(s)u)' V_x + (1/2) tr(G(s) G(s)' V_{xx}) + x'C(s)x + u'D(s)u = min.
The quadratic function (B(s)u)' V_x + u'D(s)u assumes its minimum when

(13.2.2) u = −(1/2) D(s)⁻¹ B(s)' V_x.

When we substitute this into (13.2.1), we get the partial differential equation for the minimum costs V(s, x), for t_0 ≤ s ≤ T. With the ansatz V(s, x) = x'Q(s)x + q(s)'x + p(s), the equation for p is

ṗ(s) + tr(G G' Q) − (1/4) q' B D⁻¹ B' q = 0, p(T) = b(T).

These must be solved backwards, beginning with T. Since V_x = 2Qx + q, the optimal control function u* is now obtained from (13.2.2):

u*(s, x) = −D(s)⁻¹ B(s)' ( Q(s) x + q(s)/2 ).
k(t, x, u) = x'C(t)x + u'D(t)u,

M(T, x) = x'F(T)x,

where C(t) and F(t) are symmetric and nonnegative-definite and D(t) is symmetric and positive-definite, then the control and filtering can be separated from each other. In the following discussion, we shall follow Bucy and Joseph ([61], pp. 96-102).
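For the linear-quadratic problem, step 2 of the procedure in section 13.1 reduces to a matrix Riccati equation integrated backwards from T. The following sketch (the double-integrator system and all names are illustrative assumptions, not from the text) integrates −dQ/ds = A'Q + QA − QBD⁻¹B'Q + C with Q(T) = F(T) and forms the feedback u* = −D⁻¹B'Q x:

```python
import numpy as np

# Backward Riccati integration for the linear-quadratic problem.
A = np.array([[0.0, 1.0], [0.0, 0.0]])    # double integrator (example)
B = np.array([[0.0], [1.0]])
C = np.eye(2)                              # running state cost  x'Cx
D = np.array([[1.0]])                      # running control cost u'Du
F = np.zeros((2, 2))                       # terminal cost x'F(T)x

T, n = 30.0, 30_000
ds = T / n
Q = F.copy()
for _ in range(n):                         # integrate backwards from T
    dQ = A.T @ Q + Q @ A - Q @ B @ np.linalg.inv(D) @ B.T @ Q + C
    Q += dQ * ds

K = np.linalg.inv(D) @ B.T @ Q             # u*(s, x) = -K x
print(K)   # approaches the stationary LQR gain [[1., 1.732...]]
```

Over a long horizon, Q(s) settles at the solution of the algebraic Riccati equation, so the feedback gain becomes constant.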
Since instead of X_t we know only the observations Z_t, our control functions u are allowed to depend not on X_t but only on the observations Z[t_0, t]; that is, we are considering functionals of the form

U_t = u(t, Z[t_0, t]).

Also, the cost functional must now be considered under the condition of a certain observation:
on the interval [t_0, T] with the initial value X̂_{t_0} = 0. Finally,

P(t) = E( (X_t − X̂_t)(X_t − X̂_t)' | Z[t_0, t] )

is the error covariance matrix, which is independent of the control function and of the observation and which satisfies the equation

Ṗ(t) = A P + P A' − P H'(R R')⁻¹ H P + G G', t_0 ≤ t ≤ T, P(t_0) = E c c'.

The minimum costs arising from use of the control function U_t with the estimated starting point X̂_s on [s, T] are then

+ tr F(T) P(T) + ∫_s^T tr(C(r) P(r)) dr.
Fig. 10: Separation of filtering and control: the disturbances W and V act on the plant and the observation; the filter produces X̂_t from Z_t, and the controller produces U_t from X̂_t.
The combined filtering and control problem can obviously be broken into the following problems:

1. Filtering: determination of the optimal estimate X̂_t of X_t on the basis of the observation Z[t_0, t] from equation (13.3.5).
2. Determination of the optimal control function u* = u*(t, X̂_t) for the deterministic problem (G ≡ 0). We get
[51] Wax, N. (ed.), Selected Papers on Noise and Stochastic Processes, New York, Dover, 1954 (contains [32], [35], [49], [50]).
[52] Wong, E., and Zakai, M., "On the convergence of ordinary integrals to stochastic integrals," Ann. Math. Statist., 36 (1965), pp. 1560-1564.
[53] Wong, E., and Zakai, M., "The oscillation of stochastic integrals," Z. Wahrscheinlichkeitstheorie verw. Geb., 4 (1965), pp. 103-112.
[54] Wong, E., and Zakai, M., "Riemann-Stieltjes approximation of stochastic integrals," Z. Wahrscheinlichkeitstheorie verw. Geb., 12 (1969), pp. 87-97.
[68] Kalman, R. E., Falb, P. L., and Arbib, M. A., Topics in Mathematical System
Theory, New York, McGraw-Hill, 1969.
[69] Kolmogorov, A. N., Interpolirovaniye i ekstrapolirovaniye statsionarnykh
sluchaynykh posledovatel'nostey (Interpolation and extrapolation of
stationary random sequences), Izvestiya Akad. nauk (seriya matemati-
cheskaya), 5 (1941), pp. 3-14.
[70] Kozin, F., "On almost sure asymptotic sample properties of diffusion pro-
cesses defined by stochastic differential equations," Journ. of Math.
of Kyoto Univ., 4 (1965), pp. 515-528.
[71] Kushner, H. J., "On the differential equations satisfied by conditional probability densities of Markov processes," SIAM J. Control, 2 (1964), pp. 106-119.
[72] Kushner, H. J., Stochastic Stability and Control, New York, Academic Press,
1967.
[73] Morozan, T., Stabilitatea sistemelor cu parametri aleatori (stability of sys-
tems with random parameters), Bucharest, Editura Academiei Repub-
licii Socialiste Romania, 1969.
[74] Mortensen, R. E., Optimal Control of Continuous-Time Stochastic Systems, Ph.D. thesis (engineering), Berkeley, University of California Press, 1966.
[75] Sagirow, P., Stochastic Methods in the Dynamics of Satellites, Lecture Notes, Udine, CISM, 1970.
[76] Stratonovich, R. L., Topics in the Theory of Random Noise, Vol. 1, New York, Gordon and Breach, 1963 (translation from Russian).
[77] Stratonovich, R. L., Conditional Markov Processes and Their Application to the Theory of Optimal Control, New York, American Elsevier, 1968 (translation from Russian).
[78] Strauss, Aaron, An Introduction to Optimal Control Theory, Berlin, Heidelberg, and New York, Springer, 1968 (Lecture Notes in Operations Research and Mathematical Economics, Vol. 3).
[79] Wiener, N., Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, Cambridge, Mass., MIT Press, 1949.
[80] Wonham, W. M., "Random differential equations in control theory" in A.
T. Bharucha-Reid (ed.): Probabilistic Methods in Applied Mathema-
tics, Vol. 2, New York, Academic Press, 1970, pp. 131-212.