September 1988
Synopsis
We present in these lectures, in an informal manner, the very basic ideas and results of stochastic calculus, including its chain rule, the fundamental theorems on the representation of martingales as stochastic integrals and on the equivalent change of probability measure, as well as elements of stochastic differential equations. These results suffice for a rigorous treatment of important applications, such as filtering theory, stochastic control, and the modern theory of financial economics. We outline recent developments in these fields, with proofs of the major results whenever possible, and refer the reader to the literature for further study.
Some familiarity with probability theory and stochastic processes, including a good
understanding of conditional distributions and expectations, will be assumed. Previous
exposure to the fields of application will be desirable, but not necessary.
Lecture notes prepared during the period 25 July - 15 September 1988, while the author
was with the Office for Research & Development of the Hellenic Navy (ETEN), at the
suggestion of its former Director, Capt. I. Martinos. The author expresses his appreciation
to the leadership of the Office, in particular Capts. I. Martinos and A. Nanos, Cmdr. N.
Eutaxiopoulos, and Dr. B. Sylaidis, for their interest and support.
CONTENTS

0. Introduction
1. Generalities
2. Brownian Motion
3. Stochastic Integration
4. The Chain Rule (Itô's Formula)
5. Representations of Martingales and Change of Measure
6. Stochastic Differential Equations
7. Filtering Theory
8. Robust Filtering
9. Stochastic Control
10. Notes
11. References
1. GENERALITIES
S_n(ω) = Σ_{j=1}^n T_j(ω),  for n ≥ 1.

The interpretation here is that the T_j's represent the interarrival times, and that the S_n's represent the arrival times, of customers in a certain facility. The stochastic process

N_t(ω) = #{n ≥ 1 : S_n(ω) ≤ t},  0 ≤ t < ∞

counts, for every 0 ≤ t < ∞, the number of arrivals up to that time, and is called a Poisson process with intensity λ > 0. Every sample path t ↦ N_t(ω) is a staircase function (piecewise constant, right-continuous, with jumps of size +1 at the arrival times), and we have the following properties:

(i) for every 0 = t_0 < t_1 < t_2 < ... < t_m < t < θ < ∞, the increments N_{t_1}, N_{t_2} − N_{t_1}, ..., N_t − N_{t_m}, N_θ − N_t are independent;

(ii) the distribution of the increment N_θ − N_t is Poisson with parameter λ(θ − t), i.e.,

P[N_θ − N_t = k] = e^{−λ(θ−t)} (λ(θ − t))^k / k!,  k = 0, 1, 2, ....

In other words, given the past {N_s : 0 ≤ s < t} and the present {N_t}, the future {N_θ} depends only on the present. This is the Markov property of the Poisson process.
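These properties are easy to check numerically. The sketch below (Python with numpy; the particular values of λ, t and the sample sizes are arbitrary choices of this illustration) builds the arrival times S_n from i.i.d. exponential interarrival times with rate λ and verifies that the count N_t has the Poisson mean and variance λt.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, n_paths = 2.0, 5.0, 20000

# N_t = #{n >= 1 : S_n <= t}, with S_n = T_1 + ... + T_n and the T_j
# i.i.d. exponential with rate lam; then N_t should be Poisson(lam * t).
counts = np.empty(n_paths)
for i in range(n_paths):
    s = np.cumsum(rng.exponential(1.0 / lam, size=40))  # arrival times S_n
    counts[i] = np.searchsorted(s, t, side="right")     # N_t

print(counts.mean(), counts.var())  # both should be close to lam * t = 10
```

The 40 exponential draws per path are enough here, since a Poisson(10) count essentially never exceeds 40.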
1.2 Remark on Notation: For every stochastic process X, we denote by

(1.2)  F_t^X = σ(X_s ; 0 ≤ s ≤ t)

the record (history, observations, sample path) of the process up to time t. The resulting family {F_t^X ; 0 ≤ t < ∞} is increasing: F_t^X ⊆ F_θ^X for t < θ. This corresponds to the intuitive notion that

(1.3)  F_t^X represents the information about the process X that has been revealed up to time t.
Just like the Poisson process, every process with independent increments has this property.
1.4 The Martingale property: A stochastic process X with E|X_t| < ∞ is called a

  martingale, if  E(X_t | F_s) = X_s,
  submartingale, if  E(X_t | F_s) ≥ X_s,
  supermartingale, if  E(X_t | F_s) ≤ X_s

holds (w.p.1) for every 0 ≤ s < t < ∞.

1.5 Discussion: (i) The filtration {F_t} in 1.4 can be the same as {F_t^X}, but it may also be larger. This point can be important (e.g. in the representation Theorem 5.3) or even crucial (e.g. in Filtering Theory; cf. section 7), and not just a mere technicality. We stress it, when necessary, by saying that X is an {F_t}-martingale, or that X = {X_t, F_t ; 0 ≤ t < ∞} is a martingale.

(ii) In a certain sense, martingales are the constant functions of probability theory; submartingales are the increasing functions, and supermartingales are the decreasing functions. In particular, for a martingale (submartingale, supermartingale) the expectation t ↦ EX_t is a constant (resp. nondecreasing, nonincreasing) function; on the other hand, a super(sub)martingale with constant expectation is necessarily a martingale. With this interpretation, if X_t stands for the fortune of a gambler at time t, then a martingale (submartingale, supermartingale) corresponds to the notion of a fair (respectively: favorable, unfavorable) game.

(iii) The study of processes of the martingale type is at the heart of stochastic analysis, and becomes exceedingly important in applications. We shall try in this tutorial to illustrate both these points.

1.6 The Compensated Poisson process: If N is a Poisson process with intensity λ > 0, it is checked easily that the compensated process

M_t = N_t − λt,  F_t^N,  0 ≤ t < ∞

is a martingale.
In order to state correctly some of our later results, we shall need to localize the
martingale property.
1.7 Definition: A random variable τ : Ω → [0, ∞] is called a stopping time of the filtration {F_t}, if the event {τ ≤ t} belongs to F_t, for every 0 ≤ t < ∞.

In other words, the determination of whether τ has occurred by time t can be made by looking at the information F_t that has been made available up to time t only, without anticipation of the future.

For instance, if X has continuous paths and A is a closed set of the real line, the hitting time τ_A = min{t ≥ 0 : X_t ∈ A} is a stopping time.

1.8 Definition: An adapted process X = {X_t, F_t ; 0 ≤ t < ∞} is called a local martingale, if there exists an increasing sequence {τ_n}_{n=1}^∞ of stopping times with lim_n τ_n = ∞ such that the stopped process {X_{t∧τ_n}, F_t ; 0 ≤ t < ∞} is a martingale, for every n ≥ 1.

It can be shown that every martingale is also a local martingale, and that there exist local martingales which are not martingales; we shall not press these points here.

1.9 Exercise: Every nonnegative local martingale is a supermartingale.

1.10 Exercise: If X is a submartingale and τ is a stopping time, then the stopped process X_t^τ ≜ X_{τ∧t}, 0 ≤ t < ∞ is also a submartingale.

1.11 Exercise (optional sampling theorem): If X is a submartingale with right-continuous sample paths, and σ ≤ τ are two stopping times with τ ≤ M (w.p.1) for some real constant M > 0, then we have E(X_σ) ≤ E(X_τ).
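The optional sampling statement can be watched in a simple discrete illustration (Python with numpy; the walk, the barrier and the horizon are arbitrary choices, not from the text). A symmetric random walk is a martingale, so for the bounded stopping time τ = min(first exit from (−5, 5), 200) we should find E(X_τ) = E(X_0) = 0, the equality case of 1.11.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, horizon, a = 20000, 200, 5

# Symmetric +/-1 random walk (a martingale), stopped when it first reaches
# +/-a or at the deterministic horizon; tau is bounded, so E[X_tau] = 0.
steps = rng.choice([-1, 1], size=(n_paths, horizon))
walks = steps.cumsum(axis=1)
stopped = np.empty(n_paths)
for i in range(n_paths):
    hits = np.flatnonzero(np.abs(walks[i]) >= a)
    stopped[i] = walks[i, hits[0]] if hits.size else walks[i, -1]

print(stopped.mean())  # should be close to 0
```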
2. BROWNIAN MOTION
This is by far the most interesting and fundamental stochastic process. It was studied
by A. Einstein (1905) in the context of a kinematic theory for the irregular movement
of pollen immersed in water that was first observed by the botanist R. Brown in 1824,
and by Bachelier (1900) in the context of financial economics. Its mathematical theory
was initiated by N. Wiener (1923), and P. Levy (1948) carried out a brilliant study of its
sample paths that inspired practically all subsequent research on stochastic processes until
today. Appropriately, the process is also known as the Wiener-Levy process, and finds
applications in engineering (communications, signal processing, control), economics and
finance, mathematical biology, management science, etc.
2.1 Motivational considerations (in one dimension): Consider a particle that is subjected to a sequence of i.i.d. (independent, identically distributed) Bernoulli kicks ξ_1, ξ_2, ... with P[ξ_1 = ±1] = 1/2, of size h > 0, at the end of regular time-intervals of constant length δ > 0. Thus, the location of the particle after n kicks is given as h·Σ_{j=1}^n ξ_j(ω); more generally, the location of the particle at time t is

S_t(ω) = h · Σ_{j=1}^{[t/δ]} ξ_j(ω),  0 ≤ t < ∞.

The resulting process S has right-continuous and piecewise constant sample paths, as well as stationary and independent increments (because of the independence of the ξ_j's). Obviously, ES_t = 0 and

Var(S_t) = h² [t/δ] ≈ (h²/δ) · t.

Now speed up the kicks, taking h = 1/√n and δ = 1/n, and consider the rescaled processes

(2.1)  S_t^(n)(ω) = (1/√n) Σ_{j=1}^{[nt]} ξ_j(ω);  0 ≤ t < ∞,  n ≥ 1.

You can easily imagine now that the entire process S^(n) = {S_t^(n) ; 0 ≤ t < ∞} converges in distribution (in a suitable sense) as n → ∞, to a process W = {W_t ; 0 ≤ t < ∞} with the following properties:

(i) W_0 = 0;

(ii) the increments W_{t_1}, W_{t_2} − W_{t_1}, ..., W_{t_m} − W_{t_{m−1}} are independent, for every 0 = t_0 < t_1 < ... < t_m < ∞;

(iii) the increment W_t − W_s is normally distributed with mean zero and variance t − s:

(2.2)  P[W_t − W_s ∈ dx] = (2π(t − s))^{−1/2} exp{−x²/2(t − s)} dx,  0 ≤ s < t < ∞;

(iv) the sample path t ↦ W_t(ω) is continuous.

This limiting process W is called Brownian motion.

2.4 White Noise: The formal derivative of Brownian motion can be understood as follows. The difference quotients ξ_t^(n) ≜ n(W_{t+(1/n)} − W_t) constitute, for each n ≥ 1, a zero-mean Gaussian process with covariance function

E[ξ_t^(n) ξ_s^(n)] = { n²((1/n) − |t − s|),  |t − s| ≤ 1/n ;  0,  |t − s| ≥ 1/n },

and the limit

ξ_t = lim_n ξ_t^(n)

(in a generalized, distributional sense) is then a zero-mean Gaussian process with covariance function R(t, s) = E(ξ_t ξ_s) = δ(t − s). It is called White Noise and is of tremendous importance in communications and system theory.

Nota Bene: Despite its continuity, the sample path t ↦ W_t(ω) is not differentiable anywhere on [0, ∞).
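The convergence of the rescaled walks S^(n) is easy to observe numerically. The sketch below (Python with numpy; the values of n and the number of sample paths are arbitrary choices of this illustration) checks three of the limit properties at once: mean zero, variance t, and independence of the increments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_paths = 1000, 20000

# S_t^(n) = (1/sqrt(n)) * sum_{j <= [nt]} xi_j, with i.i.d. +/-1 kicks xi_j.
xi = rng.choice([-1.0, 1.0], size=(n_paths, n))
s = xi.cumsum(axis=1) / np.sqrt(n)

s_half = s[:, n // 2 - 1]   # S_{1/2}^(n)
s_one = s[:, -1]            # S_1^(n)
incr = s_one - s_half       # increment over [1/2, 1]

print(s_one.mean())                     # close to 0
print(s_one.var())                      # close to t = 1
print(np.corrcoef(s_half, incr)[0, 1])  # close to 0 (independent increments)
```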
Now fix a t > 0 and consider a sequence of partitions 0 = t_0^(n) < t_1^(n) < ... < t_k^(n) < ... < t_{2^n}^(n) = t of the interval [0, t], say with t_k^(n) = k·t·2^{−n}, as well as the quantity

(2.3)  V_p^(n)(ω) = Σ_{k=1}^{2^n} |W_{t_k^(n)}(ω) − W_{t_{k−1}^(n)}(ω)|^p,  p > 0,

the variation of order p of the sample path t ↦ W_t(ω) along the n-th partition. For p = 1, V_1^(n)(ω) is simply the length of the polygonal approximation to the Brownian path; for p = 2, V_2^(n)(ω) is the quadratic variation of the path along the approximation.

2.5 Theorem: With probability one, we have

(2.4)  lim_n V_p^(n) = { +∞,  for 0 < p < 2 ;  t,  for p = 2 ;  0,  for p > 2 }.

In particular:

(2.5)  Σ_{k=1}^{2^n} |W_{t_k^(n)} − W_{t_{k−1}^(n)}| → ∞  (w.p.1),

(2.6)  Σ_{k=1}^{2^n} (W_{t_k^(n)} − W_{t_{k−1}^(n)})² → t  (w.p.1).
Remark: The relations (2.5), (2.6) become easily believable, if one considers them in L¹ rather than with probability one. Indeed, since

E|W_{t_k^(n)} − W_{t_{k−1}^(n)}| = c·(t_k^(n) − t_{k−1}^(n))^{1/2} = c·2^{−n/2}·t^{1/2},

E(W_{t_k^(n)} − W_{t_{k−1}^(n)})² = t_k^(n) − t_{k−1}^(n) = 2^{−n}·t,

with c = (2/π)^{1/2}, we have as n → ∞:

E Σ_{k=1}^{2^n} |W_{t_k^(n)} − W_{t_{k−1}^(n)}| = c·2^{n/2}·t^{1/2} → ∞,

E Σ_{k=1}^{2^n} (W_{t_k^(n)} − W_{t_{k−1}^(n)})² = t.
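The dichotomy in (2.5), (2.6) shows up already on simulated paths. The following sketch (Python with numpy; independent Gaussian increments are drawn for each grid, and the two grid sizes are arbitrary choices) computes the first and the quadratic variation of a discretized Brownian path on [0, 1] along two dyadic partitions: the first variation blows up as the partition is refined, while the quadratic variation stays near t = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
t = 1.0

def variations(n):
    # Brownian increments over the dyadic partition t_k = k * t / 2**n.
    dw = rng.normal(0.0, np.sqrt(t / 2**n), size=2**n)
    return np.abs(dw).sum(), (dw**2).sum()  # first and quadratic variation

v1_coarse, v2_coarse = variations(8)
v1_fine, v2_fine = variations(16)

print(v1_coarse, v1_fine)  # first variation grows like 2^(n/2)
print(v2_coarse, v2_fine)  # quadratic variation stays near t = 1
```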
Arbitrary (local) martingales with continuous sample paths do not behave much differently. In fact, we have the following result.
2.6 Theorem: For every nonconstant (local) martingale M with continuous sample paths, we have the analogues of (2.5), (2.6):

(2.7)  Σ_{k=1}^{2^n} |M_{t_k^(n)} − M_{t_{k−1}^(n)}| → ∞  (in probability),

(2.8)  Σ_{k=1}^{2^n} (M_{t_k^(n)} − M_{t_{k−1}^(n)})² → ⟨M⟩_t  (in probability),

where ⟨M⟩ = {⟨M⟩_t ; 0 ≤ t < ∞} is a continuous, nondecreasing process with ⟨M⟩_0 = 0, the quadratic variation of M. It is characterized by the property that

(2.9)  M_t² − ⟨M⟩_t,  F_t,  0 ≤ t < ∞  is a (local) martingale;

the analogous requirement (2.9)′, that M² − ⟨M⟩ be a (local) martingale, serves to define the quadratic variation ⟨M⟩ even for martingales whose sample paths are not continuous.

2.8 Corollary: Every (local) martingale M, with sample paths which are continuous and of finite first variation, is necessarily constant.

2.9 Exercise: For the compensated Poisson process M of 1.6, show that M_t² − λt is a martingale, and thus ⟨M⟩_t = λt in (2.9)′.

2.11 Exercise: For any two (local) martingales M and N with continuous sample paths, we have

(2.10)  Σ_{k=1}^{2^n} (M_{t_k^(n)} − M_{t_{k−1}^(n)})(N_{t_k^(n)} − N_{t_{k−1}^(n)}) → ⟨M, N⟩_t ≜ (1/4)[⟨M + N⟩_t − ⟨M − N⟩_t]  (in probability).

2.12 Remark: The process ⟨M, N⟩ of (2.10) is continuous and of bounded variation (difference of two nondecreasing processes); it is the unique process with these properties, for which

(2.11)  M_t N_t − ⟨M, N⟩_t,  F_t,  0 ≤ t < ∞

is a (local) martingale. We shall also need Doob's maximal inequality: for every martingale X with right-continuous sample paths,

E[ sup_{0≤t≤T} |X_t|^p ] ≤ (p/(p−1))^p E|X_T|^p,  p > 1.
3. STOCHASTIC INTEGRATION
Consider a Brownian motion W adapted to a given filtration {F_t}; for a suitable adapted process X, we would like to define the stochastic integral

(3.1)  I_t(X) = ∫_0^t X_s dW_s

and to study its properties as a process indexed by t. We see immediately, however, that ∫_0^t X_s(ω) dW_s(ω) cannot possibly be defined, for fixed ω, as a Lebesgue-Stieltjes integral, because the path s ↦ W_s(ω) is of infinite first variation on any interval [0, t]; recall (2.5). Thus, we need a new approach, one that can exploit the fact that the path has finite and positive second (quadratic) variation; cf. (2.6). We shall try to sketch the main lines of this approach, leaving aside all the technicalities (which are rather demanding!).

Just as with the Lebesgue integral, it is pretty obvious what everybody's choice should be for the stochastic integral, in the case of particularly simple processes X. Let us place ourselves, from now on, on a finite interval [0, T].

3.1 Definition: A process X is called simple, if there exists a partition 0 = t_0 < t_1 < ... < t_r < t_{r+1} = T such that X_s(ω) = ξ_j(ω) for t_j < s ≤ t_{j+1}, where ξ_j is a bounded, F_{t_j}-measurable random variable.

For such a process, we define in a natural way:

(3.2)  I_t(X) = ∫_0^t X_s dW_s ≜ Σ_{j=0}^{m−1} ξ_j (W_{t_{j+1}} − W_{t_j}) + ξ_m (W_t − W_{t_m}),  t_m < t ≤ t_{m+1}
            = Σ_{j=0}^r ξ_j (W_{t∧t_{j+1}} − W_{t∧t_j}).
There are several properties of the integral that follow easily from this definition; pretending that t = t_{m+1} to simplify notation, we obtain

E I_t(X) = E Σ_{j=0}^m ξ_j (W_{t_{j+1}} − W_{t_j}) = Σ_{j=0}^m E[ ξ_j E(W_{t_{j+1}} − W_{t_j} | F_{t_j}) ] = 0,

and, by the same conditioning argument, E[I_t(X) | F_s] = I_s(X) for 0 ≤ s < t ≤ T. In other words, the integral is a martingale with continuous sample paths. What is the quadratic variation of this martingale? We can get a clue, if we compute the second moment

(3.4)  E(I_t(X))² = E( Σ_{j=0}^m ξ_j (W_{t_{j+1}} − W_{t_j}) )²
     = E Σ_{j=0}^m ξ_j² (W_{t_{j+1}} − W_{t_j})² + 2 E Σ_{j=0}^m Σ_{i=j+1}^m ξ_j ξ_i (W_{t_{j+1}} − W_{t_j})(W_{t_{i+1}} − W_{t_i})
     = E Σ_{j=0}^m ξ_j² (t_{j+1} − t_j) = E ∫_0^t X_u² du,

since the cross-terms vanish upon conditioning on F_{t_i}. Similarly,

(3.5)  E[ (I_t(X) − I_s(X))² | F_s ] = E[ ∫_s^t X_u² du | F_s ],

and, for two simple processes X and Y,

E[ (I_t(X) − I_s(X))(I_t(Y) − I_s(Y)) | F_s ] = E[ ∫_s^t X_u Y_u du | F_s ],

so that, in the terminology of (2.8) and (2.10),

(3.6)  ⟨I(X)⟩_t = ∫_0^t X_u² du,  ⟨I(X), I(Y)⟩_t = ∫_0^t X_u Y_u du.
In particular, we have

(3.7)  E(I_t(X))² = E ∫_0^t X_u² du,  E[I_t(X) I_t(Y)] = E ∫_0^t X_u Y_u du.
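The first identity in (3.7), the Itô isometry, is easy to test by simulation. In the sketch below (Python with numpy), the integrand X_u = W_u is an arbitrary illustrative choice, and left-endpoint Riemann sums on a fine grid stand in for I_t(X); both sides of the isometry are estimated by Monte Carlo on [0, 1].

```python
import numpy as np

rng = np.random.default_rng(8)
t, n, n_paths = 1.0, 256, 10000

dw = rng.normal(0.0, np.sqrt(t / n), size=(n_paths, n))
w = np.concatenate([np.zeros((n_paths, 1)), dw.cumsum(axis=1)], axis=1)

# Left-endpoint sums approximate I_t(X) for the integrand X_u = W_u;
# the endpoint is evaluated at the left of each subinterval, as in (3.2).
integral = np.sum(w[:, :-1] * dw, axis=1)

lhs = np.mean(integral**2)                             # E (I_t(X))^2
rhs = np.mean(np.sum(w[:, :-1]**2, axis=1) * (t / n))  # E int_0^t X_u^2 du

print(np.mean(integral), lhs, rhs)  # mean ~ 0, lhs ~ rhs
```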
One now passes from simple processes to an arbitrary measurable, adapted process X with ∫_0^T X_u² du < ∞ (w.p.1), by choosing simple processes X^(n) with

∫_0^T |X_u^(n) − X_u|² du → 0 as n → ∞  (w.p.1)

and defining I(X) as the limit of the corresponding integrals I(X^(n)). More generally, for a (local) martingale M with continuous sample paths and a measurable, adapted process X with ∫_0^T X_u² d⟨M⟩_u < ∞ (w.p.1), one can define the stochastic integral I^M(X) of X with respect to M; the resulting process is a local martingale with continuous paths and quadratic (and cross-) variations

(3.8)  ⟨I^M(X)⟩_t = ∫_0^t X_u² d⟨M⟩_u,  ⟨I^M(X), I^M(Y)⟩_t = ∫_0^t X_u Y_u d⟨M⟩_u,

(3.9)  ⟨I^M(X), I^N(Z)⟩_t = ∫_0^t X_u Z_u d⟨M, N⟩_u.

If now E ∫_0^T (X_u² + Y_u²) d⟨M⟩_u < ∞ and E ∫_0^T Z_u² d⟨N⟩_u < ∞, then the processes I^M(X), I^M(Y) and I^N(Z) are actually square-integrable martingales, with

(3.10)  E(I_t^M(X)) = 0,  E(I_t^M(X))² = E ∫_0^t X_u² d⟨M⟩_u,

(3.11)  E[I_t^M(X) I_t^M(Y)] = E ∫_0^t X_u Y_u d⟨M⟩_u,

(3.12)  E[I_t^M(X) I_t^N(Z)] = E ∫_0^t X_u Z_u d⟨M, N⟩_u.

Finally, for every continuous (local) martingale N,

⟨I^M(X), N⟩_t = ∫_0^t X_u d⟨M, N⟩_u.
It turns out that this last property characterizes the stochastic integral, in the following sense: suppose that for some continuous local martingale Φ we have

⟨Φ, N⟩_t = ∫_0^t X_u d⟨M, N⟩_u,  0 ≤ t ≤ T

for every continuous (local) martingale N; then Φ coincides with I^M(X).

4. THE CHAIN RULE (ITÔ'S FORMULA)

For a continuously differentiable function ρ : [0, ∞) → R and a function f : R → R of class C¹, the classical chain rule of calculus gives

(4.1)  f(ρ(t)) = f(ρ(0)) + ∫_0^t f′(ρ(s)) dρ(s).
Actually (4.1) holds even when ρ(·) is simply of bounded variation (not necessarily continuously differentiable), provided that the integral on the right-hand side is interpreted in the Stieltjes sense. We cannot expect (4.1) to hold, however, if ρ(·) is replaced by W.(ω), because the path t ↦ W_t(ω) is of infinite variation for almost every ω. It turns out that we need a second-order correction term.
4.1 Theorem: Let f : R → R be of class C², and W be a Brownian motion. Then

(4.2)  f(W_t) = f(W_0) + ∫_0^t f′(W_s) dW_s + (1/2) ∫_0^t f″(W_s) ds,  0 ≤ t < ∞.

More generally, for a (local) martingale M with continuous sample paths,

(4.3)  f(M_t) = f(M_0) + ∫_0^t f′(M_s) dM_s + (1/2) ∫_0^t f″(M_s) d⟨M⟩_s,  0 ≤ t < ∞.
Idea of proof: Consider a partition 0 = t_0 < t_1 < ... < t_m < t_{m+1} = t of the interval [0, t], and do a Taylor expansion:

f(W_t) − f(W_0) = Σ_{j=0}^m {f(W_{t_{j+1}}) − f(W_{t_j})}
  = Σ_{j=0}^m f′(W_{t_j})(W_{t_{j+1}} − W_{t_j}) + (1/2) Σ_{j=0}^m f″( W_{t_j} + θ_j (W_{t_{j+1}} − W_{t_j}) ) (W_{t_{j+1}} − W_{t_j})²,

where θ_j is an F_{t_{j+1}}-measurable random variable with values in the interval [0, 1]. In this last expression, as the partition becomes finer and finer, the first sum approximates the stochastic integral ∫_0^t f′(W_s) dW_s, while the second converges to ∫_0^t f″(W_s) ds; cf. (2.6).
4.2 Example: In order to compute ∫_0^t W_s dW_s, all you have to do is take f(x) = x² in (4.2), and obtain

∫_0^t W_s dW_s = (1/2)(W_t² − t).

(Try to arrive at the same conclusion by evaluating the approximating sums of the form Σ_{j=0}^m W_{t_j}(W_{t_{j+1}} − W_{t_j}) along a partition, and then letting the partition become dense in [0, t]. Notice how much harder you have to work this way!)
4.3 Theorem: Let M be a (local) martingale with continuous sample paths and ⟨M⟩_t = t. Then M is a Brownian motion.

Proof: We have to show that M has independent increments, and that the increment M_t − M_s has a normal distribution with mean zero and variance t − s, for 0 ≤ s < t < ∞. Both these claims will follow, as soon as it is shown that

(4.4)  E[ e^{iλ(M_t − M_s)} | F_s ] = e^{−(λ²/2)(t − s)};  λ ∈ R.

With f(x) = e^{iλx}, we have from (4.3):

e^{iλM_t} = e^{iλM_s} + iλ ∫_s^t e^{iλM_u} dM_u − (λ²/2) ∫_s^t e^{iλM_u} du,

and this leads to:

E[ e^{iλ(M_t − M_s)} | F_s ] = 1 + iλ e^{−iλM_s} E[ ∫_s^t e^{iλM_u} dM_u | F_s ] − (λ²/2) ∫_s^t E[ e^{iλ(M_u − M_s)} | F_s ] du.

Because the conditional expectation of the stochastic integral is zero (the martingale property!), we are led to the conclusion that the function

g(t) ≜ E[ e^{iλ(M_t − M_s)} | F_s ];  t ≥ s

satisfies the integral equation g(t) = 1 − (λ²/2) ∫_s^t g(u) du. But there is only one solution to this equation, namely g(t) = e^{−(λ²/2)(t − s)}, proving (4.4).
Theorem 4.1 can be generalized in several ways; here is a version that we shall find
most useful, and which is established in more or less the same way.
4.4 Proposition: Let X be a semimartingale, i.e., a process of the form

X_t = X_0 + M_t + V_t,  0 ≤ t < ∞,

where M is a local martingale with continuous sample paths, and V a process with continuous sample paths of finite first variation. Then, for every function f : R → R of class C², we have

(4.5)  f(X_t) = f(X_0) + ∫_0^t f′(X_s) dM_s + ∫_0^t f′(X_s) dV_s + (1/2) ∫_0^t f″(X_s) d⟨M⟩_s,  0 ≤ t < ∞.

More generally, let X = (X^(1), ..., X^(d)) be an R^d-valued process with components X_t^(i) = X_0^(i) + M_t^(i) + V_t^(i) of the above type, and f : R^d → R a function of class C². We have then

(4.6)  f(X_t) = f(X_0) + Σ_{i=1}^d ∫_0^t (∂f/∂x_i)(X_s) dM_s^(i) + Σ_{i=1}^d ∫_0^t (∂f/∂x_i)(X_s) dV_s^(i)
       + (1/2) Σ_{i=1}^d Σ_{j=1}^d ∫_0^t (∂²f/∂x_i ∂x_j)(X_s) d⟨M^(i), M^(j)⟩_s,  0 ≤ t < ∞.
The Exponential (Local) Martingale: For a (local) martingale M with continuous sample paths and M_0 = 0, an application of Itô's rule to the process

(4.7)  Z_t ≜ exp{ M_t − (1/2)⟨M⟩_t }

gives

(4.8)  Z_t = 1 + ∫_0^t Z_s dM_s,  0 ≤ t < ∞,

so that Z is itself a continuous local martingale (and, being nonnegative, also a supermartingale; recall Exercise 1.9). In particular, let W = (W^(1), ..., W^(d)) be a d-dimensional Brownian motion and take

(4.9)  M_t = ∫_0^t X_s · dW_s = Σ_{i=1}^d ∫_0^t X_s^(i) dW_s^(i)

with ∫_0^T ||X_s||² ds < ∞ (w.p.1). Then ⟨M⟩_t = ∫_0^t ||X_s||² ds, the exponential local martingale becomes

(4.11)  Z_t = exp{ ∫_0^t X_s · dW_s − (1/2) ∫_0^t ||X_s||² ds },

it satisfies

(4.12)  Z_t = 1 + ∫_0^t Z_s X_s · dW_s,  0 ≤ t < ∞,

and is a martingale if the Novikov condition

(4.13)  E[ exp{ (1/2) ∫_0^T ||X_s||² ds } ] < ∞

holds.

4.6 Exercise (integration by parts): For two semimartingales X^(1), X^(2) as in Proposition 4.4, take f(x_1, x_2) = x_1 x_2 in (4.6), to obtain

(4.14)  X_t^(1) X_t^(2) = X_0^(1) X_0^(2) + ∫_0^t X_s^(1) dX_s^(2) + ∫_0^t X_s^(2) dX_s^(1) + ⟨M^(1), M^(2)⟩_t.

4.7 Exercise: Using the formula (4.6), establish the following multi-dimensional analogue of Theorem 4.3: If M = (M^(1), ..., M^(d)) is a vector of (local) martingales with continuous paths and ⟨M^(i), M^(j)⟩_t = t·δ_ij, then M is an R^d-valued Brownian motion. Here, δ_ij = { 0, i ≠ j ; 1, i = j } is the Kronecker delta.
5. REPRESENTATIONS OF MARTINGALES AND CHANGE OF MEASURE

5.1 Theorem (time-change for martingales): Let M be a (local) martingale with continuous sample paths, M_0 = 0 and lim_{t→∞} ⟨M⟩_t = ∞, and define for every 0 ≤ s < ∞ the stopping time T(s) ≜ inf{t ≥ 0 : ⟨M⟩_t > s}. Then the time-changed process

(5.1)  W_s = M_{T(s)},  0 ≤ s < ∞

is a Brownian motion, and M_t = W_{⟨M⟩_t}.

5.2 Exercise: Suppose that the quadratic variation of M is absolutely continuous, e.g. ⟨M⟩_t = ∫_0^t X_s² ds for some measurable, adapted process X which takes values in R\{0}. Then M admits the representation

(5.2)  M_t = M_0 + ∫_0^t X_s dW_s,  0 ≤ t < ∞,

with respect to the Brownian motion

(5.3)  W_t = ∫_0^t (1/X_s) dM_s

in (5.2). (For the general case, in which X may vanish, assume, as you may, that the probability space is rich enough to support a Brownian motion B independent of M, and use B to modify accordingly the definition of W in (5.3).)
Here is now our final, and most important, representation result for martingales in terms of Brownian motion. In contrast to both Theorem 5.1 and Exercise 5.2, the Brownian motion W in Theorem 5.3 is given and fixed.

5.3 Theorem: Let W be a Brownian motion, and recall the notation F_t^W = σ(W_s ; 0 ≤ s ≤ t) of (1.2) for the history of the process up to time t. Every local martingale M with respect to the filtration {F_t^W} admits a representation of the form

(5.4)  M_t = M_0 + ∫_0^t X_s dW_s,  0 ≤ t < ∞,

for a measurable process X which is adapted to {F_t^W} and satisfies ∫_0^T X_s² ds < ∞ (w.p.1) for every 0 < T < ∞. In particular, every such M has continuous sample paths.
Motivation for the Girsanov theorem: Let ξ_1, ..., ξ_n be independent standard normal random variables on (Ω, F, P), let μ_1, ..., μ_n be real constants, and introduce the new measure

P̃(dω) ≜ exp{ Σ_{i=1}^n μ_i ξ_i(ω) − (1/2) Σ_{i=1}^n μ_i² } P(dω).

This is a probability measure, because E e^{μ_i ξ_i − μ_i²/2} = 1 for each i and the ξ_i are independent, so that P̃(Ω) = Π_{i=1}^n E e^{μ_i ξ_i − μ_i²/2} = 1. We have

P̃[ξ_1 ∈ dz_1, ..., ξ_n ∈ dz_n] = exp( Σ_{i=1}^n μ_i z_i − (1/2) Σ_{i=1}^n μ_i² ) P[ξ_1 ∈ dz_1, ..., ξ_n ∈ dz_n]
  = (2π)^{−n/2} exp{ −(1/2) Σ_{i=1}^n (z_i − μ_i)² } dz_1 ... dz_n.

In other words, under the new measure P̃ the random variables ξ_i − μ_i, 1 ≤ i ≤ n, are independent standard normal: the change of measure has removed the means. The Girsanov theorem carries out this removal of drift for Brownian motion.

5.5 Theorem (Girsanov): Let X be a measurable, adapted process with ∫_0^T ||X_s||² ds < ∞ (w.p.1), and suppose that the exponential local martingale Z of (4.11) is actually a martingale, so that

E(Z_T) = 1.

Then, under the probability measure

(5.6)  P̃(dω) ≜ Z_T(ω) P(dω),

the process

W̃_t = W_t − ∫_0^t X_s ds,  F_t,  0 ≤ t ≤ T

is Brownian motion.
5.6 Remark: Recall the sufficient condition (4.13) for the validity of (5.6); in particular,
Z is a martingale if X is bounded.
5.7 Remark: We have the following generalization of Theorem 5.5. Suppose that M is a local martingale with continuous sample paths and M_0 = 0, for which the exponential local martingale Z = exp[M − (1/2)⟨M⟩] of (4.7) is actually a martingale, and define a new probability measure P̃ as in (5.6). Then for any continuous P-local martingale N, the process

Ñ_t ≜ N_t − ⟨M, N⟩_t = N_t − ∫_0^t (1/Z_s) d⟨Z, N⟩_s,  0 ≤ t ≤ T

is a (continuous) local martingale under P̃.
6. STOCHASTIC DIFFERENTIAL EQUATIONS

Consider a dynamical system of the form

(6.1)  (d/dt) X_t = b(t, X_t) + σ(t, X_t) η_t,

where X_t is the (R^d-valued) state of the system at time t, and η_t is the (R^n-valued) input at t.

The study of dynamical systems of this form, for deterministic inputs η, is well-established. We should like to develop a similar theory for stochastic inputs as well; in particular, we shall develop a theory for equations of the type (6.1) when η is a white noise process as in 2.4. Since, formally at least, we have η_t = dW_t/dt, we shall prefer to look at the integral version

(6.2)  X_t = ξ + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s,  0 ≤ t ≤ T,

and ask about the existence and uniqueness of a process X satisfying (6.2). Note that, before we can even pose such questions, we need to have developed a theory of stochastic integration with respect to Brownian motion W (in order to make sense of the last integral in (6.2)). This was part of the motivation behind the theory of section 3.
6.1 Example: Linear Systems. In the case b(t, x) = A(t)x + a(t), σ(t, x) = B(t)x + b(t), the resulting equation

(6.3)  dX_t = [A(t)X_t + a(t)] dt + [B(t)X_t + b(t)] dW_t

(in differential form) is linear in X_t, and can be solved in principle explicitly. In the additive-noise case B(·) ≡ 0, the solution X is a Gaussian process, if the initial random vector X_0 has a normal distribution and is independent of the driving Brownian motion W.

More precisely, suppose B(·) ≡ 0, and consider the deterministic linear system

(6.3)′  ζ̇(t) = A(t) ζ(t) + a(t),

together with the fundamental matrix solution Φ(·) of the homogeneous system Φ̇(t) = A(t)Φ(t), Φ(0) = I. The solutions of (6.3)′ and (6.3) are then

ζ(t) = Φ(t)[ ζ(0) + ∫_0^t Φ^{-1}(s) a(s) ds ]

and

X_t = Φ(t)[ X_0 + ∫_0^t Φ^{-1}(s) a(s) ds + ∫_0^t Φ^{-1}(s) b(s) dW_s ],

respectively. In particular, the mean vector m(t) and the covariance function ρ(s, t) of X are given by

m(t) = Φ(t)[ m(0) + ∫_0^t Φ^{-1}(s) a(s) ds ],

ρ(s, t) = Φ(s)[ ρ(0, 0) + ∫_0^{s∧t} Φ^{-1}(u) b(u) (Φ^{-1}(u) b(u))^T du ] Φ^T(t).

On the other hand, in the scalar case with A(·) ≡ 0, a(·) ≡ 0, b(·) ≡ 0, the equation dX_t = B(t) X_t dW_t has the exponential solution

X_t = X_0 exp{ ∫_0^t B(s) dW_s − (1/2) ∫_0^t B²(s) ds },

which is not Gaussian unless B(·) ≡ 0.
(6.4)
leads to the so-called Ornstein-Uhlenbeck process (Brownian movement with proportional restoring force). Assuming that there is a unique solution to (6.4) cf. Theorem
6.4 below we can find it via an integration by parts:
d(Xt et ) = et (dXt + Xt dt) = bet dWt ,
(6.5)
Xt e
that is,
= X0 +
bes dWs ,
t0.
With the equation (6.2) we associate the diffusion matrix

a_ij(t, x) = Σ_{k=1}^n σ_ik(t, x) σ_jk(t, x),  1 ≤ i, j ≤ d,

and the second-order differential operator

(6.7)  (A_t f)(x) = Σ_{i=1}^d b_i(t, x) (∂f(x)/∂x_i) + (1/2) Σ_{i=1}^d Σ_{j=1}^d a_ij(t, x) (∂²f(x)/∂x_i ∂x_j).

6.2 Exercise: For every function f : [0, ∞) × R^d → R of class C^{1,2}, show that the process

M_t^f ≜ f(t, X_t) − f(0, X_0) − ∫_0^t (∂/∂s + A_s) f(s, X_s) ds = Σ_{i=1}^d Σ_{k=1}^n ∫_0^t (∂f/∂x_i)(s, X_s) σ_ik(s, X_s) dW_s^(k)

is a local martingale.

6.3 Exercise: In the case of bounded, continuous drift and diffusion coefficients b(t, x) and a(t, x), establish their respective interpretations as local velocity vector

(6.8)  b_i(t, x) = lim_{h↓0} (1/h) E[X_{t+h}^(i) − x_i | X_t = x];  1 ≤ i ≤ d,

and local covariance matrix

(6.9)  a_ij(t, x) = lim_{h↓0} (1/h) E[(X_{t+h}^(i) − x_i)(X_{t+h}^(j) − x_j) | X_t = x];  1 ≤ i, j ≤ d.

Here is the fundamental existence and uniqueness result for the equation (6.2).

6.4 Theorem: Suppose that the coefficients of the equation (6.2) satisfy the Lipschitz and linear growth conditions

(6.10)  ||b(t, x) − b(t, y)|| + ||σ(t, x) − σ(t, y)|| ≤ K ||x − y||;  x, y ∈ R^d,

(6.11)  ||b(t, x)||² + ||σ(t, x)||² ≤ K² (1 + ||x||²);  x ∈ R^d,

for some real K > 0. Then there exists a unique process X that satisfies (6.2); it has continuous sample paths, is adapted to the filtration {F_t^W} of the driving Brownian motion W, is a Markov process, and its transition probability density function p, determined by

P[X_t ∈ A | X_s = y] = ∫_A p(t, x; s, y) dx,

satisfies (under appropriate smoothness conditions on the coefficients) the forward Kolmogorov equation

(6.13)  (∂/∂t) p = −Σ_{i=1}^d (∂/∂x_i)[ b_i(t, x) p ] + (1/2) Σ_{i=1}^d Σ_{j=1}^d (∂²/∂x_i ∂x_j)[ a_ij(t, x) p ]

(the Fokker-Planck equation).
The idea in the proof of the existence and uniqueness part in Theorem 6.4 is to mimic the procedure followed in ordinary differential equations, i.e., to consider the Picard iterations

X^(0) ≡ ξ,  X_t^(k+1) = ξ + ∫_0^t b(s, X_s^(k)) ds + ∫_0^t σ(s, X_s^(k)) dW_s

for k = 0, 1, 2, .... The conditions (6.10), (6.11) then guarantee that the sequence of continuous processes {X^(k)}_{k=0}^∞ converges to a continuous process X, which is the unique solution of the equation (6.2); they also imply that the sequence {X^(k)}_{k=1}^∞ and the solution X satisfy moment growth conditions of the type

(6.14)  E||X_t||^{2μ} ≤ C_{μ,T} (1 + E||ξ||^{2μ}),  0 ≤ t ≤ T

for any real numbers μ ≥ 1 and T > 0, where C_{μ,T} is a positive constant depending only on μ, T and on the constant K of (6.10), (6.11).
6.5 Exercise: Let f : [0, T] × R^d → R and g : R^d → R be continuous functions satisfying the polynomial growth condition

(6.15)  |f(t, x)| + |g(x)| ≤ C (1 + ||x||^{2p});  0 ≤ t ≤ T,  x ∈ R^d

for some C > 0, p ≥ 1, let k : [0, T] × R^d → [0, ∞) be continuous, and suppose that the Cauchy problem

(6.16)  ∂V/∂t + A_t V + f = kV  in [0, T) × R^d,  V(T, ·) = g  in R^d

has a solution V of class C^{1,2} satisfying a growth condition of the type (6.15). Then V admits the stochastic (Feynman-Kac) representation

(6.17)  V(t, x) = E[ ∫_t^T f(θ, X_θ) e^{−∫_t^θ k(u, X_u) du} dθ + g(X_T) e^{−∫_t^T k(u, X_u) du} ],

where X is the solution of

X_θ = x + ∫_t^θ b(s, X_s) ds + ∫_t^θ σ(s, X_s) dW_s,  t ≤ θ ≤ T.

We are assuming here that the conditions (6.10), (6.11) are satisfied, and are using the notation (6.7).

(Hint: Exploit (6.14) in conjunction with the growth conditions (6.15), to show that the local martingale M^V of Exercise 6.2 is actually a martingale.)
6.6 Important Remark: For the equation

(6.19)  X_t = ξ + ∫_0^t b(s, X_s) ds + W_t,  0 ≤ t ≤ T,

of the form (6.2) with σ = I_d, the Girsanov Theorem 5.5 provides a solution for drift functions b(t, x) : [0, T] × R^d → R^d which are only bounded and measurable.

Indeed, start by considering a Brownian motion B, and an independent random variable ξ with distribution F, on a probability space (Ω, F, P_0). Define X_t ≜ ξ + B_t, recall the exponential martingale

Z_t = exp{ ∫_0^t b(s, ξ + B_s) · dB_s − (1/2) ∫_0^t ||b(s, ξ + B_s)||² ds },  0 ≤ t ≤ T,

and define the probability measure P(dω) ≜ Z_T(ω) P_0(dω). According to Theorem 5.5, the process

W_t ≜ B_t − ∫_0^t b(s, ξ + B_s) ds = X_t − ξ − ∫_0^t b(s, X_s) ds,  0 ≤ t ≤ T

is Brownian motion on [0, T] under P, and obviously the equation (6.19) is satisfied. We also have, for any 0 = t_0 ≤ t_1 ≤ ... ≤ t_n ≤ t and any measurable function f : R^n → [0, ∞):

(6.20)  E[ f(X_{t_1}, ..., X_{t_n}) ] = E_0[ f(ξ + B_{t_1}, ..., ξ + B_{t_n}) exp{ ∫_0^t b(s, ξ + B_s) · dB_s − (1/2) ∫_0^t ||b(s, ξ + B_s)||² ds } ]

and, conditioning on the value of ξ,

(6.21)  E[ f(X_{t_1}, ..., X_{t_n}) ] = ∫_{R^d} E_0[ f(x + B_{t_1}, ..., x + B_{t_n}) exp{ ∫_0^t b(s, x + B_s) · dB_s − (1/2) ∫_0^t ||b(s, x + B_s)||² ds } ] F(dx).
6.7 Remark: Our ability to compute these transition probabilities hinges on carrying out the function-space integration in (6.21), not an easy task. In the one-dimensional case with drift b(·) ∈ C¹(R), we get

(6.22)  p_t(x; z) dz = exp{ G(z) − G(x) } E_0[ 1_{{x + B_t ∈ dz}} exp{ −(1/2) ∫_0^t V(x + B_s) ds } ]

from (6.21), where V ≜ b′ + b² and G(x) ≜ ∫_0^x b(u) du. In certain special cases, the Feynman-Kac formula of (6.17) can help carry out the computation.
7. FILTERING THEORY
Let us place ourselves now on a probability space (Ω, F, P), together with a filtration {F_t} with respect to which all our processes will be adapted. In particular, we shall consider two processes of interest:

(i) a signal process X = {X_t ; 0 ≤ t ≤ T}, which is not directly observable, and

(ii) an observation process Y = {Y_t ; 0 ≤ t ≤ T}, whose value is available to us at any time and which is suitably correlated with X (so that, by observing Y, we can say something about the distribution of X).

For simplicity of exposition and notation, we shall take both X and Y to be one-dimensional. The problem of Filtering can then be cast in the following terms: to compute the conditional distribution

P[X_t ∈ A | F_t^Y],  0 ≤ t ≤ T

of the signal X_t at time t, given the observation record up to that time. Equivalently, to compute the conditional expectations
(7.1)  π_t(f) ≜ E[f(X_t) | F_t^Y],  0 ≤ t ≤ T

for a suitable class of functions f : R → R.

Observation Model: We shall assume that H is a measurable, adapted process with

(7.2)  E ∫_0^T H_s² ds < ∞,

and that the observations are of the form

(7.3)  Y_t = ∫_0^t H_s ds + W_t,  0 ≤ t ≤ T,

where W is a Brownian motion. With the notation

(7.4)  Ĥ_t ≜ E(H_t | F_t^Y)

for the conditional expectation of H_t given the observations up to time t, we introduce the innovations process

(7.5)  N_t ≜ Y_t − ∫_0^t Ĥ_s ds,  F_t^Y ;  0 ≤ t ≤ T.

From (7.3) we can write

(7.6)  N_t = ∫_0^t (H_s − Ĥ_s) ds + W_t,

and N is seen to be an {F_t^Y}-martingale: for 0 ≤ s < t ≤ T,

E[N_t − N_s | F_s^Y] = E[ ∫_s^t (H_u − Ĥ_u) du + (W_t − W_s) | F_s^Y ]
  = ∫_s^t E[ E(H_u | F_u^Y) − Ĥ_u | F_s^Y ] du + E[ E(W_t − W_s | F_s) | F_s^Y ] = 0.

Moreover, (7.6) gives ⟨N⟩_t = ⟨W⟩_t = t, so Theorem 4.3 shows that the innovations process N is in fact an {F_t^Y}-Brownian motion.
7.3 Discussion: Since N is adapted to {F_t^Y}, we have {F_t^N} ⊆ {F_t^Y}. For linear systems, we also have {F_t^N} = {F_t^Y}: the observations and the innovations carry the same information, because in that case there is a causal and causally invertible transformation that derives the innovations from the observations; cf. Remark 7.12. It was a long-standing conjecture of T. Kailath that this identity should hold in general. We know now (Allinger & Mitter (1981)) that this is indeed the case if H and W are independent, but that the identity {F_t^N} = {F_t^Y} does not hold in general. However, the following positive and extremely useful result holds.
7.4 Theorem: Every local martingale M with respect to the filtration {F_t^Y} admits a representation of the form

(7.7)  M_t = M_0 + ∫_0^t Φ_s dN_s,  0 ≤ t ≤ T,

where Φ is measurable, adapted to {F_t^Y}, and satisfies ∫_0^T Φ_s² ds < ∞ (w.p.1). If M happens to be a square-integrable martingale, then Φ can be chosen so that E ∫_0^T Φ_s² ds < ∞.
Comment: The result would follow directly from Theorem 5.3, if only the innovations conjecture {F_t^N} = {F_t^Y} were true in general! Since this is not the case, we are going to proceed differently. The exponential process

(7.8)  Z_t ≜ exp{ −∫_0^t Ĥ_s dN_s − (1/2) ∫_0^t Ĥ_s² ds },  F_t^Y,  0 ≤ t ≤ T

is a martingale (Remark 5.6); according to the Girsanov Theorem 5.5, the process Y_t = N_t + ∫_0^t Ĥ_s ds is Brownian motion under the new probability measure

P̃(dω) ≜ Z_T(ω) P(dω).

Consider also the process

(7.9)  Λ_t ≜ Z_t^{−1} = exp{ ∫_0^t Ĥ_s dN_s + (1/2) ∫_0^t Ĥ_s² ds }
            = exp{ ∫_0^t Ĥ_s dY_s − (1/2) ∫_0^t Ĥ_s² ds },  0 ≤ t ≤ T,

which provides the Radon-Nikodym derivative of P with respect to P̃:

Λ_t = (dP/dP̃) restricted to F_t^Y,
as well as the stochastic integral equations (cf. (4.12)) satisfied by the exponential processes of (7.8) and (7.9):

(7.10)  Z_t = 1 − ∫_0^t Z_s Ĥ_s dN_s,  Λ_t = 1 + ∫_0^t Λ_s Ĥ_s dY_s.

Thanks to the Bayes rule

Ẽ(Q | F_s^Y) = E[Q Z_t | F_s^Y] / Z_s

(valid for every s < t and nonnegative, F_t^Y-measurable random variable Q), the fact that M is a martingale under P implies that ΛM is a martingale under P̃:

Ẽ[Λ_t M_t | F_s^Y] = E[Λ_t M_t Z_t | F_s^Y] / Z_s = E[M_t | F_s^Y] / Z_s = Λ_s M_s.
Since Y is Brownian motion under P̃, Theorem 5.3 (applied under P̃) yields a representation

(7.12)  Λ_t M_t = M_0 + ∫_0^t ρ_s dY_s = M_0 + ∫_0^t ρ_s (dN_s + Ĥ_s ds)

for a suitable measurable, {F_t^Y}-adapted process ρ.
Now from (7.12), (7.10) and the integration by parts formula (4.14), we obtain:

M_t = (Λ M)_t Z_t = M_0 + ∫_0^t (Λ M)_s dZ_s + ∫_0^t Z_s d(Λ M)_s + ⟨Λ M, Z⟩_t
  = M_0 − ∫_0^t (Λ M)_s Z_s Ĥ_s dN_s + ∫_0^t Z_s ρ_s (dN_s + Ĥ_s ds) − ∫_0^t ρ_s Z_s Ĥ_s ds
  = M_0 + ∫_0^t Φ_s dN_s,  where Φ_t ≜ Z_t ρ_t − M_t Ĥ_t,

which is the representation (7.7).
In order to proceed further we shall need even more structure, this time on the signal process X.

7.5 Signal Process Model: We shall assume henceforth that the signal process X has the following property: for every function f ∈ C_0²(R) (twice continuously differentiable, with compact support), there exist {F_t}-adapted processes G^f, Φ^f with E ∫_0^T {|G_t^f| + |Φ_t^f|} dt < ∞, such that

(7.13)  M_t^f ≜ f(X_t) − f(X_0) − ∫_0^t (G^f)_s ds,  F_t ;  0 ≤ t ≤ T

is a martingale with

(7.14)  ⟨M^f, W⟩_t = ∫_0^t Φ_s^f ds.

7.6 Discussion: Typically (G^f)_t = (A_t f)(X_t), where A_t is a second-order linear differential operator as in (6.7). Then the requirement (7.13) imposes the Markov property on the signal process X (the famous martingale problem of Stroock & Varadhan (1969, 1979), which characterizes the Markov property in terms of martingales of the type (7.13)). On the other hand, (7.14) is a statement about the correlation of the signal X with the noise W in the observation model.

7.7 Example: Let X satisfy the one-dimensional stochastic integral equation

(7.15)  X_t = X_0 + ∫_0^t b(X_s) ds + ∫_0^t σ(X_s) dB_s,

where B is a Brownian motion; then (7.13), (7.14) hold with

(7.16)  (G^f)_t = (Af)(X_t),  Φ_t^f = σ(X_t) f′(X_t) (d/dt)⟨B, W⟩_t,

in terms of the operator A of (6.7) associated with the coefficients b, σ.
7.8 Theorem (the fundamental filtering equation): Under the above assumptions on the observation and signal models, we have for every f ∈ C_0²(R):

π_t(f) = π_0(f) + ∫_0^t π_s(G^f) ds + ∫_0^t { π_s(fH) − π_s(f) π_s(H) + π_s(Φ^f) } dN_s,  0 ≤ t ≤ T.

Let us try to discuss the significance and some of the consequences of Theorem 7.8, before giving its proof.
7.9 Example: Suppose that the signal process X satisfies the stochastic equation (7.15) with B independent of W and X_0, and that

(7.19)  H_t = h(X_t),  0 ≤ t ≤ T

for a suitable function h. Then Φ^f ≡ 0, and the filtering equation takes the form

(7.20)  π_t(f) = π_0(f) + ∫_0^t π_s(Af) ds + ∫_0^t { π_s(fh) − π_s(f) π_s(h) } dN_s.

If the conditional distribution of X_t given F_t^Y has a density p_t(·), so that π_t(f) = ∫_R f(x) p_t(x) dx, then (7.20) can be written, at least formally, as

(7.21)  dp_t(x) = A* p_t(x) dt + p_t(x) { h(x) − ∫_R h(y) p_t(y) dy } dN_t

in terms of the formal adjoint A* of the operator A. Notice that, if h ≡ 0 (i.e., if the observations consist of pure independent white noise), (7.21) reduces to the Fokker-Planck equation (6.13).

You should not fail to notice that (7.21) is a formidable equation: it is a stochastic partial differential equation, nonlinear in p_t, and driven by the observations through the innovations process N.

7.10 Exercise: Suppose that C_t = C_0 + ∫_0^t c_s ds + (an {F_t}-local martingale), with E ∫_0^T |c_s| ds < ∞. Then

Ĉ_t − ∫_0^t ĉ_s ds,  0 ≤ t ≤ T

is an {F_t^Y} martingale, where Ĉ_t ≜ E(C_t | F_t^Y) and ĉ_t ≜ E(c_t | F_t^Y).
Proof of Theorem 7.8: From (7.13),

(7.22)  f_t ≜ f(X_t) = f_0 + ∫_0^t G_s^f ds + M_t^f,

where M^f is an {F_t}-local martingale; thus, in conjunction with Exercise 7.10 and Theorem 7.4, we have

(7.23)  f̂_t − f̂_0 − ∫_0^t Ĝ_s^f ds = ({F_t^Y} local martingale) = ∫_0^t Φ̃_s dN_s

for a suitable {F_t^Y}-adapted process Φ̃ with ∫_0^T Φ̃_s² ds < ∞ (w.p.1). The whole point is to compute Φ̃ explicitly; namely, to show that

(7.24)  Φ̃_t = (f_t H_t)^ − f̂_t Ĥ_t + Φ̂_t^f,

where (f_t H_t)^ ≜ E[f_t H_t | F_t^Y]. This will be accomplished by computing E[f_t Y_t | F_t^Y] = Y_t f̂_t in two ways, and then comparing the results.
On the one hand, we have from (7.22), (7.3) and the integration-by-parts formula (4.14) that

(7.25)  f_t Y_t = ∫_0^t f_s (H_s ds + dW_s) + ∫_0^t Y_s (G_s^f ds + dM_s^f) + ∫_0^t Φ_s^f ds
  = ∫_0^t { f_s H_s + Y_s G_s^f + Φ_s^f } ds + ({F_t} local martingale).

On the other hand, from (7.23), (7.5) and the integration-by-parts formula (4.14), we obtain

(7.26)  f̂_t Y_t = ∫_0^t f̂_s (dN_s + Ĥ_s ds) + ∫_0^t Y_s (Ĝ_s^f ds + Φ̃_s dN_s) + ∫_0^t Φ̃_s ds
  = ∫_0^t { f̂_s Ĥ_s + Y_s Ĝ_s^f + Φ̃_s } ds + ({F_t^Y} local martingale).
Comparing (7.25) with (7.26), and recalling that a continuous martingale of bounded
variation is constant (Corollary 2.8), we conclude that (7.24) holds.
7.9 Example (Cont'd): With h(x) = cx (linear observations) and f(x) = x^k; k = 1, 2, ..., we obtain from (7.20):

(7.27)  X̂_t = X̂_0 + ∫_0^t π_s(b) ds + c ∫_0^t { π_s(x²) − (X̂_s)² } dN_s,

(7.28)  π_t(x^k) = π_0(x^k) + ∫_0^t { k π_s(x^{k−1} b) + (k(k−1)/2) π_s(x^{k−2} σ²) } ds + c ∫_0^t { π_s(x^{k+1}) − X̂_s π_s(x^k) } dN_s;  k = 2, 3, ...,

where X̂_t ≜ π_t(x) and, for instance, π_s(x^{k−1} b) stands for E[X_s^{k−1} b(X_s) | F_s^Y].

The equations (7.27), (7.28) convey the basic difficulty of nonlinear filtering: in order to solve the equation for the k-th conditional moment, one needs to know the (k+1)-st conditional moment (as well as π(f) for f(x) = x^{k−1} b(x), f(x) = x^{k−2} σ²(x), etcetera). In other words, the computation of conditional moments cannot be done by induction (on k), and the problem is inherently infinite-dimensional, except in the linear case!
7.11 The Linear Case: Suppose now that b(x) = ax, \sigma(x) \equiv 1 and h(x) = cx, so that the signal is the linear diffusion
(7.29)    dX_t = aX_t \, dt + dB_t ,    X_0 \sim N(\mu, v) ,
and introduce the conditional variance
(7.30)    V_t \triangleq \widehat{X_t^2} - (\hat X_t)^2 .
The problem then becomes to find an algorithm (preferably recursive) for computing the sufficient statistics \hat X_t, V_t from their initial values \hat X_0 = \mu, V_0 = v.
From (7.27), (7.28) with k = 2 we obtain
(7.31)    d\hat X_t = a \hat X_t \, dt + c V_t \, dN_t = (a - c^2 V_t) \hat X_t \, dt + c V_t \, dY_t ,    \hat X_0 = \mu ,
(7.32)    d\widehat{X_t^2} = \big[ 1 + 2a \widehat{X_t^2} \big] \, dt + c \big[ \widehat{X_t^3} - \hat X_t \widehat{X_t^2} \big] \, dN_t .
The conditional distribution of X_t, given F_t^Y, is normal with mean \hat X_t and variance V_t, whence \widehat{X_t^3} = (\hat X_t)^3 + 3 \hat X_t V_t; then (7.31) and (7.32) yield for the conditional variance the (nonstochastic!) Riccati equation
(7.33)    \dot V_t = 1 + 2a V_t - c^2 V_t^2 ,    V_0 = v .
The equations (7.31), (7.33) constitute the celebrated Kalman-Bucy filter.
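The one-dimensional filter lends itself to immediate numerical experiment. The following sketch (the Euler-Maruyama discretization, parameter values and variable names are our own illustration, not part of the text) propagates the signal, the observation, and the equations (7.31), (7.33):

```python
import math, random

def kalman_bucy(a, c, mu, v, T=6.0, n=6000, seed=2):
    """Euler scheme for the model dX = aX dt + dB, dY = cX dt + dW,
    together with the Kalman-Bucy filter:
        dXhat = (a - c^2 V) Xhat dt + c V dY,   dV/dt = 1 + 2aV - c^2 V^2."""
    rng = random.Random(seed)
    dt = T / n
    X = rng.gauss(mu, math.sqrt(v))   # signal starts from N(mu, v)
    Xhat, V = mu, v
    for _ in range(n):
        dY = c * X * dt + rng.gauss(0.0, math.sqrt(dt))   # observation increment
        Xhat += (a - c * c * V) * Xhat * dt + c * V * dY  # (7.31)
        V += (1.0 + 2.0 * a * V - c * c * V * V) * dt     # (7.33), nonstochastic
        X += a * X * dt + rng.gauss(0.0, math.sqrt(dt))   # signal step
    return X, Xhat, V

X, Xhat, V = kalman_bucy(a=-1.0, c=1.0, mu=0.0, v=1.0)
# V(t) approaches the positive root of 1 + 2aV - c^2 V^2 = 0.
```

Observe that V never looks at the data: the conditional variance is computed off-line from the Riccati equation (7.33), a hallmark of the linear case.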
More generally, consider the multidimensional linear model
    dX_t = [A(t) X_t + a(t)] \, dt + b(t) \, dB_t ,    X_0 \sim N(\mu, v) ,
    dY_t = H(t) X_t \, dt + dW_t ,
with \langle W^{(i)}, B^{(j)} \rangle_t = \int_0^t \rho_{ij}(s) \, ds, for suitable deterministic matrix-valued functions A(\cdot), H(\cdot), b(\cdot), \rho(\cdot) and vector-valued function a(\cdot). The joint law of the pair (X_t, Y_t) is multivariate normal, and thus the conditional distribution of X_t given F_t^Y is again multivariate normal, with mean vector \hat X_t and non-random variance-covariance matrix V(t). In the special case \rho(\cdot) \equiv a(\cdot) \equiv 0, V(\cdot) satisfies the matrix Riccati equation
    \dot V(t) = A(t)V(t) + V(t)A^T(t) - V(t)H^T(t)H(t)V(t) + b(t)b^T(t) ,    V(0) = v
(which, unlike its scalar counterpart (7.33), does not admit in general an exact solution), and \hat X is then obtained as the solution of the Kalman-Bucy filter equation
    d\hat X_t = A(t)\hat X_t \, dt + V(t)H^T(t)\big[ dY_t - H(t)\hat X_t \, dt \big] ,    \hat X_0 = \mu .
7.12 Remark: It is not hard to see that the innovations conjecture \{F_t^N\} = \{F_t^Y\} holds for linear systems.
Indeed, it follows from Theorem 6.4 that the solution \hat X of the equation (7.31) is adapted to the filtration \{F_t^N\} of the driving Brownian motion N, i.e., \{F_t^{\hat X}\} \subseteq \{F_t^N\}. From Y_t = N_t + c \int_0^t \hat X_s \, ds it develops that Y is adapted to \{F_t^N\}, i.e., \{F_t^Y\} \subseteq \{F_t^N\}. Because the reverse inclusion holds anyway, the two filtrations are the same.
7.13 Remark: For the signal and observation model
(7.32)    X_t = \xi + \int_0^t b(X_s) \, ds + W_t ,    Y_t = \int_0^t h(X_s) \, ds + B_t
with b \in C^1(R) and h \in C^2(R), which is a special case of Example 7.9, we have from Remark 6.6 and the Bayes rule:
(7.33)    \pi_t(f) = E[f(X_t) | F_t^Y] = \frac{E_0[f(\xi + w_t)\Lambda_t | F_t^Y]}{E_0[\Lambda_t | F_t^Y]} .
Here
    \Lambda_t = \exp\Big\{ \int_0^t b(\xi + w_s) \, dw_s + \int_0^t h(\xi + w_s) \, dY_s - \frac{1}{2}\int_0^t [ b^2(\xi + w_s) + h^2(\xi + w_s) ] \, ds \Big\}
        = \exp\Big\{ G(\xi + w_t) - G(\xi) + Y_t h(\xi + w_t) - \int_0^t Y_s h'(\xi + w_s) \, dw_s - \frac{1}{2}\int_0^t (b' + b^2 + h^2 + Y_s h'')(\xi + w_s) \, ds \Big\} ,
where P_0(d\omega) = \frac{1}{\Lambda_T(\omega)} P(d\omega) and G(x) = \int_0^x b(u) \, du. Here (w, Y) is, under P_0, an R^2-valued Brownian motion, independent of the random variable \xi.
For every continuous f : R \to [0, \infty) and y : [0, T] \to R, let us define the quantity
(7.34)    \sigma_t(f; y) \triangleq E_0\Big[ f(\xi + w_t) \exp\Big\{ G(\xi + w_t) - G(\xi) + y(t)h(\xi + w_t) - \int_0^t y(s)h'(\xi + w_s) \, dw_s - \frac{1}{2}\int_0^t (b' + b^2 + h^2 + y(s)h'')(\xi + w_s) \, ds \Big\} \Big]
    = \int_R E_0\Big[ f(x + w_t) \exp\Big\{ G(x + w_t) - G(x) + y(t)h(x + w_t) - \int_0^t y(s)h'(x + w_s) \, dw_s - \frac{1}{2}\int_0^t (b' + b^2 + h^2 + y(s)h'')(x + w_s) \, ds \Big\} \Big] F(dx) ,
where F is the distribution of the random variable \xi. Then (7.33) takes the form
(7.35)    \pi_t(f) = \frac{\sigma_t(f; y)}{\sigma_t(1; y)} \Big|_{y = Y(\omega)} .
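The representation (7.34), (7.35) suggests a direct Monte Carlo scheme: simulate paths of \xi + w under the reference measure P_0, weight each path by the exponential factor, and average. The sketch below is our own illustration (straightforward Riemann and Itô sums; all function arguments, step counts and the sanity check in the usage are assumptions of the example, not part of the text):

```python
import math, random

def sigma_t(f, b, bprime, G, h, hprime, hprime2, y, xi_sampler,
            t=1.0, n_steps=200, n_paths=4000, seed=3):
    """Monte Carlo estimate of sigma_t(f; y) in (7.34): average over paths of
    xi + w, under the reference measure, of f(xi + w_t) times the weight."""
    rng = random.Random(seed)
    dt = t / n_steps
    total = 0.0
    for _ in range(n_paths):
        x = xi_sampler(rng)
        w, ito, lebesgue = 0.0, 0.0, 0.0
        for k in range(n_steps):
            s, z = k * dt, x + w
            dw = rng.gauss(0.0, math.sqrt(dt))
            ito += y(s) * hprime(z) * dw          # \int y(s) h'(x+w_s) dw_s
            lebesgue += (bprime(z) + b(z) ** 2 + h(z) ** 2
                         + y(s) * hprime2(z)) * dt
            w += dw
        z = x + w
        weight = math.exp(G(z) - G(x) + y(t) * h(z) - ito - 0.5 * lebesgue)
        total += f(z) * weight
    return total / n_paths

def pi_t(f, *args, **kwargs):
    """Kallianpur-Striebel ratio (7.35): sigma_t(f; y) / sigma_t(1; y),
    estimated with common random numbers in numerator and denominator."""
    return sigma_t(f, *args, **kwargs) / sigma_t(lambda z: 1.0, *args, **kwargs)
```

As a sanity check, with b \equiv 0 and h \equiv 0 the weight is identically 1 and \pi_t(f) reduces to E_0 f(\xi + w_t).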
The formula (7.34) simplifies considerably if h(\cdot) is linear, say h(x) = x; then
(7.36)    \sigma_t(f; y) = \int_R E_0\Big[ f(x + w_t) \exp\Big\{ G(x + w_t) - G(x) + y(t)(x + w_t) - \int_0^t y(s) \, dw_s - \frac{1}{2}\int_0^t V(x + w_s) \, ds \Big\} \Big] F(dx) ,
where
(7.37)    V(x) \triangleq b'(x) + b^2(x) + x^2 .
If, furthermore, the drift b(\cdot) satisfies the condition
(7.38)    b'(x) + b^2(x) = \alpha x^2 + \beta x + \gamma ;    \alpha > -1 ,
then the famous result of Benes (1981) shows that the integration in (7.36) can be carried out explicitly, and leads in (7.35) to a distribution with a finite number of sufficient statistics; these latter obey recursive schemes (filters).
Notice that (7.38) is satisfied by linear functions b(\cdot), but also by genuinely nonlinear ones like b(x) = \tanh(x).
8. ROBUST FILTERING

In this section we shall place ourselves in the context of Example 7.9 (in particular, of the filtering model consisting of (7.3), (7.15), (7.19) and \langle B, W \rangle \equiv 0), and shall try to simplify in several regards the equations (7.20), (7.21) for the conditional distribution of X_t, given F_t^Y = \sigma\{Y_s; 0 \le s \le t\}.
We start by recalling the probability measure \tilde P in the proof of Theorem 7.4, and the notation introduced there. From the Bayes rule (7.11) (with the rôles of P and \tilde P interchanged, and with \Lambda playing the rôle of Z):
(8.1)    \pi_t(f) = E[f(X_t) | F_t^Y] = \frac{\tilde E[f(X_t)\Lambda_t | F_t^Y]}{\tilde E[\Lambda_t | F_t^Y]} = \frac{\sigma_t(f)}{\sigma_t(1)} ,
where
(8.2)    \sigma_t(f) \triangleq \tilde E[f(X_t)\Lambda_t | F_t^Y] .
This unnormalized conditional distribution satisfies the Zakai equation
(8.3)    \sigma_t(f) = \sigma_0(f) + \int_0^t \sigma_s(Af) \, ds + \int_0^t \sigma_s(fh) \, dY_s ,
which is considerably simpler than its normalized counterpart (7.20). In terms of the unnormalized conditional density q_t(x), for which
    \pi_t(f) = \frac{\int_R f(x) q_t(x) \, dx}{\int_R q_t(x) \, dx} ,
the equation (8.3) takes the (formal) form
(8.4)    dq_t(x) = A^* q_t(x) \, dt + h(x) q_t(x) \, dY_t .
To make matters even more impressive, (8.4) can be written equivalently as a nonstochastic second-order partial differential equation of the parabolic type, with the randomness (the observation Y_t(\omega) at time t) appearing only parametrically in the coefficients. We shall call this a ROBUST (or pathwise) form of the filtering equation, and will outline the clever method of B.L. Rozovskii that leads to it.
The idea is to introduce the function
(8.5)    z_t(x) \triangleq q_t(x) \exp\{-h(x) Y_t\} ,    0 \le t \le T ,    x \in R ,
which satisfies the (nonstochastic, parabolic) equation
(8.6)    \frac{\partial z_t(x)}{\partial t} = e^{-h(x)Y_t} A^*\big[ z_t(\cdot) e^{h(\cdot)Y_t} \big](x) - \frac{1}{2} h^2(x) z_t(x) .
In our case, we have
    A^* f(x) = \frac{1}{2}\frac{\partial^2}{\partial x^2}\big[ \sigma^2(x) f(x) \big] - \frac{\partial}{\partial x}\big[ b(x) f(x) \big] .
x
1
2
x S
(8.8)
S
4
with (Q f )(x) =
qyx f (y). On the other hand, the analogue of (8.4) for the unnormal
e 1{X =x} t | FtY is
ized probability mass function qt (x) = E
t
yS
(8.9)
4
and zt (x) = qt (x) exp{h(x)Yt } satisfies again the analogue of equation (8.6), namely:
h
i
1
h(x)Yt
h()Yt
zt (x) = e
Q zt ()e
(x) h2 (x)zt (x)
t
2
X
(8.10)
1
=
qyx zt (y)e[h(y)h(x)]Yt h2 (x)zt (x) , x S .
2
yS
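When the state-space S is finite, (8.10) is an ordinary differential equation system, one equation per state, with the observation path entering only through the values Y_t; it can be integrated by any ODE method. A minimal sketch (Euler stepping; the two-state chain and all numerical values are our own illustration, not part of the text):

```python
import math

def robust_filter(S, q, h, Y, q0, T=1.0, n=1000):
    """Integrate (8.10) by the Euler method:
        dz_t(x)/dt = sum_y q[y][x] z_t(y) exp{(h(y)-h(x)) Y(t)} - 0.5 h(x)^2 z_t(x),
    with z_0 = q0 (initial distribution), for a given observation path Y(.).
    Returns the normalized conditional distribution at time T."""
    dt = T / n
    z = dict(q0)
    for k in range(n):
        Yt = Y(k * dt)
        znew = {}
        for x in S:
            drift = sum(q[y][x] * z[y] * math.exp((h(y) - h(x)) * Yt) for y in S)
            znew[x] = z[x] + (drift - 0.5 * h(x) ** 2 * z[x]) * dt
        z = znew
    # undo the transformation (8.5): q_T(x) = z_T(x) exp{h(x) Y_T}, then normalize
    qT = {x: z[x] * math.exp(h(x) * Y(T)) for x in S}
    total = sum(qT.values())
    return {x: qT[x] / total for x in S}

# Two-state chain with switching rates 1, observation function h = identity.
S = (0, 1)
q = {0: {0: -1.0, 1: 1.0}, 1: {0: 1.0, 1: -1.0}}
p = robust_filter(S, q, lambda x: float(x), lambda t: 0.5 * t, {0: 0.5, 1: 0.5})
```

An observation path that drifts upward (here a deterministic stand-in for Y) tilts the conditional distribution toward the state with the larger value of h, as one expects.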
9. STOCHASTIC CONTROL

Let us consider the following stochastic integral equation
(9.1)    X_\theta = x + \int_t^\theta b(s, X_s, U_s) \, ds + \int_t^\theta \sigma(s, X_s, U_s) \, dW_s ,    t \le \theta \le T .
This is a controlled version of the equation (6.18), the process U being the element of control. More precisely, let us suppose throughout this section that the real-valued functions b = \{b_i\}_{1 \le i \le d}, \sigma = \{\sigma_{ij}\}_{1 \le i \le d, 1 \le j \le n} are defined on [0, T] \times R^d \times A (where the control space A is a compact subset of some Euclidean space) and are bounded, continuous, with bounded and continuous derivatives of first and second order in the argument x.
9.1 Definition: An admissible system U consists of
(i) a probability space (\Omega, F, P) with a filtration \{F_t\}, and on it
(ii) an adapted, R^n-valued Brownian motion W, and
(iii) a measurable, adapted, A-valued process U (the control process).
Thanks to our conditions on the coefficients b and \sigma, the equation (9.1) has a unique (adapted, continuous) solution X for every admissible system U. We shall occasionally call X the state process corresponding to this system.
Now consider two other bounded and continuous functions, namely f : [0, T] \times R^d \times A \to R, which plays the rôle of a running cost on both the state and the control, and g : R^d \to R, which is a terminal cost on the state. We assume that both f, g are of class C^2 in the spatial argument. Thus, corresponding to every admissible system U, we have an associated expected total cost
(9.2)    J(t, x; U) \triangleq E\Big[ \int_t^T f(\theta, X_\theta, U_\theta) \, d\theta + g(X_T) \Big] .
The control problem is to minimize this expected cost over all admissible systems U, to study the value function
(9.3)    Q(t, x) \triangleq \inf_{U} J(t, x; U)
(which can be shown to be measurable), and to find \epsilon-optimal admissible systems (or even optimal ones, whenever these exist).
9.2 Definition: An admissible system U is called
(i) \epsilon-optimal, for some given \epsilon > 0, if
(9.4)    Q(t, x) \le J(t, x; U) \le Q(t, x) + \epsilon ;
(ii) optimal, if (9.4) holds with \epsilon = 0.
9.3 Definition: A feedback control law is a measurable function \mu : [0, T] \times R^d \to A, for which the stochastic equation
(9.5)    X_\theta = x + \int_t^\theta b(s, X_s, \mu(s, X_s)) \, ds + \int_t^\theta \sigma(s, X_s, \mu(s, X_s)) \, dW_s ,    t \le \theta \le T
has a solution X on some probability space (\Omega, F, P), \{F_t\}, and with respect to some Brownian motion W on this space.
Remark: This is the case, for instance, if \mu is continuous, or if the diffusion matrix a \triangleq \sigma \sigma^T satisfies the strong nondegeneracy condition
(9.6)    \xi^T a(t, x, u) \xi \ge \delta \|\xi\|^2 ;    \forall \ \xi \in R^d ,\ (t, x, u) \in [0, T] \times R^d \times A
for some \delta > 0; see Karatzas & Shreve (1987), Stroock & Varadhan (1979) or Krylov (1973).
Quite obviously, to every feedback control law \mu corresponds an admissible system with U_t = \mu(t, X_t); it then makes sense to talk about \epsilon-optimal or optimal feedback laws.
The constant control law U_t \equiv u, for some u \in A, has the associated expected cost
    J^u(t, x) \triangleq J(t, x; u) = E\Big[ \int_t^T f(\theta, X_\theta^u, u) \, d\theta + g(X_T^u) \Big] ,
which solves the Cauchy problem
(9.7)    \frac{\partial J^u}{\partial t} + A_t^u J^u + f^u = 0    in [0, T) \times R^d ,    J^u(T, \cdot) = g    in R^d ,
where f^u \triangleq f(\cdot, \cdot, u) and
    A_t^u \triangleq \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a_{ij}(t, x, u) \frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i=1}^d b_i(t, x, u) \frac{\partial}{\partial x_i}
(cf. Exercise 6.5). Since Q is obtained from J(\cdot, \cdot; U) by minimization, it is natural to ask whether Q satisfies the minimized version of (9.7), that is, the HJB (Hamilton-Jacobi-Bellman) equation:
(9.8)    \frac{\partial Q}{\partial t} + \inf_{u \in A}\big[ A_t^u Q + f^u \big] = 0    in [0, T) \times R^d ,    Q(T, \cdot) = g    in R^d .
We shall see that this is indeed the case, provided that (9.8) is interpreted in a suitably weak sense.
9.4 Remark: Notice that (9.8) is, in general, a strongly nonlinear and degenerate second-order equation. With the notation D = \{\partial/\partial x_i\}_{1 \le i \le d} for the gradient and D^2 = \{\partial^2/\partial x_i \partial x_j\}_{1 \le i, j \le d} for the Hessian, and with
(9.9)    F(t, x, \Delta, M) \triangleq \inf_{u \in A}\Big[ \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a_{ij}(t, x, u) M_{ij} + \sum_{i=1}^d b_i(t, x, u) \Delta_i + f(t, x, u) \Big]
for \Delta \in R^d and M in the space S^d of symmetric (d \times d) matrices, the equation (9.8) can be written as
(9.10)    \frac{\partial Q}{\partial t} + F(t, x, DQ, D^2 Q) = 0 .
We call this equation strongly nonlinear, because the nonlinearity F acts on both the gradient DQ and the higher-order derivatives D^2 Q.
On the other hand, if the diffusion coefficients do not depend on the control variable u, i.e., if we have a_{ij}(t, x, u) \equiv a_{ij}(t, x), then (9.10) is transformed into the semilinear equation
(9.11)    \frac{\partial Q}{\partial t} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a_{ij}(t, x) D_{ij}^2 Q + H(t, x, DQ) = 0 ,
in which the nonlinearity
(9.12)    H(t, x, \Delta) \triangleq \inf_{u \in A}\Big[ \sum_{i=1}^d b_i(t, x, u) \Delta_i + f(t, x, u) \Big]
acts only on the gradient DQ, and the higher-order derivatives enter linearly. For this reason, (9.11) is in principle a much easier equation to study than (9.10).
Let us quit the HJB equation for a while, and concentrate on the fundamental characterization of the value function Q via the so-called Principle of Dynamic Programming of R. Bellman (1957):
(9.13)    Q(t, x) = \inf_U E\Big[ \int_t^{t+h} f(\theta, X_\theta, U_\theta) \, d\theta + Q(t + h, X_{t+h}) \Big]
for 0 \le h \le T - t. This says roughly the following: suppose that you do not know the optimal expected cost at time t, but you do know how well you can do at some later time t + h; then, in order to solve the optimization problem at time t, compute the expected cost associated with the policy of applying the control U during (t, t + h) and behaving optimally from t + h onward, and then minimize over U.
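The discrete-time skeleton of (9.13) may make the principle concrete: with finitely many states and controls, "behaving optimally from t + 1 onward" is exactly the term Q[t+1] inside the minimization, and Q is computed by backward recursion. The toy model below (a controlled random walk with quadratic costs) is entirely our own illustration, not part of the text:

```python
def bellman(states, actions, step, cost, terminal, T):
    """Backward dynamic programming for a controlled Markov chain:
        Q[t][x] = min_u ( cost(t, x, u) + sum_x' P(x'|x,u) Q[t+1][x'] ).
    step(x, u) returns a list of (probability, next_state) pairs."""
    Q = {T: {x: terminal(x) for x in states}}
    policy = {}
    for t in range(T - 1, -1, -1):
        Q[t], policy[t] = {}, {}
        for x in states:
            best = min(
                (cost(t, x, u) + sum(p * Q[t + 1][y] for p, y in step(x, u)), u)
                for u in actions)
            Q[t][x], policy[t][x] = best
    return Q, policy

# Controlled walk on {-2,...,2}: drift u plus symmetric noise, cost x^2 + u^2.
states = range(-2, 3)
actions = (-1, 0, 1)
clip = lambda z: max(-2, min(2, z))
step = lambda x, u: [(0.5, clip(x + u + 1)), (0.5, clip(x + u - 1))]
Q, policy = bellman(states, actions, step,
                    lambda t, x, u: x * x + u * u, lambda x: x * x, T=5)
```

By symmetry of the dynamics and the costs, the value function here is even in x.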
9.5 Theorem: Principle of Dynamic Programming.
(i) For every stopping time \tau of \{F_t\} with values in the interval [t, T], we have
(9.14)    Q(t, x) = \inf_U E\Big[ \int_t^\tau f(\theta, X_\theta, U_\theta) \, d\theta + Q(\tau, X_\tau) \Big] .
(ii) For every admissible system U, the process
(9.15)    M_\theta^U \triangleq \int_t^\theta f(s, X_s, U_s) \, ds + Q(\theta, X_\theta) ,    t \le \theta \le T
is a submartingale; it is a martingale if and only if U is optimal.
The technicalities of the proof of (9.14) are awesome (and will not be reproduced here), but the basic idea is fairly simple: take an \epsilon-optimal admissible system U for (t, x); it is clear that we should have
    E\Big[ \int_\tau^T f(\theta, X_\theta, U_\theta) \, d\theta + g(X_T) \,\Big|\, F_\tau \Big] \ge Q(\tau, X_\tau) ,    w.p. 1 ,
and therefore
    Q(t, x) + \epsilon \ge J(t, x; U) = E\Big[ \int_t^\tau f(\theta, X_\theta, U_\theta) \, d\theta + E\Big\{ \int_\tau^T f(\theta, X_\theta, U_\theta) \, d\theta + g(X_T) \,\Big|\, F_\tau \Big\} \Big]
        \ge E\Big[ \int_t^\tau f(\theta, X_\theta, U_\theta) \, d\theta + Q(\tau, X_\tau) \Big]
        \ge \inf_U E\Big[ \int_t^\tau f(\theta, X_\theta, U_\theta) \, d\theta + Q(\tau, X_\tau) \Big] ;
letting \epsilon \downarrow 0 yields one half of (9.14).
In order to obtain an inequality in the reverse direction, consider an arbitrary admissible system U and an admissible system U^{\epsilon,\omega} which is \epsilon-optimal at (\tau, X_\tau), i.e.,
    E\Big[ \int_\tau^T f(\theta, X_\theta^{\epsilon,\omega}, U_\theta^{\epsilon,\omega}) \, d\theta + g(X_T^{\epsilon,\omega}) \,\Big|\, F_\tau \Big] \le Q(\tau, X_\tau) + \epsilon .
Considering the composite control process
    \tilde U_\theta = U_\theta for t \le \theta \le \tau ,    \tilde U_\theta = U_\theta^{\epsilon,\omega} for \tau < \theta \le T ,
and the associated admissible system \tilde U (there is a lot of hand-waving here, because the two systems may not be defined on the same probability space), we have
    Q(t, x) \le E\Big[ \int_t^T f(\theta, \tilde X_\theta, \tilde U_\theta) \, d\theta + g(\tilde X_T) \Big] = E\Big[ \int_t^\tau f(\theta, X_\theta, U_\theta) \, d\theta + \int_\tau^T f(\theta, X_\theta^{\epsilon,\omega}, U_\theta^{\epsilon,\omega}) \, d\theta + g(X_T^{\epsilon,\omega}) \Big]
        \le E\Big[ \int_t^\tau f(\theta, X_\theta, U_\theta) \, d\theta + Q(\tau, X_\tau) \Big] + \epsilon .
Taking the infimum on the right-hand side over U, and noting the arbitrariness of \epsilon > 0, we arrive at the desired inequality.
On the other hand, the proof of (i) \Rightarrow (ii) is straightforward; for an arbitrary U and stopping times \sigma \le \tau with values in [t, T], the extension
    Q(\sigma, X_\sigma) \le \inf_U E\Big[ \int_\sigma^\tau f(\theta, X_\theta, U_\theta) \, d\theta + Q(\tau, X_\tau) \,\Big|\, F_\sigma \Big]
of (9.14) gives E M_\sigma^U \le E M_\tau^U, and this leads to the submartingale property.
If M^U is a martingale, then obviously
    Q(t, x) = E M_t^U = E M_T^U = E\Big[ \int_t^T f(\theta, X_\theta, U_\theta) \, d\theta + g(X_T) \Big] = J(t, x; U) ,
so that the system U is then optimal.
9.6 Proposition: Suppose that the value function Q belongs to C^{1,2}([0, T) \times R^d). Then Q satisfies the HJB equation (9.8).
Indeed, for an arbitrary constant control U_\theta \equiv u, the chain rule gives
    Q(t + h, X_{t+h}) = Q(t, x) + \int_t^{t+h} \Big( \frac{\partial Q}{\partial s} + A_s^u Q \Big)(s, X_s) \, ds + martingale ,
and in conjunction with (9.13), which gives Q(t, x) \le E[\int_t^{t+h} f(s, X_s, u) \, ds + Q(t + h, X_{t+h})] for this constant control, this leads to
(9.16)    E \int_t^{t+h} \Big( \frac{\partial Q}{\partial s} + A_s^u Q + f^u \Big)(s, X_s) \, ds \ge 0 .
Dividing by h and letting h \downarrow 0, we obtain
    \frac{\partial Q}{\partial t}(t, x) + A_t^u Q(t, x) + f^u(t, x) \ge 0 ,    for every u \in A ,
while equality is forced along an optimal system; together, these give (9.8).
9.7 Theorem (Verification): Let P \in C^{1,2}([0, T) \times R^d) satisfy the HJB equation
(9.17)    \frac{\partial P}{\partial t} + \inf_{u \in A}\big[ A_t^u P + f^u \big] = 0    in [0, T) \times R^d ,    P(T, \cdot) = g    in R^d .
Then P \le Q.
On the other hand, let u^*(t, x, \Delta, M) : [0, T] \times R^d \times R^d \times S^d \to A be a measurable function that achieves the infimum in (9.9), and introduce the feedback control law
(9.18)    \mu^*(t, x) \triangleq u^*\big( t, x, DP(t, x), D^2 P(t, x) \big) : [0, T] \times R^d \to A .
If this function is continuous, or if the condition (9.6) holds, then \mu^* is an optimal feedback law.
Proof: We shall discuss only the second statement (and establish the identity P = Q only in its context; for the general case, cf. Safonov (1977) or Lions (1983.a)). For an arbitrary admissible system U, we have under the assumption P \in C_b^{1,2}([0, T) \times R^d), by the chain rule (4.6):
(9.19)    g(X_T) - P(t, x) = \int_t^T \Big\{ \frac{\partial P}{\partial \theta}(\theta, X_\theta) + \sum_{i=1}^d b_i(\theta, X_\theta, U_\theta) \frac{\partial P}{\partial x_i}(\theta, X_\theta) + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a_{ij}(\theta, X_\theta, U_\theta) \frac{\partial^2 P}{\partial x_i \partial x_j}(\theta, X_\theta) \Big\} \, d\theta + (M_T - M_t)
        \ge - \int_t^T f(\theta, X_\theta, U_\theta) \, d\theta + (M_T - M_t) ,
by virtue of the HJB equation satisfied by P; here M is a martingale. Taking expectations, we obtain P(t, x) \le J(t, x; U) for every admissible U, whence P \le Q. For the admissible system U^* generated by the feedback law \mu^*, the inequality in (9.19) holds as an equality, and then P(t, x) = J(t, x; U^*); this gives P = Q and the optimality of \mu^*.
9.9 Example: In the one-dimensional case d = 1 with A = [-1, 1], b(t, x, u) = u, \sigma \equiv 1, f \equiv 0 (controlling the drift of a Brownian motion so as to minimize an expected terminal cost), the value function Q(t, x) admits an explicit expression in terms of the Gaussian distribution function \Phi(z) = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \, du and Gaussian kernels of the form \exp\{-(|x| - t)^2 / 2t\}, and the optimal feedback law satisfies \mu^*(t, x) = -\operatorname{sgn} Q_x(t, x) = -\operatorname{sgn}(x); cf. Karatzas & Shreve (1987), section 6.6.
9.10 Example: The one-dimensional linear regulator. Consider now the case with d = 1, A = R and b(t, x, u) = a(t)x + u, \sigma(t, x, u) = \sigma(t), f(t, x, u) = c(t)x^2 + \frac{1}{2}u^2, g \equiv 0, where a, \sigma, c are bounded and continuous functions on [0, T]. Certainly the assumptions of this section are violated rather grossly, but formally at least the function H of (9.12) takes the form
    H(t, x, \Delta) = a(t)x\Delta + c(t)x^2 + \min_{u \in R}\big[ u\Delta + \tfrac{1}{2}u^2 \big] = a(t)x\Delta + c(t)x^2 - \tfrac{1}{2}\Delta^2 ,
the minimization is achieved by u^*(t, x, \Delta) = -\Delta, and (9.18) becomes \mu^*(t, x) = u^*(t, x, Q_x(t, x)) = -Q_x(t, x). Here Q is the solution of the HJB (semilinear parabolic, possibly degenerate) equation
(9.20)    Q_t + \frac{1}{2}\sigma^2(t)Q_{xx} + a(t)xQ_x + c(t)x^2 - \frac{1}{2}Q_x^2 = 0 ,    Q(T, \cdot) = 0 .
It is checked easily that
    Q(t, x) = A(t)x^2 + \int_t^T A(s)\sigma^2(s) \, ds
solves the equation (9.20), provided that A(\cdot) is the solution of the Riccati equation
    \dot A(t) + 2a(t)A(t) - 2A^2(t) + c(t) = 0 ,    A(T) = 0 .
The eminently reasonable conjecture now is that the admissible system U^* with
    U_t^* = -Q_x(t, X_t^*) = -2A(t)X_t^* ,    dX_t^* = [a(t) - 2A(t)]X_t^* \, dt + \sigma(t) \, dW_t
is optimal.
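The Riccati equation for A(\cdot) is solved backward from the terminal condition A(T) = 0; a small sketch (Euler stepping and the constant-coefficient test case are our own choices, not part of the text):

```python
def riccati_A(a, c, T, n=20000):
    """Solve A'(t) = -2 a(t) A(t) + 2 A(t)^2 - c(t), A(T) = 0, backward in
    time by the Euler method; returns A on the grid t_k = k T / n."""
    dt = T / n
    A = [0.0] * (n + 1)
    for k in range(n, 0, -1):
        t = k * dt
        slope = -2.0 * a(t) * A[k] + 2.0 * A[k] ** 2 - c(t)
        A[k - 1] = A[k] - slope * dt   # step from t_k down to t_{k-1}
    return A

# Constant coefficients a(t) = 0, c(t) = 1: A'(t) = 2 A^2 - 1, A(T) = 0.
A = riccati_A(lambda t: 0.0, lambda t: 1.0, T=1.0)
gain = [-2.0 * x for x in A]           # optimal feedback U*_t = -2 A(t) X*_t
```

For this test case the exact solution is A(t) = (1/\sqrt 2)\tanh(\sqrt 2 (T - t)), against which the scheme can be checked.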
9.11 Exercise: In the context of Example 9.10, show that J(t, x; U^*) = Q(t, x) \le J(t, x; U) < \infty holds for any admissible system U for which E \int_t^T U_\theta^2 \, d\theta < \infty.
(Hint: It suffices to show that
    Q(\theta, X_\theta) + \int_t^\theta \big[ c(s)X_s^2 + \tfrac{1}{2}U_s^2 \big] \, ds ,    t \le \theta \le T
is a submartingale for any admissible U with the above property, and is a martingale for U^*; to this end, you will need to establish that (9.19) holds for every such U.)
The trouble with the Verification Theorem 9.7 is that it assumes a lot of smoothness on the part of the value function Q. The smoothness requirement Q \in C^{1,2} was satisfied in both Examples 9.9 and 9.10 and, more generally, is satisfied by solutions of the semilinear parabolic equation (9.11) under nondegeneracy assumptions on the diffusion matrix a(t, x) and reasonable smoothness conditions on the nonlinear function H(t, x, \Delta); cf. Chapter VI in Fleming & Rishel (1975). But in general, this will not be the case. In fact, as one can see quite easily on deterministic examples, the value function can fail to be even once continuously differentiable in the spatial variable; on the other hand, the fully nonlinear (and possibly degenerate, since we allow the matrix a \ge 0 to be singular) equation (9.10) may even fail to have a classical solution!
All these remarks make plain the need for a new, weak notion of solutions for fully nonlinear, second-order equations like (9.10), that will be met by the value function Q(t, x) of the control problem. Such a concept was developed by P.L. Lions (1983.a,b,c) under the rubric of viscosity solution, following up on the work of that author with M.G. Crandall on first-order equations. We shall sketch the general outlines of this theory, but drop the term "viscosity solution" in favor of the more intelligible one, "weak solution".
Thus, let us consider a continuous function
    F(t, x, u, \Delta, M) : [0, T] \times R^d \times R \times R^d \times S^d \to R
which satisfies the analogue
(9.22)    A \le B \Longrightarrow F(t, x, u, \Delta, A) \le F(t, x, u, \Delta, B)
of the monotonicity property enjoyed by the function F of (9.9), and let us introduce for the equation
(9.23)    \frac{\partial u}{\partial t} + F(t, x, u, Du, D^2 u) = 0
a notion of solution that requires only continuity (and no differentiability whatsoever) on the part of the solution u(t, x).
9.12 Definition: A continuous function u : [0, T] \times R^d \to R is called a
(i) weak supersolution of (9.23), if for every \varphi \in C^{1,2}((0, T) \times R^d) we have
(9.24)    \frac{\partial \varphi}{\partial t}(t_0, x_0) + F\big( t_0, x_0, u(t_0, x_0), D\varphi(t_0, x_0), D^2\varphi(t_0, x_0) \big) \ge 0
at every point (t_0, x_0) \in (0, T) \times R^d where u - \varphi attains a local maximum;
(ii) weak subsolution of (9.23), if for every \varphi \in C^{1,2}((0, T) \times R^d) we have
(9.25)    \frac{\partial \varphi}{\partial t}(t_0, x_0) + F\big( t_0, x_0, u(t_0, x_0), D\varphi(t_0, x_0), D^2\varphi(t_0, x_0) \big) \le 0
at every point (t_0, x_0) \in (0, T) \times R^d where u - \varphi attains a local minimum;
(iii) weak solution of (9.23), if it is both a weak supersolution and a weak subsolution.
9.13 Remark: In Definition 9.12 it suffices to consider strict local extrema.
9.14 Remark: A classical solution u \in C^{1,2} of (9.23) is also a weak solution. Indeed, if u - \varphi attains a local maximum at (t_0, x_0) \in (0, T) \times R^d, then
    Du(t_0, x_0) = D\varphi(t_0, x_0) ,    D^2 u(t_0, x_0) \le D^2\varphi(t_0, x_0) ,    \frac{\partial u}{\partial t}(t_0, x_0) = \frac{\partial \varphi}{\partial t}(t_0, x_0) ,
whence, thanks to (9.22),
    0 = \frac{\partial u}{\partial t}(t_0, x_0) + F\big( t_0, x_0, u(t_0, x_0), Du(t_0, x_0), D^2 u(t_0, x_0) \big) \le \frac{\partial \varphi}{\partial t}(t_0, x_0) + F\big( t_0, x_0, u(t_0, x_0), D\varphi(t_0, x_0), D^2\varphi(t_0, x_0) \big) .
In other words, (9.24) is satisfied, and thus u is a weak supersolution; similarly for (9.25).
The new concept relates well to the notion of weak solvability in the Sobolev sense. In particular, we have the following result (cf. Lions (1983.c)):
9.15 Theorem: Let u \in W_{loc}^{1,2,p} (p > d + 1) be a weak solution of (9.23); then
(9.26)    \frac{\partial u}{\partial t}(t, x) + F\big( t, x, u(t, x), Du(t, x), D^2 u(t, x) \big) = 0
holds for (Lebesgue) almost every (t, x) \in (0, T) \times R^d.
On the other hand, stability results for this new notion are almost trivial consequences of the definition.
9.16 Proposition: Let \{F_n\}_{n=1}^\infty be a sequence of continuous functions on [0, T] \times R^{2d+1} \times S^d, and \{u_n\}_{n=1}^\infty a sequence of corresponding weak solutions of
    \frac{\partial u_n}{\partial t} + F_n\big( t, x, u_n, Du_n, D^2 u_n \big) = 0 ,    n \ge 1 .
Suppose that these sequences converge to the continuous functions F and u, respectively, uniformly on compact subsets of their respective domains. Then u is a weak solution of
    \frac{\partial u}{\partial t} + F\big( t, x, u, Du, D^2 u \big) = 0 .
Proof: Let \varphi \in C^{1,2}([0, T) \times R^d), and let u - \varphi have a strict local maximum at (t_0, x_0) in (0, T) \times R^d; recall Remark 9.13. Suppose that \delta > 0 is small enough, so that we have (u - \varphi)(t_0, x_0) > \max_{\partial B((t_0, x_0), \delta)}(u - \varphi); then for n (= n(\delta), as \delta \downarrow 0) large enough, we have by continuity
    \max_{\bar B((t_0, x_0), \delta)}(u_n - \varphi) > \max_{\partial B((t_0, x_0), \delta)}(u_n - \varphi) ,
so that u_n - \varphi attains a local maximum at some point (t_\delta, x_\delta) in the interior of the ball B((t_0, x_0), \delta). Now from
    \frac{\partial \varphi}{\partial t}(t_\delta, x_\delta) + F_n\big( t_\delta, x_\delta, u_n(t_\delta, x_\delta), D\varphi(t_\delta, x_\delta), D^2\varphi(t_\delta, x_\delta) \big) \ge 0 ,
we let \delta \downarrow 0, to obtain (observing that (t_\delta, x_\delta) \to (t_0, x_0), u_n(t_\delta, x_\delta) \to u(t_0, x_0), D^j\varphi(t_\delta, x_\delta) \to D^j\varphi(t_0, x_0) for j = 1, 2, because \varphi \in C^{1,2}, and recalling that F_n converges to F uniformly on compact sets):
    \frac{\partial \varphi}{\partial t}(t_0, x_0) + F\big( t_0, x_0, u(t_0, x_0), D\varphi(t_0, x_0), D^2\varphi(t_0, x_0) \big) \ge 0 .
It follows that u is a weak supersolution; similarly for the weak subsolution property.
Finally, here is the result that connects the concept of weak solutions with the control problem of this section.
9.17 Theorem: If the value function Q of (9.3) is continuous on the strip [0, T] \times R^d, then Q is a weak solution of
(9.27)    \frac{\partial Q}{\partial t} + F(t, x, DQ, D^2 Q) = 0
in the notation of (9.9).
Proof: Let \varphi \in C^{1,2}, and let Q - \varphi attain a local maximum at (t_0, x_0) \in (0, T) \times R^d, so that Q(t_0 + h, X_{t_0+h}) - Q(t_0, x_0) \le \varphi(t_0 + h, X_{t_0+h}) - \varphi(t_0, x_0) for sufficiently small h > 0. Then the Dynamic Programming principle (9.13) gives
    0 = \inf_U E\Big[ \int_{t_0}^{t_0+h} f(\theta, X_\theta, U_\theta) \, d\theta + Q(t_0 + h, X_{t_0+h}) - Q(t_0, x_0) \Big]
      \le \inf_U E\Big[ \int_{t_0}^{t_0+h} f(\theta, X_\theta, U_\theta) \, d\theta + \varphi(t_0 + h, X_{t_0+h}) - \varphi(t_0, x_0) \Big] .
But now the argument used in the proof of Proposition 9.6 (applied this time to the smooth test-function \varphi \in C^{1,2}) yields:
    \frac{\partial \varphi}{\partial t}(t_0, x_0) + F\big( t_0, x_0, D\varphi(t_0, x_0), D^2\varphi(t_0, x_0) \big) \ge 0 .
Thus Q is a weak supersolution of (9.27), and its weak subsolution property is proved similarly.
We shall close this section with an example of a stochastic control problem arising in financial economics.
9.18 Consumption/Investment Optimization: Let us consider a financial market with d + 1 assets; one of them is a risk-free asset, called bond, with interest rate r (and price B_0(t) = e^{rt}), and the remaining d are risky stocks, with prices-per-share S_i(t) given by
    dS_i(t) = S_i(t)\Big[ b_i \, dt + \sum_{j=1}^d \sigma_{ij} \, dW_j(t) \Big] ,    1 \le i \le d .
Here W = (W_1, ..., W_d)^T is an R^d-valued Brownian motion which models the uncertainty in the market, b = (b_1, ..., b_d)^T is the vector of appreciation rates, and \sigma = \{\sigma_{ij}\}_{1 \le i, j \le d} is the volatility matrix for the stocks. We assume that both \sigma, \sigma^T are invertible.
It is worthwhile to notice that the discounted stock prices \tilde S_i(t) = e^{-rt}S_i(t) satisfy the equations
    d\tilde S_i(t) = \tilde S_i(t) \sum_{j=1}^d \sigma_{ij} \, d\tilde W_j(t) ,    1 \le i \le d ,
where \tilde W(t) \triangleq W(t) + \sigma^{-1}(b - r\mathbf 1)\, t is a Brownian motion under an equivalent probability measure \tilde P, thanks to the Girsanov theorem.
Now if X(t) denotes the investor's wealth at time t, \pi_i(t) the amount invested in the i-th stock, and c(t) \ge 0 the rate of consumption, then X(t) - \sum_{i=1}^d \pi_i(t) is the amount invested in the bond, and thus X(\cdot) satisfies the equation
(9.28)    dX(t) = \sum_{i=1}^d \pi_i(t)\Big[ b_i \, dt + \sum_{j=1}^d \sigma_{ij} \, dW_j(t) \Big] + \Big[ X(t) - \sum_{i=1}^d \pi_i(t) \Big] r \, dt - c(t) \, dt
        = [rX(t) - c(t)] \, dt + \sum_{i=1}^d \pi_i(t)\Big[ (b_i - r) \, dt + \sum_{j=1}^d \sigma_{ij} \, dW_j(t) \Big]
        = [rX(t) - c(t)] \, dt + \pi^T(t)\sigma\big[ \sigma^{-1}(b - r\mathbf 1) \, dt + dW(t) \big] = [rX(t) - c(t)] \, dt + \pi^T(t)\sigma \, d\tilde W(t) .
In other words,
(9.29)    e^{-ru}X(u) = x - \int_0^u e^{-rs}c(s) \, ds + \int_0^u e^{-rs}\pi^T(s)\sigma \, d\tilde W(s) ;    0 \le u \le T .
The class A(x) of admissible control process pairs (c, \pi) consists of those pairs for which the corresponding wealth process X of (9.29) remains nonnegative on [0, T] (i.e., X(u) \ge 0, 0 \le u \le T) w.p.1.
It is not hard to see that for every (c, \pi) \in A(x), we have
(9.30)    \tilde E \int_0^T e^{-rt}c(t) \, dt \le x .
Conversely, for every consumption process c which satisfies (9.30), it can be shown that there exists a portfolio process \pi such that (c, \pi) \in A(x).
(Exercise: Try to work this out! The converse statement hinges on the fact that every \tilde P-martingale can be represented as a stochastic integral with respect to \tilde W, thanks to the representation Theorems 5.3 and 7.4.)
The control problem now is to maximize the expected discounted utility from consumption
    J(x; c, \pi) \triangleq E \int_0^T e^{-\beta t} U(c(t)) \, dt
over the class A(x), for a given discount rate \beta > 0 and utility function U. Starting at time t \in [0, T] with wealth x > 0, the analogue of (9.29) is
(9.29)    e^{-ru}X(u) = xe^{-rt} - \int_t^u e^{-rs}c(s) \, ds + \int_t^u e^{-rs}\pi^T(s)\sigma \, d\tilde W(s) ;    t \le u \le T ,
and we denote by
    Q(t, x) \triangleq \sup_{(c,\pi) \in A(t,x)} E \int_t^T e^{-\beta s} U(c(s)) \, ds ;    0 \le t \le T ,    x \in (0, \infty)
the value function of the resulting control problem, where A(t, x) is the class of admissible pairs on [t, T] with initial wealth x. By analogy with (9.8), and in conjunction with the equation (9.28) for the wealth process, we expect this value function to satisfy the HJB equation
(9.32)    \frac{\partial Q}{\partial t} + \max_{\pi \in R^d, \ c \in [0, \infty)} \Big[ \frac{1}{2}\|\sigma^T\pi\|^2 Q_{xx} + \big\{ (rx - c) + \pi^T(b - r\mathbf 1) \big\} Q_x + e^{-\beta t}U(c) \Big] = 0 ,    0 \le t \le T .
Carrying out the indicated maximizations (the optimal consumption is c^* = I(e^{\beta t}Q_x), where I(\cdot) denotes the inverse of the strictly decreasing marginal utility U'(\cdot), and the optimal portfolio is \pi^* = -\frac{Q_x}{Q_{xx}}(\sigma\sigma^T)^{-1}(b - r\mathbf 1)), we arrive at the equation
(9.34)    \frac{\partial Q}{\partial t} - \frac{\|\theta\|^2}{2}\frac{Q_x^2}{Q_{xx}} + e^{-\beta t}U\big( I(e^{\beta t}Q_x) \big) - Q_x I(e^{\beta t}Q_x) + rxQ_x = 0 ,    where    \theta \triangleq \sigma^{-1}(b - r\mathbf 1) .
This is a strongly nonlinear equation, unlike the ones appearing in Examples 9.9 and 9.10. Nevertheless, it has a classical solution which, quite remarkably, can be written down in closed form for very general utility functions U; cf. Karatzas, Lehoczky & Shreve (1987), section 7, or Karatzas (1989), section 9, for the details.
For instance, in the special case U(c) = c^\alpha with 0 < \alpha < 1, the solution of (9.34) is
    Q(t, x) = e^{-\beta t}\big( p(t) \big)^{1-\alpha} x^\alpha ,
with
    p(t) = \frac{1}{k}\big[ 1 - e^{-k(T-t)} \big]  if  k \ne 0 ,    p(t) = T - t  if  k = 0 ,    k \triangleq \frac{1}{1-\alpha}\Big[ \beta - \alpha r - \frac{\alpha\|\theta\|^2}{2(1-\alpha)} \Big] ,
and the optimal consumption and portfolio rules are given in feedback form by
    c^*(t, x) = \frac{x}{p(t)} ,    \pi^*(t, x) = \frac{1}{1-\alpha}(\sigma\sigma^T)^{-1}(b - r\mathbf 1) \, x .
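These closed-form expressions are evaluated immediately; the sketch below (d = 1; all numerical values are our own illustration, not part of the text) computes k, p(\cdot) and the feedback rules:

```python
import math

def merton_rules(alpha, beta, r, b, sigma, T):
    """Closed-form solution of (9.34) for U(c) = c^alpha, 0 < alpha < 1, d = 1:
    returns k, p(.), and the feedback rules c*(t, x) = x / p(t),
    pi*(t, x) = (b - r) x / (sigma^2 (1 - alpha))."""
    theta = (b - r) / sigma                       # market price of risk
    k = (beta - alpha * r
         - alpha * theta ** 2 / (2.0 * (1.0 - alpha))) / (1.0 - alpha)
    def p(t):
        return (T - t) if abs(k) < 1e-12 else (1.0 - math.exp(-k * (T - t))) / k
    c_star = lambda t, x: x / p(t)                # consume wealth / p(t)
    pi_star = lambda t, x: (b - r) * x / (sigma ** 2 * (1.0 - alpha))
    return k, p, c_star, pi_star

k, p, c_star, pi_star = merton_rules(alpha=0.5, beta=0.1, r=0.05,
                                     b=0.1, sigma=0.2, T=1.0)
```

Note that p(T) = 0, so the consumption rate c^*(t, x) = x / p(t) blows up as the horizon is approached: all remaining wealth is consumed by time T.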
10. NOTES:
Sections 1 & 2: The material here is standard; see, for instance, Karatzas & Shreve
(1987), Chapter 1 (for the general theory of martingales, and the associated concepts of
filtrations and stopping times) and Chapter 2 (for the construction and the fundamental
properties of Brownian motion, as well as for an introduction to Markov processes).
The term martingale was introduced in probability theory (from gambling!) by
J. Ville (1939), although the concept was invented several years earlier by P. Lévy in
an attempt to extend the basic theorems of probability from independent to dependent
random variables. The fundamental theory for processes of this type was developed by
Doob (1953).
Sections 3 & 4: The construction and the study of stochastic integrals started with a seminal series of articles by K. Itô (1942, 1944, 1946, 1951) for Brownian motion, and continued
with Kunita & Watanabe (1967) for continuous local martingales, and with Doléans-Dade & Meyer (1971) for general local martingales. This theory culminated with the course of Meyer (1976), and can be studied in the monographs by Liptser & Shiryaev (1977), Ikeda
& Watanabe (1981), Elliott (1982), Karatzas & Shreve (1987), and Rogers & Williams
(1987).
Theorem 4.3 was established by P. Lévy (1948), but the wonderfully simple proof that you see here is due to Kunita & Watanabe (1967). Condition (4.9) is due to Novikov (1972).
Section 5: Theorem 5.1 is due to Dambis (1965) and Dubins & Schwarz (1965), while
the result of Exercise 5.2 is due to Doob (1953). For complete proofs of the results in this
section, see 3.4, 3.5 in Karatzas & Shreve (1987).
Section 6: The field of stochastic differential equations is now vast, both in theory and
in applications. For a systematic account of the basic results, and for some applications,
cf. Chapter 5 in Karatzas & Shreve (1987). More advanced and/or specialized treatments
appear in Stroock & Varadhan (1979), Ikeda & Watanabe (1981), Rogers & Williams
(1987), Friedman (1975/76).
The fundamental Theorem 6.4 is due to K. Itô (1946, 1951). Martingales of the type M^f of Exercise 6.2 play a central rôle in the modern theory of Markov processes, as was
discovered by Stroock & Varadhan (1969, 1979). See also 5.4 in Karatzas & Shreve
(1987), and the monograph by Ethier & Kurtz (1986).
For the kinematics and dynamics of random motions with general drift and diffusion
coefficients, including elaborated versions of the representations (6.8) and (6.9), see the
monograph by Nelson (1967).
Section 7: A systematic account of filtering theory appears in the monograph by Kallianpur (1980), and a rich collection of interesting papers can be found in the volume edited
by Hazewinkel & Willems (1981). The fundamental Theorems 7.4, 7.8 as well as the
equations (7.20), are due to Fujisaki, Kallianpur & Kunita (1972); the equation (7.21) for
the conditional density was discovered by Kushner (1967), whereas (7.31), (7.33) constitute
the ubiquitous Kalman & Bucy (1961) filter. Proposition 7.2 is due to Kailath (1971), who also introduced the "innovations approach" to the study of filtering; see Kailath (1968),
Frost & Kailath (1971).
We have followed Rogers & Williams (1987) in the derivation of the filtering equations.
Section 8: The equations (8.3), (8.4) for the unnormalized conditional density are due to
Zakai (1969). For further work on the robust equations of the type (8.6), (8.7), (8.10)
see Davis (1980, 1981, 1982), Pardoux (1979), and the articles in the volume edited by
Hazewinkel & Willems (1981).
Section 9: For the general theory of stochastic control, see Fleming & Rishel (1975),
Bensoussan & Lions (1978), Krylov (1980) and Bensoussan (1982). The notion of weak
solutions, as in Definition 9.12, is due to P.L. Lions (1983). For a very general treatment
of optimization problems arising in financial economics, see Karatzas, Lehoczky & Shreve
(1987) and the survey paper by Karatzas (1989).
11. REFERENCES
ALLINGER, D. & MITTER, S.K. (1981) New results on the innovations problem for
non-linear filtering. Stochastics 4, 339-348.
DOLÉANS-DADE, C. & MEYER, P.A. (1971) Intégrales stochastiques par rapport aux martingales locales. Séminaire de Probabilités IV, Lecture Notes in Mathematics 124, 77-107.
DOOB, J.L. (1953) Stochastic Processes. J. Wiley & Sons, New York.
DUBINS, L. & SCHWARZ, G. (1965) On continuous martingales. Proc. Natl Acad. Sci.
USA 53, 913-916.
EINSTEIN, A. (1905) Theory of the Brownian Movement. Ann. Physik. 17.
ELLIOTT, R.J. (1982) Stochastic Calculus and Applications. Springer-Verlag, New York.
ETHIER, S.N. & KURTZ, T.G. (1986) Markov Processes: Characterization and Convergence. J. Wiley & Sons, New York.
FLEMING, W.H. & RISHEL, R.W. (1975) Deterministic and Stochastic Optimal Control.
Springer-Verlag, New York.
FRIEDMAN, A. (1975/76) Stochastic Differential Equations and Applications (2 volumes).
Academic Press, New York.
FROST, P.A. & KAILATH, T. (1971) An innovations approach to least-squares estimation,
Part III. IEEE Trans. Autom. Control 16, 217-226.
LÉVY, P. (1948) Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris.
LIONS, P.L. (1983.a) On the Hamilton-Jacobi-Bellman equations. Acta Applicandae Mathematicae 1, 17-41.
LIONS, P.L. (1983.b) Optimal control of diffusion processes and Hamilton-Jacobi-Bellman
equations, Part I: The Dynamic Programming principle and applications. Comm.
Partial Diff. Equations 8, 1101-1174.
LIONS, P.L. (1983.c) Optimal control of diffusion processes and Hamilton-Jacobi-Bellman
equations, Part II: Viscosity solutions and uniqueness. Comm. Partial Diff. Equations
8, 1229-1276.
LIPTSER, R.S. & SHIRYAEV, A.N. (1977) Statistics of Random Processes, Vol. I: General Theory. Springer-Verlag, New York.
MEYER, P.A. (1976) Un cours sur les intégrales stochastiques. Séminaire de Probabilités X, Lecture Notes in Mathematics 511, 245-400.
NELSON, E. (1967) Dynamical Theories of Brownian Motion. Princeton University Press.
NOVIKOV, A.A. (1972) On moment inequalities for stochastic integrals. Theory Probab.
Appl. 16, 538-541.
PARDOUX, E. (1979) Stochastic partial differential equations and filtering of diffusion
processes. Stochastics 3, 127-167.
ROGERS, L.C.G. & WILLIAMS, D. (1987) Diffusions, Markov Processes, and Martingales, Vol. 2: Itô Calculus. J. Wiley & Sons, New York.
SAFONOV, M.V. (1977) On the Dirichlet problem for Bellman's equation in a plane domain, I. Math. USSR (Sbornik) 31, 231-248.
STROOCK, D.W. & VARADHAN, S.R.S. (1969) Diffusion processes with continuous
coefficients, I & II. Comm. Pure Appl. Math. 22, 345-400 & 479-530.
STROOCK, D.W. & VARADHAN, S.R.S. (1979) Multidimensional Diffusion Processes.
Springer-Verlag, Berlin.