SIAM J. MATH. ANAL.
Vol. 29, No. 1, pp. 1–17, January 1998
(1/2) ‖ρ^(k−1) − ρ‖²_{L²(ℝⁿ)} + (h/2) ∫_{ℝⁿ} |∇ρ|² dx
over an appropriate class of densities ρ. Here, h is the time step size. On the other hand, we derive as a special case of our results below that the scheme: determine ρ^(k) as the minimizer of

(1)    (1/2) d(ρ^(k−1), ρ)² + h ∫_{ℝⁿ} ρ log ρ dx

over all ρ ∈ K, where K is the set of all probability densities on ℝⁿ having finite second moments, is also a discretization of the diffusion equation when d is the Wasserstein metric.
In particular, this allows us to regard the diffusion equation as a steepest descent of the functional ∫_{ℝⁿ} ρ log ρ dx with respect to the Wasserstein metric. This reveals a novel link between the diffusion equation and the Gibbs–Boltzmann entropy (−∫_{ℝⁿ} ρ log ρ dx) of the density ρ. Furthermore, this formulation allows us to attach a precise interpretation to the conventional notion that diffusion arises from the tendency of the system to maximize entropy.
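This entropy-descent reading admits a quick numerical sanity check. The sketch below (our own illustration, not part of the paper; the grid, domain, and step count are arbitrary choices) evolves a one-dimensional density by explicit finite-difference steps of the diffusion equation ∂ρ/∂t = Δρ and verifies that ∫ρ log ρ dx decreases along the flow.

```python
import numpy as np

# 1D grid on [-L, L]; an off-center Gaussian as initial density.
L, m = 10.0, 400
x = np.linspace(-L, L, m)
dx = x[1] - x[0]
rho = np.exp(-0.5 * (x - 2.0) ** 2) / np.sqrt(2 * np.pi)
rho /= rho.sum() * dx                      # normalize to a probability density

def entropy(r):
    """Discretized S(rho) = \\int rho log rho dx (with 0 log 0 := 0)."""
    r = np.maximum(r, 1e-300)
    return float(np.sum(r * np.log(r)) * dx)

dt = 0.4 * dx**2                           # explicit-step stability: dt <= dx^2/2
S_values = [entropy(rho)]
for _ in range(200):
    lap = (np.roll(rho, 1) - 2 * rho + np.roll(rho, -1)) / dx**2
    rho = rho + dt * lap                   # one explicit heat-equation step
    S_values.append(entropy(rho))

# The entropy functional decreases monotonically along the discrete flow.
assert all(s1 <= s0 + 1e-12 for s0, s1 in zip(S_values, S_values[1:]))
print(S_values[0], S_values[-1])
```

Since dt ≤ dx²/2, each update is a convex combination of neighboring values (a doubly stochastic averaging), so the decrease of the convex functional ∫ρ log ρ dx in this sketch is in fact guaranteed, mirroring the steepest-descent interpretation.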
The connection between the Wasserstein metric and dynamical problems involving dissipation or diffusion (such as strongly overdamped fluid flow or nonlinear diffusion equations) seems to have first been recognized by Otto in [19]. The results in [19], together with our recent research on variational principles of entropy and free energy type for measures [12, 11, 15], provide the impetus for the present investigation.
work in [12] was motivated by the desire to model and characterize metastability
ρ(t, x) ≥ 0 for almost every (t, x) ∈ (0, ∞) × ℝⁿ, and ∫_{ℝⁿ} ρ(t, x) dx = 1 for almost every t ∈ (0, ∞).
It is well known that the Fokker–Planck equation (2) is inherently related to the Itô stochastic differential equation [7, 22, 23]

(3)    dX(t) = −∇Ψ(X(t)) dt + √(2β⁻¹) dW(t),    X(0) = X₀.

The corresponding stationary Gibbs distribution is given by

(4)    ρ_s(x) = Z⁻¹ exp(−βΨ(x)),    Z = ∫_{ℝⁿ} exp(−βΨ(x)) dx.
Note that, in order for equation (4) to make sense, the potential Ψ must grow rapidly enough to ensure that Z is finite. The probability measure ρ_s(x) dx, when it exists, is the unique invariant measure for the Markov process X(t) defined by the stochastic differential equation (3).
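For concreteness (an illustration of ours, not taken from the paper), one can integrate (3) by the Euler–Maruyama method for the quadratic potential Ψ(x) = x²/2 with β = 1, for which the Gibbs density (4) is the standard normal, and check that an ensemble of trajectories equilibrates to it:

```python
import numpy as np

rng = np.random.default_rng(0)

beta = 1.0
grad_psi = lambda x: x            # Psi(x) = x^2/2, so the Gibbs density is N(0, 1)

# Euler-Maruyama discretization of dX = -grad Psi(X) dt + sqrt(2/beta) dW.
n_paths, dt, T = 10000, 0.01, 8.0
X = np.zeros(n_paths)             # X(0) = 0 for every path
for _ in range(int(T / dt)):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    X = X - grad_psi(X) * dt + np.sqrt(2.0 / beta) * dW

# After several relaxation times the ensemble should be close to N(0, 1).
print(X.mean(), X.var())
assert abs(X.mean()) < 0.1
assert abs(X.var() - 1.0) < 0.1
```

The tolerances are loose because the ensemble is finite and the Euler–Maruyama step has an O(dt) bias in the stationary variance.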
It is readily verified (see, for example, [12]) that the Gibbs distribution ρ_s satisfies a variational principle: it minimizes over all probability densities on ℝⁿ the free energy functional
(5)    F(ρ) = E(ρ) + β⁻¹ S(ρ),

where

(6)    E(ρ) := ∫_{ℝⁿ} Ψρ dx

and

(7)    S(ρ) := ∫_{ℝⁿ} ρ log ρ dx.
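A minimal numerical check of this variational principle (our own; the potential, β = 1, and the Gaussian trial family are illustrative choices): for Ψ(x) = x²/2 the Gibbs density is the standard normal N(0, 1), and evaluating F over centered Gaussians N(0, σ²) by quadrature locates the minimizer at σ = 1.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
psi = 0.5 * x**2                  # potential Psi(x) = x^2/2, with beta = 1

def free_energy(sigma):
    """F(rho) = E(rho) + S(rho) for the Gaussian rho = N(0, sigma^2)."""
    rho = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    E = np.sum(psi * rho) * dx                                 # \int Psi rho dx
    S = np.sum(rho * np.log(np.maximum(rho, 1e-300))) * dx     # \int rho log rho dx
    return E + S

sigmas = np.linspace(0.4, 2.5, 211)
F_vals = np.array([free_energy(s) for s in sigmas])
best = float(sigmas[F_vals.argmin()])
print(best)        # the minimizer is the Gibbs standard deviation, sigma = 1
assert abs(best - 1.0) < 0.02
```

In closed form F(σ) = σ²/2 − (1/2) log(2πeσ²), whose unique minimizer is indeed σ = 1, the Gibbs density.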
(8)    d(μ₁, μ₂)² = inf_{p ∈ P(μ₁, μ₂)} ∫_{ℝⁿ×ℝⁿ} |x − y|² p(dx dy),

where P(μ₁, μ₂) is the set of all probability measures on ℝⁿ × ℝⁿ with first marginal μ₁ and second marginal μ₂, and the symbol |·| denotes the usual Euclidean norm on ℝⁿ. More precisely, a probability measure p is in P(μ₁, μ₂) if and only if for each Borel subset A ⊂ ℝⁿ there holds

p(A × ℝⁿ) = μ₁(A),    p(ℝⁿ × A) = μ₂(A).
Wasserstein distances of order q with q different from 2 may be analogously defined [10]. Since no confusion should arise in doing so, we shall refer to d in what follows as simply the Wasserstein distance.
It is well known that d defines a metric on the set of probability measures on ℝⁿ having finite second moments: ∫_{ℝⁿ} |x|² μ(dx) < ∞ [10, 21]. In particular, d satisfies the triangle inequality on this set. That is, if μ₁, μ₂, and μ₃ are probability measures on ℝⁿ with finite second moments, then
(9)    d(μ₁, μ₃) ≤ d(μ₁, μ₂) + d(μ₂, μ₃).

The Wasserstein distance also admits the probabilistic characterization

(10)    d(μ₁, μ₂)² = inf E(|X − Y|²),

where E(U) denotes the expectation of the random variable U, and the infimum is taken over all random variables X and Y such that X has distribution μ₁ and Y has distribution μ₂. In other words, the infimum is over all possible couplings of the
random variables X and Y . Convergence in the metric d is equivalent to the usual
weak convergence plus convergence of second moments. This latter assertion may be
demonstrated by appealing to the representation (10) and applying the well-known
Skorohod theorem from probability theory (see Theorem 29.6 of [1]). We omit the
details.
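For finitely supported measures, the infimum in (8) is a finite-dimensional linear program over the couplings P(μ₁, μ₂), which makes the definition easy to probe. The sketch below (our own illustration; it assumes scipy's linprog is available) computes d for discrete measures and checks two consequences of the definition: d(δ_x, δ_y) = |x − y| and the triangle inequality (9).

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein2(x1, w1, x2, w2):
    """d(mu1, mu2) for discrete measures mu1 = sum_i w1[i] delta_{x1[i]},
    mu2 = sum_j w2[j] delta_{x2[j]}, by solving the coupling LP in (8)."""
    x1 = np.asarray(x1, float).reshape(len(w1), -1)
    x2 = np.asarray(x2, float).reshape(len(w2), -1)
    n, m = len(w1), len(w2)
    cost = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(-1).ravel()
    A_eq, b_eq = [], []
    for i in range(n):                     # first marginal:  sum_j p[i, j] = w1[i]
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(w1[i])
    for j in range(m):                     # second marginal: sum_i p[i, j] = w2[j]
        row = np.zeros(n * m); row[j::m] = 1.0
        A_eq.append(row); b_eq.append(w2[j])
    res = linprog(cost, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return float(np.sqrt(max(res.fun, 0.0)))

# Point masses: d(delta_0, delta_3) = |0 - 3| = 3.
assert abs(wasserstein2([0.0], [1.0], [3.0], [1.0]) - 3.0) < 1e-8

# Triangle inequality (9) for three random uniform discrete measures.
rng = np.random.default_rng(1)
xs = [rng.normal(size=5) for _ in range(3)]
w = np.full(5, 0.2)
d12 = wasserstein2(xs[0], w, xs[1], w)
d23 = wasserstein2(xs[1], w, xs[2], w)
d13 = wasserstein2(xs[0], w, xs[2], w)
assert d13 <= d12 + d23 + 1e-9
```

The dense LP is only viable for small supports; it is meant to illustrate the definition, not to be an efficient transport solver.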
The variational problem (8) is an example of a Monge–Kantorovich mass transference problem with the particular cost function c(x, y) = |x − y|² [21]. In that context, an infimizer p ∈ P(μ₁, μ₂) is referred to as an optimal transference plan. When μ₁ and μ₂ have finite second moments, the existence of such a p for (8) is readily verified by a simple adaptation of our arguments in section 4. For a probabilistic proof that the infimum in (8) is attained when μ₁ and μ₂ have finite second moments, see [10].
Brenier [2] has established the existence of a one-to-one optimal transference plan in the case that the measures μ₁ and μ₂ have bounded support and are absolutely continuous with respect to Lebesgue measure. Caffarelli [3] and Gangbo and McCann [8, 9] have recently extended Brenier's results to more general cost functions c and to cases in which the measures do not have bounded support.
If the measures μ₁ and μ₂ are absolutely continuous with respect to the Lebesgue measure, with densities ρ₁ and ρ₂, respectively, we will write P(ρ₁, ρ₂) for the set of probability measures having first marginal ρ₁ and second marginal ρ₂. Correspondingly, we will denote by d(ρ₁, ρ₂) the Wasserstein distance between μ₁ and μ₂. This is the situation that we will be concerned with in what follows.
4. The discrete scheme. We shall now construct a time-discrete scheme that is designed to converge in an appropriate sense (to be made precise in the next section) to a solution of the Fokker–Planck equation. The scheme that we shall describe was motivated by a similar scheme developed by Otto in an investigation of pattern formation in magnetic fluids [19]. We shall make the following assumptions concerning the potential Ψ introduced in section 2:
(11)    Ψ ∈ C^∞(ℝⁿ),    Ψ(x) ≥ 0 for all x ∈ ℝⁿ;

(12)    |∇Ψ(x)| ≤ C (Ψ(x) + 1) for all x ∈ ℝⁿ

for some constant C < ∞. Notice that our assumptions on Ψ allow for cases in which ∫_{ℝⁿ} exp(−βΨ) dx is not defined, so the stationary density ρ_s given by (4) does not exist. These assumptions allow us to treat a wide class of Fokker–Planck equations.
In particular, the classical diffusion equation ∂ρ/∂t = β⁻¹Δρ, for which Ψ ≡ const., falls into this category. We also introduce the set K of admissible probability densities:

K := { ρ : ℝⁿ → [0, ∞) measurable | ∫_{ℝⁿ} ρ(x) dx = 1,  M(ρ) < ∞ },

where

M(ρ) = ∫_{ℝⁿ} |x|² ρ(x) dx.
With these conventions in hand, we now formulate the iterative discrete scheme: determine ρ^(k) as the minimizer of

(13)    (1/2) d(ρ^(k−1), ρ)² + h F(ρ)

over all ρ ∈ K. Here we use the notation ρ^(0) = ρ₀. The scheme (13) is the obvious generalization of the scheme (1) set forth in the Introduction for the diffusion equation. We shall now establish existence and uniqueness of the solution to (13).
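Note that, since ρ^(k−1) is itself admissible in (13), the scheme forces F(ρ^(k)) ≤ F(ρ^(k−1)): F is a Lyapunov functional for the iteration, as it is for the Fokker–Planck equation (2). A crude finite-difference check of the latter descent property (our own illustration with Ψ(x) = x²/2 and β = 1; this is an explicit solver for the continuum equation, not the scheme (13) itself):

```python
import numpy as np

# Grid, potential Psi(x) = x^2/2 (so beta = 1), off-center initial density.
L, m = 8.0, 400
x = np.linspace(-L, L, m)
dx = x[1] - x[0]
psi = 0.5 * x**2
rho = np.exp(-2.0 * (x - 1.5) ** 2)
rho /= rho.sum() * dx

def free_energy(r):
    """F(rho) = \\int Psi rho dx + \\int rho log rho dx, by Riemann sum."""
    return float(np.sum(psi * r) * dx
                 + np.sum(r * np.log(np.maximum(r, 1e-300))) * dx)

dt = 0.2 * dx**2     # small explicit step (diffusion-stability limit ~ dx^2/2)

def step(r):
    # Conservative update for d rho/dt = d/dx (rho Psi' + d rho/dx):
    # midpoint fluxes J_{i+1/2} = -((rho Psi')_{i+1/2} + (rho_{i+1}-rho_i)/dx),
    # with zero flux through the ends, so mass is conserved exactly.
    drift_mid = 0.5 * (r[:-1] * x[:-1] + r[1:] * x[1:])     # Psi'(x) = x
    J = -(drift_mid + (r[1:] - r[:-1]) / dx)
    J = np.concatenate(([0.0], J, [0.0]))
    return r - dt * (J[1:] - J[:-1]) / dx

F_vals = [free_energy(rho)]
for _ in range(2000):
    rho = step(rho)
    F_vals.append(free_energy(rho))

print(F_vals[0], F_vals[-1])
assert F_vals[-1] < F_vals[0]          # F acts as a Lyapunov functional
```

The density relaxes toward the Gibbs state and F decreases accordingly; the discretization makes no claim to the variational structure of (13), only to the descent of F.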
PROPOSITION 4.1. Given ρ₀ ∈ K, there exists a unique solution of the scheme (13).
Proof. Let us first demonstrate that S is well defined as a functional on K with values in (−∞, +∞] and that, in addition, there exist α < 1 and C < ∞ depending only on n such that

(14)    S(ρ) ≥ −C (M(ρ) + 1)^α    for all ρ ∈ K.
Actually, we shall show that (14) is valid for any α ∈ (n/(n+2), 1). For future reference, we prove a somewhat finer estimate. Namely, we demonstrate that there exists a C < ∞, depending only on n and α, such that for all R ≥ 0, and for each ρ ∈ K, there holds

(15)    ∫_{ℝⁿ\B_R} |min{ρ log ρ, 0}| dx ≤ C (1/(R² + 1))^{(α(n+2)−n)/2} (M(ρ) + 1)^α,

where B_R denotes the ball of radius R centered at the origin in ℝⁿ. Indeed, for α < 1 there holds

|min{z log z, 0}| ≤ C z^α    for all z ≥ 0.
Hence, by Hölder's inequality,

∫_{ℝⁿ\B_R} |min{ρ log ρ, 0}| dx ≤ C ∫_{ℝⁿ\B_R} ρ^α dx
    ≤ C ( ∫_{ℝⁿ} (|x|² + 1) ρ dx )^α ( ∫_{ℝⁿ\B_R} (1/(|x|² + 1))^{α/(1−α)} dx )^{1−α}.

Since α/(1−α) > n/2, we have

∫_{ℝⁿ\B_R} (1/(|x|² + 1))^{α/(1−α)} dx ≤ C (1/(R² + 1))^{α/(1−α) − n/2},

and raising this to the power 1 − α yields (15), and hence (14) upon taking R = 0, since ∫_{ℝⁿ} (|x|² + 1) ρ dx = M(ρ) + 1.
Let us now prove that for given ρ^(k−1) ∈ K, there exists a minimizer ρ ∈ K of the functional

(16)    ρ ↦ (1/2) d(ρ^(k−1), ρ)² + h F(ρ).

Observe that S is not bounded below on K, and hence F is not bounded below on K either. Nevertheless, using the inequality

(17)    M(ρ₁) ≤ 2 M(ρ₀) + 2 d(ρ₀, ρ₁)²    for all ρ₀, ρ₁ ∈ K

(which immediately follows from the inequality |y|² ≤ 2|x|² + 2|x − y|² and from the definition of d) together with (14), we obtain

(18)    (1/2) d(ρ^(k−1), ρ)² + h F(ρ)
        ≥ (1/4) M(ρ) − (1/2) M(ρ^(k−1)) + h β⁻¹ S(ρ)         [by (17), E ≥ 0]
        ≥ (1/4) M(ρ) − C (M(ρ) + 1)^α − (1/2) M(ρ^(k−1))     [by (14)]

for all ρ ∈ K,
which ensures that (16) is bounded below (recall α < 1). Now, let {ρ_ν} be a minimizing sequence for (16). Obviously, we have that

(19)    {S(ρ_ν)} is bounded above

and

(20)    {M(ρ_ν)} is bounded.

Hence, by (15) (applied with R = 0) and (20),

∫_{ℝⁿ} max{ρ_ν log ρ_ν, 0} dx

is bounded. As z ↦ max{z log z, 0}, z ∈ [0, ∞), has superlinear growth, this result, in conjunction with (20), guarantees the existence of a ρ^(k) ∈ K such that (at least for a subsequence)

(21)    ρ_ν ⇀ ρ^(k)    weakly in L¹(ℝⁿ).
We now claim the lower semicontinuity statements

(22)    S(ρ^(k)) ≤ lim inf_{ν→∞} S(ρ_ν),

which follows from

(23)    ∫_{B_R} ρ^(k) log ρ^(k) dx ≤ lim inf_{ν→∞} ∫_{B_R} ρ_ν log ρ_ν dx,

(24)    ∫_{ℝⁿ\B_R} max{ρ^(k) log ρ^(k), 0} dx ≤ lim inf_{ν→∞} ∫_{ℝⁿ\B_R} max{ρ_ν log ρ_ν, 0} dx,

together with the uniform tail bound, provided by (15) and (20),

(25)    lim sup_{R↑∞} sup_{ν∈ℕ} ∫_{ℝⁿ\B_R} |min{ρ_ν log ρ_ν, 0}| dx = 0,

and furthermore

(26)    M(ρ^(k)) ≤ lim inf_{ν→∞} M(ρ_ν),

(27)    d(ρ^(k−1), ρ^(k))² ≤ lim inf_{ν→∞} d(ρ^(k−1), ρ_ν)².
Equation (26) follows immediately from (21) and Fatou's lemma. As for (27), we choose p_ν ∈ P(ρ^(k−1), ρ_ν) satisfying

∫_{ℝⁿ×ℝⁿ} |x − y|² p_ν(dx dy) ≤ d(ρ^(k−1), ρ_ν)² + ν⁻¹.
By (20) the sequence of probability measures {ρ_ν dx} is tight, or relatively compact with respect to the usual weak convergence in the space of probability measures on ℝⁿ (i.e., convergence tested against bounded continuous functions) [1]. This, together with the fact that the density ρ^(k−1) has finite second moment, guarantees that the sequence {p_ν} of probability measures on ℝⁿ × ℝⁿ is tight. Hence, there is a subsequence of {p_ν} that converges weakly to some probability measure p. From (21) we deduce that p ∈ P(ρ^(k−1), ρ^(k)). We now could invoke the Skorohod theorem [1] and Fatou's lemma to infer (27) from this weak convergence, but we prefer here to give a more analytic proof. For R < ∞ let us select a continuous function η_R : ℝⁿ → [0, 1] with bounded support such that

η_R = 1 inside of B_R.
We then have

(28)    ∫_{ℝⁿ×ℝⁿ} η_R(x) η_R(y) |x − y|² p(dx dy) ≤ lim inf_{ν→∞} ∫_{ℝⁿ×ℝⁿ} |x − y|² p_ν(dx dy)

for each fixed R < ∞. On the other hand, using the monotone convergence theorem, we deduce that

d(ρ^(k−1), ρ^(k))² ≤ ∫_{ℝⁿ×ℝⁿ} |x − y|² p(dx dy) = lim_{R↑∞} ∫_{ℝⁿ×ℝⁿ} η_R(x) η_R(y) |x − y|² p(dx dy),

which, combined with (28) and the choice of p_ν, yields (27).
Remark. One of the referees has communicated to us the following simple estimate that could be used in place of (14)–(15) in the previous and subsequent analysis: for any Ω ⊂ ℝⁿ (in particular, for Ω = ℝⁿ\B_R) and for all ρ ∈ K there holds

(29)    ∫_Ω |min{ρ log ρ, 0}| dx ≤ C ( ∫_Ω e^{−|x|/2} dx + α M(ρ) + (1/(4α)) ∫_Ω ρ dx )    for every α > 0.

Indeed, splitting Ω according to whether ρ < e^{−|x|} or ρ ≥ e^{−|x|}, and using |min{z log z, 0}| ≤ C √z in the first case and log(1/ρ) ≤ |x| in the second, we obtain

∫_Ω |min{ρ log ρ, 0}| dx ≤ C ∫_Ω e^{−|x|/2} dx + ∫_Ω |x| ρ dx.

The desired result (29) then follows from the inequality |x| ≤ α|x|² + 1/(4α) for α > 0.
5. Convergence to the solution of the Fokker–Planck equation. We come now to our main result. We shall demonstrate that an appropriate interpolation of the solution to the scheme (13) converges to the unique solution of the Fokker–Planck equation. Specifically, the convergence result that we will prove here is as follows.
THEOREM 5.1. Let ρ₀ ∈ K satisfy F(ρ₀) < ∞, and for given h > 0, let {ρ_h^(k)}_{k∈ℕ} be the solution of (13). Define the interpolation ρ_h : (0, ∞) × ℝⁿ → [0, ∞) by

ρ_h(t) = ρ_h^(k)    for t ∈ [kh, (k+1)h) and k ∈ ℕ ∪ {0}.

Then as h → 0,

(30)    ρ_h(t) ⇀ ρ(t)    weakly in L¹(ℝⁿ)    for all t ∈ (0, ∞),

where ρ is the unique solution of

(31)    ∂ρ/∂t = div(ρ ∇Ψ) + β⁻¹ Δρ

with initial condition

(32)    ρ(t) → ρ₀    strongly in L¹(ℝⁿ)    for t → 0

and

(33)    M(ρ), E(ρ) ∈ L^∞((0, T))    for all T < ∞.
Proof. The proof basically follows along the lines of [19, Proposition 2, Theorem 3]. The crucial step is to recognize that the first variation of (16) with respect to the independent variables indeed yields a time-discrete scheme for (31), as will now be demonstrated. For notational convenience only, we shall set β ≡ 1 from here on in. As will be evident from the ensuing arguments, our proof works for any positive β. In fact, it is not difficult to see that, with appropriate modifications to the scheme (13), we can establish an analogous convergence result for time-dependent β.
Let ξ be a smooth vector field with bounded support, ξ ∈ C₀^∞(ℝⁿ, ℝⁿ), and define the corresponding flux {Φ_τ}_{τ∈ℝ} by

∂Φ_τ/∂τ = ξ ∘ Φ_τ    for all τ ∈ ℝ    and    Φ₀ = id.

For any τ ∈ ℝ, let the measure ρ_τ(y) dy be the push forward of ρ^(k)(y) dy under Φ_τ. This means that

(34)    ∫_{ℝⁿ} ρ_τ(y) ζ(y) dy = ∫_{ℝⁿ} ρ^(k)(y) ζ(Φ_τ(y)) dy    for all ζ ∈ C⁰₀(ℝⁿ).
Equivalently, in terms of densities,

(35)    ρ_τ(Φ_τ(y)) det ∇Φ_τ(y) = ρ^(k)(y).

Since ρ^(k) minimizes (16) over K and ρ_τ ∈ K, we have

(36)    (1/τ) ( (1/2) d(ρ^(k−1), ρ_τ)² + h F(ρ_τ) − (1/2) d(ρ^(k−1), ρ^(k))² − h F(ρ^(k)) ) ≥ 0    for all τ > 0.

Moreover, by (34),

∫_{ℝⁿ} Ψ(y) ρ_τ(y) dy = ∫_{ℝⁿ} ρ^(k)(y) Ψ(Φ_τ(y)) dy.
This yields

(1/τ) ( E(ρ_τ) − E(ρ^(k)) ) = ∫_{ℝⁿ} (1/τ) ( Ψ(Φ_τ(y)) − Ψ(y) ) ρ^(k)(y) dy.

Observe that the difference quotient under the integral converges uniformly to ∇Ψ(y)·ξ(y), hence implying that

(37)    d/dτ [E(ρ_τ)]_{τ=0} = ∫_{ℝⁿ} ∇Ψ(y)·ξ(y) ρ^(k)(y) dy.
Rn
(34)
=
(k) (y) log( ( (y))) dy
Rn
(k) (y)
(35)
(k)
dy .
=
(y) log
det (y)
Rn
Therefore, we have
1
S( ) S((k) ) =
Rn
Now using

d/dτ [ det ∇Φ_τ(y) ]_{τ=0} = div ξ(y),

together with the fact that Φ₀ = id, we see that the difference quotient under the integral converges uniformly to div ξ, hence implying that

(38)    d/dτ [S(ρ_τ)]_{τ=0} = −∫_{ℝⁿ} ρ^(k) div ξ dy.
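The first-variation formulas (37) and (38) can be sanity-checked numerically. In one dimension, with the first-order deformation Φ_τ(y) = y + τξ(y) (which agrees with the flux generated by ξ to order τ), relation (35) gives S(ρ_τ) = ∫ρ(y) log(ρ(y)/(1 + τξ′(y))) dy, while E(ρ_τ) = ∫Ψ(Φ_τ(y)) ρ(y) dy. The sketch below (our own; ρ, Ψ, and ξ are arbitrary smooth choices, with ξ rapidly decaying rather than compactly supported) compares central difference quotients in τ with the right-hand sides of (37) and (38).

```python
import numpy as np

y = np.linspace(-10.0, 10.0, 4001)
dy = y[1] - y[0]
rho = np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)      # rho^{(k)}: standard normal
psi_prime = y                                       # Psi(y) = y^2/2
xi = np.exp(-(y - 1.0) ** 2)                        # deformation field xi
xi_prime = -2.0 * (y - 1.0) * np.exp(-(y - 1.0) ** 2)

def E(tau):
    # E(rho_tau) = \int Psi(Phi_tau(y)) rho(y) dy, Phi_tau(y) = y + tau xi(y)
    return np.sum(0.5 * (y + tau * xi) ** 2 * rho) * dy

def S(tau):
    # S(rho_tau) = \int rho log(rho / det grad Phi_tau) dy, det = 1 + tau xi'
    return np.sum(rho * (np.log(rho) - np.log1p(tau * xi_prime))) * dy

eps = 1e-5
dE = (E(eps) - E(-eps)) / (2 * eps)     # central difference quotient in tau
dS = (S(eps) - S(-eps)) / (2 * eps)

dE_exact = np.sum(psi_prime * xi * rho) * dy        # (37): \int Psi' xi rho dy
dS_exact = -np.sum(rho * xi_prime) * dy             # (38): -\int rho div xi dy

assert abs(dE - dE_exact) < 1e-6
assert abs(dS - dS_exact) < 1e-6
```

Agreement to quadrature and finite-difference accuracy confirms the two derivative formulas at τ = 0.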
Now, let p be optimal in the definition of d(ρ^(k−1), ρ^(k))² (see section 3). The formula

∫_{ℝⁿ×ℝⁿ} ζ(x, y) p_τ(dx dy) = ∫_{ℝⁿ×ℝⁿ} ζ(x, Φ_τ(y)) p(dx dy)

defines a measure p_τ ∈ P(ρ^(k−1), ρ_τ), and therefore

(1/2) d(ρ^(k−1), ρ_τ)² − (1/2) d(ρ^(k−1), ρ^(k))² ≤ ∫_{ℝⁿ×ℝⁿ} ( (1/2) |Φ_τ(y) − x|² − (1/2) |y − x|² ) p(dx dy).

Since the difference quotient (1/τ) ( (1/2)|Φ_τ(y) − x|² − (1/2)|y − x|² ) converges appropriately as τ → 0, we obtain

(39)    lim sup_{τ↓0} (1/τ) ( (1/2) d(ρ^(k−1), ρ_τ)² − (1/2) d(ρ^(k−1), ρ^(k))² ) ≤ ∫_{ℝⁿ×ℝⁿ} (y − x)·ξ(y) p(dx dy).
We now infer from (36), (37), (38), and (39) (and the symmetry in ξ ↔ −ξ) that

(40)    ∫_{ℝⁿ×ℝⁿ} (y − x)·ξ(y) p(dx dy) + h ∫_{ℝⁿ} ( ∇Ψ·ξ − div ξ ) ρ^(k) dy = 0

for all ξ ∈ C₀^∞(ℝⁿ, ℝⁿ). On the other hand, since p ∈ P(ρ^(k−1), ρ^(k)), we have for any ζ ∈ C₀^∞(ℝⁿ)

∫_{ℝⁿ} ( ρ^(k) − ρ^(k−1) ) ζ dy = ∫_{ℝⁿ×ℝⁿ} ( ζ(y) − ζ(x) ) p(dx dy),

and by Taylor expansion,

| ∫_{ℝⁿ} ( ρ^(k) − ρ^(k−1) ) ζ dy − ∫_{ℝⁿ×ℝⁿ} (y − x)·∇ζ(y) p(dx dy) |
    = | ∫_{ℝⁿ×ℝⁿ} ( ζ(y) − ζ(x) + (x − y)·∇ζ(y) ) p(dx dy) |
    ≤ (1/2) sup |∇²ζ| ∫_{ℝⁿ×ℝⁿ} |y − x|² p(dx dy)
    = (1/2) sup |∇²ζ| d(ρ^(k−1), ρ^(k))².

Choosing ξ = ∇ζ in (40) therefore yields

(41)    | ∫_{ℝⁿ} ( (1/h) ( ρ^(k) − ρ^(k−1) ) ζ + ( ∇Ψ·∇ζ − Δζ ) ρ^(k) ) dy | ≤ (1/2) sup |∇²ζ| (1/h) d(ρ^(k−1), ρ^(k))²    for all ζ ∈ C₀^∞(ℝⁿ).
We next establish the following a priori estimates: for every T < ∞ there exists a constant C < ∞ such that for all N ∈ ℕ and all h ∈ [0, 1] with N h ≤ T,

(42)    M(ρ_h^(N)) ≤ C,

(43)    ∫_{ℝⁿ} max{ρ_h^(N) log ρ_h^(N), 0} dx ≤ C,

(44)    E(ρ_h^(N)) ≤ C,

(45)    Σ_{k=1}^{N} d(ρ_h^(k−1), ρ_h^(k))² ≤ C h.

Since ρ_h^(k−1) is admissible in the variational principle (13) defining ρ_h^(k), we have

(1/2) d(ρ_h^(k−1), ρ_h^(k))² + h F(ρ_h^(k)) ≤ h F(ρ_h^(k−1)),

and summing over k ∈ {1, …, N} yields

(46)    (1/(2h)) Σ_{k=1}^{N} d(ρ_h^(k−1), ρ_h^(k))² + F(ρ_h^(N)) ≤ F(ρ₀).
As in Proposition 4.1, we must confront the technical difficulty that F is not bounded below. The inequality (42) is established via the following calculations:

M(ρ_h^(N)) ≤ 2 d(ρ₀, ρ_h^(N))² + 2 M(ρ₀)                             [by (17)]
    ≤ 2 N Σ_{k=1}^{N} d(ρ_h^(k−1), ρ_h^(k))² + 2 M(ρ₀)
    ≤ 4 h N ( F(ρ₀) − F(ρ_h^(N)) ) + 2 M(ρ₀)                         [by (46)]
    ≤ 4 T ( F(ρ₀) + C (M(ρ_h^(N)) + 1)^α ) + 2 M(ρ₀),                [by (14)]

which clearly gives (42), since α < 1. To obtain the second line of the above display, we have made use of the triangle inequality for the Wasserstein metric (see equation (9)) and the Cauchy–Schwarz inequality. The estimates (43), (44), and (45) now follow readily from the bounds (14) and (15), the estimate (42), and the inequality (46), as follows:
∫_{ℝⁿ} max{ρ_h^(N) log ρ_h^(N), 0} dx ≤ S(ρ_h^(N)) + ∫_{ℝⁿ} |min{ρ_h^(N) log ρ_h^(N), 0}| dx
    ≤ S(ρ_h^(N)) + C (M(ρ_h^(N)) + 1)^α                              [by (15)]
    ≤ F(ρ_h^(N)) + C (M(ρ_h^(N)) + 1)^α
    ≤ F(ρ₀) + C (M(ρ_h^(N)) + 1)^α;                                  [by (46)]

E(ρ_h^(N)) = F(ρ_h^(N)) − S(ρ_h^(N))
    ≤ F(ρ_h^(N)) + C (M(ρ_h^(N)) + 1)^α                              [by (14)]
    ≤ F(ρ₀) + C (M(ρ_h^(N)) + 1)^α;                                  [by (46)]

Σ_{k=1}^{N} d(ρ_h^(k−1), ρ_h^(k))² ≤ 2 h ( F(ρ₀) − F(ρ_h^(N)) )      [by (46)]
    ≤ 2 h ( F(ρ₀) + C (M(ρ_h^(N)) + 1)^α ).                          [by (14)]
Now, owing to the estimates (42) and (43), we may conclude that there exists a measurable ρ(t, x) such that, after extraction of a subsequence,

(47)    ρ_h ⇀ ρ    weakly in L¹((0, T) × ℝⁿ)    for all T < ∞.

A straightforward analysis reveals that (42), (43), and (44) guarantee that

(48)    ρ(t) ∈ K    for almost every t ∈ (0, ∞),

together with the bounds asserted in (33).
Let us now improve upon the convergence in (47) by showing that (30) holds. For a given finite time horizon T < ∞, there exists a constant C < ∞ such that for all N, N′ ∈ ℕ and all h ∈ [0, 1] with N h ≤ T and N′ h ≤ T, we have

d(ρ_h^(N), ρ_h^(N′))² ≤ C |N h − N′ h|.

This result is obtained from (45) by use of the triangle inequality (9) for d and the Cauchy–Schwarz inequality. Furthermore, for all ρ, ρ̄ ∈ K, p ∈ P(ρ, ρ̄), and ζ ∈ C₀^∞(ℝⁿ), there holds

| ∫_{ℝⁿ} ζ ρ dx − ∫_{ℝⁿ} ζ ρ̄ dx | = | ∫_{ℝⁿ×ℝⁿ} ( ζ(x) − ζ(y) ) p(dx dy) |
    ≤ sup |∇ζ| ∫_{ℝⁿ×ℝⁿ} |x − y| p(dx dy)
    ≤ sup |∇ζ| ( ∫_{ℝⁿ×ℝⁿ} |x − y|² p(dx dy) )^{1/2},

and hence, combining the last two estimates,

(49)    | ∫_{ℝⁿ} ζ ρ_h(t′) dx − ∫_{ℝⁿ} ζ ρ_h(t) dx | ≤ C sup |∇ζ| ( |t′ − t| + h )^{1/2}.
Let t (0, T ) and C0 (Rn ) be given, and notice that for any > 0, we have
(t) dx
n h (t) dx
R
Rn
t+
h (t) dx 21
h ( ) dx d
Rn
n
R
t
t+
t+
1
1
h ( ) dx d 2
( ) dx d
+ 2
Rn
Rn
t
t
t+
( ) dx d
(t) dx .
+ 21
n
n
R
R
t
14
According to (49), the first term on the right-hand side of this equation is bounded
by
1
C sup || ( + h) 2 ,
Rn
and owing to (47), the second term converges to zero as h 0 for any fixed > 0. At
this point, let us remark that from the result (47) we may deduce that is smooth on
(0, )Rn . This is the conclusion of assertion (a) below, which will be proved later.
From this smoothness property, we ascertain that the final term on the right-hand
side of the above display converges to zero as 0. Therefore, we have established
that
(50)
h (t) dx
(t) dx for all C0 (Rn ) .
Rn
Rn
However, the estimate (42) guarantees that M (h (t)) is bounded for h 0. Consequently, (50) holds for any L (Rn ), and therefore, the convergence result (30)
does indeed hold.
It now follows immediately from (41), (45), and (47) that ρ satisfies

(51)    ∫_{(0,∞)×ℝⁿ} ρ ( ∂ζ/∂t + Δζ − ∇Ψ·∇ζ ) dx dt = −∫_{ℝⁿ} ρ₀ ζ(0) dx

for all ζ ∈ C₀^∞(ℝ × ℝⁿ). Moreover, for all 0 < t₀ < t₁ < ∞,

(52)    ∫_{ℝⁿ} ρ(t₁) ζ(t₁) dx − ∫_{ℝⁿ} ρ(t₀) ζ(t₀) dx = ∫_{(t₀,t₁)×ℝⁿ} ρ ( ∂ζ/∂t + Δζ − ∇Ψ·∇ζ ) dx dt,

which we rewrite as

(53)    ∫_{ℝⁿ} ρ(t₁) ζ(t₁) dx = ∫_{(t₀,t₁)×ℝⁿ} ρ ( ∂ζ/∂t + Δζ ) dx dt − ∫_{(t₀,t₁)×ℝⁿ} ρ ∇Ψ·∇ζ dx dt + ∫_{ℝⁿ} ρ(t₀) ζ(t₀) dx.
Notice that for fixed (t₁, x₁) ∈ (0, ∞) × ℝⁿ and for each ε > 0, the function

ζ(t, x) = G(t₁ + ε − t, x − x₁),

where G denotes the heat kernel

(54)    G(t, x) = t^{−n/2} g(t^{−1/2} x),    g(x) = (4π)^{−n/2} exp(−|x|²/4),

satisfies ∂ζ/∂t + Δζ = 0. Inserting this ζ into (53) and taking the limit ε → 0, we obtain the equation

(55)    ρ(t₁) = ρ(t₀) ⋆ G(t₁ − t₀) + ∫_{t₀}^{t₁} [ρ(t) ∇Ψ] ⋆ ∇G(t₁ − t) dt,

where ⋆ denotes convolution in the x-variables. From (55), we extract the following estimate in the Lp-norm:
‖ρ(t₁) − ρ(t₀) ⋆ G(t₁ − t₀)‖_{Lp} ≤ ∫_{t₀}^{t₁} ‖ρ(t) ∇Ψ‖_{L¹} ‖∇G(t₁ − t)‖_{Lp} dt,

by Young's inequality for convolutions. Since

‖G(t)‖_{Lp} = t^{−(n/2)(1 − 1/p)} ‖g‖_{Lp},    ‖∇G(t)‖_{Lp} = t^{−(n/2)(1 − 1/p) − 1/2} ‖∇g‖_{Lp},

this leads to

‖ρ(t₁) − ρ(t₀) ⋆ G(t₁ − t₀)‖_{Lp} ≤ ess sup_{t∈(t₀,t₁)} ‖ρ(t) ∇Ψ‖_{L¹} ∫_0^{t₁−t₀} t^{−(n/2)(1 − 1/p) − 1/2} dt ‖∇g‖_{Lp},

which is finite whenever (n/2)(1 − 1/p) < 1/2; note that ‖ρ(t) ∇Ψ‖_{L¹} is bounded by (12), (42), and (44). We now appeal to the Lp-estimates [18, section 3, (3.1), and (3.2)] for the potentials in (55) to conclude by the usual bootstrap arguments that any derivative of ρ is in Lp_loc((0, ∞) × ℝⁿ), from which we obtain the stated regularity condition (a).
We now prove assertion (b). Using (55) with t₀ = 0, and proceeding as above, we obtain

‖ρ(t₁) − ρ₀ ⋆ G(t₁)‖_{L¹} ≤ ess sup_{t∈(0,t₁)} ‖ρ(t) ∇Ψ‖_{L¹} ∫_0^{t₁} ‖∇G(t₁ − t)‖_{L¹} dt
    = ess sup_{t∈(0,t₁)} ‖ρ(t) ∇Ψ‖_{L¹} ∫_0^{t₁} t^{−1/2} dt ‖∇g‖_{L¹}
    ≤ C t₁^{1/2},

and therefore,

ρ(t) − ρ₀ ⋆ G(t) → 0    in L¹(ℝⁿ)    for t → 0.

Since

ρ₀ ⋆ G(t) → ρ₀    in L¹(ℝⁿ)    for t → 0,

this leads to

ρ(t) → ρ₀    in L¹(ℝⁿ)    for t → 0.

From this result, together with the boundedness of {M(ρ(t))}_{t≥0}, we infer that (32) is satisfied.
Finally, we prove the uniqueness result (c) using a well-known method from the theory of elliptic–parabolic equations (see, for instance, [20]). Let ρ₁, ρ₂ be solutions of (31) which are smooth on (0, ∞) × ℝⁿ and satisfy (32), (33). Their difference σ := ρ₁ − ρ₂ satisfies the equation

∂σ/∂t − div[ σ ∇Ψ + ∇σ ] = 0.

We multiply this equation for σ by φ_δ′(σ), where the family {φ_δ}_{δ>0} is a convex and smooth approximation of the modulus function. For example, we could take

φ_δ(z) = (z² + δ²)^{1/2}.
This procedure yields the inequality

∂/∂t [φ_δ(σ)] − div[ φ_δ(σ) ∇Ψ + ∇[φ_δ(σ)] ]
    = −φ_δ″(σ) |∇σ|² + ( σ φ_δ′(σ) − φ_δ(σ) ) ΔΨ
    ≤ ( σ φ_δ′(σ) − φ_δ(σ) ) ΔΨ.

Multiplying by a nonnegative ζ ∈ C₀^∞(ℝⁿ) and integrating over ℝⁿ, we obtain

d/dt ∫_{ℝⁿ} φ_δ(σ(t)) ζ dx + ∫_{ℝⁿ} φ_δ(σ(t)) ( ∇Ψ·∇ζ − Δζ ) dx ≤ ∫_{ℝⁿ} ( σ φ_δ′(σ) − φ_δ(σ) ) ΔΨ ζ dx.
Integrating over (0, t) for given t ∈ (0, ∞), we obtain with the help of (32)

∫_{ℝⁿ} φ_δ(σ(t)) ζ dx + ∫_{(0,t)×ℝⁿ} φ_δ(σ) ( ∇Ψ·∇ζ − Δζ ) dx dτ
    ≤ ∫_{(0,t)×ℝⁿ} ( σ φ_δ′(σ) − φ_δ(σ) ) ΔΨ ζ dx dτ + δ ∫_{ℝⁿ} ζ dx,

since σ(0) = 0 and φ_δ(0) = δ. Letting δ tend to zero yields (note that |σ φ_δ′(σ) − φ_δ(σ)| ≤ δ)

(56)    ∫_{ℝⁿ} |σ(t)| ζ dx + ∫_{(0,t)×ℝⁿ} |σ| ( ∇Ψ·∇ζ − Δζ ) dx dτ ≤ 0.
According to (12) and (33), σ∇Ψ and ∇σ are integrable on the entire ℝⁿ. Hence, if we replace ζ in (56) by a function ζ_R satisfying

ζ_R(x) = ζ₁(x/R),    where ζ₁(x) = 1 for |x| ≤ 1, ζ₁(x) = 0 for |x| ≥ 2,

and let R tend to infinity, we obtain ∫_{ℝⁿ} |σ(t)| dx ≤ 0, and hence σ(t) = 0. This produces the desired uniqueness result.
[18] O. A. LADYŽENSKAJA, V. A. SOLONNIKOV, AND N. N. URAL'CEVA, Linear and Quasi-linear Equations of Parabolic Type, American Mathematical Society, Providence, RI, 1968.
[19] F. OTTO, Dynamics of labyrinthine pattern formation in magnetic fluids: A mean-field theory, Arch. Rational Mech. Anal., to appear.
[20] F. OTTO, L¹-contraction and uniqueness for quasilinear elliptic–parabolic equations, J. Differential Equations, 131 (1996), pp. 20–38.
[21] S. T. RACHEV, Probability Metrics and the Stability of Stochastic Models, John Wiley, New York, 1991.
[22] H. RISKEN, The Fokker–Planck Equation: Methods of Solution and Applications, 2nd ed., Springer-Verlag, Berlin, Heidelberg, 1989.
[23] Z. SCHUSS, Singular perturbation methods in stochastic differential equations of mathematical physics, SIAM Rev., 22 (1980), pp. 119–155.
[24] J. C. STRIKWERDA, Finite Difference Schemes and Partial Differential Equations, Wadsworth & Brooks/Cole, New York, 1989.