Editors-in-Chief: A. Chenciner, J. Coates, S.R.S. Varadhan
Cédric Villani
Optimal Transport
Old and New
Cédric Villani
Unité de Mathématiques Pures et Appliquées (UMPA)
École Normale Supérieure de Lyon
46, allée d'Italie
69364 Lyon Cedex 07
France
cvillani@umpa.ens-lyon.fr
ISSN 0072-7830
ISBN 978-3-540-71049-3    e-ISBN 978-3-540-71050-9
Do mo chuisle mo chroí, Aëlle
Preface
When I was first approached for the 2005 edition of the Saint-Flour
Probability Summer School, I was intrigued, flattered and scared.1
Apart from the challenge posed by the teaching of a rather analytical
subject to a probabilistic audience, there was the danger of producing
a remake of my recent book Topics in Optimal Transportation.
However, I gradually realized that I was being offered a unique op-
portunity to rewrite the whole theory from a different perspective, with
alternative proofs and a different focus, and a more probabilistic pre-
sentation; plus the incorporation of recent progress. Among the most
striking of these recent advances, there was the rising awareness that
John Mather’s minimal measures had a lot to do with optimal trans-
port, and that both theories could actually be embedded within a single
framework. There was also the discovery that optimal transport could
provide a robust synthetic approach to Ricci curvature bounds. These
links with dynamical systems on one hand, differential geometry on
the other hand, were only briefly alluded to in my first book; here on
the contrary they will be at the basis of the presentation. To summa-
rize: more probability, more geometry, and more dynamical systems.
Of course there cannot be more of everything, so in some sense there
is less analysis and less physics, and also there are fewer digressions.
So the present course is by no means a reduction or an expansion of
my previous book, but should be regarded as a complementary reading.
Both sources can be read independently, or together, and hopefully the
complementarity of points of view will have pedagogical value.
¹ Fans of Tom Waits may have identified this quotation.
Throughout the book I have tried to optimize the results and the
presentation, to provide complete and self-contained proofs of the most
important results, and comprehensive bibliographical notes — a daunt-
ingly difficult task in view of the rapid expansion of the literature. Many
statements and theorems have been written specifically for this course,
and many results appear in rather sharp form for the first time. I also
added several appendices, either to present some domains of mathe-
matics to non-experts, or to provide proofs of important auxiliary re-
sults. All this has resulted in a rapid growth of the document, which in
the end is about six times (!) the size that I had planned initially. So
the non-expert reader is advised to skip long proofs at first
reading, and concentrate on explanations, statements, examples and
sketches of proofs when they are available.
About terminology: For some reason I decided to switch from “trans-
portation” to “transport”, but this really is a matter of taste.
For people who are already familiar with the theory of optimal trans-
port, here are some more serious changes.
Part I is devoted to a qualitative description of optimal transport.
The dynamical point of view is given a prominent role from the be-
ginning, with Robert McCann’s concept of displacement interpolation.
This notion is discussed before any theorem about the solvability of the
Monge problem, in an abstract setting of “Lagrangian action” which
generalizes the notion of length space. This provides a unified picture
of recent developments dealing with various classes of cost functions,
in a smooth or nonsmooth context.
I also wrote down in detail some important estimates by John
Mather, well-known in certain circles, and made extensive use of them,
in particular to prove the Lipschitz regularity of “intermediate” trans-
port maps (starting from some intermediate time, rather than from
initial time). Then the absolute continuity of displacement interpolants
comes for free, and this gives a more unified picture of the Mather and
Monge–Kantorovich theories. I rewrote in this way the classical theo-
rems of solvability of the Monge problem for quadratic cost in Euclidean
space. Finally, this approach allows one to treat change of variables
formulas associated with optimal transport by means of changes of
variables that are Lipschitz, and not just with bounded variation.
Part II discusses optimal transport in Riemannian geometry, a line
of research which started around 2000; I have rewritten all these ap-
plications in terms of Ricci curvature, or more precisely curvature-
While writing this text I asked for help from a number of friends and
collaborators. Among them, Luigi Ambrosio and John Lott are the ones
whom I requested most to contribute; this book owes a lot to their detailed
comments and suggestions. Most of Part III, but also significant portions
of Parts I and II, are made up with ideas taken from my collaborations with
John, which started in 2004 as I was enjoying the hospitality of the Miller
Institute in Berkeley. Frequent discussions with Patrick Bernard and
Albert Fathi allowed me to get the links between optimal transport and
John Mather’s theory, which were a key to the presentation in Part I; John
himself gave precious hints about the history of the subject. Neil Trudinger
and Xu-Jia Wang spent vast amounts of time teaching me the regularity
theory of Monge–Ampère equations. Alessio Figalli took up the dreadful
challenge to check the entire set of notes from first to last page. Apart from
these people, I got valuable help from Stefano Bianchini, François Bolley,
Yann Brenier, Xavier Cabré, Vincent Calvez, José Antonio Carrillo,
Dario Cordero-Erausquin, Denis Feyel, Sylvain Gallot, Wilfrid Gangbo,
Diogo Aguiar Gomes, Nathaël Gozlan, Arnaud Guillin, Nicolas Juillet,
Kazuhiro Kuwae, Michel Ledoux, Grégoire Loeper, Francesco Maggi,
Robert McCann, Shin-ichi Ohta, Vladimir Oliker, Yann Ollivier, Felix
Otto, Ludger Rüschendorf, Giuseppe Savaré, Walter Schachermayer,
Benedikt Schulte, Theo Sturm, Josef Teichmann, Anton Thalmaier,
Hermann Thorisson, Süleyman Üstünel, Anatoly Vershik, and others.
Short versions of this course were tried on mixed audiences in the
Universities of Bonn, Dortmund, Grenoble and Orléans, as well as the
Borel seminar in Leysin and the IHES in Bures-sur-Yvette. Part of
the writing was done during stays at the marvelous MFO Institute
in Oberwolfach, the CIRM in Luminy, and the Australian National
University in Canberra. All these institutions are warmly thanked.
It is a pleasure to thank Jean Picard for all his organization work
on the 2005 Saint-Flour summer school; and the participants for their
questions, comments and bug-tracking, in particular Sylvain Arlot
(great bug-tracker!), Fabrice Baudoin, Jérôme Demange, Steve Evans
(whom I also thank for his beautiful lectures), Christophe Leuridan,
Jan Oblój, Erwan Saint Loubert Bié, and others. I extend these thanks
to the joyful group of young PhD students and maîtres de conférences
with whom I spent such a good time on excursions, restaurants, quan-
tum ping-pong and other activities, making my stay in Saint-Flour
truly wonderful (with special thanks to my personal driver, Stéphane
Loisel, and my table tennis sparring-partner and adversary, François
Simenhaus). I will cherish my visit there in memory as long as I live!
Contents

Preface
Conventions
Introduction
4 Basic properties
12 Smoothness
References
Index
Conventions

Axioms
I use the classical axioms of set theory; not the full version of the axiom
of choice (only the classical axiom of “countable dependent choice”).
Function spaces
C(X ) is the space of continuous functions X → R, Cb (X ) the space
of bounded continuous functions X → R; and C0 (X ) the space of
continuous functions X → R converging to 0 at infinity; all of them
are equipped with the norm of uniform convergence ‖ϕ‖∞ = sup |ϕ|.
Then Cbk (X ) is the space of k-times continuously differentiable func-
tions u : X → R, such that all the partial derivatives of u up to order k
are bounded; it is equipped with the norm given by the supremum of
all norms ‖∂u‖Cb, where ∂u is a partial derivative of order at most k;
Cck (X ) is the space of k-times continuously differentiable functions with
compact support; etc. When the target space is not R but some other
space Y, the notation is transformed in an obvious way: C(X ; Y), etc.
Lp is the Lebesgue space of exponent p; the space and the measure
will often be implicit, but clear from the context.
Calculus
The derivative of a function u = u(t), defined on an interval of R and
valued in Rn or in a smooth manifold, will be denoted by u′, or more
often by u̇. The notation d⁺u/dt stands for the upper right-derivative
of a real-valued function u: d⁺u/dt = lim sup_{s↓0} [u(t + s) − u(t)]/s.
If u is a function of several variables, the partial derivative with
respect to the variable t will be denoted by ∂t u, or ∂u/∂t. The notation
ut does not stand for ∂t u, but for u(t).
The gradient operator will be denoted by grad or simply ∇; the di-
vergence operator by div or ∇· ; the Laplace operator by ∆; the Hessian
operator by Hess or ∇2 (so ∇2 does not stand for the Laplace opera-
tor). The notation is the same in Rn or in a Riemannian manifold. ∆ is
the divergence of the gradient, so it is typically a nonpositive operator.
The value of the gradient of f at point x will be denoted either by
∇x f or ∇f(x). The notation ∇̃ stands for the approximate gradient,
introduced in Definition 10.2.
If T is a map Rn → Rn , ∇T stands for the Jacobian matrix of T ,
that is the matrix of all partial derivatives (∂Ti /∂xj ) (1 ≤ i, j ≤ n).
All these differential operators will be applied to (smooth) functions
but also to measures, by duality. For instance, the Laplacian of a
measure µ is defined via the identity ∫ ζ d(∆µ) = ∫ (∆ζ) dµ (ζ ∈ Cc²). The
notation is consistent in the sense that ∆(f vol) = (∆f ) vol. Similarly,
I shall take the divergence of a vector-valued measure, etc.
f = o(g) means f /g −→ 0 (in an asymptotic regime that should be
clear from the context), while f = O(g) means that f /g is bounded.
Probability measures
δx is the Dirac mass at point x.
All measures considered in the text are Borel measures on Polish
spaces, which are complete, separable metric spaces, equipped with
their Borel σ-algebra. I shall usually not use the completed σ-algebra,
except on some rare occasions (emphasized in the text) in Chapter 5.
A measure is said to be finite if it has finite mass, and locally finite
if it attributes finite mass to compact sets.
The space of Borel probability measures on X is denoted by P (X ),
the space of finite Borel measures by M+ (X ), the space of signed finite
Borel measures by M (X ). The total variation of µ is denoted by ‖µ‖TV .
The integral of a function f with respect to a probability measure µ
will be denoted interchangeably by ∫ f(x) dµ(x) or ∫ f(x) µ(dx) or ∫ f dµ.
If µ is a Borel measure on a topological space X , a set N is said to
be µ-negligible if N is included in a Borel set of zero µ-measure. Then
µ is said to be concentrated on a set C if X \ C is negligible. (If C
itself is Borel measurable, this is of course equivalent to µ[X \ C] = 0.)
By abuse of language, I may say that X has full µ-measure if µ is
concentrated on X .
If µ is a Borel measure, its support Spt µ is the smallest closed set
on which it is concentrated. The same notation Spt will be used for the
support of a continuous function.
If µ is a Borel measure on X , and T is a Borel map X → Y, then
T# µ stands for the image measure2 (or push-forward) of µ by T : It is
a Borel measure on Y, defined by (T# µ)[A] = µ[T −1 (A)].
The law of a random variable X defined on a probability space
(Ω, P ) is denoted by law (X); this is the same as X# P .
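On a discrete measure the defining identity (T# µ)[A] = µ[T⁻¹(A)] can be checked by hand: each atom simply carries its mass along. A minimal numerical sketch (the points, weights and the map T below are purely illustrative):

```python
import numpy as np

# A discrete measure mu on four points of the real line.
points = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.1, 0.2, 0.3, 0.4])

# An illustrative Borel map T; here T(x) = x^2.
T = lambda x: x ** 2

# The push-forward T#mu charges T(x) with the mass mu({x}):
# (T#mu)[A] = mu[T^{-1}(A)], so masses of merged points would add up.
image = {}
for x, w in zip(points, weights):
    y = float(T(x))
    image[y] = image.get(y, 0.0) + float(w)

print(image)  # {0.0: 0.1, 1.0: 0.2, 4.0: 0.3, 9.0: 0.4}
```

The same loop works verbatim when T merges points, which is exactly where the measure-theoretic definition (rather than a change of variables with Jacobians) is needed.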
The weak topology on P (X ) (or topology of weak convergence, or
narrow topology) is induced by convergence against Cb (X ), i.e. bounded
continuous test functions.
² Depending on the authors, the measure T# µ is often denoted by T#µ, T∗µ, T(µ),
Tµ, ∫ δT(a) µ(da), µ ◦ T⁻¹, µT⁻¹, or µ[T ∈ · ].
If µ and ν are the only laws in the problem, then without loss of
generality one may choose Ω = X × Y. In a more measure-theoretical
formulation, coupling µ and ν means constructing a measure π on X ×Y
such that π admits µ and ν as marginals on X and Y respectively.
The following three statements are equivalent ways to rephrase that
marginal condition:
• π = (Id, T)# µ.
Here below are some of the most famous couplings used in mathematics
— of course the list is far from complete, since everybody has his or
her own preferred coupling technique. Each of these couplings comes
with its own natural setting; this variety of assumptions reflects the
variety of constructions. (This is a good reason to state each of them
with some generality.)
and set T = G⁻¹ ◦ F, where F and G are the cumulative distribution
functions of µ and ν respectively.
If µ does not have atoms, then T# µ = ν. This rearrangement is quite
simple, explicit, as smooth as can be, and enjoys good geometric
properties.
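For empirical distributions the increasing rearrangement T = G⁻¹ ◦ F can be sketched with sample quantiles; the two distributions below are arbitrary illustrations, and the clamping of the quantile index is just a finite-sample convenience:

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples from mu (atomless) and from the target nu, both on the real line.
x = rng.normal(0.0, 1.0, size=10_000)    # mu = N(0, 1)
y = rng.exponential(2.0, size=10_000)    # nu = exponential with mean 2

xs, ys = np.sort(x), np.sort(y)

def T(t):
    """Empirical version of T = G^{-1} o F, with F the cdf of mu, G the cdf of nu."""
    q = np.searchsorted(xs, t, side="right") / len(xs)   # F(t) in (0, 1]
    return ys[min(int(q * len(ys)), len(ys) - 1)]        # G^{-1}(q)

# Push the mu-samples forward; their empirical law should resemble nu.
pushed = np.array([T(t) for t in x])
print(abs(np.mean(pushed) - np.mean(y)) < 0.05)  # True
```

By construction T is nondecreasing, which is the defining property of the monotone rearrangement.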
4. The Knothe–Rosenblatt rearrangement in Rn . Let µ and ν be
two probability measures on Rn , such that µ is absolutely continu-
ous with respect to Lebesgue measure. Then define a coupling of µ
and ν as follows.
Step 1: Take the marginal on the first variable: this gives probabil-
ity measures µ1 (dx1 ), ν1 (dy1 ) on R, with µ1 being atomless. Then
define y1 = T1 (x1 ) by the formula for the increasing rearrangement
of µ1 into ν1 .
Step 2: Now take the marginal on the first two variables and dis-
integrate it with respect to the first variable. This gives proba-
bility measures µ2 (dx1 dx2 ) = µ1 (dx1 ) µ2 (dx2 |x1 ), ν2 (dy1 dy2 ) =
ν1(dy1) ν2(dy2|y1). Then, for each x1 ∈ R, set y1 = T1(x1), and
define y2 = T2(x2; x1) by the formula for the increasing re-
arrangement of µ2(dx2|x1) into ν2(dy2|y1). (See Figure 1.1.)
Then repeat the construction, adding variables one after the other
and defining y3 = T3 (x3 ; x1 , x2 ); etc. After n steps, this produces
a map y = T (x) which transports µ to ν, and in practical situa-
tions might be computed explicitly with little effort. Moreover, the
Jacobian matrix of the change of variables T is (by construction)
upper triangular with positive entries on the diagonal; this makes
it suitable for various geometric applications. On the negative side,
this mapping does not satisfy many interesting intrinsic properties;
it is not invariant under isometries of Rn , not even under relabeling
of coordinates.
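In the special case where µ and ν are both product measures, every conditional distribution appearing in Steps 2, 3, … coincides with the corresponding marginal, so the Knothe–Rosenblatt map degenerates into a coordinatewise increasing rearrangement (its Jacobian is then diagonal with positive entries, a special case of upper triangular). An empirical sketch of that degenerate case, with illustrative distributions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# mu = N(0,1) x N(0,1) and nu = Exp(1) x Uniform[0,1]: both are product
# measures, so mu_k(dx_k | x_1 ... x_{k-1}) is just the k-th marginal.
X = rng.normal(size=(n, 2))
Y = np.column_stack([rng.exponential(size=n), rng.uniform(size=n)])

def rearrange(xcol, ycol):
    """Empirical increasing rearrangement of one coordinate."""
    ranks = np.argsort(np.argsort(xcol))   # rank of each sample of xcol
    return np.sort(ycol)[ranks]            # send q-quantile to q-quantile

# Coordinatewise map T = (T1(x1), T2(x2)).
T = np.column_stack([rearrange(X[:, 0], Y[:, 0]),
                     rearrange(X[:, 1], Y[:, 1])])

print(np.round(T.mean(axis=0), 2))  # close to the means of nu, about (1, 0.5)
```

In the general (non-product) case one would instead rearrange each conditional slice µ2(dx2|x1) onto ν2(dy2|y1), as described in Step 2 above.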
5. The Holley coupling on a lattice. Let µ and ν be two discrete
probabilities on a finite lattice Λ, say {0, 1}N , equipped with the
natural partial ordering (x ≤ y if xn ≤ yn for all n). Assume that
Fig. 1.1. Second step in the construction of the Knothe–Rosenblatt map: After the
correspondence x1 → y1 has been determined, the conditional probability of x2 (seen
as a one-dimensional probability on a small “slice” of width dx1 ) can be transported
to the conditional probability of y2 (seen as a one-dimensional probability on a slice
of width dy1 ).
are many variants with important differences which are all intended
to make two trajectories close to each other after some time: the
Ornstein coupling, the ε-coupling (in which one requires the
two variables to be close, rather than to occupy the same state),
the shift-coupling (in which one allows an additional time-shift),
etc.
8. The optimal coupling or optimal transport. Here one intro-
duces a cost function c(x, y) on X × Y, that can be interpreted
as the work needed to move one unit of mass from location x to
location y. Then one considers the Monge–Kantorovich mini-
mization problem
inf E c(X, Y),
where the pair (X, Y) runs over all possible couplings of (µ, ν); or
equivalently, in terms of measures,
inf ∫_{X×Y} c(x, y) dπ(x, y),
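When µ and ν are finitely supported, this is a finite linear program: the unknown π is a nonnegative matrix with prescribed row sums µ and column sums ν. A sketch using scipy's generic LP solver (the measures and costs below are made up for illustration; dedicated solvers, e.g. in the POT package, are preferable at scale):

```python
import numpy as np
from scipy.optimize import linprog

mu = np.array([0.5, 0.3, 0.2])    # source measure on 3 points
nu = np.array([0.6, 0.4])         # target measure on 2 points
C = np.array([[0.0, 1.0],         # C[i, j] = c(x_i, y_j), illustrative costs
              [1.0, 0.0],
              [4.0, 1.0]])
m, n = C.shape

# Marginal constraints: row sums equal mu, column sums equal nu.
A_eq, b_eq = [], []
for i in range(m):
    row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1.0
    A_eq.append(row); b_eq.append(mu[i])
for j in range(n):
    col = np.zeros(m * n); col[j::n] = 1.0
    A_eq.append(col); b_eq.append(nu[j])

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None))
pi = res.x.reshape(m, n)
print(res.fun)  # optimal total cost, 0.3 for this data
```

The optimal π here splits the mass of the middle point between the two targets, which is exactly the freedom Kantorovich's relaxation adds over Monge's deterministic maps.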
Gluing
π123 (dx1 dx2 dx3 ) = π12 (dx1 |x2 ) µ2 (dx2 ) π23 (dx3 |x2 ).
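In the discrete setting the gluing formula is just a tensor construction: disintegrate π12 and π23 along their common marginal µ2 and multiply. A small sketch (the two coupling matrices are illustrative):

```python
import numpy as np

# pi12 couples mu1 = (0.3, 0.7) with mu2 = (0.3, 0.7);
# pi23 couples mu2 = (0.3, 0.7) with mu3 = (0.5, 0.5).
pi12 = np.array([[0.2, 0.1],
                 [0.1, 0.6]])
pi23 = np.array([[0.3, 0.0],
                 [0.2, 0.5]])
mu2 = pi12.sum(axis=0)   # common marginal (0.3, 0.7)

# pi123[i, j, k] = pi12(dx_i | x_j) mu2(x_j) pi23(dx_k | x_j)
#               = pi12[i, j] * pi23[j, k] / mu2[j]
pi123 = pi12[:, :, None] * pi23[None, :, :] / mu2[None, :, None]

# Its (1,2)-marginal is pi12 and its (2,3)-marginal is pi23.
print(np.allclose(pi123.sum(axis=2), pi12),
      np.allclose(pi123.sum(axis=0), pi23))  # True True
```

The division by µ2[j] is exactly the disintegration step; it is harmless here because µ2 charges every point of the common space.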
Diffusion formula
Remark 1.6. Actually, there is a finer criterion for the diffusion equa-
tion to hold true: it is sufficient that the Ricci curvature at point x be
bounded below by −Cd(x0 , x)2 gx as x → ∞, where gx is the metric at
point x and x0 is an arbitrary reference point. The exponent 2 here is
sharp.
∂t ρ = ∆ρ.
ξ(t, x) = ∇u(x) / [(1 − t) ρ0(x) + t ρ1(x)],
with associated flow (Tt (x))0≤t≤1 , and a family (µt )0<t<1 of probability
measures by
µt = (1 − t) µ0 + t µ1 .
It is easy to check that
∂t µt = (ρ1 − ρ0) ν,
∇· (µt ξ(t, ·)) = ∇· (∇u e^{−V} vol) = (∆u − ∇V · ∇u) e^{−V} vol = (ρ0 − ρ1) ν.
So µt satisfies the formula of conservation of mass, therefore µt =
(Tt )# µ0 . In particular, T1 pushes µ0 forward to µ1 .
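The conservation-of-mass computation can be verified symbolically in a one-dimensional sketch with V = 0, where ∆u − ∇V · ∇u reduces to u″ (the Gaussian densities are illustrative, and we ignore that Lebesgue measure on R is not normalized):

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)

# Two probability densities on the line.
rho0 = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)
rho1 = sp.exp(-(x - 1)**2 / 2) / sp.sqrt(2 * sp.pi)

# u solves the one-dimensional Poisson equation u'' = rho0 - rho1,
# so u' is an antiderivative of rho0 - rho1.
u_prime = sp.integrate(rho0 - rho1, x)

m = (1 - t) * rho0 + t * rho1     # density of mu_t
xi = u_prime / m                  # the vector field xi(t, x)

# Conservation of mass: d/dt m + d/dx (m * xi) = 0.
identity = sp.diff(m, t) + sp.diff(m * xi, x)
print(sp.simplify(identity))  # 0
```

The cancellation is purely algebraic: ∂t m = ρ1 − ρ0 while (m ξ)′ = u″ = ρ0 − ρ1, which is the one-dimensional version of the displayed computation.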
In the case when M is compact and V = 0, the above construction
works if ρ0 and ρ1 are Lipschitz continuous and positive. Indeed, the
solution u of ∆u = ρ0 − ρ1 will be of class C 2,α for all α ∈ (0, 1),
and in particular ∇u will be of class C 1 (in fact C 1,α ). In more general
situations, things might depend on the regularity of V , and its behavior
at infinity.
Bibliographical notes
(In [814], for the sake of consistency of the presentation I treated op-
timal coupling on R as a particular case of optimal coupling on Rn ,
however this has the drawback of involving subtle arguments.)
The Knothe–Rosenblatt coupling was introduced in 1952 by Rosen-
blatt [709], who suggested that it might be useful to “normalize” sta-
tistical data before applying a statistical test. In 1957, Knothe [523]
rediscovered it for applications to the theory of convex bodies. It is
quite likely that other people have discovered this coupling indepen-
dently. An infinite-dimensional generalization was studied by Bogachev,
Kolesnikov and Medvedev [134, 135].
FKG inequalities were introduced in [375], and have since then
played a crucial role in statistical mechanics. Holley’s proof by coupling
appears in [477]. Recently, Caffarelli [188] has revisited the subject in
connection with optimal transport.
It was in 1965 that Moser proved his coupling theorem, for smooth
compact manifolds without boundaries [640]; noncompact manifolds
were later considered by Greene and Shiohama [432]. Moser himself also
worked with Dacorogna on the more delicate case where the domain
is an open set with boundary, and the transport is required to fix the
boundary [270].
Strassen’s duality theorem is discussed e.g. in [814, Section 1.4].
The gluing lemma is due to several authors, starting with Vorob’ev
in 1962 for finite sets. The modern formulation seems to have emerged
around 1980, independently by Berkes and Philipp [101], Kallenberg,
Thorisson, and maybe others. Refinements were discussed e.g. by de
Acosta [273, Theorem A.1] (for marginals indexed by an arbitrary set)
or Thorisson [781, Theorem 5.1]; see also the bibliographic comments
in [317, p. 20]. For a proof of the statement in these notes, it is suf-
ficient to consult Dudley [317, Theorem 1.1.10], or [814, Lemma 7.6].
A comment about terminology: I like the word “gluing” which gives a
good indication of the construction, but many authors just talk about
“composition” of plans.
The formula of change of variables for C 1 or Lipschitz change of vari-
ables can be found in many textbooks, see e.g. Evans and Gariepy [331,
Chapter 3]. The generalization to approximately differentiable maps is
explained in Ambrosio, Gigli and Savaré [30, Section 5.5]. Such a gen-
erality is interesting in the context of optimal transportation, where
changes of variables are often very rough (say BV , which means of
bounded variation). In that context however, there is more structure:
so in particular
d/dt (|αt|²/2) = −⟨∇V(Xt) − ∇V(Yt), Xt − Yt⟩ ≤ −K |Xt − Yt|² = −K |αt|².
It follows by Gronwall's lemma that
|αt|² ≤ e^{−2Kt} |α0|².
Assume for simplicity that E |X0|² and E |Y0|² are finite. Then
E |Xt − Yt|² ≤ e^{−2Kt} E |X0 − Y0|² ≤ 2 (E |X0|² + E |Y0|²) e^{−2Kt}.   (2.4)
ν(dy) = e^{−V(y)} dy / Z
(where Z = ∫ e^{−V} is a normalization constant) is stationary: If
law (Y0) = ν, then also law (Yt) = ν. Then (2.4) easily implies that
µt := law (Xt ) converges weakly to ν; in addition, the convergence is
exponentially fast.
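A numerical sketch of this contraction: take V(x) = x²/2, so that ∇²V ≥ K with K = 1, and run two Euler–Maruyama discretizations driven by the same Brownian increments (the synchronous coupling; step size, horizon and starting points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Two diffusions dZ = -grad V(Z) dt + sqrt(2) dB with the SAME Brownian
# path, for V(x) = x^2 / 2 (hence grad V(x) = x and K = 1).
h, steps = 1e-3, 5_000       # time step and horizon T = 5
X, Y = 10.0, -10.0           # arbitrary starting points
d0 = abs(X - Y)

for _ in range(steps):
    dB = np.sqrt(2 * h) * rng.normal()   # common noise increment
    X += -X * h + dB
    Y += -Y * h + dB

# The noise cancels in X - Y, which contracts like e^{-Kt}:
print(abs(X - Y), d0 * np.exp(-h * steps))  # both about 0.13
```

Because ∇V is linear here, the difference X − Y obeys a deterministic linear recursion, so the observed decay matches e^{−Kt} up to discretization error; for a general K-convex V the computation above only gives the one-sided bound (2.4).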
Euclidean isoperimetry
Among all subsets of Rn with given surface, which one has the largest
volume? To simplify the problem, let us assume that we are looking
for a bounded open set Ω ⊂ Rn with, say, Lipschitz boundary ∂Ω, and
that the measure of ∂Ω is given; then the problem is to maximize the
measure of Ω. To measure ∂Ω one should use the (n − 1)-dimensional
Hausdorff measure, and to measure Ω the n-dimensional Hausdorff
measure, which of course is the same as the Lebesgue measure in Rn .
It has been known, at least since ancient times, that the solution
to this “isoperimetric problem” is the ball. A simple scaling argument
shows that this statement is equivalent to the Euclidean isoperimetric
inequality:
|∂Ω| / |Ω|^{(n−1)/n} ≥ |∂B| / |B|^{(n−1)/n},
where B is any ball.
There are very many proofs of the isoperimetric inequality, and
many refinements as well. It is less known that there is a proof by
coupling.
Here is a sketch of the argument, forgetting about regularity issues.
Let B be a ball such that |∂B| = |∂Ω|. Consider a random point X dis-
tributed uniformly in Ω, and a random point Y distributed uniformly
in B. Introduce the Knothe–Rosenblatt coupling of X and Y : This is
a deterministic coupling of the form Y = T (X), such that, at each
x ∈ Ω, the Jacobian matrix ∇T (x) is triangular with nonnegative di-
agonal entries. Since the law of X (resp. Y ) has uniform density 1/|Ω|
(resp. 1/|B|), the change of variables formula yields
∀x ∈ Ω,   1/|Ω| = det ∇T(x) · 1/|B|.   (2.5)
Since ∇T(x) is triangular with nonnegative diagonal entries, the
arithmetic–geometric mean inequality yields (det ∇T(x))^{1/n} ≤ (∇ · T)(x)/n, so
1/|Ω|^{1/n} ≤ (∇ · T)(x) / (n |B|^{1/n}).
Integrating over Ω and using the divergence theorem, together with the
fact that T is valued in B, leads to
|Ω|^{1−1/n} ≤ |∂Ω| / (n |B|^{1/n}).
Caffarelli’s log-concave perturbation theorem 25
1
Since |∂Ω| = |∂B| = n|B|, the right-hand side is actually |B|1− n , so
the volume of Ω is indeed bounded by the volume of B. This concludes
the proof.
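As a sanity check (not part of the argument), the inequality can be tested on a non-optimal competitor, say the unit square against the disc in the plane:

```python
import math

# Euclidean isoperimetric inequality for n = 2:
# |dOmega| / |Omega|^{1/2} >= |dB| / |B|^{1/2} for any ball (disc) B.
def ratio(perimeter, area):
    return perimeter / math.sqrt(area)

square = ratio(4.0, 1.0)               # unit square: perimeter 4, area 1
disc = ratio(2 * math.pi, math.pi)     # unit disc: perimeter 2*pi, area pi

print(square, disc)  # 4.0 > 2*sqrt(pi) ~ 3.545, as the inequality predicts
```

The disc's ratio 2√π is scale-invariant, consistent with the scaling argument invoked above.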
The above argument suggests the following problem:
Open Problem 2.1. Can one devise an optimal coupling between sets
(in the sense of a coupling between the uniform probability measures on
these sets) in such a way that the total cost of the coupling decreases un-
der some evolution converging to balls, such as mean curvature motion?
Bibliographical notes
Cattiaux and Guillin [221] found a simple and elegant coupling argu-
ment to prove the exponential convergence for the law of the stochastic
process
dXt = √2 dBt − E ∇V(Xt − X̄t) dt,
Fig. 3.2. Economic illustration of Monge's problem: squares stand for production
units, circles for consumption places.
you want to minimize the total cost. Monge assumed that the trans-
port cost of one unit of mass along a certain distance was given by the
product of the mass by the distance.
Nowadays there is a Monge street in Paris, and therein one can find
an excellent bakery called Le Boulanger de Monge. To acknowledge this,
and to illustrate how Monge’s problem can be recast in an economic
perspective, I shall express the problem as follows. Consider a large
number of bakeries, producing loaves, that should be transported each
morning to cafés where consumers will eat them. The amount of bread
that can be produced at each bakery, and the amount that will be
consumed at each café are known in advance, and can be modeled as
probability measures (there is a “density of production” and a “density
of consumption”) on a certain space, which in our case would be Paris
(equipped with the natural metric such that the distance between two
points is the length of the shortest path joining them). The problem is
to find in practice where each unit of bread should go (see Figure 3.2),
in such a way as to minimize the total transport cost. So Monge’s
problem really is the search of an optimal coupling; and to be more
precise, he was looking for a deterministic optimal coupling.
Monge studied the problem in three dimensions for a continuous
distribution of mass. Guided by his beautiful geometric intuition, he
3 The founding fathers of optimal transport
Bibliographical notes
Before the twentieth century, the main references for the problem of
“déblais et remblais” are the memoirs by Monge [636], Dupin [319] and
Appell [42]. Besides achieving important mathematical results, Monge
and Dupin were strongly committed to the development of society and
it is interesting to browse some of their writings about economics and
industry (a list can be found online at gallica.bnf.fr). A lively ac-
count of Monge’s life and political commitments can be found in Bell’s
delightful treatise, Men of Mathematics [80, Chapter 12]. It seems how-
ever that Bell did dramatize the story a bit, at the expense of accuracy
and neutrality. A more cold-blooded biography of Monge was written by
de Launay [277]. Considered as one of the greatest geologists of his time,
not particularly sympathetic to the French Revolution, de Launay doc-
umented himself with remarkable rigor, going back to original sources
whenever possible. Other biographies have been written since then by
Taton [778, 779] and Aubry [50].
Monge originally formulated his transport problem in Euclidean
space for the cost function c(x, y) = |x − y|; he probably had no idea of
the extreme difficulty of a rigorous treatment. It was only in 1979 that
Sudakov [765] claimed a proof of the existence of a Monge transport
for general probability densities with this particular cost function. But
his proof was not completely correct, and was amended much later by
Ambrosio [20]. In the meantime, alternative rigorous proofs had been
devised first by Evans and Gangbo [330] (under rather strong assump-
tions on the data), then by Trudinger and Wang [791], and Caffarelli,
Feldman and McCann [190].
Kantorovich defined linear programming in [499], introduced his
minimization problem and duality theorem in [500], and in [501] applied
his theory to the problem of optimal transport; this note can be consid-
ered as the act of birth of the modern formulation of optimal transport.
Later he made the link with Monge’s problem in [502]. His major work
in economics is the book [503], including a reproduction of [499]. An-
other important contribution is a study of numerical schemes based on
linear programming, joint with his student Gavurin [505].
Kantorovich wrote a short autobiography for his Nobel Prize [504].
Online at www.math.nsc.ru/LBRT/g2/english/ssk/legacy.html are
some comments by Kutateladze, who edited his mathematical works.
A recent special issue of the Journal of Mathematical Sciences, edited
by Vershik, was devoted to Kantorovich [810]; this reference contains
and Otto at the end of the nineties [493], and then explored by several
authors [208, 209, 210, 211, 212, 213, 214, 216, 669, 671].
Below is a nonexhaustive list of some other unexpected applications.
Relations with the modeling of sandpiles are reviewed by Evans [328], as
well as compression molding problems; see also Feldman [353] (this is for
the cost function c(x, y) = |x − y|). Applications of optimal transport
to image processing and shape recognition are discussed by Gangbo
and McCann [400], Ahmad [6], Angenent, Haker, Tannenbaum, and
Zhu [462, 463], Chazal, Cohen-Steiner and Mérigot [224], and many
other contributors from the engineering community (see e.g. [700, 713]).
X.-J. Wang [834], and independently Glimm and Oliker [419] (around
2000 and 2002 respectively) discovered that the theoretical problem
of designing reflector antennas could be recast in terms of optimal
transport for the cost function c(x, y) = − log(1 − x · y) on S 2 ;
see [402, 419, 660] for further work in the area, and [420] for an-
other version of this problem involving two reflectors.1 Rubinstein
and Wolansky adapted the strategy in [420] to study the optimal de-
sign of lenses [712]; and Gutiérrez and Huang to treat a refraction
problem [453]. In his PhD Thesis, Bernot [108] made the link be-
tween optimal transport, irrigation and the design of networks. Such
topics were also considered by Santambrogio with various collabora-
tors [152, 207, 731, 732, 733, 734]; in particular it is shown in [732]
that optimal transport theory gives a rigorous basis to some varia-
tional constructions used by physicists and hydrologists to study river
basin morphology [65, 706]. Buttazzo and collaborators [178, 179, 180]
explored city planning via optimal transport. Brenier found a connec-
tion to the electrodynamic equations of Maxwell and related models
in string theory [161, 162, 163, 164, 165, 166]. Frisch and collaborators
linked optimal transport to the problem of reconstruction of the “condi-
tions of the initial Universe” [168, 382, 755]. (The publication of [382] in
the prestigious generalist scientific journal Nature is a good indication
of the current visibility of optimal transport outside mathematics.)
Relations of optimal transport with geometry, in particular Ricci
curvature, will be explored in detail in Parts II and III of these notes.
Many generalizations and variants have been studied in the litera-
ture, such as the optimal matching [323], the optimal transshipment
¹ According to Oliker, the connection between the two-reflector problem (as for-
mulated in [661]) and optimal transport is in fact much older, since it was first
formulated in a 1993 conference in which he and Caffarelli were participating.
(see [696] for a discussion and list of references), the optimal transport
of a fraction of the mass [192, 365], or the optimal coupling with more
than two prescribed marginals [403, 525, 718, 723, 725]; I learnt from
Strulovici that the latter problem has applications in contract theory.
In spite of this avalanche of works, one certainly should not regard
optimal transport as a kind of miraculous tool, for “there are no mir-
acles in mathematics”. In my opinion this abundance only reflects the
fact that optimal transport is a simple, meaningful, natural and there-
fore universal concept.
Part I
The first part of this course is devoted to the description and charac-
terization of optimal transport under certain regularity assumptions on
the measures and the cost function.
As a start, some general theorems about optimal transport plans are
established in Chapters 4 and 5, in particular the Kantorovich duality
theorem. The emphasis is on c-cyclically monotone maps, both in the
statements and in the proofs. The assumptions on the cost function
and the spaces will be very general.
From the Monge–Kantorovich problem one can derive natural dis-
tance functions on spaces of probability measures, by choosing the cost
function as a power of the distance. The main properties of these dis-
tances are established in Chapter 6.
In Chapter 7 a time-dependent version of the Monge–Kantorovich
problem is investigated, which leads to an interpolation procedure be-
tween probability measures, called displacement interpolation. The nat-
ural assumption is that the cost function derives from a Lagrangian
action, in the sense of classical mechanics; still (almost) no smoothness
is required at that level. In Chapter 8 I shall make further assumptions
of smoothness and convexity, and recover some regularity properties of
the displacement interpolant by a strategy due to Mather.
Then in Chapters 9 and 10 it is shown how to establish the
existence of deterministic optimal couplings, and characterize the associated
transport maps, again under adequate regularity and convexity
assumptions. The Change of Variables Formula is considered in
Chapter 11. Finally, in Chapter 12 I shall discuss the regularity of
the transport map, which in general is not smooth.
The main results of this part are synthesized and summarized in
Chapter 13. A good understanding of this chapter is sufficient to go
through Part II of this course.
4
Basic properties
Existence
The first good thing about optimal couplings is that they exist:
Theorem 4.1 (Existence of an optimal coupling). Let (X , µ)
and (Y, ν) be two Polish probability spaces; let a : X → R ∪ {−∞}
and b : Y → R ∪ {−∞} be two upper semicontinuous functions such
that a ∈ L1 (µ), b ∈ L1 (ν). Let c : X × Y → R ∪ {+∞} be a lower
semicontinuous cost function, such that c(x, y) ≥ a(x) + b(y) for all
x, y. Then there is a coupling of (µ, ν) which minimizes the total cost
E c(X, Y ) among all possible couplings (X, Y ).
Remark 4.2. The lower bound assumption on c guarantees that the
expected cost E c(X, Y) is well-defined in R ∪ {+∞}. In most
applications — but not all — one may choose a = 0, b = 0.
The proof relies on basic variational arguments involving the topol-
ogy of weak convergence (i.e. imposed by bounded continuous test func-
tions). There are two key properties to check: (a) lower semicontinuity,
(b) compactness. These issues are taken care of respectively in Lem-
mas 4.3 and 4.4 below, which will be used again in the sequel. Before
going on, I recall Prokhorov’s theorem: If X is a Polish space, then
a set P ⊂ P (X ) is precompact for the weak topology if and only if it is
tight, i.e. for any ε > 0 there is a compact set Kε such that µ[X \Kε ] ≤ ε
for all µ ∈ P.
Lemma 4.3 (Lower semicontinuity of the cost functional). Let
X and Y be two Polish spaces, and let c : X × Y → R ∪ {+∞} be a lower
semicontinuous cost function. Let h : X × Y → R ∪ {−∞} be an upper
semicontinuous function such that c ≥ h. Let (π_k)_{k∈N} be a sequence of
probability measures on X × Y, converging weakly to π, in such a way that
h ∈ L1(π_k), h ∈ L1(π), and ∫ h dπ_k → ∫ h dπ. Then

∫_{X×Y} c dπ ≤ lim inf_{k→∞} ∫_{X×Y} c dπ_k.

In particular, if c is nonnegative, then F : π ↦ ∫ c dπ is lower semicontinuous
on P(X × Y), equipped with the topology of weak convergence.

C. Villani, Optimal Transport, Grundlehren der mathematischen Wissenschaften 338,
© Springer-Verlag Berlin Heidelberg 2009
The desired result follows since this bound is independent of the cou-
pling, and Kε × Lε is compact in X × Y.
Thus π is minimizing.
Remark 4.5. This existence theorem does not imply that the optimal
cost is finite. It might be that all transport plans lead to an infinite
total cost, i.e. ∫ c dπ = +∞ for all π ∈ Π(µ, ν). A simple condition to
rule out this annoying possibility is

∫∫ c(x, y) dµ(x) dν(y) < +∞,

which guarantees that at least the independent coupling has finite total
cost. In the sequel, I shall sometimes make the stronger assumption

c(x, y) ≤ c_X(x) + c_Y(y),    (c_X, c_Y) ∈ L1(µ) × L1(ν),

which implies that any coupling has finite total cost, and has other nice
consequences (see e.g. Theorem 5.10).
Restriction property
The second good thing about optimal couplings is that any sub-coupling
is still optimal. In words: If you have an optimal transport plan, then
any induced sub-plan (transferring part of the initial mass to part of
the final mass) has to be optimal too — otherwise you would be able
to lower the cost of the sub-plan, and as a consequence the cost of the
whole plan. This is the content of the next theorem.
π′ := π̃ / π̃[X × Y].

Proof of Theorem 4.6. Assume that π′ is not optimal; then there exists
a competitor π′′, with the same marginals as π′, such that

∫ c dπ′′ < ∫ c dπ′. (4.2)

Then consider

π̂ := (π − π̃) + Z π′′, (4.3)

where Z = π̃[X × Y] > 0. Clearly, π̂ is a nonnegative measure. On the
other hand, it can be written as

π̂ = π + Z (π′′ − π′);

then (4.1) shows that π̂ has the same marginals as π, while (4.2) implies
that it has a lower transport cost than π. (Here I use the fact that the
total cost is finite.) This contradicts the optimality of π. The conclusion
is that π′ is in fact optimal.
Convexity properties
The following estimates are of constant use:
Theorem 4.8 (Convexity of the optimal cost). Let X and Y be
two Polish spaces, let c : X ×Y → R∪{+∞} be a lower semicontinuous
function, and let C be the associated optimal transport cost functional
on P (X ) × P (Y). Let (Θ, λ) be a probability space, and let µθ , νθ be two
measurable functions defined on Θ, with values in P (X ) and P (Y) re-
spectively. Assume that c(x, y) ≥ a(x) + b(y), where a ∈ L1 (dµθ dλ(θ)),
b ∈ L1 (dνθ dλ(θ)). Then
C( ∫_Θ µ_θ λ(dθ), ∫_Θ ν_θ λ(dθ) ) ≤ ∫_Θ C(µ_θ, ν_θ) λ(dθ).
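On discrete uniform measures this convexity inequality can be checked by brute force. The sketch below is my own illustration, not part of the text (hypothetical data, quadratic cost on the line): it mixes two pairs of empirical measures with equal weights λ = (1/2, 1/2) and verifies that the cost of the mixture does not exceed the average of the costs.

```python
import itertools
import random

random.seed(0)
cost = lambda x, y: (x - y) ** 2  # quadratic cost on the line

def opt_cost(xs, ys):
    # Optimal transport cost between the uniform measures on xs and ys,
    # by brute force over permutations (an optimal bistochastic matrix
    # can be taken to be a permutation matrix).
    n = len(xs)
    return min(sum(cost(xs[i], ys[s[i]]) for i in range(n)) / n
               for s in itertools.permutations(range(n)))

mu1 = [random.random() for _ in range(4)]
nu1 = [random.random() for _ in range(4)]
mu2 = [random.random() for _ in range(4)]
nu2 = [random.random() for _ in range(4)]

# The (1/2, 1/2) mixtures are again uniform measures, now on 8 points.
mixed = opt_cost(mu1 + mu2, nu1 + nu2)
average = 0.5 * opt_cost(mu1, nu1) + 0.5 * opt_cost(mu2, nu2)
assert mixed <= average + 1e-12
```

Combining the two optimal 4-point plans already yields a feasible 8-point plan whose cost equals the average, which is exactly why the inequality must hold.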
About the second question: Why don’t we try to apply the same
reasoning as in the proof of Theorem 4.1? The problem is that the set
of deterministic couplings is in general not compact; in fact, this set
is often dense in the larger space of all couplings! So we may expect
that the value of the infimum in the Monge problem coincides with the
value of the minimum in the Kantorovich problem; but there is no a
priori reason to expect the existence of a Monge minimizer.
Fig. 4.1. The optimal plan, represented in the left image, consists in splitting the
mass in the center into two halves and transporting mass horizontally. On the right
the filled regions represent the lines of transport for a deterministic (without splitting
of mass) approximation of the optimum.
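The splitting phenomenon of Figure 4.1 already shows up in the simplest discrete example. The following toy computation is my own illustration (cost c(x, y) = |x − y|): there is no Monge map at all, yet deterministic couplings of nearby marginals approach the cost of the optimal plan.

```python
# mu = delta_0, nu = (delta_{-1} + delta_{+1})/2: the only transport plan
# splits the unit mass in half; with c(x, y) = |x - y| its cost is 1.
plan_cost = 0.5 * abs(0.0 - (-1.0)) + 0.5 * abs(0.0 - 1.0)

# No map T can push delta_0 to nu (T(0) is a single point).  But if the
# source atom is smeared over n distinct nearby points, a genuine map
# exists: send the points alternately to -1 and +1.
n = 1000  # even
src = [i * 1e-6 for i in range(n)]  # a tiny cluster of n points near 0
map_cost = sum(abs(x - (1.0 if i % 2 else -1.0))
               for i, x in enumerate(src)) / n
assert abs(map_cost - plan_cost) < 1e-3
```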
Bibliographical notes
Fig. 5.1. An attempt to improve the cost by a cycle; solid arrows indicate the mass
transport in the original plan, dashed arrows the paths along which a bit of mass is
rerouted.
Thus, if you can find such cycles (x1 , y1 ), . . . , (xN , yN ) in your trans-
ference plan, certainly the latter is not optimal. Conversely, if you do
not find them, then your plan cannot be improved (at least by the pro-
cedure described above) and it is likely to be optimal. This motivates
the following definitions.
∑_{i=1}^N c(x_i, y_i) ≤ ∑_{i=1}^N c(x_i, y_{i+1}), (5.1)

with the convention y_{N+1} = y_1.
Clearly, if we can find a pair (ψ, φ) and a transference plan π for which
there is equality, then (ψ, φ) is optimal in the left-hand side and π is
also optimal in the right-hand side.
A pair of price functions (ψ, φ) will informally be said to be com-
petitive if it satisfies (5.2). For a given y, it is of course in the interest
of the company to set the highest possible competitive price φ(y), i.e.
the highest lower bound for (i.e. the infimum of) ψ(x) + c(x, y), among
all bakeries x. Similarly, for a given x, the price ψ(x) should be the
supremum of all φ(y) − c(x, y). Thus it makes sense to describe a pair
of prices (ψ, φ) as tight if
φ(y) = inf_x [ψ(x) + c(x, y)],    ψ(x) = sup_y [φ(y) − c(x, y)]. (5.5)
In words, prices are tight if it is impossible for the company to raise the
selling price, or lower the buying price, without losing its competitivity.
Consider an arbitrary pair of competitive prices (ψ, φ). We can
always improve φ by replacing it by φ₁(y) = inf_x [ψ(x) + c(x, y)]. Then
we can also improve ψ by replacing it by ψ₁(x) = sup_y [φ₁(y) − c(x, y)];
then replacing φ₁ by φ₂(y) = inf_x [ψ₁(x) + c(x, y)], and so on. It turns
out that this process is stationary: as an easy exercise, the reader can
check that φ₂ = φ₁, ψ₂ = ψ₁, which means that after just one iteration
one obtains a pair of tight prices. Thus, when we consider the dual
Kantorovich problem (5.3), it makes sense to restrict our attention to
tight pairs, in the sense of equation (5.5). From that equation we can
reconstruct φ in terms of ψ, so we can just take ψ as the only unknown
in our problem.
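The stationarity claim is easy to test numerically. In this sketch (my own illustration: a random cost matrix on two finite spaces and arbitrary initial prices), one round of the two improvements already produces a fixed point.

```python
import random

random.seed(0)
nx, ny = 5, 6
c = [[random.uniform(0.0, 1.0) for _ in range(ny)] for _ in range(nx)]
psi = [random.uniform(-1.0, 1.0) for _ in range(nx)]  # arbitrary prices

def improve_phi(psi):  # phi(y) = inf_x [psi(x) + c(x, y)]
    return [min(psi[i] + c[i][j] for i in range(nx)) for j in range(ny)]

def improve_psi(phi):  # psi(x) = sup_y [phi(y) - c(x, y)]
    return [max(phi[j] - c[i][j] for j in range(ny)) for i in range(nx)]

phi1 = improve_phi(psi)
psi1 = improve_psi(phi1)
phi2 = improve_phi(psi1)
psi2 = improve_psi(phi2)

# After one iteration the pair is tight: iterating again changes nothing.
assert all(abs(a - b) < 1e-9 for a, b in zip(phi1, phi2))
assert all(abs(a - b) < 1e-9 for a, b in zip(psi1, psi2))
```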
That unknown cannot be just any function: if you take a general
function ψ, and compute φ by the first formula in (5.5), there is no
chance that the second formula will be satisfied. In fact this second
formula will hold true if and only if ψ is c-convex, in the sense of the
next definition (illustrated by Figure 5.2).
Fig. 5.2. A c-convex function is a function whose graph you can entirely caress
from below with a tool whose shape is the negative of the cost function (this shape
might vary with the point y). In the picture yi ∈ ∂c ψ(xi ).
∀x ∈ X,    ψ(x) = sup_{y∈Y} [ζ(y) − c(x, y)]. (5.6)
∂_c ψ(x) = { y ∈ Y : (x, y) ∈ ∂_c ψ },
or equivalently
open set Ω where it is finite, but lower semicontinuity might fail at the
boundary of Ω.) One can think of the cost function c(x, y) = −x · y
as basically the same as c(x, y) = |x − y|2 /2, since the “interaction”
between the positions x and y is the same for both costs.
Particular Case 5.4. If c = d is a distance on some metric space X,
then a c-convex function is just a 1-Lipschitz function, and it is its
own c-transform. Indeed, if ψ is c-convex it is obviously 1-Lipschitz;
conversely, if ψ is 1-Lipschitz, then ψ(x) ≤ ψ(y) + d(x, y), so ψ(x) =
inf_y [ψ(y) + d(x, y)] = ψ^c(x). As an even more particular case, if
c(x, y) = 1_{x≠y}, then ψ is c-convex if and only if sup ψ − inf ψ ≤ 1,
and then again ψ^c = ψ. (More generally, if c satisfies the triangle
inequality c(x, z) ≤ c(x, y) + c(y, z), then ψ is c-convex if and only if
ψ(y) − ψ(x) ≤ c(x, y) for all x, y; and then ψ = ψ^c.)
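A quick numerical sanity check of this particular case (my own illustration: a handful of points on the line with d(x, y) = |x − y|) confirms that a 1-Lipschitz function is its own c-transform, while a function with a larger Lipschitz constant is strictly lowered somewhere.

```python
points = [0.0, 0.7, 1.3, 2.0, 3.5]
d = lambda x, y: abs(x - y)

# psi is 0.5-Lipschitz on these points, hence 1-Lipschitz.
psi = {x: 0.5 * x for x in points}

# c-transform for c = d:  psi^c(x) = inf_y [psi(y) + d(x, y)]
psi_c = {x: min(psi[y] + d(x, y) for y in points) for x in points}
assert all(abs(psi_c[x] - psi[x]) < 1e-12 for x in points)

# A 2-Lipschitz function is strictly lowered by the transform somewhere.
bad = {x: 2.0 * x for x in points}
bad_c = {x: min(bad[y] + d(x, y) for y in points) for x in points}
assert any(bad_c[x] < bad[x] - 1e-12 for x in points)
```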
Remark 5.5. There is no measure theory in Definition 5.2, so no as-
sumption of measurability is made, and the supremum in (5.6) is a true
supremum, not just an essential supremum; the same for the infimum
in (5.7). If c is continuous, then a c-convex function is automatically
lower semicontinuous, and its subdifferential is closed; but if c is not
continuous the measurability of ψ and ∂c ψ is not a priori guaranteed.
Remark 5.6. I excluded the case when ψ ≡ +∞ so as to avoid trivial
situations; what I called a c-convex function might more properly (!)
be called a proper c-convex function. This automatically implies that
ζ in (5.6) does not take the value +∞ at all if c is real-valued. If
c does achieve infinite values, then the correct convention in (5.6) is
(+∞) − (+∞) = −∞.
If ψ is a function on X , then its c-transform is a function on Y.
Conversely, given a function on Y, one may define its c-transform as a
function on X . It will be convenient in the sequel to define the latter
concept by an infimum rather than a supremum. This convention has
the drawback of breaking the symmetry between the roles of X and Y,
but has other advantages that will be apparent later on.
Definition 5.7 (c-concavity). With the same notation as in Definition 5.2,
a function φ : Y → R ∪ {−∞} is said to be c-concave if it is not
identically −∞, and there exists ψ : X → R ∪ {±∞} such that φ = ψ^c.
Then its c-transform is the function φ^c defined by

∀x ∈ X,    φ^c(x) = sup_{y∈Y} [φ(y) − c(x, y)];
then the choice x̃ = x shows that φ^{ccc}(x) ≤ φ^c(x); while the choice
ỹ = y shows that φ^{ccc}(x) ≥ φ^c(x).
If ψ is c-convex, then there is ζ such that ψ = ζ^c, so ψ^{cc} = ζ^{ccc} =
ζ^c = ψ.
The converse is obvious: If ψ^{cc} = ψ, then ψ is c-convex, as the
c-transform of ψ^c.
Remark 5.9. Proposition 5.8 is a generalized version of the Legendre
duality in convex analysis (to recover the usual Legendre duality, take
c(x, y) = −x · y in Rn × Rn ).
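On a grid of real numbers this reduction can be checked directly. The sketch below is my own discretized illustration (the slope grid ys is chosen so that the supporting slope 2x of ψ(x) = x² always lies on the grid): it verifies ψ^{cc} = ψ for a convex function and ψ^{cc} < ψ for a concave one.

```python
c = lambda x, y: -x * y            # the cost of classical Legendre duality
xs = [i / 10 for i in range(-20, 21)]
ys = [2 * x for x in xs]           # slope grid; contains 2x for every x

def c_transform(psi):              # psi^c(y) = inf_x [psi(x) + c(x, y)]
    return {y: min(psi[x] + c(x, y) for x in xs) for y in ys}

def c_transform_back(phi):         # phi^c(x) = sup_y [phi(y) - c(x, y)]
    return {x: max(phi[y] - c(x, y) for y in ys) for x in xs}

psi = {x: x * x for x in xs}       # convex, hence c-convex here
psi_cc = c_transform_back(c_transform(psi))
assert all(abs(psi_cc[x] - psi[x]) < 1e-9 for x in xs)

bad = {x: -x * x for x in xs}      # concave: not c-convex
bad_cc = c_transform_back(c_transform(bad))
assert bad_cc[0.0] < bad[0.0] - 1.0   # psi^{cc} < psi somewhere
```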
Kantorovich duality
We are now ready to state and prove the main result in this chapter.
Theorem 5.10 (Kantorovich duality). Let (X , µ) and (Y, ν) be
two Polish probability spaces and let c : X × Y → R ∪ {+∞} be a lower
semicontinuous cost function, such that
∀(x, y) ∈ X × Y, c(x, y) ≥ a(x) + b(y)
for some real-valued upper semicontinuous functions a ∈ L1 (µ) and
b ∈ L1 (ν). Then
and in the above suprema one might as well impose that ψ be c-convex
and φ c-concave.
(ii) If c is real-valued and the optimal cost C(µ, ν) = inf_{π∈Π(µ,ν)} ∫ c dπ
is finite, then there is a measurable c-cyclically monotone set
Γ ⊂ X × Y (closed if a, b, c are continuous) such that for any π ∈
Π(µ, ν) the following five statements are equivalent:
(a) π is optimal;
(b) π is c-cyclically monotone;
(c) There is a c-convex ψ such that, π-almost surely,
ψ^c(y) − ψ(x) = c(x, y);
(d) There exist ψ : X → R ∪ {+∞} and φ : Y → R ∪ {−∞},
such that φ(y) − ψ(x) ≤ c(x, y) for all (x, y),
with equality π-almost surely;
(e) π is concentrated on Γ .
(iii) If c is real-valued, C(µ, ν) < +∞, and one has the pointwise
upper bound
c(x, y) ≤ c_X(x) + c_Y(y),    (c_X, c_Y) ∈ L1(µ) × L1(ν), (5.9)

then both the primal and dual Kantorovich problems have solutions, so

min_{π∈Π(µ,ν)} ∫_{X×Y} c(x, y) dπ(x, y)

  = max_{(ψ,φ)∈L1(µ)×L1(ν); φ−ψ≤c} [ ∫_Y φ(y) dν(y) − ∫_X ψ(x) dµ(x) ]

  = max_{ψ∈L1(µ)} [ ∫_Y ψ^c(y) dν(y) − ∫_X ψ(x) dµ(x) ],
Remark 5.12. Note the difference between statements (b) and (e):
The set Γ appearing in (ii)(e) is the same for all optimal π's; it only
depends on µ and ν. This set is in general not unique. If c is continuous
and Γ is imposed to be closed, then one can define a smallest
Γ , which is the closure of the union of all the supports of the opti-
mal π’s. There is also a largest Γ , which is the intersection of all the
c-subdifferentials ∂c ψ, where ψ is such that there exists an optimal π
supported in ∂c ψ. (Since the cost function is assumed to be continuous,
the c-subdifferentials are closed, and so is their intersection.)
bread there and transporting it here is lowest among all possible bak-
eries. Similarly, given a pair of competitive prices (ψ, φ), if you can cook
up a transference plan π such that φ(y) − ψ(x) = c(x, y) throughout
the support of π, then you know that (ψ, φ) is a solution to the dual
Kantorovich problem.
Remark 5.15. If the variables x and y are swapped, then (µ, ν) should
be replaced by (ν, µ) and (ψ, φ) by (−φ, −ψ).
Particular Case 5.16. Particular Case 5.4 leads to the following variant
of Theorem 5.10. When c(x, y) = d(x, y) is a distance on a Polish
space X, and µ, ν belong to P1(X), then

inf E d(X, Y) = sup_{‖ψ‖_Lip≤1} E [ψ(X) − ψ(Y)] = sup_{‖ψ‖_Lip≤1} [ ∫ ψ dµ − ∫ ψ dν ], (5.11)

where the infimum on the left is over all couplings (X, Y) of (µ, ν), and
the supremum on the right is over all 1-Lipschitz functions ψ. This is
the Kantorovich–Rubinstein formula; it holds true as soon as the
supremum on the right-hand side is finite, and it is very useful.
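In dimension one the formula becomes completely explicit: W₁ between two empirical measures equals the area between their cumulative distribution functions, a standard consequence of (5.11). The following check is my own illustration with hypothetical data, brute-forcing the primal side over permutations.

```python
import itertools
import random

random.seed(1)
n = 6
xs = sorted(random.uniform(0, 10) for _ in range(n))
ys = sorted(random.uniform(0, 10) for _ in range(n))

# Primal side: inf E d(X, Y) over couplings of two uniform empirical
# measures, by brute force over permutations.
primal = min(sum(abs(xs[i] - ys[s[i]]) for i in range(n)) / n
             for s in itertools.permutations(range(n)))

# Closed form on R: integral of |F_mu - F_nu| between consecutive atoms
# (both CDFs are piecewise constant).
pts = sorted(set(xs) | set(ys))
dual = 0.0
for a, b in zip(pts[:-1], pts[1:]):
    F_mu = sum(x <= a for x in xs) / n
    F_nu = sum(y <= a for y in ys) / n
    dual += abs(F_mu - F_nu) * (b - a)

assert abs(primal - dual) < 1e-9
```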
Rigorous proof of Theorem 5.10, Part (i). First I claim that it is suffi-
cient to treat the case when c is nonnegative. Indeed, let
c̃(x, y) := c(x, y) − a(x) − b(y) ≥ 0,    Λ := ∫ a dµ + ∫ b dν ∈ R.
Define ψ̃ := ψ + a and φ̃ := φ − b. Then:

c real-valued ⟹ c̃ real-valued;
c lower semicontinuous ⟹ c̃ lower semicontinuous;
ψ̃ ∈ L1(µ) ⟺ ψ ∈ L1(µ);    φ̃ ∈ L1(ν) ⟺ φ ∈ L1(ν);
∀π ∈ Π(µ, ν),  ∫ c̃ dπ = ∫ c dπ − Λ;
∀(ψ, φ) ∈ L1(µ) × L1(ν),  ∫ φ̃ dν − ∫ ψ̃ dµ = ∫ φ dν − ∫ ψ dµ − Λ;
ψ is c-convex ⟺ ψ̃ is c̃-convex;
φ is c-concave ⟺ φ̃ is c̃-concave;
(φ, ψ) are c-conjugate ⟺ (φ̃, ψ̃) are c̃-conjugate;
Γ is c-cyclically monotone ⟺ Γ is c̃-cyclically monotone.

Thanks to these formulas, it is equivalent to establish Theorem 5.10
for the cost c or for the nonnegative cost c̃. So in the sequel, I shall
assume, without further comment, that c is nonnegative.
The rest of the proof is divided into five steps.
Step 1: If µ = (1/n) ∑_{i=1}^n δ_{x_i}, ν = (1/n) ∑_{j=1}^n δ_{y_j}, where the costs
c(x_i, y_j) are finite, then there is at least one cyclically monotone
transference plan.

Indeed, in that particular case, a transference plan between µ and
ν can be identified with a bistochastic n × n array of real numbers
a_{ij} ∈ [0, 1]: each a_{ij} tells what proportion of the 1/n mass carried by
point x_i will go to destination y_j. So the Monge–Kantorovich problem
becomes

inf_{(a_{ij})} ∑_{ij} a_{ij} c(x_i, y_j).

Here we are minimizing a linear function on the compact set of bistochastic
arrays (a closed subset of [0, 1]^{n×n}), so obviously there exists a minimizer;
the corresponding transference plan π can be written as

π = (1/n) ∑_{ij} a_{ij} δ_{(x_i, y_j)},

and its support S is the set of all couples (x_i, y_j) such that a_{ij} > 0.
Assume that S is not cyclically monotone: Then there exist N ∈ N
and (x_{i_1}, y_{j_1}), …, (x_{i_N}, y_{j_N}) in S such that

c(x_{i_1}, y_{j_2}) + c(x_{i_2}, y_{j_3}) + · · · + c(x_{i_N}, y_{j_1}) < c(x_{i_1}, y_{j_1}) + · · · + c(x_{i_N}, y_{j_N}). (5.15)
Let a := min(a_{i_1 j_1}, …, a_{i_N j_N}) > 0. Define a new transference plan π̃
by the formula

π̃ = π + (a/n) ∑_{ℓ=1}^N [ δ_{(x_{i_ℓ}, y_{j_{ℓ+1}})} − δ_{(x_{i_ℓ}, y_{j_ℓ})} ],    j_{N+1} = j_1.

It is easy to check that π̃ has the correct marginals, and by (5.15) the
cost associated with π̃ is strictly less than the cost associated with π.
This is a contradiction, so S is indeed c-cyclically monotone!
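Step 1 is easily reproduced on a machine. The sketch below is my own illustration (random points in the plane, quadratic cost): it brute-forces the optimal plan over permutations and then verifies that its support admits no improving cycle, i.e. that (5.15) never holds.

```python
import itertools
import random

random.seed(1)
n = 5
X = [(random.random(), random.random()) for _ in range(n)]
Y = [(random.random(), random.random()) for _ in range(n)]
c = lambda x, y: (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2

# For uniform discrete marginals, an optimal plan can be taken to be a
# permutation (an extreme point of the set of bistochastic arrays).
best = min(itertools.permutations(range(n)),
           key=lambda s: sum(c(X[i], Y[s[i]]) for i in range(n)))

# Count cycles of support points along which rerouting the mass would
# strictly lower the total cost; cyclical monotonicity says: none.
violations = sum(
    1
    for N in range(2, n + 1)
    for cyc in itertools.permutations(range(n), N)
    if sum(c(X[cyc[k]], Y[best[cyc[(k + 1) % N]]]) for k in range(N))
       < sum(c(X[i], Y[best[i]]) for i in cyc) - 1e-12
)
assert violations == 0  # the optimal support is c-cyclically monotone
```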
Step 2: If c is continuous, then there is a cyclically monotone trans-
ference plan.
To prove this, consider sequences of independent random variables
(x_i)_{i∈N} in X and (y_j)_{j∈N} in Y, with respective laws µ and ν. According
to the law of large numbers for empirical measures (sometimes called the
fundamental theorem of statistics, or Varadarajan's theorem), one has,
with probability 1,

µ_n := (1/n) ∑_{i=1}^n δ_{x_i} ⟶ µ,    ν_n := (1/n) ∑_{j=1}^n δ_{y_j} ⟶ ν (5.16)

in the weak topology as n → ∞.
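For one fixed bounded continuous test function, (5.16) reduces to the usual strong law of large numbers, which can be simulated directly (the subtle point, discussed in the bibliographical notes, is to find a single almost-sure set working for all test functions at once). A hypothetical check of my own, with µ the uniform law on [0, 1]:

```python
import math
import random

random.seed(3)
f = math.cos                     # one bounded continuous test function
n = 100000
sample = [random.random() for _ in range(n)]   # i.i.d. with law U[0, 1]
empirical_mean = sum(map(f, sample)) / n       # integral of f d(mu_n)
true_mean = math.sin(1.0)                      # integral of cos over [0, 1]
assert abs(empirical_mean - true_mean) < 0.01
```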
In the definition of ψ, it does not matter whether one takes the supremum
over m − 1 or over m variables, since one also takes the supremum
over m. So the previous inequality can be recast as

ψ(x) ≥ ψ(x̄) + c(x̄, ȳ) − c(x, ȳ).

In particular, ψ(x) + c(x, ȳ) ≥ ψ(x̄) + c(x̄, ȳ). Taking the infimum over
x ∈ X in the left-hand side, we deduce that

ψ^c(ȳ) ≥ ψ(x̄) + c(x̄, ȳ).

So

inf_{Π(µ,ν)} ∫ c dπ ≤ ∫ c dπ̃ ≤ lim sup_{k→∞} [ ∫ φ_k(y) dν(y) − ∫ ψ_k(x) dµ(x) ] ≤ inf_{Π(µ,ν)} ∫ c dπ.

Moreover,

∫ φ_k(y) dν(y) − ∫ ψ_k(x) dµ(x) ⟶_{k→∞} inf_{Π(µ,ν)} ∫ c dπ. (5.20)
Since each pair (ψk , φk ) lies in Cb (X ) × Cb (Y), the duality also holds
with bounded continuous (and even Lipschitz) test functions, as claimed
in Theorem 5.10(i).
Proof of Theorem 5.10, Part (ii). From now on, I shall assume that the
optimal transport cost C(µ, ν) is finite, and that c is real-valued. As
in the proof of Part (i) I shall assume that c is nonnegative, since the
general case can always be reduced to that particular case. Part (ii)
will be established in the following way: (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒
(a) ⇒ (e) ⇒ (b). There seems to be some redundancy in this chain of
implications, but this is because the implication (a) ⇒ (c) will be used
to construct the set Γ appearing in (e).
(a) ⇒ (b): Let π be an optimal plan, and let (φk , ψk )k∈N be as in
Step 5 of the proof of Part (i). Since the optimal transport cost is finite
by assumption, the cost function c belongs to L1 (π). From (5.20) and
the marginal property of π,
∫ [ c(x, y) − φ_k(y) + ψ_k(x) ] dπ(x, y) ⟶_{k→∞} 0,
∑_{i=1}^N c(x_i, y_{i+1}) ≥ ∑_{i=1}^N c(x_i, y_i). (5.21)
for each k ∈ N there is a Borel set E_k with π[E_k] ≤ 1/k, such that
the convergence of c_ℓ to c is uniform on Γ \ E_k. Since π (just as any
probability measure on a Polish space) is regular, we can find a compact
set Γ_k ⊂ Γ \ E_k, such that π[Γ_k] ≥ 1 − 2/k. There is no loss of generality
in assuming that the sets Γ_k are increasing in k.

On each Γ_k, the sequence (c_ℓ) converges uniformly and monotonically
to c; in particular c is continuous on Γ_k. Furthermore, since π is
obviously concentrated on the union of all Γ_k, there is no loss of generality
in assuming that Γ = ∪ Γ_k. We may also assume that (x_0, y_0) ∈ Γ_1.
Now, let x be given in X, and for each k, ℓ, m, let

F_{m,k,ℓ}(x_0, y_0, …, x_m, y_m) := [c(x_0, y_0) − c_ℓ(x_1, y_0)]
  + [c(x_1, y_1) − c_ℓ(x_2, y_1)] + · · · + [c(x_m, y_m) − c_ℓ(x, y_m)],

where

F_{m,k}(x_0, y_0, …, x_m, y_m) := [c(x_0, y_0) − c(x_1, y_0)]
  + [c(x_1, y_1) − c(x_2, y_1)] + · · · + [c(x_m, y_m) − c(x, y_m)];

then we have

lim_{ℓ→∞} ψ_{m,k,ℓ}(x) = sup { [c(x_0, y_0) − c(x_1, y_0)] + [c(x_1, y_1) − c(x_2, y_1)] + · · ·
and

∫_{ξ<0} ξ_n dπ ⟶_{n→∞} ∫_{ξ<0} ξ dπ = 0,
so π is optimal.
Before completing the chain of equivalences, we should first construct
the set Γ. By Theorem 4.1, there is at least one optimal transference
plan, say π̄. From the implication (a) ⇒ (c), there is some ψ̄
such that π̄ is concentrated on ∂_c ψ̄; just choose Γ := ∂_c ψ̄.
(a) ⇒ (e): Let π̄ be the optimal plan used to construct Γ, and let
ψ = ψ̄ be the associated c-convex function; let φ = ψ^c. Then let π
be another optimal plan. Since π and π̄ have the same cost and the
same marginals,

∫ c dπ = ∫ c dπ̄ = lim_{n→∞} ∫ (T_n φ − T_n ψ) dπ̄ = lim_{n→∞} ∫ (T_n φ − T_n ψ) dπ,

where T_n is the truncation operator that was already used in the proof
of (d) ⇒ (a). So

∫ [ c(x, y) − T_n φ(y) + T_n ψ(x) ] dπ(x, y) ⟶_{n→∞} 0. (5.25)
Proof of Theorem 5.10, Part (iii). As in the proof of Part (i) we may
assume that c ≥ 0. Let π be optimal, and let ψ be a c-convex function
such that π is concentrated on ∂_c ψ. Let φ := ψ^c. The goal is to
show that ψ and φ are suitably integrable. By (5.9),

ψ(x) + c_X(x) ≥ φ(y_0) − c(x, y_0) + c_X(x) ≥ φ(y_0) − c_Y(y_0);

φ(y) − c_Y(y) = inf_x [ψ(x) + c(x, y)] − c_Y(y) ≤ inf_x [ψ(x) + c_X(x)]
  ≤ ψ(x_0) + c_X(x_0),
and it results from Part (i) of the theorem that both π and (ψ, φ) are
optimal, respectively in the original and the dual Kantorovich prob-
lems.
To prove the last part of (iii), assume that c is continuous; then
the subdifferential of any c-convex function is a closed c-cyclically
monotone set.
Let π be an arbitrary optimal transference plan, and (ψ, φ) an op-
timal pair of prices. We know that (ψ, ψ c ) is optimal in the dual Kan-
torovich problem, so
∫ c(x, y) dπ(x, y) = ∫ ψ^c dν − ∫ ψ dµ.
Restriction property
The dual side of the Kantorovich problem also behaves well with respect
to restriction, as shown by the following results.
Application: Stability
then the optimal total transport cost C(µ, ν) between µ and ν is finite,
and π is an optimal transference plan.
Since this is a closed set, the same is true for π^{⊗N}, and then by letting
ε → 0 we see that π^{⊗N} is concentrated on the set C(N) defined by

∑_{1≤i≤N} c(x_i, y_i) ≤ ∑_{1≤i≤N} c(x_i, y_{i+1}).
Theorem 5.20 also admits the following corollary about the stability
of transport maps.
stand for the value of the optimal transport cost between µ and ν.
If ν is a given reference measure, inequalities of the form
∀µ ∈ P (X ), C(µ, ν) ≤ F (µ)
Remark 5.27. The writing in (ii) or (ii’) is not very rigorous since
Λ is a priori defined on the set of bounded continuous functions, and
φc might not belong to that set. (It is clear that φc is bounded from
above, but this is all that can be said.) However, from (5.28) Λ(ϕ) is
a nondecreasing function of ϕ, so there is in practice no problem to
extend Λ to a more general class of measurable functions. In any case,
the correct way to interpret the left-hand side in (ii) is

Λ( ∫_Y φ dν − φ^c ) = sup_{ψ≥φ^c} Λ( ∫_Y φ dν − ψ ),
Remark 5.28. One may simplify (ii') by taking the supremum over t;
since Λ is nondecreasing, the result is

Λ( Φ( ∫_Y φ dν − φ^c ) ) ≤ 0. (5.29)
∀µ ∈ P(X),  C(µ, ν) ≤ H_ν(µ)

and

∀φ ∈ Cb(X),  ∫ e^{φ^c} dν ≤ e^{∫ φ dν}

are equivalent.
Proof of Theorem 5.26. First assume that (i) is satisfied. Then for all
ψ ≥ φc ,
Λ( ∫_Y φ dν − ψ ) = sup_{µ∈P(X)} [ ∫_X ( ∫_Y φ dν − ψ ) dµ − F(µ) ]

  = sup_{µ∈P(X)} [ ∫_Y φ dν − ∫_X ψ dµ − F(µ) ]

  ≤ sup_{µ∈P(X)} [ C(µ, ν) − F(µ) ] ≤ 0,
where the easiest part of Theorem 5.10 (that is, inequality (5.4)) was
used to go from the next-to-last line to the last one. Then (ii) follows
upon taking the supremum over ψ.
Conversely, assume that (ii) is satisfied. Then, for any pair (ψ, φ) ∈
Cb (X ) × Cb (Y) one has, by (5.28),
∫_Y φ dν − ∫_X ψ dµ = ∫_X ( ∫_Y φ dν − ψ ) dµ ≤ Λ( ∫_Y φ dν − ψ ) + F(µ).
Then (i) follows upon taking the supremum over φ ∈ Cb (Y) and apply-
ing Theorem 5.10 (i).
Now let us consider the equivalence between (i') and (ii'). By
assumption, Φ(r) ≤ 0 for r ≤ 0, so Φ*(t) = sup_r [r t − Φ(r)] = +∞ if
t < 0. Then the Legendre inversion formula says that

∀r ∈ R,  Φ(r) = sup_{t∈R₊} [ r t − Φ*(t) ].
(The important thing is that the supremum is over R+ and not R.)
If (i') is satisfied, then for all φ ∈ Cb(X), for all ψ ≥ φ^c and for all
t ∈ R₊,

Λ( t ∫_Y φ dν − t ψ − Φ*(t) )

  = sup_{µ∈P(X)} [ ∫_X ( t ∫_Y φ dν − t ψ − Φ*(t) ) dµ − F(µ) ]

  = sup_{µ∈P(X)} [ t ( ∫_Y φ dν − ∫_X ψ dµ ) − Φ*(t) − F(µ) ]

  ≤ sup_{µ∈P(X)} [ t C(µ, ν) − Φ*(t) − F(µ) ]

  ≤ sup_{µ∈P(X)} [ Φ(C(µ, ν)) − F(µ) ] ≤ 0,
If ℓ is given, then for any x lying in the compact set J_ℓ := proj_X(K_ℓ)
we can define T_ℓ(x) as the unique y such that (x, y) ∈ K_ℓ. Then we
can define T on ∪_ℓ J_ℓ by stipulating that for each ℓ, T restricts to T_ℓ
on J_ℓ. The map T_ℓ is continuous on J_ℓ: Indeed, if x_m ∈ J_ℓ and x_m → x,
then the sequence (x_m, T_ℓ(x_m))_{m∈N} is valued in the compact set K_ℓ,
so up to extraction it converges to some (x, y) ∈ K_ℓ, and necessarily
y = T_ℓ(x). So T is a Borel map. Even if it is not defined on the whole
of Γ \ (Z × Y), it is still defined on a set of full µ-measure, so the proof
can be concluded just as before.
Bibliographical notes
There are many ways to state the Kantorovich duality, and even more
ways to prove it. There are also several economic interpretations that
belong to folklore. The one which I formulated in this chapter is a
variant of one that I learnt from Caffarelli. Related economic inter-
pretations underlie some algorithms, such as the fascinating “auction
algorithm” developed by Bertsekas (see [115, Chapter 7], or the vari-
ous surveys written by Bertsekas on the subject). But also many more
classical algorithms are based on the Kantorovich duality [743].
A common convention consists in taking the pair (−ψ, φ) as the
unknown.³ This has the advantage of making some formulas more
symmetric: The c-transform becomes ϕ^c(y) = inf_x [c(x, y) − ϕ(x)], and then
ψ^c(x) = inf_y [c(x, y) − ψ(y)], so this is the same formula going back and
forth between functions of x and functions of y, upon exchange of x
and y. Since X and Y may have nothing in common, in general this
symmetry is essentially cosmetic. The conventions used in this chap-
ter lead to a somewhat natural “economic” interpretation, and will
also lend themselves better to a time-dependent treatment. Moreover,
they also agree with the conventions used in weak KAM theory, and
more generally in the theory of dynamical systems [105, 106, 347]. It
might be good to make the link more explicit. In weak KAM theory,
X = Y is a Riemannian manifold M ; a Lagrangian cost function is
given on the tangent bundle TM ; and c = c(x, y) is a continuous cost
function defined from the dynamics, as the minimum action that one
³ The latter pair was denoted (ϕ, ψ) in [814, Chapter 1], which will probably
upset the reader.
⁴ De Acosta used an idea suggested by Dudley in Saint-Flour, 25 years before my
own course!
715], Fernique [356], Szulga [769], Kellerer [512], Feyel [357], and prob-
ably others, contributed to the problem.
Modern treatments most often use variants of the Hahn–Banach
theorem, see for instance [550, 696, 814]. The proof presented in [814,
Theorem 1] first proves the duality when X , Y are compact, then treats
the general case by an approximation argument; this is somewhat tricky
but has the advantage of avoiding the general version of the axiom of
choice, since it uses the Hahn–Banach theorem only in the separable
space C(K), where K is compact.
Mikami [629] recovered the duality theorem in Rn using stochastic
control, and together with Thieullen [631] extended it to certain classes
of stochastic processes.
Ramachandran and Rüschendorf [698, 699] investigated the
Kantorovich duality outside the setting of Polish spaces, and found
a necessary and sufficient condition for its validity (the spaces should
be “perfect”).
In the case when the cost function is a distance, the optimal trans-
port problem coincides with the Kantorovich transshipment problem,
for which more duality theorems are available, and a vast literature
has been written; see [696, Chapter 6] for results and references. This
topic is closely related to the subject of “Kantorovich norms”: see [464],
[695, Chapters 5 and 6], [450, Chapter 4], [149] and the many references
therein.
Around the mid-eighties, it was understood that the study of
the dual problem, and in particular the existence of a maximizer,
could lead to precious qualitative information about the solution of
the Monge–Kantorovich problem. This point of view was emphasized
by many authors such as Knott and Smith [524], Cuesta-Albertos,
Matrán and Tuero-Dı́az [254, 255, 259], Brenier [154, 156], Rachev and
Rüschendorf [696, 722], Abdellaoui and Heinich [1, 2], Gangbo [395],
Gangbo and McCann [398, 399], McCann [616] and probably others.
Then Ambrosio and Pratelli proved the existence of a maximizing pair
under the conditions (5.10); see [32, Theorem 3.2]. Under adequate as-
sumptions, one can also prove the existence of a maximizer for the dual
problem by direct arguments which do not use the original problem (see
for instance [814, Chapter 2]).
The classical theory of convexity and its links with the prop-
erty of cyclical monotonicity are exposed in the well-known treatise
by Rockafellar [705]. The more general notions of c-convexity and
The use of the law of large numbers for empirical measures might be
natural for a probabilistic audience, but one should not forget that this
is a subtle result: For any bounded continuous test function, the usual
law of large numbers yields convergence out of a negligible set, but then
one has to find a negligible set that works for all bounded continuous
test functions. In a compact space X this is easy, because Cb (X ) is
separable, but if X is not compact one should be careful. Dudley [318,
Theorem 11.4.1] proves the law of large numbers for empirical measures
on general separable metric spaces, giving credit to Varadarajan for
this theorem. In the dynamical systems community, these results are
known as part of the so-called Krylov–Bogoljubov theory, in relation
with ergodic theorems; see e.g. Oxtoby [674] for a compact space.
The equivalence between the properties of optimality (of a transfer-
ence plan) and cyclical monotonicity, for quite general cost functions
and probability measures, was a widely open problem until recently; it
was explicitly listed as Open Problem 2.25 in [814] for a quadratic cost
function in Rn . The current state of the art is as follows:
• the equivalence is false for a general lower semicontinuous cost func-
tion with possibly infinite values, as shown by a clever counterex-
ample of Ambrosio and Pratelli [32];
• the equivalence is true for a continuous cost function with possibly
infinite values, as shown by Pratelli [688];
• the equivalence is true for a real-valued lower semicontinuous cost
function, as shown by Schachermayer and Teichmann [738]; this
is the result that I chose to present in this course. Actually it is
sufficient for the cost to be lower semicontinuous and real-valued
almost everywhere (with respect to the product measure µ ⊗ ν).
• more generally, the equivalence is true as soon as c is measurable
(not even lower semicontinuous!) and {c = ∞} is the union of a
closed set and a (µ ⊗ ν)-negligible Borel set; this result is due to
Beiglböck, Goldstern, Maresch and Schachermayer [79].
Schachermayer and Teichmann gave a nice interpretation of the
Ambrosio–Pratelli counterexample and suggested that the correct no-
tion in the whole business is not cyclical monotonicity, but a variant
which they named “strong cyclical monotonicity condition” [738].
As I am completing these notes, it seems that the final resolution
of this equivalence issue might soon be available, but at the price of
a journey into the very guts of measure theory. The following con-
struction was explained to me by Bianchini. Let c be an arbitrary
maximizing pair. Yet this is true for instance in the case of optimal
transport in Wiener space; this is an indication that condition (5.10)
might not be the “correct” one, although at present no better general
condition is known.
Lemma 5.18 and Theorem 5.19 were inspired by a recent work of
Fathi and Figalli [348], in which a restriction procedure is used to solve
the Monge problem for certain cost functions arising from Lagrangian
dynamics in unbounded space; see Theorem 10.28 for more information.
The core argument in the proof of Theorem 5.20 has probably been
discovered several times; I learnt it in [30, Proposition 7.13]. At the
present level of generality this statement is due to Schachermayer and
Teichmann [738].
I learnt Corollary 5.23 from Ambrosio and Pratelli [32], for mea-
sures on Euclidean space. The general statement presented here, and
its simple proof, were suggested to me by Schulte, together with the
counterexample in Remark 5.25. In some situations where more struc-
ture is available and optimal transport is smooth, one can refine the
convergence in probability into true pointwise convergence [822]. Even
when that information is not available, under some circumstances one
can obtain the uniform convergence of c(x, Tk (x)): Theorem 28.9(v)
is an illustration (this principle also appears in the tedious proof of
Theorem 23.14).
Measurable selection theorems in the style of Corollary 5.22 go back
at least to Berbee [98]. Recently, Fontbona, Guérin and Méléard [374]
studied the measurability of the map which to (µ, ν) associates the
union of the supports of all optimal transports between µ and ν (mea-
surability in the sense of set-valued functions). This question was mo-
tivated by applications in particle systems and coupling of stochastic
differential equations.
Theorem 5.26 appears in a more or less explicit form in various
works, especially for the particular case described in Example 5.29;
see in particular [128, Section 3]. About Legendre duality for convex
functions in R, one may consult [44, Chapter 14]. The classical reference
textbook about Legendre duality for plainly convex functions in Rn
is [705]. An excellent introduction to the Legendre duality in Banach
spaces can be found in [172, Section I.3].
Finally, I shall say a few words about some basic measure-theoretical
tools used in this chapter.
Assume, as before, that you are in charge of the transport of goods be-
tween producers and consumers, whose respective spatial distributions
are modeled by probability measures. The farther producers and con-
sumers are from each other, the more difficult will be your job, and you
would like to summarize the degree of difficulty with just one quantity.
For that purpose it is natural to consider, as in (5.27), the optimal
transport cost between the two measures:
C(µ, ν) = inf_{π ∈ Π(µ,ν)} ∫ c(x, y) dπ(x, y),    (6.1)
where c(x, y) is the cost for transporting one unit of mass from x to
y. Here we do not care about the shape of the optimizer but only the
value of this optimal cost.
One can think of (6.1) as a kind of distance between µ and ν, but in
general it does not, strictly speaking, satisfy the axioms of a distance
function. However, when the cost is defined in terms of a distance, it
is easy to cook up a distance from (6.1):
Example 6.3. Wp (δx , δy ) = d(x, y). In this example, the distance does
not depend on p; but this is not the rule.
W_p(µ_1, µ_3) ≤ ( E d(X_1, X_3)^p )^{1/p}
≤ ( E [d(X_1, X_2) + d(X_2, X_3)]^p )^{1/p}
≤ ( E d(X_1, X_2)^p )^{1/p} + ( E d(X_2, X_3)^p )^{1/p}
= W_p(µ_1, µ_2) + W_p(µ_2, µ_3),
Remark 6.5. Theorem 5.10(i) and Particular Case 5.4 together lead
to the useful duality formula for the Kantorovich–Rubinstein
distance: For any µ, ν in P1 (X ),
W_1(µ, ν) = sup_{‖ψ‖_Lip ≤ 1} { ∫_X ψ dµ − ∫_X ψ dν }.    (6.3)
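On the real line, the optimal ψ in (6.3) can be taken with slope ±1 according to the sign of F_µ − F_ν, which leads to the classical consequence W_1(µ, ν) = ∫ |F_µ(t) − F_ν(t)| dt. As a hedged numerical illustration (the helper names and sample data are mine, not from the text), this duality-derived formula agrees with the cost of the monotone coupling:

```python
import numpy as np

def w1_cdf(x, y, grid):
    """W_1 between empirical measures via the cumulative distribution
    functions: in one dimension, W_1(mu, nu) = int |F_mu(t) - F_nu(t)| dt."""
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    # left-endpoint rule on the (piecewise-constant) integrand
    return np.sum(np.abs(Fx - Fy)[:-1] * np.diff(grid))

def w1_sort(x, y):
    """W_1 via the monotone (sorted) coupling, optimal in one dimension."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(3)
x, y = rng.uniform(0, 1, 200), rng.uniform(0.5, 1.5, 200)
grid = np.linspace(-1, 3, 20001)
assert abs(w1_cdf(x, y, grid) - w1_sort(x, y)) < 1e-3
```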
Among many applications of this formula I shall just mention the fol-
lowing covariance inequality: if f is a probability density with respect
to µ then
( ∫ f dµ )( ∫ g dµ ) − ∫ (f g) dµ ≤ ‖g‖_Lip W_1(f µ, µ).
p ≤ q =⇒ Wp ≤ Wq . (6.4)
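Inequality (6.4) is Jensen's inequality applied to the optimal coupling. For empirical measures on the real line with equally many atoms, the optimal coupling is the sorted (monotone) matching, which makes the monotonicity in p easy to observe numerically. This is only an illustrative sketch (function name and data mine):

```python
import numpy as np

def wp_1d(x, y, p):
    """W_p between the empirical measures of two equal-size samples on R.
    In one dimension the monotone rearrangement is optimal, so it suffices
    to sort both samples and match in order."""
    x, y = np.sort(x), np.sort(y)
    return (np.mean(np.abs(x - y) ** p)) ** (1.0 / p)

rng = np.random.default_rng(0)
x, y = rng.normal(size=100), rng.normal(loc=1.0, size=100)
ws = [wp_1d(x, y, p) for p in (1, 2, 4)]
assert ws[0] <= ws[1] <= ws[2]   # p <= q  =>  W_p <= W_q
```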
To prove Theorem 6.9 I shall use the following lemma, which has
interest on its own and will be useful again later.
The proof is not so obvious and one might skip it at first reading.
remains bounded as k → ∞.
Since Wp ≥ W1 , the sequence (µk ) is also Cauchy in the W1 sense.
Let ε > 0 be given, and let N ∈ N be such that
µ_k[U_ε] ≥ 1 − ε − ε²/ε = 1 − 2ε.
At this point we have shown the following: For each ε > 0 there
is a finite family (xi )1≤i≤m such that all measures µk give mass at
least 1 − 2ε to the set Z := ∪B(xi , 2ε). The point is that Z might not
be compact. There is a classical remedy: Repeat the reasoning with ε
replaced by 2^{−(ℓ+1)} ε, ℓ ∈ N; so there will be (x_i)_{1≤i≤m(ℓ)} such that

µ_k[ X \ ⋃_{1≤i≤m(ℓ)} B(x_i, 2^{−ℓ} ε) ] ≤ 2^{−ℓ} ε.
Thus
µ_k[ X \ S ] ≤ ε,
where
S := ⋂_{1≤ℓ<∞} ⋃_{1≤i≤m(ℓ)} B(x_i, 2^{−ℓ} ε).
W_p(µ̃, µ) ≤ lim inf_{k→∞} W_p(µ_k, µ) = 0.
(a + b)p ≤ (1 + ε) ap + Cε bp .
But of course,
∫ d(x, y)^p dπ_k(x, y) = W_p(µ_k, µ)^p −→ 0 as k → ∞;
therefore,
lim sup_{k→∞} ∫ d(x_0, x)^p dµ_k(x) ≤ (1 + ε) ∫ d(x_0, x)^p dµ(x).
1_{d(x,y)≥R} ≤ 1_{[d(x,x_0)≥R/2 and d(x,x_0)≥d(x,y)/2]} + 1_{[d(x_0,y)≥R/2 and d(x_0,y)≥d(x,y)/2]}.
So, obviously,
( d(x, y)^p − R^p )_+ ≤ d(x, y)^p 1_{[d(x,x_0)≥R/2 and d(x,x_0)≥d(x,y)/2]} + d(x, y)^p 1_{[d(x_0,y)≥R/2 and d(x_0,y)≥d(x,y)/2]}
≤ 2^p d(x, x_0)^p 1_{d(x,x_0)≥R/2} + 2^p d(x_0, y)^p 1_{d(x_0,y)≥R/2}.
It follows that
W_p(µ_k, µ)^p = ∫ d(x, y)^p dπ_k(x, y)
= ∫ ( d(x, y) ∧ R )^p dπ_k(x, y) + ∫ ( d(x, y)^p − R^p )_+ dπ_k(x, y)
≤ ∫ ( d(x, y) ∧ R )^p dπ_k(x, y) + 2^p ∫_{d(x,x_0)≥R/2} d(x, x_0)^p dπ_k(x, y) + 2^p ∫_{d(x_0,y)≥R/2} d(x_0, y)^p dπ_k(x, y)
= ∫ ( d(x, y) ∧ R )^p dπ_k(x, y) + 2^p ∫_{d(x,x_0)≥R/2} d(x, x_0)^p dµ_k(x) + 2^p ∫_{d(x_0,y)≥R/2} d(x_0, y)^p dµ(y).
Control by total variation 103
where the infimum is over all couplings (X, Y ) of (µ, ν); this identity
can be seen as a very particular case of Kantorovich duality for the cost
function 1x=y .
It seems natural that a control in Wasserstein distance should be
weaker than a control in total variation. This is not completely true,
because total variation does not take into account large distances. But
one can control Wp by weighted total variation:
Remark 6.17. The integral in the right-hand side of (6.12) can be in-
terpreted as the Wasserstein distance W1 for the particular cost func-
tion [d(x0 , x) + d(x0 , y)]1x=y .
104 6 The Wasserstein distances
= (1/a) ∫∫ d(x, y)^p d(µ − ν)_+(x) d(µ − ν)_−(y)
≤ (2^{p−1}/a) ∫∫ [ d(x, x_0)^p + d(x_0, y)^p ] d(µ − ν)_+(x) d(µ − ν)_−(y)
≤ 2^{p−1} [ ∫ d(x, x_0)^p d(µ − ν)_+(x) + ∫ d(x_0, y)^p d(µ − ν)_−(y) ]
= 2^{p−1} ∫ d(x, x_0)^p d[ (µ − ν)_+ + (µ − ν)_− ](x)
= 2^{p−1} ∫ d(x, x_0)^p d|µ − ν|(x).
Proof of Theorem 6.18. The fact that Pp (X ) is a metric space was al-
ready explained, so let us turn to the proof of separability. Let D be
a dense sequence in X , and let P be the space of probability measures
that can be written ∑_j a_j δ_{x_j}, where the a_j are rational coefficients, and
the xj are finitely many elements in D. It will turn out that P is dense
in Pp (X ).
To prove this, let ε > 0 be given, and let x0 be an arbitrary element
of D. If µ lies in Pp (X ), then there exists a compact set K ⊂ X such
that
∫_{X\K} d(x_0, x)^p dµ(x) ≤ ε^p.
Since (Id, f) is a coupling of µ and f_#µ, W_p(µ, f_#µ)^p ≤ 2ε^p.
Of course, f_#µ can be written as ∑ a_j δ_{x_j}, 0 ≤ j ≤ N. This shows
that µ can be approximated, with arbitrary precision, by a finite
combination of Dirac masses. To conclude, it is sufficient to show that
the coefficients a_j can be replaced by rational coefficients, up to a
very small error in Wasserstein distance. By Theorem 6.15,
W_p( ∑_{j≤N} a_j δ_{x_j}, ∑_{j≤N} b_j δ_{x_j} ) ≤ 2^{1/p} [ max_{k,ℓ} d(x_k, x_ℓ) ] ( ∑_{j≤N} |a_j − b_j| )^{1/p},
and obviously the latter quantity can be made as small as desired for
some well-chosen rational coefficients b_j.
Finally, let us prove the completeness. Again let (µk )k∈N be a
Cauchy sequence in P_p(X). By Lemma 6.14, it admits a subsequence
(µ_{k_ℓ}) which converges weakly (in the usual sense) to some measure µ.
Then
∫ d(x_0, x)^p dµ(x) ≤ lim inf_{ℓ→∞} ∫ d(x_0, x)^p dµ_{k_ℓ}(x) < +∞,
so in particular
which means that µ_{k_ℓ} converges to µ in the W_p sense (and not just in
the sense of weak convergence). Since (µk ) is a Cauchy sequence with
a converging subsequence, it follows by a classical argument that the
whole sequence is converging.
Bibliographical notes
Wasserstein distances are also useful in the study of mixing and con-
vergence for Markov chains; the original idea, based on a contraction
property, seems to be due to Dobrushin [310], and has been redis-
covered since then by various authors [231, 662, 679]. Tanaka proved
that the W2 distance is contracting along solutions of a certain class
of Boltzmann equations [776, 777]; these results are reviewed in [814,
Section 7.5] and have been generalized in [138, 214, 379, 590].
Wasserstein distances behave well with increasing dimension, and
therefore have been successfully used in large or infinite dimension; for
instance for the large-time behavior of stochastic partial differential
equations [455, 458, 533, 605], or hydrodynamic limits of systems of
particles [444].
In a Riemannian context, the W2 distance is well-adapted to the
study of Ricci curvature, in relation with diffusion equations; these
themes will be considered again in Part II.
Here is a short list of some more surprising applications. Werner [836]
suggested that the W1 distance is well adapted to quantify some
variants of the uncertainty principle in quantum physics. In a re-
cent note, Melleray, Petrov and Vershik [625] use the properties of
the Kantorovich–Rubinstein norm to study spaces which are “linearly
rigid”, in the sense that, roughly speaking, there is only one way to
embed them in a Banach space. The beautiful text [809] by Vershik
reviews further applications of the Kantorovich–Rubinstein distance to
several original topics (towers of measures, Bernoulli automorphisms,
classification of metric spaces); see also [808] and the older contribu-
tion [807] by the same author.
7
Displacement interpolation
As in the previous chapter I shall assume that the initial and final
probability measures are defined on the same Polish space (X , d). The
main additional structure assumption is that the cost is associated with
an action, which is a way to measure the cost of displacement along
a continuous curve, defined on a given time-interval, say [0, 1]. So the
cost function between an initial point x and a final point y is obtained
by minimizing the action among paths that go from x to y:
where γ̇t stands for the velocity (or time-derivative) of the curve γ at
time t. More generally, an action is classically given by the time-integral
of a Lagrangian along the path:
A(γ) = ∫_0^1 L(γ_t, γ̇_t, t) dt.    (7.2)
L(x, v, t) = |v|²/2 − V(x)    (7.3)
where V is a potential; but much more complicated forms are admis-
sible. When V is continuously differentiable, it is a simple particular
Deterministic interpolation via action-minimizing curves 115
d²x/dt² = −∇V(x).    (7.4)
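As a numerical aside added here (not part of the text): discretizing a path and minimizing the Riemann sum of the action with fixed endpoints recovers, in the free case V = 0, the straight line that (7.4) predicts. All names and the crude discretization below are mine; it is only a sketch assuming `scipy` is available.

```python
import numpy as np
from scipy.optimize import minimize

def action(interior, x0, x1, n):
    """Discretized action sum_k |gamma_{k+1} - gamma_k|^2 / (2 dt) for the
    Lagrangian L(x, v, t) = |v|^2 / 2 (free particle, V = 0), with the
    endpoints x0, x1 held fixed and n subintervals of [0, 1]."""
    path = np.concatenate([[x0], interior, [x1]])
    dt = 1.0 / n
    v = np.diff(path) / dt          # finite-difference velocities
    return 0.5 * np.sum(v ** 2) * dt

n = 20
x0, x1 = 0.0, 1.0
guess = np.random.default_rng(1).uniform(size=n - 1)   # a wiggly initial path
res = minimize(action, guess, args=(x0, x1, n))
# The action-minimizing curve from 0 to 1 is the straight line t -> t:
assert np.allclose(res.x, np.linspace(0, 1, n + 1)[1:-1], atol=1e-3)
```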
To make sure that A(γ) is well-defined, it is natural to assume that
the path γ is continuously differentiable, or piecewise continuously dif-
ferentiable, or at least almost everywhere differentiable as a function
of t. A classical and general setting is that of absolutely continuous
curves: By definition, if (X , d) is a metric space, a continuous curve
γ : [0, 1] → X is said to be absolutely continuous if there exists a function ℓ ∈ L¹([0, 1]; dt) such that for all intermediate times t_0 < t_1 in
[0, 1],
d(γ_{t_0}, γ_{t_1}) ≤ ∫_{t_0}^{t_1} ℓ(t) dt.    (7.5)
More generally, it is said to be absolutely continuous of order p if formula (7.5) holds with some ℓ ∈ L^p([0, 1]; dt).
If γ is absolutely continuous, then the function t −→ d(γt0 , γt ) is
differentiable almost everywhere, and its derivative is integrable. But
the converse is false: for instance, if γ is the “Devil’s staircase”, en-
countered in measure theory textbooks (a nonconstant function whose
distributional derivative is concentrated on the Cantor set in [0, 1]),
then γ is differentiable almost everywhere, and γ̇(t) = 0 for almost
every t, even though γ is not constant! This motivates the “integral”
definition of absolute continuity based on formula (7.5).
If X is Rn , or a smooth differentiable manifold, then absolutely
continuous paths are differentiable for Lebesgue-almost all t ∈ [0, 1]; in
physical words, the velocity is well-defined for almost all times.
Before going further, here are some simple and important examples.
For all of them, the class of admissible curves is the space of absolutely
continuous curves.
Remark 7.3. This example shows that very different Lagrangians can
have the same minimizing curves.
which is called the Hamiltonian; then one can recast (7.6) in terms
of a Hamiltonian system, and access to the rich mathematical world
This looks general enough, however there are interesting cases where
X does not have enough differentiable structure for the velocity vector
to be well-defined (tangent spaces might not exist, for lack of smooth-
ness). In such a case, it is still possible to define the speed along the
curve:
|γ̇_t| := lim sup_{ε→0} d(γ_t, γ_{t+ε}) / |ε|.    (7.8)
This generalizes the natural notion of speed, which is the norm of the
velocity vector. Thus it makes perfect sense to write a Lagrangian of the
Then minimizing curves are called geodesics. They may have variable
speed, but, just as on a Riemannian manifold, one can always reparametrize them (that is, replace γ by γ̃ where γ̃_t = γ_{s(t)}, with s continuous
increasing) in such a way that they have constant speed. In that case
d(γ_s, γ_t) = |t − s| L(γ) for all s, t ∈ [0, 1].
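Constant-speed reparametrization is completely explicit for a piecewise-linear curve: compute the cumulative arc length and resample at equal increments. A hedged sketch (function and variable names mine, added as illustration):

```python
import numpy as np

def constant_speed(points, n_out):
    """Reparametrize the piecewise-linear curve through `points` (shape (m, d))
    by arc length, returning n_out points at equal arc-length spacing."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative arc length
    targets = np.linspace(0.0, s[-1], n_out)
    out = np.empty((n_out, points.shape[1]))
    for d in range(points.shape[1]):                  # interpolate coordinatewise
        out[:, d] = np.interp(targets, s, points[:, d])
    return out

# A curve traversed at variable speed keeps the same image; after
# reparametrization, consecutive sample points are equally spaced:
pts = np.array([[0, 0], [0.1, 0], [2, 0], [3, 0]], dtype=float)
g = constant_speed(pts, 7)
steps = np.linalg.norm(np.diff(g, axis=0), axis=1)
assert np.allclose(steps, steps[0])
```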
The functional A = Ati ,tf will just be called the action, and the
cost function c = cti ,tf the cost associated with the action. A curve
γ : [ti , tf ] → X is said to be action-minimizing if it minimizes A among
all curves having the same endpoints.
Examples 7.12. (i) To recover (7.2) as a particular case of Definition 7.11,
just set
A^{s,t}(γ) = ∫_s^t L(γ_τ, γ̇_τ, τ) dτ.    (7.13)
(ii) A length space is a space in which As,t (γ) = L(γ) (here L is the
length) defines a Lagrangian action.
If [t_i', t_f'] ⊂ [t_i, t_f] then it is clear that (A)^{t_i,t_f} induces an action
(A)^{t_i',t_f'} on the time-interval [t_i', t_f'], just by restriction.
In the rest of this section I shall take (ti , tf ) = (0, 1), just for simplic-
ity; of course one can always reduce to this case by reparametrization.
It will now be useful to introduce further assumptions about exis-
tence and compactness of minimizing curves.
Definition 7.13 (Coercive action). Let (A^{s,t}) be a Lagrangian action on a Polish space X, with associated cost functions (c^{s,t})_{0≤s<t≤1}.
For any two times s, t (0 ≤ s < t ≤ 1), and any two compact sets
K_s, K_t ⊂ X, let Γ^{s,t}_{K_s→K_t} be the set of minimizing paths starting in
K_s at time s, and ending in K_t at time t. The action will be called
coercive if:
(i) It is bounded below, in the sense that
inf_{s<t} inf_γ A^{s,t}(γ) > −∞;
(ii) If s < t are any two intermediate times, and K_s, K_t are any
two nonempty compact sets such that c^{s,t}(x, y) < +∞ for all x ∈ K_s,
y ∈ K_t, then the set Γ^{s,t}_{K_s→K_t} is compact and nonempty.
In particular, minimizing curves between any two fixed points x, y
with c(x, y) < +∞ should always exist and form a compact set.
Remark 7.14. If each As,t has compact sub-level sets (more explicitly,
if {γ; As,t (γ) ≤ A} is compact in C([s, t]; X ) for any A ∈ R), then the
lower semicontinuity of As,t , together with a standard compactness
argument (just as in Theorem 4.1) imply the existence of at least one
action-minimizing curve among the set of curves that have prescribed
final and initial points. In that case the requirement of nonemptiness
in (ii) is fulfilled.
Abstract Lagrangian action 123
c^{t_1,t_3}(γ_{t_1}, γ_{t_3}) = c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) + c^{t_2,t_3}(γ_{t_2}, γ_{t_3}).    (7.15)
(v) If the cost functions cs,t are continuous, then the set Γ of all
action-minimizing curves is closed in the topology of uniform conver-
gence;
(vi) For all times s < t, there exists a Borel map S_{s→t} : X × X →
C([s, t]; X), such that for all x, y ∈ X, S_{s→t}(x, y) belongs to Γ^{s,t}_{x→y}. In
words, there is a measurable recipe to join any two endpoints x and y
by a minimizing curve γ : [s, t] → X .
By point (iii) in Definition 7.11, it follows that c0,1 (γ0 , γ1 ) = A0,1 (γ),
which proves (iv).
If 0 ≤ t1 < t2 < t3 ≤ 1, now let Γ (t1 , t2 , t3 ) stand for the set
of all curves satisfying (7.15). If all functions cs,t are continuous, then
Γ (t1 , t2 , t3 ) is closed for the topology of uniform convergence. Then Γ is
the intersection of all Γ (t1 , t2 , t3 ), so it is closed also; this proves state-
ment (v). (Now there is a similarity with the proof of Theorem 5.20.)
For given times s < t, let Γ s,t be the set of all action-minimizing
curves defined on [s, t], and let Es,t be the “endpoints” mapping, defined
on Γ s,t by γ −→ (γs , γt ). By assumption, any two points are joined by
at least one minimizing curve, so Es,t is onto X × X . It is clear that
E_{s,t} is a continuous map between Polish spaces, and by assumption
E_{s,t}^{−1}(x, y) is compact for all x, y. It follows by general theorems of
measurable selection (see the bibliographical notes in case of need)
that E_{s,t} admits a measurable right-inverse S_{s→t}, i.e. E_{s,t} ◦ S_{s→t} = Id.
This proves statement (vi).
π0,1 := (e0 , e1 )# Π
The next theorem is the main result of this chapter. It shows that
the law at time t of a dynamical optimal coupling can be seen as a
minimizing path in the space of probability measures. In the important
case when the cost is a power of a geodesic distance, the corollary stated
right after the theorem shows that displacement interpolation can be
thought of as a geodesic path in the space of probability measures.
(“A geodesic in the space of laws is the law of a geodesic.”) The theorem
also shows that such interpolations can be constructed under quite weak
assumptions.
Interpolation of random variables 127
where the last infimum is over all random curves γ : [s, t] → X such
that law (γτ ) = µτ (s ≤ τ ≤ t).
In that case (µt )0≤t≤1 is said to be a displacement interpolation be-
tween µ0 and µ1 . There always exists at least one such curve.
Finally, if K_0 and K_1 are two compact subsets of P(X), such that
C^{0,1}(µ_0, µ_1) < +∞ for all µ_0 ∈ K_0, µ_1 ∈ K_1, then the set of dynamical
Moreover, if µ0 and µ1 are given, there exists at least one such curve.
More generally, if K0 ⊂ Pp (X ) and K1 ⊂ Pp (X ) are compact subsets of
P (X ), then the set of geodesic curves (µt )0≤t≤1 such that µ0 ∈ K0 and
µ1 ∈ K1 is compact and nonempty; and also the set of dynamical opti-
mal transference plans Π with (e0 )# Π ∈ K0 , (e1 )# Π ∈ K1 is compact
and nonempty.
plings together, I can actually assume that (γ̃)_{1/2} = γ_{1/2}, so that I have
Now the key observation is that if (γt1 , γt2 ) and (γt2 , γt3 ) are optimal
couplings of (µt1 , µt2 ) and (µt2 , µt3 ) respectively, and the µtk satisfy the
equality appearing in (ii), then also (γt1 , γt3 ) should be optimal. Indeed,
by taking expectation in the inequality
c^{t_1,t_3}(γ_{t_1}, γ_{t_3}) ≤ c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) + c^{t_2,t_3}(γ_{t_2}, γ_{t_3}),
one obtains
E c^{t_1,t_3}(γ_{t_1}, γ_{t_3}) ≤ C^{t_1,t_2}(µ_{t_1}, µ_{t_2}) + C^{t_2,t_3}(µ_{t_2}, µ_{t_3}).
Since the µ_{t_k} satisfy the equality appearing in (ii), the right-hand side equals C^{t_1,t_3}(µ_{t_1}, µ_{t_3}); so actually
E c^{t_1,t_3}(γ_{t_1}, γ_{t_3}) ≤ C^{t_1,t_3}(µ_{t_1}, µ_{t_3}),
which means that indeed (γt1 , γt3 ) is an optimal coupling of (µt1 , µt3 )
for the cost ct1 ,t3 .
So (γ_0^{(1)}, γ_1^{(1)}) is an optimal coupling of (µ_0, µ_1). Now we can proceed
in the same manner and define, for each k, a random discrete path
(γ^{(k)}_{j 2^{−k}}) such that (γ^{(k)}_s, γ^{(k)}_t) is an optimal coupling for all times s, t of
the form j/2^k. These are only discrete paths, but it is possible to extend
them into paths (γ^{(k)}_t)_{0≤t≤1} that are minimizers of the action. Of course,
if t is not of the form j/2^k, there is no reason why law(γ^{(k)}_t) would
coincide with µ_t. But hopefully we can pass to the limit as k → ∞, for
each dyadic time, and conclude by a density argument.
Then Π defines the law of a random geodesic γ, and the identity E0,1 ◦
S0→1 = Id implies that the endpoints of γ are distributed according
to π. This proves the existence of a path satisfying (i). Now the main
part of the proof consists in checking the equivalence of properties (i)
and (ii). This will be performed in four steps.
Step 1. Let (µt )0≤t≤1 be any continuous curve in the space of proba-
bility measures, and let t1 , t2 , t3 be three intermediate times. Let πt1 →t2
be an optimal transference plan between µt1 and µt2 for the transport
cost ct1 ,t2 , and similarly let πt2 →t3 be an optimal transference plan be-
tween µt2 and µt3 for the transport cost ct2 ,t3 . By the Gluing Lemma
of Chapter 1 one can construct random variables (γt1 , γt2 , γt3 ) such
that law (γt1 , γt2 ) = πt1 →t2 and law (γt2 , γt3 ) = πt2 →t3 (in particular,
law (γti ) = µti for i = 1, 2, 3). Then, by (7.14),
C^{t_1,t_3}(µ_{t_1}, µ_{t_3}) ≤ E c^{t_1,t_3}(γ_{t_1}, γ_{t_3}) ≤ E c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) + E c^{t_2,t_3}(γ_{t_2}, γ_{t_3}) = C^{t_1,t_2}(µ_{t_1}, µ_{t_2}) + C^{t_2,t_3}(µ_{t_2}, µ_{t_3}).
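In the discrete case, the Gluing Lemma used in Step 1 is completely explicit: given couplings π₁₂ and π₂₃ with a common middle marginal µ₂, one makes γ₁ and γ₃ conditionally independent given γ₂. A small illustrative sketch (names and example matrices mine, assuming `numpy`):

```python
import numpy as np

def glue(pi12, pi23):
    """Gluing Lemma, discrete case: from a coupling pi12 of (mu1, mu2) and a
    coupling pi23 of (mu2, mu3), build a triple law with those 2-d marginals,
    making the first and third variables conditionally independent given
    the middle one."""
    mu2 = pi12.sum(axis=0)               # common middle marginal
    p = np.zeros(pi12.shape + (pi23.shape[1],))
    for j, m in enumerate(mu2):
        if m > 0:
            p[:, j, :] = np.outer(pi12[:, j], pi23[j, :]) / m
    return p

pi12 = np.array([[0.3, 0.2], [0.1, 0.4]])   # coupling of mu1 and mu2
pi23 = np.array([[0.2, 0.2], [0.3, 0.3]])   # coupling of mu2 and mu3
p = glue(pi12, pi23)
assert np.allclose(p.sum(axis=2), pi12)     # (gamma1, gamma2) marginal
assert np.allclose(p.sum(axis=0), pi23)     # (gamma2, gamma3) marginal
```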
C^{t_1,t_2}(µ_{t_1}, µ_{t_2}) + C^{t_2,t_3}(µ_{t_2}, µ_{t_3}) ≤ E c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) + E c^{t_2,t_3}(γ_{t_2}, γ_{t_3}) = E A^{t_1,t_2}(γ) + E A^{t_2,t_3}(γ) = E A^{t_1,t_3}(γ) = E c^{t_1,t_3}(γ_{t_1}, γ_{t_3}).    (7.18)
This together with Step 1 proves that (µt ) satisfies Property (ii). So
far we have proven (i) ⇒ (ii).
Step 3. Assume that (µt ) satisfies Property (ii); then we can per-
form again the same computation as in Step 1, but now all the in-
equalities have to be equalities. This implies that the random variables
(γt1 , γt2 , γt3 ) satisfy:
(a) (γt1 , γt3 ) is an optimal coupling of (µt1 , µt3 ) for the cost ct1 ,t3 ;
(b) ct1 ,t3 (γt1 , γt3 ) = ct1 ,t2 (γt1 , γt2 ) + ct2 ,t3 (γt2 , γt3 ) almost surely;
(c) cs,t (γs , γt ) < +∞ almost surely.
Armed with that information, we proceed as follows. We start from
an optimal coupling (γ0 , γ1 ) of (µ0 , µ1 ), with joint law π0→1 . Then as
in Step 1 we construct a triple (γ^{(1)}_0, γ^{(1)}_{1/2}, γ^{(1)}_1) with law(γ^{(1)}_0) = µ_0,
law(γ^{(1)}_{1/2}) = µ_{1/2}, law(γ^{(1)}_1) = µ_1, such that (γ^{(1)}_0, γ^{(1)}_{1/2}) is an optimal
coupling of (µ_0, µ_{1/2}) for the cost c^{0,1/2} and (γ^{(1)}_{1/2}, γ^{(1)}_1) is an optimal coupling of (µ_{1/2}, µ_1) for the cost c^{1/2,1}. From (a) and (b) above we know
that (γ^{(1)}_0, γ^{(1)}_1) is an optimal coupling of (µ_0, µ_1) (but law(γ^{(1)}_0, γ^{(1)}_1)
might be different from law(γ_0, γ_1)), and moreover c^{0,1}(γ^{(1)}_0, γ^{(1)}_1) =
c^{0,1/2}(γ^{(1)}_0, γ^{(1)}_{1/2}) + c^{1/2,1}(γ^{(1)}_{1/2}, γ^{(1)}_1) almost surely.
Next it is possible to iterate the construction, introducing more
and more midpoints. By a reasoning similar to the one above and an
induction argument, one can construct, for each integer k ≥ 1, random
variables (γ^{(k)}_0, γ^{(k)}_{1/2^k}, γ^{(k)}_{2/2^k}, γ^{(k)}_{3/2^k}, ..., γ^{(k)}_1) in such a way that
(a) for any two i, j ≤ 2^k, (γ^{(k)}_{i/2^k}, γ^{(k)}_{j/2^k}) constitutes an optimal coupling
of (µ_{i/2^k}, µ_{j/2^k}),
(Recall that et is just the evaluation at time t.) Then the law Π (k) of
(γt )0≤t≤1 is a probability measure on the set of continuous curves in X .
I claim that Π (k) is actually concentrated on minimizing curves.
(Skip at first reading and go directly to Step 4.) To prove this, it is
sufficient to check the criterion in Proposition 7.16(iv), involving three
intermediate times t1 , t2 , t3 . By construction, the criterion holds true if
all these times belong to the same time-interval [i/2k , (i + 1)/2k ], and
also if they are all of the form j/2k ; the problem consists in “crossing
subintervals”. Let us show that
c^{s,t}(γ_s, γ_t) = c^{s, i/2^k}(γ_s, γ_{i/2^k}) + c^{i/2^k, j/2^k}(γ_{i/2^k}, γ_{j/2^k}) + c^{j/2^k, t}(γ_{j/2^k}, γ_t)
(7.19)
and by construction of Π^{(k)} this is just c^{(i−1)/2^k, (j+1)/2^k}(γ_{(i−1)/2^k}, γ_{(j+1)/2^k}). So there has
to be equality everywhere in (7.20), which leads to (7.19). (Here I use
the fact that cs,t (γs , γt ) < +∞.) After that it is an easy game to con-
clude the proof of the minimizing property for arbitrary times t1 , t2 , t3 .
This proves the tightness of the family (Π (k) ). So one can extract a
subsequence thereof, still denoted Π (k) , that converges weakly to some
probability measure Π.
By Proposition 7.16(v), Γ is closed; so Π is still supported in Γ .
Moreover, for all dyadic times t = i/2^ℓ in [0, 1], we have, if k is larger
than ℓ, (e_t)_# Π^{(k)} = µ_t, and by passing to the limit we find that
(e_t)_# Π = µ_t also.
By assumption, µt depends continuously on t. So, to conclude that
(et )# Π = µt for all times t ∈ [0, 1] it now suffices to check the continuity
of (et )# Π as a function of t. In other words, if ϕ is an arbitrary bounded
continuous function on X , one has to show that
ψ(t) = E ϕ(γt )
Next, let us check that the two expressions for As,t in (iii) do coin-
cide. This is about the same computation as in Step 1 above. Let s < t
be given, let (µ_τ)_{s≤τ≤t} be a continuous path, and let (t_i) be a subdivision of [s, t]. Further, let (X_s, X_t) be an optimal coupling of (µ_s, µ_t)
for the cost function c^{s,t}, and let (γ_τ)_{s≤τ≤t} be a random continuous
path such that law(γ_τ) = µ_τ for all τ ∈ [s, t]. Then
C^{s,t}(µ_s, µ_t) = E c^{s,t}(X_s, X_t) ≤ E c^{s,t}(γ_s, γ_t) ≤ E A^{s,t}(γ),
where the next-to-last inequality follows from the fact that (γ_s, γ_t) is
a coupling of (µ_s, µ_t), and the last inequality is a consequence of the
definition of c^{s,t}. Applying the same reasoning on each interval [t_i, t_{i+1}]
and summing up, this shows that
∑_i C^{t_i,t_{i+1}}(µ_{t_i}, µ_{t_{i+1}}) ≤ E A^{s,t}(γ).
sup_{t∈[0,1]} W_1(µ_t, µ̃_t) ≤ sup_{t∈[0,1]} E d(γ_t, γ̃_t) ≤ E sup_{t∈[0,1]} d(γ_t, γ̃_t) = W_1(Π, Π̃).
Then
c^{s,t}(x, y) = d(x, y)^p / (t − s)^{p−1},
and all our assumptions hold true for this action and cost. (The assump-
tion of local compactness is used to prove that the action is coercive,
see the Appendix.) The important point now is that
C^{s,t}(µ, ν) = W_p(µ, ν)^p / (t − s)^{p−1}.
So, according to the remarks in Example 7.9, Property (ii) in Theorem 7.21
means that (µt ) is in turn a minimizer of the action associated with the
Lagrangian |µ̇|p , i.e. a geodesic in Pp (X ). Note that if µt is the law of a
random optimal geodesic γt at time t, then
Wp (µt , µs )p ≤ E d(γs , γt )p = E (t − s)p d(γ0 , γ1 )p = (t − s)p Wp (µ0 , µ1 )p ,
so the path µt is indeed continuous (and actually 1-Lipschitz) for the
distance Wp .
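For equally weighted atoms on the real line, where the optimal coupling is the monotone rearrangement and geodesics are straight lines, displacement interpolation and the identity W_p(µ_s, µ_t) = |t − s| W_p(µ_0, µ_1) can be checked directly. A hedged illustration with p = 2, added here (function names and data mine):

```python
import numpy as np

def w2_1d(x, y):
    """W_2 between empirical measures of equal-size samples on R
    (the monotone rearrangement is optimal in one dimension)."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

def displacement(x, y, t):
    """Displacement interpolant at time t: each atom moves along the
    straight-line geodesic of the optimal (sorted) matching."""
    return (1 - t) * np.sort(x) + t * np.sort(y)

rng = np.random.default_rng(2)
x, y = rng.normal(size=50), rng.normal(loc=3.0, size=50)
W = w2_1d(x, y)
for s, t in [(0.0, 0.25), (0.25, 0.75), (0.5, 1.0)]:
    ms, mt = displacement(x, y, s), displacement(x, y, t)
    # the interpolation is a constant-speed geodesic in P_2(R):
    assert np.isclose(w2_1d(ms, mt), (t - s) * W)
```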
Proof of Corollary 7.23. By Theorem 7.21, any displacement interpo-
lation has to take the form (et )# Π, where Π is a probability measure
on action-minimizing curves such that π := (e0 , e1 )# Π is an optimal
transference plan. By assumption, there is exactly one such π. Let Z
be the set of pairs (x0 , x1 ) such that there is more than one minimiz-
ing curve joining x0 to x1 ; by assumption π[Z] = 0. For (x0 , x1 ) ∈
/ Z,
there is a unique geodesic γ = S(x0 , x1 ) joining x0 to x1 . So Π has to
coincide with S# π.
To conclude this section with a simple application, I shall use
Corollary 7.22 to derive the Lipschitz continuity of moments, alluded
to in Remark 6.10.
Proposition 7.29 (Wp -Lipschitz continuity of p-moments). Let
(X , d) be a locally compact Polish length space, let p ≥ 1 and µ, ν ∈
Pp (X ). Then for any ϕ ∈ Lip(X ; R+ ),
| ( ∫ ϕ(x)^p µ(dx) )^{1/p} − ( ∫ ϕ(y)^p ν(dy) )^{1/p} | ≤ ‖ϕ‖_Lip W_p(µ, ν).
So (d+ /dt)[Ψ (t)1/p ] ≤ Wp (µ, ν), thus |Ψ (1)1/p − Ψ (0)1/p | ≤ Wp (µ, ν),
which is the desired result.
Again let µ0 and µ1 be any two probability measures, (µt )0≤t≤1 a dis-
placement interpolation associated with a dynamical optimal transfer-
ence plan Π, and (γt )0≤t≤1 a random action-minimizing curve with
law (γ) = Π. In particular (γ0 , γ1 ) is an optimal coupling of (µ0 , µ1 ).
With the help of the previous arguments, it is almost obvious that
(γ_{t_0}, γ_{t_1}) is also an optimal coupling of (µ_{t_0}, µ_{t_1}). What may look at
first sight more surprising is that if (t_0, t_1) ≠ (0, 1), this is the only optimal coupling, at least if action-minimizing curves "cannot branch".
Furthermore, there is a time-dependent version of Theorem 4.6.
transport plan associated with a finite total cost. For any t0 , t1 in [0, 1]
with 0 ≤ t0 < t1 ≤ 1, define the time-restriction of Π to [t0 , t1 ] as
Π t0 ,t1 := (rt0 ,t1 )# Π, where rt0 ,t1 (γ) is the restriction of γ to the inter-
val [t0 , t1 ]. Then:
(i) Π t0 ,t1 is a dynamical optimal coupling for the action (A)t0 ,t1 .
(ii) If Π̃ is a measure on C([t_0, t_1]; X), such that Π̃ ≤ Π^{t_0,t_1} and
Π̃[C([t_0, t_1]; X)] > 0, let
Π' := Π̃ / Π̃[C([t_0, t_1]; X)],    µ'_t = (e_t)_# Π'.
Then Π' is a dynamical optimal coupling between µ'_{t_0} and µ'_{t_1}; and
(µ'_t)_{t_0≤t≤t_1} is a displacement interpolation.
(iii) Further, assume that action-minimizing curves are uniquely
and measurably determined by their restriction to a nontrivial time-
interval, and (t_0, t_1) ≠ (0, 1). Then Π' in (ii) is the unique dynamical optimal coupling between µ'_{t_0} and µ'_{t_1}. In particular, (π')^{t_0,t_1} :=
(e_{t_0}, e_{t_1})_# Π' is the unique optimal transference plan between µ'_{t_0} and
µ'_{t_1}; and µ'_t := (e_t)_# Π' (t_0 ≤ t ≤ t_1) is the unique displacement interpolation between µ'_{t_0} and µ'_{t_1}.
(iv) Under the same assumptions as in (iii), for any t ∈ (0, 1),
(Π ⊗ Π)(dγ dγ̃)-almost surely,
[γ_t = γ̃_t] =⇒ [γ = γ̃].
In other words, the curves seen by Π cannot cross at intermediate
times. If the costs c^{s,t} are continuous, this conclusion extends to all
curves γ, γ̃ ∈ Spt Π.
(v) Under the same assumptions as in (iii), there is a measurable
map Ft : X → Γ (X ) such that, Π(dγ)-almost surely, Ft (γt ) = γ.
Remark 7.31. In Chapter 8 we shall work in the setting of smooth
Riemannian manifolds and derive a quantitative variant of the “no-
crossing” property expressed in (iv).
Corollary 7.32 (Nonbranching is inherited by the Wasserstein
space). Let (X , d) be a complete separable, locally compact length space
and let p ∈ (1, ∞). Assume that X is nonbranching, in the sense that
a geodesic γ : [0, 1] → X is uniquely determined by its restriction to
a nontrivial time-interval. Then also the Wasserstein space Pp (X ) is
nonbranching. Conversely, if Pp (X ) is nonbranching, then X is non-
branching.
optimal transference plan between µ_{t_0} and µ_{t_1}; let Π̃^{t_0,t_1} be another
such plan. The goal is to show that Π̃^{t_0,t_1} = Π^{t_0,t_1}.
Disintegrate Π^{0,t_0} and Π̃^{t_0,t_1} along their common marginal µ_{t_0} and
glue them together. This gives a probability measure on C([0, t_0); X) ×
X × C((t_0, t_1]; X), supported on triples (γ, g, γ') such that γ(t) → g as
t → t_0^−, γ'(t) → g as t → t_0^+. Such triples can be identified with continuous functions on [0, t_1], so what we have is in fact a probability measure
on C([0, t_1]; X). Repeat the operation by gluing this with Π^{t_1,1}, so as
to get a probability measure Π̃ on C([0, 1]; X).
Then let γ̃ be a random variable with law Π̃: By construction,
law(γ̃^{0,t_0}) = law(γ^{0,t_0}), and law(γ̃^{t_1,1}) = law(γ^{t_1,1}), so
law(γ̃^{t_0,t_1}) = law(F(γ̃^{0,t_0})) = law(F(γ^{0,t_0})) = law(γ^{t_0,t_1}).
and similarly
c^{0,1}(γ̃_0, γ_1) ≤ c^{0,t}(γ̃_0, X) + c^{t,1}(X, γ_1).    (7.23)
Summing up and regrouping the right-hand side,
c^{0,1}(γ_0, γ̃_1) + c^{0,1}(γ̃_0, γ_1) ≤ c^{0,t}(γ_0, X) + c^{t,1}(X, γ_1) + c^{0,t}(γ̃_0, X) + c^{t,1}(X, γ̃_1)
= c^{0,1}(γ_0, γ_1) + c^{0,1}(γ̃_0, γ̃_1).
Since the reverse inequality holds true by (7.21), equality has to hold
in all intermediate inequalities, for instance in (7.22). Then it is easy
to see that the path γ̂ defined by γ̂(s) = γ(s) for 0 ≤ s ≤ t, and
γ̂(s) = γ̃(s) for s ≥ t, is a minimizing curve. Since it coincides with γ
on a nontrivial time-interval, it has to coincide with γ everywhere, and
similarly it has to coincide with γ̃ everywhere. So γ = γ̃.
If the costs c^{s,t} are continuous, the previous conclusion holds true
not only Π ⊗ Π-almost surely, but actually for any two minimizing
curves γ, γ̃ lying in the support of Π. Indeed, inequality (7.21) defines
a closed set C in Γ × Γ, where Γ stands for the set of minimizing curves;
so Spt Π × Spt Π = Spt(Π ⊗ Π) ⊂ C.
It remains to prove (v). Let Γ 0,1 be a c-cyclically monotone subset
of X × X on which π is concentrated, and let Γ be the set of mini-
mizing curves γ : [0, 1] → X such that (γ_0, γ_1) ∈ Γ^{0,1}. Let (K_ℓ)_{ℓ∈N} be
a nondecreasing sequence of compact sets contained in Γ, such that
Π[∪K_ℓ] = Π[Γ] = 1. For each ℓ, we define F_ℓ on e_t(K_ℓ) by F_ℓ(γ_t) = γ.
This map is continuous: Indeed, if x_k ∈ e_t(K_ℓ) converges to x, then for
each k we have x_k = (γ_k)_t for some γ_k ∈ K_ℓ, and up to extraction γ_k
converges to γ ∈ K_ℓ, and in particular (γ_k)_t converges to γ_t; but then
F_ℓ(γ_t) = γ. Then we can define F_t on ∪K_ℓ as a map which coincides
with F_ℓ on each K_ℓ. (Obviously, this is the same line of reasoning as in
Theorem 5.30.)
Interpolation of prices
When the path µt varies in time, what becomes of the pair of “prices”
(ψ, φ) in the Kantorovich duality? The short answer is that these func-
tions will also evolve continuously in time, according to Hamilton–
Jacobi equations.
(H_−^{t,s} φ)(x) = sup_{y∈X} [ φ(y) − c^{s,t}(x, y) ].
144 7 Displacement interpolation
The family of operators (H_+^{s,t})_{t>s} (resp. (H_−^{s,t})_{s<t}) is called the for-
ward (resp. backward) Hamilton–Jacobi (or Hopf–Lax, or Lax–Oleinik)
semigroup.
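In concrete terms the forward semigroup is an infimal convolution, which can be checked numerically. Here is a minimal sketch (my own illustration, not from the text) for the quadratic cost c^{s,t}(x, y) = |x − y|^2/(2(t − s)) on a one-dimensional grid, together with a check of the semigroup property:

```python
# Hopf-Lax (forward Hamilton-Jacobi) semigroup on a 1-D grid, for the
# quadratic cost c^{s,t}(x, y) = |x - y|^2 / (2 (t - s)).
# Brute-force discretization for illustration only.

def hopf_lax_forward(psi, grid, s, t):
    """(H_+^{s,t} psi)(y) = inf_x [ psi(x) + |x - y|^2 / (2 (t - s)) ]."""
    dt = t - s
    return [min(p + (x - y) ** 2 / (2 * dt) for x, p in zip(grid, psi))
            for y in grid]

grid = [i / 50 for i in range(-100, 101)]   # uniform grid on [-2, 2]
psi0 = [x * x for x in grid]                # initial "price" psi at time 0

# Semigroup property: H_+^{t2,t3} H_+^{t1,t2} = H_+^{t1,t3}
one_step = hopf_lax_forward(psi0, grid, 0.0, 1.0)
two_steps = hopf_lax_forward(hopf_lax_forward(psi0, grid, 0.0, 0.5), grid, 0.5, 1.0)
err = max(abs(a - b) for a, b in zip(one_step, two_steps))
print(err)  # small discretization error
```

For ψ(x) = x^2 the exact continuum value is (H_+^{0,1}ψ)(y) = y^2/3, which the grid computation reproduces up to the mesh size.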
Roughly speaking, H_+^{s,t} gives the values of ψ at time t, from its values
at time s; while H_−^{s,t} does the reverse. So the semigroups H_− and H_+
are in some sense inverses of each other. Yet it is not true in general
that H_−^{t,s} H_+^{s,t} = Id. Proposition 7.34 below summarizes some of the
main properties of these semigroups; the denomination of “semigroup”
itself is justified by Property (ii).
H_−^{t_2,t_1} H_−^{t_3,t_2} = H_−^{t_3,t_1}.
Proof of Proposition 7.34. Properties (i) and (ii) are immediate conse-
quences of the definitions and Proposition 7.16(iii). To check Property
(iii), e.g. the first half of it, write
H_−^{t,s}(H_+^{s,t} ψ)(x) = sup_y inf_{x′} [ ψ(x′) + c^{s,t}(x′, y) − c^{s,t}(x, y) ].
∂S_+/∂t + |∇^− S_+|²/2 = 0,

where

|∇^− f|(x) := lim sup_{y→x} [f(y) − f(x)]_− / d(x, y),    z_− := max(−z, 0).
and
φ_t(y) − ψ_s(x) ≤ c^{s,t}(x, y),

with equality π_{s,t}(dx dy)-almost surely, where π_{s,t} is any optimal transference plan between µ_s and µ_t.
Since c^{0,s}(x′, x) + c^{s,t}(x, y) + c^{t,1}(y, y′) ≥ c^{0,1}(x′, y′), it follows that

φ_t(y) − ψ_s(x) − c^{s,t}(x, y) ≤ sup_{y′, x′} [ φ_1(y′) − ψ_0(x′) − c^{0,1}(x′, y′) ] ≤ 0.
So φ_t(y) − ψ_s(x) ≤ c^{s,t}(x, y). This inequality does not depend on the
fact that (ψ_0, φ_1) is a tight pair of prices, in the sense of (5.5), but only
on the inequality φ_1 − ψ_0 ≤ c^{0,1}.
φ_t ≤ ψ_t,   with   φ_t = ψ_t   µ_t-almost surely;

but it is not true in general that φ_t = ψ_t everywhere in X.
Remark 7.38. However, the identity ψ1 = φ1 holds true everywhere
as a consequence of the definitions.
Exercise 7.39. After reading the rest of Part I, the reader can come
back to this exercise and check his or her understanding by proving
that, for a quadratic Lagrangian:
(i) The displacement interpolation between two balls in Euclidean
space is always a ball, whose radius increases linearly in time (here I am
identifying a set with the uniform probability measure on this set).
(ii) More generally, the displacement interpolation between two el-
lipsoids is always an ellipsoid.
(iii) But the displacement interpolation between two sets is in gen-
eral not a set.
(iv) The displacement interpolation between two spherical caps on
the sphere is in general not a spherical cap.
(v) The displacement interpolation between two antipodal spheri-
cal caps on the sphere is unique, while the displacement interpolation
between two antipodal points can be realized in an infinite number of
ways.
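As a one-dimensional warm-up to point (i), one can check numerically that for the quadratic cost the displacement interpolant of two uniform measures on intervals is again uniform on an interval with linearly interpolated endpoints. The helper names below are mine; this is an illustrative sketch, not a solution of the exercise:

```python
# 1-D sketch of displacement interpolation for the quadratic cost: between
# uniform measures on [a0, b0] and [a1, b1] the optimal (monotone) map is
# affine, so mu_t is uniform on the linearly interpolated interval.
# Hypothetical helper names; a numerical check, not the book's argument.

def optimal_map_uniform(a0, b0, a1, b1):
    """Monotone rearrangement of U[a0,b0] onto U[a1,b1] (affine in 1-D)."""
    return lambda x: a1 + (x - a0) * (b1 - a1) / (b0 - a0)

def interpolate(x, T, t):
    """McCann interpolation: follow the straight line from x to T(x)."""
    return (1 - t) * x + t * T(x)

T = optimal_map_uniform(0.0, 1.0, 4.0, 6.0)
samples = [interpolate(i / 1000, T, 0.5) for i in range(1001)]
# mu_{1/2} should be uniform on [(0+4)/2, (1+6)/2] = [2.0, 3.5]
print(min(samples), max(samples))
```

Because the interpolated map is again affine, equally spaced points stay equally spaced, which is the one-dimensional shadow of "the interpolation of two balls is a ball".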
p(x) · v = ξ(x) · v,
where v ∈ Tx M , and the dot in the left-hand side just means “p(x)
applied to v”, while the dot in the right-hand side stands for the scalar
product defined by g. As a particular case, if p is the differential of a
function f , the corresponding vector field ξ is the gradient of f , denoted
by ∇f or ∇x f .
If f = f (x, v) is a function on TM , then one can differentiate it with
respect to x or with respect to v. Since T_{(x,v)}(T_x M) ≃ T_x M, both d_x f
and dv f can be seen as linear forms on Tx M ; this allows us to define
∇x f and ∇v f , the “gradient with respect to the position variable” and
the “gradient with respect to the velocity variable”.
Differentiating functions does not pose any particular conceptual
problem, but differentiating vector fields is quite a different story. If ξ
is a vector field on M , then ξ(x) and ξ(y) live in different vector spaces,
so it does not a priori make sense to compare them, unless one identifies
in some sense Tx M and Ty M . (Of course, one could say that ξ is a map
M → TM and define its differential as a map TM → T (TM ) but this
is of little use, because T (TM ) is “too large”; it is much better if we
can come up with a reasonable notion of derivation which produces a
map TM → TM .)
There is in general no canonical way to identify T_x M and T_y M if
x ≠ y; but there is a canonical way to identify T_{γ_0}M and T_{γ_t}M as
t varies continuously. This operation is called parallel transport, or
Levi-Civita transport. A vector field which is transported in a parallel
way along the curve γ will “look constant” to an observer who lives
in M and travels along γ. If M is a surface embedded in R3 , parallel
transport can be described as follows: Start from the tangent plane at
γ(0), and then press your plane onto M along γ, in such a way that
there is no friction (no slip, no rotation) between the plane and the
surface.
Further, note that in the second formula of (7.28) the symbol f˙ stands
for the usual derivative of t → f (γt ); while the symbols ξ˙ and ζ̇ stand
for the covariant derivatives of the vector fields ξ and ζ along γ.
A third approach to covariant derivation is based on coordinates.
Let x ∈ M , then there is a neighborhood O of x which is diffeomorphic
to some open subset U ⊂ Rn . Let Φ be a diffeomorphism U → O, and
let (e_1, . . . , e_n) be the usual basis of R^n. A point m in O is said to have
coordinates (y^1, . . . , y^n) if m = Φ(y^1, . . . , y^n); and a vector v ∈ T_m M
is said to have components v^1, . . . , v^n if d_{(y^1,...,y^n)}Φ · (v^1, . . . , v^n) = v.
Then the coefficients of the metric g are the functions g_{ij} defined by

g(v, v) = Σ_{ij} g_{ij} v^i v^j.
The coordinate point of view reduces everything to “explicit” com-
putations and formulas in Rn ; for instance the derivation of a function
where (g^{ij}) is by convention the inverse of (g_{ij}). Then the covariant
derivation along γ is given by the formula

Dξ^k/Dt = dξ^k/dt + Σ_{ij} Γ^k_{ij} γ̇^i ξ^j.
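As an illustration of this formula (my own numerical sketch, using the standard Christoffel symbols of the round metric on the 2-sphere, which are not computed in the text), one can parallel-transport a vector around a latitude circle and recover the classical holonomy angle 2π cos θ_0:

```python
import math

# Parallel transport on the unit 2-sphere in coordinates (theta, phi): a
# hand-rolled illustration of D xi^k/Dt = d xi^k/dt + sum_ij Gamma^k_ij gammadot^i xi^j.
# Nonzero Christoffel symbols of the round metric g = diag(1, sin^2 theta):
#   Gamma^theta_{phi phi} = -sin(theta) cos(theta)
#   Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cos(theta)/sin(theta)

def transport_along_latitude(theta0, steps=200000):
    """Parallel-transport xi = e_theta once around the latitude theta0 (Euler scheme)."""
    xi_th, xi_ph = 1.0, 0.0            # components (xi^theta, xi^phi)
    dphi = 2 * math.pi / steps         # curve gamma(t) = (theta0, t), gammadot = (0, 1)
    cot = math.cos(theta0) / math.sin(theta0)
    sc = math.sin(theta0) * math.cos(theta0)
    for _ in range(steps):
        # D xi/Dt = 0  =>  d xi^k/dt = -Gamma^k_ij gammadot^i xi^j
        dxi_th = sc * xi_ph            # -Gamma^theta_{phi phi} * xi^phi
        dxi_ph = -cot * xi_th          # -Gamma^phi_{phi theta} * xi^theta
        xi_th += dxi_th * dphi
        xi_ph += dxi_ph * dphi
    return xi_th, xi_ph

# After one loop the vector is rotated by the holonomy angle 2*pi*cos(theta0);
# for theta0 = pi/3 this angle is pi, so xi^theta should come back close to -1.
xi_th, xi_ph = transport_along_latitude(math.pi / 3)
print(xi_th, xi_ph)
```

The observer "living in M" sees a constant vector, yet after a full loop the vector has rotated: this is exactly the failure of a canonical identification of tangent spaces discussed above.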
We have seen that minimizing curves have zero acceleration, and the
converse is also true locally, that is if γ1 is very close to γ0 . A curve which
minimizes the action between its endpoints is called a minimizing
geodesic, or minimal geodesic, or simply a geodesic. The Hopf–Rinow
theorem guarantees that if the manifold M (seen as a metric space) is
complete, then any two points in M are joined by at least one minimal
geodesic. There might be several minimal geodesics joining two points
x and y (to see this, consider two antipodal points on the sphere), but
geodesics are:
d(x, y) = inf { L(γ); γ_0 = x, γ_1 = y }.
Bibliographical notes
where the infimum is taken among all paths (µt )0≤t≤1 satisfying certain
regularity conditions. Brenier himself gave two sketches of the proof for
this formula [88, 164], and another formal argument was suggested by
Otto and myself [671, Section 3]. Rigorous proofs were later provided
by several authors under various assumptions [814, Theorem 8.1] [451]
[30, Chapter 8] (the latter reference contains the most precise results).
The adaptation to Riemannian manifolds has been considered in [278,
431, 491]. We shall come back to these formulas later on, after a more
precise qualitative picture of optimal transport has emerged. One of
the motivations of Benamou and Brenier was to devise new numerical
methods [88, 89, 90, 91]. Wolansky [838] considered a more general
situation in the presence of sources and sinks.
There was a rather amazing precursor to the idea of displacement
interpolation, in the form of Nelson’s theory of “stochastic mechanics”.
Nelson tried to build up a formalism in which quantum effects would
be explained by stochastic fluctuations. For this purpose he considered
an action minimization problem which was also studied by Guerra and
Morato:

inf E ∫₀¹ |Ẋ_t|² dt,
where the infimum is over all random paths (Xt )0≤t≤1 such that
law (X0 ) = µ0 , law (X1 ) = µ1 , and in addition (Xt ) solves the sto-
chastic differential equation
dX_t/dt = σ dB_t/dt + ξ(t, X_t),
where σ > 0 is some coefficient, Bt is a standard Brownian motion,
and ξ is a drift, which is an unknown in the problem. (So the mini-
mization is over all possible couplings (X0 , X1 ) but also over all drifts!)
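A crude Euler–Maruyama discretization shows what a candidate random path in this problem looks like; this is an illustrative sketch only, and the drift below is an arbitrary placeholder chosen to steer the path toward a target, not a minimizer of the action:

```python
import math, random

# Euler-Maruyama sketch of the controlled SDE dX_t = xi(t, X_t) dt + sigma dB_t.
# Illustration of the setup only; the drift is a hypothetical choice.

def simulate(x0, xi, sigma=0.3, steps=1000, rng=random.Random(0)):
    # the default generator is shared across calls, so each path gets fresh noise
    dt = 1.0 / steps
    x = x0
    for k in range(steps):
        t = k * dt
        x += xi(t, x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x

# A crude "bridge"-like drift pulling the path toward X_1 = 1: the pull
# stiffens as t -> 1, so endpoints cluster near 1 whatever the noise.
endpoints = [simulate(0.0, lambda t, x: (1.0 - x) / (1.0 - t + 0.01))
             for _ in range(200)]
mean_end = sum(endpoints) / len(endpoints)
print(mean_end)
```

This mirrors the structure of Nelson's problem: the endpoint law is shaped entirely through the choice of drift, which is the unknown of the minimization.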
This formulation is very similar to the Benamou–Brenier formula just
alluded to, only there is the additional Brownian noise in it, thus it
is more complex in some sense. Moreover, the expected value of the
action is always infinite, so one has to renormalize it to make sense
of Nelson’s problem. Nelson made the incredible discovery that after a
change of variables, minimizers of the action produced solutions of the
free Schrödinger equation in Rn . He developed this approach for some
time, and finally gave up because it was introducing unpleasant non-
local features. I shall give references at the end of the bibliographical
notes for Chapter 23.
It was Otto [669] who first explicitly reformulated the Benamou–
Brenier formula (7.34) as the equation for a geodesic distance in a Rie-
mannian setting, from a formal point of view. Then Ambrosio, Gigli and
Savaré pointed out that if one is not interested in the equations of mo-
tion, but just in the geodesic property, it is simpler to use the metric no-
tion of geodesic in a length space [30]. Those issues were also developed
by other authors working with slightly different formalisms [203, 214].
All the above-mentioned works were mainly concerned with dis-
placement interpolation in Rn . Agueh [4] also considered the case of
cost c(x, y) = |x − y|p (p > 1) in Euclidean space. Then displacement
interpolation on Riemannian manifolds was studied, from a heuristic
point of view, by Otto and myself [671]. Some useful technical tools were
introduced in the field by Cordero-Erausquin, McCann and Schmucken-
schläger [246] for Riemannian manifolds; Cordero-Erausquin adapted
them to the case of rather general strictly convex cost functions in
Rn [243].
The displacement interpolation for more general cost functions,
arising from a smooth Lagrangian, was constructed by Bernard and
Buffoni [105], who first introduced in this context Property (ii) in
Theorem 7.21. At the same time, they made the explicit link with the
Mather minimization problem, which will appear in subsequent chap-
ters. This connection was also studied independently by De Pascale,
Gelli and Granieri [278].
In all these works, displacement interpolation took place in a smooth
structure, resulting in particular in the uniqueness (almost everywhere)
of minimizing curves used in the interpolation, at least if the Lagrangian
is nice enough. Displacement interpolation in length spaces, as pre-
sented in this chapter, via the notion of dynamical transference plan,
was developed more recently by Lott and myself [577]. Theorem 7.21
in this course is new; it was essentially obtained by rewriting the proof
and this would contradict the fact that the support of π is c-cyclically
monotone. Stated otherwise: Given two crossing line segments, we can
shorten the total length of the paths by replacing these lines by the
new transport lines [x1 , y2 ] and [x2 , y1 ] (see Figure 8.1).
Then let
γ1 (t) = (1 − t) x1 + t y1 , γ2 (t) = (1 − t) x2 + t y2
|x_1 − y_2|^2 + |x_2 − y_1|^2
 = |x_1 − X|^2 + |X − y_2|^2 − 2⟨X − x_1, X − y_2⟩ + |x_2 − X|^2 + |X − y_1|^2 − 2⟨X − x_2, X − y_1⟩
 = [t_0^2 + (1 − t_0)^2] (|x_1 − y_1|^2 + |x_2 − y_2|^2) + 4 t_0(1 − t_0) ⟨x_1 − y_1, x_2 − y_2⟩
 ≤ [t_0^2 + (1 − t_0)^2 + 2 t_0(1 − t_0)] (|x_1 − y_1|^2 + |x_2 − y_2|^2)
 = |x_1 − y_1|^2 + |x_2 − y_2|^2,
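The inequality can be sanity-checked on a concrete pair of crossing segments (a small illustration of the shortening argument, not part of the proof):

```python
# If the segments [x1,y1] and [x2,y2] cross, swapping the endpoints does
# not increase the total quadratic cost:
#   |x1 - y2|^2 + |x2 - y1|^2 <= |x1 - y1|^2 + |x2 - y2|^2.
# Self-contained numerical illustration.

def sq(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Two crossing segments in the plane: [(0,0),(1,1)] and [(1,0),(0,1)]
x1, y1 = (0.0, 0.0), (1.0, 1.0)
x2, y2 = (1.0, 0.0), (0.0, 1.0)

crossed = sq(x1, y2) + sq(x2, y1)   # swapped endpoints: 1 + 1 = 2
straight = sq(x1, y1) + sq(x2, y2)  # crossing lines:    2 + 2 = 4
print(crossed <= straight)          # -> True
```

The strict gain here (2 versus 4) is exactly the scalar-product term 4 t_0(1 − t_0)⟨x_1 − y_1, x_2 − y_2⟩ in the computation above.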
Example 8.3. The cost function c(x, y) = d(x, y)2 corresponds to the
Lagrangian function L(x, v, t) = |v|2 , which obviously satisfies the as-
sumptions of Corollary 8.2. In that case the exponent β = 1 is ad-
missible. Moreover, it is natural to expect that the constant CK can
be controlled in terms of just a lower bound on the sectional curvature
of M . I shall come back to this issue later in this chapter (see Open
Problem 8.21).
Example 8.4. The cost function c(x, y) = d(x, y)1+α does not satisfy
the assumptions of Corollary 8.2 for 0 < α < 1. Even if the associated
Lagrangian L(x, v, t) = |v|1+α is not smooth, the equation for mini-
mizing curves is just the geodesic equation, so Assumption (i) in The-
orem 8.1 is still satisfied. Then, by tracking exponents in the proof of
Theorem 8.1, one can find that (8.4) holds true with β = (1+α)/(3−α).
But this is far from optimal: By taking advantage of the homogeneity
of the power function, one can prove that the exponent β = 1 is also
168 8 The Monge–Mather shortening principle
admissible, for all α ∈ (0, 1). (It is the constant, rather than the expo-
nent, which deteriorates as α ↓ 0.) I shall explain this argument in the
Appendix, in the Euclidean case, and leave the Riemannian case as a
delicate exercise. This example suggests that Theorem 8.1 still leaves
room for improvement.
Fig. 8.2. In this example the map γ(0) → γ(1/2) is not well-defined, but the map
γ(1/2) → γ(0) is well-defined and Lipschitz, just as the map γ(1/2) → γ(1). Also
µ0 is singular, but µt is absolutely continuous as soon as t > 0.
Thus, Π ⊗ Π(dγ dγ̃)-almost surely,
Π̃ := 1_K Π / Π[K],
and let π̃ := (e_0, e_1)_# Π̃ be the associated transference plan, and µ̃_t =
(e_t)_# Π̃ the marginal of Π̃ at time t. In particular,

µ̃_1 ≤ (e_1)_# Π / Π[K] = µ_1 / Π[K],

so µ̃_1 is still absolutely continuous.
By Theorem 7.30(ii), (µ̃_t) is a displacement interpolation. Now, µ̃_{t_0}
is concentrated on e_{t_0}(K) ⊂ e_{t_0}(Z) ⊂ Z_{t_0}, so µ̃_{t_0} is singular. But the
first part of the proof rules out this possibility, because µ̃_0 and µ̃_1 are
respectively supported in e_0(K) and e_1(K), which are compact, and µ̃_1
is absolutely continuous.
Idea of the proof of Theorem 8.1. Assume, to fix the ideas, that γ1 and
γ2 cross each other at a point m0 and at time t0 . Close to m0 , these
two curves look like two straight lines crossing each other, with re-
spective velocities v1 and v2 . Now cut these curves on the time inter-
val [t0 − τ, t0 + τ ] and on that interval introduce “deviations” (like a
plumber installing a new piece of pipe to short-cut a damaged region
of a channel) that join the first curve to the second, and vice versa.
This amounts to replacing (on a short interval of time) two curves
with approximate velocity v1 and v2 , by two curves with approximate
velocities (v1 +v2 )/2. Since the time-interval where the modification oc-
curs is short, everything is concentrated in the neighborhood of (m0 , t0 ),
so the modification in the Lagrangian action of the two curves is ap-
proximately
(2τ) [ 2 L(m_0, (v_1 + v_2)/2, t_0) − ( L(m_0, v_1, t_0) + L(m_0, v_2, t_0) ) ].
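For the model case L(x, v, t) = |v|^2 the bracket can be evaluated exactly: 2L((v_1 + v_2)/2) − L(v_1) − L(v_2) = −|v_1 − v_2|^2/2, so the surgery strictly decreases the action as soon as v_1 ≠ v_2. A quick numerical check of this identity (illustration only, not part of the proof):

```python
# For the quadratic Lagrangian L(v) = |v|^2, the gain from averaging two
# velocities is explicit: 2 L((v1+v2)/2) - (L(v1) + L(v2)) = -|v1 - v2|^2 / 2.

def L(v):
    return sum(c * c for c in v)

def averaging_gain(v1, v2):
    mid = [(a + b) / 2 for a, b in zip(v1, v2)]
    return 2 * L(mid) - (L(v1) + L(v2))

v1, v2 = [1.0, 0.0], [0.0, 2.0]
print(averaging_gain(v1, v2))                    # -> -2.5
print(-L([a - b for a, b in zip(v1, v2)]) / 2)   # -> -2.5
```

This quantifies, in the simplest case, why the "plumber's shortcut" is strictly advantageous whenever the two velocities differ.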
number of such balls Bj , each of them having radius no less than δ > 0.
Without loss of generality, all these balls are included in projM (V ), and
it can be assumed that whenever two points X1 and X2 in γ1 ∪ γ2 are
separated by a distance less than δ/4, then one of the balls Bj contains
Bδ/4 (X1 ) ∪ Bδ/4 (X2 ).
If γ1 (t0 ) and γ2 (t0 ) are separated by a distance at least δ/4, then the
conclusion is obvious. Otherwise, choose τ small enough that τ S ≤ δ/4
(recall that S is a strict bound on the maximum speed along the curves);
then on the time-interval [t0 − τ, t0 + τ ] the curves never leave the balls
Bδ/4 (X1 ) ∪ Bδ/4 (X2 ), and therefore the whole trajectories of γ1 and
γ2 on that time-interval have to stay within a single ball Bj . If one
takes into account positions, velocities and time, the system is confined
within Bj × BS (0) × [0, 1] ⊂ V.
On any of these balls Bj , one can introduce a Euclidean system of
coordinates, and perform all computations in that system (write L in
those coordinates, etc.). The distance induced on Bj by that system of
coordinates will not be the same as the original Riemannian distance,
but it can be bounded from above and below by multiples thereof. So
we can pretend that we are really working with a Euclidean metric, and
all conclusions that are obtained, involving only what happens inside
the ball Bj , will remain true up to changing the bounds. Then, for the
sake of all computations, we can freely add points as if we were working
in Euclidean space.
If it can be shown, in that system of coordinates, that
|γ̇_1(t_0) − γ̇_2(t_0)| ≤ C |γ_1(t_0) − γ_2(t_0)|^β,   (8.8)
then this means that (γ_1(t_0), γ̇_1(t_0)) and (γ_2(t_0), γ̇_2(t_0)) are very close
to each other in TM; more precisely they are separated by a distance
which is O(d(γ_1(t_0), γ_2(t_0))^β). Then by Assumption (i) and Cauchy–
Lipschitz theory this bound will be propagated backward and forward
in time, so the distance between (γ_1(t), γ̇_1(t)) and (γ_2(t), γ̇_2(t)) will re-
main bounded by O(d(γ_1(t_0), γ_2(t_0))^β). Thus to conclude the argument
it is sufficient to prove (8.8).
Step 2: Construction of shortcuts. First some notation: Let us
write x1 (t) = γ1 (t), x2 (t) = γ2 (t), v1 (t) = γ̇1 (t), v2 (t) = γ̇2 (t), and also
X1 = x1 (t0 ), V1 = v1 (t0 ), X2 = x2 (t0 ), V2 = v2 (t0 ). The goal is to
control |V1 − V2 | by |X1 − X2 |.
Then the path x_21(t) and its time-derivative v_21(t) are defined symmetri-
cally (see Figure 8.4). These definitions are rather natural: First we try
Fig. 8.4. The paths x12 (t) and x21 (t) obtained by using the shortcuts to switch
from one original path to the other.
Proof of Mather’s estimates 175
In the sequel, the symbol O(m) will stand for any expression which
is bounded by Cm, where C only depends on V and on the regularity
bounds on the Lagrangian L on V. From Cauchy–Lipschitz theory and
Assumption (i),
|v_1 − v_2|(t) + |x_1 − x_2|(t) = O(|X_1 − X_2| + |V_1 − V_2|),   (8.10)
and then by plugging this back into the equation for minimizing curves
we obtain
|v̇_1 − v̇_2|(t) = O(|X_1 − X_2| + |V_1 − V_2|).
Upon integration in time, these bounds imply
x_1(t) − x_2(t) = (X_1 − X_2) + O(τ(|X_1 − X_2| + |V_1 − V_2|));   (8.11)

v_1(t) − v_2(t) = (V_1 − V_2) + O(τ(|X_1 − X_2| + |V_1 − V_2|));   (8.12)

and therefore also

x_1(t) − x_2(t) = (X_1 − X_2) + (t − t_0)(V_1 − V_2) + O(τ^2(|X_1 − X_2| + |V_1 − V_2|)).   (8.13)
As a consequence of (8.12), if τ is small enough (depending only on
the Lagrangian L),
|v_1 − v_2|(t) ≥ |V_1 − V_2|/2 − O(τ|X_1 − X_2|).   (8.14)
Next, from Cauchy–Lipschitz again,
x_2(t_0 + τ) − x_1(t_0 + τ) = X_2 − X_1 + τ(V_2 − V_1) + O(τ^2(|X_1 − X_2| + |V_1 − V_2|));
and since a similar expression holds true with τ replaced by −τ , one has
[x_2(t_0 + τ) − x_1(t_0 + τ)]/2 − [x_1(t_0 − τ) − x_2(t_0 − τ)]/2
 = (X_2 − X_1) + O(τ^2(|X_1 − X_2| + |V_1 − V_2|)),   (8.15)

and also

[x_2(t_0 + τ) − x_1(t_0 + τ)]/2 + [x_1(t_0 − τ) − x_2(t_0 − τ)]/2
 = τ(V_2 − V_1) + O(τ^2(|X_1 − X_2| + |V_1 − V_2|)).   (8.16)
v_12(t) − (v_1(t) + v_2(t))/2 = O(|X_1 − X_2|/τ + τ|V_1 − V_2|).   (8.17)
After integration in time and use of (8.16), one obtains

x_12(t) − (x_1(t) + x_2(t))/2
 = x_12(t_0) − (x_1(t_0) + x_2(t_0))/2 + O(|X_1 − X_2| + τ^2|V_1 − V_2|)
 = O(|X_1 − X_2| + τ|V_1 − V_2|).   (8.18)

In particular,

|x_12 − x_21|(t) = O(|X_1 − X_2| + τ|V_1 − V_2|).   (8.19)
and similarly
L(x_2, v_2) − L((x_1 + x_2)/2, v_2)
 = ∇_x L((x_1 + x_2)/2, v_2) · (x_2 − x_1)/2 + O(|x_1 − x_2|^{1+α}).
2 2
Moreover,
∇_x L((x_1 + x_2)/2, v_1) − ∇_x L((x_1 + x_2)/2, v_2) = O(|v_1 − v_2|^α).
The combination of these three identities, together with estimates (8.11)
and (8.12), yields
L(x_1, v_1) + L(x_2, v_2) − [ L((x_1 + x_2)/2, v_1) + L((x_1 + x_2)/2, v_2) ]
 = O(|x_1 − x_2|^{1+α} + |x_1 − x_2| |v_1 − v_2|^α)
 = O(|X_1 − X_2|^{1+α} + τ|V_1 − V_2|^{1+α} + |X_1 − X_2| |V_1 − V_2|^α + τ^{1+α}|V_1 − V_2| |X_1 − X_2|^α).
L(x_21, v_21) − L(x_21, (v_1 + v_2)/2)
 = ∇_v L(x_21, (v_1 + v_2)/2) · (v_21 − (v_1 + v_2)/2) + O(|v_21 − (v_1 + v_2)/2|^{1+α}),

and

∇_v L(x_12, (v_1 + v_2)/2) − ∇_v L(x_21, (v_1 + v_2)/2) = O(|x_12 − x_21|^α).
Combining this with (8.9), (8.17) and (8.19), one finds

L(x_12, v_12) + L(x_21, v_21) − [ L(x_12, (v_1 + v_2)/2) + L(x_21, (v_1 + v_2)/2) ]
 = O(|v_12 − (v_1 + v_2)/2|^{1+α} + |v_12 − (v_1 + v_2)/2| |x_12 − x_21|^α)
 = O(|X_1 − X_2|^{1+α}/τ^{1+α} + τ^{1+α}|V_1 − V_2|^{1+α}).
After that,

L(x_12, (v_1 + v_2)/2) − L((x_1 + x_2)/2, (v_1 + v_2)/2)
 = ∇_x L((x_1 + x_2)/2, (v_1 + v_2)/2) · (x_12 − (x_1 + x_2)/2) + O(|x_12 − (x_1 + x_2)/2|^{1+α}),

L(x_21, (v_1 + v_2)/2) − L((x_1 + x_2)/2, (v_1 + v_2)/2)
 = ∇_x L((x_1 + x_2)/2, (v_1 + v_2)/2) · (x_21 − (x_1 + x_2)/2) + O(|x_21 − (x_1 + x_2)/2|^{1+α}),

and now by (8.9) the terms in ∇_x cancel each other exactly upon summation, so the bound (8.18) leads to

L(x_12, (v_1 + v_2)/2) + L(x_21, (v_1 + v_2)/2) − 2 L((x_1 + x_2)/2, (v_1 + v_2)/2)
 = O(|x_21 − (x_1 + x_2)/2|^{1+α})
 = O(|X_1 − X_2|^{1+α} + τ^{1+α}|V_1 − V_2|^{1+α}).
From Step 3, we can replace in the integrand all positions by (x1 +x2 )/2,
and v12 , v21 by (v1 + v2 )/2, up to a small error. Collecting the various
error terms, and taking into account the smallness of τ , one obtains
(dropping the t variable again)
(1/2τ) ∫_{t_0−τ}^{t_0+τ} [ L((x_1 + x_2)/2, v_1) + L((x_1 + x_2)/2, v_2) − 2 L((x_1 + x_2)/2, (v_1 + v_2)/2) ] dt
 ≤ C ( |X_1 − X_2|^{1+α}/τ^{1+α} + τ|V_1 − V_2|^{1+α} ).   (8.21)
Complement: Ruling out focalization by shortening 179
On the other hand, from the convexity condition (iii) and (8.14),
the left-hand side of (8.21) is bounded below by
(1/2τ) ∫_{t_0−τ}^{t_0+τ} K |v_1 − v_2|^{2+κ} dt ≥ K ( |V_1 − V_2| − Aτ|X_1 − X_2| )^{2+κ}.   (8.22)
If |V1 − V2 | ≤ 2Aτ |X1 − X2 |, then the proof is finished. If this is not the
case, this means that |V1 − V2 | − Aτ |X1 − X2 | ≥ |V1 − V2 |/2, and then
the comparison of the upper bound (8.21) and the lower bound (8.22)
yields
|V_1 − V_2|^{2+κ} ≤ C ( |X_1 − X_2|^{1+α}/τ^{1+α} + τ|V_1 − V_2|^{1+α} ).   (8.23)
If |V_1 − V_2| = 0, then the proof is finished. Otherwise, the conclusion
follows by choosing τ small enough that Cτ|V_1 − V_2|^{1+α} ≤ (1/2)|V_1 − V_2|^{2+κ};
then τ = O(|V_1 − V_2|^{1+κ−α}) and (8.23) implies

|V_1 − V_2| = O(|X_1 − X_2|^β),   β = (1 + α)/[(1 + α)(1 + κ − α) + 2 + κ].   (8.24)
In the particular case when κ = 0 and α = 1, one has

|V_1 − V_2|^2 ≤ C ( |X_1 − X_2|^2/τ^2 + τ|V_1 − V_2|^2 ),

and then, for τ small enough,

|V_1 − V_2| ≤ C |X_1 − X_2|/τ.   (8.25)
The upper bound on τ depends on the regularity and strict convexity
of L in V, but also on t_0 since τ cannot be greater than min(t_0, 1 − t_0).
This is actually the only way in which t0 explicitly enters the estimates.
So inequality (8.25) concludes the argument.
Remark 8.13. If L does not depend on t, then one can apply the pre-
vious result for any T = 2^{−k}, and then use a compactness argument to
construct a constant curve (µ_t)_{t∈R} satisfying Properties (i)–(vi) above.
In particular µ0 is a stationary measure for the Lagrangian system.
Before giving its proof, let me explain briefly why Theorem 8.11
is interesting from the point of view of the dynamics. A trajectory
of the dynamical system defined by the Lagrangian L is a curve γ
which is locally action-minimizing; that is, one can cover the time-
interval by small subintervals on which the curve is action-minimizing.
It is a classical problem in mechanics to construct and study periodic
trajectories having certain given properties. Theorem 8.11 does not
construct a periodic trajectory, but at least it constructs a random
trajectory γ (or equivalently a probability measure Π on the set of
trajectories) which is periodic on average: The law µt of γt satisfies
µt+T = µt . This can also be thought of as a probability measure Π on
the set of all possible trajectories of the system.
Of course this in itself is not too striking, since there may be a great
many invariant measures for a dynamical system, and some of them
are often easy to construct. The important point in the conclusion of
Theorem 8.11 is that the curve γ is not “too random”, in the sense
that the random variable (γ(0), γ̇(0)) takes values in a Lipschitz graph.
(If (γ(0), γ̇(0)) were a deterministic element in TM , this would mean
that Π just sees a single periodic curve. Here we may have an infinite
collection of curves, but still it is not “too large”.)
Another remarkable property of the curves γ is the fact that the
minimization property holds along any time-interval in R, not neces-
sarily small.
C^{0,kT}(µ, µ) ≤ C^{0,kT}(µ_0, µ_{kT}) ≤ Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ_{jT}, µ_{(j+1)T}).
C^{0,T}(µ, µ) ≤ C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{jT} )   (8.28)

 = C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{(j+1)T} ).   (8.29)
Introduction to Mather’s theory 185
C^{0,T}( (1/k) Σ_{j=0}^{k−1} µ̃_{jT}, (1/k) Σ_{j=0}^{k−1} µ̃_{(j+1)T} ) ≤ (1/k) Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ̃_{jT}, µ̃_{(j+1)T})

 = (1/k) C^{0,kT}(µ̃_0, µ̃_{kT}),   (8.30)
where the last equality is a consequence of Property (ii) in Theorem
7.21. Inequalities (8.29) and (8.30) together imply
C^{0,T}(µ, µ) ≤ (1/k) C^{0,kT}(µ̃_0, µ̃_{kT}) = (1/k) C^{0,kT}(µ̃, µ̃).
Since the reverse inequality holds true by (8.27), in fact all the inequal-
ities in (8.27), (8.29) and (8.30) have to be equalities. In particular,
C^{0,kT}(µ_0, µ_{kT}) = Σ_{j=0}^{k−1} C^{jT,(j+1)T}(µ_{jT}, µ_{(j+1)T}).   (8.31)
C t1 ,t2 (µt1 , µt2 ) + C t2 ,t3 (µt2 , µt3 ) = C t1 ,t3 (µt1 , µt3 ) (8.32)
holds true for any three intermediate times t1 < t2 < t3 . By periodicity,
it suffices to do this for t1 ≥ 0. If 0 ≤ t1 < t2 < t3 ≤ T , then (8.32)
is true by the property of displacement interpolation (Theorem 7.21
again). If jT ≤ t1 < t2 < t3 ≤ (j + 1)T , this is also true because of the
T -periodicity. In the remaining cases, we may choose k large enough
that t3 ≤ kT . Then
C^{0,kT}(µ_0, µ_{kT}) ≤ C^{0,t_1}(µ_0, µ_{t_1}) + C^{t_1,t_3}(µ_{t_1}, µ_{t_3}) + C^{t_3,kT}(µ_{t_3}, µ_{kT})
 ≤ C^{0,t_1}(µ_0, µ_{t_1}) + C^{t_1,t_2}(µ_{t_1}, µ_{t_2}) + C^{t_2,t_3}(µ_{t_2}, µ_{t_3}) + C^{t_3,kT}(µ_{t_3}, µ_{kT})
 ≤ Σ_j C^{s_j,s_{j+1}}(µ_{s_j}, µ_{s_{j+1}}),   (8.33)
(Consider for instance the particular case when 0 < t_1 < t_2 < T <
t_3 < 2T; then one can write C^{0,t_1} + C^{t_1,t_2} + C^{t_2,T} = C^{0,T}, and also
C^{T,t_3} + C^{t_3,2T} = C^{T,2T}. So C^{0,t_1} + C^{t_1,t_2} + C^{t_2,T} + C^{T,t_3} + C^{t_3,2T} =
C^{0,T} + C^{T,2T}.)
But (8.34) is just C 0,kT (µ0 , µkT ), as shown in (8.31). So there is
in fact equality in all these inequalities, and (8.32) follows. Then by
Theorem 7.21, (µt ) defines a displacement interpolation between any
two of its intermediate values. This proves (i). At this stage we have
also proven (iii) in the case when t0 = 0.
Now for any t ∈ R, one has, by (8.32) and the T -periodicity,
c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) + c^{t_2,t_3}(γ_{t_2}, γ_{t_3}) = c^{t_1,t_3}(γ_{t_1}, γ_{t_3}),   (8.35)
where the property of optimality of the path (µt )t∈R was used in the
last step. So all these inequalities are equalities, and in particular
E [ c^{t_1,t_3}(γ_{t_1}, γ_{t_3}) − c^{t_1,t_2}(γ_{t_1}, γ_{t_2}) − c^{t_2,t_3}(γ_{t_2}, γ_{t_3}) ] = 0.
The story does not end here. First, there is a powerful dual point
of view to Mather’s theory, based on solutions to the dual Kantorovich
problem; this is a maximization problem defined by
sup ∫ (φ − ψ) dµ,   (8.36)
Fig. 8.5. In the left figure, the pendulum oscillates with little energy between two
extreme positions; its trajectory is an arc of circle which is described clockwise, then
counterclockwise, then clockwise again, etc. In the right figure, the pendulum has
much more energy and draws complete circles again and again, either clockwise or
counterclockwise.
a certain critical value; above that value, the Mather measures are sup-
ported by revolutions. At the critical value, the Mather and Aubry sets
differ: the Aubry set (viewed in the variables (x, v)) is the union of the
two revolutions of infinite period.
The dual point of view in Mather’s theory, and the notion of the
Aubry set, are intimately related to the so-called weak KAM the-
ory, in which stationary solutions of Hamilton–Jacobi equations play
a central role. The next theorem partly explains the link between the
two theories.
Theorem 8.17 (Mather critical value and stationary Hamilton–
Jacobi equation). With the same notation as in Theorem 8.11, as-
sume that the Lagrangian L does not depend on t, and let ψ be a
Lipschitz function on M, such that H_+^{0,t}ψ = ψ + c̃ t for all times t ≥ 0;
that is, ψ is left invariant by the forward Hamilton–Jacobi semigroup,
except for the addition of a constant which varies linearly in time. Then,
necessarily c̃ = c, and the pair (ψ, H_+^{0,T}ψ) = (ψ, ψ + c T) is optimal in the
dual Kantorovich problem with cost function c^{0,T}, and initial and final
measures equal to µ.
Remark 8.18. The equation H_+^{0,t}ψ = ψ + c t is a way to reformulate
the stationary Hamilton–Jacobi equation H(x, ∇ψ(x)) + c = 0.
Yet another reformulation would be obtained by changing the forward
Hamilton–Jacobi semigroup for the backward one. Theorem 8.17 does
not guarantee the existence of such stationary solutions, it just states
that if such solutions exist, then the value of the constant c is uniquely
But even for a “uniformly convex” Lagrangian there are several ex-
tensions of Theorem 8.1 which would be of interest, such as (a) getting
rid of the compactness assumption, or at least control the dependence of
constants at infinity; and (b) getting rid of the smoothness assumptions.
I shall discuss both problems in the most typical case L(x, v, t) = |v|2 ,
i.e. c(x, y) = d(x, y)2 .
Intuitively, Mather’s estimates are related to the local behavior of
geodesics (they should not diverge too fast), and to the convexity prop-
erties of the square distance function d2 (x0 , · ). Both features are well
captured by lower bounds on the sectional curvature of the manifold.
Fortunately, there is a generalized notion of sectional curvature bounds,
due to Alexandrov, which makes sense in a general metric space, with-
out any smoothness; metric spaces which satisfy these bounds are called
Alexandrov spaces. (This notion will be explained in more detail in
Chapter 26.) In such spaces, one could hope to solve problems (a)
and (b) at the same time. Although the proofs in the present chap-
ter strongly rely on smoothness, I would be ready to believe in the
following statement (which might be not so difficult to prove):
Let L1 and L2 stand for the respective lengths of γ1 and γ2 , and let D
be a bound on the diameter of (γ1 ∪ γ2 )([0, 1]). Then
|L_1 − L_2| ≤ [ C√D / (t_0(1 − t_0)) ] √( d(γ_1(t_0), γ_2(t_0)) ),
Further, let
γ1 (t) = (1 − t) x1 + t y1 , γ2 (t) = (1 − t) x2 + t y2 .
Proof of Theorem 8.23. First note that it suffices to work in the affine
space generated by x1 , y1 , x2 , y2 , which is of dimension at most 3; hence
all the constants will be independent of the dimension n. For notational
simplicity, I shall assume that t0 = 1/2, which has no important in-
fluence on the computations. Let X1 := γ1 (1/2), X2 := γ2 (1/2). It is
sufficient to show that
where δx and δy are vectors of small norm (recall that x1 − y1 has unit
norm). Of course
X_1 − X_2 = (δx + δy)/2,   x_2 − x_1 = δx,   y_2 − y_1 = δy;
so to conclude the proof it is sufficient to show that
Appendix: Lipschitz estimates for power cost functions 197
|(δx + δy)/2| ≥ K (|δx| + |δy|),   (8.43)
as soon as |δx| and |δy| are small enough, and (8.40) is satisfied.
By using the formulas |a + b|2 = |a|2 + 2a, b + |b|2 and
(1 + ε)^{(1+α)/2} = 1 + [(1 + α)/2] ε − [(1 + α)(1 − α)/8] ε^2 + O(ε^3),
one easily deduces from (8.40) that
|δx − δy|^2 − |δx|^2 − |δy|^2 ≤ (1 − α) ( ⟨δx − δy, e⟩^2 − ⟨δx, e⟩^2 − ⟨δy, e⟩^2 ) + O(|δx|^3 + |δy|^3).
(which is indeed a scalar product because α > 0), and denote the
associated norm by ‖ · ‖. Then the above conclusion can be summarized
into

⟨δx, δy⟩ ≥ O(‖δx‖^3 + ‖δy‖^3).   (8.44)
It follows that

‖(δx + δy)/2‖^2 = (1/4) ( ‖δx‖^2 + ‖δy‖^2 + 2⟨δx, δy⟩ )
 ≥ (1/4) ( ‖δx‖^2 + ‖δy‖^2 ) + O(‖δx‖^3 + ‖δy‖^3).
So inequality (8.43) is indeed satisfied if |δx| + |δy| is small enough.
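The Taylor expansion invoked above is elementary but easy to get wrong by a sign; here is a quick numerical sanity check (illustration only):

```python
# Sanity check of the expansion
#   (1+eps)^((1+alpha)/2) = 1 + (1+alpha)/2 * eps - (1+alpha)(1-alpha)/8 * eps^2 + O(eps^3).

alpha, eps = 0.5, 1e-3
exact = (1 + eps) ** ((1 + alpha) / 2)
approx = 1 + (1 + alpha) / 2 * eps - (1 + alpha) * (1 - alpha) / 8 * eps ** 2
print(abs(exact - approx))  # residual of order eps^3
```

The residual is of order ε^3, confirming the quadratic coefficient −(1 + α)(1 − α)/8.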
Exercise 8.25. Extend this result to the cost function d(x, y)^{1+α} on a
Riemannian manifold, when γ and γ̃ stay within a compact set.
Hints: This tricky exercise is only for a reader who feels very comfort-
able. One can use a reasoning similar to that in Step 2 of the above
proof, introducing a sequence (γ^{(k)}, γ̃^{(k)}) which is asymptotically the
“worst possible”, and converges, up to extraction of a subsequence,
Bibliographical notes
Mather [602, 603] and Mañé [592, 593]. The reader can also consult
the book by Fathi [347], and (with a complementary point of view) the
one by Contreras and Iturriaga [238]. Also available are some technical
notes by Ishii [488], and the review works [604, 752].
The proof of Theorem 8.17, as I wrote it, is a minor variation of
an argument shown to me by Fathi. Related considerations appear in
a recent work by Bernard and Buffoni [106], who analyze the weak
KAM theory in light of the abstract Kantorovich duality. One may
also consult [278].
From its very beginning, the weak KAM theory has been associ-
ated with the theory of viscosity solutions of Hamilton–Jacobi equa-
tions. An early work on the subject (anterior to Mather’s papers) is an
unpublished preprint by P.-L. Lions, Papanicolaou and Varadhan [564].
Recently, the weak KAM theory has been related to the large-time
behavior of Hamilton–Jacobi equations [69, 107, 346, 349, 383, 384,
385, 487, 645, 707]. Aubry sets are also related to the C¹ regularity
of Hamilton–Jacobi equations, which has important applications in
the theory of dynamical systems [102, 103, 350]. See also Evans and
Gomes [332, 333, 334, 423] and the references therein for an alternative
point of view.
In this chapter I presented Mather’s problem in terms of trajectories
and transport cost. There is an alternative presentation in terms of
invariant measures, following an idea by Mañé. In Mañé’s version of
the problem, the unknown is a probability measure µ(dx dv) on the
tangent bundle TM ; it is stationary in the sense that ∇x · (v µ) = 0
(this is a stationary kinetic transport equation), and it should minimize
the action ∫ L(x, v) µ(dx dv). Then one can show that µ is actually
invariant under the Lagrangian flow defined by L. As Gomes pointed
out to me, this approach has the drawback that the invariance of µ is
not built in from the definition; but it has several nice advantages:
(So in this case there is no need for an upper bound on the distances
between x1 , x2 , y1 , y2 .) The general case where K might be negative
seems to be considerably trickier. As a consequence of (8.45), Theorem 8.7
holds when the cost is the squared distance on an Alexandrov space
with nonnegative curvature; but this can also be proven by the method
of Figalli and Juillet [366].
Theorem 8.22 takes inspiration from the no-crossing proof in [246,
Lemma 5.3]. I don’t know whether the Hölder-1/2 regularity is optimal,
and I don’t know either whether it is possible/useful to obtain similar
estimates for more general cost functions.
9
Solution of the Monge problem I: Global approach
In the present chapter and the next one I shall investigate the solv-
ability of the Monge problem for a Lagrangian cost function. Recall
from Theorem 5.30 that it is sufficient to identify conditions under
which the initial measure µ does not see the set of points where the
c-subdifferential of a c-convex function ψ is multivalued.
Consider a Riemannian manifold M , and a cost function c(x, y) on
M × M , deriving from a Lagrangian function L(x, v, t) on TM × [0, 1]
satisfying the classical conditions of Definition 7.6. Let µ0 and µ1 be
two given probability measures, and let (µt )0≤t≤1 be a displacement
interpolation, written as the law of a random minimizing curve γ at
time t.
If the Lagrangian satisfies adequate regularity and convexity prop-
erties, Theorem 8.5 shows that the coupling (γ(s), γ(t)) is always de-
terministic as soon as 0 < s < 1, however singular µ0 and µ1 might
be. The question whether one can construct a deterministic coupling
of (µ0 , µ1 ) is much more subtle, and cannot be answered without reg-
ularity assumptions on µ0 . In this chapter, a simple approach to this
problem will be attempted, but only with partial success, since even-
tually it will work out only for a particular class of cost functions,
including at least the quadratic cost in Euclidean space (arguably the
most important case).
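To make the stakes concrete, here is a small numerical illustration of mine (not an argument from the book): for the quadratic cost in R² and uniform discrete marginals on generic point clouds — a crude finite stand-in for "µ regular" — a linear-programming solver already returns a deterministic (Monge) plan, consistent with Theorem 9.4. All names and the setup below are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 6
X = rng.normal(size=(n, 2))  # support of mu (uniform weights 1/n)
Y = rng.normal(size=(n, 2))  # support of nu (uniform weights 1/n)

# Quadratic cost matrix c(x_i, y_j) = |x_i - y_j|^2
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)

# Kantorovich LP: minimize sum_ij C_ij pi_ij over couplings pi >= 0,
# with marginal constraints (pi flattened row-major).
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # i-th row sum    = mu_i = 1/n
    A_eq[n + i, i::n] = 1.0            # i-th column sum = nu_i = 1/n
b_eq = np.full(2 * n, 1.0 / n)

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
pi = res.x.reshape(n, n)

# For generic data the optimal plan is a (scaled) permutation matrix:
# each source point is sent to exactly one target point.
support_per_row = (pi > 1e-6).sum(axis=1)
print(support_per_row)  # expect all ones: the coupling is deterministic
```

By contrast, when µ itself is concentrated on a low-dimensional set (as in Remark 9.5 below), optimal plans may be forced to split mass and no such map exists.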
Our main assumption on the cost function c will be:
Assumption (C): For any c-convex function ψ and any x ∈ M , the
c-subdifferential ∂c ψ(x) is pathwise connected.
cannot be two different points x for which γ(1/2) is the same.) Indeed,
assume y ∈ ∂c ψ(x) and y′ ∈ ∂c ψ(x′), with ψ(x) < +∞, ψ(x′) < +∞,
and let γ and γ′ be minimizing geodesics between x and y on the one
hand, x′ and y′ on the other hand. It follows from the definitions of
subdifferential that
ψ(x) + c(x, y) ≤ ψ(x′) + c(x′, y),
ψ(x′) + c(x′, y′) ≤ ψ(x) + c(x, y′).
Thus
c(x, y) + c(x′, y′) ≤ c(x, y′) + c(x′, y).
Then by (9.1),
d(x, x′) ≤ C d(γ(1/2), γ′(1/2))^β.
This implies that m = γ(1/2) determines x = F (m) unambiguously,
and even that F is Hölder-β. (Obviously, this is the same reasoning as
in the proof of Theorem 8.5.)
Now, cover M by a countable number of open sets in which M is
diffeomorphic to a subset U of Rn , via some diffeomorphism ϕU . In each
of these open sets U , consider the union HU of all hyperplanes passing
through a point of rational coordinates, orthogonal to a unit vector
with rational coordinates. Transport this set back to M thanks to the
local diffeomorphism, and take the union over all the sets U . This gives
a set D ⊂ M with the following properties: (i) It is of dimension n − 1;
(ii) It meets every nontrivial continuous curve drawn on M (to see this,
write the curve locally in terms of ϕU and note that, by continuity, at
least one of the coordinates of the curve has to become rational at some
time).
Next, let x ∈ Z, and let y0 , y1 be two distinct elements of ∂c ψ(x).
By assumption there is a continuous curve (yt )0≤t≤1 lying entirely in
∂c ψ(x). For each t, introduce an action-minimizing curve (γt (s))0≤s≤1
between x and yt (s here is the time parameter along the curve). De-
fine mt := γt (1/2). This is a continuous path, nontrivial (otherwise
γ0 (1/2) = γ1 (1/2), but two minimizing trajectories starting from x cannot
cross in their middle, or else they have to coincide at all times by (9.1)).
So there has to be some t such that mt ∈ D. Moreover, the map F constructed
above satisfies F (mt ) = x for all t. It follows that x ∈ F (D).
(See Figure 9.1.)
As a conclusion, Z ⊂ F (D). Since D is of Hausdorff dimension n − 1
and F is β-Hölder, the dimension of F (D) is at most (n − 1)/β.
Fig. 9.1. Scheme of proof for Theorem 9.2. Here there is a curve (yt )0≤t≤1 lying
entirely in ∂c ψ(x), and there is a nontrivial path (mt )0≤t≤1 obtained by taking the
midpoint between x and yt . This path has to meet D; but its image by γ(1/2) → γ(0)
is {x}, so x ∈ F (D).
Particular as it may seem, this case is one of the most important for
applications, so I shall state the result as a separate theorem.
Remark 9.5. The assumption that µ does not give mass to sets of dimension
at most n − 1 is optimal for the existence of a Monge coupling,
as can be seen by choosing µ = H¹|_{[0,1]×{0}} (the one-dimensional Hausdorff
measure concentrated on the segment [0, 1] × {0} in R²), and then
ν = (1/2) H¹|_{([0,1]×{−1}) ∪ ([0,1]×{+1})}. (See Figure 9.2.) It is also optimal
for the uniqueness, as can be seen by taking µ = (1/2) H¹|_{{0}×[−1,1]} and
ν = (1/2) H¹|_{[−1,1]×{0}}. In fact, whenever µ, ν ∈ P₂(Rⁿ) are supported
In the next chapter, we shall see that Theorem 9.4 can be improved
in at least two ways: Equation (9.4) can be rewritten y = ∇ψ(x);
Fig. 9.2. The source measure is drawn as a thick line, the target measure as a thin
line; the cost function is quadratic. On the left, there is a unique optimal coupling
but no optimal Monge coupling. On the right, there are many optimal couplings, in
fact any transference plan is optimal.
Bibliographical notes
It is classical that the image of a set of Hausdorff dimension d by a
Lipschitz map is contained in a set of Hausdorff dimension at most d:
See for instance [331, p. 75]. There is no difficulty in modifying the
proof to show that the image of a set of Hausdorff dimension d by a
Hölder-β map is contained in a set of dimension at most d/β.
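The modification mentioned here is the usual covering estimate; a sketch in my notation:

```latex
% S of Hausdorff dimension d, F beta-Hoelder: |F(a) - F(b)| <= C |a-b|^beta.
% For any d' > d, cover S by balls B(x_i, r_i) with \sum_i r_i^{d'} < \varepsilon.
% Then F(S \cap B(x_i, r_i)) \subset B(F(x_i), C r_i^\beta), and
\sum_i \bigl(C\,r_i^{\beta}\bigr)^{d'/\beta}
   \;=\; C^{d'/\beta} \sum_i r_i^{d'} \;<\; C^{d'/\beta}\,\varepsilon,
% so \mathcal H^{d'/\beta}[F(S)] = 0, hence \dim_H F(S) \le d'/\beta;
% letting d' decrease to d gives the bound d/beta.
```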
The proof of Theorem 9.2 is adapted from a classical argument
according to which a real-valued convex function ψ on Rn has a single-
valued subdifferential everywhere out of a set of dimension at most
n − 1; see [11, Theorem 2.2]. The key estimate for the proof of the
latter theorem is that (Id + ∂ψ)−1 exists and is Lipschitz; but this can
be seen as a very particular case of the Mather shortening lemma. In
the next chapter another line of argumentation for that differentiability
theorem, more local, will be provided.
The paternity of Theorem 9.4 is shared by Brenier [154, 156] with
Rachev and Rüschendorf [722]; it builds upon earlier work by Knott and
Smith [524], who already knew that an optimal coupling lying entirely
in the subdifferential of a convex function would be optimal. Brenier
rewrote the result as a beautiful polar factorization theorem, which
is presented in detail in [814, Chapter 3].
The nonuniqueness statement in Remark 9.5 was formulated by
McCann [613]. Related problems (existence and uniqueness of op-
timal couplings between measures supported on polygons) are dis-
cussed by Gangbo and McCann [400], in relation to problems of shape
recognition.
Other forms of Theorem 9.4 appear in Rachev and Rüschendorf [696], in
particular an extension to infinite-dimensional separable Hilbert spaces;
the proof is reproduced in [814, Second Proof of Theorem 2.9]. (This
problem was also considered in [2, 254].) All these arguments are based
on duality; then more direct proofs, which do not use the Kantorovich
duality explicitly, were found by Gangbo [395], and also Caffarelli [187]
(who gives credit to Varadhan for this approach).
A probabilistic approach to Theorem 9.4 was studied by Mikami
and Thieullen [628, 630]. The idea is to consider a minimization prob-
lem over paths which are not geodesics, but geodesics perturbed by
some noise; then let the noise vanish. This is reminiscent of Nelson’s
approach to quantum mechanics, see the bibliographical notes of Chap-
ters 7 and 23.
McCann [613] extended Theorem 9.4 by removing the assumption
of bounded second moment and even the weaker assumption of finite
transport cost: Whenever µ does not charge sets of dimension n − 1,
there exists a unique coupling of (µ, ν) which takes the form y = ∇Ψ (x),
where Ψ is a lower semicontinuous convex function. The tricky part
in this statement is the uniqueness. This theorem will be proven in
the next chapter (see Theorem 10.42, Corollary 10.44 and Particular
Case 10.45).
Ma, Trudinger and X.-J. Wang [585, Section 7.5] were the first to
seriously study Assumption (C); they had the intuition that it was
connected to a certain fourth-order differential condition on the cost
function which plays a key role in the smoothness of optimal transport.
Later Trudinger and Wang [793], and Loeper [570] showed that the
above-mentioned differential condition is essentially, under adequate
geometric and regularity assumptions, equivalent to Assumption (C).
These issues will be discussed in more detail in Chapter 12. (See in
particular Proposition 12.15(iii).)
The counterexample in Proposition 9.6 is extracted from [585]. The
fact that c(x, y) = (1 + |x − y|²)^{p/2} satisfies Assumption (C) on the ball
of radius 1/√(p − 1) is also taken from [585, 793]. It is Loeper [571] who
discovered that the squared geodesic distance on Sⁿ⁻¹ satisfies Assumption
(C); then a simplified argument was devised by von Nessi [824].
As mentioned in the end of the chapter, by combining Loeper’s
result with Theorems 8.1, 9.2 and 5.30, one can mimic the proof of
Theorem 9.4 and get the unique solvability of the Monge problem for
the quadratic distance on the sphere, as soon as µ does not see sets of
dimension at most n − 1. Such a theorem was first obtained for general
compact Riemannian manifolds by McCann [616], with a completely
different argument.
Other examples of cost functions satisfying Assumption (C) will
be listed in Chapter 12 (for instance |x − y|⁻², or −|x − y|^p /p for
−2 ≤ p ≤ 1, or |x − y|² + |f (x) − g(y)|², where f and g are convex
and 1-Lipschitz). But these other examples do not come from a smooth
convex Lagrangian, so it is not clear whether they satisfy Assumption
(ii) in Theorem 9.2.
In the particular case when ν has finite support, one can prove
the unique solvability of the Monge problem under much more general
assumptions, namely that the cost function is continuous, and µ does
not charge sets of the form {x; c(x, a) − c(x, b) = k} (where a, b, k
are arbitrary), see [261]. This condition was recently used again by
Gozlan [429].
10
Solution of the Monge problem II: Local approach
A heuristic argument
Let ψ be a c-convex function on a Riemannian manifold M , and φ = ψ c .
Assume that y ∈ ∂c ψ(x); then, from the definition of c-subdifferential,
C. Villani, Optimal Transport, Grundlehren der mathematischen Wissenschaften 338,
© Springer-Verlag Berlin Heidelberg 2009
one has, for all x̃ ∈ M,
ψ(x) + c(x, y) ≤ ψ(x̃) + c(x̃, y).
It follows that
ψ(x) − ψ(x̃) ≤ c(x̃, y) − c(x, y).   (10.2)
Now the idea is to see what happens when x̃ → x, along a given
direction. Let w be a tangent vector at x, and consider a path ε → x̃(ε),
defined for ε ∈ [0, ε0 ), with initial position x and initial velocity w.
(For instance, x̃(ε) = expx (εw); or in Rⁿ, just consider x̃(ε) = x + εw.)
Assume that ψ and c( · , y) are differentiable at x, divide both sides
of (10.2) by ε > 0 and pass to the limit.
Fig. 10.1. The distance function d(·, y) on S¹, and its square. The upward-pointing
singularity is typical. The squared distance is not differentiable when |x − y| = π; still
it is superdifferentiable, in a sense that is explained later.
For instance, if N and S respectively stand for the north and south
poles on S², then d(x, S) fails to be differentiable at x = N .
Of course, for any x this happens only for a negligible set of y’s; and
the cost function is differentiable everywhere else, so we might think
that this is not a serious problem. But who can tell us that the optimal
transport will not try to take each x (or a lot of them) to a place y
such that c(x, y) is not differentiable?
To solve these problems, it will be useful to use some concepts from
nonsmooth analysis: subdifferentiability, superdifferentiability, approxi-
mate differentiability. The short answers to the above problems are that
(a) under adequate assumptions on the cost function, ψ will be differ-
entiable out of a very small set (of codimension at least 1); (b) c will
be superdifferentiable because it derives from a Lagrangian, and subd-
ifferentiable wherever ψ itself is differentiable; (c) where it exists, ∇x c
will be injective because c derives from a strictly convex Lagrangian.
The next three sections will be devoted to some basic reminders
about differentiability and regularity in a nonsmooth context. For the
convenience of the non-expert reader, I shall provide complete proofs
of the most basic results about these issues. Conversely, readers who
feel very comfortable with these notions can skip these sections.
Proof that ∇f (x) is well-defined. Since this concept is local and invari-
ant by diffeomorphism, it is sufficient to treat the case when U is a
subset of Rn .
Let f1 and f2 be two measurable functions on U which are both
differentiable at x and coincide with f on a set of density 1. The problem
is to show that ∇f1 (x) = ∇f2 (x).
For each r > 0, let Zr be the set of points in Br (x) where either
f ≠ f1 or f ≠ f2 . It is clear that vol [Zr ] = o(vol [Br (x)]).
Since f1 and f2 are continuous at x, one can write
f1 (x) = lim_{r→0} (1/vol [Br (x)]) ∫_{Br (x)} f1 (z) dz
       = lim_{r→0} (1/vol [Br (x) \ Zr ]) ∫_{Br (x)\Zr} f1 (z) dz
       = lim_{r→0} (1/vol [Br (x) \ Zr ]) ∫_{Br (x)\Zr} f2 (z) dz
       = lim_{r→0} (1/vol [Br (x)]) ∫_{Br (x)} f2 (z) dz = f2 (x).
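Note that the displayed averaging argument gives f1 (x) = f2 (x); the equality of the gradients follows by one more density step, which can presumably be run as follows (my reconstruction):

```latex
% Let g := f_1 - f_2; then g = 0 on a set E of density 1 at x, g(x) = 0,
% and g is differentiable at x with gradient
% q := \nabla f_1(x) - \nabla f_2(x), so that
g(z) = \langle q,\, z - x \rangle + o(|z - x|).
% If q \neq 0, then g > 0 near x on the truncated cone
K := \Bigl\{ z :\ \Bigl\langle q, \tfrac{z-x}{|z-x|} \Bigr\rangle \ge \tfrac{|q|}{2} \Bigr\},
% which has positive volume density at x; this contradicts g = 0 on E.
% Hence q = 0, i.e. \nabla f_1(x) = \nabla f_2(x).
```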
ζ ∗ [D_{v+w} f − D_v f − D_w f ] = 0
for almost all x ∈ A_v ∩ A_w ∩ A_{v+w} , that is, for almost all x ∈ Rⁿ.
Now it is easy to conclude. Let B_{v,w} be the set of all x ∈ Rⁿ such
that D_v f (x), D_w f (x) or D_{v+w} f (x) is not well-defined, or (10.7) does
not hold true. Let (v_k )_{k∈N} be a dense sequence in Rⁿ, and let B :=
∪_{j,k∈N} B_{v_j ,v_k} . Then B is still Lebesgue-negligible, and for each x ∉ B
we have
D_{v_j +v_k} f (x) = D_{v_j} f (x) + D_{v_k} f (x).   (10.8)
Since D_v f (x) is a Lipschitz continuous function of v, it can be extended
uniquely into a Lipschitz continuous function, defined for all x ∉ B
and v ∈ Rⁿ, which turns out to be D_v f (x) in view of Property (c).
Since ψ(x0 ) is fixed and ψ(y) is bounded above, it follows that ψ(x) is
bounded below for x ∈ B.
Step 4: ψ is locally Lipschitz. Let x0 ∈ U , let V be a neighborhood of
x0 on which |ψ| ≤ M < +∞, and let r > 0 be such that Br (x0 ) ⊂ V .
For any y, y′ ∈ Br/2 (x0 ), we can write y′ = (1 − t) y + t z, where
t = |y − y′|/r, so z = (y′ − y)/t + y ∈ Br (x0 ) and |y − z| = r. Then
ψ(y′) ≤ (1 − t) ψ(y) + t ψ(z) + 2 t(1 − t) ω(|y − z|), so
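From this convexity-type inequality the local Lipschitz bound presumably comes out as follows (a sketch of mine; recall t = |y − y′|/r, |y − z| = r and |ψ| ≤ M on V):

```latex
\psi(y') - \psi(y)
   \;\le\; t\,\bigl[\psi(z) - \psi(y)\bigr] + 2\,t(1-t)\,\omega(|y-z|)
   \;\le\; \frac{|y-y'|}{r}\,\bigl(2M + 2\,\omega(r)\bigr).
% Exchanging the roles of y and y' gives the same bound for \psi(y) - \psi(y'),
% so \psi is Lipschitz on B_{r/2}(x_0) with constant at most (2M + 2\omega(r))/r.
```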
where Σ^(ℓ) is the set of points x such that ∇− ψ(x) contains a segment
[p, p′] of length 1/ℓ and |p| ≤ ℓ. To conclude, it is sufficient to show that
each Σ^(ℓ) is countably (n − 1)-rectifiable, and for that it is sufficient to
show that for each x ∈ Σ^(ℓ) the dimension of the tangent cone Tx Σ^(ℓ)
is at most n − 1 (Theorem 10.48(i) in the First Appendix).
combine to yield
⟨p − pk , x − xk ⟩ ≥ −2 ω(|x − xk |).
So
⟨p − pk , (x − xk )/tk ⟩ ≥ −2 (ω(|x − xk |)/|x − xk |) (|x − xk |/tk ).
Passing to the limit, we find
⟨p − p′, q⟩ ≥ 0.
⟨p − p′, q⟩ = 0.
(xk − x)/tk −→ p;   (x′k − x)/tk −→ p′   (k → ∞).
Then m(xk , x′k ) ∈ D and m(xk , x′k ) = (xk + x′k )/2 + o(|xk − x′k |) =
x + tk (pk + p′k )/2 + o(tk ), so
(p + p′)/2 = lim_{k→∞} (m(xk , x′k ) − x)/tk ∈ Tx D.
Thus Tx D is a convex cone. This leaves two possibilities: either Tx D is
included in a half-space, or it is the whole of Rn .
Assume that Tx D = Rn . If C is a small (Euclidean) cube of side 2r,
centered at x0 , for r small enough any point in a neighborhood of x0 can
be written as a combination of barycenters of the vertices x1 , . . . , xN of
C, and all these barycenters will lie within a ball of radius 2r centered
at x0 . (Indeed, let C0 stand for the set of the vertices of the cube C;
then C0 = {x^(ε) ; ε ∈ {±1}ⁿ}, where x^(ε) = x0 + r Σ_j ε_j e_j , (e_j )_{1≤j≤n} being
the canonical basis of Rⁿ. For each ε ∈ {±1}ⁿ⁻¹, let C1^(ε) be the union
of geodesic segments [x^(ε,−1) , x^(ε,1) ]. Then for ε ∈ {±1}ⁿ⁻² let C2^(ε) be
the union of geodesic segments between an element of C1^(ε,−1) and an
element of C1^(ε,1) ; etc. After n operations we have a simply connected set
Cn which asymptotically coincides with the whole (solid) cube as r → 0;
so it is a neighborhood of x0 for r small enough.) Then in the interior of
C, ψ is bounded above by max{ψ(x1 ), . . . , ψ(xN )} + sup {ω(s); s ≤ 4r}.
This shows at the same time that x0 lies in the interior of D, and that
ψ is bounded above around x0 .
Since x0 is fixed and ψ(y′) is bounded above, this shows that ψ(y) is
bounded below as y varies in B.
Next, let us show that ψ is locally Lipschitz. Let V be a neigh-
borhood of x0 in which |ψ| is bounded by M . If r > 0 is small
enough, then for any y, y′ ∈ Br (x0 ) there is z = z(y, y′) ∈ V such
that y′ = [y, z]λ , λ = d(y, y′)/4r ∈ [0, 1/2]. (Indeed, choose r so small
that all geodesics in B5r (x0 ) are minimizing, and B5r (x0 ) ⊂ V . Given
y and y′, take the geodesic going from y to y′, say expy (tv), 0 ≤ t ≤ 1;
extend it up to time t(λ) = 1/(1 − λ), write z = expy (t(λ) v). Then
d(x0 , z) ≤ d(x0 , y) + t(λ) d(y, y′) ≤ d(x0 , y) + 2 d(y, y′) < 5r.) So
whence
(ψ(y′) − ψ(y))/d(y, y′) = (ψ(y′) − ψ(y))/(λ d(y, z)) ≤ (ψ(z) − ψ(y))/d(y, z) + ω(d(y, z))/d(y, z).   (10.13)
Indeed, let γ(t) = expx (tw), y = expx w, then for any t ∈ [0, 1],
so
(ψ(γ(t)) − ψ(x))/(t|w|) ≤ (ψ(y) − ψ(x))/|w| + (1 − t) ω(|w|)/|w|.
On the other hand, by subdifferentiability,
The limit t → 0 gives (10.14). At the same time, it shows that for
|w| ≤ r,
⟨p, w/|w|⟩ ≤ ‖ψ‖_{Lip(B2r (x0 ))} + ω(r)/r.
By choosing w = rp, we conclude that |p| is bounded above, indepen-
dently of x. So ∇− ψ is locally bounded in the sense that there is a
uniform bound on the norms of all elements of ∇− ψ(x) when x varies
in a compact subset of the domain of ψ.
At last we can conclude. Let x be interior to D. By Theorem 10.8(i),
there is a sequence xk → x such that ∇− ψ(xk ) = ∅. For each k ∈ N,
let pk ∈ ∇− ψ(xk ). As k → ∞, there is a uniform bound on |pk |, so we
may extract a subsequence such that pk → p ∈ Rn . For each xk , for
each w ∈ Rn small enough,
Similarly,
f (γ0 ) ≤ f (γt ) + ⟨p, γ0 − γt ⟩ + t ω(d(γ0 , γ1 )).   (10.16)
Now take the linear combination of (10.15) and (10.16) with coefficients
t and 1 − t: Since t(γ1 − γt ) + (1 − t)(γ0 − γt ) = 0 (in Tγt M ), we recover
(H∞)1 For any x and for any measurable set S which does not “lie
on one side of x” (in the sense that Tx S is not contained in a
half-space) there is a finite collection of elements z1 , . . . , zk ∈
S, and a small open ball B containing x, such that for any
y outside of a compact set,
Remark 10.17. The requirements in (ii) and (iii) are fulfilled if the
Lagrangian L is time-independent, C², strictly convex and superlinear as a
function of v (recall Example 7.5). But they also hold true for other
interesting cases such as L(x, v, t) = |v|^{1+α}, 0 < α < 1.
Remark 10.18. Part (i) of Proposition 10.15 means that the behavior
of the (squared) distance function is typical: if one plots c(x, y) as a
function of x, for fixed y, one will always see upward-pointing crests as
in Figure 10.1, never downward-pointing ones.
Finally, let us consider (iii). When x and y vary in small balls, the
velocity v along the minimizing curves will be bounded by assump-
tion; so the function ω will also be uniform. Then c(x, y) is locally
superdifferentiable as a function of x, and the conclusion comes from
Proposition 10.12.
Proposition 10.15 is basically all that is needed to treat quite general
cost functions on a compact Riemannian manifold. But for noncompact
manifolds, it might be difficult to check Assumptions (Lip), (SC) or
(H∞). Here are a few examples where this can be done.
Example 10.19. Gangbo and McCann have considered cost functions
of the form c(x, y) = c(x − y) on Rn × Rn , satisfying the following
assumption: For any given r > 0 and θ ∈ (0, π), if |y| is large enough
then there is a cone Kr,θ (y, e), with apex y, direction e, height r and
angle θ, such that c takes its maximum on Kr,θ (y, e) at y. Let us check
briefly that this assumption implies (H∞)1 . (The reader who feels
that both assumptions are equally obscure may very well skip this and
jump directly to Example 10.20.) Let x and S be given such that Tx S
is included in no half-space. So for each direction e ∈ S n−1 there are
points z+ and z− in S, each of which lies on one side of the hyperplane
passing through x and orthogonal to e. By a compactness argument,
one can find a finite collection of points z1 , . . . , zk in S, an angle θ < π
and a positive number r > 0 such that for all e ∈ S n−1 and for any w
close enough to x, the truncated cone Kr,θ (w, e) contains at least one of
the zj . Equivalently, Kr,θ (w − y, e) contains zj − y. But by assumption,
for |w−y| large enough there is a cone Kr,θ (w−y, e) such that c(z−y) ≤
c(w − y) for all z ∈ Kr,θ (w − y, e). This inequality applies to z = zj (for
some j), and then c(zj − y) ≤ c(w − y).
Example 10.20. As a particular case of the previous example, (H∞)1
holds true if c = c(x − y) is radially symmetric and strictly increasing
as a function of |x − y|.
Example 10.21. Gangbo and McCann considered cost functions that
also satisfy c(x, y) = c(x − y) with c convex and superlinear. This
assumption implies (H∞)2 . Indeed, if x in Rn and ε > 0 are given, let
z = x − ε(x − y)/|x − y|; it suffices to show that
or equivalently, with h = x − y,
c(h) − c((1 − ε/|h|) h) −−−→ +∞   as |h| → ∞.
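This divergence follows from the monotonicity of slopes of the convex function s ↦ c(su); a sketch in my notation, with u = h/|h| and s = |h|:

```latex
c(h) - c\!\left( \Bigl(1 - \tfrac{\varepsilon}{|h|}\Bigr) h \right)
  \;\ge\; \varepsilon \, \frac{c\bigl((s-\varepsilon)u\bigr) - c(0)}{s - \varepsilon}
  \;\xrightarrow[\;|h| \to \infty\;]{}\; +\infty,
% since superlinearity of c forces c((s-\varepsilon)u)/(s-\varepsilon) \to +\infty,
% uniformly in the direction u.
```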
∂c ψ(x) ≠ ∅ =⇒ ∇− ψ(x) ≠ ∅.
So if y ∉ K, there is a z such that ψ(z) + c(z, y) ≤ c(x, y) − (M + 1) ≤
ψ(x) + c(x, y) − 1, and
φ(y) − c(x, y) ≤ inf_{z∈B} [ψ(z) + c(z, y)] − c(x, y)
            ≤ ψ(x) − 1 = sup_{y′∈Y} [φ(y′) − c(x, y′)] − 1.
Then the supremum of φ(y) − c(x, y) over all Y is the same as the
supremum over only K. But this is a maximization problem for an
upper semicontinuous function on a compact set, so it admits a solution,
which belongs to ∂c ψ(x).
The same reasoning can be made with x replaced by w in a small
neighborhood B of x; then the conclusion is that ∂c ψ(w) is nonempty
and contained in the compact set K, uniformly for w ∈ B. If K ⊂ Ω
(H∞) and Theorem 10.24 imply that ∂c ψ transforms compact sets into
compact sets; so the sequence (yk )k∈N takes values in a compact set,
and up to extraction of a subsequence it converges to some y ∈ Y. By
passing to the limit in the inequality defining the c-subdifferential, we
obtain y ∈ ∂c ψ(x). Since x ∈ Z, this determines y = T (x) uniquely,
so the whole sequence (T (xk ))k∈N converges to T (x), and T is indeed
continuous.
Equation (10.21) follows from the continuity of T . Indeed, the in-
clusion Spt µ ⊂ T −1 (T (Spt µ)) implies
ν[T (Spt µ)] = µ[T⁻¹(T (Spt µ))] ≥ µ[Spt µ] = 1;
This section and the next one deal with extensions of Theorem 10.28.
Here we shall learn how to cover situations in which no control at
infinity is assumed, and in particular Assumption (iii) of Theorem 10.28
might not be satisfied. The short answer is that it is sufficient to replace
the gradient in (10.20) by an approximate gradient. (Actually a little
bit more will be lost, see Remarks 10.39 and 10.40 below.)
determines the unique optimal coupling between µℓ and νℓ , for the cost
cℓ . (Note that ∇x cℓ coincides with ∇x c when x is in the interior of Bℓ ,
and µℓ [∂Bℓ ] = 0, so equation (10.24) does hold true πℓ -almost surely.)
Now we can define our Monge coupling. For each ℓ ∈ N, and each
x ∈ Sℓ \ Zℓ , ψℓ coincides with ψ on a set which has density 1 at x, so
∇ψℓ (x) = ∇ψ(x), and (10.24) becomes
∇x c(x, y) + ∇ψ(x) = 0.   (10.25)
then
∇ψ(x) + ∇x c(x, y) = 0 π(dx dy)-almost surely. (10.28)
µ[S ∩ Br (x)] / µ[Br (x)] −−−→ 1   as r → 0.
(The proof of this uses the fact that we are working on a Riemannian
manifold; see the bibliographical notes for more information.)
Moreover, the transport plan π induced by π on S coincides with
the deterministic transport associated with the map
So at least one of the sets {ψ̃ < ψ} ∩ Br (x) and {ψ̃ > ψ} ∩ Br (x)
has µ-measure at least µ[Br (x)]/2. Without loss of generality, I shall
assume that this is the set {ψ̃ > ψ}; so
µ[{ψ̃ > ψ} ∩ Br (x)] ≥ µ[Br (x)]/2.   (10.30)
Next, ψℓ coincides with ψ on the set Sℓ , which has µ-density 1 at x,
and similarly ψ̃ℓ coincides with ψ̃ on a set of µ-density 1 at x. It follows
that
µ[{ψ̃ > ψ} ∩ {ψ̃ℓ > ψℓ } ∩ Br (x)] ≥ µ[Br (x)] (1/2 − o(1)) as r → 0.
   (10.31)
Then since x is a Besicovitch point of {∇ψ ≠ ∇ψ̃} ∩ C ∩ C̃,
µ[{ψ̃ > ψ} ∩ {ψ̃ℓ > ψℓ } ∩ {∇ψ ≠ ∇ψ̃} ∩ (C ∩ C̃) ∩ Br (x)]
 ≥ µ[{ψ̃ > ψ} ∩ {ψ̃ℓ > ψℓ } ∩ Br (x)] − µ[Br (x) \ (C ∩ C̃)]
 ≥ µ[Br (x)] (1/2 − o(1) − o(1)).
As a conclusion,
∀r > 0   µ[{ψ̃ > ψ} ∩ {ψ̃ℓ > ψℓ } ∩ {∇ψ ≠ ∇ψ̃} ∩ (C ∩ C̃) ∩ Br (x)] > 0.
   (10.32)
Now let
A := {ψ̃ > ψ}.
The proof will result from the next two claims:
Claim 1: T−1 (T (A)) ⊂ A;
and let
r := d(x, S ∩ T⁻¹(T (A)))/2.
On the one hand, since S ∩ B(x, r) ∩ T⁻¹(T (A)) = ∅ by the definition
of r, µ[S ∩ B(x, r) ∩ T⁻¹(T (A))] = µ[∅] = 0. On the other hand, r is
positive by Claim 2, so µ[S ∩ B(x, r)] > 0 by (10.32). Then
µ[A \ T⁻¹(T (A))] ≥ µ[(S ∩ B(x, r)) \ T⁻¹(T (A))] = µ[S ∩ B(x, r)] > 0.
This implies that ψ(x) < ψ̃(x), i.e. x ∈ A, and proves Claim 1.
Proof of Claim 2: Assume that this claim is false; then there is a sequence
(xk )k∈N , valued in {ψ̃ > ψ} ∩ (C ∩ C̃) ∩ T⁻¹(T (A)), such
(Recall that x is such that ψℓ (x) = ψ(x) = ψ̃(x) = ψ̃ℓ (x).) On the
other hand, since ψ̃ is differentiable at x, we have
which is possible only if ∇ψ(x) − ∇ψ̃(x) = 0. But this contradicts the
definition of x. So Claim 2 holds true, and this concludes the proof of
Theorem 10.42.
The next theorem summarizes two results which were useful in the
present chapter:
Proof of Theorem 10.48. For each x ∈ S, let πx stand for the orthogonal
projection on Tx S, and let πx⊥ = Id − πx stand for the orthogonal
projection on (Tx S)⊥ . I claim that
|πx⊥ ((x − yk )/|x − yk |)| > |πx ((x − yk )/|x − yk |)|.   (10.38)
S′k := {x ∈ Sk ; ‖πx − π‖ ≤ δ}.
this will imply that the intersection of S′k with a ball of diameter 1/k
is contained in an L-Lipschitz graph over π(Rⁿ), and the conclusion
will follow immediately.
To prove (10.39), note that, if π, π′ are any two orthogonal projections,
then |(π⊥ − (π′)⊥ )(z)| = |(Id − π)(z) − (Id − π′)(z)| = |(π − π′)(z)|,
therefore ‖π⊥ − (π′)⊥ ‖ = ‖π − π′‖, and
∀x ∈ ∂S, ∃r > 0, ∃ν ∈ F, ∀y ∈ ∂S ∩ Br (x),   ⟨y − x, ν⟩ ≤ |y − x|/2.
   (10.40)
Indeed, otherwise there is x ∈ ∂S such that for all k ∈ N and for all
ν ∈ F there is yk ∈ ∂S such that |yk − x| ≤ 1/k and ⟨yk − x, ν⟩ >
|yk − x|/2. By assumption there is ξ ∈ Sⁿ⁻¹ such that
∀ζ ∈ Tx S,   ⟨ξ, ζ⟩ ≤ 0.
Let ν ∈ F be such that |ξ − ν| < 1/8 and let (yk )k∈N be a sequence as
above. Since yk ∈ ∂S and yk ≠ x, there is y′k ∈ S such that |y′k − yk | <
|yk − x|/8. Then
⟨y′k − x, ξ⟩ ≥ ⟨yk − x, ν⟩ − |yk − x| |ξ − ν| − |y′k − yk | ≥ |yk − x|/4 ≥ |x − y′k |/8.
So
⟨(y′k − x)/|y′k − x|, ξ⟩ ≥ 1/8.   (10.41)
Up to extraction of a subsequence, (y′k − x)/|y′k − x| converges to some
ζ ∈ Tx S, and then by passing to the limit in (10.41) we have ⟨ζ, ξ⟩ ≥
1/8. But by definition, ξ is such that ⟨ζ, ξ⟩ ≤ 0 for all ζ ∈ Tx S. This
contradiction establishes (10.40).
As a consequence, ∂S is included in the union of all sets A1/k,ν ,
where k ∈ N, ν ∈ F , and
|y − x|
x = ζ(x′, y),
x ∈ M ⟺ y = ϕ(x′).
Fig. 10.2. k-dimensional graph
Here comes the main result of this Appendix. If (Ai )1≤i≤m are subsets
of a vector space, I shall write Σ Ai = {Σi ai ; ai ∈ Ai }.
and
Z = (−β, β) × Z′ ;   D = V \ Z.
I claim that λn [Z] = 0. To prove this it is sufficient to check that
λn−1 [Z′ ] = 0. But Z′ is the nonincreasing limit of (Z′ℓ )ℓ∈N , where
Z′ℓ = {y ∈ B(0, r0 ); …}.
By Fubini's theorem,
λn [{x ∈ O; ∇fi (x) does not exist for some i}] ≥ (λn−1 [Z′ℓ ]) × (1/ℓ);
and the left-hand side is equal to 0 since all fi are differentiable almost
everywhere. It follows that λn−1 [Z′ℓ ] = 0, and by taking the limit ℓ → ∞
we obtain λn−1 [Z′ ] = 0.
Let f = Σ fi , and let ∂1 f = ⟨∇f, v⟩ stand for its partial derivative
with respect to the first coordinate. The first step of the proof has
shown that ∂1 f (x) ≥ α/2 at each point x where all functions fi are
differentiable. So, for each y ∈ B(0, r0 ) \ Z′ , the function t → f (t, y)
is Lipschitz and differentiable λ1 -almost everywhere on (−β, β), and it
satisfies ∂1 f (t, y) ≥ α/2. Thus for all t, t′ ∈ (−β, β),
This holds true for all ((t, y), (t′, y)) in D × D. Since Z = V \ D
has zero Lebesgue measure, D is dense in V , so (10.43) extends to
all ((t, y), (t′, y)) ∈ V × V .
For all y ∈ B(0, r0 ), inequality (10.43), combined with the estimate
|f (0, y)| = |f (0, y) − f (0, 0)| ≤ ‖f ‖Lip |y| ≤ αβ/4,
guarantees that the equation f (t, y) = 0 admits exactly one solution
t = ϕ(y) in (−β, β).
It only remains to check that ϕ is Lipschitz on B(0, r0 ). Let y, z ∈
B(0, r0 ), then f (ϕ(y), y) = f (ϕ(z), z) = 0, so
f (ϕ(y), y) − f (ϕ(z), y) = f (ϕ(z), z) − f (ϕ(z), y). (10.44)
Since the first partial derivative of f is no less than α/2, the left-hand
side of (10.44) is bounded below by (α/2)|ϕ(y) − ϕ(z)|, while the right-
hand side is bounded above by ‖f ‖Lip |z − y|. The conclusion is that
|ϕ(y) − ϕ(z)| ≤ (2 ‖f ‖Lip /α) |z − y|,
so ϕ is indeed Lipschitz.
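The quantitative content of this implicit-function lemma can be checked numerically; here is a toy sketch of mine (the particular f, the value α = 1 and the crude Lipschitz bound L are assumptions of the sketch, not data from the book):

```python
import numpy as np
from scipy.optimize import brentq

# Toy f(t, y) with first partial derivative df/dt = 1 + 0.5 cos t >= 1/2,
# i.e. alpha/2 = 1/2, and a crude global Lipschitz bound L = 2.
def f(t, y):
    return t + 0.5 * np.sin(t) + 0.3 * np.cos(y)

alpha, L = 1.0, 2.0

def phi(y):
    # unique root in t of f(t, y) = 0 (f is strictly increasing in t)
    return brentq(lambda t: f(t, y), -5.0, 5.0)

ys = np.linspace(-2.0, 2.0, 41)
ts = np.array([phi(y) for y in ys])

# The implicit function inherits the Lipschitz bound 2 L / alpha of the lemma
lip = 2.0 * L / alpha
gaps = np.abs(np.diff(ts)) - lip * np.abs(np.diff(ys))
print(gaps.max() <= 1e-12)  # expect True
```

Of course the bound 2L/α is far from sharp here; the point is only that the constant depends on f through its Lipschitz norm and the lower bound on ∂₁f alone.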
discussed in Chapter 14.) On [0, d(x, y)) the operator H(t) is well-
defined (since the geodesic is minimizing, it is only at t = d(x, y) that
eigenvalues −∞ may appear). It starts at H(0) = Id and then its
eigenvectors and eigenvalues vary smoothly as t varies in (0, d(x, y)).
The unit vector γ̇(t) is an eigenvector of H(t), associated with the
eigenvalue +1. The problem is to bound the eigenvalues in the or-
thogonal subspace S(t) = (γ̇)⊥ ⊂ Tγ(t) M . So let (e2 , . . . , en ) be an
orthonormal basis of S(0), and let (e2 (t), . . . , en (t)) be obtained by
parallel transport of (e2 , . . . , en ) along γ; for any t this remains an or-
thonormal basis of S(t). To achieve our goal, it is sufficient to bound
above the quantities h(t) = ⟨H(t) · e_i(t), e_i(t)⟩_{γ(t)}, where i is arbitrary
in {2, . . . , n}.
Since H(0) is the identity, we have h(0) = 1. To get a differential
equation on h(t), we can use a classical computation of Riemannian
geometry, about the Hessian of the distance (not squared!): If k(t) =
⟨∇²_x d(y, x) · e_i(t), e_i(t)⟩_{γ(t)}, then

k̇(t) + k(t)² + σ(t) ≤ 0,
where σ(t) is the sectional curvature of the plane generated by γ̇(t) and
ei (t) inside Tγ(t) M . To relate k(t) and h(t), we note that
∇_x (d(y, x)²/2) = d(y, x) ∇_x d(y, x);

∇²_x (d(y, x)²/2) = d(y, x) ∇²_x d(y, x) + ∇_x d(x, y) ⊗ ∇_x d(x, y).
By applying this to the tangent vector e_i(t), and using the fact that
∇_x d(x, y) at x = γ(t) is just γ̇(t) (which is orthogonal to e_i(t)) while
d(y, γ(t)) = t, we get h(t) = t k(t).
From (10.46) follow the two comparison results which were used in
Theorem 10.41 and Corollary 10.44:
(a) Assume that the sectional curvatures of M are all non-
negative. Then (10.46) forces ḣ ≤ 0, so h remains bounded above by 1
for all times. In short:
nonnegative sectional curvature =⇒ ∇²_x (d(x, y)²/2) ≤ Id_{T_x M}.   (10.47)
(If we think of the Hessian as a bilinear form, this is the same as
∇2x (d(x, y)2 /2) ≤ g, where g is the Riemannian metric.) Inequal-
ity (10.47) is rigorous if d(x, y)2 /2 is twice differentiable at x; otherwise
the conclusion should be reinterpreted as

x ↦ d(x, y)²/2 is semiconcave with a modulus ω(r) = r²/2.
(b) Assume now that the sectional curvatures at point x are
bounded below by −C/d(x0 , x)2 , where x0 is an arbitrary point. In
this case I shall say that M has asymptotically nonnegative curvature.
Then if y varies in a compact subset, we have a lower bound like σ(t) ≥
−C′/d(y, x)² = −C′/t², where C′ is some positive constant. So (10.46)
implies the differential inequality
Bibliographical notes
The key ideas in this chapter were first used in the case of the quadratic
cost function in Euclidean space [154, 156, 722].
The existence of solutions to the Monge problem and the differ-
entiability of c-convex functions, for strictly superlinear convex cost
functions in Rn (other than quadratic) was investigated by several au-
thors, including in particular Rüschendorf [717] (formula (10.4) seems
to appear there for the first time), Smith and Knott [754], Gangbo and
McCann [398, 399]. In the latter reference, the authors get rid of all mo-
ment assumptions by avoiding the explicit use of Kantorovich duality.
These results are reviewed in [814, Chapter 2]. Gangbo and McCann
impose some assumptions of growth and superlinearity, such as the one
described in Example 10.19. Ekeland [323] applies similar tools to the
optimal matching problem (in which both measures are transported to
another topological space).
It is interesting to notice that the structure of the optimal map can
be guessed heuristically from the old-fashioned method of Lagrange
multipliers; see [328, Section 2.2] where this is done for the case of the
quadratic cost function.
The terminology of the twist condition comes from dynamical sys-
tems, in particular the study of certain classes of diffeomorphisms in
dimension 2 called twist diffeomorphisms [66]. Moser [641] showed that
under certain conditions, these diffeomorphisms can be represented as
the time-1 map of a strictly convex Lagrangian system. In this setting,
the twist condition can be informally recast by saying that the dynam-
ics “twists” vertical lines (different velocities, common position) into
oblique curves (different positions).
There are cases of interest where the twist condition is satisfied
even though the cost function does not have a Lagrangian struc-
ture. Examples are the so-called symmetrized Bregman cost function
c(x, y) = ∇φ(x) − ∇φ(y), x − y where φ is strictly convex [206],
which requires the doubling property. The price to pay for Besicovich’s
theorem is that it only works in Rn (or a Riemannian manifold, by
localization) rather than on a general metric space.
The nonsmooth implicit function theorem in the second Appendix
(Theorem 10.50) seems to be folklore in nonsmooth real analysis; the
core of its proof was explained to me by Fathi. Corollary 10.52 was
discovered or rediscovered by McCann [613, Appendix], in the case
where ψ and ψ∗ are convex functions in Rn.
Everything in the Third Appendix, in particular the key differential
inequality (10.46), was explained to me by Gallot. The lower bound
assumption on the sectional curvatures σx ≥ −C/d(x0 , x)2 is sufficient
to get upper bounds on ∇2x d(x, y)2 as y stays in a compact set, but it
is not sufficient to get upper bounds that are uniform in both x and y.
A counterexample is developed in [393, pp. 213–214].
The exact computation about the hyperbolic space in Example 10.53
is the extremal situation for a comparison theorem about the Hessian of
the squared distance [246, Lemma 3.12]: If M is a Riemannian manifold
with sectional curvature bounded below by κ < 0, then
∇²_x (d(x, y)²/2) ≤ (√|κ| d(x, y) / tanh(√|κ| d(x, y))) Id_{T_x M}.
If T_{t₀→t} = T_t ∘ T_{t₀}^{−1} stands for the transport map between µ_{t₀} and µ_t,
then the equation
also holds true for t0 ∈ (0, 1); but now this is just the theorem of change
of variables for Lipschitz maps.
Then,

∫_M F(y, f_t(y)) vol(dy) = ∫_M F(T_{t₀→t}(x), f_{t₀}(x)/J_{t₀→t}(x)) J_{t₀→t}(x) vol(dx).
Furthermore, µt0 (dx)-almost surely, Jt0 →t (x) > 0 for all t ∈ [0, 1].
Proof of Theorem 11.3. For brevity I shall abbreviate vol(dx) into just
dx. Let us first consider the case when (µt )0≤t≤1 is compactly sup-
ported. Let Π be a probability measure on the set of minimizing curves,
such that µt = (et )# Π. Let Kt = et (Spt Π) and Kt0 = et0 (Spt Π). By
Theorem 8.5, the map γt0 → γt is well-defined and Lipschitz for all
γ ∈ Spt Π. So Tt0 →t (γt0 ) = γt is a Lipschitz map Kt0 → Kt . By assump-
tion µt is absolutely continuous, so Theorem 10.28 (applied with the
cost function ct0 ,t (x, y), or maybe ct,t0 (x, y) if t < t0 ) guarantees that
the coupling (γt , γt0 ) is deterministic, which amounts to saying that
γt0 → γt is injective apart from a set of zero probability.
Then we can use the change of variables formula with g = 1_{K_t},
T = T_{t₀→t}, and we find f(x) = J_{t₀→t}(x). Therefore, for any nonnegative
measurable function G on M,

∫_{K_t} G(y) dy = ∫_{K_t} G(y) d((T_{t₀→t})_# µ)(y)
   = ∫_{K_{t₀}} (G ∘ T_{t₀→t})(x) f(x) dx
   = ∫_{K_{t₀}} G(T_{t₀→t}(x)) J_{t₀→t}(x) dx.
We can apply this to G(y) = F(y, f_t(y)) and replace f_t(T_{t₀→t}(x)) by
f_{t₀}(x)/J_{t₀→t}(x); this is allowed since in the right-hand side the contri-
bution of those x with f_t(T_{t₀→t}(x)) = 0 is negligible, and J_{t₀→t}(x) = 0
implies (almost surely) f_{t₀}(x) = 0. So in the end

∫_{K_t} F(y, f_t(y)) dy = ∫_{K_{t₀}} F(T_{t₀→t}(x), f_{t₀}(x)/J_{t₀→t}(x)) J_{t₀→t}(x) dx.
Since this is true almost surely on K_ℓ, for each ℓ, it is also true almost
surely.
Next, for any nonnegative measurable function G, by monotone con-
vergence and the first part of the proof one has

∫_U G(y) dy = lim_{ℓ→∞} ∫_{K_{t,ℓ}} G(y) dy
   = lim_{ℓ→∞} ∫_{K_{t₀,ℓ}} G(T_{t₀→t}(x)) J_{t₀→t}(x) dx
maps F1 : γ(t0 ) → (γ(0), γ(t0 )), F2 : (γ(0), γ(t0 )) → (γ(0), γ̇(0)) and
F3 : (γ(0), γ̇(0)) → γ(t). Both F2 and F3 have a positive Jacobian de-
terminant, at least if t < 1; so if x is chosen in such a way that F1 has
a positive Jacobian determinant at x, then also Tt0 →t = F3 ◦ F2 ◦ F1
will have a positive Jacobian determinant at x for t ∈ [0, 1).
Bibliographical notes
Theorem 11.1 can be obtained (in Rn ) by combining Lemma 5.5.3
in [30] with Theorem 3.83 in [26].
In the context of optimal transport, the change of variables
formula (11.1) was proven by McCann [614]. His argument is based
on Lebesgue’s density theory, and takes advantage of Alexandrov’s
theorem, alluded to in this chapter and proven later as Theorem 14.25:
A convex function admits a Taylor expansion at order 2 at almost each x
in its domain of definition. Since the gradient of a convex function has
locally bounded variation, Alexandrov’s theorem can be seen essentially
as a particular case of the theorem of approximate differentiability of
functions with bounded variation. McCann’s argument is reproduced
in [814, Theorem 4.8].
Along with Cordero-Erausquin and Schmuckenschläger, McCann
later generalized his result to the case of Riemannian manifolds [246].
Modulo certain complications, the proof basically follows the same pat-
tern as in Rn . Then Cordero-Erausquin [243] treated the case of strictly
convex cost functions in Rn in a similar way.
Ambrosio pointed out that those results could be retrieved within
the general framework of push-forward by approximately differentiable
mappings. This point of view has the disadvantage of involving more
subtle arguments, but the advantage of showing that it is not a special
feature of optimal transport. It also applies to nonsmooth cost functions
such as |x − y|p . In fact it covers general strictly convex costs of the
form c(x−y) as soon as c has superlinear growth, is C 1 everywhere and
C 2 out of the origin. A more precise discussion of these subtle issues
can be found in [30, Section 6.2.1].
It is a general feature of optimal transport with strictly convex cost
in Rn that if T stands for the optimal transport map, then the Jacobian
matrix ∇T , even if not necessarily nonnegative symmetric, is diagonal-
izable with nonnegative eigenvalues; see Cordero-Erausquin [243] and
Ambrosio, Gigli and Savaré [30, Section 6.2]. From an Eulerian perspec-
tive, that diagonalizability property was already noticed by Otto [666,
Proposition A.4]. I don’t know if there is an analog on Riemannian
manifolds.
Changes of variables of the form y = expx (∇ψ(x)) (where ψ is
not necessarily d2 /2-convex) have been used in a remarkable paper
by Cabré [181] to investigate qualitative properties of nondivergent el-
liptic equations (Liouville theorem, Alexandrov–Bakelman–Pucci esti-
mates, Krylov–Safonov–Harnack inequality) on Riemannian manifolds
with nonnegative sectional curvature. (See for instance [189, 416, 786]
for classical proofs in Rn .) It is mentioned in [181] that the methods
extend to sectional curvature bounded below. For the Harnack inequal-
ity, Cabré’s method was extended to nonnegative Ricci curvature by
S. Kim [516].
12
Smoothness
det ∇²ψ(x) = f(x)/g(∇ψ(x)).   (12.4)
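In dimension 1, equation (12.4) can be checked against the explicit increasing transport map T = G⁻¹ ∘ F built from the cumulative distribution functions; the densities below are illustrative choices (a sketch assuming NumPy), with T playing the role of ψ′.

```python
import numpy as np

# 1D illustration of (12.4) with hypothetical densities: f uniform on [0,1],
# g(y) = 2y on [0,1]. Then F(x) = x, G(y) = y^2, and the increasing map is
# T(x) = G^{-1}(F(x)) = sqrt(x), i.e. psi(x) = (2/3) x^{3/2}; equation (12.4)
# reduces to psi''(x) = f(x) / g(psi'(x)).
x = np.linspace(0.01, 0.99, 99)
T = np.sqrt(x)                         # T = psi'
Tprime = 0.5 / np.sqrt(x)              # psi''
residual = Tprime - 1.0 / (2.0 * T)    # f/g(T) = 1/(2 sqrt(x))
assert float(np.max(np.abs(residual))) < 1e-12
```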
Caffarelli’s counterexample
Caffarelli understood that regularity results for (12.2) in Rn cannot
be obtained unless one adds an assumption of convexity of the support
of ν. Without such an assumption, the optimal transport may very well
be discontinuous, as the next counterexample shows.
Theorem 12.3 (An example of discontinuous optimal transport).
There are smooth compactly supported probability densities f and g on
Rn , such that the supports of f and g are smooth and connected, f and g
are (strictly) positive in the interior of their respective supports, and yet
the optimal transport between µ(dx) = f (x) dx and ν(dy) = g(y) dy, for
the cost c(x, y) = |x − y|2 , is discontinuous.
Fig. 12.1. Principle behind Caffarelli’s counterexample. The optimal transport from
the ball to the “dumb-bells” has to be discontinuous, and in effect splits the upper
region S into the upper left and upper right regions S− and S+ . Otherwise, there
should be some transport along the dashed lines, but for some lines this would
contradict monotonicity.
Proof of Theorem 12.3. Let f be the indicator function of the unit ball
B in R2 (normalized to be a probability measure), and let gε be
the (normalized) indicator function of a set Cε obtained by first sepa-
rating the ball into two halves B1 and B2 (say with distance 2), then
building a thin bridge between those two halves, of width O(ε). (See
Figure 12.1.) Let also g be the normalized indicator function of B1 ∪B2 :
this is the limit of gε as ε ↓ 0. It is not difficult to see that gε (identified
with a probability measure) can be obtained from f by a continuous
deterministic transport (after all, one can deform B continuously into
Cε; just think that you are playing with clay: then it is possible to
massage the ball into Cε without tearing it). However, we shall see here
that for ε small enough, the optimal transport cannot be continuous.
The proof will rest on the stability of optimal transport: If T is
the unique optimal transport between µ and ν, and Tε is an optimal
transport between µ and νε, then Tε converges to T in µ-probability as
ε ↓ 0 (Corollary 5.23).
In the present case, choosing µ(dx) = f (x) dx, ν(dy) = g(y) dy,
c(x, y) = |x − y|2 , it is easy to figure out that the unique optimal
transport T is the one that sends (x, y) to (x − 1, y) if x < 0, and to
(x + 1, y) if x > 0.
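The splitting phenomenon can be observed on a discrete sketch (hypothetical parameters, assuming NumPy and SciPy are available): sample the disk, replace the thin-bridge set Cε by two tight clusters, and solve the resulting assignment problem for the quadratic cost. Monotonicity of the optimal plan then makes the discrete map jump across {x = 0}, as in Figure 12.1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 200

# Source: n points sampled uniformly in the unit disk.
ang = rng.uniform(0.0, 2.0 * np.pi, n)
rad = np.sqrt(rng.uniform(0.0, 1.0, n))
X = np.column_stack([rad * np.cos(ang), rad * np.sin(ang)])

# Target: two tight clusters near (-2, 0) and (2, 0), standing in for B1 and B2.
Y = np.vstack([
    np.array([-2.0, 0.0]) + 0.05 * rng.standard_normal((n // 2, 2)),
    np.array([2.0, 0.0]) + 0.05 * rng.standard_normal((n // 2, 2)),
])

# Discrete optimal transport for the quadratic cost = assignment problem.
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
row, col = linear_sum_assignment(C)

# Monotonicity forces the left part of the disk to go left and the right part
# to go right: the (discrete) optimal map is discontinuous across {x = 0}.
same_side = np.sign(X[row, 0]) == np.sign(Y[col, 0])
assert same_side.mean() > 0.85
```

Only the few points closest to the vertical axis may cross sides, to balance the two clusters exactly.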
Let now S, S− and S+ be the upper regions in the ball, the left
half-ball and the right half-ball, respectively, as in Figure 12.1. As a
consequence of the convergence in probability, for ε small enough, a
Loeper’s counterexample 285
Loeper’s counterexample
sense there is no hope for general regularity results outside the world
of nonnegative sectional curvature.
Fig. 12.2. Principle behind Loeper’s counterexample. This is the surface S, im-
mersed in R3 , “viewed from above”. By symmetry, O has to stay in place. Because
most of the initial mass is close to A+ and A− , and most of the final mass is close
to B+ and B− , at least some mass has to move from one of the A-balls to one
of the B-balls. But then, because of the modified (negative curvature) Pythagoras
inequality, it is more efficient to replace the transport scheme (A → B, O → O), by
(A → O, O → B).
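The "modified Pythagoras inequality" invoked in this caption can be checked directly in the Poincaré disk model (a sketch with illustrative sample points, using the right-angle hyperbolic law of cosines cosh d(A,B) = cosh d(A,O) cosh d(O,B), valid for curvature −1).

```python
import math

# Hyperbolic distance in the Poincare disk model (curvature -1); points are
# complex numbers with |z| < 1, and O is the origin (illustrative positions).
def dist(z, w):
    return math.acosh(1 + 2 * abs(z - w) ** 2
                      / ((1 - abs(z) ** 2) * (1 - abs(w) ** 2)))

O, A, B = 0j, 0.6 + 0j, 0.6j   # the geodesics (O, A) and (O, B) meet at a right angle at O

dAB, dAO, dOB = dist(A, B), dist(A, O), dist(O, B)

# Right-angle hyperbolic law of cosines: cosh d(A,B) = cosh d(A,O) cosh d(O,B),
# so Pythagoras is modified in favor of the diagonal: d(A,B)^2 > d(A,O)^2 + d(O,B)^2.
assert abs(math.cosh(dAB) - math.cosh(dAO) * math.cosh(dOB)) < 1e-9
# Hence (A -> O, O -> B) is cheaper than (A -> B, O -> O) for the cost d^2:
assert dAO ** 2 + dOB ** 2 < dAB ** 2
```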
Loeper’s counterexample 287
equation (y = 0) (resp. x = 0); then the lines (O, A± ) and (O, B± ) are
orthogonal at O. Since we are on a negatively curved surface, Pythago-
ras's identity in a triangle with a right angle is modified in favor of
the diagonal, so
Proof of Theorem 12.7. Note first that the continuity of the cost func-
tion and the compactness of X × Y imply the continuity of ψ. In the
proof, the notation d will be used interchangeably for the distances on
X and Y.
Let C1 and C2 be two disjoint connected components of ∂c ψ(x).
Since ∂c ψ(x) is closed, C1 and C2 lie a positive distance apart. Let
r = d(C1 , C2 )/5, and let C = {y ∈ Y; d(y, ∂c ψ(x)) ≥ 2 r}. Further, let
y 1 ∈ C1 , y 2 ∈ C2 , B1 = Br (y 1 ), B2 = Br (y 2 ). Obviously, C is compact,
and any path going from B1 to B2 has to go through C.
Then let K = {z ∈ X ; ∃ y ∈ C; y ∈ ∂c ψ(z)}. It is clear that K is
compact: Indeed, if zk ∈ K converges to z ∈ X , and yk ∈ ∂c ψ(zk ), then
without loss of generality yk converges to some y ∈ C and one can pass
to the limit in the inequality ψ(zk ) + c(zk , yk ) ≤ ψ(t) + c(t, yk ), where
t ∈ X is arbitrary. Also K is not empty since X and Y are compact.
Next I claim that for any y ∈ C, for any x̄ such that y ∈ ∂_c ψ(x̄),
and for any i ∈ {1, 2},
Indeed, y ∉ ∂_c ψ(x), so

ψ(x̄) + c(x̄, y) = inf_{x̃∈X} [ψ(x̃) + c(x̃, y)] < ψ(x) + c(x, y),   (12.8)
for some ε > 0. This follows easily from a contradiction argument based
on the compactness of C and K, and once again the continuity of ∂c ψ.
Then let δ ∈ (0, r) be small enough that for any (x, y) ∈ K × C, for
any i ∈ {1, 2}, the inequalities
imply

|c(x′, y) − c(x, y)| ≤ ε/10,   (12.11)
|c(x′, y_i) − c(x, y_i)| ≤ ε/10.
Let K^δ = {x ∈ X ; d(x, K) ≤ δ}. From the assumptions on X and Y,
B_δ(x) has positive volume, so we can fix a smooth positive probability
density f on X such that the measure µ = f vol satisfies

µ[B_δ(x)] ≥ 3/4;   f ≥ ε₀ > 0 on K^δ.   (12.12)

Also we can construct a sequence of smooth positive probability den-
sities (g_k)_{k∈N} on Y such that the measures ν_k = g_k vol satisfy

ν_k → (1/2)(δ_{y_1} + δ_{y_2}) weakly as k → ∞.   (12.13)
Let us assume the existence of a continuous optimal transport T_k
sending µ to ν_k, for each k. We shall reach a contradiction, and this
will prove the theorem.
From (12.13), ν_k[B_δ(y_1)] ≥ 1/3 for k large enough. Then by (12.12)
the transport T_k has to send some mass from B_δ(x) to B_δ(y_1), and
similarly from B_δ(x) to B_δ(y_2). Since T_k(B_δ(x)) is connected, it has to
meet C. So there are y_k ∈ C and x_k ∈ B_δ(x) such that T_k(x_k) = y_k.
Let x̄_k ∈ K be such that y_k ∈ ∂_c ψ(x̄_k). Without loss of generality
we may assume that x̄_k → x̄_∞ ∈ K as k → ∞. By the second part
of (12.12), m := µ[B_δ(x̄_∞)] ≥ ε₀ vol[B_δ(x̄_∞)] > 0. When k is large
enough, ν_k[B_δ(y_1) ∪ B_δ(y_2)] > 1 − m (by (12.13) again), so T_k has to
send some mass from B_δ(x̄_∞) to either B_δ(y_1) or B_δ(y_2); say B_δ(y_1).
In other words, there is some x̃_k ∈ B_δ(x̄_∞) such that T_k(x̃_k) ∈ B_δ(y_1).
Let us recapitulate: for k large enough,
Before stating the main definition of this section, I shall now in-
troduce some more notation. If X is a closed subset of a Riemannian
manifold M and c : X × Y → R is a continuous cost function, for
any x in the interior of X I shall denote by Dom′(∇_x c(x, · )) the in-
terior of Dom(∇_x c(x, · )). Moreover I shall write Dom′(∇_x c) for the
union of all sets {x} × Dom′(∇_x c(x, · )), where x varies in the inte-
rior of X. For instance, if X = Y = M is a complete Riemannian
manifold and c(x, y) = d(x, y)² is the square of the geodesic distance,
then Dom′(∇_x c) is obtained from M × M by removing the cut locus,
while Dom(∇_x c) might be slightly bigger (these facts are recalled in
the Appendix).
∇⁻_c ψ(x) = ∇⁻ψ(x),
where (x, y0 ) and (x, y1 ) belong to Dom (∇x c). (The functions ψx,y0 ,y1
play in some sense the role of x → |x1 | in usual convexity theory.) See
Figure 12.3 for an illustration of the resulting “recipe”.
Fig. 12.3. Regular cost function. Take two cost-shaped mountains peaked at y0
and y1 , let x be a pass, choose an intermediate point yt on (y0 , y1 )x , and grow a
mountain peaked at yt from below; the mountain should emerge at x. (Note: the
shape of the mountain is the negative of the cost function.)
Regular cost functions 295
Proof of Proposition 12.15. Let us start with (i). The necessity of the
condition is obvious since y0 , y1 both belong to ∂c ψx,y0 ,y1 (x). Con-
versely, if the condition is satisfied, let ψ be any c-convex function
X → R, let x belong to the interior of X and let y0 , y1 ∈ ∂c ψ(x). By
adding a suitable constant to ψ (which will not change the subdifferen-
tial), we may assume that ψ(x) = 0. Since ψ is c-convex, ψ ≥ ψx,y0 ,y1 ,
so for any t ∈ [0, 1] and x ∈ X ,
E(∇⁻ψ(x)) ⊂ ∇⁻_c ψ(x) ⊂ ∇⁻ψ(x),
The next result will show that the regularity property of the cost
function is a necessary condition for a general theory of regularity of
optimal transport. In view of Remark 12.19 it is close in spirit to The-
orem 12.7.
Now let us prove Theorem 12.20. The following lemma will be useful.
Remark 12.24. One can refine Proposition 10.15 to show that cost
functions deriving from a well-behaved Lagrangian do satisfy the strong
twist condition. In the Appendix I shall give more details for the im-
portant particular case of the squared geodesic distance.
by the formula

S_c(x, y)(ξ, η) = (3/2) Σ_{ijklrs} (c_{ij,r} c^{r,s} c_{s,kl} − c_{ij,kl}) ξ^i ξ^j η^k η^l.   (12.20)
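As a sanity check of formula (12.20), one may compute its one-dimensional specialization symbolically (a sketch assuming SymPy is available, with c^{r,s} standing for the inverse of the matrix c_{r,s}); for the quadratic cost all third and fourth mixed derivatives vanish, so S_c ≡ 0.

```python
import sympy as sp

x, y, xi, eta = sp.symbols('x y xi eta', real=True)

def mtw_1d(c):
    """1D specialization of (12.20), with c^{r,s} the inverse of c_{r,s}:
    S_c = (3/2) * (c_xxy * c_xyy / c_xy - c_xxyy) * xi**2 * eta**2."""
    c_xy = sp.diff(c, x, y)
    c_xxy = sp.diff(c, x, x, y)
    c_xyy = sp.diff(c, x, y, y)
    c_xxyy = sp.diff(c, x, x, y, y)
    return sp.simplify(sp.Rational(3, 2) * (c_xxy * c_xyy / c_xy - c_xxyy)
                       * xi**2 * eta**2)

# The quadratic cost (and any bilinear cost) has vanishing
# Ma-Trudinger-Wang tensor.
assert mtw_1d((x - y)**2 / 2) == 0
assert mtw_1d(-x * y) == 0
```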
S_c(x̄, ȳ)(ξ, η) = −(3/2) (∂²/∂p²_η)(∂²/∂x²_ξ) c(x, c-exp_{x̄}(p)) |_{x=x̄, p=−d_x c(x̄,ȳ)}   (12.23)

   = −(3/2) (∂²/∂p²_η)(∂²/∂q²_ξ) c(č-exp_{ȳ}(q), c-exp_{x̄}(p)) |_{p=−d_x c(x̄,ȳ), q=−d_y c(x̄,ȳ)}.   (12.24)
Proof of Theorem 12.35. Let us start with some reminders about clas-
sical convexity in Euclidean space. If Ω is an open set in Rn with C 2
boundary and x ∈ ∂Ω, let Tx Ω stand for the tangent space to ∂Ω at x,
and n for the outward normal on ∂Ω (extended smoothly in a neigh-
borhood of x). The second fundamental form of Ω, evaluated at x, is
defined on T_x Ω by II(x)(ξ) = Σ_{ij} ∂_i n_j ξ^i ξ^j. A defining function for Ω
at x is a function Φ defined in a neighborhood of x, such that Φ < 0
in Ω, Φ > 0 outside Ω, and |∇Φ| > 0 on ∂Ω. Such a function always
exists (locally), for instance one can choose Φ(x) = ±d(x, ∂Ω), with +
sign when x is outside Ω and − when x is in Ω. (In that case ∇Φ is the
outward normal on ∂Ω.) If Φ is a defining function, then n = ∇Φ/|∇Φ|
on ∂Ω, and for all ξ ⊥ n,

Σ_{ij} ∂_i n_j ξ^i ξ^j = Σ_{ij} (Φ_{ij}/|∇Φ| − Φ_j Φ_{ik} Φ_k/|∇Φ|³) ξ^i ξ^j = Σ_{ij} (Φ_{ij}/|∇Φ|) ξ^i ξ^j.
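The simplification in this display (the correction term drops because Φ_j ξ^j = 0 when ξ ⊥ n) can be checked numerically on the unit sphere of R³ with the defining function Φ(x) = |x|² − 1 (a sketch assuming NumPy).

```python
import numpy as np

# Unit sphere in R^3 with defining function Phi(x) = |x|^2 - 1:
# grad Phi = 2x, Hess Phi = 2 Id, |grad Phi| = 2 on the sphere.
p = np.array([0.6, 0.0, 0.8])          # a point on the sphere, |p| = 1
grad = 2.0 * p
hess = 2.0 * np.eye(3)
norm = np.linalg.norm(grad)

xi = np.array([0.8, 0.0, -0.6])        # tangent vector: <xi, grad> = 0
assert abs(xi @ grad) < 1e-12

full = (xi @ hess @ xi) / norm - (grad @ xi) * (xi @ hess @ grad) / norm**3
short = (xi @ hess @ xi) / norm
assert abs(full - short) < 1e-12       # the correction term vanishes for xi ⊥ n
assert abs(short - 1.0) < 1e-12        # II(xi) = |xi|^2: the unit sphere has curvature 1
```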
(The derivation indices are raised because these are derivatives with
respect to the p variables.) Let ξ = (ξ^j) ∈ T_y M be tangent to C;
the bilinear form ∇²c identifies ξ with an element in (T_x M)* whose
coordinates are denoted by (ξ_i) = (−c_{i,j} ξ^j). Then

Ψ^{ij} ξ_i ξ_j = (Φ^{rs} − c^{rs,k} c_{k,ℓ} Φ^ℓ) ξ_r ξ_s.
By assumption the left-hand side is finite, and the right-hand side is
exactly ∫ #{t; y_t^z ∈ Σ} H^{n−1}(dz); so the integrand is finite for almost
all z, and in particular there is a sequence z_k → 0 such that each (y_t^{z_k})
intersects Σ only finitely many times.
Proof of Theorem 12.36. Let us first assume that the cost c is regular
on D, and prove the nonnegativity of Sc .
Let (x, y) ∈ D be given, and let p ∈ Tx M be such that ∇x c(x, y) +
p = 0. Let ν be a unit vector in Tx M . For ε ∈ [−ε0 , ε0 ] one can define
a c-segment by the formula y(ε) = (∇x c)−1 (x, −p(ε)), p(ε) = p + ε ν;
The Ma–Trudinger–Wang condition 307
So

ḣ(t) = −(c_{,j}(x, y_t) − c_{,j}(x̄, y_t)) ζ^j = c_{i,j} η^i ζ^j,

where η_j = −c_{,j}(x̄, y_t) + c_{,j}(x, y_t) and η^i = −c^{j,i} η_j.
Next,

ḧ(t) = −(c_{,ij}(x, y_t) − c_{,ij}(x̄, y_t)) ζ^i ζ^j + (c_{,j}(x̄, y_t) − c_{,j}(x, y_t)) c^{j,k} c_{k,ℓi} ζ^ℓ ζ^i
   = −(c_{,ij}(x, y_t) − c_{,ij}(x̄, y_t) + η_ℓ c^{ℓ,k} c_{k,ij}) ζ^i ζ^j
   = −(c_{,ij}(x, y_t) − c_{,ij}(x̄, y_t) − η^k c_{k,ij}) ζ^i ζ^j.
Now freeze t, y_t, ζ, and let Φ(x) = c_{,ij}(x, y_t) ζ^i ζ^j = ⟨∇²_y c(x, y_t) · ζ, ζ⟩.
This can be seen either as a function of x or as a function of q =
−∇y c(x, yt ), viewed as a 1-form on Tyt M . Computations to go back
and forth between the q-space and the x-space are the same as those
between the p-space and the y-space. If q and q̄ are associated to x and
x̄ respectively, then η = q̄ − q, and
(c_{,ij}(x, y_t) − c_{,ij}(x̄, y_t) − η^k c_{k,ij}) ζ^i ζ^j = Φ(q) − Φ(q̄) − d_{q̄}Φ · (q − q̄)

   = ∫₀¹ ∇²_q Φ((1 − s) q̄ + s q) · (q − q̄, q − q̄) (1 − s) ds.
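The last equality is the integral form of the Taylor remainder; in one variable it reads Φ(q) − Φ(q̄) − Φ′(q̄)(q − q̄) = ∫₀¹ Φ″((1−s)q̄ + sq)(q − q̄)²(1 − s) ds, which a quick midpoint-rule computation confirms (illustrative choice Φ = exp).

```python
import math

# One-variable check of the integral Taylor remainder used above.
qb, q = 0.3, 1.1
lhs = math.exp(q) - math.exp(qb) - math.exp(qb) * (q - qb)

m = 200000   # midpoint rule with m nodes on [0, 1]
rhs = sum(math.exp((1 - s) * qb + s * q) * (q - qb) ** 2 * (1 - s)
          for s in ((i + 0.5) / m for i in range(m))) / m
assert abs(lhs - rhs) < 1e-8
```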
domains. The norms |ζ| and |η| remain bounded as t varies in [0, 1],
so (12.26) implies
ḧ(t) ≥ −C |ḣ(t)|. (12.27)
Now let hε (t) = h(t) + ε(t − 1/2)k , where ε > 0 and k ∈ 2N will
be chosen later. Let t0 be such that hε admits a maximum at t0 . If
t0 ∈ (0, 1), then ḣε (t0 ) = 0, ḧε (t0 ) ≤ 0, so
use of the c-second fundamental form; and similarly for the x variable.
Then one checks the sign condition on Sc . In simple enough situations,
this strategy can be worked out successfully.
Remark 12.43. The implications (i) ⇒ (ii) and (ii) ⇒ (iii) remain
true without the convexity assumptions. It is a natural open prob-
lem whether these assumptions can be completely dispensed with in
Theorem 12.42. A bold conjecture would be that (i), (ii) and (iii)
are always equivalent and automatically imply the total c-convexity
of Dom (∇x c).
y_i^{(k)} → y_i (i = 0, 1). In particular, there are p_i^{(k)} ∈ T_x M, uniquely
determined, such that ∇_x c(x, y_i^{(k)}) + p_i^{(k)} = 0.
Then let ψ^{(k)} = ψ_{x, y_0^{(k)}, y_1^{(k)}}. The c-segment [y_0^{(k)}, y_1^{(k)}]_x is well-
defined and included in ∂_c ψ^{(k)}(x) (because c is regular). In other
words, c-exp_x((1 − t) p_0^{(k)} + t p_1^{(k)}) ∈ ∂_c ψ^{(k)}(x). Passing to the limit as
k → ∞, after extraction of a subsequence if necessary, we find vectors
p₀, p₁ (not necessarily uniquely determined) such that c-exp_x(p₀) = y₀,
c-exp_x(p₁) = y₁, and c-exp_x((1 − t) p₀ + t p₁) ∈ ∂_c ψ(x). This proves the
desired connectedness property.
Conversely, assume that (i) holds true. Let x, y be such that (x, y) ∈
Dom(∇_x c), and let p = −∇_x c(x, y). Let ν be a unit vector in T_x M;
for ε > 0 small enough, y_0^{(ε)} = c-exp_x(p + εν) and y_1^{(ε)} = c-exp_x(p − εν)
belong to Dom(∇_x c(x, · )). Let ψ^{(ε)} = ψ_{x, y_0^{(ε)}, y_1^{(ε)}}. As ε → 0, ∂_c ψ^{(ε)}(x)
shrinks to ∂_c ψ_{x,y,y}(x) = {y} (because (x, y) ∈ Dom(∇_x c)). Since
Dom(∇_x c) is open, for ε small enough the whole set ∂_c ψ^{(ε)}(x) is included
in Dom(∇_x c). By the same reasoning as in the proof of Proposition
12.15(iii), the connectedness of ∂_c ψ^{(ε)}(x) implies its c-convexity. Then
the proof of Theorem 12.36(i) can be repeated to show that S_c(x, y) ≥ 0
along pairs of vectors satisfying the correct orthogonality condition.
To prove (12.29) it is sufficient to show that h(t) ≥ h(0) for all t ∈ [0, 1].
The č-convexity of Ď implies that (x_t, y) always lies in D. Let
q = −∇_y c(x, y), q̄ = −∇_y c(x̄, y), η = q̄ − q; then as in the proof of
Theorem 12.36 we have (ẋ_t)^j = −c^{k,j} η_k = η^j,

ḣ(t) = (ψ_i(x_t) + c_i(x_t, y)) η^i = −c_{i,j}(x_t, y) η^i ζ^j
ḧ ≥ −C |ḣ(t)|, (12.32)
The property of c-convexity of the target is the key to get good control
of the localization of the gradient of the solution to (12.2). This asser-
tion might seem awkward: After all, we already know that under gen-
eral assumptions, T (Spt µ) = Spt ν (recall the end of Theorem 10.28),
where the transport T is related to the gradient of ψ by T (x) =
(∇x c)−1 (x, −∇ψ(x)); so ∇ψ(x) always belongs to −∇x c(Spt µ, Spt ν)
when x varies in Spt µ.
To understand why this is not enough, assume that you are approx-
imating ψ by smooth approximate solutions ψk . Then ∇ψk −→ ∇ψ
at all points of differentiability of ψ, but you have no control of the
behavior of ∇ψk (x) if ψ is not differentiable at x! In particular, in
the approximation process the point y might very well get beyond the
support of ν, putting you in trouble. To guarantee good control of
smooth approximations of ψ, you need information on the whole
c-subdifferential ∂c ψ(x). The next theorem says that such control is
available as soon as the target is c-convex.
Theorem 12.49 (Control of c-subdifferential by c-convexity of
target). Let X , Y, c : X × Y → R, µ ∈ P (X ), ν ∈ P (Y) and
ψ : X → R ∪ {+∞} satisfy the same assumptions as in Theorem 10.28
(including (H∞)). Let Ω ⊂ X be an open set such that Spt µ = Ω̄, and
let C ⊂ Y be a closed set such that Spt ν ⊂ C. Assume that:
(a) Ω × C ⊂ Dom (∇x c);
(b) C is c-convex with respect to Ω.
Then ∂c ψ(Ω) ⊂ C.
Proof of Theorem 12.49. We already know from Theorem 10.28 that
T (Ω) ⊂ C, where T (x) = (∇x c)−1 (x, −∇ψ(x)) stands for the optimal
transport. In particular, ∂c ψ(Ω ∩ Dom (∇ψ)) ⊂ C. The problem is to
control ∂c ψ(x) when ψ is not differentiable at x.
Let x ∈ Ω be such a point, and let y ∈ ∂c ψ(x). By Theorem 10.25,
−∇x c(x, y) ∈ ∇− ψ(x). Since ψ is locally semiconvex, we can apply
Remark 10.51 to deduce that ∇− ψ(x) is the convex hull of limits of
∇ψ(x_k) when x_k → x. Then by Proposition 12.15(ii), there are L ∈ N
(L = n + 1 would do), α_ℓ ≥ 0 and (x_{k,ℓ})_{k∈N} (1 ≤ ℓ ≤ L) such that
Σ_ℓ α_ℓ = 1, x_{k,ℓ} → x as k → ∞, and

Σ_{ℓ=1}^L α_ℓ ∇ψ(x_{k,ℓ}) → −∇_x c(x, y) as k → ∞.   (12.33)
Smoothness results
Remark 12.53. Theorem 12.51 shows that the regularity of the cost
function is sufficient to build a strong regularity theory. These results
are still not optimal and likely to be refined in the near future; in
particular one can ask whether C α → C 2,α estimates are available for
plainly regular cost functions (but Caffarelli’s methods strongly use the
affine invariance properties of the quadratic cost function); or whether
interior estimates exist (Theorem 12.52(ii) shows that this is the case
for uniformly regular costs).
Remark 12.54. On the other hand, the first part of Theorem 12.52
shows that a uniformly regular cost function behaves better, in certain
ways, than the squared Euclidean norm! For instance, the condition in
Theorem 12.52(i) is automatically satisfied if µ(dx) = f (x) dx, f ∈ Lp
for p > n; but it also allows µ to be a singular measure. (Such estimates
are not even true for the linear Laplace equation!) As observed by
specialists, uniform regularity makes the equation much more elliptic.
With Theorem 12.56 and some more work to establish the strict
convexity, it is possible to extend Caffarelli’s theory to unbounded
domains.
Bibliographical notes
Monge himself was probably not aware of the relation between the
Monge problem and Monge–Ampère equations; this link was made
much later, maybe in the work of Knott and Smith [524]. In any case
it is Brenier [156] who made this connection popular among the com-
munity of partial differential equations. Accordingly, weak solutions of
Monge–Ampère-type equations constructed by means of optimal trans-
port are often called Brenier solutions in the literature. McCann [614]
proved that such a solution automatically satisfies the Monge–Ampère
equation almost everywhere (see the bibliographical notes for Chapter
11). Caffarelli [185] showed that for a convex target, Brenier’s notion
of solution is equivalent to the older concepts of Alexandrov solution
and viscosity solution. These notions are reviewed in [814, Chapter 4]
and a proof of the equivalence between Brenier and Alexandrov so-
lutions is recast there. (The concept of Alexandrov solution is devel-
oped in [53, Section 11.2].) Feyel and Üstünel [359, 361, 362] studied
the infinite-dimensional Monge–Ampère equation induced by optimal
transport with quadratic cost on the Wiener space.
The modern regularity theory of the Monge–Ampère equation was
pioneered by Alexandrov [16, 17] and Pogorelov [684, 685]. Since then
it has become one of the most prestigious subjects of fully nonlinear
partial differential equations, in relation to geometric problems such
domain makes the oblique condition more elliptic in some sense than
the Neumann condition.) Fully nonlinear elliptic equations with oblique
boundary condition had been studied before in [555, 560, 561], and the
connection with the second boundary value problem for the Monge–
Ampère equation had been suggested in [562]. Compared to Caffarelli’s
method, this one only covers the global estimates, and requires higher
initial regularity; but it is more elementary.
The generalization of these regularity estimates to nonquadratic
cost functions stood as an open problem for some time. Then Ma,
Trudinger and Wang [585] discovered that the older interior estimates
by Wang [833] could be adapted to general cost functions satisfying the
condition Sc > 0 (this condition was called (A3) in their paper and
in subsequent works). Theorem 12.52(ii) is extracted from this refer-
ence. A subtle caveat in [585] was corrected in [793] (see Theorem 1
there). A key property discovered in this study is that if c is a regular
cost function and ψ is c-convex, then any local c-support function for
ψ is also a global c-support function (which is nontrivial unless ψ is
differentiable); an alternative proof can be derived from the method of
Y.-H. Kim and McCann [519, 520].
Trudinger and Wang [794] later adapted the method of Urbas to
treat the boundary regularity under the weaker condition Sc ≥ 0 (there
called (A3w)). The proof of Theorem 12.51 can be found there.
At this point Loeper [570] made three crucial contributions to the
theory. First he derived the very strong estimates in Theorem 12.52(i)
which showed that the Ma–Trudinger–Wang (A3) condition (called
(As) in Loeper’s paper) leads to a theory which is stronger than
the Euclidean one in some sense (this was already somehow implicit
in [585]). Secondly, he found a geometric interpretation of this condi-
tion, namely the regularity property (Definition 12.14), and related it to
well-known geometric concepts such as sectional curvature (Particular
Case 12.30). Thirdly, he proved that the weak condition (A3w) (called
(Aw) in his work) is mandatory to derive regularity (Theorem 12.21).
The psychological impact of this work was important: before that, the
Ma–Trudinger–Wang condition could be seen as an obscure ad hoc as-
sumption, while now it became the natural condition.
The proof of Theorem 12.52(i) in [570] was based on approximation
and used auxiliary results from [585] and [793] (which also used some
of the arguments in [570]. . . but there is no loophole!)
Loeper [571] further proved that the squared distance on the sphere
is a uniformly regular cost, and combined all the above elements to
derive Theorem 12.58; the proof is simplified in [572]. In [571], Loeper
derived smoothness estimates similar to those in Theorem 12.52 for the
far-field reflector antenna problem.
The exponent β in Theorem 12.52(i) is explicit; for instance, in
the case when f = dµ/dx is bounded above and g is bounded below,
Loeper obtained β = (4n − 1)−1 , n being the dimension. (See [572] for
a simplified proof.) However, this is not optimal: Liu [566] improved
this into β = (2n − 1)−1 , which is sharp.
In a different direction, Caffarelli, Gutiérrez and Huang [191] could
get partial regularity for the far-field reflector antenna problem by very
elaborate variants of Caffarelli’s older techniques. This “direct” ap-
proach does not yet yield results as powerful as the a priori estimates
by Loeper, Ma, Trudinger and Wang, since only C 1 regularity is ob-
tained in [191], and only when the densities are bounded from above
and below; but it gives new insights into the subject.
In dimension 2, the whole theory of Monge–Ampère equations be-
comes much simpler, and has been the object of numerous studies [744].
Old results by Alexandrov [17] and Heinz [471] imply C 1 regularity of
the solution of det(∇2 ψ) = h as soon as h is bounded from above (and
strict convexity if it is bounded from below). Loeper noticed that this
implied strengthened results for the solution of optimal transport with
quadratic cost in dimension 2, and together with Figalli [368] extended
this result to regular cost functions.
Now I shall briefly discuss stories of counterexamples. Counterexamples
by Pogorelov and Caffarelli (see for instance [814, pp. 128–129])
show that solutions of the usual Monge–Ampère equation are not
smooth in general: some strict convexity on the solution is needed, and
it has to come from boundary conditions in one way or the other.
The counterexample in Theorem 12.3 is taken from Caffarelli [185],
where it is used to prove that the “Hessian measure” (a generalized
formulation of the Hessian determinant) cannot be absolutely continu-
ous if the bridge is thin enough; in the present notes I used a slightly
different reasoning to directly prove the discontinuity of the optimal
transport. The same can be said of Theorem 12.4, which is adapted
from Loeper [570]. (In Loeper’s paper the contradiction was obtained
indirectly as in Theorem 12.44.)
Qualitative picture
Recap
c^{s,t}(x, y) = inf { ∫_s^t L(γ_τ, γ̇_τ, τ) dτ ; γ_s = x, γ_t = y }.
for some nontrivial (i.e. not identically −∞, and never +∞) function φ.
In case (ii), if nothing is known about the behavior of the distance
function at infinity, then the gradient ∇ in the left-hand side of (13.2)
should be replaced by an approximate gradient ∇̃.
4. Under the same assumptions, the (generalized) displacement
interpolation (µt )0≤t≤1 is unique. This follows from the almost sure
uniqueness of the minimizing curve joining γ0 to γ1 , where (γ0 , γ1 ) is
the optimal coupling. (Corollary 7.23 applies when the total cost is
finite; but even if the total cost is infinite, we can apply a reasoning
similar to the one in Corollary 7.23. Note that the result does not follow
from the vol ⊗ vol (dx0 dx1 )-uniqueness of the minimizing curve joining
x0 to x1 .)
5. Without loss of generality, one might assume that
φ(y) = inf_{x∈M} [ ψ(x) + c(x, y) ]
(these are true supremum and true infimum, not just up to a negligible
set). One can also assume without loss of generality that
φ(x_1) − ψ(x_0) = c(x_0, x_1) almost surely.
6. It is still possible that two minimizing curves meet at time t = 0
or t = 1, but this event may occur only on a very small set, of dimension
at most n − 1.
7. All of the above remains true if one replaces µ0 at time 0 by µt
at time t, with obvious changes of notation (e.g. replace c = c0,1 by
ct,1 ); the function φ is unchanged, but now ψ should be changed into
ψt defined by
ψt (y) = inf ψ0 (x) + c0,t (x, y) . (13.3)
x∈M
In particular,
9. Whenever 0 ≤ t0 < t1 ≤ 1,
∫ ψ_{t_1} dµ_{t_1} − ∫ ψ_{t_0} dµ_{t_0} = C^{t_0,t_1}(µ_{t_0}, µ_{t_1})
    = ∫_{t_0}^{t_1} ∫ L(x, [(∇_v L)(x, · , t)]^{−1}(∇ψ_t(x)), t) dµ_t(x) dt;
recall indeed Theorems 7.21 and 7.36, Remarks 7.25 and 7.37, and (13.4).
Open Problem 13.1. If the initial and final densities, ρ0 and ρ1 , are
positive everywhere, does this imply that the intermediate densities ρt
are also positive? Otherwise, can one identify simple sufficient con-
ditions for the density of the displacement interpolant to be positive
everywhere?
(i) each π k is an optimal transference plan between µk0 and µk1 , and
any one of the probability measures µk0 , µk1 has a smooth, compactly
supported density;
(ii) µk0 → µ0 , µk1 → µ1 , π k → π in the weak sense as k → ∞.
Proof of Theorem 13.4. By Theorem 7.21, there exists a displacement
interpolation (µt )0≤t≤1 between µ0 and µ1 ; let (γt )0≤t≤1 be such that
µt = law (γt ). The assumptions on L imply that action-minimizing
curves solve a differential equation with Lipschitz coefficients, and
therefore are uniquely determined by their initial position and veloc-
ity, a fortiori by their restriction to some time-interval [0, t0 ]. So for
any t0 ∈ (0, 1/2), by Theorem 7.30(ii), (γt0 , γ1−t0 ) is the unique op-
timal coupling between µt0 and µ1−t0 . Now it is easy to construct a
sequence (µkt0 )k∈N such that µkt0 converges weakly to µt0 as k → ∞, and
each µkt0 is compactly supported with a smooth density. (To construct
such a sequence, first truncate to ensure the property of compact sup-
port, then localize to charts by a partition of unity, and apply a reg-
ularization in each chart.) Similarly, construct a sequence (µk1−t0 )k∈N
such that µ^k_{1−t_0} converges weakly to µ_{1−t_0}, and each µ^k_{1−t_0} is compactly
supported with a smooth density. Let π^k_{t_0,1−t_0} be the unique
optimal transference plan between µ^k_{t_0} and µ^k_{1−t_0}. By stability of op-
timal transport (Theorem 5.20), π^k_{t_0,1−t_0} converges as k → ∞ to
π_{t_0,1−t_0} = law (γ_{t_0}, γ_{1−t_0}). Then by continuity of γ, the random variable
(γ_{t_0}, γ_{1−t_0}) converges pointwise to (γ_0, γ_1) as t_0 → 0, which implies
that πt0 ,1−t0 converges weakly to π. The conclusion follows by choosing
t0 = 1/n, k = k(n) large enough.
∂µ_t/∂t + ∇ · (ξ_t µ_t) = 0
in the sense of distributions (be careful: ξt is not necessarily a gradient,
unless L is quadratic). Then we can write down the equations of
displacement interpolation:
∂µ_t/∂t + ∇ · (ξ_t µ_t) = 0;
∇_v L(x, ξ_t(x), t) = ∇ψ_t(x);
ψ_0 is c-convex;
∂_t ψ_t + L^*(x, ∇ψ_t(x), t) = 0.        (13.5)
If the cost function is just the square of the distance, then these
equations become
∂µ_t/∂t + ∇ · (ξ_t µ_t) = 0;
ξ_t(x) = ∇ψ_t(x);
ψ_0 is d²/2-convex;
∂_t ψ_t + |∇ψ_t|²/2 = 0.        (13.6)
Finally, for the square of the Euclidean distance, this simplifies into
∂µ_t/∂t + ∇ · (ξ_t µ_t) = 0;
ξ_t(x) = ∇ψ_t(x);
x ↦ |x|²/2 + ψ_0(x) is lower semicontinuous convex;
∂_t ψ_t + |∇ψ_t|²/2 = 0.        (13.7)
Apart from the special choice of initial datum, the latter system
is well-known in physics as the pressureless Euler equation, for a
potential velocity field.
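As a concrete illustration of these equations in the simplest setting, here is a small numerical sketch (my own toy example, not from the book): in one dimension with quadratic cost, the optimal map is the monotone rearrangement, and for empirical measures with sorted particle locations X, Y the displacement interpolant at time t is carried by (1 − t)X + tY. The geodesic property W2(µ0, µt) = t W2(µ0, µ1) can then be checked exactly.

```python
import numpy as np

# Toy sketch (illustration, not from the book): 1D quadratic-cost optimal
# transport between empirical measures.  For sorted particle vectors X, Y the
# optimal coupling is index-wise, and the displacement interpolant at time t
# is carried by Z_t = (1 - t) X + t Y.
rng = np.random.default_rng(0)
X = np.sort(rng.normal(0.0, 1.0, 1000))   # particles of mu_0
Y = np.sort(rng.normal(3.0, 0.5, 1000))   # particles of mu_1

def w2(a, b):
    # W2 between two empirical measures with equal weights (1D: sort and pair)
    return np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2))

# geodesic property: W2(mu_0, mu_t) = t * W2(mu_0, mu_1)
for t in (0.25, 0.5, 0.9):
    Zt = (1 - t) * X + t * Y              # displacement interpolant at time t
    assert abs(w2(X, Zt) - t * w2(X, Y)) < 1e-12
```

Since X and Y are sorted and 0 ≤ t ≤ 1, each Zt is sorted as well, so the check is exact up to rounding.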
Spt(ψ) ⊂ K,    ‖ψ‖_{C²_b} ≤ ε

is d²/2-convex.
ball.
If y ∈ K and d(x, y) ≥ δ, then obviously fy (x) ≥ δ 2 /4 > ψ(y) =
fy (y); so the minimum of fy can be achieved only in Bδ (y). If there
are two distinct such minima, say x0 and x1 , then we can join them by
a geodesic (γ_t)_{0≤t≤1} which stays within B_{2δ}(y); then the function
t ↦ f_y(γ_t) is uniformly convex (because f_y is uniformly convex in
B_{2δ}(y)) and attains a minimum at both t = 0 and t = 1, which is impossible.
If y ∉ K, then ψ(x) = 0 implies d(x, y) ≥ 1, so f_y(x) ≥ (1/2) − δ²/4,
while fy (y) = 0. So the minimum of fy can only be achieved at x such
that ψ(x) = 0, and it has to be at x = y.
In any case, fy has exactly one minimum, which lies in Bδ (y). We
shall denote it by x = T (y), and it is characterized as the unique
solution of the equation
∇ψ(x) + ∇_x [ d(x, y)²/2 ] = 0,    (13.9)
where x is the unknown.
Let x be arbitrary in M , and y = expx (∇ψ(x)). Then (as a con-
sequence of the first variation formula), ∇x [d(x, y)2 /2] = −∇ψ(x), so
equation (13.9) holds true, and x = T (y). This means that, with the
notation c(x, y) = d(x, y)2 /2, one has ψ c (y) = ψ(x) + c(x, y). Then
ψ cc (x) = sup[ψ c (y) − c(x, y)] ≥ ψ(x). Since x is arbitrary, actually we
have shown that ψ cc ≥ ψ; but the converse inequality is always true,
so ψ cc = ψ, and then ψ is c-convex.
Remark 13.7. The end of the proof took advantage of a general prin-
ciple, independent of the particular cost c: If there is a surjective map
T such that fy : x → ψ(x) + c(x, y) is minimum at T (y), then ψ is
c-convex.
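To make the c-transform machinery above concrete, here is a small numerical sketch (a toy setup of my own, not from the book): on a grid, with quadratic cost c(x, y) = |x − y|²/2 and ψ(x) = x²/2 (which is c-convex, since x²/2 + ψ(x) = x² is convex), one can compute ψ^c and ψ^cc and check that ψ^cc ≤ ψ everywhere, with equality away from the boundary of the grid.

```python
import numpy as np

# Toy check of c-convexity (not from the book): quadratic cost on a 1D grid.
x = np.linspace(-2.0, 2.0, 401)
psi = x ** 2 / 2                              # c-convex: psi(x) + x^2/2 = x^2
C = (x[:, None] - x[None, :]) ** 2 / 2        # C[i, j] = c(x_i, x_j)

psi_c = np.min(psi[:, None] + C, axis=0)      # psi^c(y_j)  = min_i [psi_i + C_ij]
psi_cc = np.max(psi_c[None, :] - C, axis=1)   # psi^cc(x_i) = max_j [psi^c_j - C_ij]

# psi^cc <= psi always (choose the index i itself in the inner minimum) ...
assert np.all(psi_cc <= psi + 1e-9)
# ... with equality where the boundary truncation of the grid does not interfere
inner = np.abs(x) <= 1.0
assert np.max(np.abs(psi_cc[inner] - psi[inner])) < 1e-3
```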
The structure of P2 (M )
A striking discovery made by Otto at the end of the nineties is that the
differentiable structure on a Riemannian manifold M induces a kind of
differentiable structure in the space P2 (M ). This idea takes substance
from the following remarks: the whole path (µ_t)_{0≤t≤1} is determined
by the initial velocity field ξ_0(x), which in turn is determined by ∇ψ
as in (13.4). So it is natural to think of the function ∇ψ as a kind of
“initial velocity” for the path (µt ). The conceptual shift here is about
the same as when we decided that µt could be seen either as the law
of a random minimizing curve at time t, or as a path in the space of
measures: Now we decide that ∇ψ can be seen either as the field of the
initial velocities of our minimizing curves, or as the (abstract) velocity
of the path (µt ) at time t = 0.
There is an abstract notion of tangent space Tx X (at point x) to a
metric space (X , d): in technical language, this is the pointed Gromov–
Hausdorff limit of the rescaled space. It is a rather natural notion: fix
your point x, and zoom onto it, by multiplying all distances by a large
factor ε−1 , while keeping x fixed. This gives a new metric space Xx,ε ,
and if one is not too curious about what happens far away from x, then
the space Xx,ε might converge in some nice sense to some limit space,
that may not be a vector space, but in any case is a cone. If that limit
space exists, it is said to be the tangent space (or tangent cone) to X
at x. (I shall come back to these issues in Part III.)
In terms of that construction, the intuition sketched above is in-
deed correct: let P2 (M ) be the metric space consisting of probability
measures on M , equipped with the Wasserstein distance W2 . If µ is
absolutely continuous, then the tangent cone Tµ P2 (M ) exists and can
be identified isometrically with the closed vector space generated by
d2 /2-convex functions ψ, equipped with the norm
‖∇ψ‖_{L²(µ;TM)} := ( ∫_M |∇ψ(x)|² dµ(x) )^{1/2}.
Actually, in view of Theorem 13.5, this is the same as the vector space
generated by all smooth, compactly supported gradients, completed
with respect to that norm.
With what we know about optimal transport, this theorem is not
that hard to prove, but this would require a bit too much geometric
machinery for now. Instead, I shall spend some time on an important
W2 (µs , µt ) ≤ L |t − s|.
For any t ∈ [0, 1], let Ht be the Hilbert space generated by gradients of
continuously differentiable, compactly supported ψ:
H_t := Vect {∇ψ ; ψ ∈ C¹_c(M)}, the closure being taken in L²(µ_t; TM).
Then there exists a measurable vector field ξt (x) ∈ L∞ (dt; L2 (dµt (x))),
µt (dx) dt-almost everywhere unique, such that ξt ∈ Ht for all t (i.e. the
velocity field really is tangent along the path), and
∂t µt + ∇ · (ξt µt ) = 0 (13.10)
The proof of Theorem 13.8 requires some analytical tools, and the
reader might skip it at first reading.
Now the key remark is that the time-derivative (d/dt) (ψ +C) dµt
does not depend on the constant C. This shows that (d/dt) ψ dµt
really is a functional of ∇ψ, obviously linear. The above estimate shows
that this functional is continuous with respect to the norm in L2 (dµt ).
Actually, this is not completely rigorous, since this functional is only
defined for almost all t, and “almost all” here might depend on ψ. Here
is a way to make things rigorous: Let L be the set of all Lipschitz
functions ψ on M with Lipschitz constant at most 1, such that, say,
ψ(x_0) = 0, where x_0 ∈ M is arbitrary but fixed once and for all, and ψ is
just showed that there is a negligible set of times, τ_K, such that (13.12)
holds true for all ψ ∈ C¹_K(M) and t ∉ τ_K. Now choose an increasing
family of compact sets (Km )m∈N , with ∪Km = M , so that any compact
set is included in some Km . Then (13.12) holds true for all ψ ∈ Cc1 (M ),
as soon as t does not belong to the union of τKm , which is still a
negligible set of times.
But equation (13.12) is really the weak formulation of (13.10). Since
ξt is uniquely determined in L2 (dµt ), for almost all t, actually the vector
field ξt (x) is dµt (x) dt-uniquely determined.
To conclude the proof of the theorem, it only remains to prove the
converse implication. Let (µt ) and (ξt ) solve (13.10). By the equation of
conservation of mass, µt = law (γt ), where γt is a (random) solution of
γ̇t = ξt (γt ).
Let s < t be any two times in [0, 1]. From the formula
d(γ_s, γ_t)² = (t − s) inf { ∫_s^t |ζ̇_τ|² dτ ; ζ_s = γ_s, ζ_t = γ_t },
we deduce
d(γ_s, γ_t)² ≤ (t − s) ∫_s^t |γ̇_τ|² dτ ≤ (t − s) ∫_s^t |ξ_τ(γ_τ)|² dτ.
So
E d(γ_s, γ_t)² ≤ (t − s) ∫_s^t ∫ |ξ_τ(x)|² dµ_τ(x) dτ ≤ (t − s)² ‖ξ‖²_{L∞(dt; L²(dµ_t))}.
In particular
Remark 13.9. With hardly any more work, the preceding theorem can
be extended to cover paths that are absolutely continuous of order 2,
in the sense defined on p. 115. Then of course the velocity field will not
live in L∞ (dt; L2 (dµt )), but in L2 (dµt dt).
Observe that in a displacement interpolation, the initial measure µ0
and the initial velocity field ∇ψ0 uniquely determine the final measure
µ1 : this implies that geodesics in P2 (M ) are nonbranching, in the strong
sense that their initial position and velocity determine uniquely their
final position.
Finally, we can now derive an “explicit” formula for the action func-
tional determining displacement interpolations as minimizing curves.
Let µ = (µt )0≤t≤1 be any Lipschitz (or absolutely continuous) path in
P2 (M ); let ξt (x) = ∇ψt (x) be the associated time-dependent velocity
field. By the formula of conservation of mass, µt can be interpreted as
the law of γt , where γ is a random solution of γ̇t = ξt (γt ). Define
A(µ) := inf E ∫_0^1 |ξ_t(γ_t)|² dt,    (13.13)
where the infimum is taken over all possible realizations of the random
curves γ. By Fubini’s theorem,
A(µ) = inf E ∫_0^1 |ξ_t(γ_t)|² dt = inf E ∫_0^1 |γ̇_t|² dt
     ≥ E inf ∫_0^1 |γ̇_t|² dt
     = E d(γ_0, γ_1)²,
and the infimum is achieved if and only if the coupling (γ0 , γ1 ) is mini-
mal, and the curves γ are (almost surely) action-minimizing. This shows
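This variational characterization can be observed numerically. In the following sketch (a toy experiment of my own, not from the book), particles are transported in one dimension: the straight-line displacement interpolation attains A(µ) = W2(µ0, µ1)², while a perturbed path with the same endpoints (each particle makes a sin(πt)-shaped detour) has strictly larger action.

```python
import numpy as np

# Toy check of the action formula (13.13) for 1D empirical measures
# (an illustration, not from the book).
rng = np.random.default_rng(1)
X = np.sort(rng.normal(0.0, 1.0, 500))        # particles of mu_0
Y = np.sort(rng.normal(2.0, 0.7, 500))        # particles of mu_1
w2_sq = np.mean((Y - X) ** 2)                 # W2(mu_0, mu_1)^2 (sorted pairing)

ts = np.linspace(0.0, 1.0, 2001)

def action(velocity):
    # trapezoid rule for int_0^1 E |velocity(t)|^2 dt
    vals = np.array([np.mean(velocity(t) ** 2) for t in ts])
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(ts)) / 2)

# geodesic path gamma_i(t) = (1 - t) X_i + t Y_i: constant velocity Y_i - X_i
a_straight = action(lambda t: Y - X)
# detour: add 0.5 * sin(pi t) to every particle (endpoints unchanged)
a_detour = action(lambda t: (Y - X) + 0.5 * np.pi * np.cos(np.pi * t))

assert abs(a_straight - w2_sq) < 1e-9         # action of the geodesic = W2^2
assert a_detour > a_straight + 1.0            # exact excess is pi^2 / 8
```

The cross term between the constant velocity and the detour integrates to zero, so the detour's extra action is exactly π²/8 of the perturbation amplitude squared, illustrating that the straight-line path is the minimizer.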
Bibliographical notes
almost never hits the cut locus; that is, the set of points x such that
the image T (x) belongs to the cut locus of x is of zero probability. Al-
though we already know that almost surely, x and T (x) are joined by a
unique geodesic, this alone does not imply that the cut locus is almost
never hit, because it is possible that y belongs to the cut locus of x and
still x and y are joined by a unique minimizing geodesic. (Recall the
discussion after Problem 8.8.) But Cordero-Erausquin, McCann and
Schmuckenschläger show that if such is the case, then d(x, z)2 /2 fails to
be semiconvex at z = y. On the other hand, by Alexandrov’s second dif-
ferentiability theorem (Theorem 14.1), ψ is twice differentiable almost
everywhere; formula (13.8), suitably interpreted, says that d(x, · )2 /2
is semiconvex at T (x) whenever ψ is twice differentiable at x.
At least in the Euclidean case, and up to regularity issues, the ex-
plicit formulas for geodesic curves and action in the space of measures
were known to Brenier, around the mid-nineties. Otto [669] took a
conceptual step forward by considering formally P2 (M ) as an infinite-
dimensional Riemannian manifold, in view of formula (13.15). For
some time it was used as a purely formal, yet quite useful, heuris-
tic method (as in [671], or later in Chapter 15). It is only recently
that rigorous constructions were performed in several research papers,
e.g. [30, 203, 214, 577]. The approach developed in this chapter relies
heavily on the work of Ambrosio, Gigli and Savaré [30] (in Rn ). A more
geometric treatment can be found in [577, Appendix A]; see also [30,
Section 12.4], and [655, Section 3] (I shall give a few more details in the
bibliographical notes of Chapter 26). As I am completing this course, an
important contribution to the subject has just been made by Lott [575]
who established “explicit” formulas for the Riemannian connection and
curvature in P2ac (M ), or rather in the subset made of smooth positive
densities, when M is compact.
The pressureless Euler equations describe the evolution of a gas of
particles which interact only when they meet, and then stick together
(sticky particles). It is a very degenerate system whose study turns out
to be tricky in general [169, 320, 745]. But in applications to optimal
transport, it comes in the very particular case of potential flow (the
velocity field is a gradient), so the evolution is governed by a simple
Hamilton–Jacobi equation.
There are two natural ways to extend a minimizing geodesic (µt )0≤t≤1
in P2 (M ) into a geodesic defined (but not necessarily minimizing) at all
times. One is to solve the Hamilton–Jacobi equation for all times, in the
350 13 Qualitative picture
viscosity sense; then the gradient of the solution will define a velocity
field, and one can let (µt ) evolve by transport, as in (13.6). Another way
[30, Example 11.2.9] is to construct a trajectory of the gradient flow of
the energy −W2 (σ, · )2 /2, where σ is, say, µ0 , and the trajectory starts
from an intermediate point, say µ1/2 . The existence of this gradient
flow follows from [30, Theorems 10.4.12 and 11.2.1], while [30, Theo-
rem 11.2.1] guarantees that it coincides (up to time-reparameterization)
with the original minimizing geodesic for short times. (This is close to
the construction of quasigeodesics in [678].) It is natural to expect that
both approaches give the same result, but as far as I know, this has
not been established.
Khesin suggested to me the following nice problem. Let µ = (µt )t≥0
be a geodesic in the Wasserstein space P2 (Rn ) (defined with the help
of the pressureless Euler equation), and characterize the cut time of
µ as the “time of the first shock”. If µ0 is absolutely continuous with
positive density, this essentially means the following: let tc = inf {t1 ;
(µt )0≤t≤t1 is not a minimizing geodesic}, let ts = sup {t0 ; µt is ab-
solutely continuous for t ≤ t0 }, and show that tc = ts . Since tc should
be equal to sup {t2 ; |x|2 /2 + t ψ(0, x) is convex for t ≤ t2 }, the solution
of this problem is related to a qualitative study of the way in which
convexity degenerates at the first shock. In dimension 1, this problem
can be studied very precisely [642, Chapter 1] but in higher dimensions
the problem is more tricky. Khesin and Misiolek obtain some results in
this direction in [515].
Kloeckner [522] studied the isometries of P2 (R) and found that they
are not all induced by isometries of R; roughly speaking, there is one
additional “exotic” isometry.
In his PhD thesis, Agueh [4] studied the structure of Pp (M ) for
p > 1 (not necessarily equal to 2). Ambrosio, Gigli and Savaré [30]
pushed these investigations further.
Displacement interpolation becomes somewhat tricky in the presence of
boundaries. In his study of the porous medium equations, Otto [669]
considered the case of a bounded open set of Rn with C 2 boundary.
For many years, the great majority of applications of optimal trans-
port to problems of applied mathematics took place in a Euclidean
setting, but more recently some “genuinely Riemannian” applications
have started to pop up. There was an original suggestion to use opti-
mal transport in a three-dimensional Riemannian manifold (actually, a
cube equipped with a varying metric) related to image perception and
The issues discussed in this part are concisely reviewed in the sur-
veys by Cordero-Erausquin [244] and myself [821] (both in French).
Ricci curvature
Riem(X, Y ) := ∇Y ∇X − ∇X ∇Y + ∇[X,Y ] ;
Fig. 14.1. The dashed line gives the recipe for the construction of the Gauss map;
its Jacobian determinant is the Gauss curvature.
but from an intrinsic point of view it was not: A tiny creature living
on the surface of the sheet, unable to measure the lengths of curves
going outside of the surface, would never have noticed that you bent
the sheet.
To construct isometries from (M, g) to something else, pick any
diffeomorphism ϕ : M → M′, and equip M′ = ϕ(M) with the metric
g′ = (ϕ^{−1})^* g, defined by g′_x(v) = g_{ϕ^{−1}(x)}(d_x ϕ^{−1}(v)). Then ϕ is an
isometry between (M, g) and (M′, g′). Gauss's theorem says that the
curvature computed in (M, g) and the curvature computed in (M′, g′)
are the same, modulo obvious changes (the curvature at point x along
a plane P should be compared with the curvature at ϕ(x) along the plane
d_x ϕ(P)). This is why one often says that the curvature is “invariant
under the action of diffeomorphisms”.
Curvature is intimately related to the local behavior of geodes-
ics. The general rule is that, in the presence of positive curvature,
geodesics have a tendency to converge (at least for short times), while
in the presence of negative curvature they have a tendency to diverge.
This tendency can usually be felt only at second or third order in time:
at first order, the convergence or divergence of geodesics is dictated by
the initial conditions. So if, on a space of (strictly) positive curvature,
you start two geodesics from the same point with velocities pointing in
different directions, the geodesics will start to diverge, but then the ten-
dency to diverge will diminish. Here is a more precise statement, which
will show at the same time that the Gauss curvature is an intrinsic
notion: From a point x ∈ M , start two constant-speed geodesics with
unit speed, and respective velocities v, w. The two curves will spread
apart; let δ(t) be the distance between their respective positions at time t.
In a first approximation, δ(t) ≃ √(2(1 − cos θ)) t, where θ is the angle
between v and w (this is the same formula as in Euclidean space). But
a more precise study shows that

δ(t) = √(2(1 − cos θ)) t [ 1 − (κ_x cos²(θ/2)/6) t² + O(t⁴) ],    (14.1)
where κx is the Gauss curvature at x.
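Formula (14.1) can be checked directly on the unit sphere S², where κ ≡ 1: two unit-speed geodesics leaving the north pole with an angle θ between their initial velocities are at spherical distance δ(t) = arccos(cos²t + sin²t cos θ) at time t. The following sketch (my own sanity check, not from the book) compares this exact value with the right-hand side of (14.1):

```python
import math

# Numerical check of (14.1) on the unit sphere (Gauss curvature kappa = 1).
def delta_exact(t, theta):
    # spherical distance between exp(t v) and exp(t w), angle(v, w) = theta
    return math.acos(math.cos(t) ** 2 + math.sin(t) ** 2 * math.cos(theta))

def delta_taylor(t, theta, kappa=1.0):
    # right-hand side of (14.1) without the O(t^4) remainder
    return (math.sqrt(2 * (1 - math.cos(theta))) * t
            * (1 - kappa * math.cos(theta / 2) ** 2 / 6 * t ** 2))

theta = 1.0
for t in (0.2, 0.1, 0.05):
    # the discrepancy is O(t^5), i.e. a relative O(t^4) remainder
    assert abs(delta_exact(t, theta) - delta_taylor(t, theta)) < 2.0 * t ** 5
```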
Once the intrinsic nature of the Gauss curvature has been es-
tablished, it is easy to define the notion of sectional curvature for
Riemannian manifolds of any dimension, embedded or not: If x ∈ M
and P ⊂ Tx M , define σx (P ) as the Gauss curvature of the surface which
is obtained as the image of P by the exponential map expx (that is, the
collection of all geodesics starting from x with a velocity in P ). Another
∇2 f · v = ∇v (∇f ).
(Recall that ∇v stands for the covariant derivation in the direction v.)
In short, ∇2 f is the covariant gradient of the gradient of f .
A convenient way to compute the Hessian of a function is to differ-
entiate it twice along a geodesic path. Indeed, if (γt )0≤t≤1 is a geodesic
path, then ∇γ̇ γ̇ = 0, so
d²/dt² f(γ_t) = (d/dt) ⟨∇f(γ_t), γ̇_t⟩ = ⟨∇_{γ̇} ∇f(γ_t), γ̇_t⟩ + ⟨∇f(γ_t), ∇_{γ̇} γ̇_t⟩
             = ⟨∇² f(γ_t) · γ̇_t, γ̇_t⟩.
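In Euclidean space, where geodesics are straight lines, this identity reads (d²/dt²) f(x + tv) = ⟨∇²f(x + tv)·v, v⟩; here is a minimal finite-difference check with the toy choice f(x, y) = x³y (my own example, not from the book):

```python
import numpy as np

# Finite-difference check of (d^2/dt^2) f(gamma_t) = <Hess f . gamma', gamma'>
# along the Euclidean geodesic gamma_t = p + t v, for f(x, y) = x^3 y.
def f(q):
    x, y = q
    return x ** 3 * y

def hess(q):
    # Hessian of f, computed by hand for this toy f
    x, y = q
    return np.array([[6 * x * y, 3 * x ** 2],
                     [3 * x ** 2, 0.0]])

p = np.array([0.7, -1.2])
v = np.array([0.3, 0.9])
h = 1e-4
second_diff = (f(p + h * v) - 2 * f(p) + f(p - h * v)) / h ** 2
assert abs(second_diff - v @ hess(p) @ v) < 1e-5
```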
∆f = ∇ · (∇f ),
Remark 14.3. As the proof will show, Property (i) can be replaced by
the following more precise statement involving the subdifferential of ψ:
If ξ is any vector field valued in ∇− ψ (i.e. ξ(y) ∈ ∇− ψ(y) for all y),
then ∇v ξ(x) = Av.
Remark 14.4. For the main part of this course, we shall not need
the full strength of Theorem 14.1, but just the particular case when
ψ is continuously differentiable and ∇ψ is Lipschitz; then the proof
becomes much simpler, and ∇ψ is almost everywhere differentiable in
the usual sense. Still, on some occasions we shall need the full generality
of Theorem 14.1.
J(x) := lim_{δ→0} vol [T(P_δ)] / vol [P_δ].
For that purpose, T (Pδ ) can be approximated by the infinitesimal par-
allelogram built on T (x + δe1 ), . . . , T (x + δen ). Explicitly,
T (x + δei ) = expx+δei (ξ(x + δei )).
(If ξ is not defined at x+δei it is always possible to make an infinitesimal
perturbation and replace x + δei by a point which is extremely close
and at which ξ is well-defined. Let me skip this nonessential subtlety.)
Assume for a moment that we are in Rn , so T (x) = x + ξ(x);
then, by a classical result of real analysis, J (x) = | det(∇T (x))| =
| det(In + ∇ξ(x))|. But in the genuinely Riemannian case, things are
much more intricate (unless ξ(x) = 0) because the measurement of
infinitesimal volumes changes as we move along the geodesic path
γ(t, x) = expx (t ξ(x)).
To appreciate this continuous change, let us parallel-transport along
the geodesic γ to define a new family E(t) = (e1 (t), . . . , en (t)) in
T_{γ(t)}M. Since (d/dt)⟨e_i(t), e_j(t)⟩ = ⟨ė_i(t), e_j(t)⟩ + ⟨e_i(t), ė_j(t)⟩ = 0, the
family E(t) remains an orthonormal basis of T_{γ(t)}M for any t. (Here
the dot symbol stands for the covariant derivation along γ.) Moreover,
e1 (t) = γ̇(t, x)/|γ̇(t, x)|. (See Figure 14.2.)
To express the Jacobian of the map T = exp ξ, it will be convenient
to consider the whole collection of maps Tt = exp(t ξ). For brevity, let
us write
Tt (x + δE) = Tt (x + δe1 ), . . . , Tt (x + δen ) ;
then
T_t(x + δE) ≃ T_t(x) + δ J,
Fig. 14.2. The orthonormal basis E, here represented by a small cube, goes along
the geodesic by parallel transport.
Fig. 14.3. At time t = 0, the matrices J(t) and E(t) coincide, but at later times
they (may) differ, due to geodesic distortion.
where

J = (J_1, . . . , J_n);    J_i(t, x) := (d/dδ)|_{δ=0} T_t(x + δe_i).
(All of these quantities depend implicitly on the starting point x.) The
reader who prefers to stay away from the Riemann curvature tensor
can take (14.6) as the equation defining the matrix R; the only things
that one needs to know about it are (a) R(t) is symmetric; (b) the
first row of R(t) vanishes (which is the same, modulo identification, as
R(t) γ̇(t) = 0); (c) tr R(t) = Ricγt (γ̇t , γ̇t ) (which one can also adopt as
a definition of the Ricci tensor); (d) R is invariant under the transform
t → 1 − t, E(t) → −E(1 − t), γt → γ1−t .
Equation (14.6) is of second order in time, so it should come with
initial conditions for both J(0) and J̇(0). On the one hand, since
T0 (y) = y,
J_i(0) = (d/dδ)|_{δ=0} (x + δe_i) = e_i,
so J(0) is just the identity matrix. On the other hand,
J̇_i(0) = (D/Dt)|_{t=0} (d/dδ)|_{δ=0} T_t(x + δe_i) = (D/Dδ)|_{δ=0} (d/dt)|_{t=0} T_t(x + δe_i)
       = (D/Dδ)|_{δ=0} ξ(x + δe_i) = (∇ξ) e_i,

(d/dt) J_{ij} = (d/dt) ⟨J_i, e_j⟩ = ⟨DJ_i/Dt, e_j⟩ = ⟨(∇ξ) e_i, e_j⟩.
We conclude that the initial conditions are
J(0) = I_n,    J̇(0) = ∇ξ(x),    (14.8)
tr (U²) ≥ (tr U)²/n;
then, by plugging this inequality into (14.12), we obtain an important
differential inequality involving Ricci curvature:
(d/dt)(tr U) + (tr U)²/n + Ric(γ̇) ≤ 0.    (14.13)
There are several ways to rewrite this result in terms of the Jacobian
determinant J (t). For instance, by differentiating the formula
tr U = J̇/J,
one obtains easily
(d/dt)(tr U) + (tr U)²/n = J̈/J − (1 − 1/n)(J̇/J)².
So (14.13) becomes
J̈/J − (1 − 1/n)(J̇/J)² ≤ −Ric(γ̇).    (14.14)
D̈/D ≤ −Ric(γ̇)/n.    (14.15)
Yet another useful formula is obtained by introducing ℓ(t) :=
−log J(t); then (14.13) becomes

ℓ̈(t) ≥ ℓ̇(t)²/n + Ric(γ̇).    (14.16)
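Here is a quick numerical sanity check of (14.16) in a model case (my own toy setup, not from the book): on the unit sphere S² one has n = 2 and Ric(γ̇) = 1 along unit-speed geodesics; with diagonal initial data J̇(0) = diag(a, b), the Jacobi equation J̈ + diag(0, 1) J = 0 decouples, giving J(t) = det J(t) = (1 + at)(cos t + b sin t), and the inequality ℓ̈ ≥ ℓ̇²/2 + 1 can be tested by finite differences:

```python
import math

# Model-case check of (14.16) on the unit sphere S^2 (n = 2, Ric(gamma') = 1):
# with J(0) = I and J'(0) = diag(a, b), det J(t) = (1 + a t)(cos t + b sin t),
# and l(t) = -log det J(t) must satisfy l'' >= (l')^2 / 2 + 1.
a, b, h = 0.3, -0.2, 1e-4

def ell(t):
    return -math.log((1 + a * t) * (math.cos(t) + b * math.sin(t)))

for t in (0.1, 0.4, 0.8):
    d1 = (ell(t + h) - ell(t - h)) / (2 * h)              # l'(t), central diff
    d2 = (ell(t + h) - 2 * ell(t) + ell(t - h)) / h ** 2  # l''(t)
    assert d2 >= d1 ** 2 / 2 + 1 - 1e-5
```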
In all of these formulas, we have always taken t = 0 as the starting
time, but it is clear that we could do just the same with any starting
time t0 ∈ [0, 1], that is, consider, instead of Tt (x) = exp(t∇ψ(x)), the
map Tt0 →t (x) = exp((t − t0 )∇ψ(x)). Then all the differential inequali-
ties are unchanged; the only difference is that the Jacobian determinant
at time t = 0 is not necessarily 1.
The previous formulas are quite sufficient to derive many useful geo-
metric consequences. However, one can refine them by taking advantage
of the fact that curvature is not felt in the direction of motion. In other
words, if one is traveling along some geodesic γ, one will never be able
to detect some curvature by considering variations (in the initial posi-
tion, or initial velocity) in the direction of γ itself: the path will always
be the same, up to reparametrization. This corresponds to the property
R(t)γ̇(t) = 0, where R(t) is the matrix appearing in (14.6). In short,
curvature is felt only in n − 1 directions out of n. This loose principle
often leads to a refinement of estimates by a factor (n − 1)/n.
Here is a recipe to “separate out” the direction of motion from the
other directions. As before, assume that the first vector of the ortho-
normal basis J(0) is e1 (0) = γ̇(0)/|γ̇(0)|. (The case when γ̇(0) = 0
can be treated separately.) Set u// = u11 (this is the coefficient in U
which corresponds to just the direction of motion), and define U⊥ as
the (n − 1) × (n − 1) matrix obtained by removing the first line and
first column in U . Of course, tr (U ) = u// + tr (U⊥ ). Next decompose
the Jacobian determinant J into parallel and orthogonal contributions:

J = J_{//} J_⊥,    J_{//}(t) = exp( ∫_0^t u_{//}(s) ds ).
Further define parallel and orthogonal distortions by
D_{//} = J_{//},    D_⊥ = J_⊥^{1/(n−1)};
and, of course,

ℓ̈_⊥ = −(d/dt)(tr U) − ℓ̈_{//} = tr (U²) + Ric(γ̇) − Σ_j u_{1j}².

Since tr (U²) = tr (U_⊥²) + 2 Σ_{j≥2} u_{1j}² + u_{11}², this implies
ℓ̈_⊥ ≥ (ℓ̇_⊥)²/(n − 1) + Ric(γ̇),    (14.21)

D̈_⊥/D_⊥ ≤ −Ric(γ̇)/(n − 1).    (14.22)
To summarize: The basic inequalities for ℓ_⊥ and ℓ_{//} are the same
as for ℓ, but with the exponent n replaced by n − 1 in the case of ℓ_⊥,
and by 1 in the case of ℓ_{//}; and the number Ric(γ̇) replaced by 0 in the
case of ℓ_{//}. The same holds for D_⊥ and D_{//}.
Bochner’s formula
of view, in which the focus is not on the trajectory, but on the velocity
field ξ = ξ(t, x). To switch from Lagrangian to Eulerian description,
just write
γ̇(t) = ξ(t, γ(t)).    (14.23)
In general, this can be a subtle issue because two trajectories might
cross, and then there might be no way to define a meaningful veloc-
ity field ξ(t, · ) at the crossing point. However, if a smooth vector
field ξ = ξ(0, · ) is given, then around any point x0 the trajectories
γ(t, x) = exp(t ξ(x)) do not cross for |t| small enough, and one can de-
fine ξ(t, x) without ambiguity. The covariant differentiation of (14.23)
along ξ itself, and the geodesic equation γ̈ = 0, yield
∂ξ/∂t + ∇_ξ ξ = 0,    (14.24)
which is the pressureless Euler equation. From a physical point of
view, this equation describes the velocity field of a bunch of particles
which travel along geodesic curves without interacting. The derivation
of (14.24) will fail when the geodesic paths start to cross, at which
point the solution to (14.24) would typically lose smoothness and need
reinterpretation. But for the sequel, we only need (14.24) to be satisfied
for small values of |t|, and locally around x.
All the previous discussion about Ricci curvature can be recast in
Eulerian terms. Let γ(t, x) = expx (tξ(x)); by the definition of the co-
variant gradient, we have
(the same formula that we had before at time t = 0). Under the identi-
fication of Rn with Tγ(t) M provided by the basis E(t), we can identify
J with the matrix J, and then
U(t, x) = J̇(t, x) J(t, x)^{−1} = ∇ξ(t, γ(t, x)),    (14.25)
where again the linear operator ∇ξ is identified with its matrix in the
basis E.
Then tr U (t, x) = tr ∇ξ(t, x) coincides with the divergence of
ξ(t, · ), evaluated at x. By the chain-rule and (14.24),
(d/dt)(tr U)(t, x) = (d/dt)(∇ · ξ)(t, γ(t, x))
    = ∇ · (∂ξ/∂t)(t, γ(t, x)) + γ̇(t, x) · ∇(∇ · ξ)(t, γ(t, x))
    = [ −∇ · (∇_ξ ξ) + ξ · ∇(∇ · ξ) ](t, γ(t, x)).
All functions here are evaluated at (t, γ(t, x)), and of course we can
choose t = 0, and x arbitrary. So (14.26) is an identity that holds true
for any smooth (say C 2 ) vector field ξ on our manifold M . Of course
it can also be established directly by a coordinate computation.1
While formula (14.26) holds true for all vector fields ξ, if ∇ξ is
symmetric then two simplifications arise:
(a) ∇_ξ ξ = ∇ξ · ξ = ∇(|ξ|²/2);
(b) tr (∇ξ)² = ‖∇ξ‖²_HS, HS standing for Hilbert–Schmidt norm.
So (14.26) becomes
−Δ(|ξ|²/2) + ξ·∇(∇·ξ) + ‖∇ξ‖²_HS + Ric(ξ) = 0.    (14.27)
We shall apply it only in the case when ξ is a gradient: ξ = ∇ψ; then
∇ξ = ∇2 ψ is indeed symmetric, and the resulting formula is
−Δ(|∇ψ|²/2) + ∇ψ·∇(Δψ) + ‖∇²ψ‖²_HS + Ric(∇ψ) = 0.    (14.28)
The identity (14.26), or its particular case (14.28), is called the
Bochner–Weitzenböck–Lichnerowicz formula, or just Bochner’s
formula.2
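For readers who like to check such identities concretely, here is a small finite-difference verification of (14.28) in flat R², where Ric ≡ 0 (my own illustration; the test function ψ(x, y) = x²y and the tolerances are arbitrary choices):

```python
# Finite-difference check of Bochner's formula (14.28) in flat R^2 (Ric = 0):
#   -Delta(|grad psi|^2/2) + grad psi . grad(Delta psi) + ||Hess psi||_HS^2 = 0,
# for the test function psi(x, y) = x^2 y (illustration only).
h = 1e-4
psi = lambda x, y: x**2 * y

def dx(f, x, y):  return (f(x + h, y) - f(x - h, y)) / (2 * h)
def dy(f, x, y):  return (f(x, y + h) - f(x, y - h)) / (2 * h)
def dxx(f, x, y): return (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
def dyy(f, x, y): return (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
def dxy(f, x, y):
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)

x0, y0 = 0.7, -0.4
energy = lambda x, y: (dx(psi, x, y)**2 + dy(psi, x, y)**2) / 2  # |grad psi|^2/2
lap_psi = lambda x, y: dxx(psi, x, y) + dyy(psi, x, y)

term1 = -(dxx(energy, x0, y0) + dyy(energy, x0, y0))
term2 = (dx(psi, x0, y0) * dx(lap_psi, x0, y0)
         + dy(psi, x0, y0) * dy(lap_psi, x0, y0))
term3 = dxx(psi, x0, y0)**2 + 2 * dxy(psi, x0, y0)**2 + dyy(psi, x0, y0)**2
assert abs(term1 + term2 + term3) < 1e-3
```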
1 With the notation ∇_ξ = ξ·∇ (which is classical in fluid mechanics), and tr (∇ξ)² = ∇ξ··∇ξ, (14.26) takes the amusing form −∇·(ξ·∇ξ) + ξ·∇(∇·ξ) + ∇ξ··∇ξ + Ric(ξ) = 0.
2 In (14.26) or (14.28) I have written Bochner's formula in purely "metric" terms, which will probably look quite ugly to many geometer readers. An equivalent but more "topological" way to write Bochner's formula is
Δ + ∇∗∇ + Ric = 0,
Remark 14.5. With the ansatz ξ = ∇ψ, the pressureless Euler equa-
tion (14.24) reduces to the Hamilton–Jacobi equation
∂ψ/∂t + |∇ψ|²/2 = 0.    (14.29)
One can use this equation to obtain (14.28) directly, instead of first
deriving (14.26). Here equation (14.29) is to be understood in a viscosity
sense (otherwise there are many spurious solutions); in fact the reader
might just as well take the identity
ψ(t, x) = inf_{y∈M} [ ψ(y) + d(x, y)²/(2t) ]
as a definition of ψ(t, x).
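To make this infimum formula (the Hopf–Lax semigroup) concrete, here is a 1-D Euclidean sketch of my own (the grid, the choice ψ(y) = y²/2 and the tolerance are arbitrary): for this ψ the infimum is attained at y = x/(1+t) and equals x²/(2(1+t)), and a brute-force grid minimization reproduces it.

```python
# 1-D Euclidean sketch of the inf-formula (illustration only):
#   psi(t, x) = inf_y [ psi(y) + |x - y|^2 / (2t) ].
# For psi(y) = y^2/2 the infimum is attained at y = x/(1+t) and equals
# x^2 / (2(1+t)); brute-force minimization over a grid should reproduce this.
def hopf_lax(psi, t, x, ys):
    return min(psi(y) + (x - y) ** 2 / (2 * t) for y in ys)

ys = [i * 1e-3 - 2.0 for i in range(4001)]     # grid on [-2, 2]
psi0 = lambda y: y * y / 2
val = hopf_lax(psi0, t=1.0, x=1.0, ys=ys)
assert abs(val - 0.25) < 1e-3                  # exact value x^2/(2(1+t)) = 0.25
```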
Remark 14.6. Here I have not tried to derive Bochner's formula for nonsmooth functions. This could be done for semiconvex ψ, with an appropriate "compensated" definition for −Δ(|∇ψ|²/2) + ∇ψ·∇(Δψ). In fact, the semiconvexity of ψ prevents the formation of instantaneous shocks, and will allow the Lagrangian/Eulerian duality for a short time.
Remark 14.7. The operator U (t, x) coincides with ∇2 ψ(t, γ(t, x)),
which is another way to see that it is symmetric for t > 0.
From this point on, we shall only work with (14.28). Of course, by using the Cauchy–Schwarz inequality as before, we can bound ‖∇²ψ‖²_HS below by (Δψ)²/n; therefore (14.28) implies
Δ(|∇ψ|²/2) − ∇ψ·∇(Δψ) ≥ (Δψ)²/n + Ric(∇ψ).    (14.30)
Apart from regularity issues, this inequality is strictly equivalent to (14.13), and therefore to (14.14) or (14.15).
Not so much has been lost when going from (14.28) to (14.30): there
is still equality in (14.30) at all points x where ∇2 ψ(x) is a multiple of
the identity.
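The matrix inequality behind this step — ‖A‖²_HS ≥ (tr A)²/n for symmetric A, with equality exactly for multiples of the identity — is easy to test numerically (a throwaway sketch of mine, not from the text):

```python
# Check ||A||_HS^2 >= (tr A)^2 / n for a random symmetric matrix, and
# equality for a multiple of the identity (illustration only).
import random

random.seed(1)
n = 4
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        A[i][j] = A[j][i] = random.uniform(-1, 1)

hs2 = sum(A[i][j] ** 2 for i in range(n) for j in range(n))
tr = sum(A[i][i] for i in range(n))
assert hs2 >= tr ** 2 / n

c = 0.37  # A = c * Identity: the equality case
assert abs(n * c * c - (n * c) ** 2 / n) < 1e-12
```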
where ∆ = −(dd∗ + d∗ d) is the Laplace operator on 1-forms, ∇ is the covariant
differentiation (under the identification of a 1-form with a vector field) and the
adjoints are in L2 (vol). Also I should note that the name “Bochner formula” is
attributed to a number of related identities.
One can also take out the direction of motion, ∇̂ψ := (∇ψ)/|∇ψ|, from the Bochner identity. The Hamilton–Jacobi equation implies ∂_t ∇̂ψ + ∇²ψ·∇̂ψ = 0, so
∂_t ⟨∇²ψ·∇̂ψ, ∇̂ψ⟩ = −⟨∇²(|∇ψ|²/2)·∇̂ψ, ∇̂ψ⟩ − 2 ⟨∇²ψ·(∇²ψ·∇̂ψ), ∇̂ψ⟩,
and by symmetry the latter term can be rewritten as −2 |(∇²ψ)·∇̂ψ|².
From this one easily obtains the following refinement of Bochner's formula: Define
Δ_{//} f := ⟨∇²f·∇̂ψ, ∇̂ψ⟩,    Δ_⊥ := Δ − Δ_{//};
then
Δ_{//}(|∇ψ|²/2) − ∇ψ·∇(Δ_{//}ψ) + 2 |(∇²ψ)·∇̂ψ|² ≥ (Δ_{//}ψ)²,
Δ_⊥(|∇ψ|²/2) − ∇ψ·∇(Δ_⊥ψ) − 2 |(∇²ψ)·∇̂ψ|² ≥ ‖∇²_⊥ψ‖²_HS + Ric(∇ψ).
(14.31)
This is the “Bochner formula with the direction of motion taken out”.
I have to confess that I never saw these frightening formulas anywhere,
and don’t know whether they have any use. But of course, they are
equivalent to their Lagrangian counterpart, which will play a crucial
role in the sequel.
where
V(r) = ∫₀^r S(r') dr',
and
S(r) = c_{n,K} × { sin^{n−1}( √(K/(n−1)) r )    if K > 0
                 { r^{n−1}                      if K = 0
                 { sinh^{n−1}( √(|K|/(n−1)) r ) if K < 0.
Here of course S(r) is the surface area of Br (0) in the model space, that
is the (n − 1)-dimensional volume of ∂Br (0), and cn,K is a nonessential
normalizing constant. (See Theorem 18.8 later in this course.)
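As a hedged illustration (the function names S, V and the normalization c_{n,K} = 1 are my own choices, not the book's), the model surface area S(r) and volume V(r) can be evaluated numerically; e.g. for K = 1, n = 2 the model space is the round sphere and V(π) = ∫₀^π sin r dr = 2.

```python
import math

def S(r, K, n):
    # model surface area, with the normalizing constant c_{n,K} set to 1
    if K > 0:
        return math.sin(math.sqrt(K / (n - 1)) * r) ** (n - 1)
    if K == 0:
        return r ** (n - 1)
    return math.sinh(math.sqrt(-K / (n - 1)) * r) ** (n - 1)

def V(r, K, n, steps=20000):
    # V(r) = \int_0^r S(r') dr', midpoint rule
    h = r / steps
    return sum(S((i + 0.5) * h, K, n) * h for i in range(steps))

assert abs(V(math.pi, 1, 2) - 2.0) < 1e-4      # sphere: int_0^pi sin r dr = 2
assert abs(V(2.0, 0, 3) - 8.0 / 3) < 1e-4      # flat:   int_0^2 r^2 dr = 8/3
```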
2. Diameter estimates: The Bonnet–Myers theorem states that, if K > 0, then M is compact and more precisely
diam(M) ≤ π √((n−1)/K),
with equality for the model sphere.
3. Spectral gap inequalities: If K > 0, then the spectral gap λ₁ of the nonnegative operator −Δ is bounded below:
λ₁ ≥ nK/(n−1),
with equality again for the model sphere. (See Theorem 21.20 later in this course.)
4. (Sharp) Sobolev inequalities: If K > 0 and n ≥ 3, let μ = vol/vol[M] be the normalized volume measure on M; then for any smooth function f on M,
‖f‖²_{L^{2*}(μ)} ≤ ‖f‖²_{L²(μ)} + (4(n−1)/(Kn(n−2))) ‖∇f‖²_{L²(μ)},    2* = 2n/(n−2),
and those constants are sharp for the model sphere.
5. Heat kernel bounds: There are many of them, in particular the well-known Li–Yau estimates: If K ≥ 0, then the heat kernel p_t(x, y) satisfies
p_t(x, y) ≤ (C / vol[B_{√t}(x)]) exp( −d(x, y)²/(2Ct) ),
for some constant C which only depends on n. For K < 0, a similar bound holds true, only now C depends on K and there is an additional factor e^{Ct}. There are also pointwise estimates on the derivatives of log p_t, in relation with Harnack inequalities.
The list could go on. More recently, Ricci curvature has been at
the heart of Perelman’s solution of the celebrated Poincaré conjecture,
and more generally the topological classification of three-dimensional
manifolds. Indeed, Perelman’s argument is based on Hamilton’s idea
to use Ricci curvature in order to define a “heat flow” in the space of
metrics, via the partial differential equation
∂g/∂t = −2 Ric(g),    (14.32)
where Ric(g) is the Ricci tensor associated with the metric g, which
can be thought of as something like −∆g. The flow defined by (14.32)
is called the Ricci flow. Some time ago, Hamilton had already used its
properties to show that a compact simply connected three-dimensional
Riemannian manifold with positive Ricci curvature is automatically
diffeomorphic to the sphere S 3 .
γ^{(n)}(dx) = (e^{−|x|²/2} / (2π)^{n/2}) dx,    x ∈ R^n.
It is a matter of experience that most theorems which we encounter
about the Gaussian measure can be written just the same in dimen-
sion 1 or in dimension n, or even in infinite dimension, when properly
interpreted. In fact, the effective dimension of (Rn , γ (n) ) is infinite, in
a certain sense, whatever n. I admit that this perspective may look
strange, and might be the result of lack of imagination; but in any
case, it will fit very well into the picture (in terms of sharp constants
for geometric inequalities, etc.).
For later purposes it will be useful to keep track of all error terms
in the inequalities. So rewrite (14.12) as
(tr U)˙ + (tr U)²/n + Ric(γ̇) = −‖U − (tr U/n) I_n‖²_HS.    (14.33)
(log J₀)¨ + [(log J₀)˙]²/n + Ric(γ̇)
  = (log J)¨ + ⟨∇²V(γ)·γ̇, γ̇⟩ + [(log J)˙ + γ̇·∇V(γ)]²/n + Ric(γ̇).
By using the identity
a²/n = (a+b)²/N − b²/(N−n) + (n/(N(N−n))) ( b − ((N−n)/n) a )²,    (14.34)
we see that
[(log J)˙ + γ̇·∇V(γ)]² / n
  = [(log J)˙]²/N − (γ̇·∇V(γ))²/(N−n)
    + (n/(N(N−n))) [ ((N−n)/n) (log J)˙ + (N/n) γ̇·∇V(γ) ]²
  = [(log J)˙]²/N − (γ̇·∇V(γ))²/(N−n)
    + (n/(N(N−n))) [ ((N−n)/n) (log J₀)˙ + γ̇·∇V(γ) ]²
  = [(log J)˙]²/N − (γ̇·∇V(γ))²/(N−n)
    + (n/(N(N−n))) [ ((N−n)/n) tr U + γ̇·∇V(γ) ]².
To summarize these computations it will be useful to introduce some
more notation: first, as usual, the negative logarithm of the Jacobian
determinant:
ℓ(t, x) := −log J(t, x);    (14.35)
and then, the generalized Ricci tensor:
Ric_{N,ν} := Ric + ∇²V − (∇V ⊗ ∇V)/(N−n),    (14.36)
where the tensor product ∇V ⊗∇V is a quadratic form on TM , defined
by its action on tangent vectors as
∇V ⊗ ∇V x (v) = (∇V (x) · v)2 ;
so
Ric_{N,ν}(γ̇) = (Ric + ∇²V)(γ̇) − (∇V·γ̇)²/(N−n).
It is implicitly assumed in (14.36) that N ≥ n (otherwise the correct
definition is RicN,ν = −∞); if N = n the convention is 0 × ∞ = 0,
so (14.36) still makes sense if ∇V = 0. Note that Ric∞,ν = Ric + ∇2 V ,
while Ricn,vol = Ric.
The conclusion of the preceding computations is that
ℓ̈ = ℓ̇²/N + Ric_{N,ν}(γ̇) + ‖U − (tr U/n) I_n‖²_HS
    + (n/(N(N−n))) [ ((N−n)/n) tr U + γ̇·∇V(γ) ]².    (14.37)
N (N − n) n
When N = ∞ this takes a simpler form:
ℓ̈ = Ric_{∞,ν}(γ̇) + (tr U)²/n + ‖U − (tr U/n) I_n‖²_HS.    (14.38)
As corollaries,
ℓ̈_⊥ ≥ (ℓ̇_⊥)²/(N−1) + Ric_{N,ν}(γ̇),
−(N−1) D̈_⊥/D_⊥ ≥ Ric_{N,ν}(γ̇).
(14.45)
L = Δ − ∇V·∇,    (14.46)
Γ(f, g) = (1/2) [ L(fg) − f Lg − g Lf ].    (14.47)
Note that Γ is a bilinear operator, which in some sense encodes the
deviation of L from being a derivation operator. In our case, for (14.46),
Γ (f, g) = ∇f · ∇g.
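To see the bilinear operator Γ in action, here is a 1-D finite-difference sketch (my own illustration; the choices of V, f, g are arbitrary smooth functions) checking that ½[L(fg) − f Lg − g Lf] = f′g′ for L = d²/dx² − V′ d/dx:

```python
import math

# 1-D check that Gamma(f,g) = (1/2)[L(fg) - f Lg - g Lf] equals f' g'
# for L = d^2/dx^2 - V' d/dx (finite differences; illustration only).
def d1(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

V = lambda x: x ** 2 / 2
f = lambda x: math.sin(x)
g = lambda x: math.exp(0.3 * x)

def L(u, x):
    return d2(u, x) - d1(V, x) * d1(u, x)

x0 = 0.7
gamma = 0.5 * (L(lambda y: f(y) * g(y), x0)
               - f(x0) * L(g, x0) - g(x0) * L(f, x0))
assert abs(gamma - d1(f, x0) * d1(g, x0)) < 1e-4
```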
Γ₂(ψ) := Γ₂(ψ, ψ) = L(|∇ψ|²/2) − ∇ψ·∇(Lψ).    (14.49)
Then our previous computations can be rewritten as
Γ₂(ψ) = (Lψ)²/N + Ric_{N,ν}(∇ψ)
    + ‖∇²ψ − (Δψ/n) I_n‖²_HS + (n/(N(N−n))) [ ((N−n)/n) Δψ + ∇V·∇ψ ]².
(14.50)
In particular,
Γ₂(ψ) ≥ (Lψ)²/N + Ric_{N,ν}(∇ψ).    (14.51)
And as the reader has certainly guessed, one can now take out the
direction of motion (this computation is provided for completeness but
will not be used): As before, define
∇̂ψ = ∇ψ/|∇ψ|,
and next,
L_⊥ f = Δ_⊥ f − ∇V·∇f,
Γ_{2,⊥}(ψ) = L_⊥(|∇ψ|²/2) − ∇ψ·∇(L_⊥ψ) − 2 |(∇²ψ)·∇̂ψ|².
Then
Γ_{2,⊥}(ψ) = (L_⊥ψ)²/(N−1) + Ric_{N,ν}(∇ψ) + ‖∇²_⊥ψ − (Δ_⊥ψ/(n−1)) I_{n−1}‖²_HS
    + ((n−1)/((N−1)(N−n))) [ ((N−n)/(n−1)) Δ_⊥ψ + ∇V·∇ψ ]² + Σ_{j=2}^{n} (∂_{1j}ψ)².
Curvature-dimension bounds
• geodesic paths: If ψ is a given function on M, γ(t, x) = T_t(x) = exp_x((t−t₀)∇ψ(x)) is the geodesic starting from x with velocity γ̇(t₀, x) = ∇ψ(x), evaluated at time t ∈ [0, 1]; it is assumed that J(t, x) does not vanish for t ∈ (0, 1); the starting time t₀ may be the origin t₀ = 0, or any other time in [0, 1];
• Jacobian determinants: J(t, x) is the Jacobian determinant of T_t(x) (with respect to the reference measure ν, not with respect to the standard volume), ℓ = −log J, and D = J^{1/N} is the mean distortion associated with (T_t);
• the dot means differentiation with respect to time;
• finally, the subscript ⊥ in J_⊥, D_⊥, Γ_{2,⊥} means that the direction of motion γ̇ = ∇ψ has been taken out (see above for precise definitions).
So indeed RicN,ν ≥ K.
The proof goes in the same way for the equivalence between (i) and
(ii’), (iii’), (iv’): again the problem is to understand why (ii’) implies
(i), and the reasoning is almost the same as before; the key point being that the extra error terms in ∂_{1j}ψ, j = 2, …, n, all vanish at x₀.
where
τ^{(λ)}(θ) = { sin(λθ√Λ) / sin(θ√Λ)     if Λ > 0
            { λ                          if Λ = 0
            { sinh(λθ√−Λ) / sinh(θ√−Λ)  if Λ < 0.
A more precise statement, together with a proof, are provided in the
Second Appendix of this chapter.
This leads to the following integral characterization of CD(K, N ):
D(t, x) ≥ τ_{K,N}^{(1−t)} D(0, x) + τ_{K,N}^{(t)} D(1, x)    (N < ∞),    (14.54)
ℓ(t, x) ≤ (1−t) ℓ(0, x) + t ℓ(1, x) − (K t(1−t)/2) d(x, y)²    (N = ∞),    (14.55)
where
α = √(|K|/N) d(x, y)    (α ∈ [0, π] if K > 0).
D(t, x) ≥ τ_{K,N}^{(1−λ)} D(t₀, x) + τ_{K,N}^{(λ)} D(t₁, x),    t = (1−λ)t₀ + λt₁,
where now α = √(|K|/N) d(γ(t₀), γ(t₁)). It follows that D(t, x) satisfies (14.53) for any choice of t₀, t₁; and this is equivalent to (14.52).
The reasoning is the same for the case N = ∞, starting from in-
equality (ii) in Theorem 14.8.
The next result states that the coefficients τ_{K,N}^{(t)} obtained in Theorem 14.11 can be automatically improved if N is finite and K ≠ 0, by taking out the direction of motion:
D(t, x) ≥ τ_{K,N}^{(1−t)} D(0, x) + τ_{K,N}^{(t)} D(1, x),    (14.56)
where now
τ_{K,N}^{(t)} = { t^{1/N} ( sin(tα)/sin α )^{1−1/N}    if K > 0
              { t                                      if K = 0
              { t^{1/N} ( sinh(tα)/sinh α )^{1−1/N}   if K < 0
and
α = √(|K|/(N−1)) d(x, y)    (α ∈ [0, π] if K > 0).
Remark 14.13. When N < ∞ and K > 0, Theorem 14.12 contains the Bonnet–Myers theorem, according to which d(x, y) ≤ π √((N−1)/K). With Theorem 14.11 the bound was only π √(N/K).
where σ_{K,N}^{(t)} = sin(tα)/sin α if K > 0; t if K = 0; sinh(tα)/sinh α if K < 0. Next transform (14.19) into the integral bound
d(expx v, expx w) = L ,
Distortion coefficients
Apart from Definition 14.19, the material in this section is not necessary
to the understanding of the rest of this course. Still, it is interesting
because it will give a new interpretation of Ricci curvature bounds, and
motivate the introduction of distortion coefficients, which will play a
crucial role in the sequel.
Fig. 14.5. The distortion coefficient is approximately equal to the ratio of the
volume filled with lines, to the volume whose contour is in dashed line. Here the
space is negatively curved and the distortion coefficient is less than 1.
Then
β_t(x, y) = inf_γ β_t^{[γ]}(x, y),
where the infimum is over all minimizing geodesics γ joining γ(0) = x to γ(1) = y, and β_t^{[γ]}(x, y) is defined as follows:
• If x, y are not conjugate along γ, let E be an orthonormal basis of
Ty M and define
β_t^{[γ]}(x, y) = { det J^{0,1}(t) / t^n             if 0 < t ≤ 1
                 { lim_{s→0} det J^{0,1}(s) / s^n    if t = 0,
(14.60)
where J^{0,1} is the unique matrix of Jacobi fields along γ satisfying
J^{0,1}(0) = 0;    J^{0,1}(1) = E;
• If x, y are conjugate along γ, define
β_t^{[γ]}(x, y) = { 1    if t = 1
                 { +∞   if 0 ≤ t < 1.
Distortion coefficients can be explicitly computed for the model
CD(K, N ) spaces and depend only on the distance between the two
points x and y. These particular coefficients (or rather their expression
as a function of the distance) will play a key role in the sequel of these
notes.
Definition 14.19 (Reference distortion coefficients). Given K ∈
R, N ∈ [1, ∞] and t ∈ [0, 1], and two points x, y in some metric space (X, d), define β_t^{(K,N)}(x, y) as follows:
• If 0 < t ≤ 1 and 1 < N < ∞, then
β_t^{(K,N)}(x, y) = { +∞                                if K > 0 and α > π,
                   { ( sin(tα) / (t sin α) )^{N−1}      if K > 0 and α ∈ [0, π],
                   { 1                                  if K = 0,
                   { ( sinh(tα) / (t sinh α) )^{N−1}    if K < 0,
(14.61)
where
α = √(|K|/(N−1)) d(x, y).    (14.62)
• In the two limit cases N → 1 and N → ∞, modify the above expressions as follows:
β_t^{(K,1)}(x, y) = { +∞   if K > 0,
                   { 1    if K ≤ 0,    (14.63)
β_t^{(K,∞)}(x, y) = e^{(K/6)(1−t²) d(x,y)²}.    (14.64)
• For t = 0, define β_0^{(K,N)}(x, y) = 1.
Fig. 14.6. The shape of the curves β_t^{(K,N)}(x, y), for fixed t ∈ (0, 1), as a function of α = √(|K|/(N−1)) d(x, y).
(Here J1,0 , J0,1 are identified with their expressions J 1,0 , J 0,1 in the
moving basis E.)
As noted after (14.7), the Jacobi equation is invariant under the
change t → 1 − t, E → −E, so J 1,0 becomes J 0,1 when one exchanges
the roles of x and y, and replaces t by 1 − t. In particular, we have the
formula
det J^{1,0}(t) / (1−t)^n = β_{1−t}(y, x).    (14.66)
As in the beginning of this chapter, the issue is to compute the
determinant at time t of a Jacobi field J(t). Since the Jacobi fields
are solutions of a linear differential equation of the form J¨ + R J = 0,
they form a vector space of dimension 2n, and they are invariant under
right-multiplication by a constant matrix. This implies
are not symmetric! It turns out however that they can be positively
cosymmetrized: there is a kernel K(t) such that det K(t) > 0, and
K(t) J 1,0 (t) and K(t) J 0,1 (t) J(1) are both positive symmetric, at least
for t ∈ (0, 1). This remarkable property is a consequence of the structure
of Jacobi fields; see Propositions 14.30 and 14.31 in the Third Appendix
of this chapter.
Once the cosymmetrization property is known, it is obvious how to
fix the proof: just write
det(K(t) J(t))^{1/n} ≥ det(K(t) J^{1,0}(t))^{1/n} + det(K(t) J^{0,1}(t) J(1))^{1/n},
and then factor out the positive quantity (det K(t))^{1/n} to get
det J(t)^{1/n} ≥ det(J^{1,0}(t))^{1/n} + det(J^{0,1}(t) J(1))^{1/n}.
‖ϕ‖_{Lip(B(x₀,r))} ≤ (2/r) max_{1≤i≤n+1} ( ϕ(x_i) − ϕ(x₀) ).
Next, let x ∈ B(x₀, r) and let y ∈ ∂ϕ(x); let z = x + r y/|y| ∈ B(x₀, 2r). From the subdifferential inequality, r|y| = ⟨y, z − x⟩ ≤ ϕ(z) − ϕ(x) ≤ 2 (max_i ϕ(x_i) − ϕ(x₀)). This proves (i).
Now let (ϕk )k∈N be a sequence of convex functions, let x0 ∈ Rn and
let r > 0. Let x1 , . . . , xn+1 be such that B(x0 , 2r) is included in the con-
vex hull of x1 , . . . , xn+1 . If ϕk (xj ) converges for all j, then by (i) there
is a uniform bound on ϕk Lip on B(x0 , r). So if ϕk converges pointwise
on B(x0 , r), the convergence has to be uniform. This proves (ii).
Let us assume that ε ≤ |v|; then, by (i), for all z ∈ B2ε (x),
⟨Aw, w⟩/2 + o(t²), for any w. If (i') is false, then there are sequences x_k → 0, |x_k| ≠ 0, and y_k ∈ ∂ϕ(x_k) such that
(y_k − A x_k)/|x_k| −→ 0 as k → ∞.    (14.71)
Φ(w) = ⟨Aw, w⟩/2.
The functions ϕk are convex, so the convergence is actually locally
uniform by Lemma 14.26.
Since yk ∈ ∂ϕ(xk ),
(∂_i ∂_j ϕ) ∗ ζ = ∂_i(∂_j ϕ ∗ ζ) = ∂_i ∂_j (ϕ ∗ ζ) = ∂_j ∂_i (ϕ ∗ ζ) = ∂_j(∂_i ϕ ∗ ζ) = (∂_j ∂_i ϕ) ∗ ζ.
2 Q_{v₀}(t, x) − max_i Q_{v_i}(t, x) ≤ Q_v(t, x) ≤ max_i Q_{v_i}(t, x).
So
(1/δ^n) ‖μ_v − ρ_v(x₀) λ^n‖_{TV(B_δ(x₀))} −→ 0 as δ → 0.
The goal is to show that qv (x0 ) = ρv (x0 ). Then the proof will be com-
plete, since ρv is a quadratic form in v (indeed, ρv (x0 ) is obtained by
(1/δ^n) ∫_{B_δ(0)} |q_v(x) − ρ_v(0)| dx ≤ ‖μ_v − ρ_v(0) λ^n‖_{TV(B(0,Cδ))}.
f((1−λ)t₀ + λt₁) ≥ τ^{(1−λ)}(|t₀−t₁|) f(t₀) + τ^{(λ)}(|t₀−t₁|) f(t₁),
where
τ^{(λ)}(θ) = { sin(λθ√Λ) / sin(θ√Λ)     if 0 < Λ < π²
            { λ                          if Λ = 0
            { sinh(λθ√−Λ) / sinh(θ√−Λ)  if Λ < 0.
If Λ = π² then f(t) = c sin(πt) for some c ≥ 0; finally, if Λ > π², then f = 0.
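These coefficients interpolate exactly the solutions of the model equation; a quick numerical sketch of my own (the values of Λ, t₀, t₁, λ are arbitrary) with f(t) = sin(√Λ t):

```python
import math

# tau^{(lambda)}(theta) for 0 < Lambda < pi^2; the model function
# f(t) = sin(sqrt(Lambda) t) should satisfy the inequality with equality.
Lam = 4.0
f = lambda t: math.sin(math.sqrt(Lam) * t)

def tau(lam, theta):
    s = math.sqrt(Lam)
    return math.sin(lam * theta * s) / math.sin(theta * s)

t0, t1, lam = 0.2, 0.9, 0.35
theta = abs(t1 - t0)
t = (1 - lam) * t0 + lam * t1
lhs = f(t)
rhs = tau(1 - lam, theta) * f(t0) + tau(lam, theta) * f(t1)
assert abs(lhs - rhs) < 1e-12
```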
τ^{(1/2)}(θ) = (1/2) ( 1 + Λθ²/8 ) + o(θ³)
and
(f(t₀) + f(t₁))/2 = f((t₀+t₁)/2) + ((t₀−t₁)²/8) f̈((t₀+t₁)/2) + o(|t₀ − t₁|²).
So, if we fix t ∈ (0, 1) and let t0 , t1 → t in such a way that t = (t0 +t1 )/2,
we get
Let R : t −→ R(t) be a continuous map defined on [0, 1], valued in the
space of n × n symmetric matrices, and let U be the space of functions
u : t −→ u(t) ∈ Rn solving the second-order linear differential equation
Assume that J10 (t) is invertible for all t ∈ (0, 1]. Then:
(a) S(t) := J10 (t)−1 J01 (t) is symmetric positive for all t ∈ (0, 1], and
it is a decreasing function of t.
(b) There is a unique pair of Jacobi matrices (J 1,0 , J 0,1 ) such that
are symmetric. Moreover, det K(t) > 0 for all t ∈ [0, 1).
Proposition 14.31 (Jacobi matrices with positive determinant).
Let S(t) and K(t) be the matrices defined in Proposition 14.30. Let J
be a Jacobi matrix such that J(0) = I_n and J̇(0) is symmetric. Then the following properties are equivalent:
(i) J̇(0) + S(1) > 0;
(ii) K(t) J^{0,1}(t) J(1) > 0 for all t ∈ (0, 1);
(iii) K(t) J(t) > 0 for all t ∈ [0, 1];
(iv) det J(t) > 0 for all t ∈ [0, 1].
The equivalence remains true if one replaces the strict inequalities in
(i)–(ii) by nonstrict inequalities, and the time-interval [0, 1] in (iii)–(iv)
by [0, 1).
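In the simplest 1-D constant-curvature model R(t) ≡ κ these objects are explicit — J₁₀(t) = sin(√κ t)/√κ, J₀₁(t) = cos(√κ t), hence S(t) = √κ cot(√κ t) — and statement (a) of Proposition 14.30 (positivity and monotonicity of S) can be eyeballed numerically (my own sketch; κ = 1 so that there is no conjugate point on (0, 1]):

```python
import math

# 1-D model with R(t) = kappa = 1: S(t) = J_10(t)^{-1} J_01(t) = cot(t),
# which should be positive and strictly decreasing on (0, 1].
S = lambda t: math.cos(t) / math.sin(t)

ts = [0.1 * i for i in range(1, 11)]
vals = [S(t) for t in ts]
assert all(v > 0 for v in vals)                    # positivity
assert all(a > b for a, b in zip(vals, vals[1:]))  # strictly decreasing
```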
Before proving these propositions, it is interesting to discuss their
geometric interpretation:
• If γ(t) = expx (t∇ψ(x)) is a minimizing geodesic, then γ̇(t) =
∇ψ(t, γ(t)), where ψ solves the Hamilton–Jacobi equation
∂ψ(t, x)/∂t + |∇ψ(t, x)|²/2 = 0,    ψ(0, · ) = ψ;
0 = J₀₁(t) − J₁₀(t) H(t)/t,
where (modulo identification)
H(t) = ∇²_x [ d(·, γ(t))² / 2 ].
(The extra t in the denominator comes from time-reparameterization.) So S(t) = J₁₀(t)⁻¹ J₀₁(t) = H(t)/t should be symmetric.
• The assumption
J(0) = I_n,    J̇(0) + S(1) ≥ 0
(d/ds) [ ⟨u(s), v̇(s)⟩ − ⟨u̇(s), v(s)⟩ ] = −⟨u(s), R(s) v(s)⟩ + ⟨R(s) u(s), v(s)⟩ = 0,    (14.76)
then the flow (u(s), u̇(s)) −→ (u(t), u̇(t)), where u ∈ U, preserves ω.
The subspaces U₀ = {u ∈ U; u(0) = 0} and U̇₀ = {u ∈ U; u̇(0) = 0} are Lagrangian: this means that their dimension is half the dimension
So
⟨Ṡ(t)w, w⟩ = −⟨w, (∂²u/∂s∂t)(0, t)⟩ = −⟨w, (∂/∂s)(∂_t u)(0, t)⟩ = −⟨u(0), (∂v/∂s)(0)⟩,
where s −→ v(s) = (∂_t u)(s, t) and s −→ u(s) = u(s, t) are solutions of the Jacobi equation. Moreover, by differentiating the conditions in (14.77) one obtains
v(t) + (∂u/∂s)(t) = 0;    v(0) = 0.    (14.78)
By (14.76) again,
−⟨u(0), (∂v/∂s)(0)⟩ = −⟨(∂u/∂s)(0), v(0)⟩ − ⟨u(t), (∂v/∂s)(t)⟩ + ⟨(∂u/∂s)(t), v(t)⟩.
The first two terms in the right-hand side vanish because v(0) = 0 and u(t) = u(t, t) = 0. Combining this with the first identity in (14.78) one finds in the end
⟨Ṡ(t)w, w⟩ = −|v(t)|².    (14.79)
We already know that v(0) = 0; if in addition v(t) = 0 then 0 = v(t) = J₁₀(t) v̇(0), so (by invertibility of J₁₀(t)) v̇(0) = 0, and v vanishes identically.
J^{1,0}(t) = J₀₁(t) − J₁₀(t) J₁₀(1)⁻¹ J₀₁(1);    J^{0,1}(t) = J₁₀(t) J₁₀(1)⁻¹.    (14.80)
Moreover, J̇^{1,0}(0) = −J₁₀(1)⁻¹ J₀₁(1) and J̇^{0,1}(1) = J̇₁₀(1) J₁₀(1)⁻¹ are
is positive symmetric since by (a) the matrix S(t) = J10 (t)−1 J01 (t) is
a strictly decreasing function of t. In particular K(t) is invertible for
t > 0; but since K(0) = In , it follows by continuity that det K(t)
remains positive on [0, 1).
Finally, if J satisfies the assumptions of (c), then S(1) J(0)⁻¹ is symmetric (because S(1)∗ = S(1)). Then
(K(t) J^{0,1}(t) J(1)) J(0)⁻¹ = t J₁₀(t)⁻¹ J₁₀(t) J₁₀(1)⁻¹ [ J₀₁(1) + J₁₀(1) J̇(0) ] J(0)⁻¹
  = t [ J₁₀(1)⁻¹ J₀₁(1) + J̇(0) ] J(0)⁻¹
is also symmetric.
As we already noticed, the first matrix is positive for t ∈ (0, 1); and the
second is also positive, by assumption. In particular (ii) holds true.
The implication (ii) ⇒ (iii) is obvious since J(t) = J^{1,0}(t) + J^{0,1}(t) J(1) is the sum of two positive matrices for t ∈ (0, 1). (At t = 0 one sees directly K(0) J(0) = I_n.)
If (iii) is true then (det K(t)) (det J(t)) > 0 for all t ∈ [0, 1), and we
already know that det K(t) > 0; so det J(t) > 0, which is (iv).
It remains to prove (iv) ⇒ (i). Recall that K(t) J(t) = t [S(t) + J̇(0)]; since det K(t) > 0, the assumption (iv) is equivalent to the statement that the symmetric matrix A(t) = t S(t) + t J̇(0) has positive determinant for all t ∈ (0, 1]. The identity t S(t) = K(t) J₀₁(t) shows that A(t) approaches I_n as t → 0; and since none of its eigenvalues vanishes, A(t) has to remain positive for all t. So S(t) + J̇(0) is positive for all t ∈ (0, 1]; but S is a decreasing function of t, so this is equivalent to S(1) + J̇(0) > 0, which is condition (i).
The last statement in Proposition 14.31 is obtained by similar
arguments and its proof is omitted.
Bibliographical notes
Recommended textbooks about Riemannian geometry are the ones by
do Carmo [306], Gallot, Hulin and Lafontaine [394] and Chavel [223].
All the necessary background about Hessians, Laplace–Beltrami oper-
ators, Jacobi fields and Jacobi equations can be found there.
Apart from these sources, a review of comparison methods based on
Ricci curvature bounds can be found in [846].
Formula (14.1) does not seem to appear in standard textbooks of
Riemannian geometry, but can be derived with the tools found therein,
or by comparison with the sphere/hyperbolic space. On the sphere,
the computation can be done directly, thanks to a classical formula
of spherical trigonometry: If a, b, c are the lengths of the sides of a
triangle drawn on the unit sphere S 2 , and γ is the angle opposite to c,
then cos c = cos a cos b+sin a sin b cos γ. A more standard computation
usually found in textbooks is the asymptotic expansion of the perimeter
of a circle centered at x with (geodesic) radius r, as r → 0. Y. H.
Kim and McCann [520, Lemma 4.5] recently generalized (14.1) to more
general cost functions, and curves of possibly differing lengths.
The differential inequalities relating the Jacobian of the exponen-
tial map to the Ricci curvature can be found (with minor variants) in
a number of sources, e.g. [223, Section 3.4]. They usually appear in
conjunction with volume comparison principles such as the Heintze–
Kärcher, Lévy–Gromov, Bishop–Gromov theorems, all of which ex-
press the idea that if the Ricci curvature is bounded below by K,
and the dimension is less than N , then volumes along geodesic fields
grow no faster than volumes in model spaces of constant sectional cur-
vature having dimension N and Ricci curvature identically equal to
K. These computations are usually performed in a smooth setting;
their adaptation to the nonsmooth context of semiconvex functions
has been achieved only recently, first by Cordero-Erausquin, McCann
and Schmuckenschläger [246] (in a form that is somewhat different
from the one presented here) and more recently by various sets of au-
thors [247, 577, 761].
Bochner’s formula appears, e.g., as [394, Proposition 4.15] (for a
vector field ξ = ∇ψ) or as [680, Proposition 3.3 (3)] (for a vector field
ξ such that ∇ξ is symmetric, i.e. the 1-form p → ξ · p is closed). In
both cases, it is derived from properties of the Riemannian curvature
tensor. Another derivation of Bochner’s formula for a gradient vector
field is via the properties of the square distance function d(x0 , x)2 ; this
is quite simple, and not far from the presentation that I have followed,
since d(x0 , x)2 /2 is the solution of the Hamilton–Jacobi equation at
time 1, when the initial datum is 0 at x0 and +∞ everywhere else.
But I thought that the explicit use of the Lagrangian/Eulerian duality
would make Bochner’s formula more intuitive to the readers, especially
those who have some experience of fluid mechanics.
There are several other Bochner formulas in the literature; Chapter
7 of Petersen’s book [680] is entirely devoted to that subject. In fact
“Bochner formula” is a generic name for many identities involving com-
mutators of second-order differential operators and curvature.
The examples (14.10) are by now standard; they have been discussed
for instance by Bakry and Qian [61], in relation with spectral gap esti-
mates. When the dimension N is an integer, these reference spaces are
obtained by “projection” of the model spaces with constant sectional
curvature.
The practical importance of separating out the direction of motion is
implicit in Cordero-Erausquin, McCann and Schmuckenschläger [246],
but it was Sturm who attracted my attention on this. To implement
this idea in the present chapter, I essentially followed the discussion
in [763, Section 1]. Also the integral bound (14.56) can be found in this
reference.
Many analytic and geometric consequences of Ricci curvature bounds
are discussed in Riemannian geometry textbooks such as the one by
Gallot, Hulin and Lafontaine [394], and also in hundreds of research
papers.
Otto calculus
Both the pressure and the iterated pressure will appear naturally when
one differentiates the energy functional: the pressure for first-order
derivatives, and the iterated pressure for second-order derivatives.
Example 15.1. Let m ≠ 1, and
U(ρ) = U^{(m)}(ρ) = (ρ^m − ρ)/(m − 1);
then
p(ρ) = ρ^m,    p₂(ρ) = (m − 1) ρ^m.
There is an important limit case as m → 1: U^{(1)}(ρ) = ρ log ρ; then
p(ρ) = ρ,    p₂(ρ) = 0.
By the way, the linear part −ρ/(m − 1) in U^{(m)} does not contribute to the pressure, but has the merit of displaying the link between U^{(m)} and U^{(1)}.
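The pressure and iterated pressure of Example 15.1 can be recomputed directly from the definitions p(ρ) = ρU′(ρ) − U(ρ) and p₂(ρ) = ρp′(ρ) − p(ρ) (a quick finite-difference sketch of mine; m and the evaluation point are arbitrary):

```python
# Check p(r) = r U'(r) - U(r) = r^m and p2(r) = r p'(r) - p(r) = (m-1) r^m
# for U(r) = (r^m - r)/(m-1), by central finite differences (illustration only).
def d(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

m = 3.0
U = lambda r: (r ** m - r) / (m - 1)
p = lambda r: r * d(U, r) - U(r)
p2 = lambda r: r * d(p, r) - p(r)

r0 = 1.7
assert abs(p(r0) - r0 ** m) < 1e-6
assert abs(p2(r0) - (m - 1) * r0 ** m) < 1e-3
```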
Differential operators will also be useful. Let ∆ be the Laplace oper-
ator on M , then the distortion of the volume element by the function V
leads to a natural second-order operator:
L = ∆ − ∇V · ∇. (15.5)
gradµ H = −∆µ,
which can be identified with the function −∆ρ. Thus the gradient of
Boltzmann’s entropy is the Laplace operator. This short statement is
one of the first striking conclusions of Otto’s formalism.
= ∫ ∇U'(ρ)·∇ψ ρ e^{−V} = ∫ ∇U'(ρ)·∇ψ dμ.
If this should hold true for all ψ, the only possible choice is that ∇ζ = ∇U'(ρ), at least μ-almost everywhere. In any case ζ := U'(ρ) provides an admissible representation of grad_μ U_ν. This proves formula (15.8). The other two formulas are obtained by noting that p'(ρ) = ρ U''(ρ), and so ∇p(ρ) = ρ ∇U'(ρ); therefore
∇·(μ ∇U'(ρ)) = ∇·(e^{−V} ρ ∇U'(ρ)) = e^{−V} L p(ρ).
For the second order (formula (15.7)), things are more intricate.
The following identity will be helpful: If ξ is a tangent vector at x on a Riemannian manifold M, and F is a function on M, then
Hess_x F(ξ) = (d²/dt²)|_{t=0} F(γ(t)),    (15.15)
where γ(t) = exp_x(tξ).
From the proof of the gradient formula, one has, with the notation μ_t = ρ_t ν,
dU_ν(μ_t)/dt = ∫_M ∇ψ_t·∇U'(ρ_t) ρ_t dν = ∫_M ∇ψ_t·∇p(ρ_t) dν = −∫_M (Lψ_t) p(ρ_t) dν.
grad_μ F = −∇·(μ ∇φ),    φ = δF/δμ,
Open Problem 15.11. Find a nice formula for the Hessian of the
functional F appearing in (15.21).
Open Problem 15.12. Find a nice formalism playing the role of the
Otto calculus in the space Pp (M ), for p = 2. More generally, are there
nice formal rules for taking derivatives along displacement interpola-
tion, for general Lagrangian cost functions?
Bibliographical notes
Displacement convexity I
∇²F ≥ 0    (16.4)
ϕ(t) = (1 − t) ϕ(0) + t ϕ(1) − ∫₀¹ ϕ̈(s) G(s, t) ds,    (16.5)
G(s, t) = { s (1 − t)   if s ≤ t
          { t (1 − s)   if s ≥ t.    (16.6)
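Identity (16.5) is elementary but worth testing once; the sketch below (my own; ϕ(t) = t³ is an arbitrary smooth choice) evaluates the integral by the midpoint rule:

```python
# Check (16.5): phi(t) = (1-t) phi(0) + t phi(1) - int_0^1 phi''(s) G(s,t) ds,
# with G from (16.6), for phi(t) = t^3 (so phi''(s) = 6s); midpoint rule.
def G(s, t):
    return s * (1 - t) if s <= t else t * (1 - s)

phi = lambda t: t ** 3
phidd = lambda s: 6 * s

t = 0.3
steps = 100000
h = 1.0 / steps
integral = sum(phidd((i + 0.5) * h) * G((i + 0.5) * h, t) * h
               for i in range(steps))
recon = (1 - t) * phi(0) + t * phi(1) - integral
assert abs(recon - phi(t)) < 1e-6
```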
λ[γ] := inf_{0≤t≤1} Λ(γ_t, γ̇_t)/|γ̇_t|² > −∞.    (16.7)
Proof of Proposition 16.2. The arguments in this proof will come again
several times in the sequel, in various contexts.
Assume that (i) holds true. Consider x0 and x1 in M , and introduce
a constant-speed minimizing geodesic γ joining γ0 = x0 to γ1 = x1 .
Then
d²/dt² F(γ_t) = ⟨∇²F(γ_t)·γ̇_t, γ̇_t⟩ ≥ Λ(γ_t, γ̇_t).
Then Property (ii) follows from identity (16.5) with ϕ(t) := F (γt ).
As for Property (iii), it can be established either by dividing the
inequality in (ii) by t > 0, and then letting t → 0, or directly from (i)
by using the Taylor formula at order 2 with ϕ(t) = F (γt ) again. Indeed,
ϕ̇(0) = ⟨∇F(γ₀), γ̇₀⟩, while ϕ̈(t) ≥ Λ(γ_t, γ̇_t).
To go from (iii) to (iv), replace the geodesic γt by the geodesic γ1−t ,
to get
F(γ₀) ≥ F(γ₁) − ⟨∇F(γ₁), γ̇₁⟩ + ∫₀¹ Λ(γ_{1−t}, γ̇_{1−t}) (1 − t) dt.
Displacement convexity
∀t ∈ [0, 1]    F(μ_t) ≤ (1−t) F(μ₀) + t F(μ₁) − (λ/2) t(1−t) W₂(μ₀, μ₁)²;
µ̇ + ∇ · (∇ψ µ) = 0.
These functions will come back again and again in the sequel, and the
associated functionals will be denoted by HN,ν .
If inequality (16.16) holds true, then
Hess_μ U_ν ≥ K Λ_U,
where
Λ_U(μ, μ̇) = ∫_M |∇ψ|² p(ρ) dν.    (16.18)
So the conclusion is as follows:
Comparing the latter expression with (16.19) shows that RicN,ν (v0 ) ≥
K|v0 |2 . Since x0 and v0 were arbitrary, this implies RicN,ν ≥ K. Note
that this reasoning only used the functional HN,ν = (UN )ν , and prob-
ability measures µ that are very concentrated around a given point.
This heuristic discussion is summarized in the following:
Guess 16.7. If, for each x0 ∈ M , HN,ν is KΛU -displacement convex when
applied to probability measures that are supported in a small neighborhood
of x0 , then M satisfies the CD(K, N ) curvature-dimension bound.
(i) Ric ≥ 0;
(ii) If the nonlinearity U is such that the iterated pressure p₂ is nonnegative, then the functional U_vol is displacement convex;
(iii) H is displacement convex;
(iii’) For any x0 ∈ M , the functional H is displacement convex
when applied to probability measures that are supported in a small
neighborhood of x0 .
Example 16.9. The above considerations also suggest that the in-
equality Ric ≥ Kg is equivalent to the K-displacement convexity of
the H functional, whatever the value of K ∈ R.
Fig. 16.2. The lazy gas experiment: To go from state 0 to state 1, the lazy gas
uses a path of least action. In a nonnegatively curved world, the trajectories of the
particles first diverge, then converge, so that at intermediate times the gas can afford
to have a lower density (higher entropy).
Bibliographical notes
Displacement convexity II
In Chapter 16, a conjecture was formulated about the links between dis-
placement convexity and curvature-dimension bounds; its plausibility
was justified by some formal computations based on Otto’s calculus.
In the present chapter I shall provide a rigorous justification of this
conjecture. For this I shall use a Lagrangian point of view, in contrast
with the Eulerian approach used in the previous chapter. Not only is
the Lagrangian formalism easier to justify, but it will also lead to new
curvature-dimension criteria based on so-called “distorted displacement
convexity”.
The main results in this chapter are Theorems 17.15 and 17.37.
(ii) p(r)/r^{1−1/N} is a nondecreasing function of r;
(iii) u(δ) := δ^N U(δ^{−N}) (δ > 0) if N < ∞, and u(δ) := e^δ U(e^{−δ}) (δ ∈ R) if N = ∞, is a convex function of δ.
Remark 17.3. It is clear (from condition (i) for instance) that DC_{N′} ⊂ DC_N if N′ ≥ N. So the smallest class of all is DC∞, while DC1 is the largest (actually, conditions (i)–(iii) are void for N = 1).
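For concreteness, condition (iii) can be checked on the two model nonlinearities used later in the text (a quick sketch; U_N(r) = −N(r^{1−1/N} − r) is the standard nonlinearity behind H_{N,ν}):

```latex
% N < \infty, with U_N(r) = -N\,(r^{1-1/N} - r):
u(\delta) = \delta^N U_N(\delta^{-N})
          = -N\,\delta^N \bigl(\delta^{-(N-1)} - \delta^{-N}\bigr)
          = -N(\delta - 1),
% which is affine, hence convex in \delta.

% N = \infty, with U_\infty(r) = r \log r:
u(\delta) = e^{\delta}\, U_\infty(e^{-\delta})
          = e^{\delta}\, e^{-\delta} \log\bigl(e^{-\delta}\bigr) = -\delta,
% again affine, hence convex; so U_N \in DC_N and U_\infty \in DC_\infty.
```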
∀r ≥ 0, U (r) ≥ a r log r + b r.
(iii) Let N ∈ [1, ∞] and let U ∈ DCN. If r0 ∈ (0, +∞) is such that p(r0) > 0, then there is a constant K > 0 such that p(r) ≥ K r^{1−1/N} for all r ≥ r0. If on the contrary p(r0) = 0, then U is linear on [0, r0]. In particular, the set {r; U(r) = 0} is either empty, or an interval of the form [0, r0].
(iv) Let N ∈ [1, ∞] and let U ∈ DCN. Then U is the pointwise nondecreasing limit of a sequence of functions (U_ℓ)_{ℓ∈N} in DCN, such that (a) U_ℓ coincides with U on [0, r_ℓ], where r_ℓ is arbitrarily large; (b) for each ℓ there are a_ℓ ≥ 0 and b_ℓ ∈ R such that U_ℓ(r) = −a_ℓ r^{1−1/N} + b_ℓ r (or a_ℓ r log r + b_ℓ r if N = ∞) for r large enough; (c) U_ℓ(∞) → U(∞) as ℓ → ∞.
(v) Let N ∈ [1, ∞] and let U ∈ DCN. Then U is the pointwise nonincreasing limit of a sequence of functions (U_ℓ)_{ℓ∈N} in DCN, such
u(δ) = δ^N Ψ(δ^{−N}).
p′(r) = r U″(r) = (r^{−1/N}/N²) [ r^{−1/N} u″(r^{−1/N}) − (N − 1) u′(r^{−1/N}) ]
      ≥ −( (N − 1) u′(r0^{−1/N}) / N² ) r^{−1/N}    for r ≥ r0.
If on the other hand u′(r0^{−1/N}) = 0, then necessarily u′(r^{−1/N}) = 0 for all r ≤ r0, which means that u is constant on [r0^{−1/N}, +∞), so U is linear on [0, r0].
The reasoning is the same in the case N = ∞, with the help of the formulas
p(r) = −r u′(log(1/r)),    U′(r) = u(log(1/r)) − u′(log(1/r)),
and
r ≥ r0 ⟹ p′(r) = r U″(r) ≥ −u′(log(1/r0)).
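The first of these formulas follows from the representation U(r) = r u(log(1/r)) (a short verification):

```latex
U(r) = r\,u(\log(1/r))
\;\Longrightarrow\;
U'(r) = u(\log(1/r)) + r\,u'(\log(1/r))\,\frac{d}{dr}\log(1/r)
      = u(\log(1/r)) - u'(\log(1/r)),
\quad\text{so}\quad
p(r) = r\,U'(r) - U(r) = -\,r\,u'(\log(1/r)).
```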
Since u′ is nondecreasing and u′(a_{ℓ+1}) < u′(a_ℓ/2), the curve T_ℓ lies strictly below the curve T_{ℓ+1} on [0, a_ℓ/2], and therefore on [0, a_{ℓ+1}]. By choosing χ_ℓ in such a way that ∫_0^{a_ℓ/2} χ_ℓ u″ is very small, we can make sure that u_ℓ is very close to the line T_ℓ(s) on [0, a_ℓ]; and in particular that the whole curve u_ℓ is bounded above by T_{ℓ+1} on [a_{ℓ+1}, a_ℓ]. This will ensure that u_ℓ is a nondecreasing function of ℓ.
To recapitulate: u_ℓ ≤ u_{ℓ+1} ≤ u; u_ℓ = u on [a_ℓ, +∞); 0 ≤ u″_ℓ ≤ u″; 0 ≥ u′_ℓ ≥ u′; u_ℓ is affine on [0, a_ℓ/2].
Now let
U_ℓ(r) = r u_ℓ(r^{−1/N}).
By direct computation,
U″_ℓ(r) = (r^{−1−1/N}/N²) [ r^{−1/N} u″_ℓ(r^{−1/N}) − (N − 1) u′_ℓ(r^{−1/N}) ].    (17.4)
Since u_ℓ is convex nonincreasing, the above expression is nonnegative; so U_ℓ is convex, and by construction, it lies in DCN. Moreover U_ℓ satisfies the first requirement in (vi), since U″_ℓ(r) is bounded above by
(r^{−1−1/N}/N²) [ r^{−1/N} u″(r^{−1/N}) − (N − 1) u′(r^{−1/N}) ] = U″(r).
U″_ℓ(r) = (r^{−1−1/N}/N²) [ r^{−1/N} χ″_ℓ(r^{−1/N}) − (N − 1) u′_ℓ(r^{−1/N}) ] ≤ (r^{−1−1/N}/N²) (1 + (N − 1)a);
on the other hand,
U″(r) = ((N − 1)a/N²) r^{−1−1/N}.
So C = 1 + 1/((N − 1)a) is admissible.
Case 3: u is not affine at infinity and u′(+∞) = 0. The proof is based again on the same principle, but modified as follows:
• on [0, a_ℓ], u_ℓ coincides with u;
• on [a_ℓ, +∞), u″_ℓ(s) = ζ_ℓ(s) u″(s), where ζ_ℓ is a smooth function identically equal to 1 close to a_ℓ, identically equal to 0 at infinity, with values in [0, 2].
Choose a_ℓ < b_ℓ < c_ℓ, and ζ_ℓ supported in [a_ℓ, c_ℓ], so that 1 ≤ ζ_ℓ ≤ 2 on [a_ℓ, b_ℓ], 0 ≤ ζ_ℓ ≤ 2 on [b_ℓ, c_ℓ], and
∫_{a_ℓ}^{b_ℓ} ζ_ℓ(s) u″(s) ds > u′(b_ℓ) − u′(a_ℓ);    ∫_{a_ℓ}^{c_ℓ} ζ_ℓ(s) u″(s) ds = −u′(a_ℓ).
This is possible since u′ and u″ are continuous and ∫_{a_ℓ}^{+∞} (2u″(s)) ds = 2(u′(+∞) − u′(a_ℓ)) > u′(+∞) − u′(a_ℓ) > 0 (otherwise u would be affine on [a_ℓ, +∞)). Then choose a_{ℓ+1} > c_ℓ + 1.
The resulting function u_ℓ is convex and it satisfies u′_ℓ(+∞) = u′(a_ℓ) − u′(a_ℓ) = 0, so u′_ℓ ≤ 0 and u_ℓ is constant at infinity.
true by choosing a1 large enough that u′(a1) ≥ 3u′(+∞). Then u_ℓ(s) ≥ u(b_ℓ) + u′(+∞)(s − b_ℓ) ≥ u(b_ℓ) + ∫_{b_ℓ}^s u′ = u(s); so u_ℓ ≥ u also on [b_ℓ, +∞).
Define U_ℓ(r) = r u_ℓ(r^{−1/N}). All the desired properties of U_ℓ can be shown just as before, except for the bound on U″_ℓ, which we shall now check. On [a_ℓ^{−N}, +∞), U_ℓ = U. On [0, a_ℓ^{−N}), with the notation a = −u′(+∞), we have −u′(r^{−1/N}) ≥ a and u′(r^{−1/N}) ≥ −3a (recall that we imposed u′(a1) ≥ −3a), so
U″_ℓ(r) ≥ (r^{−1−1/N}/N²) [ r^{−1/N} u″_ℓ(r^{−1/N}) + (N − 1)a ],
and once again U″_ℓ ≤ C U″ with C = 3 + 1/((N − 1)a).
U″(r) ≤ C(N, r0) [ u″(r^{−1/N}) − u′(r^{−1/N}) ]
      ≤ 2C(N, r0) [ u″_ℓ(r^{−1/N}) − u′_ℓ(r^{−1/N}) ]
and
U″_ℓ(r) ≥ K(N, r0) [ u″_ℓ(r^{−1/N}) − u′_ℓ(r^{−1/N}) ],
where C(N, r0), K(N, r0) are positive constants. Finally, on [b_ℓ, c_ℓ], u″_ℓ = η_ℓ (−u′) is decreasing close to c_ℓ (indeed, η_ℓ is decreasing close to c_ℓ, and −u′ is positive nonincreasing); and of course −u′_ℓ is decreasing as well. So u″_ℓ(r^{−1/N}) and −u′_ℓ(r^{−1/N}) are increasing functions of r in a small interval [r0, r1]. This concludes the argument.
To prove (vi), we may first approximate u by a C∞ convex, nonincreasing function u_ℓ, in such a way that ‖u_ℓ − u‖_{C²([a,b])} → 0 for any b > a > 0. This can be done in such a way that u″_ℓ(s) is nondecreasing for small s and nonincreasing for large s; and u′_ℓ(0) → u′(0), u′_ℓ(+∞) → u′(+∞). The conclusion follows easily since p(r)/r^{1−1/N} is nondecreasing and equal to −(1/N) u′(r^{−1/N}) (−u′(log(1/r)) in the case N = ∞).
Theorem 17.8 below solves this issue: It shows that under some
integral growth condition on ν, the quantity Uν (µ) is well-defined if
µ has finite moments of order p large enough. This suggests that we
study Uν on the set Ppac (M ) of absolutely continuous measures with
finite moment of order p, rather than on the whole space P2ac (M ).
Since this theorem only uses the metric structure, I shall state it in
the context of general Polish spaces rather than Riemannian manifolds.
∃ c > 0 such that ∫_M e^{−c d(x0,x)^p} dν(x) < +∞ if N = ∞.    (17.5)
Then, for any U ∈ DCN, the formula
Uν(µ) = ∫_X U(ρ) dν,    µ = ρ ν
real number greater than or equal to 2, satisfying (17.5) (the metric space
(X , d) and the reference measure ν should be obvious from the context);
or the symbol “c”, so that Pp (X ) stands for the set Pc (X ) of compactly
supported probability measures.
Proof of Theorem 17.8. The problem is to show that under the assump-
tions of the theorem,
U(ρ) is bounded below by a ν-integrable function;
then Uν(µ) = ∫ U(ρ) dν will be well-defined in R ∪ {+∞}.
Suppose first that N < ∞. By convexity of u, there is a constant A > 0 so that δ^N U(δ^{−N}) ≥ −Aδ − A, which means
U(ρ) ≥ −A ( ρ + ρ^{1−1/N} ).    (17.6)
∫_X ρ log ρ dν ≥ ( ∫_X e^{−c d(x0,x)^p} dν(x) ) ( ∫_X ρ dν / ∫_X e^{−c d(x0,x)^p} dν(x) ) log( ∫_X ρ dν / ∫_X e^{−c d(x0,x)^p} dν(x) )
− c ∫_X d(x0, x)^p ρ(x) dν(x).
I shall often write Hν instead of H∞,ν; and I may even write just H if the reference measure is the volume measure. This notation is justified by analogy with Boltzmann's H functional: H(ρ) = ∫ ρ log ρ dvol.
For each U ∈ DCN , formula (16.18) defines a functional ΛU which
will later play a role in displacement convexity estimates. It will be
convenient to compare this quantity with ΛN := Λ_{U_N}; explicitly,
ΛN(µ, v) = ∫_M |v(x)|² ρ^{1−1/N}(x) dν(x),    µ = ρ ν.    (17.9)
Remark 17.16. Statement (ii) means, explicitly, that for any displace-
ment interpolation (µt )0≤t≤1 in Ppac (M ), and for any t ∈ [0, 1],
Uν(µt) + K_{N,U} ∫_0^1 [ ∫_M ρ_s(x)^{1−1/N} |∇ψ_s(x)|² dν(x) ] G(s, t) ds ≤ (1 − t) Uν(µ0) + t Uν(µ1),    (17.11)
where ρ_t is the density of µ_t, ψ_s = H^{0,s}_+ ψ (Hamilton–Jacobi semigroup), ψ is d²/2-convex, exp(∇ψ) is the Monge transport µ0 → µ1, and K_{N,U} is defined by (17.10).
Core of the proof of Theorem 17.15. Before giving a complete proof, for
pedagogical reasons I shall give the main argument behind the impli-
cation (i) ⇒ (ii) in Theorem 17.15, in the simple case K = 0.
Let (µt )0≤t≤1 be a Wasserstein geodesic, with µt absolutely contin-
uous, and let ρt be the density of µt with respect to ν. By change of
variables,
∫ U(ρt) dν = ∫ U( ρ0/Jt ) Jt dν,
where Jt is the Jacobian of the optimal transport taking µ0 to µt . The
next step consists in rewriting this as a function of the mean distortion.
Let u(δ) = δ^N U(δ^{−N}); then
∫ U( ρ0/Jt ) Jt dν = ∫ u( Jt^{1/N} / ρ0^{1/N} ) ρ0 dν.
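The identity behind this change of unknown is elementary: since U(r) = r u(r^{−1/N}) (equivalently u(δ) = δ^N U(δ^{−N})), one has, pointwise,

```latex
U\!\left(\frac{\rho_0}{J_t}\right) J_t
  = \frac{\rho_0}{J_t}\,
    u\!\left(\Bigl(\frac{\rho_0}{J_t}\Bigr)^{-1/N}\right) J_t
  = u\!\left(\frac{J_t^{1/N}}{\rho_0^{1/N}}\right)\rho_0 ,
```

which is exactly the integrand above.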
Note that the last integral is finite since |∇ψ_s(x)|² is almost surely bounded by D², where D is the maximum distance between elements of Spt(µ0) and elements of Spt(µ1); and that ∫ ρ_s^{1−1/N} dν ≤ ν[Spt µs]^{1/N} by Jensen's inequality.
If either Uν (µ0 ) = +∞ or Uν (µ1 ) = +∞, then there is nothing to
prove; so let us assume that these quantities are finite.
Let t0 be a fixed time in (0, 1); on T_{t0}(M), define, for all t ∈ [0, 1],
T_{t0→t}( exp_x(t0 ∇ψ(x)) ) = exp_x( t ∇ψ(x) ).
Upon integration against µt0 and use of Fubini’s theorem, this inequal-
ity becomes
This concludes the proof of Property (ii) when µ0 and µ1 have compact support. In a second step I shall relax this compactness assumption by a restriction argument. Let p ∈ [2, +∞) ∪ {c} satisfy the assumptions of Theorem 17.8, and let µ0, µ1 be two probability measures in Pp^{ac}(M). Let (Z_ℓ)_{ℓ∈N}, (µ_{t,ℓ})_{0≤t≤1, ℓ∈N}, (ψ_{t,ℓ})_{0≤t≤1, ℓ∈N} be as in Proposition 13.2. Let ρ_{t,ℓ} stand for the density of µ_{t,ℓ}. By Remark 17.4, the function U_ℓ : r → U(Z_ℓ r) belongs to DCN; and it is easy to check that K_{N,U_ℓ} = Z_ℓ^{1−1/N} K_{N,U}. Since the measures µ_{t,ℓ} are compactly supported, we can apply the previous inequality with µt replaced by µ_{t,ℓ} and U replaced by U_ℓ:
∫ U(Z_ℓ ρ_{t,ℓ}) dν ≤ (1 − t) ∫ U(Z_ℓ ρ_{0,ℓ}) dν + t ∫ U(Z_ℓ ρ_{1,ℓ}) dν
− Z_ℓ^{1−1/N} K_{N,U} ∫_0^1 [ ∫_M ρ_{s,ℓ}(y)^{1−1/N} |∇ψ_{s,ℓ}(y)|² dν(y) ] G(s, t) ds.    (17.16)
On the other hand, the proof of Theorem 17.8 shows that U−(r) ≤ A(r + r^{1−1/N}) for some constant A = A(N, U); so
U−(Z_ℓ ρ_{t,ℓ}) ≤ A ( Z_ℓ ρ_{t,ℓ} + Z_ℓ^{1−1/N} ρ_{t,ℓ}^{1−1/N} ) ≤ A ( ρt + ρt^{1−1/N} ).    (17.17)
By the proof of Theorem 17.8 and Remark 17.12, the function on the right-hand side of (17.17) is ν-integrable, and then we may pass to the limit by dominated convergence. To summarize:
∫ U+(Z_ℓ ρ_{t,ℓ}) dν → ∫ U+(ρt) dν  (ℓ → ∞)  by monotone convergence;
∫ U−(Z_ℓ ρ_{t,ℓ}) dν → ∫ U−(ρt) dν  (ℓ → ∞)  by dominated convergence.
µ0 = ρ0 ν; µt = exp(t∇ψ)# µ0 .
δ̈(t, x)/δ(t, x) = −(1/N) [ θ² Ric_{N,ν}(v0) + O(θ³) ].    (17.20)
By repeating the proof of (i) ⇒ (ii) with U = UN and using (17.20),
one obtains
then
∫_0^1 [ ∫_M ρt^{1−1/N} |∇ψ_t|² dν ] (1 − t) dt < +∞.
More precisely, there are constants C = C(N, q, δ) > 0 and θ = θ(N, q, δ) ∈ (0, 1 − 1/N) such that
(1 − t) ∫_M ρt^{1−1/N} |∇ψ_t|² dν    (17.25)
≤ ( C / (1 − t)^{1−2θ} ) W2(µ0, µ1)^{2θ} ( ∫ ν(dx) / (1 + d(z, x))^{q(N−1)−2N−δ} )^{1/N}
× ( 1 + ∫ d(z, x)^q µ0(dx) + ∫ d(z, x)^q µ1(dx) )^{(1−θ)−1/N}.
Proof of Proposition 17.24. First, |∇ψ_t(x)| = d(x, T_{t→1}(x))/(1 − t), where T_{t→1} is the optimal transport µt → µ1. So ∫ ρt |∇ψ_t|² dν = W2(µt, µ1)²/(1 − t)² = W2(µ0, µ1)². This proves (i).
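On the real line, where optimal transport is the monotone (quantile) rearrangement, the constant-speed identity W2(µt, µ1) = (1 − t) W2(µ0, µ1) used in this step can be observed directly on empirical measures (a sketch; the two samples are arbitrary illustrative choices):

```python
import numpy as np

def w2_empirical(a, b):
    """W2 between two empirical measures with equally many atoms on R:
    the optimal coupling is the monotone rearrangement (sort both samples)."""
    a, b = np.sort(a), np.sort(b)
    return np.sqrt(np.mean((a - b) ** 2))

rng = np.random.default_rng(1)
x0 = rng.normal(0.0, 1.0, size=200)   # atoms of mu_0
x1 = rng.normal(5.0, 2.0, size=200)   # atoms of mu_1

x0s, x1s = np.sort(x0), np.sort(x1)
for t in (0.25, 0.5, 0.9):
    xt = (1 - t) * x0s + t * x1s      # displacement interpolant mu_t
    # constant-speed geodesic: W2(mu_t, mu_1) = (1 - t) W2(mu_0, mu_1)
    assert np.isclose(w2_empirical(xt, x1), (1 - t) * w2_empirical(x0, x1))
```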
To prove (ii), I shall start from
(1 − t) ∫ ρt(x)^{1−1/N} |∇ψ_t(x)|² ν(dx) = (1/(1 − t)) ∫ ρt(x)^{1−1/N} d(x, T_{t→1}(x))² ν(dx).
Let us first see how to bound the integral in the right-hand side, with-
out worrying about the factor (1 − t)−1 in front. This can be done
with the help of Jensen’s inequality, in the same spirit as the proof of
Theorem 17.8: If r ≥ 0 is to be chosen later, then
∫ ρt(x)^{1−1/N} d(x, T_{t→1}(x))² ν(dx)
≤ ( ∫ ρt(x) (1 + d(z, x)^r) d(x, T_{t→1}(x))^{2N/(N−1)} ν(dx) )^{(N−1)/N} ( ∫ ν(dx) / (1 + d(z, x)^r)^{N−1} )^{1/N}
≤ C ( ∫ ρt(x) (1 + d(z, x)^r) ( d(z, x) + d(z, T_{t→1}(x)) )^{2N/(N−1)} ν(dx) )^{(N−1)/N}
≤ C ( ∫ ρt(x) ( 1 + d(z, x)^{r+2N/(N−1)} + d(z, T_{t→1}(x))^{r+2N/(N−1)} ) ν(dx) )^{(N−1)/N}
= C ( 1 + ∫ ρt(x) d(z, x)^{r+2N/(N−1)} ν(dx) + ∫ ρ1(y) d(z, y)^{r+2N/(N−1)} ν(dy) )^{(N−1)/N},
where
C = C(r, N, ν) = C(r, N) ( ∫ ν(dx) / (1 + d(z, x)^r)^{N−1} )^{1/N},
and C(r, N) stands for some constant depending only on r and N. Choosing r = q − 2N/(N−1) (so that r + 2N/(N−1) = q), by Remark 17.12 the previous expression is bounded by
C(r, N) ( 1 + ∫ d(z, x)^q µ0(dx) + ∫ d(z, x)^q µ1(dx) )^{(N−1)/N} ( ∫ ν(dx) / (1 + d(z, x))^{q(N−1)−2N} )^{1/N}.
This estimate is not enough to establish the convergence of the time-integral, since ∫_0^1 dt/(1 − t) = +∞; so we need to gain some cancellation as t → 1. The idea is to interpolate with (i), and this is where δ will be useful. Without loss of generality, we may assume δ < q(N − 1) − 2N. Let θ ∈ (0, 1 − 1/N) to be chosen later, and N′ = (1 − θ)N ∈ (1, ∞). By Hölder's inequality,
(1/(1 − t)) ∫ ρt(x)^{1−1/N} d(x, T_{t→1}(x))² ν(dx)
≤ (1/(1 − t)) ( ∫ ρt(x) d(x, T_{t→1}(x))² ν(dx) )^θ ( ∫ ρt(x)^{1−1/N′} d(x, T_{t→1}(x))² ν(dx) )^{1−θ}
= (1/(1 − t)) W2(µt, µ1)^{2θ} ( ∫ ρt(x)^{1−1/N′} d(x, T_{t→1}(x))² ν(dx) )^{1−θ}
= ( W2(µ0, µ1)^{2θ} / (1 − t)^{1−2θ} ) ( ∫ ρt(x)^{1−1/N′} d(x, T_{t→1}(x))² ν(dx) )^{1−θ}.
In Theorem 17.15, all the influence of the Ricci curvature bounds lies in the additional term ∫_0^1 (. . .) G(s, t) ds. As a consequence, as soon as
K = 0 and N < ∞, the formulation involves not only µt , µ0 and µ1 , but
the whole geodesic path (µs )0≤s≤1 . This makes the exploitation of the
resulting inequality (in geometric applications, for instance) somewhat
delicate, if not impossible.
I shall now present a different formulation, expressed only in terms of
µt , µ0 and µ1 . As a price to pay, the functionals Uν (µ0 ) and Uν (µ1 ) will
be replaced by more complicated expressions in which extra distortion
coefficients will appear. From the technical point of view, this new
formulation relies on the principle that one can “take the direction of
motion out”, in all reformulations of Ricci curvature bounds that were
examined in Chapter 14.
Remark 17.26. Most of the time, I shall use Definition 17.25 with β = β_t^{(K,N)}, that is, the reference distortion coefficients introduced in Definition 14.19.
then the integral U^β_{π,ν}(µ) appearing in Definition 17.25 makes sense in R ∪ {+∞} as soon as µ ∈ Pp^{ac}(X).
Even if there is no such p, U^β_{π,ν}(µ) still makes sense in R ∪ {+∞} if µ ∈ Pc^{ac}(X) and ν is finite on compact sets.
We already know by the proof of Theorem 17.8 that (ρ log ρ)− and ρ lie in L¹(ν), or equivalently in L¹(π(dy|x) ν(dx)). To check the integrability of the negative part of −a ρ log β(x, y), it suffices to note that
∫ ρ(x) (log β(x, y))+ π(dy|x) ν(dx) ≤ ∫ (log β(x, y))+ π(dy|x) µ(dx) = ∫ (log β(x, y))+ π(dx dy),
which is finite by assumption. This concludes the proof of Theorem 17.28.
Application 17.29 (Definition of U^{β_t^{(K,N)}}_{π,ν}). Let X = M be a Riemannian manifold satisfying a CD(K, N) curvature-dimension bound, let t ∈ [0, 1] and let β = β_t^{(K,N)} be the distortion coefficient defined in (14.61)–(14.64).
• If K ≤ 0, then β_t^{(K,N)} is bounded;
• If K > 0, N < ∞ and diam(M) < D_{K,N} := π √((N − 1)/K), then β is bounded;
• If K > 0 and N = ∞, then log β_t^{(K,N)}(x, y) is bounded above by a constant multiple of d(x, y)², which is π(dx dy)-integrable whenever π is an optimal coupling arising in some displacement interpolation.
In all three cases, Theorem 17.28 shows that U^β_{π,ν} is well-defined in R ∪ {+∞}; more precisely, the integrand entering the definition is bounded below by an integrable function. The only remaining cases are (a) when K > 0, N < ∞ and diam(M) coincides with the limit Bonnet–Myers diameter D_{K,N}; and (b) when N = 1. These two cases are covered by the following definition.
Convention 17.30 (Definition of U^β_{π,ν} in the limit cases). If either (a) K > 0, N < ∞ and diam(M) = π √((N − 1)/K), or (b) N = 1, I shall define
U^{β_t^{(K,N)}}_{π,ν}(µ) = lim_{N′↓N} U^{β_t^{(K,N′)}}_{π,ν}(µ).    (17.31)
Remark 17.31. The limit in (17.31) is well-defined; indeed, β_t^{(K,N′)} is increasing as N′ decreases, and U(r)/r is nondecreasing as a function of r; so U( ρ(x)/β_t^{(K,N′)}(x, y) ) β_t^{(K,N′)}(x, y) is a nonincreasing function of N′ and the limit in (17.31) is monotone. The monotone convergence theorem guarantees that this definition coincides with the original definition (17.27) when it applies, i.e. when the integrand is bounded below by a π(dy|x) ν(dx)-integrable function.
Remark 17.33. If diam(M) = D_{K,N} then actually M is the sphere S^N(√((N − 1)/K)) and ν = vol; but we don't need this information. (The assumption of M being complete without boundary is important for this statement to be true, otherwise the one-dimensional reference spaces of Examples 14.10 provide a counterexample.) See the end of the bibliographical notes for more explanations. In the case N = 1, if M is distinct from a point then it is one-dimensional, so it is either the real line or a circle.
Before explaining the proof of this theorem, let me state two very
natural open problems (I have no idea how difficult they are).
Open Problem 17.39. Theorems 17.15 and 17.37 yield two different upper bounds for Uν(µt): on the one hand,
Can one compare those two bounds, and if yes, which one is sharper?
Proof of Theorem 17.37. The proof shares many common points with
the proof of Theorem 17.15. I shall restrict to the case N < ∞, since
the case N = ∞ is similar.
Let us start with the implication (i) ⇒ (ii). In a first step, I shall assume that µ0 and µ1 are compactly supported, and (if K > 0) diam(M) < π √((N − 1)/K). With the same notation as in the beginning of the proof of Theorem 17.15,
∫_M U(ρt(x)) dν(x) = ∫_M u( δ_{t0}(t, x) ) dµ_{t0}(x).
Proposition 13.2. Let t ∈ [0, 1] be fixed. By the first step, applied with the probability measures µ_{t,ℓ} and the nonlinearity U_ℓ : r → U(Z_ℓ r),
(U_ℓ)_ν(µ_{t,ℓ}) ≤ (1 − t) (U_ℓ)^{β_{1−t}^{(K,N)}}_{π_ℓ,ν}(µ_{0,ℓ}) + t (U_ℓ)^{β_t^{(K,N)}}_{π̌_ℓ,ν}(µ_{1,ℓ}).    (17.35)
Theorem 17.28 (together with Application 17.29) shows that the latter quantity is always integrable. As a conclusion,
∫ U+( Z_ℓ ρ_{0,ℓ}(x0) / β(x0, T(x0)) ) β(x0, T(x0)) ν(dx0)
→ ∫ U+( ρ0(x0) / β(x0, T(x0)) ) β(x0, T(x0)) ν(dx0)    (ℓ → ∞)
Let us see how this expression behaves in the limit θ → 0; for instance I shall focus on the first term in (17.39). From the definitions,
H^{β_{1−t}^{(K,N)}}_{N,π,ν}(µ0) − H_{N,ν}(µ0) = N ∫ ρ0(x)^{1−1/N} ( 1 − β_{1−t}^{(K,N)}(x, T(x))^{1/N} ) dν(x),    (17.40)
where T = exp(∇ψ) is the optimal transport from µ0 to µ1. A standard Taylor expansion shows that
Proof of Theorem 17.41. When K > 0, (i) is obviously false since ν has to be equal to vol (otherwise Ric_{1,ν} will take values −∞); but (ii) is obviously false too since β_t^{(K,1)} = +∞ for 0 < t < 1. So we may assume that K ≤ 0. Then the proof of (i) ⇒ (ii) is along the same lines as in Theorem 17.37. As for the implication (ii) ⇒ (i), note that DC_{N′} ⊂ DC_1 for all N′ > 1, so M satisfies Condition (ii) in Theorem 17.37 with N replaced by N′, and therefore Ric_{N′,ν} ≥ Kg. If N′ < 2, this forces M to be one-dimensional. Moreover, if V is not constant there is x0 such that Ric_{N′,ν} = V″ − (V′)²/(N′ − 1) is < K for N′ small enough. So V is constant and Ric_{1,ν} = Ric = 0, a fortiori Ric_{1,ν} ≥ K.
that (17.30) holds true with ν = vol and N = n. Then the following three statements are equivalent:
(i) β̄ ≤ β;
(ii) For any U ∈ DC_n, the functional Uν is displacement convex on Pp^{ac}(M) with distortion coefficients (β̄_t);
(iii) The functional Hn is locally displacement convex with distortion coefficients (β̄_t).
The proof of this theorem is along the same lines as the proof of
Theorem 17.37, with the help of Theorem 14.20; details are left to the
reader.
Bibliographical notes
I shall conclude with some comments about Remark 17.33. The classi-
cal Cheng–Toponogov theorem states the following: If a Riemannian
manifold M has dimension N , Ricci curvature bounded below by
K > 0, and diameter equal to the limit Bonnet–Myers diameter D_{K,N} = π √((N − 1)/K), then it is a sphere. I shall explain why this
result remains true when the reference measure is not the volume, and
M is assumed to satisfy CD(K, N ). Cheng’s original proof was based on
eigenvalue comparisons, but there is now a simpler argument relying on
the Bishop–Gromov inequality [846, p. 229]. This proof goes through
when the volume measure is replaced by another reference measure ν,
and then one sees that Ψ = − log(dν/dvol) should solve a certain dif-
ferential equation of Ricatti type (replace the inequality in [573, (4.11)]
by an equality). Then the second fundamental forms of the spheres in
M have to be the same as in S N , and one gets a formula for the radial
derivative of Ψ . After some computations, one finds that M is an n-
sphere of diameter DK,N ; and that the measure ν, in coordinates (r, θ)
from the north pole, is c (sin(kr))N −n times the volume, where c > 0
and k = K/(N − 1). If n < N , the density of dν/dvol vanishes at the
north pole, which is not allowed by our assumptions. The only possi-
bility left out is that M has dimension N and ν is a constant multiple
of the volume. All this was explained to me by Lott.
18
Volume control
Remark 18.3. It does not really matter whether the definition is for-
mulated in terms of open or in terms of closed balls; at worst this
changes the value of the constant D.
When the distance d and the reference measure ν are clear from the
context, I shall often say that the space X is doubling (resp. locally
doubling), instead of writing that the measure ν is doubling on the
metric space (X , d).
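For intuition, Lebesgue measure on R^n is doubling with the sharp constant 2^n, since vol[B_{2r}] = 2^n vol[B_r] (a sketch using the closed-form Euclidean ball volume):

```python
import math

def ball_volume(n, r):
    # volume of the Euclidean ball of radius r in R^n
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

for n in (1, 2, 3, 7):
    for r in (0.1, 1.0, 5.0):
        # doubling the radius multiplies the volume by exactly 2^n
        assert math.isclose(ball_volume(n, 2 * r), 2 ** n * ball_volume(n, r))
```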
C. Villani, Optimal Transport, Grundlehren der mathematischen Wissenschaften 338, © Springer-Verlag Berlin Heidelberg 2009
Fig. 18.1. The natural volume measure on this “singular surface” (a balloon with
a spine) is not doubling.
Proof. Let x ∈ X, and let r > 0. Since ν is nonzero, there is R > 0 such that ν[B_{R]}(x)] > 0. Then there is a constant C, possibly depending on x and R, such that ν is C-doubling inside B_{R]}(x). Let n ∈ N be large enough that R ≤ 2^n r; then
0 < ν[B_{R]}(x)] ≤ C^n ν[B_{r]}(x)].
So ν[B_{r]}(x)] > 0. Since r is arbitrarily small, x has to lie in the support of ν.
• If N = ∞,
log( 1/ν[[A0, A1]_t] ) ≤ (1 − t) log( 1/ν[A0] ) + t log( 1/ν[A1] ) − (K t(1 − t)/2) sup_{x0∈A0, x1∈A1} d(x0, x1)².    (18.5)
Detailed proof of Theorem 18.5. First consider the case N < ∞. For brevity I shall write just βt instead of β_t^{(K,N)}. By regularity of the
where β̄t stands for the minimum of βt(x0, x1) over all pairs (x0, x1) ∈ A0 × A1. Then, by explicit computation,
∫_M ρ0(x0)^{1−1/N} dν(x0) = ν[A0]^{1/N},    ∫_M ρ1(x1)^{1−1/N} dν(x1) = ν[A1]^{1/N}.
Bishop–Gromov inequality
The Bishop–Gromov inequality states that the volume of balls in a
space satisfying CD(K, N ) does not grow faster than the volume of
balls in the model space of constant sectional curvature having Ricci
curvature equal to K and dimension equal to N . In the case K = 0, it
takes the following simple form:
ν[Br(x)] / r^N  is a nonincreasing function of r.
(It does not matter whether one considers the closed or the open ball of radius r.) In the case K > 0 (resp. K < 0), the quantity on the left-hand side should be replaced by
ν[Br(x)] / ∫_0^r sin( t √(K/(N − 1)) )^{N−1} dt,
resp.
ν[Br(x)] / ∫_0^r sinh( t √(|K|/(N − 1)) )^{N−1} dt.
Here is a precise statement:
for K < 0 the same formula remains true with sin replaced by sinh, K by |K| and r + ε by r − ε. In the sequel, I shall only consider K > 0, the treatment of K < 0 being obviously similar. After applying the above bounds, inequality (18.4) yields
ν[ B_{t(r+ε)} \ B_{tr} ]^{1/N} ≥ t ( sin( √(K/(N − 1)) t(r + ε) ) / ( t sin( √(K/(N − 1)) (r + ε) ) ) )^{(N−1)/N} ν[ B_{r+ε} \ B_r ]^{1/N};
If φ(r) stands for ν[Br], then the above inequality can be rewritten as
( φ(tr + tε) − φ(tr) ) / ( ε s^{(K,N)}(t(r + ε)) ) ≥ ( φ(r + ε) − φ(r) ) / ( ε s^{(K,N)}(r + ε) ).
In the limit ε → 0, this yields
φ′(tr) / s^{(K,N)}(tr) ≥ φ′(r) / s^{(K,N)}(r).
This was for any t ∈ [0, 1], so φ′/s^{(K,N)} is indeed nonincreasing, and the proof is complete.
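On the model space itself the monotone quantity is constant, which gives a quick sanity check: on the round sphere S² (K = 1, N = 2) the area of a geodesic ball of radius r is 2π(1 − cos r), while ∫_0^r sin t dt = 1 − cos r, so the Bishop–Gromov ratio is identically 2π (a numerical sketch):

```python
import math

def ball_area_sphere(r):
    # area of the geodesic ball of radius r on the unit sphere S^2
    return 2 * math.pi * (1 - math.cos(r))

def model_profile(r):
    # integral of s^{(K,N)}(t) = sin t over [0, r], for K = 1, N = 2
    return 1 - math.cos(r)

ratios = [ball_area_sphere(r) / model_profile(r) for r in (0.5, 1.0, 2.0, 3.0)]
# the ratio is constant (= 2*pi): Bishop-Gromov holds with equality on the model
assert all(math.isclose(q, 2 * math.pi) for q in ratios)
```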
The following lemma was used in the proof of Theorem 18.8. At first
sight it seems obvious and the reader may skip its proof.
Lemma 18.9. Let a < b in R ∪ {+∞}, let g : (a, b) → R+ be a positive continuous function, integrable at a, and let G(r) = ∫_a^r g(s) ds. Let F : [a, b) → R+ be a nondecreasing measurable function satisfying F(a) = 0, and let f(r) = d⁺F/dr be its upper derivative. If f/g is nonincreasing then also F/G is nonincreasing.
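A quick numerical instance of the lemma (a sketch with the illustrative choice f(r) = 1, g(r) = r on (0, b), so that f/g = 1/r is nonincreasing; then F(r) = r, G(r) = r²/2 and F/G = 2/r should be nonincreasing too):

```python
# f(r) = 1, g(r) = r: f/g = 1/r is nonincreasing on (0, b).
# Then F(r) = r and G(r) = r^2/2, so F/G = 2/r must be nonincreasing as well.
rs = [0.1 * k for k in range(1, 50)]
FG = [r / (r * r / 2) for r in rs]
assert all(x >= y for x, y in zip(FG, FG[1:]))
```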
This implies
( ∫_x^y f ) / ( ∫_a^x f ) ≤ ( ∫_x^y g ) / ( ∫_a^x g ),
and (18.9) follows.
Exercise 18.10. Give an alternative proof of the Bishop–Gromov in-
equality for CD(0, N ) Riemannian manifolds, using the convexity of
t −→ t Uν (µt ) + N t log t, for U ∈ DCN , when (µt )0≤t≤1 is a displace-
ment interpolation.
Doubling property
From Theorem 18.8 and elementary estimates on the function s(K,N )
it is easy to deduce the following corollary:
Corollary 18.11 (CD(K, N ) implies doubling). Let M be a Rie-
mannian manifold equipped with a reference measure ν = e−V vol, sat-
isfying a curvature-dimension condition CD(K, N ) for some K ∈ R,
1 < N < ∞. Then ν is doubling with a constant C that is:
• uniform and no more than 2^N if K ≥ 0;
• locally uniform and no more than 2^N D(K, N, R) if K < 0, where
D(K, N, R) = ( cosh( 2R √(|K|/(N − 1)) ) )^{N−1},    (18.10)
where V (r) is the volume of Br (x) in the model space. This implies
that ν[Br (x)] is a continuous function of r. Of course, this property
is otherwise obvious, but the Bishop–Gromov inequality provides an
explicit modulus of continuity.
Dimension-free bounds
ν[Br(x0)] ≤ e^{Cr} e^{K− r²/2};    (18.11)
ν[B_{r+δ}(x0) \ Br(x0)] ≤ e^{Cr} e^{−K r²/2}  if K > 0.    (18.12)
In particular, if K′ < K then
∫ e^{(K′/2) d(x0,x)²} ν(dx) < +∞.    (18.13)
Proof of Theorem 18.12. For brevity I shall write Br for B_{r]}(x0). Apply (18.5) with A0 = B_δ, A1 = B_r, and t = δ/(2r) ≤ 1/2. For any minimizing geodesic γ going from A0 to A1, one has d(γ0, γ1) ≤ r + δ, so
log( 1/ν[B_{2δ}] ) ≤ (1 − δ/(2r)) log( 1/ν[B_δ] ) + (δ/(2r)) log( 1/ν[B_r] ) + (K−/2) (δ/(2r)) (1 − δ/(2r)) (r + δ)².
Bibliographical notes
Remark 19.3. The word “local” in Definition 19.1 means that the inequality involves averages around some point x0. This is in contrast with the “global” Poincaré inequalities that will be considered later in Chapter 21, in which averages are taken over the whole space.
(19.2)
where by convention ( (1 − t) a^{−1/N} + t b^{−1/N} )^{−N} = 0 if either a or b is 0;
• If N = ∞, one has the pointwise bound
ρt(x) ≤ sup_{x∈[x0,x1]_t} ρ0(x0)^{1−t} ρ1(x1)^t exp( − (K t(1 − t)/2) d(x0, x1)² ).    (19.3)
Corollary 19.5 (Preservation of uniform bounds in nonnegative curvature). With the same notation as in Theorem 19.4, if K ≥ 0 then
‖ρt‖_{L∞(ν)} ≤ max( ‖ρ0‖_{L∞(ν)}, ‖ρ1‖_{L∞(ν)} ).
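In flat R (K = 0) this bound can be observed concretely: interpolating between the uniform density on [0, 1] and the uniform density on [2, 4] along the monotone map T(x) = 2 + 2x, the interpolant at time t is uniform on an interval of length 1 + t, so ‖ρt‖∞ = 1/(1 + t) ≤ max(1, 1/2) (a numerical sketch; the two measures are arbitrary illustrative choices):

```python
import numpy as np

# mu_0 = uniform on [0,1] (density 1), mu_1 = uniform on [2,4] (density 1/2).
# Monotone optimal map on R: T(x) = 2 + 2x; geodesic: T_t(x) = (1-t)x + t*T(x).
x = np.linspace(0.0, 1.0, 100001)
for t in (0.3, 0.5, 0.8):
    Tt = (1 - t) * x + t * (2 + 2 * x)
    # change of variables: rho_t(T_t(x)) = rho_0(x) / T_t'(x), with rho_0 = 1
    rho_t = 1.0 / np.gradient(Tt, x)
    assert np.isclose(rho_t.max(), 1 / (1 + t))
    # Corollary 19.5 in nonnegative curvature (here, flat R):
    assert rho_t.max() <= max(1.0, 0.5) + 1e-9
```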
As I said before, there are (at least) two possible schemes of proof
for Theorem 19.4. The first one is by direct application of the Jacobian
estimates from Chapter 14; the second one is based on the displacement
convexity estimates from Chapter 17. The first one is formally simpler,
while the second one has the advantage of being based on very robust
functional inequalities. I shall only sketch the first proof, forgetting
about regularity issues, and give a detailed treatment of the second one.
Similarly,
ρ0 (x0 ) = ρ1 (x1 ) J (1, x0 ).
Then the result follows directly from Theorems 14.11 and 14.12: Apply equation (14.56) if N < ∞, (14.55) if N = ∞ (recall that D = J^{1/N}, ℓ = − log J).
Z = { γ ∈ Γ(M); γt ∈ B_δ(y) }.
Further, define π′ = law(γ′0, γ′1), µ′s = law(γ′s) = (es)# Π′. Obviously,
Π′ ≤ Π / Π[Z] = Π / µt[B_δ(y)],
since
(et)# ( 1_{γt∈B_δ(y)} Π / µt[B_δ(y)] ) = 1_{x∈B_δ(y)} ((et)# Π) / µt[B_δ(y)].
(This is more difficult to write down than to understand!)
From the restriction property (Theorem 4.6), (γ′0, γ′1) is an optimal coupling of (µ′0, µ′1), and therefore (µ′s)_{0≤s≤1} is a displacement interpolation. By Theorem 17.37 applied with U(r) = −r^{1−1/N},
∫_M (ρ′t)^{1−1/N} dν ≥ (1 − t) ∫_{M×M} ρ′0(x0)^{−1/N} β_{1−t}(x0, x1)^{1/N} π′(dx0 dx1)
+ t ∫_{M×M} ρ′1(x1)^{−1/N} βt(x0, x1)^{1/N} π′(dx0 dx1).    (19.6)
M ×M
On the other hand, from (19.4) the right-hand side of (19.6) can be
bounded below by
(1 − t) (ρ0 (x0 ))− N β1−t (x0 , x1 ) N
1 1 1
µt [Bδ (y)] N
M ×M
1
+ t (ρ1 (x1 ))− N βt (x0 , x1 ) N π (dx0 dx1 )
1
= µt [Bδ (y)] N E (1 − t) (ρ0 (γ0 ))− N β1−t (γ0 , γ1 ) N
1 1 1
+ t (ρ1 (γ1 ))− N βt (γ0 , γ1 ) N
1 1
(1 − t) (ρ0 (γ0 ))− N β1−t (γ0 , γ1 ) N
1 1 1
≥ µt [Bδ (y)] N E inf
γt ∈[x0 ,x1 ]t
+ t (ρ1 (γ1 ))− N βt (γ0 , γ1 ) N , (19.9)
1 1
where the last inequality follows just from the (obvious) remark that
γt ∈ [γ0 , γ1 ]t . In all of these inequalities, we can restrict π to the set
{ρ0 (x0 ) > 0, ρ1 (x1 ) > 0} which is of full measure.
Let
F(x) := inf_{x ∈ [x0,x1]_t} [ (1 − t) ρ0(x0)^{−1/N} β_{1−t}(x0, x1)^{1/N} + t ρ1(x1)^{−1/N} βt(x0, x1)^{1/N} ];
provided that ρt (y) > 0; and then ρt (y) ≤ F (y)−N , as desired. In the
case ρt (y) = 0 the conclusion still holds true.
Some final words about measurability. It is not clear (at least to me) that F is measurable; but instead of F one may use the measurable function
( inf_{x ∈ [x0,x1]_t} [ (1 − t) β_{1−t}(x0, x1)^{1/N} ρ0(x0)^{−1/N} + t βt(x0, x1)^{1/N} ρ1(x1)^{−1/N} ] )^{−N}.    (19.12)
Clearly, the supremum above can be restricted to those pairs (x0, x1) such that x1 lies in the support of µ1, i.e. x1 ∈ B; and x0 lies in the support of µ_{t0},
= sup_{ x ∈ [x0,x1]_t; x0 ∈ [z0,B]_{t0}; x1 ∈ B } ρ1(x1) / ( t^N βt(x0, x1) ).
Since ρ1 = 1_B/ν[B], actually
ρt(x) ≤ S(t0, z0, B) / ( t^N ν[B] ),
where
Democratic condition
Before giving the proof of Theorem 19.13 I shall state a corollary which
follows immediately from this theorem together with Corollary 18.11 and
Theorem 19.10:
However, a geodesic joining two points in B cannot leave the ball 2B, so (19.23) and the democratic condition together imply that
⨍_B |u − u_B| dν ≤ (2C r / ν[B]) ∫_{2B} g dν.    (19.24)
Now integrate this against ρt (x) dν(x): since the right-hand side
does not depend on x any longer,
∫ ρt(x)^{1−1/N} dν(x) ≥ (1 − t) [ inf_{x∈[A0,A1]_t} β_{1−t}(x0, x1)^{1/N} ] ν[A0]^{1/N} + t [ inf_{x∈[A0,A1]_t} βt(x0, x1)^{1/N} ] ν[A1]^{1/N}.
Proof of Theorem 19.16. By an easy homogeneity argument, we may assume ∫f = ∫g = 1. Then write ρ0 = f, ρ1 = g; by Theorem 19.4, the displacement interpolant ρt between ρ0 ν and ρ1 ν satisfies (19.3). From (19.26), h ≥ ρt. It follows that ∫h ≥ ∫ρt = 1, as desired.
Bibliographical notes
with similar tools (namely, the Jacobian estimates in Chapter 14). The
presentation that I have adopted makes it clear that the Prékopa–
Leindler inequality, and even the stronger pointwise bounds in
Theorem 19.4, can really be seen as a consequence of displacement con-
vexity inequalities (together with the restriction property). This deter-
mination to derive everything from displacement convexity, rather than
directly from Jacobian estimates, will find a justification in Part III of
this course: In some sense the notion of displacement convexity is softer
and more robust.
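The Prékopa–Leindler inequality itself is easy to test numerically on the line (a sketch with t = 1/2 and two log-concave Gaussian profiles; the choice of functions and the grid discretization are illustrative assumptions):

```python
import numpy as np

f = lambda x: np.exp(-x ** 2)            # log-concave
g = lambda y: np.exp(-2 * (y - 1) ** 2)  # log-concave
t = 0.5

z = np.linspace(-10, 10, 2001)
x = np.linspace(-10, 10, 2001)
# Define h as the sup-convolution h(z) = sup_x f(x)^{1-t} g((z-(1-t)x)/t)^t,
# so that h((1-t)x + t y) >= f(x)^{1-t} g(y)^t holds by construction.
y = (z[:, None] - (1 - t) * x[None, :]) / t
h = np.max(f(x[None, :]) ** (1 - t) * g(y) ** t, axis=1)

dz = z[1] - z[0]
lhs = h.sum() * dz
rhs = (f(x).sum() * dz) ** (1 - t) * (g(x).sum() * dz) ** t
# Prekopa-Leindler: int h >= (int f)^{1-t} (int g)^t
assert lhs >= rhs
```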
In RN , there is also a “stronger” version of Theorem 19.18 in which
the exponent q can go down to −1/(N − 1) instead of −1/N ; it reads
h( (1 − t)x0 + t x1 ) ≥ M_t^q( f(x0), g(x1) ) ⟹
∫ h(z) dz ≥ M_t^{q/(1+q(N−1))}( m_i(f), m_i(g) ) · M_t^1( ∫f/m_i(f), ∫g/m_i(g) ).
(19.33)
It was recently shown by Bobkov and Ledoux [132] that this inequality
can be used to establish optimal Sobolev inequalities in RN (with the
usual Prékopa–Leindler inequality one can apparently reach only the
logarithmic Sobolev inequality, that is, the dimension-free case [131]).
See [132] for the history and derivation of (19.33).
20
Infinitesimal displacement convexity
on a nonsmooth length space, there are still natural definitions for the
norm of the gradient, |∇f |. The most common one is
|∇f|(x) := lim sup_{y→x} |f(y) − f(x)| / d(x, y).    (20.1)
Rigorously speaking, this formula makes sense only if x is not isolated,
which will always be the case in the sequel. A slightly finer notion is
the following:
|∇⁻f|(x) := lim sup_{y→x} [f(y) − f(x)]⁻ / d(x, y),    (20.2)
where a− = max(−a, 0) stands for the negative part of a (which is
a nonnegative number!). It is obvious that |∇− f | ≤ |∇f |, and both
notions coincide with the usual one if f is differentiable. Note that
|∇− f |(x) is automatically 0 if x is a local minimum of f .
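The difference between the two notions is visible already for f(x) = |x| on R: at the minimum x = 0 one has |∇f|(0) = 1 but |∇⁻f|(0) = 0 (a sketch using a two-point finite-difference approximation; the helper and step size are illustrative choices):

```python
def grad_norms(f, x0, h=1e-6):
    """Approximate |nabla f|(x0) and |nabla^- f|(x0) from two nearby points."""
    quotients = [(f(y) - f(x0)) / abs(y - x0) for y in (x0 - h, x0 + h)]
    grad = max(abs(q) for q in quotients)              # limsup |f(y)-f(x)|/d
    grad_minus = max(max(-q, 0.0) for q in quotients)  # limsup [f(y)-f(x)]_-/d
    return grad, grad_minus

g, gm = grad_norms(abs, 0.0)
assert abs(g - 1.0) < 1e-9   # |nabla f|(0) = 1 for f(x) = |x|
assert gm == 0.0             # 0 is a local minimum, so |nabla^- f|(0) = 0

g1, gm1 = grad_norms(abs, -1.0)
assert abs(g1 - 1.0) < 1e-9 and abs(gm1 - 1.0) < 1e-9  # both equal 1 at x = -1
```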
Theorem 20.1 (Differentiating an energy along optimal transport). Let (X, d, ν) and U be as above, and let (µt)_{0≤t≤1} be a geodesic in P2(X), such that each µt is absolutely continuous with respect to ν, with density ρt, and U(ρt)− is ν-integrable for all t. Further assume that ρ0 is Lipschitz continuous, U(ρ0) and ρ0 U′(ρ0) are ν-integrable, and U′ is Lipschitz continuous on ρ0(X). Then
lim inf_{t↓0} [ Uν(µt) − Uν(µ0) ] / t ≥ − ∫_{X×X} U″(ρ0(x0)) |∇⁻ρ0|(x0) d(x0, x1) π(dx0 dx1),    (20.3)
where π is an optimal coupling of (µ0, µ1) associated with the geodesic path (µt)_{0≤t≤1}.
Remark 20.2. The technical assumption on the negative part of U(ρt) being integrable is a standard way to make sure that Uν(µt) is well-defined, with values in ℝ ∪ {+∞}. As for the assumption about U′ being Lipschitz on ρ0(X), it means in practice that either U is twice (right-)differentiable at the origin, or ρ0 is bounded away from 0.
Remark 20.3. Here is a more probabilistic reformulation of (20.3) (which will also make more explicit the link between π and µt): Let γ be a random geodesic such that µt = law (γt); then

\[
\liminf_{t\downarrow 0}\ \frac{U_\nu(\mu_t)-U_\nu(\mu_0)}{t}\ \ge\
-\,\mathbb{E}\Bigl[\,U''\bigl(\rho_0(\gamma_0)\bigr)\,\bigl|\nabla^-\rho_0\bigr|(\gamma_0)\; d(\gamma_0,\gamma_1)\Bigr].
\]
Time-derivative of the energy 527
Now let γ be a random geodesic such that µt = law (γt). Then the above inequality can be rewritten

\[
U_\nu(\mu_t)-U_\nu(\mu_0)\ \ge\ \mathbb{E}\,U'\bigl(\rho_0(\gamma_t)\bigr) - \mathbb{E}\,U'\bigl(\rho_0(\gamma_0)\bigr)
= \mathbb{E}\Bigl[U'\bigl(\rho_0(\gamma_t)\bigr) - U'\bigl(\rho_0(\gamma_0)\bigr)\Bigr].
\]

Since U′ is nondecreasing,

\[
U'\bigl(\rho_0(\gamma_t)\bigr) - U'\bigl(\rho_0(\gamma_0)\bigr)\ \ge\
\Bigl[U'\bigl(\rho_0(\gamma_t)\bigr) - U'\bigl(\rho_0(\gamma_0)\bigr)\Bigr]\,1_{\rho_0(\gamma_0)>\rho_0(\gamma_t)}.
\]

Multiplying and dividing by ρ0(γt) − ρ0(γ0), and then by d(γ0, γt), one arrives at

\[
U_\nu(\mu_t)-U_\nu(\mu_0)\ \ge\
\mathbb{E}\left[\frac{U'(\rho_0(\gamma_t))-U'(\rho_0(\gamma_0))}{\rho_0(\gamma_t)-\rho_0(\gamma_0)}\;
\frac{\rho_0(\gamma_t)-\rho_0(\gamma_0)}{d(\gamma_0,\gamma_t)}\;
1_{\rho_0(\gamma_0)>\rho_0(\gamma_t)}\; d(\gamma_0,\gamma_t)\right].
\]

After division by t and use of the identity d(γ0, γt) = t d(γ0, γ1), one obtains in the end

\[
\frac{U_\nu(\mu_t)-U_\nu(\mu_0)}{t}\ \ge\
\mathbb{E}\left[\frac{U'(\rho_0(\gamma_t))-U'(\rho_0(\gamma_0))}{\rho_0(\gamma_t)-\rho_0(\gamma_0)}\;
\frac{\rho_0(\gamma_t)-\rho_0(\gamma_0)}{d(\gamma_0,\gamma_t)}\;
1_{\rho_0(\gamma_0)>\rho_0(\gamma_t)}\; d(\gamma_0,\gamma_1)\right]. \tag{20.5}
\]
Similarly,

\[
\liminf_{t\to 0}\ \frac{\rho_0(\gamma_t)-\rho_0(\gamma_0)}{d(\gamma_0,\gamma_t)}\;1_{\rho_0(\gamma_0)>\rho_0(\gamma_t)}\ \ge\ -\,\bigl|\nabla^-\rho_0\bigr|(\gamma_0).
\]

So, if vt(γ) stands for the integrand on the right-hand side of (20.5), one has
(note indeed that p′(r) = r U″(r)). Since π = (ρ0 ν) ⊗ δ_{x1 = T(x0)} with T = exp(∇ψ), the right-hand side can be rewritten as

\[
\int U''(\rho_0)\,\nabla\rho_0\cdot\nabla\psi\; d\pi.
\]
HWI inequalities
Recall from Chapters 16 and 17 that CD(K, N ) bounds imply convexity
properties of certain functionals Uν along displacement interpolation.
For instance, if a Riemannian manifold M , equipped with a reference
measure ν, satisfies CD(0, ∞), then by Theorem 17.15 the Boltzmann
H functional is displacement convex.
If (µt )0≤t≤1 is a geodesic in P2ac (M ), for t ∈ (0, 1] the convexity
inequality
Hν (µt ) ≤ (1 − t) Hν (µ0 ) + t Hν (µ1 )
may be rewritten as
\[
\frac{H_\nu(\mu_t) - H_\nu(\mu_0)}{t}\ \le\ H_\nu(\mu_1) - H_\nu(\mu_0).
\]
Under suitable assumptions we may then apply Theorem 20.1 to pass
to the limit as t → 0, and get
\[
-\int \frac{|\nabla\rho_0(x_0)|}{\rho_0(x_0)}\; d(x_0,x_1)\;\pi(dx_0\,dx_1)\ \le\ H_\nu(\mu_1) - H_\nu(\mu_0).
\]
This implies, by the Cauchy–Schwarz inequality,

\[
H_\nu(\mu_0)-H_\nu(\mu_1)\ \le\
\left(\int \frac{|\nabla\rho_0(x_0)|^2}{\rho_0(x_0)^2}\;\pi(dx_0\,dx_1)\right)^{\frac12}
\left(\int d(x_0,x_1)^2\;\pi(dx_0\,dx_1)\right)^{\frac12}
=\ W_2(\mu_0,\mu_1)\,\sqrt{\int\frac{|\nabla\rho_0|^2}{\rho_0}\, d\nu}, \tag{20.8}
\]
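Readers who want a concrete instance of this HWI-type bound can test it in the Gaussian world, where every quantity is available in closed form. The Python check below is an editorial illustration, not part of the text: it takes ν = N(0, 1) as reference measure, µ0 = N(m, s²) and µ1 = ν, and uses the standard Gaussian formulas for the H functional, the Fisher information and W2 (the function names are mine).

```python
import math

# nu = N(0,1) reference; mu = N(m, s^2). Standard closed forms (names mine):
def H(m, s):   # relative entropy H_nu(mu)
    return 0.5 * (s * s + m * m - 1.0 - math.log(s * s))

def I(m, s):   # relative Fisher information I_nu(mu)
    return m * m + (s - 1.0 / s) ** 2

def W2(m, s):  # quadratic Wasserstein distance W_2(mu, nu)
    return math.sqrt(m * m + (s - 1.0) ** 2)

# With mu_0 = N(m, s^2) and mu_1 = nu:  H_nu(mu_0) <= W_2 * sqrt(I_nu(mu_0))
for (m, s) in [(0.5, 1.3), (-1.0, 0.7), (2.0, 2.0), (0.0, 0.25)]:
    assert H(m, s) <= W2(m, s) * math.sqrt(I(m, s)) + 1e-12
```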
By explicit computation,

\[
\beta(x_0,x_1)\ =\
\begin{cases}
\left(\dfrac{\alpha}{\sin\alpha}\right)^{N-1} > 1 & \text{if } K>0\\[2mm]
1 & \text{if } K=0\\[2mm]
\left(\dfrac{\alpha}{\sinh\alpha}\right)^{N-1} < 1 & \text{if } K<0,
\end{cases} \tag{20.9}
\]

\[
\beta'(x_0,x_1)\ =\
\begin{cases}
-(N-1)\left(1-\dfrac{\alpha}{\tan\alpha}\right) < 0 & \text{if } K>0\\[2mm]
0 & \text{if } K=0\\[2mm]
(N-1)\left(\dfrac{\alpha}{\tanh\alpha}-1\right) > 0 & \text{if } K<0,
\end{cases} \tag{20.10}
\]

where

\[
\alpha\ =\ \sqrt{\frac{|K|}{N-1}}\; d(x_0,x_1).
\]

Moreover, a standard Taylor expansion shows that, as α → 0 while K is fixed (which means that either d(x0, x1) → 0 or N → ∞), then

\[
\beta\ \simeq\ 1 + \frac{K}{6}\, d(x_0,x_1)^2,\qquad
\beta'\ \simeq\ -\,\frac{K}{3}\, d(x_0,x_1)^2,
\]

whatever the sign of K.
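These two expansions can be confirmed numerically from the exact formulas (20.9) and (20.10). The following Python snippet is an editorial check with hypothetical parameter values (K = 2, N = 5); it verifies that the errors in both expansions are of order d⁴.

```python
import math

K, N = 2.0, 5.0  # hypothetical values, K > 0
for d in (0.2, 0.1, 0.05):
    a = math.sqrt(K / (N - 1)) * d                 # alpha, as defined above
    beta  = (a / math.sin(a)) ** (N - 1)           # (20.9), case K > 0
    dbeta = -(N - 1) * (1 - a / math.tan(a))       # (20.10), case K > 0
    # the Taylor expansions 1 + K d^2/6 and -K d^2/3, up to O(d^4) errors
    assert abs(beta - (1 + K * d * d / 6)) < d ** 4
    assert abs(dbeta - (-K * d * d / 3)) < d ** 4
```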
The next definition is a generalization of the classical notion of
Fisher information:
(Strictly speaking this is true only if ρ > 0, but the integral in (20.11)
may be restricted to the set {ρ > 0}.) Also, in Definition 20.6 one can
replace |∇ρ| by |∇− ρ| and |∇p(ρ)| by |∇− p(ρ)| since a locally Lipschitz
function is differentiable almost everywhere.
ρ is not locally Lipschitz. In particular, Iν (µ) makes sense in [0, +∞] for
all probability measures µ (with the understanding that Iν (µ) = +∞
if µ is singular). I shall not develop this remark, and in the sequel shall
only consider densities which are locally (and even globally) Lipschitz.
\[
U_\nu(\mu_0)-U_\nu(\mu_1)\ \le\ \int U''\bigl(\rho_0(x_0)\bigr)\,|\nabla\rho_0(x_0)|\; d(x_0,x_1)\;\pi(dx_0\,dx_1)
\ -\ K_{\infty,U}\,\frac{W_2(\mu_0,\mu_1)^2}{2}
\]
\[
\le\ W_2(\mu_0,\mu_1)\,\sqrt{I_{U,\nu}(\mu_0)}\ -\ K_{\infty,U}\,\frac{W_2(\mu_0,\mu_1)^2}{2}, \tag{20.14}
\]

where K∞,U is defined in (17.10).

(iii) If N < ∞, K ≥ 0 and Uν(µ1) < +∞, then

\[
U_\nu(\mu_0)-U_\nu(\mu_1)\ \le\ \int U''\bigl(\rho_0(x_0)\bigr)\,|\nabla\rho_0(x_0)|\; d(x_0,x_1)\;\pi(dx_0\,dx_1)
\]
\[
-\ K\,\lambda_{N,U}\,\max\Bigl(\|\rho_0\|_{L^\infty(\nu)},\,\|\rho_1\|_{L^\infty(\nu)}\Bigr)^{-\frac1N}\,\frac{W_2(\mu_0,\mu_1)^2}{2}
\]
\[
\le\ W_2(\mu_0,\mu_1)\,\sqrt{I_{U,\nu}(\mu_0)}
\ -\ K\,\lambda_{N,U}\,\max\Bigl(\|\rho_0\|_{L^\infty(\nu)},\,\|\rho_1\|_{L^\infty(\nu)}\Bigr)^{-\frac1N}\,\frac{W_2(\mu_0,\mu_1)^2}{2}, \tag{20.15}
\]

where

\[
\lambda_{N,U}\ =\ \lim_{r\to 0}\ \frac{p(r)}{r^{1-\frac1N}}. \tag{20.16}
\]
Exercise 20.11. When U is well-behaved, give a more direct deriva-
tion of (20.14), via plain displacement convexity (rather than dis-
torted displacement convexity). The same for (20.15), with the help
of Exercise 17.23.
Proof of Theorem 20.10. First recall from the proof of Theorem 17.8 that U−(ρ0) is integrable; since ρ0 U′(ρ0) ≥ U(ρ0), the integrability of ρ0 U′(ρ0) implies the integrability of U(ρ0). Moreover, if N = ∞ then U(r) ≥ a r log r − b r for some positive constants a, b (unless U is linear). So, if N = ∞,

\[
\rho_0\,U'(\rho_0)\in L^1\ \Longrightarrow\ U(\rho_0)\in L^1\ \Longrightarrow\ \rho_0\,\log_+\rho_0\in L^1.
\]
where C stands for various numeric constants. Then (20.19) also holds true for K < 0.

Second term of (20.18): This is the same as for the first term, except that the inequalities are reversed. If K > 0 then U(ρ1) ≥ U(ρ1/βt) βt ↓ U(ρ1/β) β, and to pass to the limit it suffices to check the integrability of U+(ρ1). If N < ∞ this follows from the Lipschitz continuity of U, while if N = ∞ it comes from the assumption ρ1 log+ ρ1 ∈ L¹(ν). If K < 0 then U(ρ1) ≤ U(ρ1/βt) βt ↑ U(ρ1/β) β, and now we can conclude because U−(ρ1) is integrable by Theorem 17.8. In either case,

\[
\int U\!\left(\frac{\rho_1(x_1)}{\beta_t(x_0,x_1)}\right)\beta_t(x_0,x_1)\;\pi(dx_0|x_1)\,\nu(dx_1)
\ \xrightarrow[t\to 0]{}\
\int U\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\beta(x_0,x_1)\;\pi(dx_0|x_1)\,\nu(dx_1). \tag{20.20}
\]
or equivalently

\[
\frac{U\!\left(\dfrac{\rho_0}{\beta_{1-t}}\right)\beta_{1-t}\ -\ U(\rho_0)}{t}\ \le\
p\!\left(\frac{\rho_0}{\beta_{1-t}}\right)\frac{1-\beta_{1-t}}{\beta_{1-t}\; t}. \tag{20.21}
\]
In particular,

\[
U\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\frac{\beta(x_0,x_1)}{\rho_1(x_1)}
\ \le\ \frac{U(\rho_1(x_1))}{\rho_1(x_1)}
+ p\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\frac{\beta(x_0,x_1)}{\rho_1(x_1)}
\left(\log\frac{1}{\rho_1(x_1)} - \log\frac{\beta(x_0,x_1)}{\rho_1(x_1)}\right)
\]
\[
=\ \frac{U(\rho_1(x_1))}{\rho_1(x_1)}
- p\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\frac{\beta(x_0,x_1)}{\rho_1(x_1)}\,\frac{K}{6}\, d(x_0,x_1)^2
\ \le\ \frac{U(\rho_1(x_1))}{\rho_1(x_1)} - \frac{K_{\infty,U}}{6}\, d(x_0,x_1)^2.
\]
Thus

\[
\int U\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\beta(x_0,x_1)\;\pi(dx_0|x_1)\,\nu(dx_1)
\ \le\ \int U\bigl(\rho_1(x_1)\bigr)\,\nu(dx_1)
\ -\ \frac{K_{\infty,U}}{6}\int \rho_1(x_1)\, d(x_0,x_1)^2\;\pi(dx_0|x_1)\,\nu(dx_1)
\]
\[
=\ \int U(\rho_1)\, d\nu\ -\ \frac{K_{\infty,U}}{6}\, W_2(\mu_0,\mu_1)^2. \tag{20.28}
\]
This combined with Corollary 19.5 will lead from (20.12) to (20.15). So let us prove (20.30). By convexity of s ⟼ s^N U(s^{−N}),

\[
U\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\frac{\beta(x_0,x_1)}{\rho_1(x_1)}
\ \le\ \frac{U(\rho_1(x_1))}{\rho_1(x_1)}
+ N\, p\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\left(\frac{\beta(x_0,x_1)}{\rho_1(x_1)}\right)^{1-\frac1N}
\left[\left(\frac{1}{\rho_1(x_1)}\right)^{\frac1N} - \left(\frac{\beta(x_0,x_1)}{\rho_1(x_1)}\right)^{\frac1N}\right],
\]

which is the same as

\[
U\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\beta(x_0,x_1)\ \le\ U\bigl(\rho_1(x_1)\bigr)
+ N\, p\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\beta(x_0,x_1)^{1-\frac1N}\Bigl(1 - \beta(x_0,x_1)^{\frac1N}\Bigr).
\]

As a consequence,

\[
\int U\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\beta(x_0,x_1)\;\pi(dx_0|x_1)\,\nu(dx_1)\ \le\ U_\nu(\mu_1)
\]
\[
+\ N\int p\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\beta(x_0,x_1)^{1-\frac1N}\Bigl(1-\beta(x_0,x_1)^{\frac1N}\Bigr)\;\pi(dx_0|x_1)\,\nu(dx_1). \tag{20.31}
\]
Since K ≥ 0, β(x0, x1) coincides with (α/ sin α)^{N−1}, where α = √(K/(N−1)) d(x0, x1). By the elementary inequality

\[
0\le\alpha\le\pi\ \Longrightarrow\ N\left[\left(\frac{\alpha}{\sin\alpha}\right)^{\frac{N-1}{N}} - 1\right]\ \ge\ \frac{N-1}{6}\,\alpha^2 \tag{20.32}
\]

(see the bibliographical notes for details), the right-hand side of (20.31) is bounded above by

\[
U_\nu(\mu_1)\ -\ \frac{K}{6}\int p\!\left(\frac{\rho_1(x_1)}{\beta(x_0,x_1)}\right)\beta(x_0,x_1)^{1-\frac1N}\, d(x_0,x_1)^2\;\pi(dx_0|x_1)\,\nu(dx_1)
\]
\[
\le\ U_\nu(\mu_1)\ -\ \frac{K}{6}\left(\inf_{r>0}\frac{p(r)}{r^{1-\frac1N}}\right)\int \rho_1(x_1)^{1-\frac1N}\, d(x_0,x_1)^2\;\pi(dx_0|x_1)\,\nu(dx_1)
\]
\[
\le\ U_\nu(\mu_1)\ -\ \frac{K}{6}\left(\lim_{r\to0}\frac{p(r)}{r^{1-\frac1N}}\right)(\sup\rho_1)^{-\frac1N}\int \rho_1(x_1)\, d(x_0,x_1)^2\;\pi(dx_0|x_1)\,\nu(dx_1)
\]
\[
=\ U_\nu(\mu_1)\ -\ \frac{K}{6}\,\lambda\,(\sup\rho_1)^{-\frac1N}\, W_2(\mu_0,\mu_1)^2, \tag{20.33}
\]

where λ = λN,U.
On the other hand, since β′(x0, x1) = −(N − 1)(1 − (α/tan α)) < 0, we can use the elementary inequality

\[
0<\alpha\le\pi\ \Longrightarrow\ (N-1)\left(1-\frac{\alpha}{\tan\alpha}\right)\ \ge\ (N-1)\,\frac{\alpha^2}{3} \tag{20.34}
\]
(see the bibliographical notes again) to deduce
\[
\int p\bigl(\rho_0(x_0)\bigr)\,\beta'(x_0,x_1)\;\pi(dx_1|x_0)\,\nu(dx_0) \tag{20.35}
\]
\[
\le\ \left(\inf_{r>0}\frac{p(r)}{r^{1-\frac1N}}\right)\int \rho_0(x_0)^{1-\frac1N}\,\beta'(x_0,x_1)\;\pi(dx_1|x_0)\,\nu(dx_0)
\]
\[
\le\ \left(\lim_{r\to0}\frac{p(r)}{r^{1-\frac1N}}\right)(\sup\rho_0)^{-\frac1N}\int \rho_0(x_0)\,\beta'(x_0,x_1)\;\pi(dx_1|x_0)\,\nu(dx_0)
\]
\[
\le\ -\left(\lim_{r\to0}\frac{p(r)}{r^{1-\frac1N}}\right)(\sup\rho_0)^{-\frac1N}\int \rho_0(x_0)\,\frac{K\, d(x_0,x_1)^2}{3}\;\pi(dx_1|x_0)\,\nu(dx_0)
\]
\[
=\ -\,\frac{K\,\lambda\,(\sup\rho_0)^{-\frac1N}}{3}\int d(x_0,x_1)^2\;\pi(dx_0\,dx_1)
\ =\ -\,\frac{K\,\lambda\,(\sup\rho_0)^{-\frac1N}}{3}\; W_2(\mu_0,\mu_1)^2. \tag{20.36}
\]
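Both elementary inequalities used above are easy to test on a grid. The following Python check is editorial (with an arbitrary choice N = 4); it confirms (20.32) and (20.34) over 0 < α < π.

```python
import math

N = 4.0  # any N > 1 works
for k in range(1, 1000):
    a = math.pi * k / 1000.0
    # (20.32):  N[(alpha/sin alpha)^((N-1)/N) - 1] >= (N-1) alpha^2 / 6
    lhs = N * ((a / math.sin(a)) ** ((N - 1) / N) - 1.0)
    assert lhs >= (N - 1) * a * a / 6 - 1e-12
    # (20.34):  (N-1)(1 - alpha/tan alpha) >= (N-1) alpha^2 / 3
    lhs = (N - 1) * (1.0 - a / math.tan(a))
    assert lhs >= (N - 1) * a * a / 3 - 1e-12
```

Both bounds are saturated to leading order as α → 0, which is consistent with the Taylor expansions of β and β′ given earlier.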
Bibliographical notes
Isoperimetric-type inequalities
ℝⁿ, one has

\[
H_\gamma\ \le\ \frac{I_\gamma}{2}, \tag{21.5}
\]

independently of the dimension. This is the Stam–Gross logarithmic Sobolev inequality. By scaling, for any K > 0 the measure γ_K(dx) = (2π/K)^{−n/2} e^{−K|x|²/2} dx satisfies a logarithmic Sobolev inequality with constant K.
Remark 21.4. More generally, if V ∈ C 2 (Rn ) and ∇2 V ≥ KIn ,
Theorem 21.2 shows that ν(dx) = e−V (x) dx satisfies a logarithmic
Sobolev inequality with constant K. When V (x) = K|x|2 /2 the con-
stant K is optimal in (21.4).
Remark 21.5. The curvature assumption CD(K, ∞) is quite restrictive; however, there are known perturbation theorems which immediately extend the range of application of Theorem 21.2. For instance, if ν satisfies a logarithmic Sobolev inequality, v is a bounded function and ν̃ = e^{−v} ν/Z is another probability measure obtained from ν by multiplication by e^{−v}, then ν̃ also satisfies a logarithmic Sobolev inequality (Holley–Stroock perturbation theorem). The same is true if v is unbounded but satisfies ∫ e^{α|∇v|²} dν < ∞ for α large enough.
Proof of Theorem 21.2. By Theorem 18.12, ν admits square-exponential moments; in particular it lies in P2(M). Then from Corollary 20.13(ii) and the inequality ab ≤ Ka²/2 + b²/(2K),

\[
H_\nu(\mu)\ \le\ W_2(\mu,\nu)\,\sqrt{I_\nu(\mu)}\ -\ \frac{K\, W_2(\mu,\nu)^2}{2}\ \le\ \frac{I_\nu(\mu)}{2K}.
\]
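The two-step estimate in this proof (the HWI inequality followed by Young's inequality ab ≤ Ka²/2 + b²/(2K)) can again be tested on Gaussians, for which H, I and W2 are explicit. The Python sketch below is an editorial illustration with ν = N(0, 1), so K = 1 (function names mine).

```python
import math

def H(m, s):  return 0.5 * (s * s + m * m - 1.0 - math.log(s * s))
def I(m, s):  return m * m + (s - 1.0 / s) ** 2
def W2(m, s): return math.sqrt(m * m + (s - 1.0) ** 2)

K = 1.0  # the standard Gaussian satisfies CD(1, infinity)
for (m, s) in [(0.5, 1.3), (-1.0, 0.7), (2.0, 2.0)]:
    w, i = W2(m, s), I(m, s)
    # Young step: a*b - K a^2/2 <= b^2/(2K), with a = w and b = sqrt(i)
    assert w * math.sqrt(i) - K * w * w / 2 <= i / (2 * K) + 1e-12
    # resulting logarithmic Sobolev inequality with constant K
    assert H(m, s) <= i / (2 * K) + 1e-12
```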
Open Problem 21.6. For Riemannian manifolds satisfying
CD(K, N ) with N < ∞, the optimal constant in the logarithmic Sobolev
inequality is not K but KN/(N − 1). Can this be proven by a transport
argument?
Sobolev inequalities
Sobolev inequalities are one among several classes of functional inequal-
ities with isoperimetric content; they are extremely popular in the the-
ory of partial differential equations. They look like logarithmic Sobolev
inequalities, but with powers instead of logarithms, and they take di-
mension into account explicitly.
The most basic Sobolev inequality is in Euclidean space: If u is a function on ℝⁿ such that ∇u ∈ L^p(ℝⁿ) (1 ≤ p < n) and u vanishes at infinity (in whatever sense, see e.g. Remark 21.13 below), then u automatically lies in L^{p*}(ℝⁿ), where p* = (np)/(n − p) > p. More quantitatively, there is a constant S = S(n, p) such that

\[
\|u\|_{L^{p^*}(\mathbb{R}^n)}\ \le\ S\,\|\nabla u\|_{L^p(\mathbb{R}^n)}.
\]

There are other versions for p = n (in which case essentially exp(c u^{n′}) is integrable, n′ = n/(n − 1)), and for p > n (in which case u is Hölder-continuous). There are also very many variants for a function u defined
\[
\|u\|_{L^{p^*}(\Omega)}\ \le\ A\,\|\nabla u\|_{L^p(\Omega)} + C\,\|u\|_{L^{p^\flat}(\partial\Omega)},\qquad p^\flat = \frac{(n-1)\, p}{n-p},
\]
\[
\|u\|_{L^{p^*}(\Omega)}\ \le\ A\,\|\nabla u\|_{L^p(\Omega)} + B\,\|u\|_{L^q(\Omega)},\qquad q\ge 1,
\]

etc. One can also quote the Gagliardo–Nirenberg interpolation inequalities, which typically take the form

\[
\|u\|_{L^r}\ \le\ G\,\|\nabla u\|_{L^p}^{1-\theta}\,\|u\|_{L^q}^{\theta},\qquad 1\le p<n,\quad 1\le q<p^*,\quad 0\le\theta\le1,
\]
with some restrictions on the exponents. I will not say more about
Sobolev-type inequalities, but there are entire books devoted to them.
In a Riemannian setting, there is a famous family of Sobolev inequalities obtained from the curvature-dimension bound CD(K, N) with K > 0 and 2 < N < ∞:

\[
1\le q\le \frac{2N}{N-2}\ \Longrightarrow\
\frac{c}{q-2}\left[\left(\int |u|^q\, d\nu\right)^{\frac2q} - \int |u|^2\, d\nu\right]\ \le\ \int |\nabla u|^2\, d\nu,
\qquad c = \frac{N\,K}{N-1}. \tag{21.7}
\]
The way in which I have written inequality (21.9) might look strange, but it has the merit of showing very clearly how the limit N → ∞ leads to the logarithmic Sobolev inequality H_{∞,ν}(µ) ≤ (1/2K) ∫ (|∇ρ|²/ρ) dν.
I don’t know whether (21.9), or more generally (21.7), can be ob-
tained by transport. Instead, I shall derive related inequalities, whose
relation to (21.9) is still unclear.
where

\[
\Theta^{(N,K)}(r,g)\ =\ \sup_{0\le\alpha\le\pi}\left[
\sqrt{\frac{N-1}{K}}\;\frac{\alpha\, g}{r^{1+\frac1N}}
\ +\ N\left(1-\left(\frac{\alpha}{\sin\alpha}\right)^{1-\frac1N}\right)
\ +\ (N-1)\left(\frac{\alpha}{\tan\alpha}-1\right) r^{-\frac1N}\right]. \tag{21.11}
\]

As a consequence,

\[
H_{N,\nu}(\mu)\ \le\ \frac{1}{2K}\left(\frac{N-1}{N}\right)^{2}
\int_M \frac{|\nabla\rho|^2}{\rho}\left(\frac{\rho^{-\frac2N}}{3} + \frac{2\,\rho^{-\frac1N}}{3}\right) d\nu. \tag{21.12}
\]
and the infimum is taken over all functions g ∈ L1 (Rn ), not identi-
cally 0.
After writing (21.14) for µ0 = µ0^{(λ)} and then passing to the limit as λ → ∞, one obtains

\[
n\int_{\mathbb{R}^n}\rho_1^{1-\frac1n}\, d\nu\ \le\
\left(\int \rho_0^{-p'\left(1+\frac1n\right)}\,|\nabla\rho_0|^{p'}\, d\mu_0\right)^{\frac1{p'}}
\left(\int |y|^{p}\, d\mu_1(y)\right)^{\frac1p}. \tag{21.15}
\]
Let us change unknowns and define ρ0 = u^{p*}, ρ1 = g; inequality (21.15) then becomes

\[
1\ \le\ \frac{p\,(n-1)}{n\,(n-p)}\;
\frac{\left(\displaystyle\int |y|^{p}\, g(y)\, dy\right)^{\frac1p}}{\displaystyle\int g^{1-\frac1n}}\;
\|\nabla u\|_{L^p},
\]

where u and g are only required to satisfy ∫ u^{p*} = 1, ∫ g = 1. The inequality (21.13) follows by homogeneity again.

To conclude this section, I shall assume Ric_{N,ν} ≥ K < 0 and derive Sobolev inequalities for compactly supported functions. Since I shall not be concerned here with optimal constants, I shall only discuss the limit case p = 1, p* = n/(n − 1), which implies the general inequality for p < n (via Hölder's inequality), up to a loss in the constants.
To conclude this section, I shall assume RicN,ν ≥ K < 0 and derive
Sobolev inequalities for compactly supported functions. Since I shall
not be concerned here with optimal constants, I shall only discuss the
limit case p = 1, p = n/(n − 1), which implies the general inequality
for p < n (via Hölder’s inequality), up to a loss in the constants.
Theorem 21.15 (CD(K, N) implies L¹-Sobolev inequalities). Let M be a Riemannian manifold equipped with a reference measure ν = e^{−V} vol, satisfying a curvature-dimension bound CD(K, N) for some K < 0, N ∈ (1, ∞). Then, for any ball B = B(z, R), R ≥ 1, there are constants A and B, only depending on a lower bound on K and upper bounds on N and R, such that for any Lipschitz function u supported in B,

\[
\|u\|_{L^{\frac{N}{N-1}}}\ \le\ A\,\|\nabla u\|_{L^1} + B\,\|u\|_{L^1}. \tag{21.16}
\]
\[
\left(1+\frac1N\right)\int \rho_0(x_0)^{-1-\frac1N}\,|\nabla\rho_0(x_0)|\; d(x_0,x_1)\;\pi(dx_0\,dx_1). \tag{21.17}
\]
Choose ρ1 = 1_{B(z,R)}/ν[B(z, R)] (the normalized indicator function of the ball). The arguments of β and β′ in (21.17) belong to B(z, R), so the coefficients β and β′ remain bounded by some explicit function of N, K and R, while the distance d(x0, x1) remains bounded by 2R. So there are constants δ(K, N, R) > 0 and C(K, N, R) such that

\[
-\int \rho_0^{1-\frac1N}\, d\nu\ \le\ -\,\delta(K,N,R)\;\nu[B]^{\frac1N}
+ C(K,N,R)\int\Bigl(\rho_0^{1-\frac1N} + \rho_0^{-\frac1N}\,|\nabla\rho_0|\Bigr)\, d\nu. \tag{21.18}
\]

Recall that ν[B] = 1; then after the change of unknowns ρ0 = u^{N/(N−1)}, inequality (21.18) implies

\[
1\ \le\ S\bigl(K,N,R\bigr)\Bigl(\|\nabla u\|_{L^1(M)} + \|u\|_{L^1(M)}\Bigr),
\]
Isoperimetric inequalities
Poincaré inequalities
this section I shall only consider global Poincaré inequalities, which are
rather different from the local inequalities considered in Chapter 19.
Definition 21.17 (Poincaré inequality). Let M be a Riemannian manifold, and ν a probability measure on M. It is said that ν satisfies a Poincaré inequality with constant λ if, for any u ∈ L²(ν) with u Lipschitz, one has

\[
\bigl\|u - \langle u\rangle\bigr\|_{L^2(\nu)}^2\ \le\ \frac1\lambda\,\|\nabla u\|_{L^2(\nu)}^2,
\qquad \langle u\rangle = \int u\, d\nu. \tag{21.19}
\]
Remark 21.18. Throughout Part II, I shall always assume that ν is
absolutely continuous with respect to the volume measure. This implies
that Lipschitz functions are ν-almost everywhere differentiable.
Inequality (21.19) can be reformulated as

\[
\int u\, d\nu = 0\ \Longrightarrow\ \|u\|_{L^2}^2\ \le\ \frac{\|\nabla u\|_{L^2}^2}{\lambda}.
\]
This writing makes the formal connection with the logarithmic Sobolev
inequality very natural. (The Poincaré inequality is obtained as the
limit of the logarithmic Sobolev inequality when one sets µ = (1+εu) ν
and lets ε → 0.)
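This can be imitated numerically: for the standard Gaussian on ℝ, which satisfies a Poincaré inequality with λ = 1, the inequality Var(u) ≤ (1/λ) ∫ |u′|² dν can be checked by quadrature for sample functions, with equality attained by the eigenfunction u(x) = x. The Python check below is an editorial illustration (names mine).

```python
import math

def gauss_int(g, lo=-8.0, hi=8.0, n=20000):
    # \int g dN(0,1) by the midpoint rule
    h = (hi - lo) / n
    s = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        s += g(x) * math.exp(-x * x / 2)
    return s * h / math.sqrt(2 * math.pi)

lam = 1.0  # spectral gap of the standard Gaussian on R
samples = [
    (lambda x: x,           lambda x: 1.0),           # first Hermite polynomial: equality
    (lambda x: x**3 - 3*x,  lambda x: 3*x**2 - 3.0),  # third Hermite polynomial
    (math.sin,              math.cos),
]
for u, du in samples:
    mean = gauss_int(u)
    var = gauss_int(lambda x: (u(x) - mean) ** 2)
    energy = gauss_int(lambda x: du(x) ** 2)
    assert var <= energy / lam + 1e-6   # Poincare inequality, lambda = 1
```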
Like Sobolev inequalities, Poincaré inequalities express the domi-
nation of a function by its gradient; but unlike Sobolev inequalities,
they do not include any gain of integrability. Poincaré inequalities have
spectral content, since the best constant λ can be interpreted as the
spectral gap for the Laplace operator on M .1 There is no Poincaré in-
equality on Rn equipped with the Lebesgue measure (the usual “flat”
Laplace operator does not have a spectral gap), but there is a Poincaré
inequality on, say, any compact Riemannian manifold equipped with
its volume measure.
Poincaré inequalities are implied by logarithmic Sobolev inequali-
ties, but the converse is false.
Example 21.19 (Exponential measure). Let ν(dx) = e−|x| dx be
the exponential measure on [0, +∞). Then ν satisfies a Poincaré in-
equality (with constant 1). On the other hand, it does not satisfy any
1
This is one reason to take λ (universally accepted notation for spectral gap) as
the constant defining the Poincaré inequality. Unfortunately this is not consistent
with the convention that I used for local Poincaré inequalities; another choice
would have been to call λ−1 the Poincaré constant.
logarithmic Sobolev inequality. The same conclusions hold true for the double-sided exponential measure ν(dx) = e^{−|x|} dx/2 on ℝ. More generally, the measure ν_β(dx) = e^{−|x|^β} dx/Z_β satisfies a Poincaré inequality
In other words, if CD(K, N) holds true, then for any Lipschitz function f on M with ∫ f dν = 0, one has

\[
\int f\, d\nu = 0\ \Longrightarrow\ \int f^2\, d\nu\ \le\ \frac{N-1}{K\,N}\int |\nabla f|^2\, d\nu. \tag{21.20}
\]

Remark 21.21. Let L = ∆ − ∇V·∇; then (21.20) means that L admits a spectral gap of size at least KN/(N − 1):

\[
\lambda_1(-L)\ \ge\ \frac{K\,N}{N-1}.
\]
Proof of Theorem 21.20. In the case N < ∞, apply (21.12) with µ = (1 + εf) ν, where ε is a small positive number, f is Lipschitz and ∫ f dν = 0. Since M has a finite diameter, f is bounded, so µ is a probability measure for ε small enough. Then, by standard Taylor expansion of the logarithm function,

\[
H_{N,\nu}(\mu)\ =\ \varepsilon\int f\, d\nu\ +\ \varepsilon^2\,\frac{N-1}{N}\int \frac{f^2}{2}\, d\nu\ +\ o(\varepsilon^2),
\]

and the first term on the right-hand side vanishes by assumption. Similarly,

\[
\int \frac{|\nabla\rho|^2}{\rho}\left(\frac{\rho^{-\frac2N}}{3} + \frac{2\,\rho^{-\frac1N}}{3}\right) d\nu
\ =\ \varepsilon^2\int |\nabla f|^2\, d\nu\ +\ o(\varepsilon^2).
\]

So (21.12) implies

\[
\frac{N-1}{N}\int \frac{f^2}{2}\, d\nu\ \le\ \frac{1}{2K}\left(\frac{N-1}{N}\right)^{2}\int |\nabla f|^2\, d\nu,
\]
Bibliographical notes
\[
\Phi(X)\ \le\ X\cdot\nabla\Phi(X) + \frac{|\nabla\Phi(X)|^2}{2}\ =\ \lambda\,\Phi(X) + \frac{|\nabla\Phi(X)|^2}{2},
\]

so if λ < 1 we obtain Φ(X) ≤ |∇Φ(X)|²/(2(1 − λ)).
Another inequality for which it would be interesting to have a
transport proof is the Hardy–Littlewood–Sobolev inequality. Recently
Calvez and Carrillo [198] obtained such a proof in dimension 1; further
investigation is under way.
After [248], Maggi and I pushed the method even further [587], to
recover “very optimal” Sobolev inequalities with trace terms, in Rn .
This settled some problems that had been left open in a classical work
by Brézis and Lieb [173]. Much more information can be found in [587];
recently we also wrote a sequel [588] in which limit cases (such as
inequalities of Moser–Trudinger type) are considered.
As far as all these applications of transport to Sobolev or isoperimetric inequalities in ℝⁿ are concerned, the Knothe coupling works just about as well as the optimal coupling. (As a matter of fact, many
such inequalities are studied in [134, 135] via the Knothe coupling.)
But the choice of coupling method becomes important for refinements
such as the characterization of minimizers [248], or, even more, the establishment of quantitative stability estimates around minimizers. This point
is discussed in detail by Figalli, Maggi and Pratelli [369] who use the
optimal coupling to refine Gromov’s proof of isoperimetry, to the point
of obtaining a beautiful (and sharp) quantitative isoperimetric theo-
rem, giving a lower bound on how much the isoperimetric ratio of a set
departs from the optimal ratio, in terms of how much the shape of this
set departs from the optimal shape.
Interestingly enough, the transport method in all of these works
is insensitive to the choice of norm in Rn . (I shall come back to this
observation in the concluding chapter.) Lutwak, Yang and Zhang [582]
developed this remark and noticed that if a function f is given, then the
problem of minimizing ∇f L1 over all norms on Rn can be related to
Minkowski’s problem of prescribed Gauss curvature. The isoperimetric
problem for a non-Euclidean norm, also known as the Wulff problem,
is not an academic issue, since it is used in the modeling of surface
energy (see the references in [369]).
Concentration inequalities
\[
d_p\bigl((x_1,\ldots,x_N);\,(y_1,\ldots,y_N)\bigr)\ =\ \left(\sum_{i=1}^N d(x_i,y_i)^p\right)^{\frac1p}.
\]

\[
\forall\varphi\in C_b(\mathcal X),\qquad
\frac1\lambda\,\log\int_{\mathcal X} e^{\lambda\varphi}\, d\nu\ =\
\sup_{\mu\in P(\mathcal X)}\left[\int\varphi\, d\mu - \frac{H_\nu(\mu)}{\lambda}\right]. \tag{22.7}
\]
(See the bibliographical notes for proofs of these identities.)
Let us first treat the case p = 2. Apply Theorem 5.26 with c(x, y) = d(x, y)²/2, F(µ) = (1/λ) Hν(µ), Λ(φ) = (1/λ) log ∫ e^{λφ} dν. The conclusion is that ν satisfies T2(λ) if and only if

\[
\forall\phi\in C_b(\mathcal X),\qquad \log\int e^{-\lambda\phi}\, d\nu\ +\ \lambda\int\phi^c\, d\nu\ \le\ 0,
\]

i.e.

\[
\int e^{-\lambda\phi}\, d\nu\ \le\ e^{-\lambda\int\phi^c\, d\nu},
\]

where φ^c(x) := sup_y (φ(y) − d(x, y)²/2). Upon changing φ for ϕ = −φ, this is the desired result. Note that the Particular Case 22.4 is obtained from (22.5) by choosing p = 1 and performing the change of variables t → λt.
The case p < 2 is similar, except that now we appeal to the equivalence between (i') and (ii') in Theorem 5.26, and choose

\[
c(x,y) = \frac{d(x,y)^p}{p};\qquad
\Phi(r) = \frac p2\, r^{\frac2p}\,1_{r\ge0};\qquad
\Phi^*(t) = \left(1-\frac p2\right) t^{\frac{2}{2-p}}.
\]
coupling between µ3(dx3|x1, x2) and ν(dy3), call it π3(dx3 dy3|x1, x2); etc. In the end, glue these plans together to get a coupling

\[
\pi(dx\, dy)\ =\ \pi_1(dx_1\, dy_1)\,\pi_2(dx_2\, dy_2|x_1)\cdots\pi_N(dx_N\, dy_N|x^{N-1}).
\]

Then

\[
\mathbb{E}_\pi\, d_p(x,y)^p\ =\ \sum_{i=1}^N \mathbb{E}_\pi\, d(x_i,y_i)^p
\ =\ \sum_{i=1}^N \int \mathbb{E}_{\pi(\,\cdot\,|x^{i-1})}\, d(x_i,y_i)^p\;\pi^{i-1}(dx^{i-1}\, dy^{i-1})
\ =\ \sum_{i=1}^N \int \mathbb{E}_{\pi(\,\cdot\,|x^{i-1})}\, d(x_i,y_i)^p\;\mu^{i-1}(dx^{i-1}), \tag{22.8}
\]

where of course µ^{i−1}(dx^{i−1}) stands for the marginal of µ on the first i − 1 variables.
For each i and each x^{i−1} = (x1, …, x_{i−1}), the measure π(· |x^{i−1}) is an optimal transference plan between its marginals. So the right-hand side of (22.8) can be rewritten as

\[
\sum_{i=1}^N \int W_p\bigl(\mu_i(\,\cdot\,|x^{i-1}),\,\nu\bigr)^p\;\mu^{i-1}(dx^{i-1}).
\]

Since this cost is achieved for the transference plan π, we obtain the key estimate

\[
W_p\bigl(\mu,\,\nu^{\otimes N}\bigr)^p\ \le\ \sum_{i=1}^N \int W_p\bigl(\mu_i(\,\cdot\,|x^{i-1}),\,\nu\bigr)^p\;\mu^{i-1}(dx^{i-1}). \tag{22.9}
\]
Optimal transport and concentration 573
\[
\sum_{i=1}^N \int \left[\frac{2}{\lambda}\, H_\nu\bigl(\mu_i(\,\cdot\,|x^{i-1})\bigr)\right]^{\frac p2}\mu^{i-1}(dx^{i-1}). \tag{22.10}
\]

Since p ≤ 2, we can apply Hölder's inequality, in the form Σ_{i≤N} a_i^{p/2} ≤ N^{1−p/2} (Σ_i a_i)^{p/2}, and bound (22.10) by

\[
N^{1-\frac p2}\left[\frac{2}{\lambda}\sum_{i=1}^N \int H_\nu\bigl(\mu_i(\,\cdot\,|x^{i-1})\bigr)\;\mu^{i-1}(dx^{i-1})\right]^{\frac p2}. \tag{22.11}
\]
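The form of Hölder's inequality used here, Σ a_i^{p/2} ≤ N^{1−p/2} (Σ a_i)^{p/2} for nonnegative a_i and 1 ≤ p ≤ 2, follows from the concavity of x ↦ x^{p/2}. A quick randomized Python check (an editorial illustration):

```python
import random

random.seed(0)
for _ in range(100):
    N = random.randint(1, 20)
    p = random.uniform(1.0, 2.0)
    a = [random.random() for _ in range(N)]
    lhs = sum(x ** (p / 2) for x in a)
    rhs = N ** (1 - p / 2) * sum(a) ** (p / 2)
    assert lhs <= rhs + 1e-12   # Holder / power-mean inequality
```

Equality holds when all the a_i coincide, e.g. a = (1, 1) and p = 1 gives 2 on both sides.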
\[
\forall\mu\in P(\mathcal X),\quad C(\mu,\nu)\ \le\ H_\nu(\mu)
\]

implies

\[
\forall\mu\in P(\mathcal X^N),\quad C^N\bigl(\mu,\nu^{\otimes N}\bigr)\ \le\ H_{\nu^{\otimes N}}(\mu),
\]

where C^N is the optimal transport cost associated with the cost function

\[
c^N(x,y)\ =\ \sum_i c(x_i,y_i)
\]

on X^N.
The following important lemma was used in the course of the proof
of Proposition 22.5.
(iii) There is a constant C > 0 such that for any Borel set A ⊂ X,

\[
\nu[A]\ \ge\ \frac12\ \Longrightarrow\ \forall r>0,\quad \nu[A^r]\ \ge\ 1 - e^{-C r^2}.
\]
(iv) There is a constant C > 0 such that
so (ii) implies

\[
\int e^{t f(x)}\,\nu(dx)\ \le\ e^{\frac{t^2}{2\lambda}}\; e^{\, t\int f\, d\nu}.
\]

With the shorthand ⟨f⟩ = ∫ f dν, this is the same as

\[
\int e^{t(f-\langle f\rangle)}\, d\nu\ \le\ e^{\frac{t^2}{2\lambda}}.
\]

This proves (vii).
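Property (vii) is easy to test for the standard Gaussian measure, which satisfies (i) with λ = 1: for a 1-Lipschitz f, the centered exponential moments should be bounded by e^{t²/2}. The Python sketch below checks this by quadrature for f(x) = |x| (an editorial illustration; names mine).

```python
import math

def gauss_int(g, lo=-10.0, hi=10.0, n=20000):
    # \int g dN(0,1) by the midpoint rule
    h = (hi - lo) / n
    s = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        s += g(x) * math.exp(-x * x / 2)
    return s * h / math.sqrt(2 * math.pi)

f = abs                  # a 1-Lipschitz function
fbar = gauss_int(f)      # its mean, sqrt(2/pi)
lam = 1.0                # Poincare (indeed log Sobolev) constant of N(0,1)

def mgf(t):
    return gauss_int(lambda x: math.exp(t * (f(x) - fbar)))

for t in (0.25, 0.5, 1.0, 2.0):
    assert mgf(t) <= math.exp(t * t / (2 * lam)) + 1e-9
```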
\[
F(x)\ =\ \frac1N\sum_{i=1}^N f(x_i).
\]

If f is Lipschitz then ‖F‖Lip = ‖f‖Lip/N. Moreover, ∫ F dν^{⊗N} = ∫ f dν. So if we apply (iv) with X replaced by X^N and f replaced by F, we obtain

\[
\nu^{\otimes N}\left[x\in\mathcal X^N;\ \frac1N\sum_{i=1}^N f(x_i)\ \ge\ \int f\, d\nu + \varepsilon\right]
\ =\ \nu^{\otimes N}\left[x\in\mathcal X^N;\ F(x)\ \ge\ \int F\, d\nu^{\otimes N} + \varepsilon\right]
\]
\[
\le\ \exp\left(-\,\frac{(C/N)\,\varepsilon^2}{\bigl(\|f\|_{\mathrm{Lip}}/N\bigr)^2}\right)
\ =\ \exp\left(-\, C\,\frac{N\,\varepsilon^2}{\|f\|_{\mathrm{Lip}}^2}\right),
\]

where C = λ/2 (cf. the remark at the end of the proof of (i) ⇒ (iv)). The implication (v) ⇒ (iv) is trivial.
Let us now consider the implication (i) ⇒ (iii). Assume that

\[
\forall\mu\in P_1(\mathcal X),\qquad W_1(\mu,\nu)\ \le\ \sqrt{C\, H_\nu(\mu)}. \tag{22.14}
\]
therefore

\[
\nu[A^r]\ \ge\ 1 - \exp\left(-\Bigl(\frac{r^2}{C} - \log 2\Bigr)\right).
\]
This establishes a bound of the type ν[A^r] ≥ 1 − a e^{−cr²}, and (iii) follows. (To get rid of the constant a, it suffices to note that ν[A^r] ≥ max(1/2, 1 − a e^{−cr²}) ≥ 1 − e^{−c′r²} for c′ well-chosen.)
To show (vi) ⇒ (vii), let A be a compact set such that ν[A] ≥ 1/2; also, let x0 ∈ A, and let R be the diameter of A. Further, let f(x) = d(x, A); then f is a 1-Lipschitz function admitting 0 for median. So (vi) implies

\[
\nu\bigl[d(x,x_0)\ge R+r\bigr]\ \le\ \nu\bigl[d(x,A)\ge r\bigr]\ \le\ e^{-C r^2}.
\]
\[
\bigl\|d(x_0,\cdot\,)\,(\mu-\nu)\bigr\|_{TV}\ \le\
\sqrt{\frac2a}\,\left(1+\log\int_{\mathcal X} e^{a\, d(x_0,x)^2}\, d\nu(x)\right)^{\frac12}\sqrt{H_\nu(\mu)}. \tag{22.16}
\]
µ = (1 + u) ν; note that u ≥ −1 and ∫ u dν = 0. We also define h(u) = (1 + u) log(1 + u) − u, so that

\[
H_\nu(\mu)\ =\ \int_{\mathcal X} h(u)\, d\nu. \tag{22.17}
\]
By the Cauchy–Schwarz inequality in the variables (t, x) (and the identity h(u) = ∫₀¹ (1 − t) u²/(1 + tu) dt),

\[
\left(\int_0^1(1-t)\, dt\int \varphi\,|u|\, d\nu\right)^{2}\ \le\
\left(\iint (1-t)\bigl(1+t\,u(x)\bigr)\,\varphi(x)^2\, d\nu(x)\, dt\right)
\left(\iint \frac{(1-t)\,|u(x)|^2}{1+t\,u(x)}\, d\nu(x)\, dt\right);
\]

thus

\[
\left(\int \varphi\,|u|\, d\nu\right)^2\ \le\ C\, H_\nu(\mu), \tag{22.18}
\]

where

\[
C\ :=\ \frac{\displaystyle\iint (1-t)(1+tu)\,\varphi^2\, d\nu\, dt}{\displaystyle\left(\int_0^1(1-t)\, dt\right)^{2}}\,. \tag{22.19}
\]

The numerator can be rewritten as follows:

\[
\iint (1-t)(1+tu)\,\varphi^2\, d\nu\, dt
\ =\ \left(\int_0^1(1-t)\, t\, dt\right)\int(1+u)\,\varphi^2\, d\nu + \left(\int_0^1(1-t)^2\, dt\right)\int\varphi^2\, d\nu
\ =\ \frac16\int\varphi^2\, d\mu + \frac13\int\varphi^2\, d\nu. \tag{22.20}
\]
From the Legendre representation of the H functional,

\[
\int \varphi^2\, d\mu\ \le\ H_\nu(\mu) + \log\int e^{\varphi^2}\, d\nu, \tag{22.21}
\]
≤ (H + 2L) 2 (22.24)
where

\[
m\ =\ \frac13\left[1 + L + \sqrt{(L-1)^2 + L}\right]\ \le\ \frac{2L+1}{3}.
\]
This concludes the proof.
Remark 22.19. Part (ii) of Theorem 22.17 shows that the T2 inequality on a Riemannian manifold contains spectral information, and imposes qualitative restrictions on measures satisfying T2. For instance, the support of such a measure needs to be connected. (Otherwise take u = a on one connected component, u = b on another, and u = 0 elsewhere, where a and b are two constants chosen in such a way that ∫ u dν = 0. Then ∫ |∇u|² dν = 0, while ∫ u² dν > 0.) This remark shows that T2
and

\[
\phi(t)\ =\ \int_M H_t g\, d\nu + O(t). \tag{22.33}
\]
By Proposition 22.16(iv), Ht g converges pointwise to g as t → 0⁺; then by the dominated convergence theorem,

\[
\lim_{t\to 0^+}\phi(t)\ =\ \int_M g\, d\nu. \tag{22.34}
\]
So it all amounts to showing that φ(1) ≤ lim_{t→0⁺} φ(t), and this will obviously be true if φ(t) is nonincreasing in t. To prove this, we shall compute the time-derivative φ′(t). We shall go slowly, so the hasty reader may go directly to the result, which is formula (22.41) below.
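The claimed monotonicity of φ can be explored numerically by brute force: discretize the Hopf–Lax infimum on a grid and integrate against ν = N(0, 1), which does satisfy the hypotheses (it satisfies CD(1, ∞)), so φ should be nonincreasing. The Python sketch below is an editorial illustration, with the semigroup taken here in the quadratic form (Ht g)(x) = inf_y [g(y) + |x − y|²/(2t)] and all names mine.

```python
import math

# Hopf-Lax semigroup on a grid: (H_t g)(x) = inf_y [ g(y) + |x-y|^2/(2t) ]
X = [-6 + 12 * k / 300 for k in range(301)]

def hopf_lax(g, t):
    return [min(g(y) + (x - y) ** 2 / (2 * t) for y in X) for x in X]

def phi(g, t, K=1.0):
    # phi(t) = (1/(Kt)) log \int exp(K t H_t g) d nu,  nu = N(0,1)
    h = 12 / 300
    vals = hopf_lax(g, t)
    integral = sum(math.exp(K * t * v - x * x / 2) for v, x in zip(vals, X))
    integral *= h / math.sqrt(2 * math.pi)
    return math.log(integral) / (K * t)

g = math.sin
assert phi(g, 0.2) >= phi(g, 0.6) - 1e-9   # phi is nonincreasing in t
assert phi(g, 0.6) >= phi(g, 1.0) - 1e-9
```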
Let t ∈ (0, 1] be given. For s > 0, we have

\[
\frac{\phi(t+s)-\phi(t)}{s}\ =\ \frac1s\left(\frac{1}{K(t+s)} - \frac{1}{Kt}\right)\log\int_M e^{K(t+s)\, H_{t+s}g}\, d\nu
\]
\[
+\ \frac{1}{K\, t\, s}\left(\log\int_M e^{K(t+s)\, H_{t+s}g}\, d\nu\ -\ \log\int_M e^{K t\, H_t g}\, d\nu\right). \tag{22.35}
\]
On the other hand, the second term on the right-hand side of (22.35) converges to

\[
\frac{1}{K\, t\,\int_M e^{K t H_t g}\, d\nu}\;\lim_{s\to0^+}\frac1s\left[\int_M e^{K(t+s)H_{t+s}g}\, d\nu - \int_M e^{K t H_t g}\, d\nu\right], \tag{22.37}
\]

provided that the latter limit exists.

To evaluate the limit in (22.37), we decompose the expression inside square brackets into

\[
\int_M \frac{e^{K(t+s)H_{t+s}g} - e^{K t\, H_{t+s}g}}{s}\, d\nu\ +\ \int_M \frac{e^{K t\, H_{t+s}g} - e^{K t\, H_t g}}{s}\, d\nu. \tag{22.38}
\]
The integrand of the first term in the above formula can be rewritten as (e^{K t H_{t+s}g})(e^{K s H_{t+s}g} − 1)/s, which is uniformly bounded and converges pointwise to (e^{K t H_t g}) K H_t g as s → 0⁺. So the first integral in (22.38) converges to ∫_M (K H_t g) e^{K t H_t g} dν.

Let us turn to the second term of (22.38). By Proposition 22.16(vii), for each x ∈ M,

\[
H_{t+s}g(x)\ =\ H_t g(x)\ -\ s\left(\frac{|\nabla^- H_t g(x)|^2}{2} + o(1)\right),
\]

and therefore H_{t+s}g = H_t g + O(s). It follows that

\[
\frac{e^{K t H_{t+s}g} - e^{K t H_t g}}{s}\ =\ O(1)\qquad\text{as } s\to0^+. \tag{22.40}
\]
\[
\|h\|_{H^{-1}(\nu)}\ =\ \sup_{\tilde h\neq 0}\ \frac{\int h\,\tilde h\, d\nu}{\|\nabla\tilde h\|_{L^2(\nu)}}\ =\ \bigl\|\nabla(L^{-1}h)\bigr\|_{L^2(\nu)},
\]

\[
e^{K t H_t h}\ =\ 1 + K t\, H_t h + \frac{K^2 t^2}{2}\,(H_t h)^2 + O(t^3) \tag{22.43}
\]
\[
=\ 1 + K t\, H_t h + \frac{K^2 t^2}{2}\, h^2 + o(t^2).
\]

So the right-hand side of (22.42) equals

\[
\limsup_{t\to0^+}\int_M \frac{h - H_t h}{t}\, d\nu\ -\ \frac K2\int_M h^2\, d\nu.
\]
To close this section, I will show that the Talagrand inequality does
imply a logarithmic Sobolev inequality under strong enough curvature
assumptions.
\[
\nu[A]\ \ge\ \frac12\ \Longrightarrow\ \forall r>0,\quad \nu^{\otimes N}[A^r]\ \ge\ 1 - e^{-C r^2};
\]

here the enlargement A^r is defined with the d2 distance,

\[
d_2(x,y)\ =\ \sqrt{\sum_{i=1}^N d(x_i,y_i)^2}.
\]

(iii) There is a constant C > 0 such that for any N ∈ ℕ and any f ∈ Lip(X^N, d2) (resp. Lip(X^N, d2) ∩ L¹(ν^{⊗N})),

\[
\nu^{\otimes N}\bigl[x\in\mathcal X^N;\ f(x)\ge m + r\bigr]\ \le\ e^{-C\, r^2/\|f\|_{\mathrm{Lip}}^2},
\]
imply T2 (K).
\[
\hat\mu^N_x\ =\ \frac1N\sum_{i=1}^N \delta_{x_i}\ \in P(\mathcal X),
\]

and let

\[
f_N(x)\ =\ W_2\bigl(\hat\mu^N_x,\,\nu\bigr).
\]

By the triangle inequality for W2 and Theorem 4.8, for any x, y ∈ X^N one has
\[
|f_N(x) - f_N(y)|^2\ \le\ W_2\left(\frac1N\sum_{i=1}^N\delta_{x_i},\ \frac1N\sum_{i=1}^N\delta_{y_i}\right)^{2}
\ \le\ \frac1N\sum_{i=1}^N W_2(\delta_{x_i},\delta_{y_i})^2\ =\ \frac1N\sum_{i=1}^N d(x_i,y_i)^2;
\]

so fN is (1/√N)-Lipschitz in the distance d2. By (iii),

\[
\forall r>0,\qquad \nu^{\otimes N}\bigl[f_N\ \ge\ m_N + r\bigr]\ \le\ e^{-C N r^2}, \tag{22.45}
\]
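In one dimension the Lipschitz bound behind (22.45) can be checked exactly, because W2 between two N-point empirical measures is computed by sorting both samples (monotone rearrangement). The following Python check is an editorial illustration (names mine): it verifies W2(µ̂x, µ̂y) ≤ d2(x, y)/√N, which together with the triangle inequality gives the (1/√N)-Lipschitz property of fN.

```python
import math, random

def W2_empirical(xs, ys):
    # 1-D optimal matching between two N-point empirical measures: sort both
    N = len(xs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / N)

random.seed(1)
for _ in range(50):
    N = 30
    x = [random.gauss(0, 1) for _ in range(N)]
    y = [random.gauss(0, 1) for _ in range(N)]
    d2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))   # distance on X^N
    # sorted matching is optimal, identity matching gives the upper bound
    assert W2_empirical(x, y) <= d2 / math.sqrt(N) + 1e-12
```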
In a compact notation, cq (x, y) = min(d(x, y)2 , d(x, y)). The optimal
total cost associated with cq will be denoted by Cq .
\[
|\nabla\log\rho|\ \le\ c\ \Longrightarrow\ H_\nu(\mu)\ \le\ \frac{I_\nu(\mu)}{K},\qquad \mu = \rho\,\nu; \tag{22.48}
\]
(iii) ν ∈ P1 (M ) and there is a constant C > 0 such that
Remark 22.26. The equivalence between (i) and (ii) remains true
when the Riemannian manifold M is replaced by a general metric space.
On the other hand, the equivalence with (iii) uses at least a little bit
of the Riemannian structure (say, a local Poincaré inequality, a local
doubling property and a length property).
Remark 22.27. The equivalence between (i), (ii) and (iii) can be made more precise. As the proof will show, if ν satisfies a Poincaré inequality with constant λ, then for any c < 2√λ there is an explicit constant K = K(c) > 0 such that (22.48) holds true; and K(c) converges to λ as c → 0. Conversely, if for each c > 0 we call K(c) the best constant in (22.48), then ν satisfies a Poincaré inequality with constant λ = lim_{c→0} K(c). Also, in (ii) ⇒ (iii) one can choose C = max(4/K, 2/c), while in (iii) ⇒ (i) the Poincaré constant can be taken equal to C⁻¹.
(i) Further, assume that L∗ (ts) ≤ t2 L∗ (s) for all t ∈ [0, 1], s ≥ 0.
If there is λ ∈ (0, 1] such that ν satisfies the generalized logarithmic
Sobolev inequality with constant λ:
For any µ = ρ ν ∈ P (M ) such that log ρ ∈ Lip(M ),
For any µ = ρ ν ∈ P(M) such that log ρ ∈ Lip(M),

\[
H_\nu(\mu)\ \le\ \frac1\lambda\int L^*\bigl(|\nabla\log\rho|\bigr)\, d\mu; \tag{22.50}
\]

then ν also satisfies the following transport inequality:

\[
\forall\mu\in P_p(M),\qquad C_L(\mu,\nu)\ \le\ \frac{H_\nu(\mu)}{\lambda}. \tag{22.51}
\]
(ii) If ν satisfies (22.51), then it also satisfies the generalized
Poincaré inequality with constant λ:
Proof of Theorem 22.25. We shall start with the proof of (i) ⇒ (ii). Let f = log ρ − ∫(log ρ) dν; then ∫ f dν = 0 and the assumption in (ii) reads |∇f| ≤ c. Moreover, with a = ∫(log ρ) dν and X = ∫ e^f dν,

\[
I_\nu(\mu)\ =\ e^a\int |\nabla f|^2\, e^f\, d\nu;
\]
\[
H_\nu(\mu)\ =\ \int (f+a)\, e^{f+a}\, d\nu\ -\ \left(\int e^{f+a}\, d\nu\right)\log\int e^{f+a}\, d\nu
\]
\[
=\ e^a\left(\int f\, e^f\, d\nu - \int e^f\, d\nu + 1\right)\ -\ e^a\,\bigl(X\log X - X + 1\bigr)
\ \le\ e^a\left(\int f\, e^f\, d\nu - \int e^f\, d\nu + 1\right).
\]

So it is sufficient to prove

\[
|\nabla f|\le c\ \Longrightarrow\ \int\bigl(f\, e^f - e^f + 1\bigr)\, d\nu\ \le\ \frac1K\int|\nabla f|^2\, e^f\, d\nu. \tag{22.53}
\]

In the sequel, c is any constant satisfying 0 < c < 2√λ. Inequality (22.53) will be proven by two auxiliary inequalities:
Poincaré inequalities and quadratic-linear transport cost 597
\[
\int f^2\, d\nu\ \le\ e^{c\sqrt{5/\lambda}}\int f^2\, e^{-|f|}\, d\nu; \tag{22.54}
\]
\[
\int f^2\, e^f\, d\nu\ \le\ \frac1\lambda\left(\frac{2\sqrt\lambda + c}{2\sqrt\lambda - c}\right)^{2}\int |\nabla f|^2\, e^f\, d\nu. \tag{22.55}
\]
Note that the upper bound on |∇f| is crucial in both inequalities. Once (22.54) and (22.55) are established, the result is immediately obtained. Indeed, the right-hand side of (22.54) is obviously bounded by the left-hand side of (22.55), so both expressions are bounded above by a constant multiple of ∫ |∇f|² e^f dν. On the other hand, an elementary study shows that

\[
\forall f\in\mathbb R,\qquad f\, e^f - e^f + 1\ \le\ \max\bigl(f^2,\ f^2 e^f\bigr), \tag{22.56}
\]
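Inequality (22.56) can be confirmed by the elementary study alluded to (separate the cases f ≥ 0 and f < 0 and differentiate), or numerically; here is a quick Python check (an editorial illustration):

```python
import math

# (22.56): f e^f - e^f + 1 <= max(f^2, f^2 e^f) for all real f
for k in range(-500, 501):
    f = k / 50.0   # f in [-10, 10]
    lhs = f * math.exp(f) - math.exp(f) + 1.0
    rhs = max(f * f, f * f * math.exp(f))
    assert lhs <= rhs + 1e-9
```

For f ≥ 0 the maximum is f² e^f, for f < 0 it is f², matching the way (22.54) and (22.55) are combined above.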
By the Poincaré inequality, ∫ f² dν ≤ (1/λ) ∫ |∇f|² dν ≤ c²/λ, which implies (∫ f² dν)² ≤ (c²/λ) ∫ f² dν. Also by the Poincaré inequality,

\[
\int (f^2)^2\, d\nu\ -\ \left(\int f^2\, d\nu\right)^{2}\ \le\ \frac1\lambda\int \bigl|\nabla(f^2)\bigr|^2\, d\nu
\ =\ \frac4\lambda\int f^2\,|\nabla f|^2\, d\nu\ \le\ \frac{4c^2}{\lambda}\int f^2\, d\nu.
\]
in other words,

\[
\int f^2\, d\nu\ \le\ \exp\left(\frac{\int |f|^3\, d\nu}{\int f^2\, d\nu}\right)\int f^2\, e^{-|f|}\, d\nu.
\]

Combining this inequality with (22.57) finishes the proof of (22.54).
To establish (22.55), we first use the condition ∫ f dν = 0 and the Poincaré inequality to write

\[
\left(\int f\, e^{f/2}\, d\nu\right)^{2}
\ =\ \frac14\left[\iint \bigl(f(x)-f(y)\bigr)\bigl(e^{f(x)/2} - e^{f(y)/2}\bigr)\, d\nu(x)\, d\nu(y)\right]^{2}
\]
\[
\le\ \frac14\iint \bigl|f(x)-f(y)\bigr|^2\, d\nu(x)\, d\nu(y)\ \iint \bigl[e^{f(x)/2} - e^{f(y)/2}\bigr]^2\, d\nu(x)\, d\nu(y)
\]
\[
=\ \left(\int f^2\, d\nu - \Bigl(\int f\, d\nu\Bigr)^{2}\right)\left(\int e^f\, d\nu - \Bigl(\int e^{f/2}\, d\nu\Bigr)^{2}\right)
\]
\[
\le\ \frac{1}{\lambda^2}\int |\nabla f|^2\, d\nu\ \int \bigl|\nabla(e^{f/2})\bigr|^2\, d\nu
\ \le\ \frac{c^2}{4\lambda^2}\int |\nabla f|^2\, e^f\, d\nu. \tag{22.58}
\]
Next, also by the Poincaré inequality and the chain rule,

\[
\int f^2\, e^f\, d\nu\ -\ \left(\int f\, e^{f/2}\, d\nu\right)^{2}
\ \le\ \frac1\lambda\int \bigl|\nabla(f\, e^{f/2})\bigr|^2\, d\nu
\ =\ \frac1\lambda\int |\nabla f|^2\left(1+\frac f2\right)^{2} e^f\, d\nu
\]
\[
=\ \frac1\lambda\left[\int |\nabla f|^2\, e^f\, d\nu + \int |\nabla f|^2\, f\, e^f\, d\nu + \frac14\int |\nabla f|^2\, f^2\, e^f\, d\nu\right]
\]
\[
\le\ \frac1\lambda\left[\int |\nabla f|^2\, e^f\, d\nu
+ c\,\sqrt{\int |\nabla f|^2\, e^f\, d\nu}\,\sqrt{\int f^2\, e^f\, d\nu}
+ \frac{c^2}{4}\int f^2\, e^f\, d\nu\right]. \tag{22.59}
\]
Since L* is quadratic on [0, 1], the factors ε² cancel out on both sides, and we are back to the usual Poincaré inequality.
\[
\forall r\ge0,\qquad \nu[A^r]\ \ge\ 1 - \frac{e^{-C\,\min(r,\, r^2)}}{\nu[A]}. \tag{22.61}
\]
then ν^{⊗N} will satisfy a Poincaré inequality with the same constant as ν, and we may apply Theorem 22.30 to study concentration in (M^N, d2, ν^{⊗N}).
However, there is a more interesting approach, due to Talagrand, in
which one uses both the distance d2 and the distance
\[
d_1(x,y)\ =\ \sum_i d(x_i,y_i).
\]
\[
\nu^{\otimes N}[A]\ \ge\ \frac12\ \Longrightarrow\
\nu^{\otimes N}\Bigl[\bigl(A^{r;\, d_2}\bigr)^{r^2;\, d_1}\Bigr]\ \ge\ 1 - e^{-C r^2}. \tag{22.63}
\]
\[
\nu^{\otimes N}\Bigl[x;\ f(x)\ \ge\ m + r\,\|f\|_{\mathrm{Lip}(M^N,\, d_2)} + r^2\,\|f\|_{\mathrm{Lip}(M^N,\, d_1)}\Bigr]\ \le\ e^{-C r^2}, \tag{22.64}
\]
\[
C\bigl(\nu_B,\ \nu^{\otimes N}\bigr)\ \ge\ \frac12\inf_{x\in A,\ y\in B} c(x,y)\ =:\ \frac12\, c(A,B).
\]

\[
C\bigl(\mu,\ \nu^{\otimes N}\bigr)\ \le\ C\, H_{\nu^{\otimes N}}(\mu)\ =\ C\,\log\frac{1}{\nu^{\otimes N}[B]}.
\]

\[
c(A,B)\ \ge\ r^2.
\]
so z ∈ A^{r; d2}. Similarly,

\[
d_1(z,y)\ =\ \sum_{d(x_i,y_i)>1} d(x_i,y_i)\ \le\ \sum_i c_q(x_i,y_i)\ =\ c(x,y)\ <\ r^2;
\]
any Borel set in R^N, and let y ∈ T_N⁻¹(A) + B_r^{d₂} + B_{r²}^{d₁}. This means that
there are w and x such that T_N(w) ∈ A, |x − w|₂ ≤ r, |y − x|₁ ≤ r².
Then by (22.68),

|T_N(w) − T_N(y)|₂² = Σᵢ |T(wᵢ) − T(yᵢ)|²
≤ C² Σᵢ min( |wᵢ − yᵢ|, |wᵢ − yᵢ|² )
≤ C² [ Σ_{|wᵢ−xᵢ|≥|xᵢ−yᵢ|} 4 |wᵢ − xᵢ|² + Σ_{|wᵢ−xᵢ|<|xᵢ−yᵢ|} 2 |xᵢ − yᵢ| ]
≤ 4C² [ Σᵢ |xᵢ − wᵢ|² + Σᵢ |xᵢ − yᵢ| ]
≤ 8C² r²;

so T_N(y) ∈ A + B^{d₂}_{√8 Cr}. In summary, if C′ = √8 C, then

T_N( T_N⁻¹(A) + B_r^{d₂} + B_{r²}^{d₁} ) ⊂ A + B^{d₂}_{C′ r}.
for some numeric constant c > 0. This is precisely the Gaussian concen-
tration property as it appears in Theorem 22.10(iii) — in a dimension-
free form.
Dimension-dependent inequalities
where Q = (α/ sin α)^{1−1/N}. But this is immediate because Q is a symmetric
function of x₀ and x₁, and π has marginals µ = ρν and ν, so

∫ Q(x₀, x₁) dν(x₀) = ∫ Q(x₀, x₁) dν(x₁) = ∫ Q(x₀, x₁) dπ(x₀, x₁) = ∫ Q(x₀, x₁) ρ(x₀) dν(x₀).
Proof of Corollary 22.39. Write again Q = (α/ sin α)^{1−1/N}. The classical
Young inequality can be written ab ≤ aᵖ/p + b^{p′}/p′, where p′ = p/(p−1)

≤ ( N ρ^{1−1/N} e^{−Q} ) e^{Q} Q − 2e^{Q} + e ρ^{1/N};
Exercise 22.41. Use the inequalities proven in this section, and the
result of Exercise 22.20, to recover, at least formally, the inequality

∫ h dν = 0   ⟹   ( KN/(N−1) ) ‖h‖²_{H⁻¹(ν)} ≤ ‖h‖²_{L²(ν)}
Remark 22.42. If one applies the same procedure to (22.71), one re-
covers a constant K(N p)/(N p − 1), which reduces to the correct con-
stant only in the limit p → 1. As for inequality (22.73), it leads to just
K (which would be the limit p → ∞).
Note that inequality (22.73) does not solve this problem, since by
Remark 22.42 it only implies the Poincaré inequality with constant K.
I shall conclude with a very loosely formulated open problem, which
might be nonsense:
Recap
In the end, the main results of this chapter can be summarized by just
a few diagrams:
• Relations between functional inequalities: By combining
Theorems 21.2, 22.17, 22.10 and elementary inequalities, one has
(ii) For any x ∈ M, inf f ≤ (Ht f)(x) ≤ f(x); moreover, for any
t > 0 the infimum over M in (22.74) can be restricted to the closed ball
B[x, R(f, t)], where

R(f, t) = t L⁻¹( (sup f − inf f) / t ).
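The semigroup (Ht f)(x) = inf_y [ f(y) + t L(d(x, y)/t) ] is easy to explore numerically. Here is a minimal Python sketch, under the simplifying assumptions of a one-dimensional grid and the quadratic choice L(r) = r²/2 (both arbitrary), which checks the pointwise bounds of statement (ii):

```python
import numpy as np

def hopf_lax(f, x, t, L=lambda r: r**2 / 2):
    # brute-force infimum over the grid, simultaneously for every x
    D = np.abs(x[:, None] - x[None, :])          # d(x, y)
    return np.min(f[None, :] + t * L(D / t), axis=1)

x = np.linspace(-3.0, 3.0, 601)
f = np.cos(2 * x) + 0.5 * x**2                   # an arbitrary test function
Htf = hopf_lax(f, x, t=0.2)

assert np.all(Htf <= f + 1e-12)                  # H_t f <= f (take y = x in the infimum)
assert np.all(Htf >= f.min() - 1e-12)            # H_t f >= inf f
```

On a grid of n points this costs O(n²); the restriction of the infimum to the ball B[x, R(f, t)] is precisely what makes faster localized evaluations possible.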
Proof of Theorem 22.46. First, note that the inverse L⁻¹ of L is well-defined
R₊ → R₊ since L is strictly increasing and goes to +∞ at
infinity. Also L′(∞) = lim_{r→∞} L(r)/r is well-defined in (0, +∞]. Further,
note that
L*(p) = sup_{r≥0} ( p r − L(r) )
so

g(x) ≥ Hs g(x) ≥ g(x) + inf_{d(x,y)≤R(g,s)} [ s L( d(x, y)/s ) − L′(∞) d(x, y) ]
≥ g(x) + [ s L( R(g, s)/s ) − L′(∞) R(g, s) ],    (22.76)
0 ≤ g(x) − Hs g(x)
= sup_{d(x,y)≤R(g,s)} [ g(x) − g(y) − s L( d(x, y)/s ) ]
≤ s sup_{d(x,y)≤R(g,s)} [ ( [g(y) − g(x)]₋ / d(x, y) ) ( d(x, y)/s ) − L( d(x, y)/s ) ]
≤ s L*( sup_{d(x,y)≤R(g,s)} [g(y) − g(x)]₋ / d(x, y) ),    (22.77)
where I have used the inequality p r ≤ L(r) + L*(p). Statement (v) follows
at once from (22.77). Moreover, if L′(∞) = +∞, then L* is continuous
on R₊, so by the definition of |∇⁻g| and the fact that R(g, s) → 0,

lim sup_{s↓0} [ g(x) − Hs g(x) ] / s ≤ L*( lim sup_{s↓0} sup_{d(x,y)≤R(g,s)} [g(y) − g(x)]₋ / d(x, y) )
= L*( |∇⁻g(x)| ),

[ g(x) − Hs g(x) ] / s ≤ L*( ‖g‖_Lip ) ≤ L*( L′(∞) ) = L*( |∇⁻g(x)| ).
• If |∇⁻g(x)| < L′(∞), I claim that there is a function α = α(s),
depending on x, such that α(s) −→ 0 as s → 0, and

Hs g(x) = inf_{d(x,y)≤α(s)} [ g(y) + s L( d(x, y)/s ) ].    (22.78)
This is obvious if |∇− g(x)| = 0, so let us assume |∇− g(x)| > 0. (Note
that |∇− g(x)| < +∞ since g is Lipschitz.)
By the same computation as before,

[ g(x) − Hs g(x) ] / s = (1/s) sup_{d(x,y)≤R(g,s)} [ g(x) − g(y) − s L( d(x, y)/s ) ].

R(g, s)/s −→ L⁻¹(∞) = +∞.
So for s small enough, R(g, s) > s q. This implies

[ g(x) − Hs g(x) ] / s ≥ (1/s) sup_{d(x,y)=s q} [ g(x) − g(y) − s L( d(x, y)/s ) ]
= sup_{d(x,y)=s q} [ ( (g(x) − g(y)) / d(x, y) ) q − L(q) ].    (22.81)
Let

ψ(r) = sup_{d(x,y)=r} ( g(x) − g(y) ) / d(x, y).
If it can be shown that
lim inf_{s↓0} [ g(x) − Hs g(x) ] / s ≥ |∇⁻g(x)| q − L(q) = L*( |∇⁻g(x)| ).
If L′(∞) < +∞ and |∇⁻g(x)| = L′(∞), the above reasoning fails
because ∂L*(|∇⁻g(x)|) might be empty. However, for any θ < |∇⁻g(x)|
we may find q ∈ ∂L*(θ); then the previous argument shows that

lim inf_{s↓0} [ g(x) − Hs g(x) ] / s ≥ L*(θ);

the conclusion is obtained by letting θ → |∇⁻g(x)| and using the lower
semicontinuity of L*.
ψ(r) − ψ(r′) ≥ − C (r − r′).
lim sup_{y→x} [g(y) − g(x)]₋ / d(x, y) < L.
Bibliographical notes
has further worked on the links between Ricci curvature and con-
centration, see his very influential book [438], especially Chapter 3 12
therein. Also Talagrand made decisive contributions to the theory of
concentration of measure, mainly in product spaces, see in particu-
lar [772, 773]. Dembo [294] showed how to recover several of Tala-
grand’s results in an elegant way by means of information-theoretical
inequalities.
Tp inequalities have been studied for themselves at least since the
beginning of the nineties [695]; the Csiszár–Kullback–Pinsker inequal-
ity can be considered as their ancestor from the sixties (see below).
Sometimes it is useful to consider more general transport inequalities
of the form C(µ, ν) ≤ Hν(µ), or even C(µ, µ̃) ≤ Hν(µ) + Hν(µ̃) (recall
Theorem 22.30 and its proof).
It is easy to show that transport inequalities are stable under
weak convergence [305, Lemma 2.2]. They are also stable under push-
forward [425].
Proposition 22.3 was studied by Rachev [695] and Bobkov and
Götze [128], in the cases p = 1 and p = 2. These duality formulas were
later systematically exploited by Bobkov, Gentil and Ledoux [127, 131].
The Legendre reformulation of the H functional can be found in many
sources (for instance [577, Appendix B] when X is compact).
The tensorization argument used in Proposition 22.5 goes back to
Marton [595]. The measurable selection theorem used in the construc-
tion of the coupling π can be found e.g. in [288]. As for Lemma 22.8, it
is as old as information theory, since Shannon [746] used it to motivate
the introduction of entropy in this context. After Marton’s work, this
tensorization technique has been adapted to various situations, such
as weakly dependent Markov chains; see [124, 140, 305, 596, 621, 703,
730, 841]. Relations with the so-called Dobrushin(–Shlosman) mixing
condition appear in some of these works, for instance [596, 841]; in fact,
the original version of the mixing condition [308, 311] was formulated
in terms of optimal transport!
It is also Marton [595] who introduced the simple argument by which
Tp inequalities lead to concentration inequalities (implication (i) ⇒ (iii)
in Theorem 22.10), and which has since then been reproduced in nearly
all introductions to the subject. She used it mainly with the so-called
Hamming distance: d( (xᵢ)_{1≤i≤n}, (yᵢ)_{1≤i≤n} ) = Σᵢ 1_{xᵢ≠yᵢ}.
There are alternative functional approaches to the concentration
of measure: via logarithmic Sobolev inequalities [546, Chapter 5] [41,
Sanov’s theorem match well goes back at least to [139]; but Gozlan’s
theorem uses this ingredient with a quite new twist.
Varadarajan’s theorem (law of large numbers for empirical mea-
sures) was already used in the proof of Theorem 5.10; it is proven for
instance in [318, Theorem 11.4.1]. It is anyway implied by Sanov’s the-
orem.
Theorem 22.10 shows that T1 is quite well understood, but many
questions remain open about the more interesting T2 inequality. One
of the most natural is the following: given a probability measure ν
satisfying T2, and a bounded function v, does e^{−v} ν / ( ∫ e^{−v} dν ) also
satisfy a T2 inequality? For the moment, the only partial result in this
direction is (22.29). This formula was first established by Blower [122]
and later recovered with simpler methods by Bolley and myself [140].
If one considers probability measures of the form e−V (x) dx with
V (x) behaving like |x|β for large |x|, then the critical exponents for
concentration-type inequalities are the same as we already discussed for
isoperimetric-type inequalities: If β ≥ 2 there is the T2 inequality, while
for β = 1 there is the transport inequality with linear-quadratic cost
function. What happens for intermediate values of β has been investi-
gated by Gentil, Guillin and Miclo in [410], by means of modified log-
arithmic Sobolev inequalities in the style of Bobkov and Ledoux [130].
Exponents β > 2 have also been considered in [131].
It was shown in [671] that (Talagrand) ⇒ (log Sobolev) in Rn , if
the reference measure ν is log concave (with respect to the Lebesgue
measure). It was natural to conjecture that the same argument would
work under an assumption of nonnegative curvature (say CD(0, ∞));
Theorem 22.21 shows that such is indeed the case.
It is only recently that Cattiaux and Guillin [219] produced a coun-
terexample on the real line, showing that the T2 inequality does not
necessarily imply a log Sobolev inequality. Their counterexample takes
the form dν = e−V dx, where V oscillates rather wildly at infinity,
in particular V is not bounded below. More precisely, their potential
looks like V (x) = |x|3 + 3x2 sin2 x + |x|β as x → +∞; then ν satisfies a
logarithmic Sobolev inequality only if β ≥ 5/2, but a T2 inequality as
soon as β > 2. Counterexamples with V bounded below have not yet
been found.
Even more recently, Gozlan [425, 426, 428] exhibited a characteri-
zation of T2 and other transport inequalities on R, for certain classes
of measures. He even identified situations where it is useful to deduce
for the Gaussian measure (Remark 22.34 and Example 22.36, which
I copied from [546]). Then Maurey [607] found a simple approach to
concentration inequalities for the product exponential measure. Later,
Talagrand [774] made the connection with transport inequalities for
the quadratic-linear cost. Bobkov and Ledoux [130] introduced modi-
fied logarithmic Sobolev inequalities, and showed their equivalence with
Poincaré inequalities. (The proof of (i) ⇒ (ii) is copied almost verba-
tim from [130].) Very recently, alternative methods based on Lyapunov
functionals have been developed to handle these inequalities in an ele-
mentary way [447].
Bobkov and Ledoux also showed how to recover concentration in-
equalities directly from their modified logarithmic Sobolev inequali-
ties, showing in some sense that the concentration properties of the
exponential measure were shared by all measures satisfying a Poincaré
inequality. Finally, Bobkov, Gentil and Ledoux [127] understood how
to deduce quadratic-linear transport inequalities from modified log-
arithmic Sobolev inequalities, thanks to the Hamilton–Jacobi semi-
group. The proof of the direct implication in Theorem 22.28 is just
an expanded version of the arguments suggested in [127]; the proof of
the converse follows Gozlan [429].
In the particular case when ν(dx) = e−|x| dx on R+ , there are sim-
pler proofs of Theorem 22.25, also with improved constants; see for
instance the above-mentioned works by Talagrand or Ledoux, or a re-
cent remark by Barthe [73]. One may also note that deviation estimates
with bounds like exp(−c min(t, t2 )) for sums of independent real vari-
ables go back to the elementary Bernstein inequalities (see [96] or [775,
Theorem A.2.1]).
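The two-regime behavior exp(−c min(t, t²)) is easy to observe by simulation. Here is a minimal Monte Carlo sketch in Python comparing empirical deviation probabilities of centered sums of exponential variables with a Bernstein-type bound (the constant 0.2 is a deliberately generous choice, not a sharp one):

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 50, 100000
S = rng.exponential(1.0, size=(trials, N)).sum(axis=1) - N   # centered sums

for r in (0.2, 0.5):
    p = np.mean(S >= N * r)                      # empirical P(S_N >= N r)
    # Gaussian-like regime for small r, exponential tail for large r
    assert p <= np.exp(-0.2 * N * min(r, r * r))
```

For small r the bound is governed by r² (central-limit regime); for large r the heavier exponential tail forces the linear term min(r, r²) = r.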
The treatment of dimension-dependent Talagrand-type inequalities
in the last section is inspired from a joint work with Lott [578]. That
topic had been addressed before, with different tools, by Gentil [409];
it would be interesting to compare precisely his results with the ones
in this chapter. I shall also mention another dimension-dependent in-
equality in Remark 24.14.
Theorem 22.46 (behavior of solutions of Hamilton–Jacobi equations)
has been obtained by generalizing the proof of Proposition 22.16 as it
appears in [579]. When L (∞) = +∞, the proof is basically the same,
while there are a few additional technical difficulties if L (∞) < +∞.
In fact Proposition 22.16 was established in [579] in a more general con-
text, namely when M is a finite-dimensional Alexandrov spaces with
¹ A related remark, which I learnt from Ben Arous, is that the logarithmic Sobolev
inequality compares the rate functions of two large deviation principles, one for
the empirical measure of independent samples and the other one for the empirical
time-averages.
where (Xs )s≥0 is the symmetric diffusion process with invariant mea-
sure ν, µ = law (X0 ), and ϕ is an arbitrary Lipschitz function. (Com-
pare with Theorem 22.10(v).)
23
Gradient flows I
(d/dt) Φ(X(t)) = − | grad_{X(t)} Φ |².
Around the end of the nineties, Jordan, Kinderlehrer and Otto made
the important discovery that a number of well-known partial differen-
tial equations can be reformulated as gradient flows in the Wasserstein
space. The most emblematic example is that of the heat equation,
∂t ρ = ∆ρ,
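This interpretation can be made concrete in one dimension: along the heat flow, the functional H(ρ) = ∫ ρ log ρ decreases at a rate equal to the Fisher information ∫ |∇ρ|²/ρ. Here is a minimal finite-difference Python sketch of this identity, with an arbitrary Gaussian initial profile and an arbitrary discretization:

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]
rho = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # arbitrary smooth initial density
rho = rho / (rho.sum() * dx)                     # normalize on the grid

entropy = lambda r: np.sum(r * np.log(np.maximum(r, 1e-300))) * dx

lap = (np.roll(rho, 1) - 2 * rho + np.roll(rho, -1)) / dx**2
dt = 1e-5
H0, H1 = entropy(rho), entropy(rho + dt * lap)   # one explicit Euler heat step

grad = np.gradient(rho, dx)
fisher = np.sum(grad**2 / np.maximum(rho, 1e-300)) * dx

assert (H0 - H1) / dt > 0                        # entropy decreases along the heat flow
assert abs((H0 - H1) / dt - fisher) < 0.05 * fisher
```

For the standard Gaussian the Fisher information equals 1, which the discrete quantities reproduce up to grid and time-step error.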
(It is not a priori assumed that o(ε) is uniform in w.) Proposition 16.2
will also be useful.
(v) For any y ∈ M, and any geodesic (γs)_{0≤s≤1} joining γ0 = X(t)
to γ1 = y,

d⁺/dt [ d(X(t), y)²/2 ] ≤ Φ(y) − Φ(X(t)) − ∫₀¹ Λ(γs, γ̇s) (1 − s) ds;
(vi) For any y ∈ M, and any geodesic (γs)_{0≤s≤1} joining γ0 = X(t)
to γ1 = y,

d⁺/dt [ d(X(t), y)²/2 ] ≤ Φ(y) − Φ(X(t)) − λ[γ] d(X(t), y)²/2,
Remark 23.2. As the proof will show, the equivalence between (iii),
(iv), (v) and (vi) does not require the differentiability of Φ; it is sufficient
that Φ be valued in R ∪ {+∞} and Φ(X(t)) < +∞.
Finally, Properties (iv) to (vi) are quite handy to study gradient flows
in abstract metric spaces. As a matter of fact, in the sequel I shall use
(iv) to define gradient flows in the Wasserstein space.
− (d/dt) Φ(X(t)) = − ⟨∇Φ(X(t)), Ẋ(t)⟩ ≤ |∇Φ(X(t))| |Ẋ(t)| ≤ [ |∇Φ(X(t))|² + |Ẋ(t)|² ] / 2,

with equality if and only if ∇Φ(X(t)) and Ẋ(t) have the same norm
and opposite directions. So (i) is equivalent to (ii).
Next, if Φ is differentiable then
So (v) implies

− ε ⟨w, Ẋ(t)⟩ = d⁺/dt [ d(X(t), y)²/2 ]
≤ Φ( exp_{X(t)} εw ) − Φ(X(t)) − λ[γ] d(X(t), y)²/2
= Φ( exp_{X(t)} εw ) − Φ(X(t)) − λ[γ] ε² |w|² / 2.

As a consequence,

Φ( exp_{X(t)} εw ) ≥ Φ(X(t)) + ε ⟨w, −Ẋ(t)⟩ + o(ε),
The proof is the same as for the implication (iv) ⇒ (v) in Proposition
23.1. I could have used Inequality (23.2) to define gradient flows in
metric spaces, at least for λ-convex functions; but Definition 23.7 is
more general.
X = P2ac (M ) is not complete, but after all it is a geodesic space in its own
right, as a geodesically convex subset of P2 (M ) (recall Theorem 8.7), and
I shall not need completeness. Of course, this does not mean that it is
not interesting to study gradient flows in the whole of P2 (M ).
To go on with this program, I have to
• compute the (upper) derivative of the distance function;
• compute the subdifferential of a given energy functional.
This will be the purpose of the next two sections, more precisely The-
orems 23.9 and 23.14. Proofs will be long because I tried to achieve
what looked like the “correct” level of generality; so the reader should
focus on the results at first reading.
Remark 23.10. Recall that Theorem 10.41 gives a list of a few conditions
under which the approximate gradient ∇̃ can be replaced by the
usual gradient ∇ in the formulas above.
T_{t→t}(x) = x;
(d/ds) T_{t→s}(x) = ξs( T_{t→s}(x) ).    (23.5)

(If ξs(x) is the velocity field at time s and position x, then T_{t→s}(x) is
the position at time s of a particle which was at time t at x and then
followed the flow.)
By the formula of conservation of mass, for all t, s ∈ [t1 , t2 ),
µs = (Tt→s )# µt .
The idea is to compose the transport Tt→s with some optimal transport;
this will not result in an optimal transport, but at least it will provide
bounds on the Wasserstein distance.
In other words, µt = law (γt ), where γt is a random solution of
γ̇t = ξt (γt ). Restricting the time-interval slightly if necessary, we may
assume that these curves are defined on the closed interval [t1 , t2 ]. Each
of these curves is continuous and therefore bounded.
If γ solves γ̇t = ξt(γt), then d(γs, γt) ≤ ∫ₛᵗ |ξτ(γτ)| dτ. Since (γs, γt) is
a coupling of (µs, µt), it follows from the very definition of W₂ that
W₂(µs, µt) ≤ [ E ( ∫ₛᵗ |ξτ(γτ)| dτ )² ]^{1/2}
≤ [ E |s − t| ∫ₛᵗ |ξτ(γτ)|² dτ ]^{1/2}
= |s − t|^{1/2} [ E ∫ₛᵗ |ξτ(γτ)|² dτ ]^{1/2}
= |s − t|^{1/2} [ ∫ₛᵗ ( ∫ |ξτ|² dµτ ) dτ ]^{1/2}
≤ (1/2) [ |s − t| + ∫ₛᵗ ( ∫ |ξτ|² dµτ ) dτ ].
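In one dimension this kind of estimate can be checked with particles: trajectories of an ODE flow cannot cross, so the monotone (sorted) matching is optimal for W₂ between the empirical measures. Here is a minimal Python sketch (the velocity field is an arbitrary Lipschitz choice) verifying W₂(µ0, µT) ≤ T^{1/2} ( ∫∫ |ξτ|² dµτ dτ )^{1/2}:

```python
import numpy as np

rng = np.random.default_rng(1)
xi = lambda t, x: -x + np.sin(3.0 * x + t)       # arbitrary Lipschitz velocity field

n, dt, steps = 5000, 1e-3, 400                   # particles, time step, interval [0, 0.4]
g = np.sort(rng.normal(size=n))                  # particle positions at time 0
g0 = g.copy()
action = 0.0                                     # accumulates int int |xi_tau|^2 dmu_tau dtau
for k in range(steps):
    v = xi(k * dt, g)
    action += np.mean(v**2) * dt
    g = g + dt * v                               # explicit Euler step of gamma' = xi(gamma)

# in 1-D the monotone matching is optimal, so this is W2 of the empirical measures
W2 = np.sqrt(np.mean((g - g0)**2))
assert W2 <= np.sqrt(steps * dt * action) + 1e-9
```

The discrete inequality is exactly Cauchy–Schwarz applied particle by particle, which is why the numerical check holds with essentially no tolerance.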
The maps exp(∇ψ) and exp(∇ψ̃) are inverse to each other in the
almost sure sense. So for σ(dx)-almost all x, there is a minimizing
geodesic connecting T(x) to x with initial velocity ∇ψ(T(x)); then by
the formula of first variation,

lim sup_{s↓0} [ d(x, T_{t→t+s} ∘ T(x))² − d(x, T(x))² ] / (2s) ≤ − ⟨ ξt(T(x)), ∇ψ(T(x)) ⟩.
So we should check that we can indeed pass to the lim sup in (23.9).
Let v(s, x) be the integrand in the right-hand side of (23.9): if 0 < s ≤ 1,
then

v(s, x) = [ d(x, T_{t→t+s} ∘ T(x))² − d(x, T(x))² ] / (2s)
≤ [ ( d(x, T_{t→t+s} ∘ T(x)) − d(x, T(x)) ) / s ] [ ( d(x, T_{t→t+s} ∘ T(x)) + d(x, T(x)) ) / 2 ]
≤ [ d( T(x), T_{t→t+s}(T(x)) ) / s ] [ d(x, T(x)) + d( T(x), T_{t→t+s}(T(x)) ) / 2 ]
≤ d(x, T(x))² / 2 + d( T(x), T_{t→t+s}(T(x)) )² / s² =: w(s, x).
Note that x ↦ d(x, T(x))² ∈ L¹(σ), since

∫ d(x, T(x))² dσ(x) = W₂(σ, µt)² < +∞.    (23.10)
Moreover,

∫ [ d( T(x), T_{t→t+s}(T(x)) )² / s² ] dσ(x) = ∫ [ d( y, T_{t→t+s}(y) )² / s² ] dµt(y)
≤ (1/s²) ∫ ( ∫₀ˢ | ξ_{t+τ}( T_{t→t+τ}(y) ) | dτ )² dµt(y)
≤ (1/s) ∫ ∫₀ˢ | ξ_{t+τ}( T_{t→t+τ}(y) ) |² dτ dµt(y)
= (1/s) ∫₀ˢ ∫ | ξ_{t+τ}( T_{t→t+τ}(y) ) |² dµt(y) dτ
= (1/s) ∫₀ˢ ∫ | ξ_{t+τ}(z) |² dµ_{t+τ}(z) dτ.

By assumption, the latter quantity converges as s → 0 to

∫ |ξt(x)|² dµt(x) = ∫ |ξt(T(x))|² dσ(x).
Since d( T(x), T_{t→t+s}(T(x)) )² / s² −→ |ξt(T(x))|² as s → 0, we can combine
this with (23.10) to deduce that

lim sup_{s↓0} ∫ w(s, x) dσ(x) ≤ ∫ lim_{s↓0} w(s, x) dσ(x).
Under this assumption I shall show, with the same notation as in Step 1,
that t ↦ W₂(µt, σ)² is differentiable on the whole of (t1, t2), and

(d/dt) [ W₂(µt, σ)² / 2 ] = − ∫_M ⟨ ∇̃ψt, ξt ⟩ dµt.
I shall start with some estimates on the flow T_{t→s}. From the assumptions,

(d/ds) d( z, T_{t→s}(x) ) ≤ | ξs( T_{t→s}(x) ) | ≤ C ( 1 + d( z, T_{t→s}(x) ) ),
In the sequel, the symbol C will stand for other constants that may
depend only on τ and the Lipschitz constant of ξ.
≤ C ∫ (1 + d(z, x))² µ_{t0}(dx) < +∞.    (23.12)

≤ C |t − s|² ∫ (1 + d(z, x))² µt(dx);

W₂(µt, µs) ≤ C |t − s|.
For each s > 0, let T^{(s)} be the optimal transport between σ and
µ_{t+s}. As s ↓ 0 we can extract a subsequence sk → 0, such that

lim inf_{s↓0} [ W₂(µ_{t+s}, σ)² − W₂(µt, σ)² ] / (2s) = lim_{k→∞} [ W₂(µ_{t+sk}, σ)² − W₂(µt, σ)² ] / (2sk).

Changing signs and reasoning as in Step 1, we obtain

lim sup_{s↓0} [ W₂(µt, σ)² − W₂(µ_{t+s}, σ)² ] / (2s) ≤ lim sup_{k→∞} ∫ vk(x) σ(dx),    (23.14)

where

vk(x) = [ d( x, T_{t+sk→t} ∘ T^{(t+sk)}(x) )² − d( x, T^{(t+sk)}(x) )² ] / (2sk).

Let us bound the functions vk. For each x and k, we can use (23.11)
to derive

d( x, T_{t+sk→t} ∘ T^{(t+sk)}(x) )² − d( x, T^{(t+sk)}(x) )²
≤ [ d( x, T_{t+sk→t}(T^{(t+sk)}(x)) ) + d( x, T^{(t+sk)}(x) ) ] d( T^{(t+sk)}(x), T_{t+sk→t}(T^{(t+sk)}(x)) )
≤ C ( 1 + d(z, x) + d( z, T^{(t+sk)}(x) ) ) sk ( 1 + d( z, T^{(t+sk)}(x) ) )
≤ C sk ( 1 + d(z, x)² + d( z, T^{(t+sk)}(x) )² ).

So

vk(x) ≤ C ( 1 + d(z, x)² + d( z, T^{(t+sk)}(x) )² ).
W₂(µs, µ̃t)² − W₂(µ_{s′}, µ̃t)² ≤ C |s − s′|
∂µ_{t,k}/∂t + ∇ · ( ξt µ_{t,k} ) = 0.    (23.18)
But by definition µt,k is concentrated on the ball B[z, k], so in (23.18)
we may replace ξt by ξt,k = ξχk , where χk is a smooth cutoff function,
0 ≤ χk ≤ 1, χk = 1 on B[z, k], χk = 0 outside of B[z, 2k].
Let Ãk, Z̃_{t,k} and ξ̃_{t,k} be defined similarly in terms of ξ̃ and µ̃k, µ̃t.
Since ξ_{t,k} and ξ̃_{t,k} are compactly supported, we may apply the result
of Step 3: for all t ∈ (t1, t2),

(d/dt) [ W₂(µ_{t,k}, µ̃_{t,k})² / 2 ] = − ∫ ⟨ ∇̃ψ_{t,k}, ξ_{t,k} ⟩ dµ_{t,k} − ∫ ⟨ ∇̃ψ̃_{t,k}, ξ̃_{t,k} ⟩ dµ̃_{t,k},
    (23.19)

where exp(∇̃ψ_{t,k}) and exp(∇̃ψ̃_{t,k}) are the optimal transports µ_{t,k} →
µ̃_{t,k} and µ̃_{t,k} → µ_{t,k}.

Since t ↦ µ_{t,k} and t ↦ µ̃_{t,k} are Lipschitz paths, W₂(µ_{t,k}, µ̃_{t,k})
is also a Lipschitz function of t, so (23.19) integrates up to
W₂(µ_{t,k}, µ̃_{t,k})² / 2 = W₂(µ_{0,k}, µ̃_{0,k})² / 2
− ∫₀ᵗ [ ∫ ⟨ ∇̃ψ_{s,k}, ξ_{s,k} ⟩ dµ_{s,k} + ∫ ⟨ ∇̃ψ̃_{s,k}, ξ̃_{s,k} ⟩ dµ̃_{s,k} ] ds.    (23.20)

W₂(µt, µ̃t)² / 2 = W₂(µ0, µ̃0)² / 2
− ∫₀ᵗ [ ∫ ⟨ ∇̃ψs, ξs ⟩ dµs + ∫ ⟨ ∇̃ψ̃_{s,k}, ξ̃_{s,k} ⟩ dµ̃_{s,k} ] ds,    (23.21)
Then

lim inf_{t↓0} [ Uν(µt) − Uν(µ) ] / t ≥ ∫ ⟨ ∇̃ψ, ∇p(ρ) ⟩ dν;    (23.25)

and

Uν(σ) ≥ Uν(µ) + ∫ ⟨ ∇̃ψ, ∇p(ρ) ⟩ dν
+ K_{N,U} ∫₀¹ [ ∫ |∇ψt(x)|² ρt(x)^{1−1/N} ν(dx) ] (1 − t) dt.    (23.26)
so

J_{0→t}(x) = e^{−[ V(exp(t∇ψ(x))) − V(x) ]} det( d_x exp(t∇ψ) )
= ( 1 − t ∇V(x)·∇ψ(x) + o(t) ) ( 1 + t ∆ψ(x) + o(t) )
= 1 + t (Lψ)(x) + o(t).    (23.32)
d²u(t, x)/dt² ≥ K_{N,U} ρt( exp(t∇ψ(x)) )^{−1/N} | ∇ψt( exp(t∇ψ(x)) ) |²,

and by assumption K_{N,U} is finite. Note that |∇ψt(exp(t∇ψ(x)))| =
d(x, T(x)), which is bounded (µ-almost surely) by the maximum distance
between points in the support of µ and points in the support
of ν. So there is a positive constant C such that

d²u(t, x)/dt² ≥ − C ρt( exp(t∇ψ(x)) )^{−1/N}.    (23.34)

Let

R(t, x) = C ∫₀ᵗ ∫₀ˢ ρτ( exp(τ∇ψ(x)) )^{−1/N} dτ ds;    (23.35)

ũ(t, x) = u(t, x) + R(t, x);    w(t, x) = [ ũ(t, x) − ũ(0, x) ] / t.
Then t ↦ ũ(t, x) is a convex function, so the previous reasoning applies
to show that

lim_{k→∞} ∫ w(tk, x) µ(dx) = ∫ lim_{k→∞} w(tk, x) µ(dx).
Also recall from Remark 10.32 that µ-almost surely, x and T(x) =
exp_x(∇̃ψ(x)) are joined by a unique geodesic. So (23.39) implies
∇̃ψ(x) = ∇ψ(x), µ(dx)-almost surely.
[ Uν(µt) − Uν(µ0) ] / t ≤ Uν(µ1) − Uν(µ0)
− K_{N,U} ∫₀¹ [ ∫ ρs(x)^{1−1/N} |∇ψs(x)|² ν(dx) ] ( G(s, t)/t ) ds.    (23.48)
Then we can use Steps 1 and 2 to pass to the lim inf in the left-hand
side.

As for the right-hand side of (23.48), we note that if B is a large
ball containing the supports of all ρs (0 ≤ s ≤ 1), and D = diam(B),
then

∫ ρs(x)^{1−1/N} |∇ψs(x)|² ν(dx) ≤ D² ∫ ρs^{1−1/N} dν ≤ D² ν[B]^{1/N} < +∞.
(Here I have used Jensen's inequality again as in Step 1.) So the quantity
inside brackets in the right-hand side of (23.48) is uniformly (in
t) bounded. Since G(s, t)/t converges to 1 − s in L¹(ds) as t → 0, we
can pass to the limit in the right-hand side of (23.48) too. The final
result is

∫ ⟨ ∇̃ψ, ∇p(ρ) ⟩ dν ≤ Uν(µ1) − Uν(µ0)
− K_{N,U} ∫₀¹ [ ∫ ρs(x)^{1−1/N} |∇ψs(x)|² ν(dx) ] (1 − s) ds,
Zk = ∫ χk dµ0;    ρ_{0,k} = χk ρ0 / Zk;    µ_{0,k} = ρ_{0,k} ν;

Πk(dγ) = χk(γ0) Π(dγ) / Zk.
|∇̃ψ| χk p′(χk ρ0) |∇ρ0| ≤ C |∇̃ψ| ρ0^{−1/N} |∇ρ0| ≤ C |∇̃ψ| p′(ρ0) |∇ρ0|.
So we can pass to the limit in the third term of (23.49), and the proof
is complete.
Case 2: Spt(µ1 ) is not compact. In this case we shall definitely
not use the standard approximation scheme by restriction, but instead
a more classical procedure of smooth truncation.
Again let χ : R₊ → R₊ be a smooth nonincreasing function with
0 ≤ χ ≤ 1, χ(r) = 1 for r ≤ 1, χ(r) = 0 for r ≥ 2; now we require in
addition that (χ′)²/χ is bounded. (It is easy to construct such a cutoff
function rather explicitly.) Then we define χk(x) = χ(d(z, x)/k),
where z is an arbitrary point in M. For k large enough, Z_{1,k} := ∫ χk dµ1 ≥
1/2. Then we choose ℓ = ℓ(k) large enough that Z_{0,k} := ∫ χℓ dµ0 is
larger than Z_{1,k}; this is possible since Z_{1,k} < 1 (otherwise µ1 would be
compactly supported). Then we let

µ_{0,k} = χ_{ℓ(k)} µ0 / Z_{0,k};    µ_{1,k} = χk µ1 / Z_{1,k}.
For each k, these are two compactly supported, absolutely continuous
probability measures; let (µt,k )0≤t≤1 be the displacement interpolation
joining them, and let ρ_{t,k} be the density of µ_{t,k}. Further, let ψk be a
d²/2-convex function so that exp(∇ψk) is the optimal (Monge) transport
from µ_{0,k} to µ_{1,k}, and let ψ_{t,k} be deduced from ψk by the Hamilton–Jacobi
forward semigroup.
Note carefully: It is obvious from the construction that Z0,k ρ0,k ↑ ρ0 ,
Z1,k ρ1,k ↑ ρ1 , but there is a priori no monotone convergence relating ρt,k
to ρt ! Instead, we have the following information. Since µ0,k → µ0 and
µ1,k → µ1 , Corollary 7.22 shows that the geodesic curves (µt,k )0≤t≤1
converge, up to extraction of a subsequence, to some geodesic curve
(µt,∞ )0≤t≤1 joining µ0 to µ1 . (The convergence holds true for all t.)
Since (µt ) is the unique such curve, actually µt,∞ = µt , which shows
that µt,k converges weakly to µt for all t ∈ [0, 1].
For each k, we can apply the results of Step 4 with U replaced by
Uk = U(Z_{1,k} ·) and µt replaced by µ_{t,k}:

U_{k,ν}(µ_{1,k}) ≥ U_{k,ν}(µ_{0,k}) + ∫ ⟨ ∇ψk, ∇pk(ρ_{0,k}) ⟩ dν
+ K_{N,Uk} ∫₀¹ ∫_M ρ_{t,k}(x)^{1−1/N} |∇ψ_{t,k}(x)|² ν(dx) (1 − t) dt,    (23.56)
a few results which will be proven later on in Part III of this course (in
a more general context).
First term of (23.56): Since U_{k,ν}(µ_{1,k}) = ∫ U(Z_{1,k} ρ_{1,k}) dν and
Z_{1,k} ρ_{1,k} = χk ρ1 converges monotonically to ρ1, the same arguments
as in the proof of Theorem 17.15 apply to show that
= ∫ ρ_{0,k} |∇ψk|² dν − ∫ ρ_{0,k} |∇̃ψ|² dν − 2 ∫ ρ_{0,k} ⟨ ∇ψk − ∇̃ψ, ∇̃ψ ⟩ dν.
    (23.62)
Observe that:
(a) ∫ ρ_{0,k} |∇ψk|² dν = W₂(µ_{0,k}, µ_{1,k})². To prove that this converges
to W₂(µ0, µ1)², by Corollary 6.11 it suffices to check that

W₂(µ_{0,k}, µ0) −→ 0;    W₂(µ_{1,k}, µ1) −→ 0;
Plugging back the results of (a), (b) and (c) into (23.62), we obtain

∫ ρ_{0,k} |∇ψk − ∇̃ψ|² dν −→ 0 as k → ∞.    (23.63)
I claim that

∫₀¹ [ ∫ ρt(x)^{1−1/N} |∇ψt(x)|² ν(dx) ] (1 − t) dt
≥ lim sup_{k→∞} ∫₀¹ [ ∫ ρ_{t,k}(x)^{1−1/N} |∇ψ_{t,k}(x)|² ν(dx) ] (1 − t) dt.    (23.65)
and this quantity is positive if δ is chosen small enough and then C large
enough. Then the event E has zero Πk ⊗ Πk measure since (e0, e1)# Πk
is cyclically monotone (Theorems 5.10 and 7.21). So the left-hand side
in (23.70) vanishes, and (23.67) is true. A similar result holds for µ_{t,k}
and ∇ψ_{t,k} replaced by µt and ∇ψt, respectively.

Let Z̃k = ∫ χℓ dµ_{t,k} (which is positive if ℓ is large enough), and let
Π̃k be the probability measure on geodesics defined by

Π̃k(dγ) = χℓ(γt) Πk(dγ) / Z̃k.
On the other hand, ∫ ρ_{t,k}^{1−1/N} dν is bounded independently of k (by
Theorem 17.8; the moment condition used there is weaker than the one
which is presently enforced). This and (23.71) imply

∫ χℓ(x) ρ_{t,k}(x)^{1−1/N} |wk(x) − w(x)| ν(dx) −→ 0 as k → ∞.    (23.72)
(see Theorem 29.20 later in Chapter 29 and change signs; note that
χℓ w ν is a compactly supported measure).
This completes the proof of (23.66). Then we can at last pass to the
lim sup in the fourth term of (23.56).
Let us recapitulate: In this step we have shown that
• if N = ∞, then

Uν(µ1) ≥ Uν(µ0) + ∫ ⟨ ∇̃ψ, ∇p(ρ0) ⟩ dν + K_{∞,U} W₂(µ0, µ1)² / 2;
(a) pℓ(r) = a r^{1−1/N} for r large enough, where pℓ(r) = r Uℓ′(r) − Uℓ(r)
(this holds true for N < ∞ as well as for N = ∞);
(b) pℓ(r) = 0 for r small enough;
(c) If [0, r0(ℓ)] is the interval on which pℓ vanishes, then there are
r1 > r0, a function h nondecreasing on [r0, r1], and constants K, C
such that K h ≤ pℓ ≤ C h; in particular, if r ≤ r′ ≤ r1, then pℓ(r) ≤
(C/K) pℓ(r′);
(d) pℓ(r) ≥ K0 r^{−1/N} for r ≥ r1.
This implies that each Uℓ satisfies all the assumptions for Step 5 to
go through. Moreover, Proposition 17.7 guarantees that Uℓ ≤ C U for
some constant C which does not depend on ℓ; in particular, pℓ ≤ C p.
Admitting for a while that this implies

1_{a≤ρ0≤b} ∇pℓ(ρ0) = pℓ′(ρ0) 1_{a≤ρ0≤b} ∇ρ0 −→ 1_{a≤ρ0≤b} ∇p(ρ0) as ℓ → ∞.
This proves that ∇pℓ(ρ0) converges almost surely to ∇p(ρ0) on each
set {r0 < a ≤ ρ0 ≤ b}, and therefore on the whole of {ρ0 > r0}. On the
other hand, if ρ0 ≤ r0 then p(ρ0) = 0, so ∇p(ρ0) vanishes almost surely
on {ρ0 ≤ r0} (this is a well-known theorem from distribution theory;
see the bibliographical notes in case of need), and also ∇pℓ(ρ0) = 0 on
that set. This proves the almost everywhere convergence of ∇pℓ(ρ0) to
∇p(ρ0). At the same time, this reasoning proves (23.74).
So to pass to the limit in (23.75) it suffices to prove that the integrand
is dominated by an integrable function. But

| ⟨ ∇̃ψ, ∇pℓ(ρ0) ⟩ | ≤ |∇̃ψ| |∇pℓ(ρ0)| ≤ C |∇̃ψ| |∇p(ρ0)|,
By Theorem 17.15,

Uν(µt) + K_{N,U} ∫₀¹ [ ∫ ρs(x)^{1−1/N} |∇̃ψs(x)|² ν(dx) ] G(s, t) ds
≤ (1 − t) Uν(µ0) + t Uν(µ1),

and the result follows by letting t → 0. (This works for N < ∞ as well
as for N = ∞; recall Proposition 17.24(i).)
Case (II): N = ∞ and K < 0. Then

Uν(µ1) ≥ Uν(µ0) + ∫ ⟨ ∇̃ψ, ∇p(ρ0) ⟩ dν + K W₂(µ0, µ1)² / 2.    (23.78)

After dividing by t and passing to the lim inf, we recover (23.77) again.
The first part of the proof of Proposition 17.24(ii) shows that the
expression inside brackets is uniformly bounded as soon as, say, t ≤
1/2. So

Uν(µt) ≥ Uν(µ0) + t ∫ ⟨ ∇̃ψ, ∇p(ρ0) ⟩ dν − O(t²) as t → 0,
Proof of Theorem 23.19. First note that the assumptions of the theorem
imply K_{N,U} > −∞.
Because U is C³ on (0, +∞), the function U′(ρ) is C², so
Stability
A good point of our weak formulation of gradient flows is that it comes
with stability estimates almost for free. This is illustrated by the next
theorem, in which regularity assumptions are far from optimal.
Theorem 23.26 (Stability of gradient flows in the Wasserstein
space). Let µt, µ̃t be two solutions of (23.79), satisfying the assumptions
of Theorem 23.19 with either K ≥ 0 or N = ∞. Let λ = K_{∞,U} if
N = ∞; and λ = 0 if N < ∞ and K ≥ 0. Then, for all t ≥ 0,

W₂(µt, µ̃t) ≤ e^{−λt} W₂(µ0, µ̃0).
Remark 23.27. I don’t know how seriously the restrictions on K
and N should be taken.
Proof of Theorem 23.26. By Theorem 23.9, for almost any t,

(d/dt) [ W₂(µt, µ̃t)² / 2 ] = ∫ ⟨ ∇̃ψt, ∇U′(ρt) ⟩ dµt + ∫ ⟨ ∇̃ψ̃t, ∇U′(ρ̃t) ⟩ dµ̃t,
    (23.82)

where exp(∇̃ψt) (resp. exp(∇̃ψ̃t)) is the optimal transport µt → µ̃t
(resp. µ̃t → µt).
∫ ⟨ ∇̃ψt, ∇U′(ρt) ⟩ dµt ≤ Uν(µ̃t) − Uν(µt) − λ W₂(µt, µ̃t)² / 2.    (23.83)

Similarly,

∫ ⟨ ∇̃ψ̃t, ∇U′(ρ̃t) ⟩ dµ̃t ≤ Uν(µt) − Uν(µ̃t) − λ W₂(µ̃t, µt)² / 2.    (23.84)

The combination of (23.82), (23.83) and (23.84) implies

(d/dt) [ W₂(µt, µ̃t)² / 2 ] ≤ − 2λ [ W₂(µt, µ̃t)² / 2 ].

Then the result follows from Gronwall's lemma.
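The finite-dimensional analogue of this contraction estimate is easy to observe numerically: for a λ-convex potential on R, two trajectories of the gradient flow approach each other at rate e^{−λt}. Here is a minimal Python sketch, in which the potential is an arbitrary choice with Φ″ ≥ 1 and a small tolerance absorbs the Euler discretization error:

```python
import numpy as np

lam = 1.0
grad_Phi = lambda x: 2.0 * x + np.cos(x)         # Phi(x) = x^2 + sin x, so Phi'' = 2 - sin x >= 1

dt, T = 1e-3, 2.0
x, y = 3.0, -1.5                                 # two initial conditions
d0 = abs(x - y)
for _ in range(int(T / dt)):                     # explicit Euler for X' = -grad Phi(X)
    x -= dt * grad_Phi(x)
    y -= dt * grad_Phi(y)

assert abs(x - y) <= np.exp(-lam * T) * d0 * 1.05   # contraction at rate lambda
```

The same computation with dt halved brings the observed contraction factor even closer to e^{−λT}.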
This last section evokes some key issues which I shall not develop,
although they are closely related to the material in the rest of this
chapter.
There is a general theory of gradient flows in metric spaces, based for
instance on Definition 23.7, or other variants appearing in Proposition
23.1. Motivations for these developments come from both pure and ap-
plied mathematics. This theory was pushed to a high degree of sophis-
tication by many researchers, in particular De Giorgi and his school.
A key role is played by discrete-time approximation schemes, the
simplest of which can be stated as follows:
1. Choose your initial datum X0 ;
2. Choose a time step τ , which in the end will decrease to 0;
3. Let X₁^{(τ)} be a minimizer of X ↦ Φ(X) + d(X₀, X)²/(2τ); then define
inductively X_{k+1}^{(τ)} as a minimizer of X ↦ Φ(X) + d(X_k^{(τ)}, X)²/(2τ).
4. Pass to the limit in X_k^{(τ)} as τ → 0, kτ → t, hopefully recovering a
function X(t) which is the value of the gradient flow at time t.
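In the elementary setting X = R with the Euclidean distance, the scheme above takes only a few lines. Here is a minimal Python sketch (the quadratic functional and the bisection-based minimizer are arbitrary choices) which iterates the proximal problem of step 3 and compares the result with the exact gradient flow:

```python
import numpy as np

Phi = lambda y: (y - 1.0)**2                     # a smooth convex functional (assumption)
dPhi = lambda y: 2.0 * (y - 1.0)                 # its derivative

def prox(x, tau):
    # step 3: minimize y -> Phi(y) + (y - x)^2/(2 tau), by bisection on the
    # strictly increasing derivative of the objective
    dF = lambda y: dPhi(y) + (y - x) / tau
    lo, hi = x - 10.0, x + 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if dF(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def discrete_flow(x0, t, tau):
    x = x0
    for _ in range(int(round(t / tau))):         # iterate X_{k+1} = prox(X_k)
        x = prox(x, tau)
    return x

# step 4: as tau -> 0 the iterates approach the exact flow X(t) = 1 + (X0 - 1) e^{-2t}
exact = 1.0 + 4.0 * np.exp(-2.0)
approx = discrete_flow(5.0, t=1.0, tau=1e-3)
assert abs(approx - exact) < 1e-2
```

Halving τ roughly halves the error, which is the expected first-order behavior of the minimizing-movement scheme for a smooth convex Φ.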
Uν(µ_{t+dt}) − Uν(µt) + W₂(µt, µ_{t+dt})² / (2 dt).

By using the interpretation of the Wasserstein distance between infinitesimally
close probability measures, this can also be rewritten as

W₂(µt, µ_{t+dt})² / dt² = inf { ∫ |v|² dµt ;  ∂µ/∂t + ∇ · (µv) = 0 }.
Kdt − dS,
The following important lemma was used in the proof of Theorems 23.9
and 23.26.
sup_{0≤s≤T} | F(s, t) − F(s, t′) | ≤ ∫_{t′}ᵗ v(τ) dτ.
Bibliographical notes
and in [814, Theorem 5.30] for general functions U ; all these references
only consider M = Rn . The procedure of extension of ∇ψ (Step 2)
appears e.g. in [248, Proof of Theorem 2] (in the particular case of
convex functions). The integration by parts of Step 3 appears in many
papers; under adequate assumptions, it can be justified in the whole of
Rn (without any assumption of compact support): see [248, Lemma 7],
[214, Lemma 5.12], [30, Lemma 10.4.5]. The proof in [214, 248] relies on
the possibility to find an exhaustive sequence of cutoff functions with
Hessian uniformly bounded, while the proof in [30] uses the fact that
in Rn , the distance to a convex set is a convex function. None of these
arguments seems to apply in more general noncompact Riemannian
manifolds (the second proof would probably work in nonnegative cur-
vature), so I have no idea whether the integration by parts in the proof
of Theorem 23.14 could be performed without compactness assump-
tion; this is the reason why I went through the painful2 approximation
procedure used in the end of the proof of Theorem 23.14.
It is interesting to compare the two strategies used in the extension
from compact to noncompact situations, in Theorem 17.15 on the one
hand, and in Theorem 23.14 on the other. In the former case, I could
use the standard approximation scheme of Proposition 13.2, with an ex-
cellent control of the displacement interpolation and the optimal trans-
port. But for Theorem 23.14, this seems to be impossible because of
the need to control the smoothness of the approximation of ρ0 ; as a
consequence, passing to the limit is more delicate. Further, note that
Theorem 17.15 was used in the proof of Theorem 23.14, since it is the
convexity properties of Uν along displacement interpolation which al-
lows us to go back and forth between the integral and the differential
(in the t variable) formulations.
The argument used to prove that the first term of (23.61) converges
to 0 is reminiscent of the well-known argument from functional analysis, according to which weak convergence in L² combined with convergence of the L² norm implies strong convergence in L².
At some point I have used the following theorem: If u ∈ W^{1,1}_loc (M ),
then for any constant c, ∇u = 0 almost everywhere on {u = c}. This
classical result can be found e.g. in [554, Theorem 6.19].
Another strategy to attack Theorem 23.14 would have been to
start from the “curve-above-tangent” formulation of the convexity
of J_t^{1/N} , where J_t is the Jacobian determinant. (Instead I used the
² Still an intense experience!
684 23 Gradient flows I
Here I have assumed that Φ is bounded below (which is the case when
Φ is the free energy functional). When inf Φ = −∞, there are still esti-
mates of the same type, only quite a bit more complicated [30, Section 3.2].
Ambrosio and Savaré recently found a simplified proof of error esti-
mates and convergence for time-discretized gradient flows [34].
Otto applied the same method to various classes of nonlinear dif-
fusion equations, including porous medium and fast diffusion equa-
tions [669], and parabolic p-Laplace type equations [666], but also more
exotic models [667, 668] (see also [737]). For background about the
theory of porous medium and fast diffusion equations, the reader may
consult the review texts by Vázquez [804, 806].
In his work about porous medium equations, Otto also made two
important conceptual contributions: First, he introduced the abstract
formalism allowing him to interpret these equations as gradient flows,
directly at the continuous level (without going through the time-
discretization). Secondly, he showed that certain features of the porous
medium equations (qualitative behavior, rates of convergence to equi-
librium) were best seen via the new gradient flow interpretation.
This looked a bit like a formal game, but it was later found out
that related equations were common in the physical literature about
flux-limited diffusion processes [627], and that in fact Brenier’s very
equation had already been considered by Rosenau [708]. A rigorous
treatment of these equations leads to challenging analytical difficulties,
which triggered several recent technical works, see e.g. [39, 40, 618] and
the references therein. By the way, the cost function 1 − √(1 − |x − y|²)
was later found to have applications in the design of lenses [712].
Lemma 23.28 in the Appendix is borrowed from [30, Lemma 4.3.4].
As Ambrosio pointed out to me, the argument is quite reminiscent of
Kruzkhov’s doubling method for the proof of uniqueness in the the-
ory of scalar conservation laws, see for instance the nice presentation
in [327, Sections 10.2 and 11.4]. It is important to note that the al-
most everywhere differentiability of F in both variables separately is
not enough to apply this lemma.
The stability theorem (Theorem 23.26) is a particular case of more
abstract general results, see for instance [30, Theorem 4.0.4(iv)].
In their study of gradient flows, Ambrosio, Gigli and Savaré point
out that it is useful to construct curves (µt ) satisfying the convexity-
type inequality
transport on the one hand, large deviations on the other) have a lot in
common, although the formalisms look very different. By the way, some
links between optimal transport and large deviations have recently been
explored in a book by Feng and Kurtz [355] as well as in [429, 430, 448].
As for the idea to rewrite (possibly linear) diffusion equations as
nonlinear transport equations, it may at first sight seem odd, but looks
like the right point of view in many problems, and has been used for
some time in numerical analysis.
So far I have mainly discussed gradient flows associated with cost
functions that are quadratic (p = 2), or at least strictly convex. But
there are some quite interesting models of gradient flows for, say, the
cost function which is equal to the distance (p = 1). Such equations
have been used for instance in the modeling of sandpiles [46, 48, 329,
336, 691] or compression molding [47]. These issues are briefly reviewed
by Evans [328].
Also gradient flows of certain “energy” functionals in the Wasserstein
space may be used to define certain diffusive semigroups; think for in-
stance of the problem of constructing the heat semigroup on a nonsmooth
metric space. Recently Ohta [655] showed that the heat equation could
be introduced in this way on Alexandrov spaces of nonnegative sectional
curvature. An independent contribution by Savaré [735] addresses the
same problem on certain classes of metric spaces including Alexandrov
spaces with curvature bounded below. On such spaces, the heat equation
can be constructed by other means [537], but it might be that the optimal
transport approach will apply to even more singular situations, and in
any case this provides a genuinely different approach to the problem.
One can also hope to treat subriemannian situations in the same
way; at least from a formal point of view, this is addressed in [514].
It can also be expected that the gradient flow interpretation will
provide powerful general tools to study the stability of diffusion equa-
tions. In a striking recent application of this principle, Ambrosio, Savaré
and Zambotti [35] proved the stability of the linear diffusion process
associated with the Fokker–Planck equation, in finite or infinite dimen-
sion, with respect to the weak convergence of the invariant probability
measure, as soon as the latter is log concave. Savaré [735, 736] also
used this interpretation to prove the stability of the heat semigroup
(on manifolds, or on certain metric-measure spaces) under measured
Gromov–Hausdorff convergence.
as one could wish. Cullen, Gangbo and Pisante [267] have studied the
approximation of some of these equations by particle systems. There is
also a work by Gangbo, Nguyen and Tudorascu on the one-dimensional
Euler–Poisson model [401]. Their study provides evidence of striking
“pathological” behavior: If one defines variational solutions of Euler–
Poisson as the minimizing paths for the natural action, then a solution
might very well start absolutely continuous at initial time, collapse on a
Dirac mass in finite time, stay in this state for a positive time, and then
spread again. Also Wolansky [837] obtained coagulation–fragmentation
phenomena through a Hamiltonian formalism (based on an internal
energy rather than an interaction energy).
The precise sense of the “Hamiltonian structure” should be taken
with some care. It was suggested to me some time ago by Ghys that
this really is a Poisson structure, in the spirit of Kirillov. This guess
was justified and completely clarified (at least formally) by Lott [575],
who also made the link with previous work by Weinstein and collabo-
rators (see [575, Section 6] for explanations). Further contributions are
due to Khesin and Lee [514], who also studied the sense in which the
semigeostrophic system defines a Hamiltonian flow [513].
A particularly interesting “dissipative Hamiltonian equation” that
should have an interpretation in terms of optimal transport is the ki-
netic Fokker–Planck equation, with or without self-interaction. Huang
and Jordan [485] studied this model in the setting of gradient flows
(which is not natural here since we are rather dealing with a Hamil-
tonian flow with dissipation). A quite different contribution by Carlen
and Gangbo [204] approaches the kinetic Fokker–Planck equation by a
splitting scheme alternating free transport and time-discretized gradi-
ent flow in the velocity variable.
More recently, Gangbo and Westdickenberg [404] suggested an ap-
proximation scheme for the isentropic Euler system based on a kinetic
approximation (so one works with probability measures on Rn × Rn ),
with a splitting scheme alternating free transport, time-discretized gra-
dient flow in the position variable, and an optimal transport problem
to reconstruct the velocity variable. Their scheme, which has some hid-
den relation to the Huang–Jordan contribution, can be reinterpreted
in terms of minimization of an action that would involve the squared
acceleration rather than the squared velocity; and there are heuristic
arguments to believe that this system should converge to a physical
solution satisfying Dafermos’s entropy criterion (the physical energy,
The density and momentum should satisfy the equation of mass con-
servation, namely ∂t ρ + ∇ · m = 0. At least formally, critical points
of (23.93), for fixed ρ0 , ρ1 , satisfy the Euler–Lagrange equation
∂t ρ + ∇ · (ρ ∇ϕ) = 0,
∂t ϕ + |∇ϕ|²/2 = 2 (∆√ρ)/√ρ ;      (23.94)

the pressure term (∆√ρ)/√ρ is sometimes called the "Bohm potential".
Then the change of unknown ψ = √ρ e^{iϕ} transforms (23.94) into the
usual linear Schrödinger equation.
Variational problems of the form of (23.93) can be used to derive
many systems of Hamiltonian type, and some of these actions are in-
teresting in their own right. The choice F = 0 gives just the squared
2-Wasserstein distance; this is the Benamou–Brenier formula [814,
Theorem 8.1]. The functional F (ρ) = − ∫ |∇∆⁻¹(ρ − 1)|² appears
in the so-called reconstruction problem in cosmology [568] and leads
to the Euler–Poisson equations (see also [401]). As a final example,
F (ρ) = −(π²/6) ∫ ρ³ appears (in dimension n = 1) in the qualita-
tive description of certain random matrix models, and leads to the
isentropic Euler equations with negative cubic pressure law, as first re-
alized by Matytsin; see [449] for rigorous justification and references.
(Some simple remarks about uniqueness and duality for such negative
pressure models can be found in [811].) Unexpectedly, the same varia-
tional problem appears in a seemingly unrelated problem of stochastic
control [563].
24
Gradient flows II: Qualitative properties

Calculation rules
Having put equation (24.1) in gradient flow form, one may use Otto’s
calculus to shortcut certain formal computations, and quickly get rel-
evant results, without risks of computational errors. When it comes
to rigorous justification, things however are not so nice, and regular-
ity issues should — alas! — be addressed.¹ For the most important of
these gradient flows, such as the heat, Fokker–Planck or porous medium
equations, these regularity issues are nowadays under good control.
To avoid inflating the size of this chapter much further, I shall not
go into these regularity issues, and be content with theorems that will
be conditional to the regularity of the solution.
¹ Sometimes the gradient flow structure allows one to dispense with regularity, but I shall not explore this possibility.
(f) ρ, p(ρ), Lp(ρ), p2 (ρ), ∇p2 (ρ), U′(ρ), ∇U′(ρ), LU′(ρ), ∇LU′(ρ),
L|∇U′(ρ)|², L(∇U′(ρ) · ∇LU′(ρ)) and e^{−V} satisfy adequate growth/decay
conditions at infinity.

Then the following formulas hold true:

(i) ∀t > 0,   (d/dt) ∫ A(ρt ) dν = − ∫ p′(ρt ) A″(ρt ) |∇ρt |² dν ;

(ii) ∀t > 0,   (d/dt) Uν (µt ) = −IU,ν (µt ) ;

(iii) ∀t > 0,
(d/dt) IU,ν (µt ) = −2 ∫_M [ ‖∇²U′(ρt )‖²_HS + (Ric + ∇²V )(∇U′(ρt )) ] p(ρt ) dν
  − 2 ∫_M (LU′(ρt ))² p2 (ρt ) dν ;

(iv) ∀σ ∈ P2^ac (M ),   (d/dt) W2 (σ, µt ) ≤ √(IU,ν (µt ))  for almost all t > 0.
which is (ii).
Next, we can differentiate the previous expression once again along
the gradient flow µ̇ = −gradUν (µ):
(d/dt) ‖grad_{µt} Uν ‖² = −2 ⟨ Hess_{µt} Uν · grad_{µt} Uν , grad_{µt} Uν ⟩,
and then (iii) follows from Formula 15.7.
As for (iv), this is just a particular case of the general formula
(d/dt)d(X0 , γ(t)) ≤ |γ̇(t)|γ(t) .
= ∫ U′(ρ) Lp(ρ) dν = ∫ LU′(ρ) p(ρ) dν,      (24.2)

+ ∫ LU′(ρt ) p′(ρt ) ∇ν · (ρt ∇U′(ρt )) dν.      (24.3)
698 24 Gradient flows II: Qualitative properties
The last two terms in this formula are actually equal: Indeed, if ρ is
smooth then
∫ L( ρ U″(ρ) LU′(ρ) ) p(ρ) dν = ∫ ρ U″(ρ) LU′(ρ) Lp(ρ) dν
= ∫ p′(ρ) LU′(ρ) ∇ν · (ρ ∇U′(ρ)) dν.
where exp(∇ψ) is the optimal transport µt → σ. It follows that

(d⁺/dt) W2 (µt , σ)²/2 ≤ √( ∫ |∇p(ρt )|²/ρt dν ) √( ∫ ρt |∇ψ|² dν ) = √(IU,ν (µt )) W2 (µt , σ);
Large-time behavior
Otto’s calculus, described in Chapter 15, was first developed to estimate
rates of equilibration for certain nonlinear diffusion equations. The next
theorem illustrates this.
Theorem 24.7 (Equilibration in positive curvature). Let M be
a Riemannian manifold equipped with a reference measure ν = e−V ,
V ∈ C 4 (M ), satisfying a curvature-dimension bound CD(K, N ) for
some K > 0, N ∈ (1, ∞], and let U ∈ DCN . Then:
(i) (exponential convergence to equilibrium) Any smooth so-
lution (µt )t≥0 of (24.1) satisfies the following estimates:
(a) [Uν (µt ) − Uν (ν)] ≤ e^{−2Kλt} [Uν (µ0 ) − Uν (ν)],

(b) IU,ν (µt ) ≤ e^{−2Kλt} IU,ν (µ0 ),      (24.5)

(c) W2 (µt , ν) ≤ e^{−Kλt} W2 (µ0 , ν),

where

λ := lim_{r→0} [ p(r)/r^{1−1/N} ] ( sup_{x∈M} ρ0 (x) )^{−1/N} .      (24.6)

In particular, λ is independent of ρ0 if N = ∞.
(ii) (exponential contraction) Any two smooth solutions (µt )t≥0
and (µ̃t )t≥0 of (24.1) satisfy

W2 (µt , µ̃t ) ≤ e^{−Kλt} W2 (µ0 , µ̃0 ),      (24.7)

where

λ := lim_{r→0} [ p(r)/r^{1−1/N} ] ( max( sup_{x∈M} ρ0 (x), sup_{x∈M} ρ̃0 (x) ) )^{−1/N} .      (24.8)
Remark 24.10. The rate of decay O(e−λ t ) is optimal for (24.9) if di-
mension is not taken into account; but if N is finite, the optimal rate
of decay is O(e−λt ) with λ = KN/(N − 1). The method presented in
this chapter is not clever enough to catch this sharp rate.
> λ.
as T → ∞. It follows that µt converges to ν as O(e−K λ t ) for any λ
sup ρ(s) ≤ max (sup ρt , sup ρ̃t ) ≤ max (sup ρ0 , sup ρ̃0 ),      (24.12)
On the other hand, Theorem 23.9 shows that (d/dt)(W2 (µt , µ̃t )²/2)
is equal to the left-hand side of (24.15), for almost all t. We conclude
that

(d⁺/dt) W2 (µt , µ̃t )² ≤ −2Kλ W2 (µt , µ̃t )²,      (24.16)
and the desired result follows.
Remark 24.13. Here is an alternative scheme of proof for Theorem
24.7(ii). The problem is to estimate
∫ ⟨∇U′(ρt ), ∇ψ⟩ dµt + ∫ ⟨∇U′(ρ̃t ), ∇ψ̃⟩ dµ̃t .
Short-time behavior
A popular and useful topic in the study of diffusion processes consists
in establishing regularization estimates in short time. Typically, a
certain functional used to quantify the regularity of the solution (for
instance, the supremum of the unknown or some Lebesgue or Sobolev
norm) is shown to be bounded like O(t−κ ) for some characteristic expo-
nent κ, independent of the initial datum (or depending only on certain
weak estimates on the initial datum), when t > 0 is small enough. Here
I shall present some slightly unconventional estimates of this type.
Theorem 24.16 (Short-time regularization for gradient flows).
Let M be a Riemannian manifold satisfying a curvature-dimension
bound CD(K, ∞), K ∈ R; let ν = e−V vol ∈ P2 (M ), with V ∈ C 4 (M ),
and let U ∈ DC∞ with U (1) = 0. Further, let (µt )t≥0 be a smooth
solution of (24.1). Then:
(i) If K ≥ 0 then for any t ≥ 0,

t² IU,ν (µt ) + 2t Uν (µt ) + W2 (µt , ν)² ≤ W2 (µ0 , ν)².

In particular,

Uν (µt ) ≤ W2 (µ0 , ν)²/(2t),      (24.18)

IU,ν (µt ) ≤ W2 (µ0 , ν)²/t².      (24.19)
(ii) If K ≥ 0 and t ≥ s > 0, then

W2 (µs , µt ) ≤ min ( √(2 Uν (µs ) |t − s|) , √(IU,ν (µs )) |t − s| )      (24.20)
  ≤ W2 (µ0 , ν) min ( √(|t − s|/s) , |t − s|/s ).      (24.21)
(iii) If K < 0, the previous conclusions become

Uν (µt ) ≤ e^{2Ct} W2 (µ0 , ν)²/(2t) ;      IU,ν (µt ) ≤ e^{2Ct} W2 (µ0 , ν)²/t² ;

W2 (µs , µt ) ≤ e^{Ct} min ( √(2 Uν (µs ) |t − s|) , √(IU,ν (µs )) |t − s| )
  ≤ e^{2Ct} W2 (µ0 , ν) min ( √(|t − s|/s) , |t − s|/s ),
Remark 24.21. I would bet that the estimates in (24.22) are optimal
in general (although they would deserve more thinking) as far as the
dependence on µ0 and t is concerned. On the other hand, if µ0 is given,
these bounds are terrible estimates for the short-time behavior of the
Kullback and Fisher informations as functions of just t. Indeed, the
correct scale for the Kullback information Hν (µt ) is O(log(1/t)), and
for the Fisher information it is O(1/t), as can be checked easily in the
particular case when M = Rn and ν is the Gaussian measure.
(d⁺/dt) W2 (µt , ν)² ≤ −2 Uν (µt ).      (24.25)

Now introduce

ψ(t) := a(t) IU,ν (µt ) + b(t) Uν (µt ) + c(t) W2 (µt , ν)² ;

then

(d⁺ψ/dt) ≤ [a′(t) − b(t)] IU,ν (µt ) + [b′(t) − 2c(t)] Uν (µt ) + c′(t) W2 (µt , ν)².
If we choose
(d⁺/dt) W2 (µs , µt )² ≤ 2 [Uν (µs ) − Uν (µt )] ≤ 2 Uν (µs ).

So

W2 (µs , µt )² ≤ 2 Uν (µs ) |t − s|.      (24.27)
Then (ii) results from the combination of (24.26) and (24.27), together
with (i).
The proof of (iii) is pretty much the same, with the following mod-
ifications:
(d/dt) IU,ν (µt ) ≤ (−2K) IU,ν (µt ) ;

(d⁺/dt) W2 (µt , ν)² ≤ −2 Uν (µt ) + (−2K) W2 (µt , ν)² ;

ψ(t) := e^{2Kt} [ t² IU,ν (µt ) + 2t Uν (µt ) + W2 (µt , ν)² ].
Details are left to the reader. (The estimates in (iii) can be somewhat
refined.)
IU,ν (µt ) ≤ Uν (µ0 )/t.
Remark 24.23. There are many known regularization results in short
time, for certain of the gradient flows considered in this chapter. The
two most famous examples are:
• the Li–Yau estimates, which give lower bounds on ∆ log ρt for
a solution of the heat equation on a Riemannian manifold, under
certain curvature-dimension conditions. For instance, if M satisfies
CD(0, N ), then

  ∆ log ρt ≥ − N/(2t) ;
• the Aronson–Bénilan estimates, which give lower bounds on
∆ρt^{m−1} for solutions of the nonlinear diffusion equation ∂t ρ = ∆ρ^m
in R^n , where 1 − 2/n < m < 1:

  ∆( (m/(m−1)) ρ^{m−1} ) ≥ − n/(λt) ,      λ = 2 − n(1 − m).
There is an obvious similarity between these two estimates, and both
can be interpreted as a lower bound on the rate of divergence of the
vector field which drives particles in the gradient flow interpretation of
these partial differential equations. I think it would be very interesting
to have a unified proof of these inequalities, under certain geometric
conditions. For instance one could try to use the gradient flow interpre-
tation of the heat and nonlinear diffusion equations, and maybe some
localization by restriction.
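As a quick sanity check (my own, in the flat case M = R^n, which satisfies CD(0, n)): the Gaussian heat kernel saturates the Li–Yau bound, since log ρt = −|x|²/(4t) − (n/2) log(4πt) gives ∆ log ρt = −n/(2t) exactly.

```python
import math

# Finite-difference Laplacian of log ρ_t for the heat kernel on R^n;
# the result should equal the Li–Yau lower bound −n/(2t) exactly.
def log_rho(y, t):
    n = len(y)
    return -sum(c * c for c in y) / (4 * t) - (n / 2) * math.log(4 * math.pi * t)

def laplacian(f, y, t, h=1e-4):
    lap = 0.0
    for i in range(len(y)):
        yp = list(y); yp[i] += h
        ym = list(y); ym[i] -= h
        lap += (f(yp, t) - 2 * f(y, t) + f(ym, t)) / h**2
    return lap

n, t, x = 3, 0.7, [0.3, -1.2, 0.5]
lap = laplacian(log_rho, x, t)
print(lap, -n / (2 * t))   # both ≈ −2.142857
```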
Bibliographical notes
In [669], Otto advocated the use of his formalism both for the purpose
of finding new schemes of proof, and for giving a new understanding of
certain results.
What I call the Fokker–Planck equation is
∂µ/∂t = ∆µ + ∇ · (µ ∇V ).
This is in fact an equation on measures. It can be recast as an equation
on functions (densities):
∂ρ/∂t = ∆ρ − ∇V · ∇ρ.
From the point of view of stochastic processes, the relation between
these two formalisms is the following: µt can be thought of as law (Xt ),
where Xt is the stochastic process defined by dXt = √2 dBt − ∇V (Xt ) dt
(Bt = standard Brownian motion on the manifold), while ρt (x) is de-
fined by the equation ρt (x) = E_x ρ0 (Xt ) (the subscript x means that
the process Xt starts at X0 = x). In the particular case when V is a
quadratic potential in Rn , the evolution equation for ρt is often called
the Ornstein–Uhlenbeck equation.
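This correspondence is easy to illustrate numerically (a toy sketch of my own, not from the text): an Euler–Maruyama simulation of dXt = √2 dBt − ∇V(Xt) dt with V(x) = x²/2 on R, whose law should relax to the invariant measure ν ∝ e^{−V}, i.e. the standard Gaussian N(0, 1).

```python
import math, random

random.seed(0)
dt, T, n = 0.01, 5.0, 5000
xs = [4.0] * n                         # all particles start far from equilibrium
for _ in range(int(T / dt)):
    # Euler–Maruyama step for dX = −X dt + √2 dB (the OU case V(x) = x²/2)
    xs = [x - x * dt + math.sqrt(2 * dt) * random.gauss(0.0, 1.0) for x in xs]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean, var)    # ≈ 0 and ≈ 1: the empirical law is close to N(0, 1)
```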
The observation that the Fisher information Iν is the time-derivative
of the entropy functional −Hν along the heat semigroup seems to first
appear in a famous paper by Stam [758] at the end of the fifties, in the
case M = R (equipped with the Lebesgue measure). Stam gives credit
to de Bruijn for that remark. The generalization appearing in Theorem
24.2(ii) has been discovered and rediscovered by many authors.
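In the Gaussian case this remark can be verified in closed form (my own check, for M = R with ν = Lebesgue measure): along the heat flow µt = N(0, 2t) one has Hν(µt) = ∫ ρt log ρt = −(1/2) log(4πet) and Iν(µt) = 1/(2t), and the time-derivative of the entropy functional is indeed minus the Fisher information.

```python
import math

H = lambda s: -0.5 * math.log(4 * math.pi * math.e * s)   # Hν(N(0, 2s))
t, dt = 0.8, 1e-5
dH = (H(t + dt) - H(t - dt)) / (2 * dt)   # numerical derivative of the entropy
I = 1.0 / (2 * t)                          # Fisher information of N(0, 2t)
print(dH, -I)                              # both ≈ −0.625
```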
Theorem 24.2(iii) goes back to Bakry and Émery [56] for the case
U (r) = r log r. After many successive generalizations, the statement as
I wrote it was formally derived in [577, Appendix D]. To my knowledge,
the argument given in the present chapter is the first rigorous one to
be written down in detail (modulo the technical justifications of the
integrations by parts), although it is a natural expansion of previous
works.
Theorem 24.2(iv) was proven by Otto and myself [671] for σ = µ0 .
The case σ = ν is also useful and was considered in [219].
Regularity theory for porous medium equations has been the object
of many works, see in particular the synthesis works by Vázquez [804,
805, 806]. When one studies nonlinear diffusions by means of optimal
transport theory, the regularity theory is the first thing to worry about.
In a Riemannian context, Demange [291, 292, 290, 293] presents many
approximation arguments based on regularization, truncation, etc. in
great detail. Going into these issues would have led me to considerably
expand the size of this chapter; but ignoring them completely would
have led to incorrect proofs.
It has been known since the mid-seventies that logarithmic Sobolev
inequalities yield rates of convergence to equilibrium for heat-like equa-
tions, and that these estimates are independent of the dimension. For
certain problems of convergence to equilibrium involving entropy, logarithmic Sobolev inequalities are considerably more convenient than spectral
tools. This is especially true in infinite dimension, although logarithmic
Sobolev inequalities are also very useful in finite dimension. For more
information see the bibliographical notes of Chapter 21.
As recalled in Remark 24.11, convergence in the entropy sense im-
plies convergence in total variation. In [220] various functional methods
leading to convergence in total variation are examined and compared.
Around the mid-nineties, Toscani [784, 785] introduced the loga-
rithmic Sobolev inequality in kinetic theory, where it was immediately
recognized as a powerful tool (see e.g. [300]). The links between log-
arithmic Sobolev inequalities and Fokker–Planck equations were re-
investigated by the kinetic theory community, see in particular [43]
and the references therein. The emphasis was more on proving loga-
rithmic Sobolev inequalities thanks to the study of the convergence to
equilibrium for Fokker–Planck equations, than the reverse. So the key
was the study of convergence to equilibrium in the Fisher information
sense, as in Chapter 25; but the final goal really was convergence in
the entropy sense. To my knowledge, it is only in a recent study of
certain algorithms based on stochastic integration [549], that conver-
gence in the Fisher information sense in itself has been found useful.
(In this work some constructive criteria for exponential convergence in
Fisher information are given; for instance this is true for the heat equa-
tion ∂t ρ = ∆ρ, under a CD(K, ∞) bound (K < 0) and a logarithmic
Sobolev inequality.)
Around 2000, it was discovered independently by Otto [669], Carrillo
and Toscani [215] and Del Pino and Dolbeault [283] that the same
“information-theoretical” tools could be used for nonlinear equations
of the form
∂ρ
= ∆ρm (24.28)
∂t
slightly stronger than the one which I derived in Theorem 24.7 and
Remark 24.12, but the asymptotic rate is the same.
All the methods described before apply to the study of the time
asymptotics of the porous medium equation ∂t ρ = ∆ρm , but only under
the restriction m ≥ 1 − 1/N . In that regime one can use time-rescaling
and tools similar to the ones described in this chapter, to prove that
the solutions become close to Barenblatt’s self-similar solution.
When m < 1−1/N , displacement convexity and related tricks do not
apply any more. This is why it was rather a sensation when Carrillo and
Vázquez [217] applied the Aronson–Bénilan estimates to the problem
of asymptotic behavior for fast diffusion equations with exponents m
in (1 − 2/N, 1 − 1/N ), which is about the best that one can hope for, since
Barenblatt profiles do not exist for m ≤ 1 − 2/N .
Here we see the limits of Otto’s formalism: such results as the di-
mensional refinement of the rate of convergence for diffusive equations
(Remark 24.10), or the Carrillo–Vázquez estimates, rely on inequalities
of the form
∫ p(ρ) Γ₂ (∇U′(ρ)) dν + ∫ p2 (ρ) (LU′(ρ))² dν ≥ . . .

in which one takes advantage of the fact that the same function ρ
appears in the terms p(ρ) and p2 (ρ) on the one hand, and in the terms
∇U (ρ) and LU (ρ) on the other. The technical tool might be changes of
variables for the Γ2 (as in [541]), or elementary integration by parts (as
in [217]); but I don’t see any interpretation of these tricks in terms of
the Wasserstein space P2 (M ).
The story about the rates of equilibration for fast diffusion equa-
tions does not end here. At the same time as Carrillo and Vázquez
obtained their main results, Denzler and McCann [298, 299] computed
the spectral gap for the linearized fast diffusion equations in the same
interval of exponents (1 − 2/N, 1 − 1/N ). This study showed that the rate
of convergence obtained by Carrillo and Vázquez is off the value sug-
gested by the linearized analysis by a factor 2 (except in the radially
symmetric case where they obtain the optimal rate thanks to a compar-
ison method). The connection between the nonlinear and the linearized
dynamics is still unclear, although some partial results have been ob-
tained by McCann and Slepčev [619]. More recently, S.J. Kim and Mc-
Cann [517] have derived optimal rates of convergence for the “fastest”
nonlinear diffusion equations, in the range 1−2/N < m ≤ 1−2/(N +2),
by comparison methods involving Newtonian potentials. Another work
by Cáceres and Toscani [183] also recovers some of the results of Den-
zler and McCann by means of completely different methods with their
roots in kinetic theory. There is still ongoing research to push the rates
of convergence and the range of admissible nonlinearities, in particular
by Denzler, Koch, McCann and probably others.
In dimension 2, the limit case m = 0 corresponds to a logarithmic
diffusion; it is related to geometric problems, such as the evolution of
conformal surfaces or the Ricci flow [806, Chapter 8].
More general nonlinear diffusion equations of the form ∂t ρ =
∆p(ρ) have been studied by Biler, Dolbeault and Esteban [119], and
Carrillo, Di Francesco and Toscani [210, 211] in Rn . In the latter work
the rescaling procedure is recast in a more geometric and physical
interpretation, in terms of temperature and projections; a sequel by
Carrillo and Vázquez [218] shows that the intermediate asymptotics
can be complicated for well-chosen nonlinearities. Nonlinear diffusion
equations on manifolds were also studied by Demange [291] under a
CD(K, N ) curvature-dimension condition, K > 0.
Theorem 24.7(ii) is related to a long tradition of study of contraction
rates in Wasserstein distance for diffusive equations [231, 232, 458, 662].
Sturm and Renesse [764] noted that such contraction rates character-
ize nonnegative Ricci curvature; Sturm [761] went on to give various
characterizations of CD(K, N ) bounds in terms of contraction rates for
possibly nonlinear diffusion equations.
In the one-dimensional case (M = R) there are alternative methods
to get contraction rates in W2 distance, and one can also treat larger
classes of models (for instance viscous conservation laws), and even ob-
tain decay in Wp for any p; see for instance [137, 212]. Recently, Brenier
found a re-interpretation of these one-dimensional contraction proper-
ties in terms of monotone operators [167]. Also the asymptotic behavior
of certain conservation laws has been analyzed in this way [208, 209]
(with the help of the strong “W∞ distance”!).
Another model for which contraction in W2 distance has been estab-
lished is the Boltzmann equation, in the particular case of a spatially
homogeneous gas of Maxwellian molecules. This contraction property
was discovered by Tanaka [644, 776, 777]; see [138] for recent work on
the subject. Some striking uniqueness results have been obtained by
Fournier and Mouhot [377, 379] with a related method (see also [378]).
To conclude this discussion about contraction estimates, I shall
briefly discuss some links with Perelman’s analysis of the backward
where the infimum is taken over all C 1 paths γ : [t0 , t1 ] → M such that
γ(t0 ) = x and γ(t1 ) = y, and S(x, t) is the scalar curvature of M (evolv-
ing under backward Ricci flow) at point x and time t. As in Chapter 7,
this induces a Hamilton–Jacobi equation, and an action in the space of
measures. Then it is shown in [576] that H(µt ) − ∫ φt dµt is convex in
t along the associated displacement interpolation, where (φt ) is a so-
lution of the Hamilton–Jacobi equation. Other theorems in [576, 782]
deal with a variant of L0 in which some time-rescalings have been per-
formed. Not only do these results generalize the contraction property
of [620], but they also imply Perelman’s estimates of monotonicity of
the so-called W -entropy and reduced volume functionals (which were
an important tool in the proof of the Poincaré conjecture).
I shall now comment on short-time decay estimates. The short-time
behavior of the entropy and Fisher information along the heat flow
(Theorem 24.16) was studied by Otto and myself around 1999 as a
technical ingredient to get certain a priori estimates in a problem of hy-
drodynamical limits. This work was not published, and I was quite sur-
prised to discover that Bobkov, Gentil and Ledoux [127, Theorem 4.3]
had found similar inequalities and applied them to get a new proof of
the HWI inequality. Otto and I published our method [672] as a com-
ment to [127]; this is the same as the proof of Theorem 24.16. It can be
considered as an adaptation, in the context of the Wasserstein space, of
some classical estimates about gradient flows in Hilbert spaces, that can
be found in Brézis [171, Théorème 3.7]. The result of Bobkov, Gentil
and Ledoux is actually more general than ours, because these authors
seem to have sharp constants under CD(K, ∞) for all values of K ∈ R,
Uν (µ) − Uν (ν) ≤ IU,ν (µ)/(2Kλ).

∀µ ∈ P^ac (M ),      Hν (µ) ≤ Iν (µ)/(2K).
Sloppy proof of Theorem 25.1. By using Theorem 17.7(vii) and an ap-
proximation argument, we may assume that ρ is smooth, that U is
smooth on (0, +∞), that the solution (ρt )t≥0 of the gradient flow
∂ρt/∂t = L p(ρt ),
starting from ρ0 = ρ is smooth, that Uν (µ0 ) is finite, and that t →
Uν (µt ) is continuous at t = 0.
For notational simplicity, let H(t) := Uν (µt ) − Uν (ν) and I(t) := IU,ν (µt ); then

dH(t)/dt = −I(t),      I(t) ≤ I(0) e^{−2Kλt} .
By Theorem 24.7(i)(a), H(t) → 0 as t → ∞. So
H(0) = ∫₀^{+∞} I(t) dt ≤ ∫₀^{+∞} I(0) e^{−2Kλt} dt = I(0)/(2Kλ),
which is the desired result.
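A numerical illustration of the conclusion in a concrete special case (my own, with U(r) = r log r so λ = 1): ν = N(0, 1) on R satisfies CD(1, ∞), so the argument above yields the logarithmic Sobolev inequality Hν(µ) ≤ Iν(µ)/2. Testing it on µ = N(m, 1), for which dµ/dν = e^{mx − m²/2}, shows that translates in fact saturate the inequality.

```python
import math

# Midpoint-rule evaluation of Hν(µ) = ∫ ρ log ρ dν and Iν(µ) = ∫ |∇ log ρ|² dµ
# for ν = N(0,1) and µ = N(m,1); here log ρ = mx − m²/2, so ∇ log ρ = m.
def lsi_sides(m, xmin=-14.0, xmax=14.0, steps=280000):
    dx = (xmax - xmin) / steps
    H = I = 0.0
    for k in range(steps):
        x = xmin + (k + 0.5) * dx
        nu = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        logrho = m * x - m * m / 2
        rho = math.exp(logrho)
        H += rho * logrho * nu * dx
        I += (m ** 2) * rho * nu * dx
    return H, I

H, I = lsi_sides(1.5)
print(H, I / 2)   # both ≈ 1.125 = m²/2: equality in the LSI
```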
722 25 Gradient flows III: Functional inequalities
density ρ on M ,

∫_M A(ρ) dν ≤ (1/(2Kλ)) ∫_M ρ |∇U′(ρ)|² dν,      (25.1)

where

λ = lim_{r↓0} p(r)/r^{1−1/N} .
Remark 25.4. For a given U , there might not necessarily exist a suit-
able A. For instance, if U = UN , it is only for N > 2 that we can
construct A.
Sloppy proof of Theorem 25.3. By density, we may assume that the den-
sity ρ0 of µ is smooth; we may also assume that A and U are smooth
on (0, +∞) (recall Proposition 17.7(vii)). Let (ρt )t≥0 be the solution of
the gradient flow equation
∂ρ/∂t = ∇ · (ρ ∇U′(ρ)),      (25.2)
and as usual µt = ρt ν. It can be shown that ρt is uniformly bounded
below by a positive number as t → ∞.
From log Sobolev to Talagrand, revisited 723
By Theorem 24.2(iii),

(d/dt) IU,ν (µt ) ≤ −2Kλ ∫_M ρt^{1−1/N} |∇U′(ρt )|² dν.      (25.3)
Further assume that the Cauchy problem associated with the gradient
flow ∂t ρ = L p(ρ) admits smooth solutions for smooth initial data.
Then, for any µ ∈ P2ac (M ), holds the inequality
The proofs in the present chapter were based on gradient flows of dis-
placement convex functionals, while proofs in Chapters 21 and 22 were
more directly based on displacement interpolation. How do these two
strategies compare to each other?
From a formal point of view, they are not so different as one may think. Take the case of the heat equation,
$$\frac{\partial \rho}{\partial t} = \Delta \rho,$$
or equivalently
$$\frac{\partial \rho}{\partial t} + \nabla \cdot \bigl(\rho\, \nabla(-\log \rho)\bigr) = 0.$$
The evolution of ρ is determined by the “vector field” ρ ↦ ∇(− log ρ), in the space of probability densities. Rescale time and the vector field itself as follows:
$$\varphi_\varepsilon(t, x) = -\varepsilon\, \log \rho\!\left( \frac{\varepsilon t}{2},\, x \right).$$
$$\frac{\partial \varphi_\varepsilon}{\partial t} + \frac{|\nabla \varphi_\varepsilon|^2}{2} = \frac{\varepsilon}{2}\, \Delta \varphi_\varepsilon.$$
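This viscous equation can be checked by direct computation (a sketch; it assumes the rescaling ϕε(t, x) = −ε log ρ(εt/2, x), with ρ solving the heat equation):

```latex
% With \varphi_\varepsilon(t,x) = -\varepsilon \log \rho(\varepsilon t/2,\, x)
% and \partial_s \rho = \Delta \rho:
\partial_t \varphi_\varepsilon = -\frac{\varepsilon^2}{2}\,\frac{\Delta\rho}{\rho}, \qquad
\nabla \varphi_\varepsilon = -\varepsilon\,\frac{\nabla\rho}{\rho}, \qquad
\Delta \varphi_\varepsilon
  = -\varepsilon\left(\frac{\Delta\rho}{\rho} - \frac{|\nabla\rho|^2}{\rho^2}\right);
% hence
\partial_t \varphi_\varepsilon + \frac{|\nabla\varphi_\varepsilon|^2}{2}
  = -\frac{\varepsilon^2}{2}\,\frac{\Delta\rho}{\rho}
    + \frac{\varepsilon^2}{2}\,\frac{|\nabla\rho|^2}{\rho^2}
  = \frac{\varepsilon}{2}\,\Delta\varphi_\varepsilon.
% The right-hand side vanishes as \varepsilon \to 0,
% leaving the Hamilton--Jacobi equation.
```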
Passing to the limit as ε → 0, one gets, at least formally, the Hamilton–
Jacobi equation
$$\frac{\partial \varphi}{\partial t} + \frac{|\nabla \varphi|^2}{2} = 0,$$
which is in some sense the equation driving displacement interpolation.
There is a general principle here: After suitable rescaling, the ve-
locity field associated with a gradient flow resembles the velocity field
of a geodesic flow. Here might be a possible way to see this. Take an
arbitrary smooth function U , and consider the evolution
need a control of Hess Uν in all directions. This might explain why there
is at present no displacement convexity analogue of Demange’s proof of
the Sobolev inequality (so far only weaker inequalities with nonsharp
constants have been obtained).
On the other hand, proofs based on displacement convexity are usu-
ally rather simpler, and more robust than proofs based on gradient
flows: no issues about the regularity of the semigroup, no subtle in-
terplay between the Hessian of the functional and the “direction of
evolution”. . .
In the end we can put some of the main functional inequalities dis-
cussed in these notes in a nice array. Below, “LSI” stands for “Logarith-
mic Sobolev inequality”; “T” for “Talagrand inequality”; and “Sob2 ”
for the Sobolev inequality with exponent 2. So LSI(K), T(K), HWI(K)
and Sob2 (K, N ) respectively stand for (21.4), (22.4) (with p = 2),
(20.17) and (21.8).
Theorem                        Gradient flow proof       Displ. convexity proof
CD(K, ∞) ⇒ LSI(K)              Bakry–Émery               Otto–Villani
LSI(K) ⇒ T(K)                  Otto–Villani              Bobkov–Gentil–Ledoux
CD(K, ∞) ⇒ HWI(K)              Bobkov–Gentil–Ledoux      Otto–Villani
CD(K, N) ⇒ Sob2(K, N)          Demange                   ??
Bibliographical notes
Stam used a heat semigroup argument to prove an inequality which
is equivalent to the Gaussian logarithmic Sobolev inequality in dimen-
sion 1 (recall the bibliographical notes for Chapter 21). His argument
was not completely rigorous because of regularity issues, but can be
repaired; see for instance [205, 783].
The proof of Theorem 25.1 in this chapter follows the strategy by
Bakry and Émery, who were only interested in the Particular Case 25.2.
These authors used a set of calculus rules which has been dubbed the
“Γ2 calculus”. They were not very careful about regularity issues, and
for that reason the original proof probably cannot be considered as
completely rigorous (in particular for noncompact manifolds, in which
regularity issues are not so innocent, even if the curvature-dimension
condition prevents the blow-up of the heat semigroup). However, re-
cently Demange [291] carried out complete proofs for much more deli-
cate situations, so there is no reason to doubt that the Bakry–Émery
$$r\, q'(r) + q(r) \ge \frac{9N}{4(N+2)}\, q(r)^2, \qquad q(r) = \frac{r\, U''(r)}{U'(r)} + \frac{1}{N}. \tag{25.10}$$
Fig. 26.1. The triangle on the left is drawn in X , the triangle on the right is drawn
on the model space R2 ; the lengths of their edges are the same. The thin geodesic
lines go through the apex to the middle of the basis; the one on the left is longer than
the one on the right. In that sense the triangle on the left is fatter than the triangle
on the right. If all triangles in X look like this, then X has nonnegative curvature.
(Think of a triangle as the belly of some individual, the belt being the basis, and
the neck being the apex; of course the line going from the apex to the middle of the
basis is the tie. The fatter the individual, the longer his tie should be.)
Bibliographical notes
$$\forall t \in [0,1], \qquad d(x, \gamma_t)^2 \ge (1-t)\, d(x,y)^2 + t\, d(x,z)^2 - S^2\, t(1-t)\, d(y,z)^2.$$
The central question in this chapter is the following: What does it mean
to say that a metric-measure space (X , dX , νX ) is “close” to another
metric-measure space (Y, dY , νY )? We would like to have an answer
that is as “intrinsic” as possible, in the sense that it should depend
only on the metric-measure properties of X and Y.
So as not to inflate this chapter too much, I shall omit many proofs
when they can be found in accessible references, and prefer to insist on
the main stream of ideas.
Hausdorff topology
Fig. 27.1. In solid lines, the borders of the two sets X and Y; in dashed lines, the
borders of their enlargements. The width of enlargement is just sufficient that any
of the enlarged sets covers both X and Y.
$$d_P(\mu, \nu) = \inf\Bigl\{ r > 0;\ \forall C,\ \mu[C] \le \nu[C^{r]}] + r \ \text{and}\ \nu[C] \le \mu[C^{r]}] + r \Bigr\},$$
where C stands for an arbitrary closed set, and C^{r]} is the set of all points whose distance to C is no more than r, i.e. the union of all closed balls B[x, r], x ∈ C.
The analogy between the two notions goes further: While the
Prokhorov distance can be defined in terms of couplings, the Hausdorff
distance can be defined in terms of correspondences. By definition, a
correspondence (or relation) between two sets X and Y is a subset R of
X × Y: if (x, y) ∈ R, then x and y are said to be in correspondence; it
is required that each x ∈ X should be in correspondence with at least
one y, and each y ∈ Y should be in correspondence with at least one x.
Then we have the two very similar formulas:
$$d_P(\mu, \nu) = \inf\Bigl\{ r > 0;\ \exists\, \text{coupling } (X, Y) \text{ of } (\mu, \nu);\ \mathbb{P}\bigl[ d(X, Y) > r \bigr] \le r \Bigr\};$$
$$d_H(\mathcal{X}, \mathcal{Y}) = \inf\Bigl\{ r > 0;\ \exists\, \text{correspondence } \mathcal{R} \text{ in } \mathcal{X} \times \mathcal{Y};\ \forall (x, y) \in \mathcal{R},\ d(x, y) \le r \Bigr\}.$$
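For finite subsets of a metric space, the correspondence formulation can be checked against the classical sup-inf formula for the Hausdorff distance. The following toy computation (function names and data are mine, not from the text) illustrates the equivalence on the real line.

```python
# Hausdorff distance between two finite subsets of a metric space,
# computed two ways: the classical sup-inf formula, and the infimum over
# correspondences R of sup{d(x,y) : (x,y) in R}.  Toy example in R^1.
from itertools import product

def d(x, y):
    return abs(x - y)

def hausdorff_supinf(X, Y):
    return max(max(min(d(x, y) for y in Y) for x in X),
               max(min(d(x, y) for x in X) for y in Y))

def hausdorff_corresp(X, Y):
    # R = {(x,y): d(x,y) <= r} is a correspondence iff r >= dH(X,Y);
    # scan candidate radii among the pairwise distances
    for r in sorted(d(x, y) for x, y in product(X, Y)):
        R = [(x, y) for x, y in product(X, Y) if d(x, y) <= r]
        covers_X = all(any(x == a for a, _ in R) for x in X)
        covers_Y = all(any(y == b for _, b in R) for y in Y)
        if covers_X and covers_Y:
            return r
    return float("inf")

X = [0.0, 1.0, 2.0]
Y = [0.2, 0.9, 2.6]
assert hausdorff_supinf(X, Y) == hausdorff_corresp(X, Y)
```

The optimal correspondence simply pairs each point with all points at distance at most dH, which covers both sides by definition of the Hausdorff distance.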
The Gromov–Hausdorff distance 745
nothing in common? First one would like to say that two spaces which are isometric really are the same. Recall the definition of an isometry: If (X, d) and (X′, d′) are two metric spaces, a map f : X → X′ is called an isometry if:
(a) it preserves distances: for all x, y ∈ X, d′(f(x), f(y)) = d(x, y);
(b) it is surjective: for any x′ ∈ X′ there is x ∈ X with f(x) = x′.
An isometry is automatically injective, so it has to be a bijection, and its inverse f⁻¹ is also an isometry. Two metric spaces are said to be isometric if there exists an isometry between them. If two spaces X and X′ are isometric, then any statement about X which can be expressed in terms of just the distance is automatically “transported” to X′ by the isometry.
This motivates the desire to work with isometry classes, rather than
metric spaces. By definition, an isometry class X is the set of all metric
spaces which are isometric to some given space X . Instead of “isometry
class”, I shall often write “abstract metric space”. All the spaces in a
given isometry class have the same topological properties, so it makes
sense to say of an abstract metric space that it is compact, or complete,
etc.
This looks good, but a bit frightening: There are so many metric
spaces around that the concept of abstract metric space seems to be
ill-posed from the set-theoretical point of view (just like there is no “set
of all sets”). However, things becomes much more friendly when one
realizes that any compact metric space, being separable, is isometric to
the completion of N for a suitable metric. (To see this, introduce a dense
sequence (xk ) in your favorite space X , and define d(k, ) = dX (xk , x ).)
Then we might think of an isometry class as a subset of the set of all
distances on N; this is still huge, but at least it makes sense from a
set-theoretical point of view.
Now the problem is to find a good distance on the set of abstract
compact metric spaces. The natural concept here is the Gromov–
Hausdorff distance, which is obtained by formally taking the quotient
of the Hausdorff distance by isometries: If (X , dX ) and (Y, dY ) are two
compact metric spaces, define
Representation by semi-distances
$$x\ \mathcal{R}\ x' \iff d(x, x') = 0.$$
$$Z = (X \sqcup Y)/d := (X \sqcup Y)/\mathcal{R}$$
there is a triple of metric spaces (X′₁, X′₂, X′₃), all subspaces of a common metric space (Z, d_Z), such that (X′₁, X′₂) is isometric (as a coupling) to (X₁, X₂), and (X′₂, X′₃) is isometric to (X₂, X₃).
$$Z = (X_1 \sqcup X_2 \sqcup X_3)/d,$$
∀y ∈ Y ∃x ∈ X; d(f (x), y) ≤ ε.
In particular, dH (f (X ), Y) ≤ ε.
then dis(R) ≤ 3ε, and the left inequality in (27.3) follows by formula (27.2). Conversely, if R is a relation with distortion η, then for any ε > η one can define an ε-isometry f whose graph is included in R: The idea is to define f(x) = y, where y is such that (x, y) ∈ R. (See the comments at the end of the bibliographical notes.)
The symmetry between X and Y seems to have been lost in (27.3), but this is not serious, because any approximate isometry admits an approximate inverse: If f is an ε-isometry X → Y, then there is a (4ε)-isometry f′ : Y → X such that for all x ∈ X, y ∈ Y,
$$d_X\bigl( f' \circ f(x),\, x \bigr) \le 3\varepsilon, \qquad d_Y\bigl( f \circ f'(y),\, y \bigr) \le \varepsilon. \tag{27.4}$$
Fig. 27.2. A very thin tire (2-dimensional manifold) is very close, in Gromov–
Hausdorff sense, to a circle (1-dimensional manifold)
Fig. 27.3. A balloon with a very small handle (not simply connected) is very close
to a balloon without handle (simply connected).
Remark 27.7. Keeping Remark 27.3 in mind, two spaces are close in
Gromov–Hausdorff topology if they look the same to a short-sighted
person (see Figure 27.2). I learnt from Lott the expression Mr. Magoo
topology to convey this idea.
Noncompact spaces
If one works in length spaces, as will be the case in the rest of this
course, the above definition does not seem so good because K () will in
general not be a strictly intrinsic (i.e. geodesic) length space: Geodesics
joining elements of K () might very well leave K () at some intermediate
time; so properties involving geodesics might not pass to the limit. This
is the reason for requirement (iii) in the following definition.
dis (Rk ) −→ 0;
Many theorems about metric spaces still hold true, after appropriate
modification, for converging sequences of metric spaces. Such is the
Functional analysis on Gromov–Hausdorff converging sequences 759
Remark 27.23. In the previous proposition, the fact that the maps
fk are approximate isometries is useful only in the noncompact case;
otherwise it just boils down to the compactness of P (X ) when X is
compact.
$$\forall k \in \mathbb{N}, \qquad \nu_k\bigl[ B_{R]}(\star_k) \bigr] \le M.$$
Proof of Proposition 27.24. For any fixed R > 0, (f_k)_# ν_k[B_{R]}(⋆)] = ν_k[(f_k)⁻¹(B_{R]}(⋆))] ≤ ν_k[B_{R+ε_k]}(⋆_k)] is uniformly bounded by M(R + 1) for k large enough. Since on the other hand B_{R]}(⋆) is compact, we may extract a subsequence in k such that (f_k)_# ν_k[B_{R]}(⋆)] converges to some finite measure ν_R in the weak-∗ topology of B_{R]}(⋆). Then the result follows by letting R → ∞ and applying a diagonal extraction.
(The metric structure of X and Y has not disappeared since the infi-
mum is only over isometries.)
Both dGHP and dGP satisfy the triangle inequality, as can be checked
by a gluing argument again. (Now one should use both the metric and
the probabilistic gluing!) Then there is no difficulty in checking that
dGHP is an honest distance on classes of metric-measure isomorphisms,
with point of view (a). Similarly, dGP is a distance on classes of metric-
measure isomorphisms, with point of view (b), but now it is quite non-
trivial to check that [dGP (X , Y) = 0] =⇒ [X = Y]. I shall not insist on
this issue, for in the sequel I shall focus on point of view (a).
There are several variants of these constructions:
1. Use other distances between probability measures. Essentially everybody
agrees on the Hausdorff distance to measure distances between sets,
but as we know, there are many natural choices of distances between
probability measures. In particular, one can replace the Prokhorov dis-
tance by the Wasserstein distance of order p, and thus obtain the
Gromov–Hausdorff–Wasserstein distance of order p:
The discussion of the previous section showed that one should be cau-
tious about which notion of convergence is used. However, whenever
they are available, doubling estimates, in the sense of Definition 18.1,
basically rule out the discrepancy between approaches (a) and (b)
above. The idea is that doubling prevents the formation of sharp spikes
as in Figure 27.4. This discussion is not so clearly made in the literature
that I know, so in this section I shall provide more careful proofs.
where
$$\Phi_{R,D}(\delta) = \max\Bigl( 8\delta,\; R\, (16\delta)^{1/\log_2 D} + \delta \Bigr)$$
Now let x ∈ X . There is at least one index j such that d(x, xj ) < 2r;
otherwise x would lie in the complement of the union of all the balls
B2r (xj ), and could be added to the family {xj }. So {xj } constitutes
a 2r-net in X , with cardinality at most N = Dn . This concludes the
proof.
After all this discussion I can state a precise definition of the notion of
convergence that will be used in the sequel for metric-measure spaces:
this is the measured Gromov–Hausdorff topology. It is associated
with the convergence of spaces as metric spaces and as measure spaces.
This concept can be defined quantitatively in terms of, e.g., the distance
dGHP and its variants, but I shall be content with a purely topological
(qualitative) definition. As in the case of the plain Gromov–Hausdorff
topology, there is a convenient reformulation in terms of approximate
isometries.
$$(f_k)_\# \nu_k \xrightarrow[k \to \infty]{} \nu,$$
Bibliographical notes
$$D(\mathcal{X}, \mathcal{Y}) \ge (1/2)^{3/2}\, (\mathcal{X}, \mathcal{Y})^{3/2}.$$
The alternative point of view in which one takes care of both the
metric and the measure was introduced by Fukaya [386]. This is the
one which was used by Lott and myself in our study of displacement
convexity in geodesic spaces [577].
The pointed Gromov–Hausdorff topology is presented for instance
in [174]; it has become very popular as a way to study tangent spaces
in the absence of smoothness. In the context of optimal transport, the
pointed Gromov–Hausdorff topology was used independently in [30,
Section 12.4] and in [577, Appendix A].
I introduced the definition of local Gromov–Hausdorff topology for
the purpose of these notes; it looks to me quite natural if one wants to
preserve the idea of pointing in a setting that might not necessarily be
locally compact. This is not such a serious issue and the reader who does
not like this notion can still go on with the rest of these notes. Still, let
me recommend it as a natural concept to treat the Gromov–Hausdorff
convergence of the Wasserstein space over a noncompact metric space
(see Chapter 28).
The statement of completeness of the Gromov–Hausdorff space ap-
pears in Gromov’s book [438, p. 78]. Its proof can be found, e.g., in
Fukaya [387, Theorem 1.5], or in the book by Petersen [680, Chapter 10,
Proposition 1.7].
The theorem briefly alluded to in the end of Remark 27.8 states the
following: If M is an n-dimensional compact Riemannian manifold, and
(Mk )k∈N is a sequence of n-dimensional compact Riemannian manifolds
converging to M , with uniform upper and lower bounds on the sectional
curvatures, and a volume which is uniformly bounded below, then Mk
is diffeomorphic to M for k large enough. This result is due to Gromov
(after precursors by Shikata); see, e.g., [387, Chapter 3] for a proof and
references.
The Gromov–Hausdorff topology is not the only one used to com-
pare Riemannian manifolds; for instance, some authors have defined a
“spectral distance” based on properties of the heat kernel [97, 492, 508,
509].
I shall conclude with some technical remarks.
The theorem according to which a distance-preserving map X → X
is a bijection if X is compact can be found in [174, Theorem 1.6.14].
Proposition 27.20 appears, in a form which is not exactly the one
that I used, but quite close to, in [436, p. 66] and [443, Appendix A]. The
reader should have no difficulty in adapting the statements there into
Proposition 27.20 (or redo the proof of the Ascoli theorem). Proposition
27.22 is rather easy to prove, and anyway in the next chapter we shall
prove some more complicated related theorems.
That a locally compact geodesic space is automatically boundedly
compact is part of a generalized version of the Hopf–Rinow theorem
[174, Theorem 2.5.28].
Finally, the construction of approximate isometries from correspon-
dences, as performed in [174], uses the full axiom of choice (on p. 258:
“For each x, choose f (x) ∈ Y ”). So I should sketch a proof which does
not use it. Let R be a correspondence with distortion η; the problem
is to construct an ε-isometry f : X → Y for any ε > η. Let D be a
countable dense subset in X . Choose δ so small that 2δ + η < ε. Cover
X by finitely many disjoint sets Ak , such that each Ak is included in
772 27 Convergence of metric-measure spaces
some ball B(xk , δ), with xk ∈ D. Then for each x ∈ D choose f (x)
in relation with x. (This only uses the countable version of the axiom
of choice.) Finally, for each x ∈ Ak define f (x) = f (xk ). It is easy to
check that dis (f ) ≤ 2δ + η < ε.
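This construction can be mimicked on finite sets. The following sketch (all names and toy data are mine) snaps each point to a δ-net, mirroring the covering by the sets A_k, and then picks one partner per net point:

```python
# From a correspondence R with distortion eta, build a map f with
# dis(f) <= 2*delta + eta, following the argument in the text on a finite set.
def distortion_of_relation(R, dX, dY):
    return max(abs(dX(x, xp) - dY(y, yp)) for (x, y) in R for (xp, yp) in R)

def map_from_correspondence(X, R, dX, delta):
    net, assign = [], {}
    for x in X:                      # greedily build a delta-net and snap
        for n in net:
            if dX(x, n) < delta:
                assign[x] = n
                break
        else:
            net.append(x)
            assign[x] = x
    # "choose f(x) in relation with x", one choice per net point
    partner = {n: next(y for (x, y) in R if x == n) for n in net}
    return {x: partner[assign[x]] for x in X}

dX = dY = lambda a, b: abs(a - b)
X = [0.0, 0.5, 1.0, 1.5]
R = [(0.0, 0.1), (0.5, 0.6), (1.0, 1.1), (1.5, 1.4)]
eta = distortion_of_relation(R, dX, dY)
delta = 0.25
f = map_from_correspondence(X, R, dX, delta)
dis_f = max(abs(dX(x, xp) - dY(f[x], f[xp])) for x in X for xp in X)
assert dis_f <= 2 * delta + eta + 1e-12
```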
(This axiomatic issue is also the reason why I work with approximate
inverses that are (4ε)-isometries rather than (3ε)-isometries.)
28
Most of the objects that were introduced and studied in the context of
optimal transport on Riemannian manifolds still make sense on a gen-
eral metric-measure length space (X , d, ν), satisfying certain regularity
assumptions. I shall assume here that (X , d) is a locally compact,
complete separable geodesic space equipped with a σ-finite refer-
ence Borel measure ν. From general properties of such spaces, plus the
results in Chapters 6 and 7:
• the cost function c(x, y) = d(x, y)2 is associated with the coercive
Lagrangian action A(γ) = L(γ)2 , and minimizers are constant-
speed, minimizing geodesics, the collection of which is denoted by
Γ (X );
and |v|(t, x) really is |γ̇ x,t |, that is, the speed at time t and position x.
Thus Definition 28.1 is consistent with the usual notions of kinetic
energy and speed field (speed = norm of the velocity).
and |v| = |v|(t, x) the associated speed field. Then, for each t ∈ (0, 1) one can modify |v|(t, ·) on a µ_t-negligible set in such a way that for all x, y ∈ X,
$$\Bigl|\, |v|(t, x) - |v|(t, y)\, \Bigr| \le \sqrt{ \frac{C\, \operatorname{diam}(X)}{t(1-t)}\; d(x, y) }, \tag{28.2}$$
Proof of Theorem 28.5. Let t be a fixed time in (0, 1). Let γ₁ and γ₂ be two minimizing geodesics in the support of Π, and let x = γ₁(t), y = γ₂(t). Then by Theorem 8.22,
$$\bigl| L(\gamma_1) - L(\gamma_2) \bigr| \le \sqrt{ \frac{C\, \operatorname{diam}(X)}{t(1-t)}\; d(x, y) }. \tag{28.3}$$
Let Xt be the union of all γ(t), for γ in the support of Π. For a given
x ∈ Xt , there might be several geodesics γ passing through x, but
(as a special case of (28.3)) they will all have the same length; define
|v|(t, x) to be that length. This is a measurable function, since it can
be rewritten
$$|v|(t, x) = \int_\Gamma L(\gamma)\, \Pi\bigl( d\gamma \,\big|\, \gamma(t) = x \bigr),$$
$$w(x) - w(x') = \inf_{y \in X_t} \bigl[ H(d(x, y)) + |v|(t, y) \bigr] - \inf_{y' \in X_t} \bigl[ H(d(x', y')) + |v|(t, y') \bigr]$$
$$= \sup_{y' \in X_t}\, \inf_{y \in X_t} \bigl[ H(d(x, y)) - H(d(x', y')) + |v|(t, y) - |v|(t, y') \bigr]$$
$$\le H\Bigl( \sup_{y' \in X_t}\, \inf_{y \in X_t} \bigl[ d(x, y) - d(x', y') + d(y, y') \bigr] \Bigr)$$
$$\le H\Bigl( \sup_{y' \in X_t} \bigl[ d(x, y') - d(x', y') \bigr] \Bigr). \tag{28.4}$$
But
$$d(x, y') \le d(x, x') + d(x', y');$$
so (28.4) is bounded above by H(d(x, x')).
To summarize: w coincides with |v|(t, · ) on Xt , and it satisfies the
same Hölder-1/2 estimate. Since µt is concentrated on Xt , w is also
an admissible density for εt , so we can take it as the new definition of
|v|(t, · ), and then (28.2) holds true on the whole of X .
The main goal of this section is the proof of the convergence of the
Wasserstein space P2 (X ), as expressed in the next statement.
Then
$$P_2(\mathcal{X}_k) \xrightarrow{\;GH\;} P_2(\mathcal{X}).$$
Moreover, if fk : Xk → X are approximate isometries, then the maps
(fk )# : P2 (Xk ) → P2 (X ), defined by (fk )# (µ) = (fk )# µ, are approxi-
mate isometries too.
The identity
$$d_2(f(x_1), f(y_1))^2 - d_1(x_1, y_1)^2 = \bigl( d_2(f(x_1), f(y_1)) - d_1(x_1, y_1) \bigr) \bigl( d_2(f(x_1), f(y_1)) + d_1(x_1, y_1) \bigr)$$
implies
$$\bigl| d_2(f(x_1), f(y_1))^2 - d_1(x_1, y_1)^2 \bigr| \le \varepsilon \bigl( \operatorname{diam}(\mathcal{X}_1) + \operatorname{diam}(\mathcal{X}_2) \bigr).$$
$$W_2\bigl( f_\# \mu_1,\, f_\# \mu_1' \bigr)^2 \le W_2(\mu_1, \mu_1')^2 + \varepsilon \bigl( \operatorname{diam}(\mathcal{X}_1) + \operatorname{diam}(\mathcal{X}_2) \bigr), \tag{28.7}$$
hence
$$W_2\bigl( f_\# \mu_1,\, f_\# \mu_1' \bigr) \le W_2(\mu_1, \mu_1') + \sqrt{ \varepsilon \bigl( \operatorname{diam}(\mathcal{X}_1) + \operatorname{diam}(\mathcal{X}_2) \bigr) }. \tag{28.8}$$
Compactness of dynamical transference plans and related objects 779
+ 4ε (2 diam (X2 ) + ε). (28.10)
technical difficulty comes from the fact that ε-isometries, being in gen-
eral discontinuous, will not map geodesic paths into continuous paths.
So we shall be led to work on the horribly large space of measurable
paths [0, 1] → X . I shall daringly embed this space in the even much
larger space of probability measures on [0, 1] × X , via the identification
(iv) lim_{k→∞} (f_k)_# ε_{k,t} = ε_t, in the weak topology of measures, for each t ∈ (0, 1).
Assume further that each Πk is an optimal dynamical transference
plan, for the square distance cost function. Then:
(v) For each t ∈ (0, 1), there is a choice of the speed fields |v_k| associated with the plans Π_k, such that lim_{k→∞} |v_k| ∘ f_k = |v|, in the uniform topology;
(vi) The limit Π is an optimal dynamical transference plan, so π is
an optimal transference plan and (µt )0≤t≤1 is a displacement interpo-
lation.
Proof of Theorem 28.9. The proof is quite technical, so the reader might
skip it at first reading and go directly to the last section of this chapter.
In a first step, I shall establish the compactness of the relevant objects,
and in a second step pass to the limit.
It will be convenient to regularize rough paths with the help of some continuous mollifiers. For δ ∈ (0, 1/2), define
$$\varphi^\delta(s) = \frac{\delta + s}{\delta^2}\; 1_{-\delta \le s < 0} + \frac{\delta - s}{\delta^2}\; 1_{0 \le s \le \delta} \tag{28.14}$$
and
$$\varphi^\delta_+(s) = \varphi^\delta(s - \delta), \qquad \varphi^\delta_-(s) = \varphi^\delta(s + \delta). \tag{28.15}$$
Then supp φ^δ₊ ⊂ [0, 2δ] and supp φ^δ₋ ⊂ [−2δ, 0]. These functions have a graph that looks like a sharp “tent hat”; their integral (on the real line) is equal to 1, and as δ → 0 they converge in the weak topology to the Dirac mass δ₀ at the origin.
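These properties of the tent mollifier are easy to verify numerically (a quick check; the function names are mine):

```python
# The tent mollifier phi^delta of (28.14) and its shifts (28.15):
# a piecewise-linear "tent hat" of unit integral concentrating at 0.
import math

def phi(s, delta):
    if -delta <= s < 0:
        return (delta + s) / delta**2
    if 0 <= s <= delta:
        return (delta - s) / delta**2
    return 0.0

def phi_plus(s, delta):   # support [0, 2*delta]
    return phi(s - delta, delta)

def phi_minus(s, delta):  # support [-2*delta, 0]
    return phi(s + delta, delta)

delta, h = 0.1, 1e-5
grid = [(-1 + i * h) for i in range(int(2 / h) + 1)]
# unit integral, for the tent and both shifts
for g in (phi, phi_plus, phi_minus):
    mass = sum(g(s, delta) for s in grid) * h
    assert abs(mass - 1.0) < 1e-3
# weak convergence to the Dirac mass at 0: smoothing a test function
smoothed = sum(math.cos(s) * phi(s, delta) for s in grid) * h
assert abs(smoothed - math.cos(0.0)) < 1e-2
```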
In the sequel, f_k : X_k → X will be an ε_k-isometry, and the sequence ε_k goes to 0 as k → ∞.¹
¹ In this proof ε_k and ε_{k,t} stand for completely different objects, but I hope this is not confusing. (As usual the choice of notation is a delicate exercise.)
782 28 Stability of optimal transport
Since all of the lengths d(γ_k(0), γ_k(1)) are uniformly bounded, we conclude that there is a constant C such that for all t₀ and s₀ as above,
$$\begin{cases} \bigl| L^\delta_{t_0 \to s_0}(f_k \circ \gamma_k) - |s_0 - t_0|\; L^\delta_{0 \to 1}(f_k \circ \gamma_k) \bigr| \le C(\delta + \varepsilon_k); \\[1mm] L^\delta_{0 \to 1}(f_k \circ \gamma_k) \le C. \end{cases}$$
$$\bigl| L^\delta_{t_0 \to s_0}(\sigma) - |s_0 - t_0|\; L^\delta_{0 \to 1}(\sigma) \bigr| \le C(\delta + \varepsilon); \qquad L^\delta_{0 \to 1}(\sigma) \le C. \tag{28.16}$$
In particular,
$$L^\delta_{t_0 \to s_0}(\sigma) \le C\bigl( |s_0 - t_0| + \delta \bigr). \tag{28.18}$$
In Lemma 28.11 below it will be shown that, as a consequence
of (28.18), σ can be written as (Id , γ)# λ for some Lipschitz-continuous
curve γ : [0, 1] → X . Once that is known, the end of the proof of (a) is
straightforward: Since
$$L^\delta_{t_0 \to s_0}(\sigma) = d\bigl( \gamma(t_0), \gamma(s_0) \bigr) + O(\delta),$$
Proof of Lemma 28.11. First disintegrate σ with respect to its first mar-
ginal λ: There is a family (νt )0≤t≤1 , measurable as a map from [0, 1] to
P (X ) and unique up to modification on a set of zero Lebesgue measure
in [0, 1], such that
σ(dt dx) = νt (dx) dt.
The goal is to show that, up to modification of νt on a negligible set of
times t,
νt (dx) = δx=γ(t) ,
where γ is Lipschitz.
The argument will be divided into three steps. It will be convenient
to use W1 , the 1-Wasserstein distance.
where
$$\begin{cases} F^\delta(t, s) = \displaystyle \int_0^1 \!\! \int_0^1 \beta(t_0, s_0)\, \varphi^\delta_+(t - t_0)\, \varphi^\delta_-(s - s_0)\, dt_0\, ds_0 \\[2mm] \Lambda(t, s) = \displaystyle \int_{X \times X} d(x, y)\, d\nu_t(x)\, d\nu_s(y). \end{cases}$$
Now plug this back into (28.26) and let δ → 0 to conclude that
$$\int_{X \times [0,1]} \int_{X \times [0,1]} \beta(t, s)\, d(x, y)\, d\nu_t(x)\, dt\, d\nu_s(y)\, ds \le C \int_0^1 \!\! \int_0^1 \beta(t, s)\, |s - t|\, dt\, ds.$$
Noncompact spaces
Bibliographical notes
Theorem 28.6 is taken from [577, Section 4], while Theorem 28.13 is
an adaptation of [577, Appendix E]. Theorem 28.9 is new. (A part of
this theorem was included in a preliminary version of [578], and later
removed from that reference.)
The discussion about push-forwarding dynamical transference plans
is somewhat subtle. The point of view adopted in this chapter is the
following: when an approximate isometry f is given between two spaces,
use it to push-forward a dynamical transference plan Π, via (f ◦)# Π.
The advantage is that this is the same map that will push-forward the
measure and the dynamical plan. The drawback is that the resulting
object (f ◦)# Π is not a dynamical transference plan, in fact it may
not even be supported on continuous paths. This leads to the kind of
technical sport that we’ve encountered in this chapter, embedding into
probability measures on probability measures and so on.
Another option would be as follows: Given two spaces X and Y, with
an approximate isometry f : X → Y, and a dynamical transference plan
Π on Γ (X ), define a true dynamical transference plan on Γ (Y), which
is a good approximation of (f ◦)# Π. The point is to construct a recipe
which to any geodesic γ in X associates a geodesic S(γ) in Y that is
“close enough” to f ◦ γ. This strategy was successfully implemented
in the final version of [578, Appendix]; it is much simpler, and still
quite sufficient for some purposes. The example treated in [578] is the
stability of the “democratic condition” considered by Lott and myself;
but certainly this simplified version will work for many other stability
issues. On the other hand, I don’t know if it is enough to treat such
topics as the stability of general weak Ricci bounds, which will be
considered in the next chapter.
The study of the kinetic energy measure and the speed field occurred
to me during a parental meeting of the Crèche Le Rêve en Couleurs
and
$$\int U(\rho_t)\, d\nu \le (1 - t) \int_{M \times M} U\!\left( \frac{\rho_0(x_0)}{\beta^{(K,N)}_{1-t}(x_0, x_1)} \right) \beta^{(K,N)}_{1-t}(x_0, x_1)\, \pi(dx_1 | x_0)\, \nu(dx_0)$$
$$\qquad + t \int_{M \times M} U\!\left( \frac{\rho_1(x_1)}{\beta^{(K,N)}_{t}(x_0, x_1)} \right) \beta^{(K,N)}_{t}(x_0, x_1)\, \pi(dx_0 | x_1)\, \nu(dx_1). \tag{29.3}$$
Here G(s, t) is the one-dimensional Green function of (16.6), K_{N,U} is defined by (17.10), and the distortion coefficients β_t = β^{(K,N)}_t are those appearing in (14.61).
Which of these formulas should we choose for the extension to non-
smooth spaces? When K = 0, both inequalities reduce to just
$$\int U(\rho_t)\, d\nu \le (1 - t) \int U(\rho_0)\, d\nu + t \int U(\rho_1)\, d\nu. \tag{29.4}$$
µ = ρ ν + µs
Remark 29.3. Remark 17.27 applies here too: I shall often identify π
with the probability measure π(dx dy) = µ(dx) π(dy|x) on X × X .
For later use I record here two elementary lemmas about the functionals U^β_{π,ν}. The reader may skip them at first reading and go directly to the next section.
First, there is a handy way to rewrite U^β_{π,ν}(µ) when µ_s = 0:
where v(r) = U(r)/r, with the conventions U(0)/0 = U′(0) ∈ [−∞, +∞), U(∞)/∞ = U′(∞) ∈ (−∞, +∞], and ρ = 0 on Spt µ_s.
Proof of Lemma 29.6. The identity is formally obvious if one notes that ρ(x) π(dy|x) ν(dx) = π(dy|x) µ(dx) = π(dy dx); so all the subtlety lies in the fact that in (29.7) the convention is U(0)/0 = 0, while in (29.8) it is U(0)/0 = U′(0). Switching between both conventions is allowed because the set {ρ = 0} is anyway of zero π-measure.
Secondly, the functionals U^β_{π,ν} (and the functionals U_ν) satisfy a principle of “rescaled subadditivity”, which might at first sight seem contradictory with the convexity property, but is not at all.
$$U^\beta_{\sum_j Z_j \pi_j,\, \nu}\Bigl( \sum_j Z_j\, \mu_j \Bigr) \ge \sum_j Z_j\, (U_{Z_j})^\beta_{\pi_j, \nu}(\mu_j),$$
with equality if the measures µ_j are singular with respect to each other.
• In the case K > 0 and N < ∞, the coefficient β^{(K,N)}_t(x, y) takes the value +∞ if 0 < t < 1 and d(x, y) ≥ D_{K,N} := π √((N − 1)/K). In that case the natural convention is ∞ · U(r/∞) = U′(0) r, in accordance with Lemma 29.6. With this convention, Definition 29.8 implies that
¹ This is one of the rare instances in this book where a result is used before it is proven; but I think the resulting presentation is more clear and pedagogical.
Synthetic definition of the curvature-dimension bound 803
also. So (29.13) also holds true with Ũ replaced by U.
In both cases, the upper bound belongs to L¹(π(dx₀|x₁) ν(dx₁)), which proves the claim.
Step 3: Behavior at infinity. Finally we consider the case of a general U ∈ DC_N, and approximate it at infinity so that it has the desired behavior. The reasoning is pretty much the same as for Step 2. Let us assume for instance N < ∞. By Proposition 17.7(iv), there is a nondecreasing sequence (U_ℓ)_{ℓ∈ℕ}, converging pointwise to U, with the desired behavior at infinity, and U_ℓ′(∞) → U′(∞). By Step 2,
$$(U_\ell)_\nu(\mu_t) \le (1 - t)\, (U_\ell)^{\beta^{(K,N)}_{1-t}}_{\pi,\nu}(\mu_0) + t\, (U_\ell)^{\beta^{(K,N)}_{t}}_{\check\pi,\nu}(\mu_1),$$
Then we know that U_ℓ′(∞) → U′(∞), so we may pass to the limit in the second term of (29.15). To pass to the limit in the first term by monotone convergence, it suffices to check that U_ℓ(ρ_t) is bounded below, uniformly in ℓ, by a ν-integrable function. But this is true since, for instance, U₀ is bounded below by an affine function of the form r ↦ −C(r + 1), C ≥ 0; so U_ℓ(ρ_t) ≥ −C ρ_t − C 1_{ρ_t > 0}, and the latter function is integrable since ρ_t has compact support.
Example 29.13. Let V be a continuous function ℝⁿ → ℝ with ∫ e^{−V(x)} dx < ∞, let ν(dx) = e^{−V(x)} dx, and let d₂ be the Euclidean distance. Then the space (ℝⁿ, d₂, ν) satisfies the usual CD(K, ∞) condition if V is C² and ∇²V ≥ K Iₙ in the classical sense. It satisfies the weak CD(K, ∞) condition without any regularity assumption on V, as soon as ∇²V ≥ K Iₙ in the sense of distributions, which means that V is K-convex. For instance, if V is merely convex, then (ℝⁿ, d₂, ν) satisfies the weak CD(0, ∞) condition. To see this, note that if µ(dx) = ρ(x) dx, then
$$H_\nu(\mu) = \int \rho(x) \log \rho(x)\, dx + \int \rho(x)\, V(x)\, dx = H(\mu) + \int V\, d\mu;$$
then the first term is always displacement convex, and the second is displacement convex if V is convex (simple exercise).
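In one dimension this last claim is easy to test numerically: displacement interpolation is realized by the monotone (quantile) coupling, so on sorted samples t ↦ ∫ V dµ_t is an average of convex functions of t whenever V is convex. A sketch (all names are mine, not from the text):

```python
# Empirical check: along 1D displacement interpolation (monotone coupling
# of sorted samples), t -> integral of a convex V against mu_t is convex.
import random

random.seed(0)
x0 = sorted(random.gauss(0.0, 1.0) for _ in range(500))
x1 = sorted(random.gauss(3.0, 0.5) for _ in range(500))

V = lambda x: x * x  # a convex potential

def F(t):
    # V integrated against the displacement interpolant mu_t
    return sum(V((1 - t) * a + t * b) for a, b in zip(x0, x1)) / len(x0)

ts = [i / 20 for i in range(21)]
vals = [F(t) for t in ts]
# midpoint convexity along the discretization
assert all(vals[i + 1] <= (vals[i] + vals[i + 2]) / 2 + 1e-12
           for i in range(len(vals) - 2))
```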
Continuity properties of the functionals U_ν and U^β_{π,ν}
(ii)
$$U_\nu(\mu) = \sup\left\{ \int_X \varphi\, d\mu - \int_X U^*(\varphi)\, d\nu;\ \ \varphi \in C(X),\ \ U'\!\left( \tfrac{1}{M} \right) \le \varphi \le U'(M);\ M \in \mathbb{N} \right\}.$$
Exercise 29.22. Use Property (ii) in the case U(r) = r log r to recover the Csiszár–Kullback–Pinsker inequality (22.25). Hint: Take f to be valued in {0, 1}, apply Csiszár's two-point inequality
$$\forall x, y \in [0, 1], \qquad x \log \frac{x}{y} + (1 - x) \log \frac{1 - x}{1 - y} \ge 2\, (x - y)^2$$
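The two-point inequality itself is elementary to verify numerically (a quick sanity check, not part of the text):

```python
# Csiszar's two-point inequality (Pinsker for Bernoulli measures):
# x log(x/y) + (1-x) log((1-x)/(1-y)) >= 2 (x - y)^2 on (0,1) x (0,1).
import math

def kl2(x, y):
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

pts = [i / 100 for i in range(1, 100)]
assert all(kl2(x, y) >= 2 * (x - y) ** 2 - 1e-12 for x in pts for y in pts)
```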
To summarize: Uν (µε ) ≤ Uν (µ) for all ε > 0, and then the conclusion
follows.
If µ is not absolutely continuous, define
$$\rho_{a,\varepsilon}(x) = \int_{\operatorname{Spt}\nu} K_\varepsilon(x, y)\, \rho(y)\, \nu(dy); \qquad \rho_{s,\varepsilon}(x) = \int_{\operatorname{Spt}\nu} K_\varepsilon(x, y)\, \mu_s(dy),$$
whence the conclusion. Finally, if U does not have a constant sign, this
means that there is r0 > 0 such that U (r) ≤ 0 for r ≤ r0 , and U (r) > 0
for r > r0 . Then:
• if r < B −1 r0 , then U+ (ar) = 0 for all a ≤ B and (29.19) is obviously
true;
• if r > B r0 , then U (ar) > 0 for all a ∈ [B −1 , B] and one estab-
lishes (29.20) as before;
• if B −1 r0 ≤ r ≤ B r0 , then U+ (ar) is bounded above for a ≤ B,
while r is bounded below, so obviously (29.19) is satisfied for some
well-chosen constant C.
Next, we may dismiss the case when the right-hand side of the inequality in (29.17) is +∞ as trivial; so we may assume that
$$\int \beta(x, y)\, U_+\!\left( \frac{\rho(x)}{\beta(x, y)} \right) \pi(dy|x)\, \nu(dx) < +\infty; \qquad U'(\infty)\, \mu_s[\mathcal{X}] < +\infty. \tag{29.21}$$
814 29 Weak Ricci curvature bounds I: Definition and Stability
Step 2: Reduction to the case U′(0) > −∞. The problem now is to get rid of possibly very large negative values of U′ close to 0. For any δ > 0, define U_δ′(r) := max(U′(δ), U′(r)) and
$$U_\delta(r) = \int_0^r U_\delta'(s)\, ds.$$
Since U′ is nondecreasing, U_δ′ converges monotonically to U′ in L¹(0, r) as δ → 0. It follows that U_δ(r) decreases to U(r), for all r > 0. Let us check that all the assumptions which we imposed on U still hold true for U_δ. First, U_δ(0) = 0. Also, since U_δ′ is nondecreasing, U_δ is convex. Finally, U_δ has polynomial growth; indeed:
• if r ≤ δ, then r (U_δ)′(r) = r U′(δ) is bounded above by a constant multiple of r;
• if r > δ, then r (U_δ)′(r) = r U′(r), which is bounded (by assumption) by C(U(r)₊ + r), and this obviously is bounded by C(U_δ(r)₊ + r), for U ≤ U_δ.
The next claim is that the integral
$$\int \beta(x, y)\, U_\delta\!\left( \frac{\rho(x)}{\beta(x, y)} \right) \pi(dy|x)\, \nu(dx)$$
makes sense and is not +∞. Indeed, as we have just seen, there is a constant C such that (U_δ)₊ ≤ C(U₊(r) + r). Then the contribution of the linear part Cr is finite, since
$$\int \beta(x, y)\, \frac{\rho(x)}{\beta(x, y)}\, \pi(dy|x)\, \nu(dx) = \int \rho(x)\, \nu(dx) \le 1;$$
and the contribution of C U₊(r) is also finite in view of (29.21). So
$$\int \beta(x, y)\, (U_\delta)_+\!\left( \frac{\rho(x)}{\beta(x, y)} \right) \pi(dy|x)\, \nu(dx) < +\infty,$$
which proves the claim.
Now assume that Theorem 29.20(iii) has been proved with U_δ in place of U. Then, for any δ > 0,

lim sup_{ε↓0} ∫∫ β(x, y) U( ρ_ε(x) / β(x, y) ) π_ε(dy|x) ν(dx)
    ≤ lim sup_{ε↓0} ∫∫ β(x, y) U_δ( ρ_ε(x) / β(x, y) ) π_ε(dy|x) ν(dx)
    ≤ ∫∫ β(x, y) U_δ( ρ(x) / β(x, y) ) π(dy|x) ν(dx).
and for the right-hand side the computation is similar, once one has
noticed that the first marginal of π is the weak limit of the first marginal
µε of πε , i.e. µ (as recalled in the First Appendix).
So in the sequel I shall assume that U ≥ 0.
Step 4: Treatment of the singular part. To take care of the singular part, the reasoning is similar to the one already used in the particular case β = 1: Write µ = ρ ν + µ_s, and

ρ_{a,ε}(x) = ∫_X K_ε(x, y) ρ(y) ν(dy);    ρ_{s,ε}(x) = ∫_X K_ε(x, y) µ_s(dy).
U^β_{π,ν}(µ_ε) ≤ U^β_{π,ν}(ρ_{a,ε} ν) + U'(∞) µ_s[X].

In the next two steps I shall focus on the first term U^β_{π,ν}(ρ_{a,ε} ν); I shall write ρ_ε for ρ_{a,ε}.
Step 5: Approximation of β. For any two points x, y in X, define

β_ε(x, y) = ∫∫_{X×X} K_ε(x, x̄) K_ε(y, ȳ) β(x̄, ȳ) ν(dx̄) ν(dȳ).
Now assume that β and β̄ are bounded from above and below by positive constants, say B⁻¹ ≤ β, β̄ ≤ B; then by (29.19) there is C > 0, depending only on B, such that

| β U( ρ/β ) − β̄ U( ρ/β̄ ) | ≤ C |β − β̄| ( U(ρ) + ρ ).
Then

| ∫∫ β_ε(x, y) U( ρ_ε(x)/β_ε(x, y) ) π_ε(dy|x) ν(dx) − ∫∫ β(x, y) U( ρ_ε(x)/β(x, y) ) π_ε(dy|x) ν(dx) |
    ≤ ∫∫ | β_ε(x, y) U( ρ_ε(x)/β_ε(x, y) ) − β(x, y) U( ρ_ε(x)/β(x, y) ) | π_ε(dy|x) ν(dx)
    ≤ C ∫∫ | β_ε(x, y) − β(x, y) | ( U(ρ_ε(x)) + ρ_ε(x) ) π_ε(dy|x) ν(dx)
    ≤ C sup_{x,y∈X} | β_ε(x, y) − β(x, y) | ∫ ( U(ρ_ε(x)) + ρ_ε(x) ) ν(dx)
    ≤ C sup_{x,y∈X} | β_ε(x, y) − β(x, y) | ∫ ( U(ρ) + ρ ) dν,
In particular, on Spt µ_s,

g(x, y) π̃_ε(dx dy) = U'(∞) µ_s(dx).
In particular,

∫_X sup_y g(x, y) dµ(x) = ∫_X sup_y β(x, y) U( ρ(x)/β(x, y) ) dν(x) < +∞;

in other words, g belongs to the vector space L¹((X, µ); C(X)) of µ-integrable functions valued in the normed space C(X) (equipped with the norm of uniform convergence).
• If U'(∞) < +∞ then g(x, ·) is also continuous for all x (it is identically equal to U'(∞) if x ∈ Spt µ_s), and sup_y g(x, y) ≤ U'(∞) is obviously µ(dx)-integrable; so the conclusion g ∈ L¹((X, µ); C(X)) still holds true.
In the sequel I shall drop the index k and write just Ψ for Ψ_k. It will be useful in the sequel to know that Ψ(x, y) = U'(∞) when x ∈ Spt µ_s; apart from that, Ψ might be any continuous function.
Step 8: Variations on a regularization theme. Let π̃_ε^{(a)} be the contribution of ρ ν to π̃_ε. Explicitly,

π̃_ε^{(a)}(dx dy) = ( ∫∫ K_ε(x, x̄) K_ε(y, ȳ) π_ε(dȳ|x̄) ρ(x̄) ν(dx̄) ) ν(dy) ν(dx).
Then let

π̄_ε^{(a)}(dx dy) = ( ∫∫ K_ε(x, x̄) K_ε(y, ȳ) ρ(x) π_ε(dȳ|x̄) ν(dx̄) ) ν(dy) ν(dx);
π̂_ε^{(a)}(dx dy) = ( ∫∫ K_ε(x, x̄) K_ε(y, ȳ) ρ_ε(x) π_ε(dȳ|x̄) ν(dx̄) ) ν(dy) ν(dx).

Then

‖ π̃_ε^{(a)} − π̄_ε^{(a)} ‖_TV
    ≤ ∫∫∫∫ K_ε(x, x̄) K_ε(y, ȳ) | ρ(x̄) − ρ(x) | π_ε(dȳ|x̄) ν(dx̄) ν(dy) ν(dx)
    = ∫∫∫ K_ε(x, x̄) | ρ(x̄) − ρ(x) | π_ε(dȳ|x̄) ν(dx̄) ν(dx)
    = ∫∫ K_ε(x, x̄) | ρ(x̄) − ρ(x) | ν(dx̄) ν(dx).    (29.27)
‖ π̃_ε^{(a)} − π̄_ε^{(a)} ‖_TV → 0.    (29.31)
Next,

‖ π̄_ε^{(a)} − π̂_ε^{(a)} ‖_TV
    ≤ ∫∫∫∫ K_ε(x, x̄) K_ε(y, ȳ) | ρ_ε(x) − ρ(x) | π_ε(dȳ|x̄) ν(dx̄) ν(dy) ν(dx)
    = ∫∫∫ K_ε(x, x̄) | ρ_ε(x) − ρ(x) | π_ε(dȳ|x̄) ν(dx̄) ν(dx)
    = ∫∫ K_ε(x, x̄) | ρ_ε(x) − ρ(x) | ν(dx̄) ν(dx)
    = ∫ | ρ_ε(x) − ρ(x) | ν(dx)
    = ∫ | ∫ K_ε(x, y) ρ(y) ν(dy) − ρ(x) | ν(dx)
    ≤ ∫∫ K_ε(x, y) | ρ(y) − ρ(x) | ν(dy) ν(dx),

so

‖ π̄_ε^{(a)} − π̂_ε^{(a)} ‖_TV → 0.    (29.32)
By (29.31) and (29.32),

‖ π̃_ε^{(a)} − π̂_ε^{(a)} ‖_TV → 0    as ε → 0.

In particular,

∫ Ψ dπ̃_ε^{(a)} − ∫ Ψ dπ̂_ε^{(a)} → 0    as ε ↓ 0.    (29.33)
Now let

π̃_ε^{(s)}(dx dy) = ( ∫∫ K_ε(x, x̄) K_ε(y, ȳ) π_ε(dȳ|x̄) µ_s(dx̄) ) ν(dy) ν(dx);
π̂_ε^{(s)}(dx dy) = ( ∫∫ K_ε(x, x̄) K_ε(y, ȳ) µ_{s,ε}(dx̄) π_ε(dȳ|x̄) ) ν(dy) ν(dx);

so that π̃_ε(dx dy) = π̃_ε^{(a)}(dx dy) + π̃_ε^{(s)}(dx dy). Further, define

while

∫ Ψ dπ̂_ε^{(s)} = U'(∞) µ_{s,ε}[X] = U'(∞) µ_s[X].
At this point, to finish the proof of the theorem it suffices to establish ∫ Ψ dπ̃_ε → ∫ Ψ dπ; which is true if π̃_ε converges weakly to π.
Step 9: Duality. Proving the convergence of π̃_ε to π will be easy because π̃_ε is a kind of regularization of π_ε, and it will be possible to "transfer the regularization to the test function" by duality. Indeed,

∫ Ψ(x, y) π̃_ε(dx dy) = ∫ ( ∫∫ Ψ(x, y) K_ε(x̄, x) K_ε(ȳ, y) ν(dy) ν(dx) ) π_ε(dȳ dx̄) = ∫ Ψ_ε(x̄, ȳ) π_ε(dȳ dx̄),

where

Ψ_ε(x̄, ȳ) = ∫∫ Ψ(x, y) K_ε(x̄, x) K_ε(ȳ, y) ν(dy) ν(dx).
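Once everything is discretized, this duality step is a plain adjointness identity and can be checked numerically. The toy sketch below (grid size, kernel and plan are arbitrary choices of mine, not the book's) verifies that integrating Ψ against the regularized plan equals integrating the regularized Ψ_ε against the original plan:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
w = rng.random(n); w /= w.sum()              # reference measure nu (weights w_i)

K = rng.random((n, n))
K /= (K * w).sum(axis=1, keepdims=True)      # normalize: sum_j K[i, j] w[j] = 1

pi = rng.random((n, n)); pi /= pi.sum()      # a coupling pi on the grid
Psi = rng.random((n, n))                     # a bounded test function

# regularized plan: pi_tilde[i, j] = sum_{a,b} K[a, i] pi[a, b] K[b, j] w[i] w[j]
pi_tilde = (K.T @ pi @ K) * np.outer(w, w)

# regularized test function: Psi_eps[a, b] = sum_{i,j} K[a, i] Psi[i, j] K[b, j] w[i] w[j]
Psi_eps = K @ (Psi * np.outer(w, w)) @ K.T

lhs = (Psi * pi_tilde).sum()    # integral of Psi against the regularized plan
rhs = (Psi_eps * pi).sum()      # integral of the regularized Psi against pi
assert abs(lhs - rhs) < 1e-12
```

The regularization really does "move across the integral sign": the same double sum is just grouped in two different ways.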
Now we have all the tools to prove the main result of this chapter: The
weak curvature-dimension bound CD(K, N ) passes to the limit. Once
again, the compact case will imply the general statement.
where β_t^{(K,N)} is given by (14.61) (with the distance d_k) and π_ε^k is an optimal coupling associated with (µ^k_{ε,t})_{0≤t≤1}. This means that for each ε ∈ (0, 1) and k ∈ N there is a dynamical optimal transference plan Π_ε^k such that

µ^k_{ε,t} = (e_t)_# Π_ε^k,    π_ε^k = (e_0, e_1)_# Π_ε^k,
where et is the evaluation at time t.
By Theorem 28.9, up to extraction of a subsequence in k, there is a dynamical optimal transference plan Π_ε on Γ(X) such that, as k → ∞,

(f_k ∘ ·)_# Π_ε^k → Π_ε    weakly in P(C([0, 1]; X));
(f_k, f_k)_# π_ε^k → π_ε    weakly in P(X × X);
sup_{0≤t≤1} W_2( (f_k)_# µ^k_{ε,t}, µ_{ε,t} ) → 0;

where

µ_{ε,t} = (e_t)_# Π_ε,    π_ε = (e_0, e_1)_# Π_ε.
Each curve (µ_{ε,t})_{0≤t≤1} is D-Lipschitz with D = diam(X). By Ascoli's theorem, from ε ∈ (0, 1) we may extract a subsequence (still denoted ε for simplicity) such that

sup_{0≤t≤1} W_2( µ_{ε,t}, µ_t ) → 0    as ε → 0,    (29.40)
Inequalities (29.42) and (29.43) take care of the left-hand side of (29.39):

U_ν(µ_t) ≤ lim inf_{ε→0} lim inf_{k→∞} U_{ν_k}(µ^k_{ε,t}).    (29.44)
and since ρ^k_{ε,0} and U are continuous, β(x_0, x_1) U( ρ^k_{ε,0}(x_0)/β(x_0, x_1) ) is uniformly close to β(f_k(x_0), f_k(x_1)) U( ρ^k_{ε,0}(x_0)/β(f_k(x_0), f_k(x_1)) ) as k → ∞. So

lim_{k→∞} [ ∫∫ β(x_0, x_1) U( ρ^k_{ε,0}(x_0)/β(x_0, x_1) ) π_ε^k(dx_1|x_0) ν_k(dx_0)
    − ∫∫ β(f_k(x_0), f_k(x_1)) U( ρ^k_{ε,0}(x_0)/β(f_k(x_0), f_k(x_1)) ) π_ε^k(dx_1|x_0) ν_k(dx_0) ] = 0.    (29.45)

(Of course, in the second integral β is computed with the distance d, while in the first integral it is computed with the distance d_k.)
Let v(r) = U(r)/r (this is a continuous function if one sets v(0) = U'(0)); by Lemma 29.6,

∫∫ β( f_k(x_0), f_k(x_1) ) U( ρ^k_{ε,0}(x_0) / β( f_k(x_0), f_k(x_1) ) ) π_ε^k(dx_1|x_0) ν_k(dx_0)
    = ∫ v( ρ^k_{ε,0}(x_0) / β( f_k(x_0), f_k(x_1) ) ) π_ε^k(dx_0 dx_1)
    = ∫ v( ρ^k_{ε,0}(y_0) / ( Z_{ε,0} β(y_0, y_1) ) ) d[ (f_k, f_k)_# π_ε^k ](y_0, y_1).    (29.46)
Similarly,

lim sup_{ε↓0} lim sup_{k→∞} U^{β_t^{(K,N)}}_{π̌_ε^k, ν_k}(µ^k_{ε,1}) ≤ U^{β_t^{(K,N)}}_{π̌,ν}(µ_1).    (29.49)
where N > 1 (recall Remark 17.31) to deduce that for any two probability measures µ_0^k, µ_1^k on X_k, there is a Wasserstein geodesic (µ_t^k)_{t∈[0,1]} and an associated coupling π^k such that

U_{ν_k}(µ_t^k) ≤ (1 − t) U^{β_{1−t}^{(K,N)}}_{π^k,ν_k}(µ_0^k) + t U^{β_t^{(K,N)}}_{π̌^k,ν_k}(µ_1^k).

Then the same proof as before shows that for any two probability measures µ_0, µ_1 on X, there is a Wasserstein geodesic (µ_t)_{t∈[0,1]} and an associated coupling π such that

U_ν(µ_t) ≤ (1 − t) U^{β_{1−t}^{(K,N)}}_{π,ν}(µ_0) + t U^{β_t^{(K,N)}}_{π̌,ν}(µ_1).
It remains to pass to the limit as N ↓ 1 to get

U_ν(µ_t) ≤ (1 − t) U^{β_{1−t}^{(K,1)}}_{π,ν}(µ_0) + t U^{β_t^{(K,1)}}_{π̌,ν}(µ_1).
If K > 0, 1 < N < ∞ and sup diam(X_k) = D_{K,N}, then we can apply a similar reasoning, introducing again the bounded coefficients β_t^{(K,N')} for N' > N and then passing to the limit as N' ↓ N.
This concludes the proof of Theorem 29.24.
Remark 29.27. What the previous proof really shows is that under
certain assumptions the property of distorted displacement convexity
is stable under measured Gromov–Hausdorff convergence. The usual
displacement convexity is a particular case (take βt ≡ 1).
Proof of Theorem 29.25. Let ⋆_k (resp. ⋆) be the reference point in X_k (resp. X). The same arguments as in the proof of Theorem 29.24 will work here, provided that we can localize the problem. So pick compactly supported probability measures µ_0 and µ_1 on X, and define µ^k_{ε,t_0} (t_0 = 0, 1) in exactly the same way as in the proof of Theorem 29.24. Let R > 0 be such that the supports of ρ_0 and ρ_1 are included in B_{R]}(⋆); then for k large enough and ε small enough, the supports of µ^k_{ε,0} and µ^k_{ε,1} are contained in B_{R+1]}(⋆_k). So a geodesic which starts from the support of µ^k_{ε,0} and ends in the support of µ^k_{ε,1} will necessarily have its image included in B_{2(R+1)]}(⋆_k); thus each measure µ^k_{ε,t} has its support included in B_{2(R+1)]}(⋆_k).
From that point on, the very same reasoning as in the proof of Theorem 29.24 can be applied, since, say, the ball B_{2(R+2)]}(⋆_k) in X_k converges in the measured Gromov–Hausdorff topology to the ball B_{2(R+2)]}(⋆) in X, etc.
Remark 29.29. All the interest of this theorem lies in the fact that
the measured Gromov–Hausdorff convergence is a very weak notion of
convergence, which does not imply the convergence of the Ricci tensor.
(iii) K_ε(x, y) ≤ C / ν[B_ε(x)].
(Here C is another numerical constant, depending on the doubling constant of ν.)
property of ν, and classical Lebesgue density theory, guarantee that for
any f ∈ L1 (Y) the convergence of Kε f to f holds not only in L1 (Y)
but also almost everywhere.
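The L¹ convergence K_ε f → f can be illustrated with a minimal numerical sketch. The grid on [0, 1], the Gaussian kernel shape and the discontinuous test function below are my own choices (not from the book); the kernel rows are normalized so that K_ε(x, ·) has ν-integral 1, as in assumption (i):

```python
import numpy as np

n = 1200
x = (np.arange(n) + 0.5) / n                 # grid on [0, 1]
w = np.ones(n) / n                           # nu = uniform measure

f = np.sign(np.sin(6 * np.pi * x))           # a discontinuous function in L^1(nu)

def kernel_smooth(f, eps):
    # Gaussian bump of width eps, row-normalized against nu
    K = np.exp(-((x[:, None] - x[None, :]) / eps) ** 2)
    K /= (K * w).sum(axis=1, keepdims=True)
    return (K * (f * w)).sum(axis=1)         # (K_eps f)(x_i)

errs = [np.sum(np.abs(kernel_smooth(f, e) - f) * w) for e in (0.1, 0.03, 0.01)]
assert errs[0] > errs[1] > errs[2] > 0       # L^1(nu) error shrinks as eps -> 0
```

The error is concentrated near the jumps of f and decreases roughly linearly in ε, consistent with Lebesgue density theory.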
Proof of Lemma 29.36. Let us first treat the case when Y is just a point.
Then the first part of the statement of Lemma 29.36 is just the density
of C(X ) in L1 (X , µ), which is a classical result. To prove the second
part of the statement, let f ∈ L¹(µ), let K be a compact subset of X, let h ∈ C(K) be such that f = h on K, and let ε > 0. Let ψ ∈ C_c(X \ K) be such that ‖ψ − f‖_{L¹(X\K, µ)} ≤ ε.
Since µ and f ν are regular, there is an open set O_ε containing K such that µ[O_ε \ K] ≤ ε/(sup |ψ| + sup |h|) and ‖f − h‖_{L¹(O_ε)} ≤ ε.
The sets Oε and X \ K are open and cover X , so there are continuous
functions χ and η, defined on X and valued in [0, 1], such that χ+η = 1,
χ is supported in Oε and η is supported in X \ K. (In particular χ ≡ 1
in K.) Let
Ψ = h χ + ψ η.
Then Ψ coincides with h (and therefore with f ) in K, Ψ is continuous,
and
‖Ψ − f‖_{L¹(X)} ≤ ‖f − h‖_{L¹(O_ε)} + (sup |h|) ‖χ − 1‖_{L¹(O_ε)} + (sup |ψ|) ‖η − 1‖_{L¹(X\K)} + ‖ψ − f‖_{L¹(X\K)}
    ≤ ε + (sup |h| + sup |ψ|) µ[O_ε \ K] + ε
    ≤ 3 ε.
Since ε is arbitrarily small, this concludes the proof.
Now let us turn to the case when Y is any compact set. For any δ > 0 there are L = L(δ) ∈ N and {y_1, ..., y_L} such that the open balls B_δ(y_ℓ) (1 ≤ ℓ ≤ L) cover Y. Let (ζ_ℓ)_{1≤ℓ≤L} be a partition of unity subordinate to that covering: As in the previous Appendix, these are continuous functions satisfying 0 ≤ ζ_ℓ ≤ 1, Spt(ζ_ℓ) ⊂ B_δ(y_ℓ), and Σ_ℓ ζ_ℓ = 1 on Y. Let η > 0.
For each ℓ ∈ {1, ..., L}, the function f(·, y_ℓ) is µ-integrable, so there exists ψ_ℓ ∈ C(X) such that

∫ | f(x, y_ℓ) − ψ_ℓ(x) | dµ(x) ≤ η.
So

∫_X sup_y | f(x, y) − Ψ(x, y) | µ(dx)
    ≤ ∫_X sup_{d(z,z')≤δ} | f(x, z) − f(x, z') | µ(dx) + Σ_ℓ ∫_X | f(x, y_ℓ) − ψ_ℓ(x) | µ(dx)
    ≤ ∫_X m_δ(x) µ(dx) + L(δ) η,    (29.51)

where m_δ(x) := sup_{d(z,z')≤δ} | f(x, z) − f(x, z') |.
Bibliographical notes
Here are some (probably too lengthy) comments about the genesis
of Definition 29.8. It comes after a series of particular cases and/or
variants studied by Lott and myself [577, 578] on the one hand,
Sturm [762, 763] on the other. To summarize: In a first step, Lott
and I [577] treated CD(K, ∞) and CD(0, N), while Sturm [762] independently treated CD(K, ∞). These cases can be handled with just
² As a matter of fact, I was working on precisely this problem when my left lung collapsed, earning me a one-week holiday in hospital with unlimited amounts of pain-killers.
should be noted that Sturm manages to prove the stability of his definition without using this upper semicontinuity explicitly; but this might
be due to the particular form of the functions U that he is considering,
and the assumption of absolute continuity.
The treatment of noncompact spaces here is not exactly the same
as in [577] or [763]. In the present set of notes I adopted a rather weak
point of view in which every “noncompact” statement reduces to the
compact case; in particular in Definition 29.8 I only consider compactly
supported probability densities. This leads to simpler proofs, but the
treatment in [577, Appendix E] is more precise in that it passes to the
limit directly in the inequalities for probability measures that are not
compactly supported.
Other tentative definitions have been rejected for various reasons.
Let me mention four of them:
(i) Imposing the displacement convexity inequality along all displacement interpolations in Definition 29.8, rather than along some displacement interpolation. This concept is not stable under measured
Gromov–Hausdorff convergence. (See the last remark in the concluding
chapter.)
(ii) Replacing the integrated displacement convexity inequalities by
pointwise inequalities, in the style of those appearing in Chapter 14.
For instance, with the same notation as in Definition 29.8, one may
define

J_t(x) := ρ_0(x) / E[ ρ_t(γ_t) | γ_0 = x ],

where γ is a random geodesic with law(γ_t) = µ_t, and ρ_t is the absolutely continuous part of µ_t with respect to ν. Then J is a continuous function of t, and it makes sense to require that inequality (29.1) be satisfied ν(dx)-almost everywhere (as a function of t, in the sense of distributions). This notion of a weak CD(K, N) space makes perfect sense,
and is a priori stronger than the notion discussed in this chapter. But
there is no evidence that it should be stable under measured Gromov–
Hausdorff convergence. Integrated convexity inequalities enjoy better
stability properties. (One might hope that integrated inequalities lead
to pointwise inequalities by a localization argument, as in Chapter 19;
but this is not obvious at all, due to the a priori nonuniqueness of
displacement interpolation in a nonsmooth context.)
(iii) Choosing inequality (29.2) as the basis for the definition, instead of (29.3). In the case K < 0, this inequality is stable, due to the convexity of −r^{1−1/N}, and the a priori regularity of the speed field provided by
Theorem 28.5. (This was actually my original motivation for Theorem
28.5.) In the case K > 0 there is no reason to expect that the inequality is stable, but then one can weaken even more the formulation of CD(K, N) and replace it by
less sharp than the one based on distortion coefficients, but it may be
technically simpler.
A completely different approach to Ricci bounds in metric-measure spaces has been under consideration in a work by Kontsevich and Soibelman [528], in relation to Quantum Field Theory, mirror symmetry and heat kernels; see also [756]. Kontsevich pointed out to me
that the class of spaces covered by this approach is probably strictly
smaller than the class defined here, since it does not seem to include
the normed spaces considered in Example 29.16; he also suggested that
this point of view is related to the one of Ollivier, described below.
To close this list, I shall evoke the recent independent contributions by Joulin [494, 495] and Ollivier [662], who suggested defining the infimum of the Ricci curvature as the best constant K in the contraction inequality
W1 (Pt δx , Pt δy ) ≤ e−Kt d(x, y),
where Pt is the heat semigroup (defined on probability measures); or
equivalently, as the best constant K in the inequality
where P_t^* is the adjoint of P_t (that is, the heat semigroup on functions). Similar inequalities have been used before in concentration theory [305, 595], and in the study of spectral gaps [231, 232] or large-time convergence [310, 458, 679, 729]. In a Riemannian setting, this definition is justified by Sturm and von Renesse [764], who showed that one recovers in this way the usual Ricci curvature bound, the key observation being that parallel transport is close to being optimal in some sense. (The distance W_1 may be replaced by any W_r, 1 ≤ r < ∞.) This
point of view is natural when the problem includes a distinguished Markov kernel; in particular, it leads to the possibility of treating discrete spaces equipped with a random walk. Ollivier [662] has demonstrated the geometric interest of this notion on an impressive list of examples and applications, most of them in discrete spaces; and he also derived geometric consequences such as concentration of measure.
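On the real line this contraction definition can be checked by hand. For the Gaussian autoregressive chain X' = θX + σξ (a discrete-time Ornstein–Uhlenbeck step — my illustration, not an example from the sources cited here), P δ_x = N(θx, σ²), and W₁ between two such equal-variance Gaussians is exactly θ|x − y|, i.e. a contraction with K = −log θ. A numerical check via the quantile representation of W₁ in one dimension:

```python
import numpy as np
from scipy.stats import norm

theta, sigma = 0.8, 0.5          # one step: X' = theta * X + sigma * xi
x, y = 1.0, 3.0                  # two starting points (delta masses)

# W1 between 1D laws = integral over q in (0, 1) of |F^{-1}(q) - G^{-1}(q)|
q = (np.arange(200000) + 0.5) / 200000
w1 = np.abs(norm.ppf(q, theta * x, sigma) - norm.ppf(q, theta * y, sigma)).mean()

# Contraction: W1(P delta_x, P delta_y) = theta * |x - y| = e^{-K} d(x, y)
assert abs(w1 - theta * abs(x - y)) < 1e-6
```

Because the two one-step laws are translates of each other, the quantile difference is constant and the contraction rate θ is exact, not just an upper bound.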
The dimension can probably be taken into account by using refined estimates on the rates of convergence. On the other hand, the stability of this notion requires a stronger topology than just measured Gromov–Hausdorff convergence: the Markov process should converge as well. (The topology explored in [97] might be useful.) To summarize:
Elementary properties
The next proposition gathers some almost immediate consequences of
the definition of weak CD(K, N) spaces. I shall say that a subset X' of a geodesic space (X, d) is totally convex if any geodesic whose endpoints belong to X' is entirely contained in X'.
C. Villani, Optimal Transport, Grundlehren der mathematischen Wissenschaften 338, © Springer-Verlag Berlin Heidelberg 2009
848 30 Weak Ricci curvature bounds II: Geometric and analytic properties
Displacement convexity
µ = ρ ν + µ_s.

U^{β_t^{(K,N)}}_{π,ν}(µ) := ∫∫_{X×X} β_t^{(K,N)}(x, y) U( ρ(x) / β_t^{(K,N)}(x, y) ) π(dy|x) ν(dx) + U'(∞) µ_s[X],
Proof of Theorem 30.4. The proof is the same as for Theorem 17.28 and Application 17.29, with only two minor differences: (a) ρ is not necessarily a probability density, but still its integral is bounded above by 1; (b) there is an additional term U'(∞) µ_s[X] ∈ R ∪ {+∞}.
The next theorem is the final goal of this section: It extends the
displacement convexity inequalities of Definition 29.8 to noncompact
situations.
U_ν(µ_t) ≤ (1 − t) U_ν(µ_0) + t U_ν(µ_1) − λ(K, U) ( t(1 − t)/2 ) W_2(µ_0, µ_1)²,    (30.6)

where

λ(K, U) = inf_{r>0} K p(r)/r = { K p'(0) if K > 0;  0 if K = 0;  K p'(∞) if K < 0 }.    (30.7)
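Here p(r) = r U'(r) − U(r) is the pressure function associated with U, defined earlier in the book. As a quick sanity check (mine, not part of the proof), formula (30.7) can be evaluated numerically; for the model nonlinearity U(r) = r log r one has p(r) = r, so λ(K, U) = K for every sign of K:

```python
import numpy as np

def p(U, r):
    # pressure p(r) = r U'(r) - U(r); derivative by centered differences
    h = 1e-6 * r
    return r * (U(r + h) - U(r - h)) / (2 * h) - U(r)

def lam(K, U, rs=np.logspace(-3, 3, 2001)):
    # lambda(K, U) = inf_{r > 0} K p(r) / r, approximated on a log grid
    return (K * p(U, rs) / rs).min()

U_ent = lambda r: r * np.log(r)      # U(r) = r log r  =>  p(r) = r, lambda = K
assert abs(lam(2.0, U_ent) - 2.0) < 1e-4
assert abs(lam(-2.0, U_ent) + 2.0) < 1e-4
```

With U(r) = r log r, (30.6) is the usual K-displacement convexity of the Boltzmann entropy.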
These inequalities are the starting point for all subsequent inequal-
ities considered in the present chapter.
The proof of Theorem 30.5 will use two auxiliary results, which
generalize Theorems 29.20(i) and (iii) to noncompact situations. These
are definitely not the most general results of their kind, but they will
be enough to derive displacement convexity inequalities with a lot of
generality. As usual, I shall denote by M+ (X ) the set of finite (non-
negative) Borel measures on X , and by L1+ (X ) the set of nonnegative
ν-integrable measurable functions on X . Recall from Definition 6.8 the
notion of (weak) convergence in P2 (X ).
and for any sequence of probability measures (π_k)_{k∈N} such that the first marginal of π_k is µ_k, and the second one is µ_{k,1}, the limits

π_k → π;
∫ d(x, y)² π_k(dx dy) → ∫ d(x, y)² π(dx dy);
µ_{k,1} → µ_1 in P_2(X)

(all as k → ∞) imply

lim_{k→∞} U^β_{π_k,ν}(µ_k) = U^β_{π,ν}(µ).
Proof of Theorem 30.6. First of all, we may reduce to the case when U
is valued in R+ , just replacing U by r → U (r) + c r. So in the sequel U
will be nonnegative.
Let us start with (i). Let z be an arbitrary base point, and let (χ_R)_{R>0} be a z-cutoff as in the Appendix (that is, a family of continuous cutoff functions that are identically equal to 1 on a ball B_R(z) and identically equal to 0 outside B_{R+1]}(z)). For any R > 0, write

U_ν(χ_R µ) = ∫ U(χ_R ρ) dν + U'(∞) ∫ χ_R dµ_s.
In particular,

U_ν(µ) = sup_{R>0} U_ν(χ_R µ);    (30.8)

and then we can apply Proposition 29.19(i) with the compact space (B_{R+1]}(z), ν), to get

U_ν(χ_R µ) = sup { ∫_X φ χ_R dµ − ∫_X U*(φ) χ_{R+1} dν;  φ ∈ C_b(B_{R+1]}(z)),  φ ≤ U'(∞) }.
Let ρ^{(R)} be the density of the absolutely continuous part of µ^{(R)}, and µ_s^{(R)} be the singular part. It is obvious that ρ^{(R)} converges to ρ in L¹(ν) and µ_s^{(R)}[X] → µ_s[X] as R → ∞.
Next, define

π^{(R)}(dy|x) = χ_R(y) π(dy|x) + ( ∫ ( 1 − χ_R(y') ) π(dy'|x) ) δ_z(dy);

π^{(R)}(dx dy) = µ^{(R)}(dx) π^{(R)}(dy|x).
A similar reasoning shows that π_k^{(R)} converges to π^{(R)} as k → ∞, for any fixed R.
The plan is to first replace the original expressions by the expressions
with the superscript (R), and then to pass to the limit as k → ∞ for
fixed R. For that I will distinguish two cases.
Similarly,¹

| U^β_{π_k^{(R)},ν}(µ_k^{(R)}) − U^β_{π_k,ν}(µ_k) |
    ≤ C ( ‖ ρ_k^{(R)} − ρ_k ‖_{L¹(ν)} + | µ_{k,s}^{(R)}[X] − µ_{k,s}[X] | + (1/R²) ∫ d(z, y)² µ_{k,1}(dy) ).    (30.10)
Note that for k ≥ R, ρ_k^{(R)} = ρ^{(R)} and µ_{k,s}^{(R)} = µ_s^{(R)}. Then, in view of the definition of µ_k and the fact that ∫ d(z, y)² µ_{k,1}(dy) is bounded, we easily deduce from (30.9) and (30.10) that

lim_{R→∞} | U^β_{π^{(R)},ν}(µ^{(R)}) − U^β_{π,ν}(µ) | = 0;
lim_{R→∞} lim sup_{k→∞} | U^β_{π_k^{(R)},ν}(µ_k^{(R)}) − U^β_{π_k,ν}(µ_k) | = 0.

The interest of this reduction is that all probability measures µ_k^{(R)} (resp. π_k^{(R)}) are now supported in a common compact set, namely the closed ball B_{2R]}(z) (resp. B_{2R]}(z) × B_{2R]}(z)). Note that µ_k^{(R)} converges to µ^{(R)}. If k is large enough, µ_k^{(R)} = µ^{(R)}, so (30.11) becomes
In the sequel, I shall drop the superscript (R), so the goal will be
The argument now is similar to the one used in the proof of Theorem
29.20(iii). Define
g(x, y) = ( β(x, y) / ρ(x) ) U( ρ(x) / β(x, y) ),

with the convention that g(x, y) = U'(∞) when x ∈ Spt µ_s, and g(x, y) = U'(0) when ρ(x) = 0 and x ∉ Spt µ_s. Then
¹ Here again the notation might be confusing: µ_{k,s} stands for the singular part of µ_k, while µ_{k,1} is the second marginal of π_k.
U^β_{π_k,ν}(µ) = ∫∫ g(x, y) µ(dx) π_k(dy|x) = ∫ g(x, y) π_k(dx dy);    U^β_{π,ν}(µ) = ∫ g(x, y) π(dx dy).
Using ρ^{(R)} ≤ 2ρ, log(1/β) ≤ C d(x, y)² and reasoning as in the first case, we can bound the above expression by

C ∫ | ρ^{(R)}(x) − ρ(x) | log(2 + ρ(x)) ν(dx)
+ C ∫∫ | ρ^{(R)}(x) − ρ(x) | ( 1 + d(x, y)² ) π^{(R)}(dy|x) ν(dx)
+ C ∫∫ ρ(x) log(2 + ρ(x)) ( 1 − χ_R(y) ) π(dy|x) ν(dx)
+ C ∫∫ ρ(x) ( 1 + d(x, z)² ) ( 1 − χ_R(y) ) π(dy|x) ν(dx)

≤ C ∫ | ρ^{(R)}(x) − ρ(x) | log(2 + ρ(x)) ν(dx)
+ C (1 + D²) ∫ | ρ^{(R)}(x) − ρ(x) | ν(dx)
+ C ∫∫_{d(x,y)≥D} | ρ^{(R)}(x) − ρ(x) | ( 1 + d(x, y)² ) ( π(dy|x) + δ_z(dy) ) ν(dx)
+ C log(2 + M) ∫∫ ρ(x) ( 1 − χ_R(y) ) π(dy|x) ν(dx)
+ C ∫∫_{ρ(x)≥M} ρ(x) log(2 + ρ(x)) π(dy|x) ν(dx)
+ C (1 + D²) ∫∫ ρ(x) ( 1 − χ_R(y) ) π(dy|x) ν(dx)
+ C ∫∫_{d(x,z)≥D} ρ(x) ( 1 + d(x, z)² ) π(dy|x) ν(dx)

≤ C (1 + D²) ∫ | ρ^{(R)}(x) − ρ(x) | log(2 + ρ(x)) ν(dx)
+ C ∫_{d(x,y)≥D} ( 1 + d(x, y)² ) π(dx dy)
+ C ∫_{d(x,z)≥D} ( 1 + d(x, z)² ) π(dx dy)
+ C ( log(2 + M) + (1 + D²) ) ∫ ( 1 − χ_R(y) ) π(dx dy)
+ C ∫_{ρ≥M} ρ log(2 + ρ) dν.
Since d(x, y)² 1_{d(x,y)≥D} ≤ d(x, z)² 1_{d(x,z)≥D/2} + d(y, z)² 1_{d(y,z)≥D/2}, the above expression can in turn be bounded by

C (1 + D²) ∫ | ρ^{(R)} − ρ | log(2 + ρ) dν + C ∫_{d(z,x)≥D/2} ( 1 + d(x, z)² ) µ(dx)
+ C ∫_{d(z,y)≥D/2} ( 1 + d(y, z)² ) µ_1(dy) + C ∫_{ρ≥M} ρ log(2 + ρ) dν
+ C ( ( log(2 + M) + D² ) / R² ) ∫_{d(z,y)≥D/2} d(z, y)² µ_1(dy).
From that point on, the proof is similar to the one in the first case. (To prove that g ∈ L¹(B_{2R]}; C(B_{2R]})) one can use the fact that β is bounded from above and below by positive constants on B_{2R]} × B_{2R]}, and apply the same estimates as in the proof of Theorem 29.20(ii).)
Proof of Theorem 30.5. By an approximation theorem as in the proof of Proposition 29.12, we may restrict to the case when U is nonnegative; we may also assume that U is Lipschitz (in case N < ∞) or that it behaves at infinity like a r log r + b r (in case N = ∞). By approximating N by N' > N, we may also assume that the distortion coefficients β_t^{(K,N')}(x, y) are locally bounded and | log β_t^{(K,N')}(x, y) | = O(d(x, y)²).
Let (µk,0 )k∈N (resp. (µk,1 )k∈N ) be a sequence converging to µ0 (resp.
to µ1 ) and satisfying the conclusions of Theorem 30.6(ii). For each k
there is a Wasserstein geodesic (µk,t )0≤t≤1 and an associated coupling
πk of (µk,0 , µk,1 ) such that
U_ν(µ_{k,t}) ≤ (1 − t) U^{β_{1−t}^{(K,N)}}_{π_k,ν}(µ_{k,0}) + t U^{β_t^{(K,N)}}_{π̌_k,ν}(µ_{k,1}).    (30.15)
Further, let Π_k be a dynamical optimal transference plan such that (e_t)_# Π_k = µ_{k,t} and (e_0, e_1)_# Π_k = π_k. Since the sequence µ_{k,0} converges weakly to µ_0, its elements belong to a compact subset of P(X); the same is true of the measures µ_{k,1}. By Theorem 7.21 the families (µ_{k,t})_{0≤t≤1} belong to a compact subset of C([0, 1]; P(X)); and the dynamical optimal transference plans Π_k also belong to a compact subset of P(C([0, 1]; X)). So up to extraction of a subsequence we may assume that Π_k converges to some Π, (µ_{k,t})_{0≤t≤1} converges to some path (µ_t)_{0≤t≤1} (uniformly in t), and π_k converges to some π. Since the evaluation map is continuous, it is immediate that π = (e_0, e_1)_# Π and µ_t = (e_t)_# Π.
By Theorem 30.6(i), U_ν(µ_t) ≤ lim inf_{k→∞} U_ν(µ_{k,t}). Then, by construction (and Theorem 30.6(ii)),

lim sup_{k→∞} U^{β_{1−t}^{(K,N)}}_{π_k,ν}(µ_{k,0}) ≤ U^{β_{1−t}^{(K,N)}}_{π,ν}(µ_0);
lim sup_{k→∞} U^{β_t^{(K,N)}}_{π̌_k,ν}(µ_{k,1}) ≤ U^{β_t^{(K,N)}}_{π̌,ν}(µ_1).
The desired inequality (30.5) follows by plugging the above into (30.15).
so

U^{β_t^{(K,∞)}}_{π̌,ν}(µ_1) ≤ U_ν(µ_1) − λ(K, U) ( (1 − t²)/6 ) W_2(µ_0, µ_1)².    (30.16)

Similarly,

U^{β_{1−t}^{(K,∞)}}_{π,ν}(µ_0) ≤ U_ν(µ_0) − λ(K, U) ( (1 − (1 − t)²)/6 ) W_2(µ_0, µ_1)².    (30.17)

Then (30.6) follows from (30.5), (30.16), (30.17) and the identity

t (1 − t²)/6 + (1 − t) (1 − (1 − t)²)/6 = t(1 − t)/2.
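The closing identity is elementary algebra; a one-line numerical check (my verification, trivially redundant with the algebra):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 101)
lhs = t * (1 - t**2) / 6 + (1 - t) * (1 - (1 - t)**2) / 6
assert np.allclose(lhs, t * (1 - t) / 2)   # holds for all t in [0, 1]
```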
Brunn–Minkowski inequality
The next theorem can be taken as the first step to control volumes in
weak CD(K, N ) spaces:
Theorem 30.7 (Brunn–Minkowski inequality in weak CD(K, N) spaces). Let K ∈ R and N ∈ [1, ∞]. Let (X, d, ν) be a weak CD(K, N) space, let A_0, A_1 be two compact subsets of Spt ν, and let t ∈ (0, 1). Then:
• If N < ∞,

ν[ [A_0, A_1]_t ]^{1/N} ≥ (1 − t) ( inf_{(x_0,x_1)∈A_0×A_1} β_{1−t}^{(K,N)}(x_0, x_1)^{1/N} ) ν[A_0]^{1/N} + t ( inf_{(x_0,x_1)∈A_0×A_1} β_t^{(K,N)}(x_0, x_1)^{1/N} ) ν[A_1]^{1/N}.    (30.18)
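In the flat model case (30.18) reduces to the classical Brunn–Minkowski inequality: ℝ^N with Lebesgue measure is a CD(0, N) space, so β ≡ 1. A sketch for axis-aligned boxes (the boxes and parameters are arbitrary choices of mine): for A_0 = ∏_i [0, a_i] and A_1 = ∏_i [c_i, c_i + b_i], the barycenter set [A_0, A_1]_t is a box with side lengths (1 − t) a_i + t b_i.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
a = rng.uniform(0.5, 2.0, N)     # side lengths of A_0
b = rng.uniform(0.5, 2.0, N)     # side lengths of A_1
t = 0.3

vol_t = np.prod((1 - t) * a + t * b)     # volume of [A_0, A_1]_t
lower = ((1 - t) * np.prod(a) ** (1 / N) + t * np.prod(b) ** (1 / N)) ** N
assert vol_t + 1e-12 >= lower            # (30.18) with K = 0, beta = 1
```

For boxes the inequality is exactly the superadditivity of the geometric mean; equality holds when the boxes are homothetic (a = b up to scaling).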
Remark 30.8. The result fails if A0 , A1 are not assumed to lie in the
support of ν. (Take ν = δx0 , x1 = x0 , and A0 = {x0 }, A1 = {x1 }.)
This was the easy part, which does not need the CD(K, N) condition. To prove the lower bound, apply (30.18) with A_0 = {x}, A_1 = A: this results in

ν[ [x, A]_t ]^{1/N} ≥ t ( inf_{a∈A} β_t^{(K,N)}(x, a)^{1/N} ) ν[A]^{1/N}.
As t → 1, inf_{a∈A} β_t^{(K,N)}(x, a) converges to 1, so we may pass to the lim inf and recover

lim inf_{t→1} ν[ [x, A]_t ] ≥ ν[A].
Bishop–Gromov inequalities
Once we know that ν has no atom, we can get much more precise
information and control on the growth of the volume of balls, and in
particular prove sharp Bishop–Gromov inequalities for weak CD(K, N )
spaces with N < ∞:
ν[B_r(x_0)] / ∫_0^r s^{(K,N)}(t) dt    is a nonincreasing function of r;    (30.22)

ν[B_r(x_0)] ≤ e^{Cr} e^{(K_−) r²/2};    (30.23)

ν[B_{r+δ}(x_0) \ B_r(x_0)] ≤ e^{Cr} e^{−K r²/2}    if K > 0.    (30.24)
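On the model space itself the Bishop–Gromov ratio in (30.22) is exactly constant. For instance, on the round sphere S² (a CD(1, 2) space), ν[B_r] = 2π(1 − cos r) and s^{(1,2)}(t) = sin t. A small numerical check of this (my own verification, using midpoint-rule integration):

```python
import numpy as np

def model_ratio(r, n=4000):
    # vol(B_r) on S^2 divided by int_0^r s^{(1,2)}(t) dt, with s^{(1,2)}(t) = sin t
    ts = (np.arange(n) + 0.5) * r / n            # midpoint quadrature nodes
    return 2 * np.pi * (1 - np.cos(r)) / (np.sin(ts).sum() * r / n)

ratios = np.array([model_ratio(r) for r in np.linspace(0.1, 3.0, 30)])
assert np.allclose(ratios, 2 * np.pi, rtol=1e-4)   # constant ratio on the model
```

A constant ratio is in particular nonincreasing, which is the content of (30.22) in the case of equality.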
Before providing the proof of this theorem, I shall state three immediate but important corollaries, all of them in finite dimension.
r ≤ R.
Next, Corollary 30.12 and the definition of Hausdorff measure imply H^d[B(z, R)] = 0 for any d > N, where H^d stands for the d-dimensional
Proof of Theorem 30.11. The open ball (resp. closed ball) of center x0
and radius r in the space (Spt ν, d) coincides with B(x0 , r)∩Spt ν (resp.
B[x0 , r] ∩ Spt ν). So we may use Theorem 30.2 to reduce to the case
X = Spt ν.
Next, we may dismiss the case where ν is a Dirac mass as trivial;
then by Corollary 30.9, we may assume that ν has no atom.
Let x_0 ∈ X and r > 0. The open ball B_r(x_0) contains [x_0, B_{r]}(x_0)]_t for all t ∈ (0, 1). By Corollary 30.10,

ν[B_r(x_0)] ≥ lim_{t→1} ν[ [x_0, B_{r]}(x_0)]_t ] = ν[B_{r]}(x_0)],
Uniqueness of geodesics
bounded from above and below; then the integrability of Ψ(ρ) also implies the finiteness of ∫∫ β(x_0, x_1) Ψ( ρ(x_0)/β(x_0, x_1) ) π(dx_1|x_0) ν(dx_0), and the same reasoning as before applies.
To prove (ii), let U(r) = −N r^{1−1/N}; then U ∈ DC_N and U'(∞) = 0. Moreover, U_ν(µ) < 0 as soon as µ is absolutely continuous.
dynamical optimal transport plans Π with (e0 )# Π = µ0 , (e1 )# Π = µ1
which satisfy the CD(K, N ) displacement convexity inequality, choose
one with minimal Uν (µt0 ). There exists one such dynamical optimal
transport plan by compactness of the set of admissible transference
plans (Lemma 4.4) and lower semicontinuity of Uν (Theorem 30.6(i)).2
Assume that µ_{t_0} is not absolutely continuous, i.e. there exists a Borel set Z ⊂ X with ν[Z] = 0 and µ_{t_0}[Z] > 0. Let Π' be the restricted transport obtained by conditioning Π by the event "γ(t_0) ∈ Z". The measures (e_t)_# Π' = µ'_t satisfy the following properties: (a) µ'_t ≤ µ_t / µ_{t_0}[Z], and in particular µ'_0 is absolutely continuous; (b) µ'_{t_0} is concentrated on Z. So U_ν(µ'_{t_0}) = 0, but U^{β_{1−t_0}^{(K,N)}}_{π',ν}(µ'_0) < 0, and

U_ν(µ'_{t_0}) ≤ (1 − t_0) U^{β_{1−t_0}^{(K,N)}}_{π',ν}(µ'_0) + t_0 U^{β_{t_0}^{(K,N)}}_{π̌',ν}(µ'_1)    (30.26)

cannot hold.
On the other hand, there has to be some dynamical optimal transport plan Π'' such that (e_0)_# Π'' = µ_0, (e_1)_# Π'' = µ_1 and inequality (30.26) holds true with µ'_{t_0} replaced by µ''_{t_0} = (e_{t_0})_# Π''. In particular, U_ν(µ''_{t_0}) < U_ν(µ'_{t_0}) = 0, which implies that µ''_{t_0} is not purely singular.
Now consider the plan Π̃ defined by
² Here I am cheating a bit because Theorem 30.6, in the version which I have stated, assumes U(r) ≥ −c r. To deal with this issue, one could prove a more general version of Theorem 30.6; but a simpler remedy is to introduce ε > 0 and choose U(r) = −N r (r + ε)^{−1/N} instead of −N r^{1−1/N}. If µ_0 and µ_1 are compactly supported, one may keep the proof as it is and use Theorem 29.20(i) instead of Theorem 30.6(i).
= P[γ_{t_0} ∈ Z] ∫ d(γ_0, γ_1)² Π''(dγ) + ∫ 1_{γ_{t_0} ∉ Z} d(γ_0, γ_1)² Π(dγ)
    = ∫ d(γ_0, γ_1)² Π(dγ).

(To pass from the first to the second line I used the fact that Π' and Π'' are displacement optimal transference plans between the same two measures.)
It follows from (30.27) and the ν-negligibility of Z that ρ̃_{t_0} = ρ_{t_0} + a ρ''_{t_0}, where ρ̃_{t_0}, ρ_{t_0} and ρ''_{t_0} respectively stand for the densities of the absolutely continuous parts of µ̃_{t_0}, µ_{t_0} and µ''_{t_0}, and a = P[γ_{t_0} ∈ Z] > 0. Then from the minimality property of µ_{t_0},

∫ U(ρ_{t_0}(x)) dν(x) = U_ν(µ_{t_0}) ≤ U_ν(µ̃_{t_0}) = ∫ U( ρ_{t_0}(x) + a ρ''_{t_0}(x) ) dν(x).
If, say, µ0 is not purely singular, then the first term on the right-hand
side is negative, while the second one is nonpositive. It follows that
Uν (µt ) < 0, and therefore µt is not purely singular.
‖ρ_t‖^p_{L^p(ν)} ≤ U_ν(µ_t) ≤ (1 − t) ‖ρ_0‖^p_{L^p(ν)} + t ‖ρ_1‖^p_{L^p(ν)} ≤ max( ‖ρ_0‖_{L^p(ν)}, ‖ρ_1‖_{L^p(ν)} )^p.    (30.28)

also finite, and ρ_t belongs to L^p. Then take powers 1/p on both sides of (30.28) and pass to the limit as p → ∞, to recover

‖ρ_t‖_{L^∞(ν)} ≤ max( ‖ρ_0‖_{L^∞(ν)}, ‖ρ_1‖_{L^∞(ν)} ).
Remark 30.21. The above argument exploited the fact that in the definition of weak CD(K, N) spaces the displacement convexity inequality (29.11) is required to hold for all members of DC_N and along a common Wasserstein geodesic.
where |∇− ρ| is defined by (20.2) (one may also use |∇ρ| in place of
|∇− ρ|). With this notion, one has the following estimates:
Sobolev inequalities
Theorem 30.22 provides a Sobolev inequality of infinite-dimensional
nature; but in weak CD(K, N ) spaces with N < ∞, one also has access
to finite-dimensional Sobolev inequalities. An example is the following
statement.
Theorem 30.23 (Sobolev inequality in weak CD(K, N ) spaces).
Let (X , d, ν) be a weak CD(K, N ) space, where K < 0 and N ∈ [1, ∞).
Then, for any R > 0 there exist constants A = A(K, N, R) and
B = B(K, N, R) such that for any Lipschitz function u supported in a
ball B(z, R),
‖u‖_{L^{N/(N−1)}(ν)} ≤ A ‖∇⁻ u‖_{L¹(ν)} + B ‖u‖_{L¹(ν)}.    (30.31)
Diameter control
Recall from Proposition 29.11 that a weak CD(K, N ) space with K > 0
and N < ∞ satisfies the Bonnet–Myers diameter bound
diam(Spt ν) ≤ π √( (N − 1)/K ).
Slightly weaker conclusions can also be obtained under a priori
weaker assumptions: For instance, if X is at the same time a weak
CD(0, N ) space and a weak CD(K, ∞) space, then there is a universal
constant C such that

diam(Spt ν) ≤ C √( (N − 1)/K ).    (30.32)
See the bibliographical notes for more details.
Poincaré inequalities
|g(γ(1)) − g(γ(0))| ≤ ∫_0^1 |∇g|(γ(t)) |γ̇(t)| dt.
0
Talagrand inequalities
With logarithmic Sobolev inequalities comes a rich functional apparatus for treating concentration of measure. One may also get concentration
from curvature bounds CD(K, ∞) via Talagrand inequalities. As for
the links between logarithmic Sobolev and Talagrand inequalities, they
also remain true, at least under mild stringent regularity assumptions
on X :
Theorem 30.28 (Talagrand inequalities and weak curvature
bounds). (i) Let (X , d, ν) be a weak CD(K, ∞) space with K > 0.
Then ν lies in P2 (X ) and satisfies the Talagrand inequality T2 (K).
(ii) Let (X , d, ν) be a locally compact Polish geodesic space equipped
with a locally doubling measure ν, satisfying a local Poincaré inequality.
If ν satisfies a logarithmic Sobolev inequality for some constant K > 0,
then ν lies in P2 (X ) and satisfies the Talagrand inequality T2 (K).
(iii) Let (X , d, ν) be a locally compact Polish geodesic space. If ν
satisfies a Talagrand inequality T2 (K) for some K > 0, then it also
satisfies a global Poincaré inequality with constant K.
(iv) Let (X , d, ν) be a locally compact Polish geodesic space equipped
with a locally doubling measure ν, satisfying a local Poincaré inequality.
If ν satisfies a global Poincaré inequality, then it also satisfies a modi-
fied logarithmic Sobolev inequality and a quadratic-linear transportation
inequality as in Theorem 22.25.
Remark 30.29. In view of Corollary 30.14 and Theorem 30.26, the
regularity assumptions required in (ii) are satisfied if (X, d, ν) is a non-
branching weak CD(K′, N′) space for some K′ ∈ ℝ, N′ < ∞; note that
the values of K′ and N′ do not play any role in the conclusion.
Proof of Theorem 30.28. Part (i) is an immediate consequence of (30.25)
and (30.29) with µ0 = ν.
The proof of (ii) and (iii) is the same as the proof of Theorem 22.17,
once one has an analog of Proposition 22.16. It turns out that proper-
ties (i)–(vi) of Proposition 22.16 and Theorem 22.46 are still satisfied
when the Riemannian manifold M is replaced by any metric space X ,
but property (vii) might fail in general. Still, this property holds
for ν-almost all x, under the assumption that ν is locally
doubling and satisfies a local Poincaré inequality. See Theorem 30.30
below for a precise statement (and the bibliographical notes for refer-
ences). This is enough for the proof of Theorem 22.17 to go through.
876 30 Weak Ricci curvature bounds II: Geometric and analytic properties
The next statement makes this claim precise. Note that I leave aside
the case N = 1, which is special (for instance U1 is not defined). I shall
write \((U_N)_\nu = H_{N,\nu}\) and \((U_N)^\beta_{\pi,\nu} = H^\beta_{N,\pi,\nu}\). Recall Convention 17.10.
Remark 30.33. In the case N = 1, (30.35) does not make any sense,
but the equivalence (i) ⇔ (iii) still holds. This can be seen by working in
dimension N > 1 and letting N ↓ 1, as in the proof of Theorem 17.41.
already used in Theorem 19.4, that we may condition the optimal trans-
port to lie in a very small ball at time t, and, by passing to the limit,
retrieve a pointwise control of the density ρt . This will work because
the nonbranching property implies the uniqueness of the displacement
interpolation between intermediate times, and forbids the crossing of
geodesics used in the optimal transport, as in Theorem 7.30. Apart
from this simple idea, the proof is quite technical and can be skipped
at first reading.
Since all the measures µk,0 and µk,1 are supported in a uniform compact
set, Corollary 7.22 guarantees that the sequence (Πk )k∈N converges, up
to extraction, to some dynamical optimal transference plan Π with
(e0 )# Π = µ0 and (e1 )# Π = µ1 . Then µk,t converges weakly to µt =
(et )# Π, and πk := (e0 , e1 )# Πk converges weakly to π = (e0 , e1 )# Π. It
Equivalence of definitions in nonbranching spaces 879
\[ Z = \bigl\{\gamma \in \Gamma;\ \gamma_t \in B_\delta(y)\bigr\}, \]
In other words,
\[ \int_X (\rho_t)^{1-\frac1N}\, d\nu \;\ge\; (1-t)\int_{X\times X} \bigl(\rho_0(x_0)\bigr)^{-\frac1N}\,\beta_{1-t}(x_0,x_1)^{\frac1N}\,\pi(dx_0\,dx_1) \;+\; t\int_{X\times X} \bigl(\rho_1(x_1)\bigr)^{-\frac1N}\,\beta_t(x_0,x_1)^{\frac1N}\,\pi(dx_0\,dx_1). \tag{30.40} \]
\[ \frac{\nu[B_\delta(y)]^{1/N}}{\mu_t[B_\delta(y)]^{1/N}} \;\ge\; E^{\Pi}\left[\,(1-t)\left(\frac{\beta_{1-t}(\gamma_0,\gamma_1)}{\rho_0(\gamma_0)}\right)^{\!1/N} + t\left(\frac{\beta_t(\gamma_0,\gamma_1)}{\rho_1(\gamma_1)}\right)^{\!1/N} \;\middle|\; \gamma_t\in B_\delta(y)\right]. \]
If we define
\[ f(\gamma) := (1-t)\left(\frac{\beta_{1-t}(\gamma_0,\gamma_1)}{\rho_0(\gamma_0)}\right)^{\!1/N} + t\left(\frac{\beta_t(\gamma_0,\gamma_1)}{\rho_1(\gamma_1)}\right)^{\!1/N}, \]
then
\[ \frac{\nu[B_\delta(y)]^{1/N}}{\mu_t[B_\delta(y)]^{1/N}} \;\ge\; E^{\Pi}\bigl[f(\gamma)\,\big|\,\gamma_t\in B_\delta(y)\bigr] = \frac{E^{\Pi}\bigl[f(\gamma)\,1_{[\gamma_t\in B_\delta(y)]}\bigr]}{\mu_t[B_\delta(y)]}. \tag{30.41} \]
and this coincides with f(Ft(y)) if ρt(y) > 0. All in all, µt(dy)-almost
surely,
\[ \frac{1}{\rho_t(y)^{1/N}} \;\ge\; f(F_t(y)). \]
Equivalently, Π(dγ)-almost surely,
\[ \frac{1}{\rho_t(\gamma_t)^{1/N}} \;\ge\; f(F_t(\gamma_t)) = f(\gamma). \]
\[ \frac{1}{\rho_t(\gamma_t)^{1/N}} \;\ge\; (1-t)\left(\frac{\beta_{1-t}(\gamma_0,\gamma_1)}{\rho_0(\gamma_0)}\right)^{\!1/N} + t\left(\frac{\beta_t(\gamma_0,\gamma_1)}{\rho_1(\gamma_1)}\right)^{\!1/N}. \tag{30.43} \]
\[ \frac{1}{\rho_t(\gamma_t)^{1/N}} \;\ge\; \Bigl(1-\tfrac{t}{1-\varepsilon}\Bigr)\left(\frac{\beta_{1-\frac{t}{1-\varepsilon}}(\gamma_0,\gamma_{1-\varepsilon})}{\rho_0(\gamma_0)}\right)^{\!1/N} + \frac{t}{1-\varepsilon}\left(\frac{\beta_{\frac{t}{1-\varepsilon}}(\gamma_0,\gamma_{1-\varepsilon})}{\rho_{1-\varepsilon}(\gamma_{1-\varepsilon})}\right)^{\!1/N}. \tag{30.44} \]
\[ \frac{1}{\rho_{1-\varepsilon}(\gamma_{1-\varepsilon})^{1/N}} \;\ge\; \frac{\varepsilon}{1-t}\left(\frac{\beta_{\frac{\varepsilon}{1-t}}(\gamma_t,\gamma_1)}{\rho_t(\gamma_t)}\right)^{\!1/N} + \frac{1-t-\varepsilon}{1-t}\left(\frac{\beta_{\frac{1-t-\varepsilon}{1-t}}(\gamma_t,\gamma_1)}{\rho_1(\gamma_1)}\right)^{\!1/N}. \tag{30.45} \]
\[ \frac{1}{\rho_t(\gamma_t)^{1/N}} \;\ge\; \Bigl(1-\tfrac{t}{1-\varepsilon}\Bigr)\left(\frac{\beta_{1-\frac{t}{1-\varepsilon}}(\gamma_0,\gamma_{1-\varepsilon})}{\rho_0(\gamma_0)}\right)^{\!1/N} + \frac{t}{1-\varepsilon}\,\frac{1-t-\varepsilon}{1-t}\left(\frac{\beta_{\frac{1-t-\varepsilon}{1-t}}(\gamma_t,\gamma_1)\;\beta_{\frac{t}{1-\varepsilon}}(\gamma_0,\gamma_{1-\varepsilon})}{\rho_1(\gamma_1)}\right)^{\!1/N}. \]
as desired.
Step 3: Now we wish to establish inequality (30.36) in the case
when µt is absolutely continuous, that is, we just want to drop the
assumption of compact support.
It follows from Step 2 that (X , d, ν) is a weak CD(K, N ) space, so we
now have access to Theorem 30.19 even if µ0 and µ1 are not compactly
supported; and also we can appeal to Theorem 30.5 to guarantee that
Property (ii) is verified for probability measures that are not necessar-
ily compactly supported. Then we can repeat Steps 1 and 2 without
the assumption of compact support, and in the end establish inequal-
ity (30.36) for measures that are not compactly supported.
Step 4: Now we shall consider the case when µt is not absolutely
continuous. (This is the part of the proof which has interest even in
a smooth setting.) Let (µt )s stand for the singular part of µt , and
m := (µt )s [X ] > 0.
Let E (a) and E (s) be two disjoint Borel sets in X such that the
absolutely continuous part of µt is concentrated on E (a) , and the sin-
gular part of µt is concentrated on E (s) . Obviously, Π[γt ∈ E (s) ] =
(µt )s [X ] = m, and Π[γt ∈ E (a) ] = 1 − m. Let us decompose Π into
Π = (1 − m) Π (a) + m Π (s) , where
\[ \Pi^{(a)}(d\gamma) = \frac{1_{[\gamma_t\in E^{(a)}]}\,\Pi(d\gamma)}{\Pi[\gamma_t\in E^{(a)}]}, \qquad \Pi^{(s)}(d\gamma) = \frac{1_{[\gamma_t\in E^{(s)}]}\,\Pi(d\gamma)}{\Pi[\gamma_t\in E^{(s)}]}. \]
Further, for any s ∈ [0, 1], let
\[ \mu^{(a)}_s = (e_s)_\#\,\Pi^{(a)}, \qquad \mu^{(s)}_s = (e_s)_\#\,\Pi^{(s)}, \]
\[ \pi^{(a)} = (e_0,e_1)_\#\,\Pi^{(a)}, \qquad \pi^{(s)} = (e_0,e_1)_\#\,\Pi^{(s)}. \]
\[ U^{\beta_{1-t}}_{\pi,\nu}(\mu_0) = (U_m)^{\beta_{1-t}}_{\pi^{(a)},\nu}\bigl(\mu^{(a)}_0\bigr) + m\,U'(\infty). \tag{30.49} \]
Similarly,
\[ U^{\beta_t}_{\check\pi,\nu}(\mu_1) = (U_m)^{\beta_t}_{\check\pi^{(a)},\nu}\bigl(\mu^{(a)}_1\bigr) + m\,U'(\infty). \tag{30.50} \]
\[ \log\frac{1}{\rho_t(\gamma_t)} \;\ge\; (1-t)\log\frac{1}{\rho_0(\gamma_0)} + t\log\frac{1}{\rho_1(\gamma_1)} + \frac{K\,t(1-t)}{2}\,d(\gamma_0,\gamma_1)^2. \tag{30.51} \]
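In the flat case K = 0 on the real line, inequality (30.51) can be verified explicitly along the displacement interpolation of two centered Gaussians, where after simplification it reduces to the concavity of the logarithm. The sketch below is my own numerical illustration (not from the text), with hypothetical parameter values.

```python
import math

# K = 0 case of (30.51) on the real line: along the displacement
# interpolation between N(0, s0^2) and N(0, s1^2), the optimal map is
# linear, gamma_t(x) = (s_t/s0) x with s_t = (1-t) s0 + t s1, and the
# interpolant density is the N(0, s_t^2) Gaussian.
def logdens(s, y):  # log of the N(0, s^2) density at y
    return -0.5 * math.log(2 * math.pi * s * s) - y * y / (2 * s * s)

s0, s1 = 0.7, 1.9  # hypothetical endpoint standard deviations
for t in [0.1, 0.5, 0.9]:
    st = (1 - t) * s0 + t * s1
    for x in [-2.0, 0.3, 1.5]:
        g0, gt, g1 = x, st / s0 * x, s1 / s0 * x
        lhs = -logdens(st, gt)                  # log 1/rho_t(gamma_t)
        rhs = (1 - t) * (-logdens(s0, g0)) + t * (-logdens(s1, g1))
        assert lhs >= rhs - 1e-12               # (30.51) with K = 0
print("(30.51) verified in the flat Gaussian model")
```

The x-dependent terms cancel exactly, so the inequality boils down to log s_t ≥ (1 − t) log s0 + t log s1.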
At a technical level, there is a small simplification since it is not
necessary to treat singular measures (if µ is singular and U is not linear,
then according to Proposition 17.7(ii), U′(∞) = +∞, so Uν(µ) = +∞).
The family of balls {Br (z); z ∈ Bδ/2 (y); r ≤ δ/2} generates the Borel
σ-algebra of Bδ/2 (y), so (30.52) holds true for any measurable set S ⊂
Bδ/2 (y) instead of Br (z). Then we can pass to densities:
\[ \rho_t(z) \;\le\; \exp\Bigl(-\inf_{x\in B_\delta(y)} f(F_t(x))\Bigr) \qquad \text{almost surely in } B_{\delta/2}(y). \]
and define
\[ \widehat\Pi_k(d\gamma) = g_k(\gamma_0)\,\nu(d\gamma_0)\,\Pi(d\gamma|\gamma_0); \qquad \Pi_k = \frac{\widehat\Pi_k}{Z_k}. \]
Then Πk is a probability measure on geodesics, and since it has been
obtained from Π by restriction, it is actually the unique dynamical
optimal transference plan between the two probability measures µk,0 =
(e0 )# Πk and µk,1 = (e1 )# Πk . From the construction of Πk ,
\[ \mu_{k,0} = \rho_{k,0}\,\nu; \qquad \mu_{k,1} = \rho_{k,1}\,\nu; \qquad \rho_{k,1} \le \frac{\rho_1}{Z_k}. \]
Next we repeat the process at the other end: Let \((h_{k,\ell})_{\ell\in\mathbb{N}}\) be a
nondecreasing sequence of upper semicontinuous functions such that
\(0 \le h_{k,\ell} \le \rho_{k,1}\) and \(h_{k,\ell} \uparrow \rho_{k,1}\) (up to a set of zero measure). Define
\[ Z_{k,\ell} = \int h_{k,\ell}\,d\nu, \qquad \rho_{k,\ell,1} = \frac{h_{k,\ell}}{Z_{k,\ell}}; \]
\[ \widehat\Pi_{k,\ell}(d\gamma) = \Pi_k(d\gamma|\gamma_1)\,h_{k,\ell}(\gamma_1)\,\nu(d\gamma_1); \qquad \Pi_{k,\ell}(d\gamma) = \frac{\widehat\Pi_{k,\ell}(d\gamma)}{Z_{k,\ell}}. \]
Then
\[ Z_k\,\rho_{k,t} \uparrow \rho_t \quad\text{as } k\to\infty; \qquad Z_{k,\ell}\,\rho_{k,\ell,t} \uparrow \rho_{k,t} \quad\text{as } \ell\to\infty. \]
Locality
Locality is one of the most fundamental properties that one may expect
from any notion of curvature. In the setting of weak CD(K, N ) spaces,
the locality problem may be loosely formulated as follows: If (X , d, ν) is
weakly CD(K, N ) in the neighborhood of any of its points, then (X , d, ν)
should be a weakly CD(K, N ) space.
So far it is not known whether this “local-to-global” property holds
in general. However, it is true at least in a nonbranching space, if K = 0
and N < ∞. The validity of a more general statement may depend on
the following:
\[ f\bigl((1-\lambda)\,t + \lambda\,t'\bigr) \;\ge\; (1-\lambda)\left(\frac{\sin\bigl((1-\lambda)\,\alpha|t-t'|\bigr)}{(1-\lambda)\,\sin(\alpha|t-t'|)}\right)^{\!\theta} f(t) \;+\; \lambda\left(\frac{\sin\bigl(\lambda\,\alpha|t-t'|\bigr)}{\lambda\,\sin(\alpha|t-t'|)}\right)^{\!\theta} f(t') \tag{30.56} \]
Proof of Theorem 30.37. If we can treat the case N > 1, then the case
N = 1 will follow by letting N go to 1 (as in the proof of Theorem
29.24). So let us assume 1 < N < ∞. In the sequel, I shall use the
shorthand \(\beta_t = \beta_t^{(K,N)}\).
Let (X , d, ν) be a nonbranching local weak CD(K, N ) space. By re-
peating the proof of Theorem 30.32, we can show that for any x0 ∈ X
there is r = r(x0 ) > 0 such that (30.58) holds true along any displace-
ment interpolation (µt )0≤t≤1 which is supported in B(x0 , r). Moreover,
if Π is a dynamical optimal transference plan such that (et )# Π = µt ,
and each measure µt is absolutely continuous with density ρt , then
Π(dγ)-almost all geodesics will satisfy inequality (30.43), which I re-
cast below:
\[ \frac{1}{\rho_t(\gamma_t)^{1/N}} \;\ge\; (1-t)\left(\frac{\beta_{1-t}(\gamma_0,\gamma_1)}{\rho_0(\gamma_0)}\right)^{\!1/N} + t\left(\frac{\beta_t(\gamma_0,\gamma_1)}{\rho_1(\gamma_1)}\right)^{\!1/N}. \tag{30.59} \]
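For readers who want to experiment with inequalities such as (30.59), the reference distortion coefficients β_t^{(K,N)} are easy to implement. The formulas below restate the definition given earlier in the book for N > 1 and should be taken as an assumption of this sketch (my illustration, not part of this chapter's text).

```python
import math

# Reference distortion coefficients beta_t^{(K,N)} (N > 1): with
# alpha = sqrt(|K|/(N-1)) * d(x0, x1),
#   K > 0: (sin(t*alpha)  / (t*sin(alpha)))**(N-1)
#   K = 0: 1
#   K < 0: (sinh(t*alpha) / (t*sinh(alpha)))**(N-1)
def beta(t, K, N, d):
    if K == 0:
        return 1.0
    a = math.sqrt(abs(K) / (N - 1)) * d
    if K > 0:
        return (math.sin(t * a) / (t * math.sin(a))) ** (N - 1)
    return (math.sinh(t * a) / (t * math.sinh(a))) ** (N - 1)

t, N, d = 0.3, 4, 1.0
# Positive curvature inflates the coefficient, negative curvature
# deflates it, and both tend to the flat value 1 as K -> 0.
assert beta(t, 0.0, N, d) == 1.0
assert beta(t, 1.0, N, d) > 1.0 > beta(t, -1.0, N, d)
assert abs(beta(t, 1e-9, N, d) - 1.0) < 1e-6
assert abs(beta(t, -1e-9, N, d) - 1.0) < 1e-6
print("distortion coefficients behave as expected")
```

The monotonicity checks follow from sin(u)/u being decreasing and sinh(u)/u increasing on (0, π).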
Let µ0 , µ1 be any two compactly supported probability measures
on X , and let B = B(z, R) be a large ball such that any geodesic going
from Spt µ0 to Spt µ1 lies within B. Let Π be a dynamical optimal
transference plan between µ0 and µ1 . The goal is to prove that for all
U ∈ DCN ,
\[ U_\nu(\mu_t) \;\le\; (1-t)\,U^{\beta_{1-t}}_{\pi,\nu}(\mu_0) + t\,U^{\beta_t}_{\check\pi,\nu}(\mu_1). \tag{30.60} \]
The plan is to cut Π into very small pieces, each of which will
be included in a sufficiently small ball that the local weak CD(K, N )
criterion can be used. I shall first proceed to construct these small
pieces.
Cover the closed ball B[z, R] by a finite number of balls B(xj , rj /3)
with rj = r(xj ), and let r := inf(rj /3). For any y ∈ B[z, R], the
ball B(y, r) lies inside some B(xj , rj ); so if (µt )0≤t≤1 is any displace-
ment interpolation supported in some ball B(y, r), Π is an associated
dynamical optimal transference plan, and µ0 , µ1 are absolutely contin-
uous, then the density ρt of µt will satisfy the inequality
\[ \frac{1}{\rho_t(\gamma_t)^{1/N}} \;\ge\; (1-t)\left(\frac{\beta_{1-t}(\gamma_0,\gamma_1)}{\rho_0(\gamma_0)}\right)^{\!1/N} + t\left(\frac{\beta_t(\gamma_0,\gamma_1)}{\rho_1(\gamma_1)}\right)^{\!1/N}, \tag{30.61} \]
The sets \(\Gamma_\ell\) are disjoint. We discard the sequences ℓ such that \(\Pi[\Gamma_\ell] = 0\).
Then let \(Z_\ell = \Pi[\Gamma_\ell]\), and let
\[ \Pi_\ell = \frac{1_{\Gamma_\ell}\,\Pi}{Z_\ell}. \]
∀k ∈ {0, . . . , m − 2},
\(\Pi_\ell(d\gamma)\)-almost surely, ∀t ∈ [0, 1], \(\forall (t_0, t_1) \in [k\delta, (k+2)\delta]^2\),
\[ \frac{1}{\rho_{\ell,(1-t)t_0+t t_1}\bigl(\gamma_{(1-t)t_0+t t_1}\bigr)^{1/N}} \;\ge\; (1-t)\left(\frac{\beta_{1-t}(\gamma_{t_0},\gamma_{t_1})}{\rho_{\ell,t_0}(\gamma_{t_0})}\right)^{\!1/N} + t\left(\frac{\beta_t(\gamma_{t_0},\gamma_{t_1})}{\rho_{\ell,t_1}(\gamma_{t_1})}\right)^{\!1/N}. \tag{30.62} \]
\[ \frac{1}{\rho_{\ell,t}(\gamma_t)^{1/N}} \;\ge\; (1-t)\left(\frac{\beta_{1-t}(\gamma_0,\gamma_1)}{\rho_{\ell,0}(\gamma_0)}\right)^{\!1/N} + t\left(\frac{\beta_t(\gamma_0,\gamma_1)}{\rho_{\ell,1}(\gamma_1)}\right)^{\!1/N}. \tag{30.63} \]
\[ X_K := \bigl\{\gamma_t;\ 0\le t\le 1;\ \gamma_0\in K,\ \gamma_1\in K\bigr\} \]
and
\[ \forall k\in\mathbb{N},\ \forall j\in\mathbb{N}\ (j\le 1/\delta), \qquad \sup \rho^{(k)}_{j\delta} < +\infty, \tag{30.69} \]
where the supremum really is an essential supremum, and \(\rho^{(k)}_t\) is the
density of \(\mu^{(k)}_t = (e_t)_\#\,\Pi^{(k)}\) with respect to ν.
If we can do this, then by repeating the proof of Theorem 30.37 we
shall obtain
\[ H_\nu\bigl(\mu^{(k)}_t\bigr) \;\le\; (1-t)\, H^{\beta^{(K,\infty)}_{1-t}}_{\pi_k,\nu}\bigl(\mu^{(k)}_0\bigr) + t\, H^{\beta^{(K,\infty)}_t}_{\check\pi_k,\nu}\bigl(\mu^{(k)}_1\bigr). \]
\[ Z^{k_1} = \widehat\Pi^{k_1}[\Gamma]; \qquad \Pi^{k_1} = \frac{\widehat\Pi^{k_1}}{Z^{k_1}}. \]
As k₁ goes to infinity, it is clear that \(Z^{k_1} \uparrow 1\) (in particular, we may as-
sume without loss of generality that \(Z^{k_1} > 0\)) and \(Z^{k_1}\,\Pi^{k_1} \uparrow \Pi\). More-
over, if \(\rho^{k_1}_t\) stands for the density of \((e_t)_\#\,\Pi^{k_1}\), then \(\rho^{k_1}_\delta = (Z^{k_1})^{-1}\,h^{k_1}_\delta\)
is bounded.
Now comes the second step: For each k₁, let \((h^{k_1,k_2}_{2\delta})_{k_2\in\mathbb{N}}\) be a non-
decreasing sequence of bounded functions converging almost surely to
\(\rho^{k_1}_{2\delta}\) as k₂ → ∞. Let
\[ \widehat\Pi^{k_1,k_2}(d\gamma) = \bigl(h^{k_1,k_2}_{2\delta}\,\nu\bigr)(d\gamma_{2\delta})\;\Pi^{k_1}(d\gamma|\gamma_{2\delta}), \]
\[ Z^{k_1,k_2} = \widehat\Pi^{k_1,k_2}[\Gamma], \qquad \Pi^{k_1,k_2} = \frac{\widehat\Pi^{k_1,k_2}}{Z^{k_1,k_2}}, \]
and let \(\rho^{k_1,k_2}_t\) stand for the density of \((e_t)_\#\,\Pi^{k_1,k_2}\). Then \(\rho^{k_1,k_2}_\delta \le (Z^{k_1,k_2})^{-1}\rho^{k_1}_\delta = (Z^{k_1,k_2} Z^{k_1})^{-1} h^{k_1}_\delta\) and \(\rho^{k_1,k_2}_{2\delta} = (Z^{k_1,k_2})^{-1} h^{k_1,k_2}_{2\delta}\) are
both bounded.
Then repeat the process: If \(\Pi^{k_1,\dots,k_j}\) has been constructed for any
k₁, …, k_j in ℕ, introduce a nondecreasing sequence \((h^{k_1,\dots,k_{j+1}}_{(j+1)\delta})_{k_{j+1}\in\mathbb{N}}\)
of bounded functions converging almost surely to \(\rho^{k_1,\dots,k_j}_{(j+1)\delta}\) as k_{j+1} → ∞; define
\[ \widehat\Pi^{k_1,\dots,k_{j+1}}(d\gamma) = \bigl(h^{k_1,\dots,k_{j+1}}_{(j+1)\delta}\,\nu\bigr)(d\gamma_{(j+1)\delta})\;\Pi^{k_1,\dots,k_j}(d\gamma|\gamma_{(j+1)\delta}), \]
\[ Z^{k_1,\dots,k_{j+1}} = \widehat\Pi^{k_1,\dots,k_{j+1}}[\Gamma], \qquad \Pi^{k_1,\dots,k_{j+1}} = \widehat\Pi^{k_1,\dots,k_{j+1}}/Z^{k_1,\dots,k_{j+1}}, \]
\[ \mu^{k_1,\dots,k_{j+1}}_t = (e_t)_\#\,\Pi^{k_1,\dots,k_{j+1}}. \]
Then for any t ∈ {δ, 2δ, …, (j + 1)δ}, the density \(\rho^{k_1,\dots,k_{j+1}}_t\) of \(\mu^{k_1,\dots,k_{j+1}}_t\)
is bounded.
After m operations this process has constructed \(\Pi^{k_1,\dots,k_m}\) such that
all densities \(\rho^{k_1,\dots,k_m}_{j\delta}\) are bounded. The proof is completed by choosing
\[ \Pi^{(k)} = \Pi^{k,\dots,k}, \qquad Z^{(k)} = Z^{k}\cdot Z^{k,k}\cdot\ldots\cdot Z^{k,\dots,k}. \]
Bibliographical notes
Most of the material in this chapter comes from papers by Lott and
myself [577, 578, 579] and by Sturm [762, 763]. Some of the results
are new. Prior to these references, there had been an important se-
ries of papers by Cheeger and Colding [227, 228, 229, 230], with a
follow-up by Ding [303], about the structure of measured Gromov–
Hausdorff limits of sequences of Riemannian manifolds satisfying a uni-
form CD(K, N ) bound. Some of the results by Cheeger and Colding can
be re-interpreted in the present framework, but for many other theo-
rems this remains open: Examples are the generalized splitting theorem
[227, Theorem 6.64], the theorem of mutual absolute continuity of ad-
missible reference measures [230, Theorem 4.17], or the theorem of
continuity of the volume in absence of collapsing [228, Theorem 5.9].
(A positive answer to the problem raised in Remark 30.16 would solve
the latter issue.)
Theorem 30.2 is taken from work by Lott and myself [577] as well
as Corollary 30.9, Theorems 30.22 and 30.23, and the first part of The-
orem 30.28. Theorem 30.7, Corollary 30.10, Theorems 30.11 and 30.17
are due to Sturm [762, 763]. Part (i) of Theorem 30.19 was proven by
Lott and myself in the case K = 0. Part (ii) follows a scheme of proof
communicated to me by Sturm. In a Euclidean context, Theorem 30.20
is well-known to specialists and used in several recent works about
optimal transport; I don’t know who first made this observation.
The Poincaré inequalities appearing in Theorems 30.25 and 30.26
(in the case K = 0) are due to Lott and myself [578]. The concept of
upper gradient was put forward by Heinonen and Koskela [470] and
other authors; it played a key role in Cheeger’s construction [226] of a
differentiable structure on metric spaces satisfying a doubling condition
and a local Poincaré inequality. Independently of [578], there were sev-
eral simultaneous treatments of local Poincaré inequalities under weak
CD(K, N ) conditions, by Sturm [763] on the one hand, and von Renesse
[825] on the other. The proofs in all of these works have many com-
mon points, and also common features with the proof by Cheeger and
Colding [230]. But the argument by Cheeger and Colding uses another
inequality called the “segment inequality” [227, Theorem 2.11], which
as far as I know has not been adapted to the context of metric-measure
spaces. In [578] on the other hand we used the concept of “democratic
condition”, as in Theorem 19.13.
All these notions (possibly coupled with a doubling condition) are
stable under the Gromov–Hausdorff limit: this was checked in [226, 510,
529] for the Poincaré inequality, in [230] for the segment inequality, and
in [578] for the democratic condition.
Theorem 30.28(ii) is due to Lott and myself [579]; it uses Proposition
30.30 with L(d) = d2 /2. In this particular case (and under a nonessen-
tial compactness assumption), a complete proof of Proposition 30.30
can be found in [579]. It is also shown there that the conclusions
of Proposition 22.16 all remain true if (X , d) is a finite-dimensional
Alexandrov space with curvature bounded below; this is a pointwise
property, as opposed to the “almost everywhere” statement appearing
in Proposition 30.30(vii). As for the proof of this almost everywhere
result, it is based on Cheeger’s generalized Rademacher theorem [226,
Theorem 10.2]. (The argument is written down in [579] only for the
case L(d) = d2 /2 but this is no big deal.)
• About the lifting problem, one might first think that it follows
from the locality, as stated for instance in Theorems 30.37 or 30.42.
But even in situations where CD(K, N) has been shown to be local
(say K = 0 and X is nonbranching), the existence of the universal
covering is not obvious. Abstract topology shows that the existence
of the universal covering of X is equivalent to X being semi-locally
simply connected (“délaçable” in the terminology of Bourbaki). This
property is satisfied if X is locally contractible, that is, every point x
has a neighborhood which can be contracted into x. For instance, an
Alexandrov space with curvature bounded below is locally contractible
because any point x has a neighborhood which is homeomorphic to the
tangent cone at x; no such theory is known for weak CD(K, N ) spaces.
(All this was explained to me by Lott.)
Here are some technical notes to conclude.
An introduction to Hausdorff measure and Hausdorff dimension can
be found, e.g., in Falconer’s broad-audience books [337, 338].
The Dunford–Pettis theorem provides a sufficient condition for uni-
form equi-integrability: If a family F ⊂ L1 (ν) is weakly sequentially
compact in L1(ν), then there exists a function Ψ : R+ → R+ such that
Ψ(r)/r → +∞ as r → ∞ and \(\sup_{f\in F}\int \Psi(f)\,d\nu < +\infty\). A proof can
be found, e.g., in [177, Theorem 2.12] (there the theorem is stated in
Rn but the proof is the same in a more general space), or in my own
course on integration [819, Section VII-5].
Urysohn’s lemma [318, Theorem 2.6.3] states the following: If (X , d)
is a locally compact metric space (or even just a locally compact Haus-
dorff space), K is a compact subset of X and O is an open subset of X
with K ⊂ O, then there is f ∈ Cc (X ) with 1K ≤ f ≤ 1O .
Analysis on metric spaces (in terms of regularity, Sobolev spaces,
etc.) has undergone rapid development in the past ten years, after the
pioneering works by Hajlasz and Koskela [459, 461] and others. Among
dozens and dozens of papers, I shall only quote two reference books [37,
469] and a survey paper [460]; see also the bibliographical notes of
Chapter 26 for more references about analysis in Alexandrov spaces.
The thesis developed in the present course is that optimal transport
has suddenly become an important actor in this theory.
Conclusions and open problems
I did not include this theorem in the body of these notes, because it
appeals to some results that have not yet been adapted to a genuinely
geometric context, and which I preferred not to discuss. I shall sketch
the proof at the end of this text, but first I would like to explain why
this result is at the same time motivating, and a bit shocking:
(a) As pointed out to me by John Lott, if ∥·∥ is not Euclidean,
then the metric-measure space (ℝⁿ, ∥·∥, λₙ) cannot be realized as a
limit of smooth Riemannian manifolds with a uniform CD(0, N) bound,
because it fails to satisfy the splitting principle. (If a nonnegatively
curved space admits a line, i.e. a geodesic parametrized by R, then the
space can be “factorized” by this geodesic.) Results by Jeff Cheeger,
Toby Colding and Detlef Gromoll say that the splitting principle holds
for CD(0, N ) manifolds and their measured Gromov–Hausdorff limits.
(b) If ∥·∥ is not the Euclidean norm, the resulting metric space
is very singular in certain respects: It is in general not an Alexandrov
space, and it can be extremely branching. For instance, if ∥·∥ is the
ℓ∞ norm, then any two distinct points are joined by an uncountable
is one of the topics that Donald Knuth has planned to address in his
long-awaited Volume 4 of The Art of Computer Programming.
Needless to say, the theory might also decide to explore new horizons
which I am unable to foresee.
\[ \lambda\, I_n \;\le\; \nabla^2 (N^2) \;\le\; \Lambda\, I_n \]
for some positive constants λ and Λ. Then the cost function c(x, y) =
N(x − y)² is both strictly convex and C^{1,1}, i.e. uniformly semiconcave.
This makes it possible to apply Theorem 10.28 (recall Example 10.35)
and deduce the following theorem about the structure of optimal maps:
If µ0 and µ1 are compactly supported and absolutely continuous, then
there is a unique optimal transport, and it takes the form
\(T(x) = x - \nabla(N^2)^*(-\nabla\psi(x))\).
Since the norm is uniformly convex, geodesic lines are just straight
lines; so the displacement interpolation takes the form (Tt )# (ρ0 λn ),
where
\[ T_t(x) = x - t\,\nabla(N^2)^*\bigl(-\nabla\psi(x)\bigr), \qquad 0\le t\le 1. \]
Let θ(x) = ∇(N 2 )∗ (−∇ψ(x)). By [814, Remark 2.56], the Jacobian
matrix ∇θ, although not symmetric, is pointwise diagonalizable, with
eigenvalues bounded above by 1 (this remark goes back at least to a
1996 preprint by Otto [666, Proposition A.4]; a more general statement
is in [30, Theorem 6.2.7]). It follows easily that \(t \mapsto \det(I_n - t\,\nabla\theta)^{1/n}\)
is a concave function of t [814, Lemma 5.21], and one can reproduce
the proof of displacement convexity for \(U_{\lambda_n}\), as soon as U ∈ DCn [814,
Theorem 5.15(i)].
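The concavity claim can be probed numerically in the Euclidean model: since det(Iₙ − t∇θ) depends only on the (real, ≤ 1) eigenvalues of ∇θ, it is a product of affine factors that are positive on [0, 1), and the 1/n-th power of such a product is concave (a geometric mean of nonnegative affine functions). The sketch below is my own illustration, with randomly chosen eigenvalues standing in for the spectrum of ∇θ.

```python
import random

# t -> det(I_n - t*A)^(1/n) for a diagonalizable A with real eigenvalues
# <= 1 (the situation of the Jacobian matrix grad(theta) above).  The
# determinant depends only on the spectrum, so det(I - tA) is the
# product of the affine factors (1 - t*lam_i), each positive on [0, 1).
random.seed(0)
n = 5
eigs = [1 - random.random() * 2 for _ in range(n)]  # eigenvalues <= 1

def g(t):
    prod = 1.0
    for lam in eigs:
        prod *= 1 - t * lam
    return prod ** (1.0 / n)

# midpoint-concavity check on a grid of [0, 1)
ts = [i / 100 for i in range(100)]
for a in ts:
    for b in ts:
        assert g((a + b) / 2) >= (g(a) + g(b)) / 2 - 1e-12
print("t -> det(I - t*A)^(1/n) is midpoint concave on [0, 1)")
```

Midpoint concavity plus continuity gives concavity, which is what the displacement convexity argument uses.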
This shows that (Rn , N, λn ) satisfies the CD(0, n) displacement con-
vexity inequalities when N is a smooth uniformly convex norm. Now if
N is arbitrary, it can be approximated by a sequence (Nk )k∈N of smooth
uniformly convex norms, in such a way that (Rn , N, λn , 0) is the pointed
measured Gromov–Hausdorff limit of (Rn , Nk , λn , 0) as k → ∞. Then
the general conclusion follows by stability of the weak CD(0, n) crite-
rion (Theorem 29.24).
1. Abdellaoui, T., and Heinich, H. Sur la distance de deux lois dans le cas
vectoriel. C. R. Acad. Sci. Paris Sér. I Math. 319, 4 (1994), 397–400.
2. Abdellaoui, T., and Heinich, H. Caractérisation d’une solution optimale
au problème de Monge–Kantorovitch. Bull. Soc. Math. France 127, 3 (1999),
429–443.
3. Agrachev, A., and Lee, P. Optimal transportation under nonholonomic
constraints. To appear in Trans. Amer. Math. Soc.
4. Agueh, M. Existence of solutions to degenerate parabolic equations via the
Monge–Kantorovich theory. Adv. Differential Equations 10, 3 (2005), 309–360.
5. Agueh, M., Ghoussoub, N., and Kang, X. Geometric inequalities via a
general comparison principle for interacting gases. Geom. Funct. Anal. 14, 1
(2004), 215–244.
6. Ahmad, N. The geometry of shape recognition via the Monge–Kantorovich
optimal transport problem. PhD thesis, Univ. Toronto, 2004.
7. Aida, S. Uniform positivity improving property, Sobolev inequalities, and
spectral gaps. J. Funct. Anal. 158, 1 (1998), 152–185.
8. Ajtai, M., Komlós, J., and Tusnády, G. On optimal matchings. Combi-
natorica 4, 4 (1984), 259–264.
9. Alberti, G. On the structure of singular sets of convex functions. Calc. Var.
Partial Differential Equations 2, 1 (1994), 17–27.
10. Alberti, G. Some remarks about a notion of rearrangement. Ann. Scuola
Norm. Sup. Pisa Cl. Sci (4) 29, 2 (2000), 457–472.
11. Alberti, G., and Ambrosio, L. A geometrical approach to monotone func-
tions in Rn . Math. Z. 230, 2 (1999), 259–316.
12. Alberti, G., Ambrosio, L., and Cannarsa, P. On the singularities of
convex functions. Manuscripta Math. 76, 3-4 (1992), 421–435.
13. Alberti, G., and Bellettini, G. A nonlocal anisotropic model for phase
transitions. I: The optimal profile problem. Math. Ann. 310, 3 (1998), 527–560.
14. Albeverio, S., and Cruzeiro, A. B. Global flows with invariant (Gibbs)
measures for Euler and Navier–Stokes two-dimensional fluids. Comm. Math.
Phys. 129, 3 (1990), 431–444.
15. Alesker, S., Dar, S., and Milman, V. D. A remarkable measure preserving
diffeomorphism between two convex bodies in Rn . Geom. Dedicata 74, 2 (1999),
201–212.
916 References
37. Ambrosio, L., and Tilli, P. Topics on analysis in metric spaces, vol. 25 of
Oxford Lecture Series in Mathematics and its Applications. Oxford University
Press, Oxford, 2004.
38. Andres, S., and von Renesse, M.-K. Particle approximation of the
Wasserstein diffusion. Preprint, 2007.
39. Andreu, F., Caselles, V., and Mazón, J. M. The Cauchy problem for
a strongly degenerate quasilinear equation. J. Eur. Math. Soc. (JEMS) 7, 3
(2005), 361–393.
40. Andreu, F., Caselles, V., Mazón, J., and Moll, S. Finite propagation
speed for limited flux diffusion equations. Arch. Rational Mech. Anal. 182, 2
(2006), 269–297.
41. Ané, C., Blachère, S., Chafaı̈, D., Fougères, P., Gentil, I., Malrieu,
F., Roberto, C., and Scheffer, G. Sur les inégalités de Sobolev logarith-
miques, vol. 10 of Panoramas et Synthèses. Société Mathématique de France,
2000.
42. Appell, P. Mémoire sur les déblais et les remblais des systèmes continus ou
discontinus. Mémoires présentés par divers Savants à l’Académie des Sci-
ences de l’Institut de France, Paris No. 29 (1887), 1–208. Available online at
gallica.bnf.fr.
43. Arnold, A., Markowich, P., Toscani, G., and Unterreiter, A. On log-
arithmic Sobolev inequalities and the rate of convergence to equilibrium for
Fokker–Planck type equations. Comm. Partial Differential Equations 26, 1–2
(2001), 43–100.
44. Arnol′d, V. I. Mathematical methods of classical mechanics, vol. 60 of Grad-
uate Texts in Mathematics. Springer-Verlag, New York. Translated from the
1974 Russian original by K. Vogtmann and A. Weinstein. Corrected reprint of
the second (1989) edition.
45. Aronson, D. G., and Bénilan, Ph. Régularité des solutions de l’équation
des milieux poreux dans RN . C. R. Acad. Sci. Paris Sér. A-B 288, 2 (1979),
A103–A105.
46. Aronsson, G. A mathematical model in sand mechanics: Presentation and
analysis. SIAM J. Appl. Math. 22 (1972), 437–458.
47. Aronsson, G., and Evans, L. C. An asymptotic model for compression
molding. Indiana Univ. Math. J. 51, 1 (2002), 1–36.
48. Aronsson, G., Evans, L. C., and Wu, Y. Fast/slow diffusion and growing
sandpiles. J. Differential Equations 131, 2 (1996), 304–335.
49. Attouch, L., and Soubeyran, A. From procedural rationality to routines:
A “Worthwile to Move” approach of satisficing with not too much sacrificing.
Preprint, 2005. Available online at www.gate.cnrs.fr/seminaires/2006 2007.
50. Aubry, P. Monge, le savant ami de Napoléon Bonaparte, 1746–1818.
Gauthier-Villars, Paris, 1954.
51. Aubry, S. The twist map, the extended Frenkel–Kontorova model and the
devil’s staircase. In Order in chaos (Los Alamos, 1982), Phys. D 7, 1–3 (1983),
240–258.
52. Aubry, S., and Le Daeron, P. Y. The discrete Frenkel–Kontorova model
and its extensions, I. Phys. D. 8, 3 (1983), 381–422.
53. Bakelman, I. J. Convex analysis and nonlinear geometric elliptic equations.
Edited by S. D. Taliaferro. Springer-Verlag, Berlin, 1994.
75. Barthe, F., Cattiaux, P., and Roberto, C. Interpolated inequalities be-
tween exponential and Gaussian, Orlicz hypercontractivity and application to
isoperimetry. Rev. Mat. Iberoamericana 22, 3 (2006), 993–1067.
76. Barthe, F., and Kolesnikov, A. Mass transport and variants of the loga-
rithmic Sobolev inequality. Preprint, 2008.
77. Barthe, F., and Roberto, C. Sobolev inequalities for probability measures
on the real line. Studia Math. 159, 3 (2003), 481–497.
78. Beckner, W. A generalized Poincaré inequality for Gaussian measures. Proc.
Amer. Math. Soc. 105, 2 (1989), 397–400.
79. Beiglböck, M., Goldstern, M., Maresch, G., and Schachermayer, W.
Optimal and better transport plans. Preprint, 2008. Available online at
www.fam.tuwien.ac.at/~wschach/pubs.
80. Bell, E. T. Men of Mathematics. Simon and Schuster, 1937.
81. Benachour, S., Roynette, B., Talay, D., and Vallois, P. Nonlinear self-
stabilizing processes. I. Existence, invariant probability, propagation of chaos.
Stochastic Process. Appl. 75, 2 (1998), 173–201.
82. Benachour, S., Roynette, B., and Vallois, P. Nonlinear self-stabilizing
processes. II. Convergence to invariant probability. Stochastic Process. Appl.
75, 2 (1998), 203–224.
83. Benaı̈m, M., Ledoux, M., and Raimond, O. Self-interacting diffusions.
Probab. Theory Related Fields 122, 1 (2002), 1–41.
84. Benaı̈m, M., and Raimond, O. Self-interacting diffusions. II. Convergence in
law. Ann. Inst. H. Poincaré Probab. Statist. 39, 6 (2003), 1043–1055.
85. Benaı̈m, M., and Raimond, O. Self-interacting diffusions. III. Symmetric
interactions. Ann. Probab. 33, 5 (2005), 1717–1759.
86. Benamou, J.-D. Transformations conservant la mesure, mécanique des flu-
ides incompressibles et modèle semi-géostrophique en météorologie. Mémoire
présenté en vue de l’Habilitation à Diriger des Recherches. PhD thesis, Univ.
Paris-Dauphine, 1992.
87. Benamou, J.-D., and Brenier, Y. Weak existence for the semigeostrophic
equations formulated as a coupled Monge–Ampère/transport problem. SIAM
J. Appl. Math. 58, 5 (1998), 1450–1461.
88. Benamou, J.-D., and Brenier, Y. A numerical method for the optimal
time-continuous mass transport problem and related problems. In Monge
Ampère equation: applications to geometry and optimization (Deerfield Beach,
FL, 1997). Amer. Math. Soc., Providence, RI, 1999, pp. 1–11.
89. Benamou, J.-D., and Brenier, Y. A computational fluid mechanics solution
to the Monge–Kantorovich mass transfer problem. Numer. Math. 84, 3 (2000),
375–393.
90. Benamou, J.-D., and Brenier, Y. Mixed L2 -Wasserstein optimal mapping
between prescribed density functions. J. Optim. Theory Appl. 111, 2 (2001),
255–271.
91. Benamou, J.-D., Brenier, Y., and Guittet, K. The Monge–Kantorovitch
mass transfer and its computational fluid mechanics formulation. ICFD Con-
ference on Numerical Methods for Fluid Dynamics (Oxford, 2001). Internat.
J. Numer. Methods Fluids 40, 1–2 (2002), 21–30.
92. Benedetto, D., Caglioti, E., Carrillo, J. A., and Pulvirenti, M.
A non-Maxwellian steady distribution for one-dimensional granular media.
J. Statist. Phys. 91, 5–6 (1998), 979–990.
173. Brézis, H., and Lieb, E. H. Sobolev inequalities with a remainder term.
J. Funct. Anal. 62 (1985), 73–86.
174. Burago, D., Burago, Y., and Ivanov, S. A course in metric geometry,
vol. 33 of Graduate Studies in Mathematics. American Mathematical Society,
Providence, RI, 2001. Errata list available online at www.pdmi.ras.ru/staff/
burago.html.
175. Burago, Y., Gromov, M., and Perel′man, G. A. D. Aleksandrov spaces
with curvatures bounded below. Uspekhi Mat. Nauk 47, 2(284) (1992), 3–51,
222.
176. Burago, Y. D., and Zalgaller, V. A. Geometric inequalities. Springer-
Verlag, Berlin, 1988. Translated from the Russian by A. B. Sosinskiı̆, Springer
Series in Soviet Mathematics.
177. Buttazzo, G., Giaquinta, M., and Hildebrandt, S. One-dimensional vari-
ational problems. An introduction, vol. 15 of Oxford Lecture Series in Math-
ematics and its Applications. The Clarendon Press Oxford University Press,
New York, 1998.
178. Buttazzo, G., Pratelli, A., and Stepanov, E. Optimal pricing policies for
public transportation networks. SIAM J. Optim. 16, 3 (2006), 826–853.
179. Buttazzo, G., Pratelli, A., Solimini, S., and Stepanov, E. Optimal ur-
ban networks via mass transportation. Preprint, 2007. Available online via
cvgmt.sns.it.
180. Buttazzo, G., and Santambrogio, F. A model for the optimal planning of
an urban area. SIAM J. Math. Anal. 37, 2 (2005), 514–530.
181. Cabré, X. Nondivergent elliptic equations on manifolds with nonnegative
curvature. Comm. Pure Appl. Math. 50, 7 (1997), 623–665.
182. Cabré, X. The isoperimetric inequality via the ABP method. Preprint, 2005.
183. Cáceres, M., and Toscani, G. Kinetic approach to long time behavior of
linearized fast diffusion equations. J. Stat. Phys. 128, 4 (2007), 883–925.
184. Caffarelli, L. A. Interior W 2,p estimates for solutions of the Monge–Ampère
equation. Ann. of Math. (2) 131, 1 (1990), 135–150.
185. Caffarelli, L. A. The regularity of mappings with a convex potential.
J. Amer. Math. Soc. 5, 1 (1992), 99–104.
186. Caffarelli, L. A. Boundary regularity of maps with convex potentials.
Comm. Pure Appl. Math. 45, 9 (1992), 1141–1151.
187. Caffarelli, L. A. Boundary regularity of maps with convex potentials. II.
Ann. of Math. (2) 144, 3 (1996), 453–496.
188. Caffarelli, L. A. Monotonicity properties of optimal transportation and the
FKG and related inequalities. Comm. Math. Phys. 214, 3 (2000), 547–563.
Erratum in Comm. Math. Phys. 225 (2002), 2, 449–450.
189. Caffarelli, L. A., and Cabré, X. Fully nonlinear elliptic equations, vol. 43
of American Mathematical Society Colloquium Publications. American Math-
ematical Society, Providence, RI, 1995.
190. Caffarelli, L. A., Feldman, M., and McCann, R. J. Constructing optimal
maps for Monge’s transport problem as a limit of strictly convex costs. J. Amer.
Math. Soc. 15, 1 (2002), 1–26.
191. Caffarelli, L. A., Gutiérrez, C., and Huang, Q. On the regularity of
reflector antennas. Ann. of Math. (2), to appear.
192. Caffarelli, L. A., and McCann, R. J. Free boundaries in optimal transport
and Monge–Ampère obstacle problems. To appear in Ann. of Math. (2).
References 925
193. Caglioti, E., Pulvirenti, M., and Rousset, F. 2-D constrained Navier–
Stokes equation and intermediate asymptotics. To appear in Physica A.
194. Caglioti, E., Pulvirenti, M., and Rousset, F. Quasi-stationary states for
the 2-D Navier–Stokes equation. Preprint, 2007.
195. Caglioti, E., and Villani, C. Homogeneous cooling states are not always
good approximations to granular flows. Arch. Ration. Mech. Anal. 163, 4
(2002), 329–343.
196. Calvez, V. Modèles et analyses mathématiques pour les mouvements collectifs
de cellules. PhD thesis, Univ. Paris VI, 2007.
197. Calvez, V. Personal communication.
198. Calvez, V., and Carrillo, J. A. Work in progress.
199. Cannarsa, P., and Sinestrari, C. Semiconcave functions, Hamilton–Jacobi
equations, and optimal control. Progress in Nonlinear Differential Equations
and their Applications, 58. Birkhäuser Boston Inc., Boston, MA, 2004.
200. Carfora, M. Fokker–Planck dynamics and entropies for the normalized Ricci
flow. Preprint, 2006. Archived online at arxiv.org/abs/math.DG/0507309.
201. Carlen, E. Behind the wave function: stochastic mechanics today. In
Proceedings of the Joint Concordia Sherbrooke Seminar Series on Functional
Integration Methods in Stochastic Quantum Mechanics (Sherbrooke, PQ and
Montreal, PQ, 1987) (1991), vol. 25, pp. 141–156.
202. Carlen, E. A., Carvalho, M. C., Esposito, R., Lebowitz, J. L., and
Marra, R. Displacement convexity and minimal fronts at phase boundaries.
Preprint, 2007.
203. Carlen, E. A., and Gangbo, W. Constrained steepest descent in the
2-Wasserstein metric. Ann. of Math. (2) 157, 3 (2003), 807–846.
204. Carlen, E. A., and Gangbo, W. Solution of a model Boltzmann equation
via steepest descent in the 2-Wasserstein metric. Arch. Ration. Mech. Anal.
172, 1 (2004), 21–64.
205. Carlen, E. A., and Soffer, A. Entropy production by block variable sum-
mation and central limit theorems. Comm. Math. Phys. 140 (1991), 339–371.
206. Carlier, G., and Jimenez, Ch. On Monge’s problem for Bregman-like cost
functions. To appear in J. Convex Anal.
207. Carlier, G., Jimenez, Ch., and Santambrogio, F. Optimal transportation
with traffic congestion and Wardrop equilibria. Preprint, 2006.
208. Carrillo, J. A., Di Francesco, M., and Lattanzio, C. Contractivity of
Wasserstein metrics and asymptotic profiles for scalar conservation laws. J.
Differential Equations 231, 2 (2006), 425–458.
209. Carrillo, J. A., Di Francesco, M., and Lattanzio, C. Contractivity and
asymptotics in Wasserstein metrics for viscous nonlinear scalar conservation
laws. Boll. Unione Mat. Ital. Ser. B Artic. Ric. Mat. 10, 2 (2007), 277–292.
210. Carrillo, J. A., Di Francesco, M., and Toscani, G. Intermediate asymptotics
beyond homogeneity and self-similarity: long time behavior for
u_t = ∆φ(u). Arch. Ration. Mech. Anal. 180, 1 (2006), 127–149.
211. Carrillo, J. A., Di Francesco, M., and Toscani, G. Strict contractivity of
the 2-Wasserstein distance for the porous medium equation by mass centering.
Proc. Amer. Math. Soc. 135, 2 (2007), 353–363.
212. Carrillo, J. A., Gualdani, M. P., and Toscani, G. Finite speed of prop-
agation in porous media by mass transportation methods. C. R. Math. Acad.
Sci. Paris 338, 10 (2004), 815–818.
251. Courtault, J.-M., Kabanov, Y., Bru, B., Crépel, P., Lebon, I., and
Le Marchand, A. Louis Bachelier on the centenary of “Théorie de la
spéculation.” Math. Finance 10, 3 (2000), 341–353.
252. Cover, T. M., and Thomas, J. A. Elements of information theory. John
Wiley & Sons Inc., New York, 1991. A Wiley–Interscience Publication.
253. Csiszár, I. Information-type measures of difference of probability distributions
and indirect observations. Stud. Sci. Math. Hung. 2 (1967), 299–318.
254. Cuesta-Albertos, J. A., and Matrán, C. Notes on the Wasserstein metric
in Hilbert spaces. Ann. Probab. 17, 3 (1989), 1264–1276.
255. Cuesta-Albertos, J. A., and Matrán, C. Stochastic convergence through
Skorohod representation theorems and Wasserstein distances. Rend. Circ. Mat.
Palermo (2) Suppl. No. 35 (1994), 89–113.
256. Cuesta-Albertos, J. A., Matrán, C., Rachev, S. T., and Rüschendorf,
L. Mass transportation problems in probability theory. Math. Sci. 21, 1 (1996),
34–72.
257. Cuesta-Albertos, J. A., Matrán, C., and Rodrı́guez-Rodrı́guez, J. Ap-
proximation to probabilities through uniform laws on convex sets. J. Theoret.
Probab. 16, 2 (2003), 363–376.
258. Cuesta-Albertos, J. A., Matrán, C., and Tuero-Dı́az, A. On lower
bounds for the L^2-Wasserstein metric in a Hilbert space. J. Theoret. Probab.
9, 2 (1996), 263–283.
259. Cuesta-Albertos, J. A., Matrán, C., and Tuero-Dı́az, A. On the
monotonicity of optimal transportation plans. J. Math. Anal. Appl. 215, 1
(1997), 86–94.
260. Cuesta-Albertos, J. A., Matrán, C., and Tuero-Dı́az, A. Optimal trans-
portation plans and convergence in distribution. J. Multivariate Anal. 60, 1
(1997), 72–83.
261. Cuesta-Albertos, J. A., and Tuero-Dı́az, A. A characterization for the
solution of the Monge–Kantorovich mass transference problem. Statist. Probab.
Lett. 16, 2 (1993), 147–152.
262. Cullen, M. J. P. A mathematical theory of large-scale atmosphere/ocean flow.
World Scientific, 2006.
263. Cullen, M. J. P., and Douglas, R. J. Applications of the Monge–Ampère
equation and Monge transport problem to meteorology and oceanography. In
Monge Ampère equation: applications to geometry and optimization (Deerfield
Beach, FL, 1997). Amer. Math. Soc., Providence, RI, 1999, pp. 33–53.
264. Cullen, M. J. P., Douglas, R. J., Roulstone, I., and Sewell, M. J.
Generalised semigeostrophic theory on a sphere. J. Fluid Mech. 531 (2005),
123–157.
265. Cullen, M., and Feldman, M. Lagrangian solutions of semigeostrophic
equations in physical space. SIAM J. Math. Anal. 37, 5 (2006), 1371–1395.
266. Cullen, M. J. P., and Gangbo, W. A variational approach for the
2-dimensional semigeostrophic shallow water equations. Arch. Ration. Mech.
Anal. 156, 3 (2001), 241–273.
267. Cullen, M. J. P., Gangbo, W., and Pisante, G. The semigeostrophic equa-
tions discretized in reference and dual variables. To appear in Arch. Ration.
Mech. Anal.
268. Cullen, M. J. P., and Maroofi, H. The fully compressible semi-geostrophic
system from meteorology. Arch. Ration. Mech. Anal. 167, 4 (2003), 309–336.
329. Evans, L. C., Feldman, M., and Gariepy, R. F. Fast/slow diffusion and
collapsing sandpiles. J. Differential Equations 137, 1 (1997), 166–209.
330. Evans, L. C., and Gangbo, W. Differential equations methods for the Monge–
Kantorovich mass transfer problem. Mem. Amer. Math. Soc. 137, no. 653,
1999.
331. Evans, L. C., and Gariepy, R. Measure theory and fine properties of func-
tions. CRC Press, Boca Raton, FL, 1992.
332. Evans, L. C., and Gomes, D. A. Effective Hamiltonians and averaging for
Hamiltonian dynamics. I. Arch. Ration. Mech. Anal. 157, 1 (2001), 1–33.
333. Evans, L. C., and Gomes, D. A. Effective Hamiltonians and averaging for
Hamiltonian dynamics. II. Arch. Ration. Mech. Anal. 161, 4 (2002), 271–305.
334. Evans, L. C., and Gomes, D. A. Linear programming interpretations of
Mather’s variational principle. ESAIM Control Optim. Calc. Var. 8 (2002),
693–702.
335. Evans, L. C., and Ishii, H. A PDE approach to some asymptotic problems
concerning random differential equations with small noise intensities. Ann.
Inst. H. Poincaré Anal. Non Linéaire 2, 1 (1985), 1–20.
336. Evans, L. C., and Rezakhanlou, F. A stochastic model for growing sand-
piles and its continuum limit. Comm. Math. Phys. 197, 2 (1998), 325–345.
337. Falconer, K. J. The geometry of fractal sets, vol. 85 of Cambridge Tracts in
Mathematics. Cambridge University Press, Cambridge, 1986.
338. Falconer, K. J. Fractal geometry, second ed. John Wiley & Sons Inc., Hobo-
ken, NJ, 2003. Mathematical foundations and applications.
339. Fang, S., and Shao, J. Optimal transportation maps for Monge–Kantorovich
problem on loop groups. Preprint Univ. Bourgogne No. 470, 2006.
340. Fang, S., and Shao, J. Distance riemannienne, théorème de Rademacher et
inégalité de transport sur le groupe des lacets. C. R. Math. Acad. Sci. Paris
341, 7 (2005), 445–450.
341. Fang, S., and Shao, J. Transportation cost inequalities on path and loop
groups. J. Funct. Anal. 218, 2 (2005), 293–317.
342. Fang, S., Shao, J., and Sturm, K.-T. Wasserstein space over the Wiener
space. Preprint, 2007.
343. Faris, W., Ed. Diffusion, Quantum Theory, and Radically Elementary Math-
ematics, vol. 47 of Mathematical Notes. Princeton University Press, 2006.
344. Fathi, A. Solutions KAM faibles conjuguées et barrières de Peierls. C. R.
Acad. Sci. Paris Sér. I Math. 325, 6 (1997), 649–652.
345. Fathi, A. Théorème KAM faible et théorie de Mather sur les systèmes la-
grangiens. C. R. Acad. Sci. Paris Sér. I Math. 324, 9 (1997), 1043–1046.
346. Fathi, A. Sur la convergence du semi-groupe de Lax–Oleinik. C. R. Acad.
Sci. Paris Sér. I Math. 327, 3 (1998), 267–270.
347. Fathi, A. Weak KAM theorem in Lagrangian dynamics. To be published by
Cambridge University Press.
348. Fathi, A., and Figalli, A. Optimal transportation on non-compact mani-
folds. To appear in Israel J. Math.
349. Fathi, A., and Mather, J. N. Failure of convergence of the Lax–Oleinik
semigroup in the time-periodic case. Bull. Soc. Math. France 128, 3 (2000),
473–483.
350. Fathi, A., and Siconolfi, A. Existence of C^1 critical subsolutions of the
Hamilton–Jacobi equation. Invent. Math. 155, 2 (2004), 363–388.
392. Gallay, T., and Wayne, C. E. Global stability of vortex solutions of the
two-dimensional Navier–Stokes equation. Comm. Math. Phys. 255, 1 (2005),
97–129.
393. Gallot, S. Isoperimetric inequalities based on integral norms of Ricci curva-
ture. Astérisque, 157–158 (1988), 191–216.
394. Gallot, S., Hulin, D., and Lafontaine, J. Riemannian geometry, sec-
ond ed. Universitext. Springer-Verlag, Berlin, 1990.
395. Gangbo, W. An elementary proof of the polar factorization of vector-valued
functions. Arch. Rational Mech. Anal. 128, 4 (1994), 381–399.
396. Gangbo, W. The Monge mass transfer problem and its applications. In Monge
Ampère equation: applications to geometry and optimization (Deerfield Beach,
FL, 1997), vol. 226 of Contemp. Math., Amer. Math. Soc., Providence, RI,
1999, pp. 79–104.
397. Gangbo, W. Review on the book Gradient flows in metric spaces and in the
space of probability measures by Ambrosio, Gigli and Savaré, 2006. Available
online at www.math.gatech.edu/~gangbo.
398. Gangbo, W., and McCann, R. J. Optimal maps in Monge’s mass transport
problem. C. R. Acad. Sci. Paris Sér. I Math. 321, 12 (1995), 1653–1658.
399. Gangbo, W., and McCann, R. J. The geometry of optimal transportation.
Acta Math. 177, 2 (1996), 113–161.
400. Gangbo, W., and McCann, R. J. Shape recognition via Wasserstein dis-
tance. Quart. Appl. Math. 58, 4 (2000), 705–737.
401. Gangbo, W., Nguyen, T., and Tudorascu, A. Euler–Poisson systems as
action-minimizing paths in the Wasserstein space. Preprint, 2006.
Available online at www.math.gatech.edu/~gangbo/publications.
402. Gangbo, W., and Oliker, V. I. Existence of optimal maps in the reflector-
type problems. ESAIM Control Optim. Calc. Var. 13, 1 (2007), 93–106.
403. Gangbo, W., and Świȩch, A. Optimal maps for the multidimensional
Monge–Kantorovich problem. Comm. Pure Appl. Math. 51, 1 (1998), 23–45.
404. Gangbo, W., and Westdickenberg, M. Optimal transport for the system
of isentropic Euler equations. Work in progress.
405. Gao, F., and Wu, L. Transportation-Information inequalities for Gibbs mea-
sures. Preprint, 2007.
406. Gardner, R. The Brunn–Minkowski inequality. Bull. Amer. Math. Soc. (N.S.)
39, 3 (2002), 355–405.
407. Gelbrich, M. On a formula for the L^2 Wasserstein metric between measures
on Euclidean and Hilbert spaces. Math. Nachr. 147 (1990), 185–203.
408. Gentil, I. Inégalités de Sobolev logarithmiques et hypercontractivité en
mécanique statistique et en E.D.P. PhD thesis, Univ. Paul-Sabatier (Toulouse),
2001. Available online at
www.ceremade.dauphine.fr/~gentil/maths.html.
409. Gentil, I. Ultracontractive bounds on Hamilton–Jacobi solutions. Bull. Sci.
Math. 126, 6 (2002), 507–524.
410. Gentil, I., Guillin, A., and Miclo, L. Modified logarithmic Sobolev in-
equalities and transportation inequalities. Probab. Theory Related Fields 133,
3 (2005), 409–436.
411. Gentil, I., Guillin, A., and Miclo, L. Modified logarithmic Sobolev in-
equalities in null curvature. Rev. Mat. Iberoamericana 23, 1 (2007), 237–260.
412. Gentil, I., and Malrieu, F. Équations de Hamilton–Jacobi et inégalités
entropiques généralisées. C. R. Acad. Sci. Paris 335 (2002), 437–440.
473. Hiai, F., Ohya, M., and Tsukada, M. Sufficiency and relative entropy in ∗-
algebras with applications in quantum systems. Pacific J. Math. 107, 1 (1983),
117–140.
474. Hiai, F., Petz, D., and Ueda, Y. Free transportation cost inequalities via
random matrix approximation. Probab. Theory Related Fields 130, 2 (2004),
199–221.
475. Hiai, F., and Ueda, Y. Free transportation cost inequalities for noncommu-
tative multi-variables. Infin. Dimens. Anal. Quantum Probab. Relat. Top. 9, 3
(2006), 391–412.
476. Hohloch, S. Optimale Massebewegung im Monge–Kantorovich-Transport-
problem. Diploma thesis, Freiburg Univ., 2002.
477. Holley, R. Remarks on the FKG inequalities. Comm. Math. Phys. 36 (1974),
227–231.
478. Holley, R., and Stroock, D. W. Logarithmic Sobolev inequalities and sto-
chastic Ising models. J. Statist. Phys. 46, 5–6 (1987), 1159–1194.
479. Horowitz, J., and Karandikar, R. L. Mean rates of convergence of empir-
ical measures in the Wasserstein metric. J. Comput. Appl. Math. 55, 3 (1994),
261–273. (See also the reviewer’s note on MathSciNet.)
480. Hoskins, B. J. Atmospheric frontogenesis models: some solutions. Q.J.R. Met.
Soc. 97 (1971), 139–153.
481. Hoskins, B. J. The geostrophic momentum approximation and the semi-
geostrophic equations. J. Atmosph. Sciences 32 (1975), 233–242.
482. Hoskins, B. J. The mathematical theory of frontogenesis. Ann. Rev. of Fluid
Mech. 14 (1982), 131–151.
483. Hsu, E. P. Stochastic analysis on manifolds, vol. 38 of Graduate Studies in
Mathematics. American Mathematical Society, Providence, RI, 2002.
484. Hsu, E. P., and Sturm, K.-T. Maximal coupling of Euclidean Brownian
motions. FB-Preprint No. 85, Bonn. Available online at
www-wt.iam.uni-bonn.de/~sturm/de/index.html.
485. Huang, C., and Jordan, R. Variational formulations for Vlasov–Poisson–
Fokker–Planck systems. Math. Methods Appl. Sci. 23, 9 (2000), 803–843.
486. Ichihara, K. Curvature, geodesics and the Brownian motion on a Riemannian
manifold. I. Recurrence properties, II. Explosion properties. Nagoya Math. J.
87 (1982), 101–114; 115–125.
487. Ishii, H. Asymptotic solutions for large time of Hamilton–Jacobi equations.
International Congress of Mathematicians, Vol. III, Eur. Math. Soc., Zürich,
2006, pp. 213–227. Short presentation available online at
www.edu.waseda.ac.jp/~ishii.
488. Ishii, H. Unpublished lecture notes on the weak KAM theorem, 2004. Available
online at www.edu.waseda.ac.jp/~ishii.
489. Itoh, J.-i., and Tanaka, M. The Lipschitz continuity of the distance function
to the cut locus. Trans. Amer. Math. Soc. 353, 1 (2001), 21–40.
490. Jian, H.-Y., and Wang, X.-J. Continuity estimates for the Monge–Ampère
equation. Preprint, 2006.
491. Jimenez, Ch. Dynamic formulation of optimal transport problems. To appear
in J. Convex Anal.
492. Jones, P., Maggioni, M., and Schul, R. Universal local parametrizations
via heat kernels and eigenfunctions of the Laplacian. Preprint, 2008.
493. Jordan, R., Kinderlehrer, D., and Otto, F. The variational formulation
of the Fokker–Planck equation. SIAM J. Math. Anal. 29, 1 (1998), 1–17.
515. Khesin, B., and Misiolek, G. Shock waves for the Burgers equation and
curvatures of diffeomorphism groups. Proc. Steklov Inst. Math. 250 (2007),
1–9.
516. Kim, S. Harnack inequality for nondivergent elliptic operators on Riemannian
manifolds. Pacific J. Math. 213, 2 (2004), 281–293.
517. Kim, Y. J., and McCann, R. J. Sharp decay rates for the fastest conservative
diffusions. C. R. Math. Acad. Sci. Paris 341, 3 (2005), 157–162.
518. Kim, Y.-H. Counterexamples to continuity of optimal transportation on pos-
itively curved Riemannian manifolds. Preprint, 2007.
519. Kim, Y.-H., and McCann, R. J. On the cost-subdifferentials of cost-convex
functions. Preprint, 2007. Archived online at arxiv.org/abs/0706.1266.
520. Kim, Y.-H., and McCann, R. J. Continuity, curvature, and the general co-
variance of optimal transportation. Preprint, 2007.
521. Kleptsyn, V., and Kurtzmann, A. Ergodicity of self-attracting Brownian
motion. Preprint, 2008.
522. Kloeckner, B. A geometric study of the Wasserstein space of the line.
Preprint, 2008.
523. Knothe, H. Contributions to the theory of convex bodies. Michigan Math. J.
4 (1957), 39–52.
524. Knott, M., and Smith, C. S. On the optimal mapping of distributions.
J. Optim. Theory Appl. 43, 1 (1984), 39–49.
525. Knott, M., and Smith, C. S. On a generalization of cyclic monotonicity and
distances among random vectors. Linear Algebra Appl. 199 (1994), 363–371.
526. Kolesnikov, A. V. Convexity inequalities and optimal transport of infinite-
dimensional measures. J. Math. Pures Appl. (9) 83, 11 (2004), 1373–1404.
527. Kontorovich, L. A linear programming inequality with applications to
concentration of measures. Preprint, 2006. Archived online at arxiv.org/
abs/math.FA/0610712.
528. Kontsevich, M., and Soibelman, Y. Homological mirror symmetry and
torus fibrations. In Symplectic geometry and mirror symmetry (Seoul, 2000).
World Sci. Publ., River Edge, NJ, 2001, pp. 203–263.
529. Koskela, P. Upper gradients and Poincaré inequality on metric measure
spaces. In Lecture notes on Analysis in metric spaces (Trento, 1999), Appunti
Corsi Tenuti Docenti Sc., Scuola Norm. Sup., Pisa, 2000, pp. 55–69.
530. Krylov, N. V. Boundedly nonhomogeneous elliptic and parabolic equations.
Izv. Akad. Nauk SSSR, Ser. Mat. 46, 3 (1982), 487–523. English translation in
Math. USSR Izv. 20, 3 (1983), 459–492.
531. Krylov, N. V. Fully nonlinear second order elliptic equations: recent devel-
opment. Ann. Scuola Norm. Sup. Pisa Cl. Sci. 25, 3–4 (1998), 569–595.
532. Krylov, N. V., and Safonov, M. V. An estimate of the probability that a
diffusion process hits a set of positive measure. Dokl. Acad. Nauk SSSR 245,
1 (1979), 18–20.
533. Kuksin, S., Piatnitski, A., and Shirikyan, A. A coupling approach to ran-
domly forced nonlinear PDE’s, II. Comm. Math. Phys. 230, 1 (2002), 81–85.
534. Kullback, S. A lower bound for discrimination information in terms of vari-
ation. IEEE Trans. Inform. Theory 4 (1967), 126–127.
535. Kurtzmann, A. The ODE method for some self-interacting diffusions on R^d.
Preprint, 2008.
556. Liese, F., and Vajda, I. Convex statistical distances, vol. 95 of Teubner-Texte
zur Mathematik. BSB B. G. Teubner Verlagsgesellschaft, Leipzig, 1987. With
German, French and Russian summaries.
557. Lions, J.-L. Quelques méthodes de résolution des problèmes aux limites non
linéaires. Dunod, 1969.
558. Lions, P.-L. Generalized solutions of Hamilton–Jacobi equations. Pitman
(Advanced Publishing Program), Boston, Mass., 1982.
559. Lions, P.-L. Personal communication, 2003.
560. Lions, P.-L., and Trudinger, N. S. Linear oblique derivative problems for
the uniformly elliptic Hamilton–Jacobi–Bellman equation. Math. Z. 191, 1
(1986), 1–15.
561. Lions, P.-L., Trudinger, N. S., and Urbas, J. The Neumann problem for
equations of Monge–Ampère type. Comm. Pure Appl. Math. 39, 4 (1986),
539–563.
562. Lions, P.-L., Trudinger, N. S., and Urbas, J. The Neumann problem for
equations of Monge–Ampère type. Proceedings of the Centre for Mathematical
Analysis, Australian National University, 10, Canberra, 1986, pp. 135–140.
563. Lions, P.-L., and Lasry, J.-M. Régularité optimale de racines carrées. C. R.
Acad. Sci. Paris Sér. I Math. 343, 10 (2006), 679–684.
564. Lions, P.-L., Papanicolaou, G., and Varadhan, S. R. S. Homogeneization
of Hamilton–Jacobi equations. Unpublished preprint, 1987.
565. Lisini, S. Characterization of absolutely continuous curves in Wasserstein
spaces. Calc. Var. Partial Differential Equations 28, 1 (2007), 85–120.
566. Liu, J. Hölder regularity of optimal mapping in optimal transportation. To
appear in Calc. Var. Partial Differential Equations.
567. Loeper, G. Quasi-neutral limit of the Euler–Poisson and Euler–Monge–Ampère
systems. Comm. Partial Differential Equations 30, 7–9 (2005), 1141–1167.
568. Loeper, G. The reconstruction problem for the Euler-Poisson system in cos-
mology. Arch. Ration. Mech. Anal. 179, 2 (2006), 153–216.
569. Loeper, G. A fully nonlinear version of Euler incompressible equations: the
Semi-Geostrophic system. SIAM J. Math. Anal. 38, 3 (2006), 795–823.
570. Loeper, G. On the regularity of maps solutions of optimal transportation
problems. To appear in Acta Math.
571. Loeper, G. On the regularity of maps solutions of optimal transportation
problems II. Work in progress, 2007.
572. Loeper, G., and Villani, C. Regularity of optimal transport in curved
geometry: the nonfocal case. Preprint, 2007.
573. Lott, J. Optimal transport and Ricci curvature for metric-measure spaces.
To appear in Surveys in Differential Geometry.
574. Lott, J. Some geometric properties of the Bakry–Émery–Ricci tensor. Com-
ment. Math. Helv. 78, 4 (2003), 865–883.
575. Lott, J. Some geometric calculations on Wasserstein space. To appear in
Comm. Math. Phys. Available online at www.math.lsa.umich.edu/~lott.
576. Lott, J. Optimal transport and Perelman’s reduced volume. Preprint, 2008.
577. Lott, J., and Villani, C. Ricci curvature for metric-measure spaces via op-
timal transport. To appear in Ann. of Math. (2). Available online at www.umpa.
ens-lyon.fr/~cvillani.
578. Lott, J., and Villani, C. Weak curvature bounds and functional inequalities.
J. Funct. Anal. 245, 1 (2007), 311–333.
579. Lott, J., and Villani, C. Hamilton–Jacobi semigroup on length spaces and
applications. J. Math. Pures Appl. (9) 88, 3 (2007), 219–229.
580. Lu, P., Ni, L., Vázquez, J.-L., and Villani, C. Local Aronson–Bénilan
estimates and entropy formulae for porous medium and fast diffusion equations
on manifolds. Preprint, 2008.
581. Lusternik, L. A. Die Brunn–Minkowskische Ungleichung für beliebige mess-
bare Mengen. Dokl. Acad. Sci. URSS, 8 (1935), 55–58.
582. Lutwak, E., Yang, D., and Zhang, G. Optimal Sobolev norms and the L^p
Minkowski problem. Int. Math. Res. Not. (2006), Art. ID 62987, 21.
583. Lytchak, A. Differentiation in metric spaces. Algebra i Analiz 16, 6 (2004),
128–161.
584. Lytchak, A. Open map theorem for metric spaces. Algebra i Analiz 17, 3
(2005), 139–159.
585. Ma, X.-N., Trudinger, N. S., and Wang, X.-J. Regularity of potential
functions of the optimal transportation problem. Arch. Ration. Mech. Anal.
177, 2 (2005), 151–183.
586. Maggi, F. Some methods for studying stability in isoperimetric type problems.
To appear in Bull. Amer. Math. Soc.
587. Maggi, F., and Villani, C. Balls have the worst best Sobolev inequalities.
J. Geom. Anal. 15, 1 (2005), 83–121.
588. Maggi, F., and Villani, C. Balls have the worst best Sobolev inequalities.
Part II: Variants and extensions. Calc. Var. Partial Differential Equations 31,
1 (2008), 47–74.
589. Mallows, C. L. A note on asymptotic joint normality. Ann. Math. Statist.
43 (1972), 508–515.
590. Malrieu, F. Logarithmic Sobolev inequalities for some nonlinear PDE’s. Sto-
chastic Process. Appl. 95, 1 (2001), 109–132.
591. Malrieu, F. Convergence to equilibrium for granular media equations and
their Euler schemes. Ann. Appl. Probab. 13, 2 (2003), 540–560.
592. Mañé, R. Lagrangian flows: the dynamics of globally minimizing orbits. In
International Conference on Dynamical Systems (Montevideo, 1995), vol. 362
of Pitman Res. Notes Math. Ser. Longman, Harlow, 1996, pp. 120–131.
593. Mañé, R. Lagrangian flows: the dynamics of globally minimizing orbits. Bol.
Soc. Brasil. Mat. (N.S.) 28, 2 (1997), 141–153.
594. Maroofi, H. Applications of the Monge–Kantorovich theory. PhD thesis,
Georgia Tech, 2002.
595. Marton, K. A measure concentration inequality for contracting Markov
chains. Geom. Funct. Anal. 6 (1996), 556–571.
596. Marton, K. Measure concentration for Euclidean distance in the case of de-
pendent random variables. Ann. Probab. 32, 3B (2004), 2526–2544.
597. Massart, P. Concentration inequalities and model selection. Lecture Notes
from the 2003 Saint-Flour Probability Summer School. To appear in the
Springer book series Lecture Notes in Math. Available online at www.
math.u-psud.fr/~massart.
598. Mather, J. N. Existence of quasiperiodic orbits for twist homeomorphisms
of the annulus. Topology 21, 4 (1982), 457–467.
599. Mather, J. N. More Denjoy minimal sets for area preserving diffeomorphisms.
Comment. Math. Helv. 60 (1985), 508–557.
600. Mather, J. N. Minimal measures. Comment. Math. Helv. 64, 3 (1989),
375–394.
645. Namah, G., and Roquejoffre, J.-M. Remarks on the long time behaviour
of the solutions of Hamilton–Jacobi equations. Comm. Partial Differential
Equations 24, 5–6 (1999), 883–893.
646. Nash, J. Continuity of solutions of parabolic and elliptic equations. Amer. J.
Math. 80 (1958), 931–954.
647. Nelson, E. Derivation of the Schrödinger equation from Newtonian mechanics.
Phys. Rev. 150 (1966), 1079–1085.
648. Nelson, E. Dynamical theories of Brownian motion. Princeton University
Press, Princeton, N.J., 1967. 2001 re-edition by J. Suzuki. Available online at
www.math.princeton.edu/~nelson/books.html.
649. Nelson, E. The free Markoff field. J. Funct. Anal. 12 (1973), 211–227.
650. Nelson, E. Critical diffusions. In Séminaire de probabilités, XIX, 1983/84,
vol. 1123 of Lecture Notes in Math. Springer, Berlin, 1985, pp. 1–11.
651. Nelson, E. Quantum fluctuations. Princeton Series in Physics. Princeton
University Press, Princeton, NJ, 1985.
652. Nelson, E. Stochastic mechanics and random fields. In École d’Été de Proba-
bilités de Saint-Flour XV–XVII, 1985–87, vol. 1362 of Lecture Notes in Math.,
Springer, Berlin, 1988, pp. 427–450.
653. Neunzert, H. An introduction to the nonlinear Boltzmann–Vlasov equation.
In Kinetic theories and the Boltzmann equation, C. Cercignani, Ed., vol. 1048
of Lecture Notes in Math., Springer, Berlin, Heidelberg, 1984, pp. 60–110.
654. Ohta, S.-i. On the measure contraction property of metric measure spaces.
Comment. Math. Helv. 82, 4 (2007), 805–828.
655. Ohta, S.-i. Gradient flows on Wasserstein spaces over compact Alexandrov
spaces. To appear in Amer. J. Math.
656. Ohta, S.-i. Products, cones, and suspensions of spaces with the measure con-
traction property. J. Lond. Math. Soc. (2) 76, 1 (2007), 225–236.
657. Ohta, S.-i. Finsler interpolation inequalities. Preprint, 2008. Available online
at www.math.kyoto-u.ac.jp/~sohta.
658. Øksendal, B. Stochastic differential equations. An introduction with applica-
tions, sixth ed. Universitext. Springer-Verlag, Berlin, 2003.
659. Oliker, V. Embedding S^n into R^{n+1} with given integral Gauss curvature and
optimal mass transport on S^n. Adv. Math. 213, 2 (2007), 600–620.
660. Oliker, V. Variational solutions of some problems in convexity via Monge–
Kantorovich optimal mass transport theory. Conference in Oberwolfach, July
2006 (personal communication).
661. Oliker, V., and Prussner, L. D. A new technique for synthesis of offset dual
reflector systems. In 10th Annual Review of Progress in Applied Computational
Electromagnetics, 1994, pp. 45–52.
662. Ollivier, Y. Ricci curvature of Markov chains on metric spaces. Preprint,
2007. Available online at www.umpa.ens-lyon.fr/~yollivie/publs.html.
663. Ollivier, Y., and Pansu, P. Courbure de Ricci et concentration
de la mesure. Working seminar notes. Available online at www.umpa.
ens-lyon.fr/~yollivie.
664. Osserman, R. The isoperimetric inequality. Bull. Amer. Math. Soc. 84, 6
(1978), 1182–1238.
665. Otsu, Y., and Shioya, T. The Riemannian structure of Alexandrov spaces.
J. Differential Geom. 39, 3 (1994), 629–658.
666. Otto, F. Double degenerate diffusion equations as steepest descent. Preprint
Univ. Bonn, 1996.
750. Sheng, W., Trudinger, N. S., and Wang, X.-J. The Yamabe prob-
lem for higher order curvatures. Preprint, archived online at arxiv.org/abs/
math/0505463.
751. Shioya, T. Mass of rays in Alexandrov spaces of nonnegative curvature. Com-
ment. Math. Helv. 69, 2 (1994), 208–228.
752. Siburg, K. F. The principle of least action in geometry and dynamics,
vol. 1844 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2004.
753. Simon, L. Lectures on geometric measure theory. Proceedings of the Centre
for Mathematical Analysis, Australian National University, 3, Canberra, 1983.
754. Smith, C., and Knott, M. On Hoeffding–Fréchet bounds and cyclic
monotone relations. J. Multivariate Anal. 40, 2 (1992), 328–334.
755. Sobolevskiı̆, A., and Frisch, U. Application of optimal transportation
theory to the reconstruction of the early Universe. Zap. Nauchn. Sem.
S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 312, 11 (2004), 303–309, 317.
English translation in J. Math. Sci. (N. Y.) 133, 4 (2006), 1539–1542.
756. Soibelman, Y. Notes on noncommutative Riemannian geometry. Personal
communication, 2006.
757. Spohn, H. Large scale dynamics of interacting particles. Texts and Mono-
graphs in Physics. Springer-Verlag, Berlin, 1991.
758. Stam, A. Some inequalities satisfied by the quantities of information of Fisher
and Shannon. Inform. Control 2 (1959), 101–112.
759. Stroock, D. W. An introduction to the analysis of paths on a Riemannian
manifold, vol. 74 of Mathematical Surveys and Monographs. American Mathe-
matical Society, Providence, RI, 2000.
760. Sturm, K.-T. Diffusion processes and heat kernels on metric spaces. Ann.
Probab. 26, 1 (1998), 1–55.
761. Sturm, K.-T. Convex functionals of probability measures and nonlinear dif-
fusions on manifolds. J. Math. Pures Appl. (9) 84, 2 (2005), 149–168.
762. Sturm, K.-T. On the geometry of metric measure spaces. I. Acta Math. 196,
1 (2006), 65–131.
763. Sturm, K.-T. On the geometry of metric measure spaces. II. Acta Math. 196,
1 (2006), 133–177.
764. Sturm, K.-T., and von Renesse, M.-K. Transport inequalities, gradient
estimates, entropy and Ricci curvature. Comm. Pure Appl. Math. 58, 7 (2005),
923–940.
765. Sudakov, V. N. Geometric problems in the theory of infinite-dimensional
probability distributions. Proc. Steklov Inst. Math. 141 (1979), 1–178.
766. Sudakov, V. N., and Cirel′son, B. S. Extremal properties of half-spaces
for spherically invariant measures. Zap. Naučn. Sem. Leningrad. Otdel. Mat.
Inst. Steklov. (LOMI) 41 (1974), 14–24. English translation in J. Soviet Math.
9 (1978), 9–18.
767. Sznitman, A.-S. Equations de type de Boltzmann, spatialement homogènes.
Z. Wahrsch. Verw. Gebiete 66 (1984), 559–562.
768. Sznitman, A.-S. Topics in propagation of chaos. In École d’Été de Probabilités
de Saint-Flour XIX—1989. Springer, Berlin, 1991, pp. 165–251.
769. Szulga, A. On minimal metrics in the space of random variables. Teor. Veroy-
atnost. i Primenen. 27, 2 (1982), 401–405.
770. Talagrand, M. A new isoperimetric inequality for product measure and the
tails of sums of independent random variables. Geom. Funct. Anal. 1, 2 (1991),
211–223.
References 953
794. Trudinger, N. S., and Wang, X.-J. On the second boundary value problem
for Monge–Ampère type equations and optimal transportation. Preprint, 2006.
Archived online at arxiv.org/abs/math.AP/0601086.
795. Tuero-Dı́az, A. On the stochastic convergence of representations based on
Wasserstein metrics. Ann. Probab. 21, 1 (1993), 72–85.
796. Uckelmann, L. Optimal couplings between one-dimensional distributions.
Distributions with given marginals and moment problems (Prague, 1996),
Kluwer Acad. Publ., Dordrecht, 1997, pp. 275–281.
797. Unterreiter, A., Arnold, A., Markowich, P., and Toscani, G. On gen-
eralized Csiszár–Kullback inequalities. Monatsh. Math. 131, 3 (2000), 235–253.
798. Urbas, J. On the second boundary value problem for equations of Monge–
Ampère type. J. Reine Angew. Math. 487 (1997), 115–124.
799. Urbas, J. Mass transfer problems. Lecture notes from a course given in Univ.
Bonn, 1997–1998.
800. Urbas, J. The second boundary value problem for a class of Hessian equations.
Comm. Partial Differential Equations 26, 5–6 (2001), 859–882.
801. Valdimarsson, S. I. On the Hessian of the optimal transport potential.
Preprint, 2006.
802. Varadhan, S. R. S. Mathematical statistics. Courant Institute of Mathematical
Sciences, New York University, New York, 1974. Lectures given during the
academic year 1973–1974, Notes by Michael Levandowsky and Norman Rubin.
803. Vasershtein, L. N. Markov processes over denumerable products of spaces
describing large system of automata. Problemy Peredači Informacii 5, 3 (1969),
64–72.
804. Vázquez, J. L. An introduction to the mathematical theory of the porous
medium equation. In Shape optimization and free boundaries (Montreal, PQ,
1990). Kluwer Acad. Publ., Dordrecht, 1992, pp. 347–389.
805. Vázquez, J. L. The porous medium equation. Mathematical theory. Oxford
Mathematical Monographs. The Clarendon Press, Oxford University Press,
New York, 2006.
806. Vázquez, J. L. Smoothing and decay estimates for nonlinear parabolic equa-
tions of porous medium type, vol. 33 of Oxford Lecture Series in Mathematics
and its Applications. Oxford University Press, 2006.
807. Vershik, A. M. Some remarks on the infinite-dimensional problems of linear
programming. Russian Math. Surveys 25, 5 (1970), 117–124.
808. Vershik, A. M. L. V. Kantorovich and linear programming. Historical note
(2001, updated in 2007). Archived at www.arxiv.org/abs/0707.0491.
809. Vershik, A. M. The Kantorovich metric: the initial history and little-known
applications. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov.
(POMI) 312, Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 11 (2004),
69–85, 311.
810. Vershik, A. M., Ed. J. Math. Sci. 133, 4 (2006). Special issue dedicated to
L. V. Kantorovich. Springer, New York, 2006. Translated from the Russian:
Zapiski Nauchn. seminarov POMI, vol. 312: "Theory of representation of
Dynamical Systems. Special Issue". Saint-Petersburg, 2004.
811. Villani, C. Remarks about negative pressure problems. Unpublished notes,
2002.
812. Villani, C. A review of mathematical topics in collisional kinetic theory. In
Handbook of mathematical fluid dynamics, Vol. I. North-Holland, Amsterdam,
2002, pp. 71–305.
835. Wang, X.-J. Schauder estimates for elliptic and parabolic equations. Chin.
Ann. Math. 27B, 6 (2006), 637–642.
836. Werner, R. F. The uncertainty relation for joint measurement of position
and momentum. Quantum Information and Computing (QIC) 4, 6–7 (2004),
546–562. Archived online at arxiv.org/abs/quant-ph/0405184.
837. Wolansky, G. On time reversible description of the process of coagulation
and fragmentation. To appear in Arch. Ration. Mech. Anal.
838. Wolansky, G. Extended least action principle for steady flows under a pre-
scribed flux. To appear in Calc. Var. Partial Differential Equations.
839. Wolansky, G. Minimizers of Dirichlet functionals on the n-torus and the weak
KAM theory. Preprint, 2007. Available online at www.math.technion.ac.il/
~gershonw.
840. Wolfson, J. G. Minimal Lagrangian diffeomorphisms and the Monge–Ampère
equation. J. Differential Geom. 46, 2 (1997), 335–373.
841. Wu, L. Poincaré and transportation inequalities for Gibbs measures under the
Dobrushin uniqueness condition. Ann. Probab. 34, 5 (2006), 1960–1989.
842. Wu, L. A simple inequality for probability measures and applications.
Preprint, 2006.
843. Wu, L., and Zhang, Z. Talagrand's T2-transportation inequality w.r.t. a
uniform metric for diffusions. Acta Math. Appl. Sin. Engl. Ser. 20, 3 (2004),
357–364.
844. Wu, L., and Zhang, Z. Talagrand's T2-transportation inequality and log-
Sobolev inequality for dissipative SPDEs and applications to reaction-diffusion
equations. Chinese Ann. Math. Ser. B 27, 3 (2006), 243–262.
845. Yukich, J. E. Optimal matching and empirical measures. Proc. Amer. Math.
Soc. 107, 4 (1989), 1051–1059.
846. Zhu, S. The comparison geometry of Ricci curvature. In Comparison geometry
(Berkeley, CA, 1993–94), vol. 30 of Math. Sci. Res. Inst. Publ., Cambridge
University Press, Cambridge, 1997, pp. 221–262.
List of short statements
Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Deterministic coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Existence of an optimal coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Lower semicontinuity of the cost functional . . . . . . . . . . . . . . . . . . . . . . 43
Tightness of transference plans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Optimality is inherited by restriction . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Convexity of the optimal cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Cyclical monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
c-convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
c-concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Alternative characterization of c-convexity . . . . . . . . . . . . . . . . . . . . . . 57
Kantorovich duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Restriction of c-convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Restriction for the Kantorovich duality theorem . . . . . . . . . . . . . . . . . 75
Stability of optimal transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Compactness of the set of optimal plans . . . . . . . . . . . . . . . . . . . . . . . . 77
Measurable selection of optimal plans . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Stability of the transport map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Dual transport inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Criterion for solvability of the Monge problem . . . . . . . . . . . . . . . . . . . 84
Wasserstein distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Kantorovich–Rubinstein distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Wasserstein space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Weak convergence in Pp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Wp metrizes Pp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Continuity of Wp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Metrizability of the weak topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Cauchy sequences in Wp are tight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
pressure and iterated pressure, 422
Prokhorov theorem, 43, 759
Rademacher’s theorem, 222, 898
rearrangement, 7
rectifiability, 257
regular cost function, 292, 311
regularity theory, 317
regularizing kernel, 834, 845
restriction property, 46
    dual side, 75
Riemannian manifold, 148
second fundamental form, 300, 304
selection theorem, 92
semi-distance, 747
semi-geostrophic system, 35
shortening principle, 163, 166, 199
Sobolev inequality, 378, 545, 548, 553, 722, 872
    logarithmic, 546
Spectral gap inequality, 378
speed, 118, 775
stochastic mechanics, 159
subdifferential
    c-subdifferential, 55, 88
    Clarke, 261
synthetic point of view, 736
Talagrand inequality, 569, 583, 585, 723, 875
tangent cone, 257
total variation, 582
transference plan, 10
    dynamical, 126, 779
    generalized optimal, 250
    optimal, 10
transform
    c-transform, 55, 56
    Legendre, 55, 809
transport inequality, 80, 569, 605
transport map, 6
    stability, 79
twist condition, 216, 267
    strong, 299
Vlasov equation, 27
volume (Riemannian), 152
Wasserstein distance, 93, 106
Wasserstein space, 94, 96, 104
    differential structure, 343, 636, 682
    geodesics, 127
    gradient flows, 672
    Gromov–Hausdorff convergence, 777
    representation of curves, 344
weak CD(K, N) space, 801, 838
weak KAM theory, 189, 201
Some notable cost functions
In the following table I have compiled a few remarkable cost functions which
have appeared in various applications. The list is by no means exhaustive,
and the suggested references should be considered just as entry points to the
corresponding literature.