arXiv:1111.2687v2 [math.MG] 22 Jun 2012
RICCI CURVATURE OF FINITE MARKOV CHAINS VIA
CONVEXITY OF THE ENTROPY
MATTHIAS ERBAR AND JAN MAAS
Abstract. We study a new notion of Ricci curvature that applies to Markov chains on discrete spaces. This notion relies on geodesic convexity of the entropy and is analogous to the one introduced by Lott, Sturm, and Villani for geodesic measure spaces. In order to apply to the discrete setting, the role of the Wasserstein metric is taken over by a different metric, having the property that continuous time Markov chains are gradient flows of the entropy.

Using this notion of Ricci curvature we prove discrete analogues of fundamental results by Bakry-Émery and Otto-Villani. Furthermore, we show that Ricci curvature bounds are preserved under tensorisation. As a special case we obtain the sharp Ricci curvature lower bound for the discrete hypercube.
Contents

1. Introduction 1
2. The metric $\mathcal{W}$ 7
3. Geodesics 17
4. Ricci curvature 19
5. Examples 25
6. Basic constructions 29
7. Functional inequalities 32
References 37
1. Introduction

In two independent contributions Sturm [41] and Lott and Villani [27] solved the long-standing open problem of defining a synthetic notion of Ricci curvature for a large class of metric measure spaces.

The key observation, proved in [39], is that on a Riemannian manifold $\mathcal{M}$, the Ricci curvature is bounded from below by some constant $\kappa \in \mathbb{R}$ if and only if the Boltzmann-Shannon entropy $\mathcal{H}(\rho) = \int \rho \log \rho \,\mathrm{d}\mathrm{vol}$ is $\kappa$-convex along geodesics in the $L^2$-Wasserstein space of probability measures on $\mathcal{M}$. The latter condition does not appeal to the geometric structure of $\mathcal{M}$, but only requires a metric (to define the $L^2$-Wasserstein metric $W_2$) and a reference measure (to define the entropy $\mathcal{H}$). Therefore this condition can be used in order to define a notion of Ricci curvature lower boundedness on more general metric measure spaces. This notion turns out to be stable under Gromov-Hausdorff convergence and it implies a large number of functional inequalities with sharp constants. The theory of metric measure spaces with Ricci curvature bounds in the sense of Lott, Sturm, and Villani is still under active development [2, 3].

Date: June 25, 2012.
JM is supported by Rubicon subsidy 680-50-0901 of the Netherlands Organisation for Scientific Research (NWO).
However, the condition of Lott-Sturm-Villani does not apply if the $L^2$-Wasserstein space over $\mathcal{X}$ does not contain geodesics. Unfortunately, this is the case if the underlying space is discrete (even if the underlying space consists of only two points). The aim of the present paper is to develop a variant of the theory of Lott-Sturm-Villani which does apply to discrete spaces.
In order to circumvent the nonexistence of Wasserstein geodesics, we replace the $L^2$-Wasserstein metric by a different metric $\mathcal{W}$, which has been introduced in [28]. There it has been shown that the heat flow associated with a Markov kernel on a finite set is the gradient flow of the entropy with respect to $\mathcal{W}$ (see also the independent work [15] containing related results for Fokker-Planck equations on graphs, as well as [31], where this gradient flow structure has been discovered in the setting of reaction-diffusion systems). In this sense, $\mathcal{W}$ takes over the role of the Wasserstein metric, since it has been known since the seminal work by Jordan, Kinderlehrer, and Otto that the heat flow on $\mathbb{R}^n$ is the gradient flow of the entropy [23] (see [2, 18, 19, 20, 32, 36] for variations and generalisations). Convexity along $\mathcal{W}$-geodesics may thus be regarded as a discrete analogue of McCann's displacement convexity [29], which corresponds to convexity along $W_2$-geodesics in a continuous setting.

Since every pair of probability densities on $\mathcal{X}$ can be joined by a $\mathcal{W}$-geodesic, it is possible to define a notion of Ricci curvature in the spirit of Lott-Sturm-Villani by requiring geodesic convexity of the entropy with respect to the metric $\mathcal{W}$. This possibility has already been indicated in [28]. We shall show that this notion of Ricci curvature shares a number of properties which make the LSV definition so powerful: in particular, it is stable under tensorisation and implies a number of functional inequalities, including a modified logarithmic Sobolev inequality and a Talagrand-type inequality involving the metric $\mathcal{W}$.
Main results. Let us now discuss the contents of this paper in more detail. We work with an irreducible Markov kernel $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ on a finite set $\mathcal{X}$, i.e., we assume that
$$\sum_{y \in \mathcal{X}} K(x, y) = 1$$
for all $x \in \mathcal{X}$, and that for every $x, y \in \mathcal{X}$ there exists a sequence $\{x_i\}_{i=0}^n \subseteq \mathcal{X}$ such that $x_0 = x$, $x_n = y$ and $K(x_{i-1}, x_i) > 0$ for all $1 \leq i \leq n$. Basic Markov chain theory guarantees the existence of a unique stationary probability measure (also called steady state) $\pi$ on $\mathcal{X}$, i.e.,
$$\sum_{x \in \mathcal{X}} \pi(x) = 1 \qquad \text{and} \qquad \pi(y) = \sum_{x \in \mathcal{X}} \pi(x) K(x, y)$$
for all $y \in \mathcal{X}$. We assume that $\pi$ is reversible for $K$, which means that the detailed balance equations
$$K(x, y)\,\pi(x) = K(y, x)\,\pi(y) \tag{1.1}$$
hold for all $x, y \in \mathcal{X}$.
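As a concrete illustration of these definitions, the following sketch (the 3-state kernel and all variable names are our illustrative choices, not taken from the paper) verifies stochasticity, stationarity, and the detailed balance equations (1.1):

```python
# Hypothetical 3-state reversible chain; K and pi are illustrative choices.
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]  # candidate stationary measure

# Stochasticity: each row of K sums to 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in K)

# Stationarity: pi(y) = sum_x pi(x) K(x, y).
for y in range(3):
    assert abs(pi[y] - sum(pi[x] * K[x][y] for x in range(3))) < 1e-12

# Detailed balance (1.1): K(x, y) pi(x) = K(y, x) pi(y).
for x in range(3):
    for y in range(3):
        assert abs(K[x][y] * pi[x] - K[y][x] * pi[y]) < 1e-12
print("detailed balance verified")
```

Irreducibility holds here as well, since every state can reach every other state through the middle state.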
Let
$$\mathcal{P}(\mathcal{X}) := \Big\{ \rho : \mathcal{X} \to \mathbb{R}_+ \;\Big|\; \sum_{x \in \mathcal{X}} \pi(x)\rho(x) = 1 \Big\}$$
be the set of probability densities on $\mathcal{X}$. The subset consisting of those probability densities that are strictly positive is denoted by $\mathcal{P}_*(\mathcal{X})$. We consider the metric $\mathcal{W}$ defined for $\bar\rho_0, \bar\rho_1 \in \mathcal{P}(\mathcal{X})$ by
$$\mathcal{W}(\bar\rho_0, \bar\rho_1)^2 := \inf_{\rho, \psi} \Big\{ \frac{1}{2} \int_0^1 \sum_{x,y \in \mathcal{X}} \big(\psi_t(x) - \psi_t(y)\big)^2\, \hat\rho_t(x, y) K(x, y)\pi(x)\, \mathrm{d}t \Big\},$$
where the infimum runs over all sufficiently regular curves $\rho : [0, 1] \to \mathcal{P}(\mathcal{X})$ and $\psi : [0, 1] \to \mathbb{R}^{\mathcal{X}}$ satisfying the continuity equation
$$\begin{cases} \dfrac{\mathrm{d}}{\mathrm{d}t}\rho_t(x) + \displaystyle\sum_{y \in \mathcal{X}} \big(\psi_t(y) - \psi_t(x)\big)\,\hat\rho_t(x, y) K(x, y) = 0 & \forall x \in \mathcal{X}, \\ \rho(0) = \bar\rho_0, \quad \rho(1) = \bar\rho_1. \end{cases} \tag{1.2}$$
Here, given $\rho \in \mathcal{P}(\mathcal{X})$, we write $\hat\rho(x, y) := \int_0^1 \rho(x)^{1-p}\rho(y)^p \,\mathrm{d}p$ for the logarithmic mean of $\rho(x)$ and $\rho(y)$. The relevance of the logarithmic mean in this setting is due to the identity
$$\rho(x) - \rho(y) = \hat\rho(x, y)\big(\log \rho(x) - \log \rho(y)\big),$$
which somehow compensates for the lack of a discrete chain rule. The definition of $\mathcal{W}$ can be regarded as a discrete analogue of the Benamou-Brenier formula [7]. Let us remark that if $t \mapsto \rho_t$ is differentiable at some $t$ and $\rho_t$ belongs to $\mathcal{P}_*(\mathcal{X})$, then the continuity equation (1.2) is satisfied for some $\psi_t \in \mathbb{R}^{\mathcal{X}}$, which is unique up to an additive constant (see [28, Proposition 3.26]).
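The identity above is easy to verify numerically; a minimal sketch (the function name is ours) compares both sides and checks agreement with the integral representation of the logarithmic mean:

```python
import math

def log_mean(s, t):
    """Logarithmic mean: equals s when s == t, else (s - t)/(log s - log t)."""
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

# The identity rho(x) - rho(y) = hat_rho(x, y) * (log rho(x) - log rho(y)).
for a, b in [(0.3, 1.7), (2.0, 0.5), (1.0, 1.0)]:
    lhs = a - b
    rhs = log_mean(a, b) * (math.log(a) - math.log(b))
    assert abs(lhs - rhs) < 1e-12

# Agreement with the integral \int_0^1 s^{1-p} t^p dp (midpoint rule).
s, t = 0.3, 1.7
n = 100000
quad = sum(s ** (1 - (i + 0.5) / n) * t ** ((i + 0.5) / n) for i in range(n)) / n
assert abs(quad - log_mean(s, t)) < 1e-6
```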
Since the metric $\mathcal{W}$ is Riemannian in the interior $\mathcal{P}_*(\mathcal{X})$, it makes sense to consider gradient flows in $(\mathcal{P}_*(\mathcal{X}), \mathcal{W})$, and it has been proved in [28] that the heat flow associated with the continuous time Markov semigroup $P_t = e^{t(K - I)}$ is the gradient flow of the entropy
$$\mathcal{H}(\rho) = \sum_{x \in \mathcal{X}} \pi(x)\rho(x)\log\rho(x) \tag{1.3}$$
with respect to the Riemannian structure determined by $\mathcal{W}$.
In this paper we shall show that every pair of densities $\bar\rho_0, \bar\rho_1 \in \mathcal{P}(\mathcal{X})$ can be joined by a constant speed geodesic. Therefore the following definition in the spirit of Lott-Sturm-Villani seems natural.

Definition 1.1. We say that $K$ has non-local Ricci curvature bounded from below by $\kappa \in \mathbb{R}$ if for any constant speed geodesic $\{\rho_t\}_{t \in [0,1]}$ in $(\mathcal{P}(\mathcal{X}), \mathcal{W})$ we have
$$\mathcal{H}(\rho_t) \leq (1 - t)\,\mathcal{H}(\rho_0) + t\,\mathcal{H}(\rho_1) - \frac{\kappa}{2}\, t(1 - t)\, \mathcal{W}(\rho_0, \rho_1)^2 .$$
In this case, we shall use the notation $\mathrm{Ric}(K) \geq \kappa$.
Remark 1.2. Instead of requiring convexity along all geodesics, it is equivalent to require that every pair of densities $\bar\rho_0, \bar\rho_1 \in \mathcal{P}(\mathcal{X})$ can be joined by some constant speed geodesic along which the entropy is $\kappa$-convex. Another equivalent condition would be to impose a lower bound on the Hessian of $\mathcal{H}$ in the interior $\mathcal{P}_*(\mathcal{X})$ (see Theorem 4.5 below for the details).
One of the main contributions of this paper is a tensorisation result for non-local Ricci curvature, which we will now describe. For $1 \leq i \leq n$, let $K_i$ be an irreducible and reversible Markov kernel on a finite set $\mathcal{X}_i$, and let $\pi_i$ denote the corresponding invariant probability measure. Let $K^{(i)}$ denote the lift of $K_i$ to the product space $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_n$, defined for $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ by
$$K^{(i)}(x, y) = \begin{cases} K_i(x_i, y_i), & \text{if } x_j = y_j \text{ for all } j \neq i, \\ 0, & \text{otherwise.} \end{cases}$$
For a sequence $\{\alpha_i\}_{1 \leq i \leq n}$ of nonnegative numbers with $\sum_{i=1}^n \alpha_i = 1$, we consider the weighted product chain, determined by the kernel
$$K^\alpha := \sum_{i=1}^n \alpha_i K^{(i)}.$$
Its reversible probability measure is the product measure $\pi = \pi_1 \otimes \cdots \otimes \pi_n$.

Theorem 1.3 (Tensorisation of Ricci bounds). Assume that $\mathrm{Ric}(K_i) \geq \kappa_i$ for $i = 1, \ldots, n$. Then we have
$$\mathrm{Ric}(K^\alpha) \geq \min_i \alpha_i \kappa_i .$$
Tensorisation results have also been obtained for other notions of Ricci curvature, including the ones by Lott-Sturm-Villani [41, Proposition 4.16] and Ollivier [34, Proposition 27]. In both cases the proof does not extend to our setting, and completely different ideas are needed here.

As a consequence we obtain a lower bound on the non-local Ricci curvature for (the kernel $K_n$ of the simple random walk on) the discrete hypercube $\{0, 1\}^n$, which turns out to be optimal.

Corollary 1.4. For $n \geq 1$ we have $\mathrm{Ric}(K_n) \geq \frac{2}{n}$.
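To make the construction concrete, here is a sketch (our code, following the definitions above) of the simple random walk kernel $K_n$ on $\{0,1\}^n$, i.e., the weighted product $K^\alpha$ of $n$ two-point chains with $\alpha_i = 1/n$, together with a check that the uniform measure is reversible for it:

```python
from itertools import product

def hypercube_kernel(n):
    """Simple random walk on {0,1}^n: jump to each of the n neighbours
    (one flipped coordinate) with probability 1/n.  This is the weighted
    product K^alpha of n two-point chains with alpha_i = 1/n."""
    states = list(product([0, 1], repeat=n))
    K = {(x, y): 0.0 for x in states for y in states}
    for x in states:
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            K[(x, y)] = 1.0 / n
    return states, K

n = 3
states, K = hypercube_kernel(n)
pi = 1.0 / 2 ** n  # uniform stationary measure

# Rows sum to 1, and detailed balance holds (K symmetric, pi uniform).
for x in states:
    assert abs(sum(K[(x, y)] for y in states) - 1.0) < 1e-12
    for y in states:
        assert abs(K[(x, y)] * pi - K[(y, x)] * pi) < 1e-12
```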
The hypercube is a fundamental building block for applications in mathematical physics and theoretical computer science, and the problem of proving displacement convexity on this space has been an open problem that motivated the recent paper by Ollivier and Villani [35], in which a Brunn-Minkowski inequality has been obtained.

Another aspect that we wish to single out at this stage is the fact that Ricci bounds imply a number of functional inequalities that are natural discrete counterparts to powerful inequalities in a continuous setting. In particular, we obtain discrete counterparts to the results by Bakry-Émery [5] and Otto-Villani [37].
To state the results we consider the Dirichlet form
$$\mathcal{E}(\varphi, \psi) = \frac{1}{2} \sum_{x,y \in \mathcal{X}} \big(\varphi(x) - \varphi(y)\big)\big(\psi(x) - \psi(y)\big) K(x, y)\pi(x)$$
defined for functions $\varphi, \psi : \mathcal{X} \to \mathbb{R}$. Furthermore, we consider the functional
$$\mathcal{I}(\rho) = \mathcal{E}(\rho, \log\rho)$$
defined for $\rho \in \mathcal{P}(\mathcal{X})$, with the convention that $\mathcal{I}(\rho) = +\infty$ if $\rho$ does not belong to $\mathcal{P}_*(\mathcal{X})$. Its significance here is due to the fact that it is (minus) the time-derivative of the entropy along the heat flow: $\frac{\mathrm{d}}{\mathrm{d}t}\mathcal{H}(P_t\rho) = -\mathcal{I}(P_t\rho)$. In this sense, $\mathcal{I}$ can be regarded as a discrete version of the Fisher information.

Theorem 1.5 (Functional inequalities). Let $K$ be an irreducible and reversible Markov kernel on a finite set $\mathcal{X}$.

(1) If $\mathrm{Ric}(K) \geq \kappa$ for some $\kappa \in \mathbb{R}$, then the HWI-inequality
$$\mathcal{H}(\rho) \leq \mathcal{W}(\rho, \mathbf{1})\sqrt{\mathcal{I}(\rho)} - \frac{\kappa}{2}\,\mathcal{W}(\rho, \mathbf{1})^2 \qquad \text{(HWI($\kappa$))}$$
holds for all $\rho \in \mathcal{P}(\mathcal{X})$.

(2) If $\mathrm{Ric}(K) \geq \lambda$ for some $\lambda > 0$, then the modified logarithmic Sobolev inequality
$$\mathcal{H}(\rho) \leq \frac{1}{2\lambda}\,\mathcal{I}(\rho) \qquad \text{(MLSI($\lambda$))}$$
holds for all $\rho \in \mathcal{P}(\mathcal{X})$.

(3) If $K$ satisfies MLSI($\lambda$) for some $\lambda > 0$, then the modified Talagrand inequality
$$\mathcal{W}(\rho, \mathbf{1}) \leq \sqrt{\frac{2}{\lambda}\,\mathcal{H}(\rho)} \qquad \text{($T_{\mathcal{W}}(\lambda)$)}$$
holds for all $\rho \in \mathcal{P}(\mathcal{X})$.

(4) If $K$ satisfies $T_{\mathcal{W}}(\lambda)$ for some $\lambda > 0$, then the Poincaré inequality
$$\|\varphi\|^2_{L^2(\mathcal{X},\pi)} \leq \frac{1}{\lambda}\,\mathcal{E}(\varphi, \varphi) \qquad \text{(P($\lambda$))}$$
holds for all functions $\varphi : \mathcal{X} \to \mathbb{R}$ with $\sum_{x \in \mathcal{X}} \pi(x)\varphi(x) = 0$.

Here, $\mathbf{1}$ denotes the density of the stationary measure $\pi$.
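The relation $\frac{\mathrm{d}}{\mathrm{d}t}\mathcal{H}(P_t\rho) = -\mathcal{I}(P_t\rho)$ can be checked numerically; in the sketch below (the 3-state chain, the initial density, and the step size are our illustrative choices) the heat flow is approximated by a truncated Taylor expansion of $e^{t(K-I)}$:

```python
import math

# Illustrative 3-state reversible chain (our choice, not from the paper).
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
n = 3

def delta(rho):
    """Generator K - I acting on densities: (Δρ)(x) = Σ_y K(x,y)(ρ(y) - ρ(x))."""
    return [sum(K[x][y] * (rho[y] - rho[x]) for y in range(n)) for x in range(n)]

def H(rho):
    """Entropy H(ρ) = Σ_x π(x) ρ(x) log ρ(x)."""
    return sum(pi[x] * rho[x] * math.log(rho[x]) for x in range(n))

def I(rho):
    """Fisher information I(ρ) = E(ρ, log ρ)."""
    return 0.5 * sum((rho[x] - rho[y]) * (math.log(rho[x]) - math.log(rho[y]))
                     * K[x][y] * pi[x] for x in range(n) for y in range(n)
                     if K[x][y] > 0)

def step(rho, h):
    """Third-order Taylor approximation of e^{hΔ}ρ."""
    d1 = delta(rho); d2 = delta(d1); d3 = delta(d2)
    return [rho[x] + h * d1[x] + h ** 2 / 2 * d2[x] + h ** 3 / 6 * d3[x]
            for x in range(n)]

rho = [1.6, 0.8, 0.8]  # strictly positive density: Σ π ρ = 1
h = 1e-4
num = (H(step(rho, h)) - H(step(rho, -h))) / (2 * h)  # central difference
assert abs(num + I(rho)) < 1e-6  # dH/dt = -I(ρ) along the heat flow
```

The central difference of the entropy along the flow matches $-\mathcal{I}(\rho)$ up to discretisation error, illustrating why $\mathcal{I}$ plays the role of the Fisher information.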
The first inequality in Theorem 1.5 is a discrete counterpart to the HWI-inequality from Otto and Villani [37], with the difference that the metric $W_2$ has been replaced by $\mathcal{W}$.

The second result is a discrete version of the celebrated criterion by Bakry-Émery [5], who proved the corresponding result on Riemannian manifolds. Classically, the Bakry-Émery criterion applies to weighted Riemannian manifolds $(\mathcal{M}, e^{-V}\mathrm{vol}_{\mathcal{M}})$, and asks for a lower bound on the generalised Ricci curvature given by $\mathrm{Ric}_{\mathcal{M}} + \mathrm{Hess}\, V$. As in our setting we allow for general $K$ and $\pi$, the potential $V$ is already incorporated in $K$ and $\pi$, and our notion of Ricci curvature could be thought of as the analogue of this generalised Ricci curvature.
The modified logarithmic Sobolev inequality (MLSI) is motivated by the fact that it yields an explicit rate of exponential decay of the entropy along the heat flow. It has been extensively studied (see, e.g., [11, 14]), along with different discrete logarithmic Sobolev inequalities in the literature (e.g., [4, 10]).

The third part is a discrete counterpart to a famous result by Otto and Villani [37], who showed that the logarithmic Sobolev inequality implies the so-called $T_2$-inequality; recall that the $T_p$-inequality is the analogue of $T_{\mathcal{W}}$ in which $\mathcal{W}$ is replaced by $W_p$, for $1 \leq p < \infty$. These inequalities have been extensively studied in recent years. We refer to [21] for a survey and to [40] for a study of the $T_1$-inequality in a discrete setting.
The modified Talagrand inequality $T_{\mathcal{W}}$ that we consider is new. This inequality combines some of the good properties of $T_1$ and $T_2$, as we shall now discuss.

Like $T_1$, it is weak enough to be applicable in a discrete setting. In fact, we shall prove that $T_{\mathcal{W}}(\lambda)$ holds on the discrete hypercube $\{0, 1\}^n$ with the optimal constant $\lambda = \frac{2}{n}$. By contrast, the $T_2$-inequality does not even hold on the two-point space, and it has been an open problem to find an adequate substitute.

Like $T_2$, and unlike $T_1$, $T_{\mathcal{W}}$ is strong enough to capture spectral information. In fact, the fourth part in Theorem 1.5 asserts that it implies a Poincaré inequality with constant $\lambda$.
Furthermore, we shall show that $T_{\mathcal{W}}$ yields good bounds on the sub-Gaussian constant, in the sense that
$$\mathbb{E}_\pi\big[e^{t(\varphi - \mathbb{E}_\pi[\varphi])}\big] \leq \exp\Big(\frac{t^2}{4\lambda}\Big) \tag{1.4}$$
for all $t > 0$ and all functions $\varphi : \mathcal{X} \to \mathbb{R}$ that are Lipschitz with constant 1 with respect to the graph metric. Here, we use the notation $\mathbb{E}_\pi[\varphi] = \sum_{x \in \mathcal{X}} \pi(x)\varphi(x)$. As is well known, this estimate yields the concentration inequality
$$\pi\big(\varphi - \mathbb{E}_\pi[\varphi] \geq h\big) \leq e^{-\lambda h^2}$$
for all $h > 0$. The proof of (1.4) relies on the fact, proved in Section 2, that the metric $\mathcal{W}$ can be bounded from below by $W_1$ (with respect to the graph metric), so that $T_{\mathcal{W}}(\lambda)$ implies a $T_1(2\lambda)$-inequality, which is known to be equivalent to the sub-Gaussian inequality [8].

The proof of Theorem 1.5 follows the approach by Otto and Villani. On a technical level, the proofs are simpler in the discrete case, since heuristic arguments from Otto and Villani are essentially rigorous proofs in our setting, and no additional PDE arguments are required as in [37].

To summarise, we have the following sequence of implications, for any $\lambda > 0$:
$$\mathrm{Ric}(K) \geq \lambda \;\Longrightarrow\; \mathrm{MLSI}(\lambda) \;\Longrightarrow\; T_{\mathcal{W}}(\lambda) \;\Longrightarrow\; \begin{cases} \mathrm{P}(\lambda), \\ T_1(2\lambda). \end{cases}$$
Other notions of Ricci curvature. This is of course not the first time that a notion of Ricci curvature has been introduced for discrete spaces, but the notion considered here appears to be the closest in spirit to the one by Lott-Sturm-Villani. Furthermore, it seems to be the first that yields natural analogues of the results by Bakry-Émery and Otto-Villani.

A different notion of Ricci curvature has been introduced by Ollivier [33, 34]. This notion is also based on ideas from optimal transport, and uses the $L^1$-Wasserstein metric $W_1$, which behaves better in a discrete setting than $W_2$. Ollivier's criterion has the advantage of being easy to check in many examples. Furthermore, in some interesting cases it yields functional inequalities with good yet non-optimal constants. Moreover, Ollivier does not assume reversibility, whereas this is strongly used in our approach. It is not completely clear how Ollivier's notion relates to the one by Lott-Sturm-Villani (see [35] for a discussion). Furthermore, it does not seem to be directly comparable to the concept studied here, as it relies on a metric on the underlying space, which is not the case in our approach. In the setting of graphs Ollivier's Ricci curvature has been further studied in the recent preprints [6, 22, 24].

Another approach has been taken by Lin and Yau [26], who defined Ricci curvature in terms of the heat semigroup.

Bonciocat and Sturm [12] followed a different approach to modify the Lott-Sturm-Villani criterion, in which they circumvented the lack of midpoints in the $W_2$-Wasserstein metric by allowing for approximate midpoints. A Brunn-Minkowski inequality in this spirit has been proved on the discrete hypercube by Ollivier and Villani [35].
Organisation of the paper. In Section 2 we collect basic properties of the metric $\mathcal{W}$ and formulate an equivalent definition that is more convenient to work with in some situations. Geodesics in the $\mathcal{W}$-metric are studied in Section 3. In particular, it is shown that every pair of densities can be joined by a constant speed geodesic. In Section 4 we present the definition of non-local Ricci curvature and give a characterisation in terms of the Hessian of the entropy. Section 5 contains a criterion that allows one to give lower bounds on the Ricci curvature in some basic examples, including the discrete circle and the discrete hypercube. A tensorisation result is contained in Section 6. Finally, we introduce new versions of well-known functional inequalities in Section 7 and prove implications between these and known inequalities.

Note added. After essentially finishing this paper, the authors have been informed about the preprint [30], in which geodesic convexity of the entropy for Markov chains has been studied as well. The results obtained in both papers do not overlap significantly and have been obtained independently.

Acknowledgement. The authors are grateful to Nicola Gigli and Karl-Theodor Sturm for stimulating discussions on this paper and related topics. They thank the anonymous referees for detailed comments and valuable suggestions.
2. The metric $\mathcal{W}$

In this section we shall study some basic properties of the metric $\mathcal{W}$. Throughout we shall work with an irreducible and reversible Markov kernel $K$ on a finite set $\mathcal{X}$. The unique steady state will be denoted by $\pi$, and we shall write $P_t := e^{t(K - I)}$, $t \geq 0$, to denote the corresponding Markov semigroup.

We start by introducing some notation.
2.1. Notation. For $\psi \in \mathbb{R}^{\mathcal{X}}$ we consider the discrete gradient $\nabla\psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ defined by
$$\nabla\psi(x, y) := \psi(y) - \psi(x) .$$
For $\Psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ we consider the discrete divergence $\nabla\cdot\Psi \in \mathbb{R}^{\mathcal{X}}$ defined by
$$(\nabla\cdot\Psi)(x) := \frac{1}{2} \sum_{y \in \mathcal{X}} \big(\Psi(x, y) - \Psi(y, x)\big) K(x, y) .$$
With this notation we have
$$\Delta := \nabla\cdot\nabla = K - I ,$$
and the integration by parts formula
$$\langle \nabla\varphi, \Psi \rangle_\pi = -\langle \varphi, \nabla\cdot\Psi \rangle_\pi$$
holds. Here we write, for $\varphi, \psi \in \mathbb{R}^{\mathcal{X}}$ and $\Phi, \Psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$,
$$\langle \varphi, \psi \rangle_\pi := \sum_{x \in \mathcal{X}} \varphi(x)\psi(x)\pi(x) , \qquad \langle \Phi, \Psi \rangle_\pi := \frac{1}{2} \sum_{x,y \in \mathcal{X}} \Phi(x, y)\Psi(x, y) K(x, y)\pi(x) .$$
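A short sketch (our implementation of the definitions above, on an illustrative reversible 3-state chain) verifies $\Delta = \nabla\cdot\nabla = K - I$ and the integration by parts formula:

```python
import random

# Illustrative reversible chain (our choice, not from the paper).
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
n = 3

def grad(psi):
    """Discrete gradient: (∇ψ)(x, y) = ψ(y) - ψ(x)."""
    return [[psi[y] - psi[x] for y in range(n)] for x in range(n)]

def div(Psi):
    """Discrete divergence: (∇·Ψ)(x) = ½ Σ_y (Ψ(x,y) - Ψ(y,x)) K(x,y)."""
    return [0.5 * sum((Psi[x][y] - Psi[y][x]) * K[x][y] for y in range(n))
            for x in range(n)]

random.seed(0)
psi = [random.random() for _ in range(n)]
phi = [random.random() for _ in range(n)]
Psi = [[random.random() for _ in range(n)] for _ in range(n)]

# Δ = ∇·∇ = K - I on functions.
lap = div(grad(psi))
for x in range(n):
    expected = sum(K[x][y] * psi[y] for y in range(n)) - psi[x]
    assert abs(lap[x] - expected) < 1e-12

# Integration by parts: <∇φ, Ψ>_π = -<φ, ∇·Ψ>_π (uses reversibility).
lhs = 0.5 * sum(grad(phi)[x][y] * Psi[x][y] * K[x][y] * pi[x]
                for x in range(n) for y in range(n))
rhs = -sum(phi[x] * div(Psi)[x] * pi[x] for x in range(n))
assert abs(lhs - rhs) < 1e-12
```

Note that the integration by parts identity for a general (non-gradient) $\Psi$ relies on detailed balance, which the chosen chain satisfies.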
From now on we shall fix a function $\theta : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ satisfying the following assumptions:

Assumption 2.1. The function $\theta$ has the following properties:
(A1) (Regularity): $\theta$ is continuous on $\mathbb{R}_+ \times \mathbb{R}_+$ and $C^\infty$ on $(0, \infty) \times (0, \infty)$;
(A2) (Symmetry): $\theta(s, t) = \theta(t, s)$ for $s, t \geq 0$;
(A3) (Positivity, normalisation): $\theta(s, t) > 0$ for $s, t > 0$ and $\theta(1, 1) = 1$;
(A4) (Zero at the boundary): $\theta(0, t) = 0$ for all $t \geq 0$;
(A5) (Monotonicity): $\theta(r, t) \leq \theta(s, t)$ for all $0 \leq r \leq s$ and $t \geq 0$;
(A6) (Positive homogeneity): $\theta(\lambda s, \lambda t) = \lambda\,\theta(s, t)$ for $\lambda > 0$ and $s, t \geq 0$;
(A7) (Concavity): the function $\theta : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ is concave.

It is easily checked that these assumptions imply that $\theta$ is bounded from above by the arithmetic mean:
$$\theta(s, t) \leq \frac{s + t}{2} , \qquad s, t \geq 0 . \tag{2.1}$$
In the next result we collect some properties of the function $\theta$ which turn out to be very useful to obtain non-local Ricci curvature bounds.

Lemma 2.2. For all $s, t, u, v > 0$ we have
$$s\,\partial_1\theta(s, t) + t\,\partial_2\theta(s, t) = \theta(s, t) , \tag{2.2}$$
$$s\,\partial_1\theta(u, v) + t\,\partial_2\theta(u, v) - \theta(s, t) \geq 0 . \tag{2.3}$$
Proof. The equality (2.2) follows immediately from the homogeneity (A6) by noting that the left-hand side equals $\frac{\mathrm{d}}{\mathrm{d}r}\big|_{r=1}\theta(rs, rt)$. Let us prove (2.3). Note that by the concavity (A7) of $\theta$ the gradient $\nabla\theta$ is a monotone operator from $\mathbb{R}^2_+$ to $\mathbb{R}^2$, in the sense that for all $s, t, x, y > 0$ we have
$$(s - x)\big(\partial_1\theta(s, t) - \partial_1\theta(x, y)\big) + (t - y)\big(\partial_2\theta(s, t) - \partial_2\theta(x, y)\big) \leq 0 .$$
By the homogeneity (A6) both $\partial_1\theta$ and $\partial_2\theta$ are $0$-homogeneous. Taking now in particular $x = \lambda u$, $y = \lambda v$ and letting $\lambda \downarrow 0$ we obtain
$$s\big(\partial_1\theta(s, t) - \partial_1\theta(u, v)\big) + t\big(\partial_2\theta(s, t) - \partial_2\theta(u, v)\big) \leq 0 .$$
From this we deduce (2.3) by an application of (2.2).
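Both identities can be sanity-checked numerically for the logarithmic mean $\theta(s, t) = (s - t)/(\log s - \log t)$ used throughout the paper; a small sketch with finite-difference partial derivatives (sample points and tolerances are our choices):

```python
import math

def theta(s, t):
    """Logarithmic mean (the main example of a function satisfying Assumption 2.1)."""
    return s if s == t else (s - t) / (math.log(s) - math.log(t))

def d1(s, t, h=1e-6):
    """Central-difference approximation of the partial derivative in s."""
    return (theta(s + h, t) - theta(s - h, t)) / (2 * h)

def d2(s, t, h=1e-6):
    """Central-difference approximation of the partial derivative in t."""
    return (theta(s, t + h) - theta(s, t - h)) / (2 * h)

u, v = 1.3, 0.2
for s, t in [(0.4, 1.9), (2.5, 0.7)]:
    # Euler identity (2.2) for the 1-homogeneous function theta.
    assert abs(s * d1(s, t) + t * d2(s, t) - theta(s, t)) < 1e-6
    # Inequality (2.3) at a point (u, v) different from (s, t).
    assert s * d1(u, v) + t * d2(u, v) - theta(s, t) >= -1e-6
```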
The most important example for our purposes is the logarithmic mean defined by
$$\theta(s, t) := \int_0^1 s^{1-p} t^p \,\mathrm{d}p = \frac{s - t}{\log s - \log t} ,$$
the latter expression being valid if $s, t > 0$ and $s \neq t$. For $\rho \in \mathcal{P}(\mathcal{X})$ and $x, y \in \mathcal{X}$ we define
$$\hat\rho(x, y) := \theta\big(\rho(x), \rho(y)\big) .$$
For a fixed $\rho \in \mathcal{P}(\mathcal{X})$ it will be useful to consider the Hilbert space $\mathcal{G}_\rho$ consisting of all (equivalence classes of) functions $\Psi : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, endowed with the inner product
$$\langle \Phi, \Psi \rangle_\rho := \frac{1}{2} \sum_{x,y \in \mathcal{X}} \Phi(x, y)\Psi(x, y)\, \hat\rho(x, y) K(x, y)\pi(x) . \tag{2.4}$$
Here we identify functions that coincide on the set $\{(x, y) \in \mathcal{X} \times \mathcal{X} : \hat\rho(x, y)K(x, y) > 0\}$. The operator $\nabla$ can then be considered as a linear operator $\nabla : L^2(\mathcal{X}, \pi) \to \mathcal{G}_\rho$, whose negative adjoint is the $\rho$-divergence operator $(\nabla_\rho\,\cdot) : \mathcal{G}_\rho \to L^2(\mathcal{X}, \pi)$ given by
$$(\nabla_\rho \cdot \Psi)(x) := \frac{1}{2} \sum_{y \in \mathcal{X}} \big(\Psi(x, y) - \Psi(y, x)\big)\, \hat\rho(x, y) K(x, y) .$$
2.2. Equivalent definitions of the metric $\mathcal{W}$. We shall now state the definition of the metric $\mathcal{W}$ as defined in [28]. Here and in the rest of the paper we will use the shorthand notation
$$\mathcal{A}(\rho, \psi) := \|\nabla\psi\|^2_\rho = \frac{1}{2} \sum_{x,y \in \mathcal{X}} \big(\psi(y) - \psi(x)\big)^2\, \hat\rho(x, y) K(x, y)\pi(x)$$
for $\rho \in \mathcal{P}(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$.

Definition 2.3. For $\bar\rho_0, \bar\rho_1 \in \mathcal{P}(\mathcal{X})$ we define
$$\mathcal{W}(\bar\rho_0, \bar\rho_1)^2 := \inf\Big\{ \int_0^1 \mathcal{A}(\rho_t, \psi_t)\,\mathrm{d}t \;:\; (\rho, \psi) \in \mathcal{CE}_1(\bar\rho_0, \bar\rho_1) \Big\} ,$$
where for $T > 0$, $\mathcal{CE}_T(\bar\rho_0, \bar\rho_1)$ denotes the collection of pairs $(\rho, \psi)$ satisfying the following conditions:
$$\begin{cases} \text{(i)} & \rho : [0, T] \to \mathbb{R}^{\mathcal{X}} \text{ is } C^\infty ; \\ \text{(ii)} & \rho_0 = \bar\rho_0 , \quad \rho_T = \bar\rho_1 ; \\ \text{(iii)} & \rho_t \in \mathcal{P}(\mathcal{X}) \text{ for all } t \in [0, T] ; \\ \text{(iv)} & \psi : [0, T] \to \mathbb{R}^{\mathcal{X}} \text{ is measurable} ; \\ \text{(v)} & \text{for all } x \in \mathcal{X} \text{ and all } t \in (0, T) \text{ we have} \\ & \dot\rho_t(x) + \displaystyle\sum_{y \in \mathcal{X}} \big(\psi_t(y) - \psi_t(x)\big)\,\hat\rho_t(x, y) K(x, y) = 0 . \end{cases} \tag{2.5}$$
Using the notation introduced above, the continuity equation in (v) can be written as
$$\dot\rho_t + \nabla \cdot (\hat\rho_t\,\nabla\psi_t) = 0 . \tag{2.6}$$
Definition 2.3 is the same as the one in [28], except that slightly different regularity conditions have been imposed on $\rho$. We shall shortly see that both definitions are equivalent.
The following results on the metric $\mathcal{W}$ have been proved in [28].

Theorem 2.4. (1) The space $(\mathcal{P}(\mathcal{X}), \mathcal{W})$ is a complete metric space, compatible with the Euclidean topology.

(2) The restriction of $\mathcal{W}$ to $\mathcal{P}_*(\mathcal{X})$ is the Riemannian distance induced by the following Riemannian structure:
- the tangent space of $\mathcal{P}_*(\mathcal{X})$ can be identified with the set
$$T_\rho := \{ \nabla\psi \;:\; \psi \in \mathbb{R}^{\mathcal{X}} \}$$
by means of the following identification: given a smooth curve $(-\varepsilon, \varepsilon) \ni t \mapsto \rho_t \in \mathcal{P}_*(\mathcal{X})$ with $\rho_0 = \rho$, there exists a unique element $\nabla\psi_0 \in T_\rho$ such that the continuity equation (2.5)(v) holds at $t = 0$;
- the Riemannian metric on $T_\rho$ is given by the inner product
$$\langle \nabla\varphi, \nabla\psi \rangle_\rho = \frac{1}{2} \sum_{x,y \in \mathcal{X}} \big(\varphi(x) - \varphi(y)\big)\big(\psi(x) - \psi(y)\big)\, \hat\rho(x, y) K(x, y)\pi(x) .$$

(3) If $\theta$ is the logarithmic mean, i.e., $\theta(s, t) = \int_0^1 s^{1-p} t^p \,\mathrm{d}p$, then the heat flow is the gradient flow of the entropy, in the sense that for any $\rho \in \mathcal{P}(\mathcal{X})$ and $t > 0$, we have $\rho_t := P_t\rho \in \mathcal{P}_*(\mathcal{X})$ and
$$\dot\rho_t = -\operatorname{grad}_{\mathcal{W}} \mathcal{H}(\rho_t) . \tag{2.7}$$
Remark 2.5. If $\rho$ belongs to $\mathcal{P}_*(\mathcal{X})$, then the gradient flow equation (2.7) holds also for $t = 0$.

Remark 2.6. The relevance of the logarithmic mean can be seen as follows. The heat equation $\dot\rho_t = \Delta\rho_t = \nabla\cdot(\nabla\rho_t)$ can be rewritten as a continuity equation (2.6) provided that
$$\nabla\rho_t = \hat\rho_t\,\nabla\psi_t .$$
On the other hand, an easy computation (see [28, Proposition 4.2 and Corollary 4.3]) shows that under the identification above, the gradient of the entropy is given by
$$\operatorname{grad}_{\mathcal{W}} \mathcal{H}(\rho) = \nabla\log\rho .$$
Combining these observations, we infer that the heat flow is the gradient flow of the entropy with respect to $\mathcal{W}$ precisely when
$$\nabla\rho = \hat\rho\,\nabla\log\rho ,$$
i.e., when $\theta$ is the logarithmic mean.

This argument shows that the same heat flow can also be identified as the gradient flow of the functional $\mathcal{F}(\rho) = \sum_{x \in \mathcal{X}} f(\rho(x))\pi(x)$ for any smooth function $f : \mathbb{R} \to \mathbb{R}$ with $f'' > 0$, if one replaces the logarithmic mean by
$$\theta_f(r, s) = \frac{r - s}{f'(r) - f'(s)} .$$
We refer to [28] for the details.
Our next aim is to provide an equivalent formulation of the definition of $\mathcal{W}$, which may seem less intuitive at first sight, but offers several technical advantages. First, the continuity equation becomes linear in $V$ and $\rho$, which allows one to exploit the concavity of $\theta$. Second, this formulation is more stable, so that we can prove existence of minimizers in the class $\mathcal{CE}'_1(\bar\rho_0, \bar\rho_1)$. Similar ideas have already been developed in a continuous setting in [17], where a general class of transportation metrics is constructed based on the usual continuity equation in $\mathbb{R}^n$.

An important role will be played by the function $\alpha : \mathbb{R} \times \mathbb{R}^2_+ \to \mathbb{R}_+ \cup \{+\infty\}$ defined by
$$\alpha(x, s, t) := \begin{cases} 0 , & \theta(s, t) = 0 \text{ and } x = 0 , \\ \dfrac{x^2}{\theta(s, t)} , & \theta(s, t) \neq 0 , \\ +\infty , & \theta(s, t) = 0 \text{ and } x \neq 0 . \end{cases}$$
The following observation will be useful.

Lemma 2.7. The function $\alpha$ is lower semicontinuous and convex.

Proof. This is easily checked using (A7) and the convexity of the function $(x, y) \mapsto \frac{x^2}{y}$ on $\mathbb{R} \times (0, \infty)$.
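Lemma 2.7 can be probed numerically; the following sketch (random sample points and the tolerance are our choices) checks midpoint convexity of $\alpha$ for the logarithmic mean:

```python
import math
import random

def theta(s, t):
    """Logarithmic mean; equals s when s == t (in particular theta(0, 0) = 0)."""
    return s if s == t else (s - t) / (math.log(s) - math.log(t))

def alpha(x, s, t):
    """The function alpha(x, s, t) from the paper, with its boundary conventions."""
    th = theta(s, t)
    if th == 0:
        return 0.0 if x == 0 else float("inf")
    return x * x / th

random.seed(1)
for _ in range(1000):
    p = (random.uniform(-2, 2), random.uniform(0.01, 3), random.uniform(0.01, 3))
    q = (random.uniform(-2, 2), random.uniform(0.01, 3), random.uniform(0.01, 3))
    lam = random.random()
    mid = tuple((1 - lam) * a + lam * b for a, b in zip(p, q))
    # Convexity along the segment from p to q.
    assert alpha(*mid) <= (1 - lam) * alpha(*p) + lam * alpha(*q) + 1e-9
```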
Given $\rho \in \mathcal{P}(\mathcal{X})$ and $V \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ we define
$$\mathcal{A}'(\rho, V) := \frac{1}{2} \sum_{x,y \in \mathcal{X}} \alpha\big(V(x, y), \rho(x), \rho(y)\big) K(x, y)\pi(x) ,$$
and we set
$$\mathcal{CE}'_T(\bar\rho_0, \bar\rho_1) := \big\{ (\rho, V) \;:\; \text{(i'), (ii), (iii), (iv'), (v') hold} \big\} ,$$
where
$$\begin{cases} \text{(i')} & \rho : [0, T] \to \mathbb{R}^{\mathcal{X}} \text{ is continuous} ; \\ \text{(iv')} & V : [0, T] \to \mathbb{R}^{\mathcal{X} \times \mathcal{X}} \text{ is locally integrable} ; \\ \text{(v')} & \text{for all } x \in \mathcal{X} \text{ we have, in the sense of distributions,} \\ & \dot\rho_t(x) + \dfrac{1}{2} \displaystyle\sum_{y \in \mathcal{X}} \big(V_t(x, y) - V_t(y, x)\big) K(x, y) = 0 . \end{cases} \tag{2.8}$$
The continuity equation in (v') can equivalently be written as
$$\dot\rho_t + \nabla \cdot V_t = 0 .$$
As an immediate consequence of Lemma 2.7 we obtain the following convexity of $\mathcal{A}'$.
Corollary 2.8. Let $\rho_i \in \mathcal{P}(\mathcal{X})$ and $V_i \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ for $i = 0, 1$. For $\lambda \in [0, 1]$ set $\rho^\lambda := (1 - \lambda)\rho_0 + \lambda\rho_1$ and $V^\lambda := (1 - \lambda)V_0 + \lambda V_1$. Then we have
$$\mathcal{A}'(\rho^\lambda, V^\lambda) \leq (1 - \lambda)\,\mathcal{A}'(\rho_0, V_0) + \lambda\,\mathcal{A}'(\rho_1, V_1) .$$
Now we have the following reformulation of Definition 2.3.

Lemma 2.9. For $\bar\rho_0, \bar\rho_1 \in \mathcal{P}(\mathcal{X})$ we have
$$\mathcal{W}(\bar\rho_0, \bar\rho_1)^2 = \inf\Big\{ \int_0^1 \mathcal{A}'(\rho_t, V_t)\,\mathrm{d}t \;:\; (\rho, V) \in \mathcal{CE}'_1(\bar\rho_0, \bar\rho_1) \Big\} .$$
Furthermore, if $\bar\rho_0, \bar\rho_1 \in \mathcal{P}_*(\mathcal{X})$, condition (iv) in (2.5) can be reinforced into: $\psi : [0, T] \to \mathbb{R}^{\mathcal{X}}$ is $C^\infty$.
Proof. The inequality "$\leq$" follows easily by noting that the infimum is taken over a larger set. Indeed, given a pair $(\rho, \psi) \in \mathcal{CE}_1(\bar\rho_0, \bar\rho_1)$ we obtain a pair $(\rho, V) \in \mathcal{CE}'_1(\bar\rho_0, \bar\rho_1)$ by setting $V_t(x, y) = \hat\rho_t(x, y)\nabla\psi_t(x, y)$, and we have $\mathcal{A}'(\rho_t, V_t) = \mathcal{A}(\rho_t, \psi_t)$.

To show the opposite inequality "$\geq$", we fix an arbitrary pair $(\rho, V) \in \mathcal{CE}'_1(\bar\rho_0, \bar\rho_1)$. It is sufficient to show that for every $\varepsilon > 0$ there exists a pair $(\rho^\delta, \psi^\delta) \in \mathcal{CE}_1(\bar\rho_0, \bar\rho_1)$ such that
$$\int_0^1 \mathcal{A}(\rho^\delta_t, \psi^\delta_t)\,\mathrm{d}t \leq \int_0^1 \mathcal{A}'(\rho_t, V_t)\,\mathrm{d}t + \varepsilon .$$
For this purpose we first regularise $(\rho, V)$ by a mollification argument. For $\delta > 0$ we thus define $(\tilde\rho, \tilde V) : [-\delta, 1 + \delta] \to \mathcal{P}(\mathcal{X}) \times \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ by
$$(\tilde\rho_t, \tilde V_t) = \begin{cases} \big(\rho(0), 0\big) , & t \in [-\delta, \delta) , \\ \Big(\rho\big(\tfrac{t - \delta}{1 - 2\delta}\big),\ \tfrac{1}{1 - 2\delta}\, V\big(\tfrac{t - \delta}{1 - 2\delta}\big)\Big) , & t \in [\delta, 1 - \delta) , \\ \big(\rho(1), 0\big) , & t \in [1 - \delta, 1 + \delta] , \end{cases}$$
and take a nonnegative smooth function $\eta : \mathbb{R} \to \mathbb{R}_+$ which vanishes outside of $[-\delta, \delta]$, is strictly positive on $(-\delta, \delta)$ and satisfies $\int \eta(s)\,\mathrm{d}s = 1$. For $t \in [0, 1]$ we define
$$\rho^\delta_t = \int \eta(s)\,\tilde\rho_{t+s}\,\mathrm{d}s , \qquad V^\delta_t = \int \eta(s)\,\tilde V_{t+s}\,\mathrm{d}s .$$
Now $t \mapsto \rho^\delta_t$ is $C^\infty$ and, using the continuity of $\rho$, it is easy to check that $(\rho^\delta, V^\delta) \in \mathcal{CE}'_1(\bar\rho_0, \bar\rho_1)$. Moreover, using the convexity from Corollary 2.8 we can estimate
$$\int_0^1 \mathcal{A}'(\rho^\delta_t, V^\delta_t)\,\mathrm{d}t \leq \int_0^1 \int \eta(s)\,\mathcal{A}'(\tilde\rho_{t+s}, \tilde V_{t+s})\,\mathrm{d}s\,\mathrm{d}t \leq \int_{-\delta}^{1+\delta} \mathcal{A}'(\tilde\rho_t, \tilde V_t)\,\mathrm{d}t = \frac{1}{1 - 2\delta} \int_0^1 \mathcal{A}'(\rho_t, V_t)\,\mathrm{d}t ,$$
which is bounded by $\int_0^1 \mathcal{A}'(\rho_t, V_t)\,\mathrm{d}t + \varepsilon$ for $\delta$ sufficiently small.

To proceed further, we may assume without loss of generality that $V(x, y) = 0$ whenever $K(x, y) = 0$. The fact that $\int_0^1 \mathcal{A}'(\rho_t, V_t)\,\mathrm{d}t$ is finite implies that the set $\{t : \hat\rho_t(x, y) = 0 \text{ and } V_t(x, y) \neq 0\}$ is negligible for all $x, y \in \mathcal{X}$. Taking properties (A3) and (A4) of the function $\theta$ into account, this implies that for the convolved quantities the corresponding set $\{t : \hat\rho^\delta_t(x, y) = 0 \text{ and } V^\delta_t(x, y) \neq 0\}$ is empty for all $x, y \in \mathcal{X}$. Hence there exists a measurable function $\Psi^\delta : [0, 1] \to \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ satisfying
$$V^\delta_t(x, y) = \hat\rho^\delta_t(x, y)\,\Psi^\delta_t(x, y) \quad \text{for all } x, y \in \mathcal{X} \text{ and all } t \in [0, 1] . \tag{2.9}$$
It remains to find a function $\psi^\delta : [0, 1] \to \mathbb{R}^{\mathcal{X}}$ such that $(\rho^\delta, \psi^\delta) \in \mathcal{CE}_1(\bar\rho_0, \bar\rho_1)$. Let $P_{\rho^\delta_t}$ denote the orthogonal projection in $\mathcal{G}_{\rho^\delta_t}$ onto the range of $\nabla$. Then there exists a measurable function $\psi^\delta : [0, 1] \to \mathbb{R}^{\mathcal{X}}$ such that $P_{\rho^\delta_t}\Psi^\delta_t = \nabla\psi^\delta_t$. The orthogonal decomposition
$$\mathcal{G}_{\rho^\delta_t} = \mathrm{Ran}(\nabla) \oplus \mathrm{Ker}(\nabla_{\rho^\delta_t}\,\cdot) \tag{2.10}$$
implies that $\nabla_{\rho^\delta_t} \cdot \nabla\psi^\delta_t = \nabla_{\rho^\delta_t} \cdot \Psi^\delta_t$, hence $(\rho^\delta, \psi^\delta) \in \mathcal{CE}_1(\bar\rho_0, \bar\rho_1)$. Using the decomposition (2.10) once more, we infer that
$$\|\nabla\psi^\delta_t\|_{\rho^\delta_t} \leq \|\Psi^\delta_t\|_{\rho^\delta_t} .$$
This implies $\mathcal{A}(\rho^\delta_t, \psi^\delta_t) \leq \mathcal{A}'(\rho^\delta_t, V^\delta_t)$ and finishes the proof of the first assertion.

If $\bar\rho_0$ and $\bar\rho_1$ belong to $\mathcal{P}_*(\mathcal{X})$, one can follow the argument in [28, Lemma 3.30] and construct a curve $(\tilde\rho, \tilde V) \in \mathcal{CE}'_1(\bar\rho_0, \bar\rho_1)$ such that $\tilde\rho_t \in \mathcal{P}_*(\mathcal{X})$ for $t \in [0, 1]$ and
$$\int_0^1 \mathcal{A}'(\tilde\rho_t, \tilde V_t)\,\mathrm{d}t \leq \int_0^1 \mathcal{A}'(\rho_t, V_t)\,\mathrm{d}t + \varepsilon .$$
Then one can apply the argument above. In this case, $\rho^\delta_t(x) > 0$ for all $x \in \mathcal{X}$ and $t \in [0, 1]$, and therefore the function $\Psi^\delta : [0, 1] \to \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ is $C^\infty$. Furthermore, since the orthogonal projection $P_\rho$ depends smoothly on $\rho \in \mathcal{P}_*(\mathcal{X})$, the function $\psi^\delta : [0, 1] \to \mathbb{R}^{\mathcal{X}}$ is smooth as well.
Remark 2.10. In [28] the metric $\mathcal{W}$ has been defined as in Definition 2.3, with the difference that (i) in (2.5) was replaced by "$\rho : [0, T] \to \mathcal{P}(\mathcal{X})$ is piecewise $C^1$". Therefore Lemma 2.9 shows in particular that Definition 2.3 coincides with the original definition of $\mathcal{W}$ from [28].
2.3. Basic properties of $\mathcal{W}$. As an application of Lemma 2.9 we shall prove the following convexity result, which is a discrete counterpart of the well-known fact that the squared $L^2$-Wasserstein distance over Euclidean space is convex with respect to linear interpolation (see, e.g., [17, Theorem 5.11]).

Proposition 2.11 (Convexity of the squared distance). For $i, j = 0, 1$, let $\rho^j_i \in \mathcal{P}(\mathcal{X})$, and for $\lambda \in [0, 1]$ set $\rho^\lambda_i := (1 - \lambda)\rho^0_i + \lambda\rho^1_i$. Then
$$\mathcal{W}(\rho^\lambda_0, \rho^\lambda_1)^2 \leq (1 - \lambda)\,\mathcal{W}(\rho^0_0, \rho^0_1)^2 + \lambda\,\mathcal{W}(\rho^1_0, \rho^1_1)^2 .$$
Proof. Let $\varepsilon > 0$. For $j = 0, 1$ we may take a pair $(\rho^j, V^j) \in \mathcal{CE}'_1(\rho^j_0, \rho^j_1)$ with
$$\int_0^1 \mathcal{A}'(\rho^j_t, V^j_t)\,\mathrm{d}t \leq \mathcal{W}(\rho^j_0, \rho^j_1)^2 + \varepsilon$$
in view of Lemma 2.9. For $\lambda \in [0, 1]$ we set
$$\rho^\lambda_t := (1 - \lambda)\rho^0_t + \lambda\rho^1_t , \qquad V^\lambda_t := (1 - \lambda)V^0_t + \lambda V^1_t .$$
It then follows that $(\rho^\lambda, V^\lambda) \in \mathcal{CE}'_1(\rho^\lambda_0, \rho^\lambda_1)$, hence, using Corollary 2.8,
$$\mathcal{W}(\rho^\lambda_0, \rho^\lambda_1)^2 \leq \int_0^1 \mathcal{A}'(\rho^\lambda_t, V^\lambda_t)\,\mathrm{d}t \leq (1 - \lambda)\int_0^1 \mathcal{A}'(\rho^0_t, V^0_t)\,\mathrm{d}t + \lambda\int_0^1 \mathcal{A}'(\rho^1_t, V^1_t)\,\mathrm{d}t \leq (1 - \lambda)\,\mathcal{W}(\rho^0_0, \rho^0_1)^2 + \lambda\,\mathcal{W}(\rho^1_0, \rho^1_1)^2 + \varepsilon .$$
Since $\varepsilon > 0$ is arbitrary, this completes the proof.
In this section we compare $\mathcal{W}$ to some commonly used metrics. A first result of this type (see [28, Lemma 3.10]) gives a lower bound on $\mathcal{W}$ in terms of the total variation metric
$$d_{TV}(\rho_0, \rho_1) = \sum_{x \in \mathcal{X}} \pi(x)\,|\rho_0(x) - \rho_1(x)| .$$
Here, more generally, we shall compare $\mathcal{W}$ to various Wasserstein distances. Given a metric $d$ on $\mathcal{X}$ and $1 \leq p < \infty$, recall that the $L^p$-Wasserstein metric $W_{p,d}$ on $\mathcal{P}(\mathcal{X})$ is defined by
$$W_{p,d}(\rho_0, \rho_1) := \inf\Big\{ \Big( \sum_{x,y \in \mathcal{X}} d(x, y)^p\, q(x, y) \Big)^{\frac{1}{p}} \;:\; q \in \Gamma(\rho_0, \rho_1) \Big\} , \tag{2.11}$$
where $\Gamma(\rho_0, \rho_1)$ denotes the set of all couplings between $\rho_0$ and $\rho_1$, i.e.,
$$\Gamma(\rho_0, \rho_1) := \Big\{ q : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+ \;:\; \sum_{y \in \mathcal{X}} q(x, y) = \rho_0(x)\pi(x) ,\ \sum_{x \in \mathcal{X}} q(x, y) = \rho_1(y)\pi(y) \Big\} .$$
It is well known (see, e.g., [43, Theorem 4.1]) that the infimum in (2.11) is attained; as usual we shall denote the collection of minimizers by $\Gamma_o(\rho_0, \rho_1)$.

In our setting there are various metrics on $\mathcal{X}$ that are natural to consider. In particular:
- the graph distance $d_g$ with respect to the graph structure on $\mathcal{X}$ induced by $K$ (i.e., $\{x, y\}$ is an edge iff $K(x, y) > 0$);
- the metric $d_{\mathcal{W}}$, that is, the restriction of $\mathcal{W}$ from $\mathcal{P}(\mathcal{X})$ to $\mathcal{X}$ under the identification of points in $\mathcal{X}$ with the corresponding Dirac masses:
$$d_{\mathcal{W}}(x, y) := \mathcal{W}\Big( \frac{\mathbf{1}_{\{x\}}}{\pi(x)}, \frac{\mathbf{1}_{\{y\}}}{\pi(y)} \Big) .$$
The induced $L^p$-Wasserstein distances will be denoted by $W_{p,g}$ and $W_{p,\mathcal{W}}$ respectively.
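The graph distance $d_g$ is just the shortest-path metric on the support of $K$; a minimal sketch (our helper names, illustrative 4-cycle kernel) computes it by breadth-first search:

```python
from collections import deque

def graph_distance(K, source):
    """BFS distances in the graph with an edge {x, y} iff K[x][y] > 0."""
    n = len(K)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in range(n):
            if K[x][y] > 0 and y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

# Illustrative kernel: simple random walk on a 4-cycle (our example).
K = [[0, 0.5, 0, 0.5],
     [0.5, 0, 0.5, 0],
     [0, 0.5, 0, 0.5],
     [0.5, 0, 0.5, 0]]
d = graph_distance(K, 0)
assert d == {0: 0, 1: 1, 3: 1, 2: 2}  # opposite corner is at distance 2
```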
We shall now prove lower and upper bounds for the metric $\mathcal{W}$ in terms of suitable Wasserstein metrics. We start with the lower bounds. Let us remark that, unlike most other results in this paper, the second inequality in the following result relies on the normalisation $\sum_{y \in \mathcal{X}} K(x, y) = 1$.

Proposition 2.12 (Lower bounds for $\mathcal{W}$). For all probability densities $\rho_0, \rho_1 \in \mathcal{P}(\mathcal{X})$ we have
$$\frac{1}{\sqrt{2}}\, d_{TV}(\rho_0, \rho_1) \leq \sqrt{2}\, W_{1,g}(\rho_0, \rho_1) \leq \mathcal{W}(\rho_0, \rho_1) . \tag{2.12}$$
Proof. Note that $d_{tr} \leq d_g$, where $d_{tr}(x, y) = \mathbf{1}_{\{x \neq y\}}$ denotes the trivial distance. Therefore, the first bound follows from the fact that $d_{TV}$ is the $L^1$-Wasserstein distance induced by $d_{tr}$ (see [42, Theorem 1.14]).

In order to prove the second bound, we fix $\varepsilon > 0$, take $\rho_0, \rho_1 \in \mathcal{P}(\mathcal{X})$ and $(\rho, \psi) \in \mathcal{CE}_1(\rho_0, \rho_1)$ with
$$\Big( \int_0^1 \mathcal{A}(\rho_t, \psi_t)\,\mathrm{d}t \Big)^{\frac{1}{2}} \leq \mathcal{W}(\rho_0, \rho_1) + \varepsilon .$$
Using the continuity equation from (2.5) we obtain for any $\varphi : \mathcal{X} \to \mathbb{R}$,
$$\begin{aligned} \sum_{x \in \mathcal{X}} \varphi(x)\big(\rho_0(x) - \rho_1(x)\big)\pi(x) &= -\int_0^1 \sum_{x \in \mathcal{X}} \varphi(x)\dot\rho_t(x)\pi(x)\,\mathrm{d}t \\ &= \int_0^1 \sum_{x,y \in \mathcal{X}} \varphi(x)\big(\psi_t(x) - \psi_t(y)\big)\hat\rho_t(x, y)K(x, y)\pi(x)\,\mathrm{d}t \\ &= \int_0^1 \langle \nabla\varphi, \nabla\psi_t \rangle_{\rho_t}\,\mathrm{d}t \\ &\leq \Big( \int_0^1 \|\nabla\varphi\|^2_{\rho_t}\,\mathrm{d}t \Big)^{1/2} \Big( \int_0^1 \|\nabla\psi_t\|^2_{\rho_t}\,\mathrm{d}t \Big)^{1/2} \\ &\leq \Big( \int_0^1 \|\nabla\varphi\|^2_{\rho_t}\,\mathrm{d}t \Big)^{1/2} \big( \mathcal{W}(\rho_0, \rho_1) + \varepsilon \big) . \end{aligned}$$
Let $[\varphi]_{\mathrm{Lip}}$ denote the Lipschitz constant of $\varphi$ with respect to the graph distance $d_g$, i.e.,
$$[\varphi]_{\mathrm{Lip}} := \sup_{x \neq y} \frac{|\varphi(x) - \varphi(y)|}{d_g(x, y)} .$$
Applying the inequality (2.1) and using the fact that $d_g(x, y) = 1$ if $x \neq y$ and $K(x, y) > 0$, we infer that
$$\begin{aligned} \|\nabla\varphi\|^2_{\rho_t} &= \frac{1}{2} \sum_{x,y \in \mathcal{X}} \big(\varphi(x) - \varphi(y)\big)^2 K(x, y)\hat\rho_t(x, y)\pi(x) \\ &\leq \frac{1}{4}\,[\varphi]^2_{\mathrm{Lip}} \sum_{x,y \in \mathcal{X}} K(x, y)\big(\rho_t(x) + \rho_t(y)\big)\pi(x) \\ &= \frac{1}{2}\,[\varphi]^2_{\mathrm{Lip}} \sum_{x \in \mathcal{X}} \rho_t(x)\pi(x) \sum_{y \in \mathcal{X}} K(x, y) = \frac{1}{2}\,[\varphi]^2_{\mathrm{Lip}} . \end{aligned}$$
The Kantorovich-Rubinstein theorem (see, e.g., [42, Theorem 1.14]) yields
$$W_{1,g}(\rho_0, \rho_1) = \sup_{\varphi : [\varphi]_{\mathrm{Lip}} \leq 1} \sum_{x \in \mathcal{X}} \varphi(x)\big(\rho_0(x) - \rho_1(x)\big)\pi(x) \leq \frac{\mathcal{W}(\rho_0, \rho_1) + \varepsilon}{\sqrt{2}} ,$$
which completes the proof, since $\varepsilon > 0$ is arbitrary.
Before stating the upper bounds, we provide a simple relation between $d_g$ and $d_{\mathcal{W}}$.

Lemma 2.13. For $x, y \in \mathcal{X}$ we have
$$d_{\mathcal{W}}(x, y) \leq \frac{c}{\sqrt{k}}\, d_g(x, y) ,$$
where
$$c = \int_{-1}^1 \frac{\mathrm{d}r}{\sqrt{2\,\theta(1 - r, 1 + r)}} < \infty \qquad \text{and} \qquad k = \min_{(x,y)\,:\,K(x,y) > 0} K(x, y) .$$
If $\theta$ is the logarithmic mean, then $c \approx 1.56$.
Proof. Let $\{x_i\}_{i=0}^n$ be a sequence in $\mathcal{X}$ with $x_0 = x$, $x_n = y$ and $K(x_i, x_{i+1}) > 0$ for all $i$. We shall use the fact, proved in [28, Theorem 2.4], that the $\mathcal{W}$-distance between the two Dirac measures on a two-point space $\{a,b\}$ with transition probabilities $K(a,b) = K(b,a) = p$ is equal to $\frac{c}{\sqrt{p}}$. The concavity of $\theta$ readily implies that $c$ is finite. Furthermore, it follows from [28, Lemma 3.14] and its proof that for any pair $x, y \in \mathcal{X}$ with $K(x,y) > 0$, one has
$$\mathcal{W}\Big( \frac{\mathbf{1}_{\{x\}}}{\pi(x)},\, \frac{\mathbf{1}_{\{y\}}}{\pi(y)} \Big) \leq \frac{c}{\sqrt{\max\{\pi(x),\pi(y)\}^{-1}\, K(x,y)\,\pi(x)}} \leq \frac{c}{\sqrt{k}}\;.$$
Using the triangle inequality for $\mathcal{W}$ we obtain
$$d_{\mathcal{W}}(x,y) = \mathcal{W}\Big( \frac{\mathbf{1}_{\{x\}}}{\pi(x)}, \frac{\mathbf{1}_{\{y\}}}{\pi(y)} \Big) \leq \sum_{i=0}^{n-1} \mathcal{W}\Big( \frac{\mathbf{1}_{\{x_i\}}}{\pi(x_i)}, \frac{\mathbf{1}_{\{x_{i+1}\}}}{\pi(x_{i+1})} \Big) \leq \frac{nc}{\sqrt{k}}\;,$$
hence the result follows by taking the infimum over all such sequences $\{x_i\}_{i=0}^n$. □
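The constant $c$ for the logarithmic mean can be evaluated numerically. A minimal sketch (plain Python; the midpoint rule and the grid size are ad hoc choices, and the mild square-root-of-logarithm singularity of the integrand at $r = \pm 1$ is handled simply by avoiding the endpoints):

```python
import math

def log_mean(s, t):
    # logarithmic mean: theta(s, t) = (s - t) / (log s - log t), theta(s, s) = s
    if abs(s - t) < 1e-12 * max(s, t):
        return 0.5 * (s + t)
    return (s - t) / (math.log(s) - math.log(t))

def c_theta(n=400_000):
    # midpoint rule for c = \int_{-1}^{1} dr / sqrt(2 theta(1 - r, 1 + r))
    h = 2.0 / n
    total = 0.0
    for j in range(n):
        r = -1.0 + (j + 0.5) * h
        total += h / math.sqrt(2.0 * log_mean(1.0 - r, 1.0 + r))
    return total

c = c_theta()
assert 1.45 < c < 1.65
```

The computed value is close to the $1.56$ quoted in the lemma.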
Now we turn to upper bounds for $\mathcal{W}$ in terms of $L^2$-Wasserstein distances.
Proposition 2.14 (Upper bounds for $\mathcal{W}$). For all probability densities $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$ we have
$$\mathcal{W}(\rho_0,\rho_1) \;\leq\; W_{2,\mathcal{W}}(\rho_0,\rho_1) \;\leq\; \frac{c}{\sqrt{k}}\, W_{2,g}(\rho_0,\rho_1)\;, \tag{2.13}$$
where $W_{2,\mathcal{W}}$ and $W_{2,g}$ denote the $L^2$-Wasserstein metrics induced by $d_{\mathcal{W}}$ and $d_g$ respectively, and $c$ and $k$ are as in Lemma 2.13.
Proof. We shall prove the first bound, the second one being an immediate consequence of Lemma 2.13. For this purpose, we fix $\varepsilon > 0$ and $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$, and take an optimal plan $q \in \mathrm{Opt}(\rho_0,\rho_1)$. For all $u, v \in \mathcal{X}$, take a curve $(\rho^{u,v}, V^{u,v}) \in \vec{\mathcal{CE}}_1\big( \frac{\mathbf{1}_{\{u\}}}{\pi(u)}, \frac{\mathbf{1}_{\{v\}}}{\pi(v)} \big)$ with
$$\int_0^1 \vec{\mathcal{A}}\big( \rho^{u,v}_t, V^{u,v}_t \big)\,dt \leq d_{\mathcal{W}}(u,v)^2 + \varepsilon\;,$$
and consider the convex combination of these curves, weighted according to the optimal plan $q$, i.e.,
$$\rho_t := \sum_{u,v\in\mathcal{X}} q(u,v)\,\rho^{u,v}_t\;, \qquad V_t := \sum_{u,v\in\mathcal{X}} q(u,v)\, V^{u,v}_t\;.$$
It then follows that the resulting curve $(\rho, V)$ belongs to $\vec{\mathcal{CE}}_1(\rho_0,\rho_1)$. Using the convexity result from Lemma 2.7 we infer that
$$\mathcal{W}(\rho_0,\rho_1)^2 \leq \int_0^1 \vec{\mathcal{A}}(\rho_t, V_t)\,dt \leq \sum_{u,v\in\mathcal{X}} q(u,v) \int_0^1 \vec{\mathcal{A}}\big( \rho^{u,v}_t, V^{u,v}_t \big)\,dt \leq \sum_{u,v\in\mathcal{X}} q(u,v)\big( d_{\mathcal{W}}(u,v)^2 + \varepsilon \big) = W_{2,\mathcal{W}}(\rho_0,\rho_1)^2 + \varepsilon\;,$$
which implies the result, since $\varepsilon > 0$ is arbitrary. □
3. Geodesics
In this section we show that the metric space $(\mathscr{P}(\mathcal{X}), \mathcal{W})$ is a geodesic space, in the sense that any two densities $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$ can be connected by a (constant speed) geodesic, i.e., a curve $\rho : [0,1] \to \mathscr{P}(\mathcal{X})$ satisfying
$$\mathcal{W}(\rho_s, \rho_t) = |s - t|\, \mathcal{W}(\rho_0, \rho_1)$$
for all $0 \leq s, t \leq 1$.
Let us first give an equivalent characterisation of the infimum in Lemma 2.9, which is invariant under reparametrisation.

Lemma 3.1. For any $T > 0$ and $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$ we have
$$\mathcal{W}(\rho_0,\rho_1) = \inf\Big\{ \int_0^T \sqrt{\vec{\mathcal{A}}(\rho_t, V_t)}\,dt \,:\, (\rho, V) \in \vec{\mathcal{CE}}_T(\rho_0,\rho_1) \Big\}\;. \tag{3.1}$$

Proof. Taking Lemma 2.9 into account, this follows from a standard reparametrisation argument. See [1, Lemma 1.1.4] or [17, Theorem 5.4] for details in similar situations. □
Theorem 3.2. For all $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$ the infimum in Lemma 2.9 is attained by a pair $(\rho, V) \in \vec{\mathcal{CE}}_1(\rho_0,\rho_1)$ satisfying $\vec{\mathcal{A}}(\rho_t, V_t) = \mathcal{W}(\rho_0,\rho_1)^2$ for a.e. $t \in [0,1]$. In particular, the curve $(\rho_t)_{t\in[0,1]}$ is a constant speed geodesic.
Proof. We will show existence of a minimizing curve by a direct argument. Let $(\rho^n, V^n) \in \vec{\mathcal{CE}}_1(\rho_0,\rho_1)$ be a minimizing sequence. Thus we can assume that
$$\sup_n \int_0^1 \vec{\mathcal{A}}(\rho^n_t, V^n_t)\,dt < C$$
for some finite constant $C$. Without loss of generality we assume that $V^n_t(x,y) = 0$ when $K(x,y) = 0$. For $x, y \in \mathcal{X}$, define the sequence of signed Borel measures $\theta^n_{x,y}$ on $[0,1]$ by $\theta^n_{x,y}(dt) := V^n_t(x,y)\,dt$. For every Borel set $B \subseteq [0,1]$ we can give the following bound on the total variation of these measures:
$$\|\theta^n_{x,y}\|(B) \leq \int_B |V^n_t(x,y)|\,dt \leq \sqrt{C_\pi} \int_B \sqrt{\alpha\big( V^n_t(x,y),\, \rho^n_t(x),\, \rho^n_t(y) \big)}\,dt\;,$$
where we used the fact that $\rho(x) \leq \max\{\pi(z)^{-1} : z\in\mathcal{X}\} =: C_\pi < \infty$ for $\rho \in \mathscr{P}(\mathcal{X})$, hence $\hat\rho^n_t(x,y) \leq C_\pi$. Using Hölder's inequality we obtain
$$\sum_{x,y\in\mathcal{X}} \|\theta^n_{x,y}\|(B)\, K(x,y)\,\pi(x) \;\leq\; \sqrt{2\,C_\pi\,\mathrm{Leb}(B)}\, \Big( \int_0^1 \vec{\mathcal{A}}(\rho^n_t, V^n_t)\,dt \Big)^{\frac12} \;\leq\; \sqrt{2\,C\,C_\pi\,\mathrm{Leb}(B)}\;. \tag{3.2}$$
In particular, the total variation of the measures $\theta^n_{x,y}$ is bounded uniformly in $n$. Hence we can extract a subsequence (still indexed by $n$) such that for all $x, y \in \mathcal{X}$ the measures $\theta^n_{x,y}$ converge weakly* to some finite signed Borel measure $\theta_{x,y}$. The estimate (3.2) also shows that $\theta_{x,y}$ is absolutely continuous with respect to the Lebesgue measure. Thus there exists $V : [0,1] \to \mathbb{R}^{\mathcal{X}\times\mathcal{X}}$ such that $\theta_{x,y}(dt) = V_t(x,y)\,dt$. We claim that, along the same subsequence, $\rho^n$ converges pointwise to a function $\rho : [0,1] \to \mathscr{P}(\mathcal{X})$. Indeed, using the continuity of $t \mapsto \rho^n_t$ one derives from the continuity equation $(v')$ in (2.8) that for $s \in [0,1]$ and every $x \in \mathcal{X}$,
$$\rho^n_s(x) - \rho^n_0(x) = \frac12 \int_0^s \sum_{y\in\mathcal{X}} \big( V^n_t(y,x) - V^n_t(x,y) \big)\, K(x,y)\,dt\;. \tag{3.3}$$
The weak* convergence of $\theta^n_{x,y}$ implies (see [1, Prop. 5.1.10]) the convergence of the right-hand side of (3.3). Since $\rho^n_0 = \rho_0$ for all $n$, this yields the desired convergence of $\rho^n_s$ for all $s$, and one easily checks that $(\rho, V) \in \vec{\mathcal{CE}}_1(\rho_0,\rho_1)$. The weak* convergence of $\theta^n_{x,y}$ further implies that the measures $\rho^n_t(x)\,dt$ converge weakly* to $\rho_t(x)\,dt$. Applying a general result on the lower semicontinuity of integral functionals (see [13, Thm. 3.4.3]) and taking into account Lemma 2.7, we obtain
$$\int_0^1 \vec{\mathcal{A}}(\rho_t, V_t)\,dt \;\leq\; \liminf_{n\to\infty} \int_0^1 \vec{\mathcal{A}}(\rho^n_t, V^n_t)\,dt \;=\; \mathcal{W}(\rho_0,\rho_1)^2\;.$$
Hence the pair $(\rho, V)$ is a minimizer of the variational problem in the definition of $\mathcal{W}$. Finally, Lemma 3.1 yields
$$\int_0^1 \sqrt{\vec{\mathcal{A}}(\rho_t, V_t)}\,dt \;\geq\; \mathcal{W}(\rho_0,\rho_1) \;=\; \Big( \int_0^1 \vec{\mathcal{A}}(\rho_t, V_t)\,dt \Big)^{\frac12}\;,$$
which implies that $\vec{\mathcal{A}}(\rho_t, V_t) = \mathcal{W}(\rho_0,\rho_1)^2$ for a.e. $t \in [0,1]$. The fact that $(\rho_t)_t$ is a constant speed geodesic follows now by another application of Lemma 3.1. □
We shall now give a characterisation of absolutely continuous curves in the metric space $(\mathscr{P}(\mathcal{X}), \mathcal{W})$ and relate their length to their minimal action. First we recall some notions from the theory of analysis in metric spaces. A curve $(\rho_t)_{t\in[0,T]}$ in $\mathscr{P}(\mathcal{X})$ is called absolutely continuous w.r.t. $\mathcal{W}$ if there exists $m \in L^1(0,T)$ such that
$$\mathcal{W}(\rho_s, \rho_t) \leq \int_s^t m(r)\,dr \qquad \text{for all } 0 \leq s \leq t \leq T\;.$$
If $(\rho_t)$ is absolutely continuous, then its metric derivative
$$|\rho'_t| := \lim_{h\to 0} \frac{\mathcal{W}(\rho_{t+h}, \rho_t)}{|h|}$$
exists for a.e. $t \in [0,T]$ and satisfies $|\rho'_t| \leq m(t)$ a.e. (see [1, Theorem 1.1.2]).
Proposition 3.3 (Metric velocity). A curve $(\rho_t)_{t\in[0,T]}$ is absolutely continuous with respect to $\mathcal{W}$ if and only if there exists a measurable function $V : [0,T] \to \mathbb{R}^{\mathcal{X}\times\mathcal{X}}$ such that $(\rho, V) \in \vec{\mathcal{CE}}_T(\rho_0, \rho_T)$ and
$$\int_0^T \sqrt{\vec{\mathcal{A}}(\rho_t, V_t)}\,dt < \infty\;.$$
In this case we have $|\rho'_t|^2 \leq \vec{\mathcal{A}}(\rho_t, V_t)$ for a.e. $t \in [0,T]$, and there exists an a.e. uniquely defined function $\widetilde V : [0,T] \to \mathbb{R}^{\mathcal{X}\times\mathcal{X}}$ such that $(\rho, \widetilde V) \in \vec{\mathcal{CE}}_T(\rho_0, \rho_T)$ and $|\rho'_t|^2 = \vec{\mathcal{A}}(\rho_t, \widetilde V_t)$ for a.e. $t \in [0,T]$.
Proof. The proof follows from the very same arguments as in [17, Thm. 5.17]. To construct the velocity field $\widetilde V$, the curve is approximated by curves $(\rho^n, V^n)$ which are piecewise minimizing. The velocity field $\widetilde V$ is then defined as a subsequential limit of the velocity fields $V^n$. In our case, existence of this limit is guaranteed by a compactness argument similar to the one in the proof of Theorem 3.2. □
For later use we state an explicit formula for the geodesic equations in $\mathscr{P}_*(\mathcal{X})$ from [28, Proposition 3.4]. Since the interior $\mathscr{P}_*(\mathcal{X})$ of $\mathscr{P}(\mathcal{X})$ is Riemannian by Theorem 2.4, local existence and uniqueness of geodesics is guaranteed by standard Riemannian geometry.
Proposition 3.4. Let $\bar\rho \in \mathscr{P}_*(\mathcal{X})$ and $\bar\psi \in \mathbb{R}^{\mathcal{X}}$. On a sufficiently small time interval around $0$, the unique constant speed geodesic with $\rho_0 = \bar\rho$ and initial tangent vector $\psi_0 = \bar\psi$ satisfies the following equations:
$$\begin{cases}
\dot\rho_t(x) + \displaystyle\sum_{y\in\mathcal{X}} \big( \psi_t(y) - \psi_t(x) \big)\,\hat\rho_t(x,y)\, K(x,y) = 0\;, \\[2mm]
\dot\psi_t(x) + \dfrac12 \displaystyle\sum_{y\in\mathcal{X}} \big( \psi_t(x) - \psi_t(y) \big)^2\, \partial_1\theta\big( \rho_t(x), \rho_t(y) \big)\, K(x,y) = 0\;.
\end{cases} \tag{3.4}$$
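On the two-point space the system (3.4) can be integrated directly, and conservation of the action $\mathcal{A}(\rho_t,\psi_t)$ (constant speed) serves as a sanity check. A sketch (plain Python with a fourth-order Runge-Kutta step; the kernel $K(0,1)=K(1,0)=1$ with $\pi = (\tfrac12, \tfrac12)$, the step size, and the initial data are all arbitrary choices):

```python
import math

def theta(s, t):  # logarithmic mean
    return s if abs(s - t) < 1e-12 else (s - t) / (math.log(s) - math.log(t))

def d1_theta(s, t):  # partial derivative of theta in its first argument
    if abs(s - t) < 1e-9:
        return 0.5
    L = math.log(s) - math.log(t)
    return (L - (s - t) / s) / L ** 2

def rhs(state):
    # geodesic equations (3.4) on {0, 1} with K(0,1) = K(1,0) = 1
    r0, r1, p0, p1 = state
    th = theta(r0, r1)
    return (-(p1 - p0) * th,                         # continuity part
            -(p0 - p1) * th,
            -0.5 * (p0 - p1) ** 2 * d1_theta(r0, r1),  # Hamilton-Jacobi part
            -0.5 * (p1 - p0) ** 2 * d1_theta(r1, r0))

def action(state):
    r0, r1, p0, p1 = state
    # A(rho, psi) = (1/2)(p0 - p1)^2 theta(r0, r1) for this kernel and pi
    return 0.5 * (p0 - p1) ** 2 * theta(r0, r1)

state, h = (1.4, 0.6, 0.1, -0.1), 1e-3
a0 = action(state)
for _ in range(500):  # integrate up to t = 0.5
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    state = tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                  for s, a, b, c, d in zip(state, k1, k2, k3, k4))
assert abs(action(state) - a0) < 1e-8          # constant speed
assert abs(0.5 * (state[0] + state[1]) - 1.0) < 1e-12  # mass preserved
```

Both invariants (the action and the total mass $\sum_x \rho(x)\pi(x)$) are conserved to within the integrator's accuracy.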
4. Ricci curvature

In this section we initiate the study of a notion of Ricci curvature lower boundedness in the spirit of Lott, Sturm, and Villani [27, 41]. Furthermore, we present a characterisation, which we shall use to prove Ricci bounds in concrete examples.

As before, we fix an irreducible and reversible Markov kernel $K$ on a finite set $\mathcal{X}$ with steady state $\pi$. The associated Markov semigroup shall be denoted by $(P_t)_{t\geq 0}$.

Assumption 4.1. Throughout the remainder of the paper we assume that $\theta$ is the logarithmic mean.
We are now ready to state the definition, which has already been given in [28, Definition 1.3].

Definition 4.2. We say that $K$ has non-local Ricci curvature bounded from below by $\kappa \in \mathbb{R}$, and write $\mathrm{Ric}(K) \geq \kappa$, if the following holds: for every constant speed geodesic $(\rho_t)_{t\in[0,1]}$ in $(\mathscr{P}(\mathcal{X}), \mathcal{W})$ we have
$$\mathcal{H}(\rho_t) \leq (1-t)\,\mathcal{H}(\rho_0) + t\,\mathcal{H}(\rho_1) - \frac{\kappa}{2}\, t(1-t)\, \mathcal{W}(\rho_0,\rho_1)^2\;. \tag{4.1}$$
An important role in our analysis is played by the quantity $\mathcal{B}(\rho,\psi)$, which is defined for $\rho \in \mathscr{P}_*(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$ by
$$\begin{aligned}
\mathcal{B}(\rho,\psi) :=\;& \frac12 \big\langle \widehat\Delta\rho \cdot \nabla\psi,\, \nabla\psi \big\rangle_\pi - \big\langle \nabla\Delta\psi,\, \nabla\psi \big\rangle_{\hat\rho} \\
=\;& \frac14 \sum_{x,y,z\in\mathcal{X}} \big( \psi(x) - \psi(y) \big)^2 \Big[ \partial_1\theta\big( \rho(x), \rho(y) \big)\big( \rho(z) - \rho(x) \big) K(x,z) \\
&\qquad\qquad + \partial_2\theta\big( \rho(x), \rho(y) \big)\big( \rho(z) - \rho(y) \big) K(y,z) \Big] K(x,y)\,\pi(x) \\
&- \frac12 \sum_{x,y,z\in\mathcal{X}} \Big[ K(x,z)\big( \psi(z) - \psi(x) \big) - K(y,z)\big( \psi(z) - \psi(y) \big) \Big] \big( \psi(x) - \psi(y) \big)\, \hat\rho(x,y)\, K(x,y)\,\pi(x)\;,
\end{aligned} \tag{4.2}$$
where
$$\widehat\Delta\rho(x,y) := \partial_1\theta\big( \rho(x), \rho(y) \big)\,\Delta\rho(x) + \partial_2\theta\big( \rho(x), \rho(y) \big)\,\Delta\rho(y)\;.$$
The significance of $\mathcal{B}(\rho,\psi)$ is mainly due to the following result:

Proposition 4.3. For $\rho \in \mathscr{P}_*(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$ we have
$$\big\langle \mathrm{Hess}\,\mathcal{H}(\rho)\,\psi,\, \psi \big\rangle_\rho = \mathcal{B}(\rho,\psi)\;.$$
Proof. Take $(\rho_t, \psi_t)$ satisfying the geodesic equations (3.4), so that
$$\big\langle \mathrm{Hess}\,\mathcal{H}(\rho_t)\,\psi_t,\, \psi_t \big\rangle_{\rho_t} = \frac{d^2}{dt^2}\,\mathcal{H}(\rho_t)\;.$$
Using the continuity equation we obtain
$$\frac{d}{dt}\,\mathcal{H}(\rho_t) = \big\langle 1 + \log\rho_t,\, \dot\rho_t \big\rangle_\pi = \big\langle \nabla\log\rho_t,\, \nabla\psi_t \big\rangle_{\rho_t} = \big\langle \nabla\rho_t,\, \nabla\psi_t \big\rangle_\pi = -\big\langle \Delta\rho_t,\, \psi_t \big\rangle_\pi\;,$$
where the second equality uses the defining property of the logarithmic mean.
Furthermore,
$$\frac{d^2}{dt^2}\,\mathcal{H}(\rho_t) = -\big\langle \Delta\dot\rho_t,\, \psi_t \big\rangle_\pi - \big\langle \Delta\rho_t,\, \dot\psi_t \big\rangle_\pi = -\big\langle \dot\rho_t,\, \Delta\psi_t \big\rangle_\pi - \big\langle \Delta\rho_t,\, \dot\psi_t \big\rangle_\pi\;.$$
Using the continuity equation we obtain
$$-\big\langle \dot\rho_t,\, \Delta\psi_t \big\rangle_\pi = \big\langle \nabla\cdot(\hat\rho_t\,\nabla\psi_t),\, \Delta\psi_t \big\rangle_\pi = -\big\langle \nabla\Delta\psi_t,\, \nabla\psi_t \big\rangle_{\hat\rho_t}\;.$$
Furthermore, applying the geodesic equations (3.4) and the detailed balance equations (1.1), we infer that
$$\begin{aligned}
-\big\langle \Delta\rho_t,\, \dot\psi_t \big\rangle_\pi
&= \frac12 \sum_{x,y,z\in\mathcal{X}} \big( \psi_t(x) - \psi_t(y) \big)^2\, \partial_1\theta\big( \rho_t(x), \rho_t(y) \big)\big( \rho_t(z) - \rho_t(x) \big)\, K(x,y)\, K(x,z)\,\pi(x) \\
&= \frac14 \sum_{x,y,z\in\mathcal{X}} \big( \psi_t(x) - \psi_t(y) \big)^2 \Big[ \partial_1\theta\big( \rho_t(x), \rho_t(y) \big)\big( \rho_t(z) - \rho_t(x) \big) K(x,z) \\
&\qquad\qquad + \partial_2\theta\big( \rho_t(x), \rho_t(y) \big)\big( \rho_t(z) - \rho_t(y) \big) K(y,z) \Big] K(x,y)\,\pi(x) \\
&= \frac12 \big\langle \widehat\Delta\rho_t \cdot \nabla\psi_t,\, \nabla\psi_t \big\rangle_\pi\;.
\end{aligned}$$
Combining the latter three identities, we arrive at
$$\frac{d^2}{dt^2}\,\mathcal{H}(\rho_t) = -\big\langle \nabla\Delta\psi_t,\, \nabla\psi_t \big\rangle_{\hat\rho_t} + \frac12 \big\langle \widehat\Delta\rho_t \cdot \nabla\psi_t,\, \nabla\psi_t \big\rangle_\pi\;,$$
which is the desired identity. □
Our next aim is to show that $\kappa$-convexity of $\mathcal{H}$ along geodesics is equivalent to a lower bound on the Hessian of $\mathcal{H}$ in $\mathscr{P}_*(\mathcal{X})$. Since the Riemannian metric on $(\mathscr{P}(\mathcal{X}), \mathcal{W})$ degenerates at the boundary, this is not an obvious result. In particular, in order to prove the implication (4) ⇒ (3) below we cannot directly apply the equivalence between the so-called EVI (4.4) and the usual gradient flow equation, which holds on complete Riemannian manifolds (see, e.g., [43, Proposition 23.1]). Therefore, we take a different approach, based on an argument by Daneri and Savaré [16], which avoids delicate regularity issues for geodesics. An additional benefit of this approach is that we expect it to apply in a more general setting where the underlying space $\mathcal{X}$ is infinite, and finite-dimensional Riemannian techniques do not apply at all.
Remark 4.4. The quantity $\mathcal{B}(\rho,\psi)$ arises naturally in the Eulerian approach to the Wasserstein metric, as developed in [16, 38]. In fact, in a crucial argument from [16], the authors consider a certain two-parameter family of measures $(\mu^s_t)$ and functions $(\psi^s_t)$ on a Riemannian manifold $\mathcal{M}$, and show that
$$\partial_s\, H(\mu^s_t) + \frac12\, \partial_t \int |\nabla\psi^s_t|^2\, d\mu^s_t = -B(\mu^s_t, \psi^s_t)\;, \tag{4.3}$$
where
$$B(\mu, \psi) := \int_{\mathcal{M}} \frac12\, \Delta\big( |\nabla\psi|^2 \big) - \big\langle \nabla\psi,\, \nabla\Delta\psi \big\rangle\; d\mu\;.$$
Since Bochner's formula asserts that
$$B(\mu, \psi) = \int_{\mathcal{M}} |D^2\psi|^2 + \mathrm{Ric}(\nabla\psi, \nabla\psi)\; d\mu\;,$$
one obtains a lower bound on $B$ if the Ricci curvature is bounded from below. The lower bound on $B$ can be used to prove an evolution variational inequality, which in turn yields convexity of the entropy along $W_2$-geodesics.
In our setting, the quantity $\mathcal{B}(\rho,\psi)$ can be regarded as a discrete analogue of $B(\mu,\psi)$. Therefore the inequality $\mathcal{B}(\rho,\psi) \geq \kappa\,\mathcal{A}(\rho,\psi)$ could be interpreted as a one-sided Bochner inequality, which allows to adapt the strategy from [16] to the discrete setting.

In the following result and the rest of the paper we shall use the notation
$$\frac{d^+}{dt}\, f(t) = \limsup_{h\downarrow 0} \frac{f(t+h) - f(t)}{h}\;.$$
Theorem 4.5. Let $\kappa \in \mathbb{R}$. For an irreducible and reversible Markov kernel $(\mathcal{X}, K)$ the following assertions are equivalent:
(1) $\mathrm{Ric}(K) \geq \kappa$;
(2) For all $\rho, \sigma \in \mathscr{P}(\mathcal{X})$, the following evolution variational inequality holds for all $t \geq 0$:
$$\frac12\, \frac{d^+}{dt}\, \mathcal{W}^2(P_t\rho, \sigma) + \frac{\kappa}{2}\, \mathcal{W}^2(P_t\rho, \sigma) \leq \mathcal{H}(\sigma) - \mathcal{H}(P_t\rho)\;; \tag{4.4}$$
(3) For all $\rho, \sigma \in \mathscr{P}_*(\mathcal{X})$, (4.4) holds for all $t \geq 0$;
(4) For all $\rho \in \mathscr{P}_*(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$ we have
$$\mathcal{B}(\rho,\psi) \geq \kappa\, \mathcal{A}(\rho,\psi)\;;$$
(5) For all $\rho \in \mathscr{P}_*(\mathcal{X})$ we have $\mathrm{Hess}\,\mathcal{H}(\rho) \geq \kappa$;
(6) For all $\rho_0, \rho_1 \in \mathscr{P}_*(\mathcal{X})$ there exists a constant speed geodesic $(\rho_t)_{t\in[0,1]}$ connecting $\rho_0$ and $\rho_1$ and satisfying (4.1).
Proof. (3) ⇒ (2): This is a special case of [16, Theorem 3.3].
(2) ⇒ (1): This follows by applying [16, Theorem 3.2] to the metric space $(\mathscr{P}(\mathcal{X}), \mathcal{W})$ and the functional $\mathcal{H}$.
(1) ⇒ (6): This is clear in view of Theorem 3.2.
(6) ⇒ (5): Take $\rho \in \mathscr{P}_*(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$ and consider the unique solution $(\rho_t, \psi_t)$ to the geodesic equations with $\rho_0 = \rho$ and $\psi_0 = \psi$ on a sufficiently small time interval around $0$. Using the local uniqueness of geodesics and (6), we infer that
$$\mathrm{Hess}\,\mathcal{H}(\rho)(\psi) = \frac{d^2}{dt^2}\bigg|_{t=0} \mathcal{H}(\rho_t) \geq \kappa\, \|\psi\|^2_\rho$$
(see, e.g., the implication (ii) ⇒ (i) in [43, Proposition 16.2]).
(5) ⇒ (4): This follows from Proposition 4.3.
(4) ⇒ (3): We follow [16]. Fix $\varepsilon > 0$ and $\rho, \sigma \in \mathscr{P}_*(\mathcal{X})$. In view of Lemma 2.9 we can find a smooth curve $(\rho^s, \psi^s)_{s\in[0,1]} \in \mathcal{CE}_1(\sigma, \rho)$ satisfying
$$\int_0^1 \mathcal{A}(\rho^s, \psi^s)\,ds \leq \mathcal{W}(\sigma, \rho)^2 + \varepsilon\;. \tag{4.5}$$
Note in particular that $s \mapsto \rho^s$ and $s \mapsto \psi^s$ are sufficiently regular to apply Lemma 4.6 below. Using the notation from this lemma, we infer that
$$\frac12\, \partial_t\, \mathcal{A}(\rho^s_t, \psi^s_t) + \partial_s\, \mathcal{H}(\rho^s_t) = -s\, \mathcal{B}(\rho^s_t, \psi^s_t)\;.$$
Using the assumption that $\mathcal{B} \geq \kappa\mathcal{A}$ we infer that
$$\frac12\, \partial_t \Big( e^{2\kappa st}\, \mathcal{A}(\rho^s_t, \psi^s_t) \Big) + \partial_s \Big( e^{2\kappa st}\, \mathcal{H}(\rho^s_t) \Big) \leq 2\kappa t\, e^{2\kappa st}\, \mathcal{H}(\rho^s_t)\;.$$
Integration with respect to $t \in [0,h]$ and $s \in [0,1]$ yields
$$\begin{aligned}
&\frac12 \int_0^1 \Big( e^{2\kappa sh}\, \mathcal{A}(\rho^s_h, \psi^s_h) - \mathcal{A}(\rho^s_0, \psi^s_0) \Big)\,ds \\
&\qquad + \int_0^h \Big( e^{2\kappa t}\, \mathcal{H}(\rho^1_t) - \mathcal{H}(\rho^0_t) \Big)\,dt \;\leq\; 2\kappa \int_0^1 \int_0^h t\, e^{2\kappa st}\, \mathcal{H}(\rho^s_t)\,dt\,ds\;.
\end{aligned}$$
Arguing as in [16, Lemma 5.1] we infer that
$$\int_0^1 e^{2\kappa sh}\, \mathcal{A}(\rho^s_h, \psi^s_h)\,ds \geq m(\kappa h)\, \mathcal{W}^2(P_h\rho, \sigma)\;,$$
where $m(\lambda) = \frac{\lambda\, e^{\lambda}}{\sinh(\lambda)}$. Using (4.5) together with the fact that the entropy decreases along the heat flow, we infer that
$$\frac{m(\kappa h)}{2}\, \mathcal{W}^2(P_h\rho, \sigma) \leq \frac12\, \mathcal{W}^2(\sigma, \rho) + \frac{\varepsilon}{2} - E_\kappa(h)\,\mathcal{H}(P_h\rho) + h\,\mathcal{H}(\sigma) + 2\kappa \int_0^1 \int_0^h t\, e^{2\kappa st}\, \mathcal{H}(\rho^s_t)\,dt\,ds\;, \tag{4.6}$$
where $E_\kappa(h) := \int_0^h e^{2\kappa t}\,dt$. Since $\mathcal{H}$ is bounded, it follows that
$$\lim_{h\downarrow 0}\, \frac1h \int_0^1 \int_0^h t\, e^{2\kappa st}\, \mathcal{H}(\rho^s_t)\,dt\,ds = 0\;.$$
Furthermore,
$$\lim_{h\downarrow 0}\, \frac1h \Big( E_\kappa(h)\,\mathcal{H}(P_h\rho) - h\,\mathcal{H}(\sigma) \Big) = \mathcal{H}(\rho) - \mathcal{H}(\sigma)\;.$$
Since $\varepsilon > 0$ is arbitrary, (4.6) implies that
$$\frac{d^+}{dh}\bigg|_{h=0} \Big( \frac{m(\kappa h)}{2}\, \mathcal{W}^2(P_h\rho, \sigma) \Big) + \mathcal{H}(\rho) - \mathcal{H}(\sigma) \leq 0\;.$$
Taking into account that
$$\frac{d^+}{dh}\bigg|_{h=0} \Big( \frac{m(\kappa h)}{2}\, \mathcal{W}^2(P_h\rho, \sigma) \Big) = \frac{\kappa}{2}\, \mathcal{W}^2(\rho, \sigma) + \frac12\, \frac{d^+}{dh}\bigg|_{h=0}\, \mathcal{W}^2(P_h\rho, \sigma)\;,$$
we obtain (4.4) for $t = 0$, which clearly implies (4.4) for all $t \geq 0$. □
The following result, which is used in the proof of Theorem 4.5, is a discrete analogue of (4.3), and the proof proceeds along the lines of [16, Lemma 4.3]. Since the details are slightly different in the discrete setting, we present a proof for the convenience of the reader.
Lemma 4.6. Let $(\rho^s)_{s\in[0,1]}$ be a smooth curve in $\mathscr{P}(\mathcal{X})$. For each $t \geq 0$, set $\rho^s_t := P_{st}\,\rho^s$, and let $(\psi^s_t)_{s\in[0,1]}$ be a smooth curve in $\mathbb{R}^{\mathcal{X}}$ satisfying the continuity equation
$$\partial_s\, \rho^s_t + \nabla\cdot\big( \hat\rho^s_t\, \nabla\psi^s_t \big) = 0\;, \qquad s \in [0,1]\;.$$
Then the identity
$$\frac12\, \partial_t\, \mathcal{A}(\rho^s_t, \psi^s_t) + \partial_s\, \mathcal{H}(\rho^s_t) = -s\, \mathcal{B}(\rho^s_t, \psi^s_t)$$
holds for every $s \in [0,1]$ and $t \geq 0$.

Proof. First of all, we have
$$\partial_s\, \mathcal{H}(\rho^s_t) = \big\langle 1 + \log\rho^s_t,\, \partial_s\rho^s_t \big\rangle_\pi = \big\langle \nabla\log\rho^s_t,\, \nabla\psi^s_t \big\rangle_{\rho^s_t} = \big\langle \nabla\rho^s_t,\, \nabla\psi^s_t \big\rangle_\pi = -\big\langle \Delta\rho^s_t,\, \psi^s_t \big\rangle_\pi\;. \tag{4.7}$$
Furthermore,
$$\frac12\, \partial_t\, \mathcal{A}(\rho^s_t, \psi^s_t) = \big\langle \nabla(\partial_t\psi^s_t),\, \nabla\psi^s_t \big\rangle_{\rho^s_t} + \frac12 \big\langle (\partial_t\hat\rho^s_t)\cdot\nabla\psi^s_t,\, \nabla\psi^s_t \big\rangle_\pi =: I_1 + I_2\;.$$
In order to simplify $I_1$ we claim that
$$\nabla\cdot\big( (\partial_t\hat\rho^s_t)\,\nabla\psi^s_t \big) + \nabla\cdot\big( \hat\rho^s_t\, \nabla(\partial_t\psi^s_t) \big) = -\partial_s\big( s\,\Delta\rho^s_t \big)\;, \tag{4.8}$$
$$\partial_t\, \rho^s_t = s\, \Delta\rho^s_t\;. \tag{4.9}$$
To show (4.8), note that by the continuity equation the left-hand side equals $-\partial_t\partial_s\rho^s_t$, while the right-hand side equals $-\partial_s\partial_t\rho^s_t$. The identity (4.9) follows from a straightforward calculation.
Integrating by parts repeatedly and using (4.7), (4.8) and (4.9), we obtain
$$\begin{aligned}
I_1 &= -\big\langle \psi^s_t,\, \nabla\cdot\big( \hat\rho^s_t\, \nabla(\partial_t\psi^s_t) \big) \big\rangle_\pi \\
&= \big\langle \psi^s_t,\, \partial_s\big( s\,\Delta\rho^s_t \big) \big\rangle_\pi - s \big\langle \widehat\Delta\rho^s_t \cdot \nabla\psi^s_t,\, \nabla\psi^s_t \big\rangle_\pi \\
&= -\partial_s\, \mathcal{H}(\rho^s_t) + s \Big( \big\langle \nabla\Delta\psi^s_t,\, \nabla\psi^s_t \big\rangle_{\hat\rho^s_t} - \big\langle \widehat\Delta\rho^s_t \cdot \nabla\psi^s_t,\, \nabla\psi^s_t \big\rangle_\pi \Big)\;.
\end{aligned}$$
Taking into account that, by (4.9),
$$I_2 = \frac{s}{2}\, \big\langle \widehat\Delta\rho^s_t \cdot \nabla\psi^s_t,\, \nabla\psi^s_t \big\rangle_\pi\;,$$
the result follows by summing the expressions for $I_1$ and $I_2$. □
The evolution variational inequality (4.4) has been extensively studied in the theory of gradient flows in metric spaces [1]. It readily implies a number of interesting properties for the associated gradient flow (see, e.g., [16, Section 3]). Among them we single out the following $\kappa$-contractivity property.

Proposition 4.7 ($\kappa$-Contractivity of the heat flow). Let $(\mathcal{X}, K)$ be an irreducible and reversible Markov kernel satisfying $\mathrm{Ric}(K) \geq \kappa$ for some $\kappa \in \mathbb{R}$. Then the associated continuous time Markov semigroup $(P_t)_{t\geq 0}$ satisfies
$$\mathcal{W}(P_t\rho, P_t\sigma) \leq e^{-\kappa t}\, \mathcal{W}(\rho, \sigma)$$
for all $\rho, \sigma \in \mathscr{P}(\mathcal{X})$ and $t \geq 0$.

Proof. This follows by applying [16, Proposition 3.1] to the functional $\mathcal{H}$ on the metric space $(\mathscr{P}(\mathcal{X}), \mathcal{W})$. □
5. Examples

In this section we give explicit lower bounds on the non-local Ricci curvature in several examples. Moreover, we present a simple criterion (see Proposition 5.4) for proving non-local Ricci curvature bounds. Although the assumptions seem restrictive, the criterion allows us to obtain the sharp Ricci bound for the discrete hypercube. Moreover, it can be combined with the tensorisation result from Section 6 in order to prove Ricci bounds in other nontrivial situations. To get started let us consider a particularly simple example.
Example 5.1 (The complete graph). Let $\mathcal{K}_n$ denote the complete graph on $n$ vertices and let $K_n$ be the simple random walk on $\mathcal{K}_n$ given by the transition kernel $K(x,y) = \frac1n$ for all $x, y \in \mathcal{K}_n$. Note that in this case $\pi$ is the uniform measure. We will show that $\mathrm{Ric}(K_n) \geq \frac12 + \frac{1}{2n}$. In view of Theorem 4.5 we have to show $\mathcal{B}(\rho,\psi) \geq \big( \frac12 + \frac{1}{2n} \big)\,\mathcal{A}(\rho,\psi)$ for all $\rho \in \mathscr{P}_*(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$.
Recall the definition (4.2) of the quantity $\mathcal{B}$. Writing $\nabla\psi(x,y) := \psi(x) - \psi(y)$, we calculate explicitly:
$$\big\langle \nabla\Delta\psi,\, \nabla\psi \big\rangle_{\hat\rho} = \frac12\, \frac{1}{n^3} \sum_{x,y,z\in\mathcal{X}} \hat\rho(x,y)\, \nabla\psi(y,x)\big( \nabla\psi(x,z) - \nabla\psi(y,z) \big) = -\frac12\, \frac{1}{n^2} \sum_{x,y} \hat\rho(x,y)\big( \nabla\psi(x,y) \big)^2 = -\mathcal{A}(\rho,\psi)\;.$$
With the notation $\partial_i\theta(x,y) = \partial_i\theta\big( \rho(x), \rho(y) \big)$ and using equation (2.2) we obtain further
$$\begin{aligned}
\big\langle \widehat\Delta\rho \cdot \nabla\psi,\, \nabla\psi \big\rangle_\pi
&= \frac12\, \frac{1}{n^3} \sum_{x,y,z} \big( \nabla\psi(x,y) \big)^2 \Big( \partial_1\theta(x,y)\big( \rho(z) - \rho(x) \big) + \partial_2\theta(x,y)\big( \rho(z) - \rho(y) \big) \Big) \\
&= -\mathcal{A}(\rho,\psi) + \frac12\, \frac{1}{n^3} \sum_{x,y,z} \big( \nabla\psi(x,y) \big)^2 \Big( \partial_1\theta(x,y)\,\rho(z) + \partial_2\theta(x,y)\,\rho(z) \Big)\;.
\end{aligned}$$
Keeping only the terms with $z = x$ (resp. $z = y$) in the last sum and using (2.2) again we see
$$\big\langle \widehat\Delta\rho \cdot \nabla\psi,\, \nabla\psi \big\rangle_\pi \geq \Big( \frac1n - 1 \Big)\, \mathcal{A}(\rho,\psi)\;.$$
Summing up we obtain $\mathcal{B} \geq \big( \frac12 \big( \frac1n - 1 \big) + 1 \big)\mathcal{A} = \big( \frac12 + \frac{1}{2n} \big)\mathcal{A}$, which yields the claim.
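The inequality $\mathcal{B}(\rho,\psi) \geq \big(\frac12 + \frac{1}{2n}\big)\mathcal{A}(\rho,\psi)$ can also be checked numerically by implementing the two triple sums of (4.2) directly. A sketch (plain Python; the size $n = 3$ and the random trials are arbitrary choices):

```python
import math, random

def theta(s, t):  # logarithmic mean
    return s if abs(s - t) < 1e-12 else (s - t) / (math.log(s) - math.log(t))

def d1(s, t):  # d theta / d s
    if abs(s - t) < 1e-9:
        return 0.5
    L = math.log(s) - math.log(t)
    return (L - (s - t) / s) / L ** 2

def d2(s, t):  # d theta / d t, by symmetry of theta
    return d1(t, s)

n = 3
K = [[1.0 / n] * n for _ in range(n)]   # complete graph: K(x, y) = 1/n
pi = [1.0 / n] * n                      # uniform stationary measure

def A(rho, psi):
    return 0.5 * sum((psi[x] - psi[y]) ** 2 * theta(rho[x], rho[y])
                     * K[x][y] * pi[x] for x in range(n) for y in range(n))

def B(rho, psi):
    # B = (1/2)<Dhat rho . grad psi, grad psi> - <grad Delta psi, grad psi>
    t1 = 0.25 * sum((psi[x] - psi[y]) ** 2
                    * (d1(rho[x], rho[y]) * (rho[z] - rho[x]) * K[x][z]
                       + d2(rho[x], rho[y]) * (rho[z] - rho[y]) * K[y][z])
                    * K[x][y] * pi[x]
                    for x in range(n) for y in range(n) for z in range(n))
    t2 = 0.5 * sum((K[x][z] * (psi[z] - psi[x]) - K[y][z] * (psi[z] - psi[y]))
                   * (psi[x] - psi[y]) * theta(rho[x], rho[y]) * K[x][y] * pi[x]
                   for x in range(n) for y in range(n) for z in range(n))
    return t1 - t2

random.seed(1)
for _ in range(200):
    rho = [random.uniform(0.1, 2.0) for _ in range(n)]
    s = sum(r * p for r, p in zip(rho, pi))
    rho = [r / s for r in rho]          # normalise to a density w.r.t. pi
    psi = [random.uniform(-1.0, 1.0) for _ in range(n)]
    assert B(rho, psi) >= (0.5 + 0.5 / n) * A(rho, psi) - 1e-9
```

No counterexample is found, as predicted by the bound $\mathrm{Ric}(K_3) \geq \frac12 + \frac16$.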
For the rest of this section we let $K$ be an irreducible and reversible Markov kernel on a finite set $\mathcal{X}$. In order to state the criterion and to perform calculations, it will be convenient to write a Markov chain in terms of allowed moves rather than jumps from point to point. Let $G$ be a set of maps from $\mathcal{X}$ to itself (the allowed moves) and consider a function $c : \mathcal{X} \times G \to \mathbb{R}_+$ (representing the jump rates).
Definition 5.2. We call the pair $(G, c)$ a mapping representation of $K$ if the following properties hold:
(1) The generator $\Delta = K - \mathrm{Id}$ can be written in the form
$$\Delta\psi(x) = \sum_{\delta\in G} \nabla_\delta\psi(x)\, c(x, \delta)\;, \tag{5.1}$$
where $\nabla_\delta\psi(x) = \psi(\delta x) - \psi(x)$.
(2) For every $\delta \in G$ there exists a unique $\delta^{-1} \in G$ satisfying $\delta^{-1}(\delta(x)) = x$ for all $x$ with $c(x, \delta) > 0$.
(3) For every $F : \mathcal{X}\times G \to \mathbb{R}$ we have
$$\sum_{x\in\mathcal{X},\,\delta\in G} F(x, \delta)\, c(x,\delta)\,\pi(x) = \sum_{x\in\mathcal{X},\,\delta\in G} F(\delta x, \delta^{-1})\, c(x,\delta)\,\pi(x)\;. \tag{5.2}$$
Remark 5.3. This definition is close in spirit to the recent work [14], where $\Gamma_2$-type calculations have been performed in order to prove strict convexity of the entropy along the heat flow in a discrete setting. Here, we essentially compute the second derivatives of the entropy along $\mathcal{W}$-geodesics. Since the geodesic equations are more complicated than the heat equation, the expressions that we need to work with are somewhat more involved.
Every irreducible, reversible Markov chain has a mapping representation. In fact, an explicit mapping representation can be obtained as follows. For $x, y \in \mathcal{X}$ consider the bijection $t_{\{x,y\}} : \mathcal{X} \to \mathcal{X}$ that interchanges $x$ and $y$ and keeps all other points fixed. Then let $G$ be the set of all these transpositions and set $c(x, t_{\{x,y\}}) = K(x,y)$ and $c(x, t_{\{y,z\}}) = 0$ for $x \notin \{y, z\}$. Then $(G, c)$ defines a mapping representation. However, in examples it is often more natural to work with a different mapping representation involving a smaller set $G$, as we shall see below.
It will be useful to formulate the expressions for $\mathcal{A}$ and $\mathcal{B}$ in this formalism. For this purpose, we note that (5.1) implies that
$$\sum_{y\in\mathcal{X}} F(x,y)\, K(x,y) = \sum_{\delta\in G} F(x, \delta x)\, c(x, \delta)$$
for any $F : \mathcal{X}\times\mathcal{X} \to \mathbb{R}$ vanishing on the diagonal. As a consequence we obtain
$$\mathcal{A}(\rho,\psi) = \frac12 \sum_{x\in\mathcal{X},\,\delta\in G} \big( \nabla_\delta\psi(x) \big)^2\, \hat\rho(x, \delta x)\, c(x,\delta)\,\pi(x) \tag{5.3}$$
and
$$\big\langle \nabla\Delta\psi,\, \nabla\psi \big\rangle_{\hat\rho} = \frac12 \sum_{x\in\mathcal{X}} \sum_{\delta,\eta\in G} \nabla_\delta\psi(x) \Big( \nabla_\eta\psi(\delta x)\, c(\delta x, \eta) - \nabla_\eta\psi(x)\, c(x, \eta) \Big)\, \hat\rho(x, \delta x)\, c(x,\delta)\,\pi(x)\;. \tag{5.4}$$
Setting for convenience $\partial_i\theta\big( \rho(x), \rho(y) \big) =: \partial_i\theta(x,y)$ for $i = 1, 2$, we further get
$$\frac12 \big\langle \widehat\Delta\rho \cdot \nabla\psi,\, \nabla\psi \big\rangle_\pi = \frac14 \sum_{x,\delta,\eta} \big( \nabla_\delta\psi(x) \big)^2 \Big( \partial_1\theta(x, \delta x)\, \nabla_\eta\rho(x)\, c(x,\eta) + \partial_2\theta(x, \delta x)\, \nabla_\eta\rho(\delta x)\, c(\delta x, \eta) \Big)\, c(x,\delta)\,\pi(x)\;. \tag{5.5}$$
Now the expression for $\mathcal{B}(\rho,\psi)$ is obtained as the difference of the preceding two expressions. We are now ready to state the announced criterion, which shall be used in Examples 5.6 and 5.7 below. Intuitively, condition ii) expresses a certain spatial homogeneity, saying that the jump rate in a given direction is the same before and after another jump.
Proposition 5.4. Let $K$ be an irreducible and reversible Markov kernel on a finite set $\mathcal{X}$ and let $(G, c)$ be a mapping representation. Consider the following conditions:
i) $\delta\circ\eta = \eta\circ\delta$ for all $\delta, \eta \in G$;
ii) $c(\delta x, \eta) = c(x, \eta)$ for all $x \in \mathcal{X}$ and $\delta, \eta \in G$;
iii) $\delta\circ\delta = \mathrm{id}$ for all $\delta \in G$.
If i) and ii) are satisfied, then $\mathrm{Ric}(K) \geq 0$. If moreover iii) is satisfied, then $\mathrm{Ric}(K) \geq 2C$, where
$$C := \min\big\{ c(x, \delta) \,:\, x \in \mathcal{X},\ \delta \in G \text{ such that } c(x,\delta) > 0 \big\}\;.$$
Remark 5.5. Note that requiring i) and iii) simultaneously imposes a very strong restriction on the graph associated with $K$. We prefer to state the result in this form in order to give a unified proof which applies both to the discrete circle and the discrete hypercube, with optimal constant in the latter case.
Proof of Proposition 5.4. In view of Theorem 4.5 it suffices to show that $\mathcal{B}(\rho,\psi) \geq 0$ resp. $\mathcal{B}(\rho,\psi) \geq 2C\,\mathcal{A}(\rho,\psi)$ for all $\rho \in \mathscr{P}_*(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$. First recall that
$$\mathcal{B}(\rho,\psi) = -\big\langle \nabla\Delta\psi,\, \nabla\psi \big\rangle_{\hat\rho} + \frac12 \big\langle \widehat\Delta\rho \cdot \nabla\psi,\, \nabla\psi \big\rangle_\pi =: T_1 + T_2\;.$$
Using (5.4) and conditions i) and ii) we can write the first summand as
$$\begin{aligned}
T_1 &= -\frac12 \sum_{x,\delta,\eta} \nabla_\delta\psi(x) \Big( \nabla_\eta\psi(\delta x) - \nabla_\eta\psi(x) \Big)\, \hat\rho(x, \delta x)\, c(x,\delta)\, c(x,\eta)\,\pi(x) \\
&= -\frac12 \sum_{x,\delta,\eta} \nabla_\delta\psi(x) \Big( \nabla_\delta\psi(\eta x) - \nabla_\delta\psi(x) \Big)\, \hat\rho(x, \delta x)\, c(x,\delta)\, c(x,\eta)\,\pi(x)\;.
\end{aligned}$$
In a similar way we shall write the second summand. Starting from (5.5) and invoking ii) and equation (2.2) from Lemma 2.2, we obtain
$$\begin{aligned}
T_2 &= \frac14 \sum_{x,\delta,\eta} \big( \nabla_\delta\psi(x) \big)^2 \Big( \partial_1\theta(x, \delta x)\, \nabla_\eta\rho(x) + \partial_2\theta(x, \delta x)\, \nabla_\eta\rho(\delta x) \Big)\, c(x,\delta)\, c(x,\eta)\,\pi(x) \\
&= \frac14 \sum_{x,\delta,\eta} \big( \nabla_\delta\psi(x) \big)^2 \Big( \partial_1\theta(x, \delta x)\,\rho(\eta x) + \partial_2\theta(x, \delta x)\,\rho(\eta\delta x) - \hat\rho(x, \delta x) \Big)\, c(x,\delta)\, c(x,\eta)\,\pi(x)\;.
\end{aligned}$$
Using the reversibility of $K$ in the form of (5.2), and again condition ii), we can write
$$T_2 = \frac14 \sum_{x,\delta,\eta} \Big( \big( \nabla_\delta\psi(\eta x) \big)^2 \big( \partial_1\theta(\eta x, \delta\eta x)\,\rho(x) + \partial_2\theta(\eta x, \delta\eta x)\,\rho(\delta x) \big) - \big( \nabla_\delta\psi(x) \big)^2\, \hat\rho(x, \delta x) \Big)\, c(x,\delta)\, c(x,\eta)\,\pi(x)\;.$$
Adding a zero we obtain
$$\begin{aligned}
T_2 &= \frac14 \sum_{x,\delta,\eta} \Big( \big( \nabla_\delta\psi(\eta x) \big)^2 - \big( \nabla_\delta\psi(x) \big)^2 \Big)\, \hat\rho(x, \delta x)\, c(x,\delta)\, c(x,\eta)\,\pi(x) \\
&\quad + \frac14 \sum_{x,\delta,\eta} \big( \nabla_\delta\psi(\eta x) \big)^2 \Big( \partial_1\theta(\eta x, \delta\eta x)\,\rho(x) + \partial_2\theta(\eta x, \delta\eta x)\,\rho(\delta x) - \hat\rho(x, \delta x) \Big)\, c(x,\delta)\, c(x,\eta)\,\pi(x) \\
&=: T_3 + T_4\;.
\end{aligned}$$
Invoking the inequality (2.3) from Lemma 2.2, we immediately see that $T_4 \geq 0$. Hence we get
$$\mathcal{B}(\rho,\psi) \geq T_1 + T_3 = \frac14 \sum_{x,\delta,\eta} \Big( \nabla_\delta\psi(\eta x) - \nabla_\delta\psi(x) \Big)^2\, \hat\rho(x, \delta x)\, c(x,\delta)\, c(x,\eta)\,\pi(x) \geq 0\;.$$
If moreover condition iii) is satisfied, the latter estimate can be improved by keeping only the terms with $\eta = \delta$ in the last sum. We thus obtain
$$\mathcal{B}(\rho,\psi) \geq \frac{C}{4} \sum_{x,\delta} \big( 2\,\nabla_\delta\psi(x) \big)^2\, \hat\rho(x, \delta x)\, c(x,\delta)\,\pi(x) = 2C\,\mathcal{A}(\rho,\psi)\;. \qquad \Box$$
Let us now consider some examples to which Proposition 5.4 can be applied.
Example 5.6 (The discrete circle). Consider the simple random walk on the discrete circle $C_n = \mathbb{Z}/n\mathbb{Z}$ of $n$ sites given by the transition kernel $K(m, m-1) = K(m, m+1) = \frac12$ for $m \in C_n$. We have the following mapping representation for $K$. Set $G = \{+, -\}$, where $+(m) = m+1$ and $-(m) = m-1$, and let $c(m, +) = c(m, -) = \frac12$ for all $m$. Proposition 5.4 immediately yields that $\mathrm{Ric}(K) \geq 0$.
Example 5.7 (The discrete hypercube). Let $Q_n = \{0,1\}^n$ be the hypercube endowed with the usual graph structure and let $K_n$ be the kernel of the simple random walk on $Q_n$. The natural mapping representation is given by $G = \{\sigma_1, \ldots, \sigma_n\}$, where $\sigma_i : Q_n \to Q_n$ is the map that flips the $i$-th coordinate, and $c(x, \sigma_i) = \frac1n$ for all $x \in Q_n$. Here the criterion from Proposition 5.4 yields $\mathrm{Ric}(K_n) \geq \frac2n$. We shall see in Section 7 that this bound is optimal.

Alternatively we can use the fact that $Q_n$ is a product space and use the tensorisation property Theorem 6.2 below. This will allow us to consider asymmetric random walks on the hypercube as well.
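Conditions i)-iii) of Proposition 5.4 are straightforward to verify programmatically for the hypercube flips. A sketch (plain Python; the dimension $n = 3$ is an arbitrary choice):

```python
from itertools import product

n = 3
cube = list(product((0, 1), repeat=n))

def flip(i):
    # sigma_i flips the i-th coordinate of a 0/1 tuple
    return lambda x: x[:i] + (1 - x[i],) + x[i + 1:]

sigmas = [flip(i) for i in range(n)]

for s in sigmas:
    for t in sigmas:
        for x in cube:
            assert s(t(x)) == t(s(x))   # i)   the moves commute
for s in sigmas:
    for x in cube:
        assert s(s(x)) == x             # iii) each move is an involution
# ii) holds trivially: c(x, sigma_i) = 1/n does not depend on x.
# Proposition 5.4 then gives Ric(K_n) >= 2C = 2/n.
```

The only nontrivial hypotheses here are the commutation and involution properties; the constant rate $c \equiv \frac1n$ makes condition ii) automatic.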
6. Basic constructions

In this section we show how non-local Ricci curvature bounds transform under some basic operations on a Markov kernel. The main result is Theorem 6.2, which yields Ricci bounds for product chains. We start with a simple result, which shows how Ricci bounds behave under adding laziness. Let $K$ be an irreducible and reversible Markov kernel on a finite set $\mathcal{X}$. For $\alpha \in (0,1)$ we consider the lazy Markov kernel defined by $K_\alpha := (1-\alpha) I + \alpha K$. Clearly, $K_\alpha$ is irreducible and reversible with the same invariant measure $\pi$. With this notation, we have the following result:
Proposition 6.1 (Laziness). Let $\alpha \in (0,1)$. If $\mathrm{Ric}(K) \geq \kappa$ for some $\kappa \in \mathbb{R}$, then the lazy kernel $K_\alpha$ satisfies $\mathrm{Ric}(K_\alpha) \geq \alpha\kappa$.

Proof. Writing $\mathcal{A}_\alpha$ and $\mathcal{B}_\alpha$ to denote the lazy versions of $\mathcal{A}$ and $\mathcal{B}$, a direct calculation shows that
$$\mathcal{A}_\alpha(\rho,\psi) = \alpha\,\mathcal{A}(\rho,\psi)\;, \qquad \mathcal{B}_\alpha(\rho,\psi) = \alpha^2\,\mathcal{B}(\rho,\psi)$$
for all $\rho \in \mathscr{P}_*(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$. As a consequence,
$$\mathcal{B}_\alpha(\rho,\psi) - \alpha\kappa\,\mathcal{A}_\alpha(\rho,\psi) = \alpha^2\big( \mathcal{B}(\rho,\psi) - \kappa\,\mathcal{A}(\rho,\psi) \big)\;.$$
The result thus follows from Theorem 4.5. □
We now give a tensorisation property of lower Ricci bounds with respect to products of Markov chains. For $i = 1, \ldots, n$, let $(\mathcal{X}_i, K_i)$ be an irreducible, reversible finite Markov chain with steady state $\pi_i$, and let $\alpha_i$ be a non-negative number satisfying $\sum_{i=1}^n \alpha_i = 1$. The product chain $K_\alpha$ on the product space $\mathcal{X} = \prod_i \mathcal{X}_i$ is defined for $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ by
$$K_\alpha(x,y) = \begin{cases} \displaystyle\sum_{i=1}^n \alpha_i\, K_i(x_i, x_i)\;, & \text{if } x_i = y_i \ \forall i\;, \\[2mm] \alpha_i\, K_i(x_i, y_i)\;, & \text{if } x_i \neq y_i \text{ and } x_j = y_j \ \forall j \neq i\;, \\[1mm] 0\;, & \text{otherwise}\;. \end{cases}$$
Note that the steady state of $K_\alpha$ is the product $\pi = \pi_1 \otimes \cdots \otimes \pi_n$ of the steady states of the $K_i$.
Theorem 6.2 (Tensorisation). Assume that $\mathrm{Ric}(K_i) \geq \kappa_i$ for $i = 1, \ldots, n$. Then we have
$$\mathrm{Ric}(K_\alpha) \geq \min_i\, \alpha_i \kappa_i\;.$$
Proof. In view of Theorem 4.5 we have to show that for any $\rho \in \mathscr{P}_*(\mathcal{X})$ and $\psi : \mathcal{X} \to \mathbb{R}$:
$$\mathcal{B}(\rho,\psi) \geq \big( \min_i \alpha_i\kappa_i \big)\, \mathcal{A}(\rho,\psi)\;.$$
We will use a mapping representation for the Markov kernel $K_\alpha$ as introduced in Section 5. Let $(G_i, c_i)$ be mapping representations of $K_i$ for $i = 1, \ldots, n$. To each $\delta \in G_i$ we associate a map $\bar\delta : \mathcal{X} \to \mathcal{X}$ by letting $\delta$ act on the $i$-th coordinate. Let us set $G = \{\bar\delta : \delta \in G_i\}$ and define $c : \mathcal{X}\times G \to \mathbb{R}_+$ by
$$c(x, \bar\delta) := \alpha_i\, c_i(x_i, \delta)\;, \qquad \text{for } \delta \in G_i\;.$$
One easily checks that $(G, c)$ is a mapping representation of $K_\alpha$. Recalling the expressions (5.4), (5.5), which constitute $\mathcal{B}$ in mapping representation, we write
$$\mathcal{B}(\rho,\psi) =: \sum_{x\in\mathcal{X},\, \bar\delta, \bar\eta \in G} F(x, \bar\delta, \bar\eta)\;.$$
Taking into account the product structure of the chain we can write
$$\mathcal{B}(\rho,\psi) = \sum_{i,j=1}^n \mathcal{B}_{i,j} \qquad \text{with} \qquad \mathcal{B}_{i,j} = \sum_{x\in\mathcal{X}} \sum_{\delta\in G_i,\, \eta\in G_j} F(x, \bar\delta, \bar\eta)\;.$$
The proof will be finished if we prove the following two assertions:
i) $\mathcal{B}_{i,j} \geq 0$ for all $i \neq j$;
ii) $\displaystyle\sum_{i=1}^n \mathcal{B}_{i,i} \geq \big( \min_i \alpha_i\kappa_i \big)\, \mathcal{A}(\rho,\psi)$.
To show i), first note that for $\delta \in G_i$ and $\eta \in G_j$ the maps $\bar\delta$ and $\bar\eta$ act on different coordinates if $i \neq j$. Thus we have $\bar\delta\bar\eta = \bar\eta\bar\delta$ and furthermore $c(\bar\eta x, \bar\delta) = c(x, \bar\delta)$. Note that these are precisely the properties used in the proof of Proposition 5.4, hence the assertion here follows from the same arguments.
Let us now show ii). We set $\bar{\mathcal{X}}_i = \prod_{j\neq i} \mathcal{X}_j$. For $\bar x_i \in \bar{\mathcal{X}}_i$ we let $\rho_{\bar x_i}, \psi_{\bar x_i} : \mathcal{X}_i \to \mathbb{R}$ denote the functions $\rho$ and $\psi$ where all variables except $x_i$ are fixed to $\bar x_i$. Note that $\rho_{\bar x_i}$ does not necessarily belong to $\mathscr{P}(\mathcal{X}_i)$, but this will be irrelevant in the calculation below, and we shall use expressions such as $\mathcal{A}_i(\rho_{\bar x_i}, \psi_{\bar x_i})$ by abuse of notation. We also set $\bar\pi_i = \bigotimes_{j\neq i} \pi_j$. Using once more the product structure of the chain we see:
$$\begin{aligned}
\mathcal{A}(\rho,\psi) &= \frac12 \sum_{i=1}^n \sum_{x\in\mathcal{X},\, \delta\in G_i} \big( \nabla_{\bar\delta}\psi(x) \big)^2\, \hat\rho(x, \bar\delta x)\, c(x, \bar\delta)\,\pi(x) \\
&= \frac12 \sum_{i=1}^n \sum_{\bar x_i\in\bar{\mathcal{X}}_i} \sum_{x_i\in\mathcal{X}_i,\, \delta\in G_i} \big( \nabla_\delta\psi_{\bar x_i}(x_i) \big)^2\, \hat\rho_{\bar x_i}(x_i, \delta x_i)\, \alpha_i\, c_i(x_i, \delta)\, \pi_i(x_i)\, \bar\pi_i(\bar x_i) \\
&= \sum_{i=1}^n \alpha_i \sum_{\bar x_i\in\bar{\mathcal{X}}_i} \mathcal{A}_i\big( \rho_{\bar x_i}, \psi_{\bar x_i} \big)\, \bar\pi_i(\bar x_i)\;,
\end{aligned}$$
where $\mathcal{A}_i$ (resp. $\mathcal{B}_i$) denotes the functional $\mathcal{A}$ (resp. $\mathcal{B}$) associated with the $i$-th chain. Similarly we obtain
$$\mathcal{B}_{i,i} = \alpha_i^2 \sum_{\bar x_i\in\bar{\mathcal{X}}_i} \mathcal{B}_i\big( \rho_{\bar x_i}, \psi_{\bar x_i} \big)\, \bar\pi_i(\bar x_i) \;\geq\; \alpha_i^2\,\kappa_i \sum_{\bar x_i\in\bar{\mathcal{X}}_i} \mathcal{A}_i\big( \rho_{\bar x_i}, \psi_{\bar x_i} \big)\, \bar\pi_i(\bar x_i)\;,$$
where the last inequality holds by the assumed curvature bound for $K_i$. Summing over $i = 1, \ldots, n$ and using $\alpha_i^2\kappa_i \geq \big( \min_j \alpha_j\kappa_j \big)\alpha_i$, we obtain ii). □
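The product kernel $K_\alpha$ is easy to assemble explicitly. The sketch below (plain Python; the two factor chains and the weights are arbitrary choices) builds it for two two-point chains and verifies stochasticity and reversibility with respect to the product measure.

```python
from itertools import product

def two_point(p, q):
    # kernel K(0,1) = p, K(1,0) = q with stationary measure (q, p)/(p+q)
    return [[1 - p, p], [q, 1 - q]], [q / (p + q), p / (p + q)]

(K1, pi1), (K2, pi2) = two_point(0.3, 0.7), two_point(0.5, 0.5)
alpha = [0.4, 0.6]
states = list(product(range(2), range(2)))

def K_alpha(x, y):
    if x == y:                       # diagonal: weighted holding probabilities
        return alpha[0] * K1[x[0]][x[0]] + alpha[1] * K2[x[1]][x[1]]
    if x[1] == y[1]:                 # move in the first coordinate only
        return alpha[0] * K1[x[0]][y[0]]
    if x[0] == y[0]:                 # move in the second coordinate only
        return alpha[1] * K2[x[1]][y[1]]
    return 0.0                       # never move two coordinates at once

pi = {x: pi1[x[0]] * pi2[x[1]] for x in states}
for x in states:
    assert abs(sum(K_alpha(x, y) for y in states) - 1.0) < 1e-12   # stochastic
    for y in states:
        # detailed balance with respect to the product measure pi1 (x) pi2
        assert abs(pi[x] * K_alpha(x, y) - pi[y] * K_alpha(y, x)) < 1e-12
```

Both checks reduce, coordinate by coordinate, to the corresponding properties of the factor chains, which is exactly how the tensorisation proof operates.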
We shall now apply Theorem 6.2 to asymmetric random walks on the discrete hypercube. Here we consider the case where $\theta$ is the logarithmic mean. For $p, q \in (0,1)$ let $K_{p,q}$ be the Markov kernel on the two-point space $\{0,1\}$ defined by $K(0,1) = p$, $K(1,0) = q$. The asymmetric random walk is the $n$-fold product chain on $Q_n$, denoted by $K_{p,q,n}$, where $\alpha_i = \frac1n$. Note that the steady state of $K_{p,q,n}$ is the Bernoulli measure
$$\big( (1-\beta)\,\delta_{\{0\}} + \beta\,\delta_{\{1\}} \big)^{\otimes n}$$
with parameter $\beta = \frac{p}{p+q}$. We then have the following bound on the non-local Ricci curvature:
Proposition 6.3. For $n \geq 1$ we have
$$\mathrm{Ric}(K_{p,q,n}) \geq \frac1n \Big( \frac{p+q}{2} + \sqrt{pq} \Big)\;.$$
Proof. The two-point space $Q_1 = \{0,1\}$ has been analysed in detail in [28]. In particular, [28, Proposition 2.12] asserts that $\mathrm{Ric}(K_{p,q,1}) \geq \kappa_{p,q}$, where
$$\kappa_{p,q} = \frac{p+q}{2} + \inf_{-1 < \beta < 1}\; \frac{1}{1-\beta^2} \cdot \frac{q(1+\beta) - p(1-\beta)}{\log q(1+\beta) - \log p(1-\beta)}\;.$$
In order to estimate the right-hand side, we use the logarithmic-geometric mean inequality to obtain for $\beta \in (-1, 1)$,
$$\frac{1}{1-\beta^2} \cdot \frac{q(1+\beta) - p(1-\beta)}{\log q(1+\beta) - \log p(1-\beta)} \;\geq\; \frac{\sqrt{pq}}{\sqrt{1-\beta^2}} \;\geq\; \sqrt{pq}\;.$$
We thus infer that $\mathrm{Ric}(K_{p,q,1}) \geq \frac{p+q}{2} + \sqrt{pq}$. The general bound then follows immediately from Theorem 6.2. □
We shall see in Section 7 that this bound is sharp if $p = q$. If $p \neq q$, it should be possible to improve this bound by obtaining a sharper bound in the minimisation problem in the proof above.
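The infimum appearing in the proof can be examined numerically. A sketch (plain Python; the grid resolution and the sample values of $p, q$ are arbitrary choices, and the quotient is evaluated through a logarithmic-mean helper to avoid the removable singularity):

```python
import math

def log_mean(a, b):
    # (a - b) / (log a - log b), extended continuously on the diagonal
    return a if abs(a - b) < 1e-12 else (a - b) / (math.log(a) - math.log(b))

def kappa(p, q, grid=20_000):
    # kappa_{p,q} = (p+q)/2 + inf_beta Lambda(q(1+beta), p(1-beta)) / (1-beta^2)
    h = 2.0 / grid
    inf_term = min(log_mean(q * (1 + b), p * (1 - b)) / (1 - b * b)
                   for b in (-1 + (j + 0.5) * h for j in range(grid)))
    return 0.5 * (p + q) + inf_term

for p, q in [(0.5, 0.5), (0.3, 0.7), (0.1, 0.9)]:
    # logarithmic-geometric mean inequality forces the inf term >= sqrt(pq)
    assert kappa(p, q) >= 0.5 * (p + q) + math.sqrt(p * q) - 1e-9
```

For $p = q$ the grid minimum sits at $\beta \approx 0$ and the bound $\frac{p+q}{2} + \sqrt{pq}$ is attained up to discretisation error, consistent with sharpness in the symmetric case.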
As another application of the tensorisation result, we prove nonnegativity of the non-local Ricci curvature for the simple random walk on a discrete torus of arbitrary size in any dimension $d \geq 1$. Let $\mathbf{c} := \{c_n\}_{n=1}^d$ be a sequence of natural numbers and consider the discrete torus
$$\mathbb{T}_{\mathbf{c}} := C_{c_1} \times \cdots \times C_{c_d}\;.$$
The simple random walk $K_{\mathbf{c}}$ on $\mathbb{T}_{\mathbf{c}}$ is the $d$-fold product of simple random walks on the circles of length $c_1, \ldots, c_d$.
Proposition 6.4 ($d$-dimensional torus). For any $d \geq 1$ and $\mathbf{c} := \{c_n\}_{n=1}^d \in \mathbb{N}^d$ we have
$$\mathrm{Ric}(K_{\mathbf{c}}) \geq 0\;.$$

Proof. This follows from Example 5.6 and Theorem 6.2. □
7. Functional inequalities

The aim of this section is to prove discrete counterparts to the celebrated theorems by Bakry-Émery and Otto-Villani. Along the way we prove a discrete version of the HWI inequality, which relates the $L^2$-Wasserstein distance to the entropy and the Fisher information. As announced in the introduction, we shall follow the approach of Otto-Villani, which relies on the fact that the heat flow is the gradient flow of the entropy. Therefore, the role of the $L^2$-Wasserstein distance will be taken over by the distance $\mathcal{W}$.
We fix a finite set $\mathcal{X}$ and an irreducible and reversible Markov kernel $K$ with steady state $\pi$. Recall that the relative entropy of a density $\rho \in \mathscr{P}(\mathcal{X})$ is defined by
$$\mathcal{H}(\rho) = \sum_{x\in\mathcal{X}} \pi(x)\,\rho(x)\log\rho(x)\;.$$
As before, we consider a discrete analogue of the Fisher information, given for $\rho \in \mathscr{P}_*(\mathcal{X})$ by
$$\mathcal{I}(\rho) = \frac12 \sum_{x,y\in\mathcal{X}} \big( \rho(x) - \rho(y) \big)\big( \log\rho(x) - \log\rho(y) \big)\, K(x,y)\,\pi(x)\;.$$
If $\rho(x) = 0$ for some $x \in \mathcal{X}$, we set $\mathcal{I}(\rho) = +\infty$. Note that this quantity can be rewritten in the form $\mathcal{I}(\rho) = \|\nabla\log\rho\|^2_\rho$ using the definition of the logarithmic mean. The relevance of $\mathcal{I}$ in this setting is due to the fact that
it describes the entropy dissipation along the heat flow:
$$\frac{d}{dt}\,\mathcal{H}(P_t\rho) = -\mathcal{I}(P_t\rho)\;. \tag{7.1}$$
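The identity (7.1) can be checked by a finite-difference computation along the semigroup. A sketch (plain Python; the three-state complete-graph chain, the truncation order of the matrix exponential, and the step size are ad hoc choices):

```python
import math

n = 3
K = [[1.0 / n] * n for _ in range(n)]   # complete graph, uniform pi
pi = [1.0 / n] * n

def heat(rho, t, terms=60):
    # rho_t = e^{t(K - I)} rho via a truncated exponential series
    out, term = list(rho), list(rho)
    for m in range(1, terms):
        term = [(sum(K[x][y] * term[y] for y in range(n)) - term[x]) * t / m
                for x in range(n)]
        out = [o + v for o, v in zip(out, term)]
    return out

def H(rho):
    return sum(p * r * math.log(r) for p, r in zip(pi, rho))

def I_fisher(rho):
    return 0.5 * sum((rho[x] - rho[y]) * (math.log(rho[x]) - math.log(rho[y]))
                     * K[x][y] * pi[x] for x in range(n) for y in range(n))

rho = [1.5, 0.9, 0.6]   # a positive density: sum rho(x) pi(x) = 1
h = 1e-4
lhs = (H(heat(rho, h)) - H(heat(rho, -h))) / (2 * h)   # d/dt H(P_t rho) at t=0
assert abs(lhs + I_fisher(rho)) < 1e-6                 # matches -I(rho)
```

The central difference reproduces $-\mathcal{I}(\rho)$ to within the discretisation error, as (7.1) predicts.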
The following proposition gives an upper bound for the speed of the heat flow measured in the metric $\mathcal{W}$.
Proposition 7.1. Let $\rho, \sigma \in \mathscr{P}(\mathcal{X})$. For all $t > 0$ we have
$$\frac{d^+}{dt}\, \mathcal{W}(P_t\rho, \sigma) \leq \sqrt{\mathcal{I}(P_t\rho)}\;. \tag{7.2}$$
In particular, the metric derivative of the heat flow with respect to $\mathcal{W}$ satisfies $|(P_t\rho)'| \leq \sqrt{\mathcal{I}(P_t\rho)}$. If $\rho$ belongs to $\mathscr{P}_*(\mathcal{X})$, then (7.2) holds at $t = 0$ as well.
Proof. Let us set $\rho_t := P_t\rho$. Elementary Markov chain theory guarantees that $\rho_t \in \mathscr{P}_*(\mathcal{X})$ for all $t > 0$ and that the map $t \mapsto \rho_t$ is smooth. To prove (7.2) we use the triangle inequality and obtain
$$\frac{d^+}{dt}\, \mathcal{W}(\rho_t, \sigma) = \limsup_{s\downarrow 0}\, \frac1s \big( \mathcal{W}(\rho_{t+s}, \sigma) - \mathcal{W}(\rho_t, \sigma) \big) \leq \limsup_{s\downarrow 0}\, \frac1s\, \mathcal{W}(\rho_t, \rho_{t+s})\;.$$
Note that the couple $(\rho_r, -\log\rho_r)_r$ solves the continuity equation (1.2). From the definition of $\mathcal{W}$ we thus obtain the estimate
$$\limsup_{s\downarrow 0}\, \frac1s\, \mathcal{W}(\rho_t, \rho_{t+s}) \leq \limsup_{s\downarrow 0}\, \frac1s \int_t^{t+s} \|\nabla\log\rho_r\|_{\rho_r}\,dr = \limsup_{s\downarrow 0}\, \frac1s \int_t^{t+s} \sqrt{\mathcal{I}(\rho_r)}\,dr = \sqrt{\mathcal{I}(\rho_t)}\;.$$
The last equality holds since $r \mapsto \sqrt{\mathcal{I}(\rho_r)}$ is a continuous function. □
Let us now recall from Section 1 the functional inequalities that will be studied. Recall that $\mathbf{1} \in \mathscr{P}(\mathcal{X})$ denotes the density of the stationary distribution, which is everywhere equal to $1$.

Definition 7.2. The Markov kernel $K$ satisfies
(1) a modified logarithmic Sobolev inequality with constant $\lambda > 0$ if for all $\rho \in \mathscr{P}(\mathcal{X})$,
$$\mathcal{H}(\rho) \leq \frac{1}{2\lambda}\, \mathcal{I}(\rho)\;; \tag{MLSI($\lambda$)}$$
(2) an HWI inequality with constant $\kappa \in \mathbb{R}$ if for all $\rho \in \mathscr{P}(\mathcal{X})$,
$$\mathcal{H}(\rho) \leq \mathcal{W}(\rho, \mathbf{1})\, \sqrt{\mathcal{I}(\rho)} - \frac{\kappa}{2}\, \mathcal{W}(\rho, \mathbf{1})^2\;; \tag{HWI($\kappa$)}$$
(3) a modified Talagrand inequality with constant $\lambda > 0$ if for all $\rho \in \mathscr{P}(\mathcal{X})$,
$$\mathcal{W}(\rho, \mathbf{1}) \leq \sqrt{\frac{2}{\lambda}\, \mathcal{H}(\rho)}\;; \tag{T$_{\mathcal{W}}$($\lambda$)}$$
(4) a Poincaré inequality with constant $\lambda > 0$ if for all $\psi \in \mathbb{R}^{\mathcal{X}}$ with $\sum_x \psi(x)\pi(x) = 0$,
$$\lambda\, \|\psi\|^2_\pi \leq \|\nabla\psi\|^2_{\mathbf{1}}\;. \tag{P($\lambda$)}$$
The following result is a discrete analogue of a result by Otto and Villani [37].

Theorem 7.3. Assume that $\mathrm{Ric}(K) \geq \kappa$ for some $\kappa \in \mathbb{R}$. Then $K$ satisfies HWI($\kappa$).
Proof. Fix $\rho \in \mathcal{P}(\mathcal{X})$. Without restriction we can assume that $\rho > 0$, since otherwise $\mathcal{I}(\rho) = +\infty$ and there is nothing to prove. Let $\rho_t = P_t \rho$, where $P_t = e^{t(K-I)}$ is the heat semigroup. From Theorem 4.5 and the lower bound on the Ricci curvature we know that the curve $(\rho_t)$ satisfies EVI($\kappa$), i.e., equation (4.4). Choosing in particular $\sigma = \mathbf{1}$ and $t = 0$ in the EVI we obtain the inequality
\[ \mathcal{H}(\rho) \leq -\frac{1}{2} \frac{\mathrm{d}^+}{\mathrm{d}t}\Big|_{t=0} \mathcal{W}(\rho_t, \mathbf{1})^2 - \frac{\kappa}{2}\, \mathcal{W}(\rho, \mathbf{1})^2 \;. \]
To finish the proof we show that
\[ -\frac{1}{2} \frac{\mathrm{d}^+}{\mathrm{d}t}\Big|_{t=0} \mathcal{W}(\rho_t, \mathbf{1})^2 \leq \mathcal{W}(\rho, \mathbf{1}) \sqrt{\mathcal{I}(\rho)} \;. \]
Indeed, using the triangle inequality we estimate
\[ -\frac{1}{2} \frac{\mathrm{d}^+}{\mathrm{d}t}\Big|_{t=0} \mathcal{W}(\rho_t, \mathbf{1})^2 = \liminf_{s \downarrow 0} \frac{1}{2s} \big( \mathcal{W}(\rho, \mathbf{1})^2 - \mathcal{W}(\rho_s, \mathbf{1})^2 \big) \leq \limsup_{s \downarrow 0} \frac{1}{2s} \big( \mathcal{W}(\rho, \rho_s)^2 + 2\, \mathcal{W}(\rho, \rho_s)\, \mathcal{W}(\rho, \mathbf{1}) \big) \;. \]
Using the estimate (7.2) from Proposition 7.1 with $\sigma = \rho$ and $t = 0$, we see that the second term on the right-hand side is bounded by $\mathcal{W}(\rho, \mathbf{1}) \sqrt{\mathcal{I}(\rho)}$, while the first term vanishes. $\square$
The following result is now a simple consequence.

Theorem 7.4 (Discrete Bakry-Émery theorem). Assume that $\mathrm{Ric}(K) \geq \lambda$ for some $\lambda > 0$. Then $K$ satisfies MLSI($\lambda$).

Proof. By Theorem 7.3, $K$ satisfies HWI($\lambda$). From this we derive MLSI($\lambda$) by an application of Young's inequality
\[ xy \leq c x^2 + \frac{1}{4c}\, y^2 \;, \qquad x, y \in \mathbb{R} \,,\; c > 0 \,, \]
in which we set $x = \mathcal{W}(\rho, \mathbf{1})$, $y = \sqrt{\mathcal{I}(\rho)}$ and $c = \frac{\lambda}{2}$. $\square$
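The only analytic ingredient of this proof is Young's inequality. The following Python sketch, with arbitrary sample values for $\lambda$, $\mathcal{W}(\rho, \mathbf{1})$ and $\mathcal{I}(\rho)$ (illustrative numbers, not from the paper), spells out how it turns HWI($\lambda$) into MLSI($\lambda$):

```python
import random

random.seed(0)

# Young's inequality xy <= c x^2 + y^2/(4c) follows from
# (sqrt(c) x - y/(2 sqrt(c)))^2 >= 0; check it on random inputs.
for _ in range(1000):
    x = random.uniform(-10, 10)
    y = random.uniform(-10, 10)
    c = random.uniform(0.1, 10)
    assert x * y <= c * x ** 2 + y ** 2 / (4 * c) + 1e-9

# With x = W(rho, 1), y = sqrt(I(rho)) and c = lambda/2, HWI(lambda),
#   H <= x*y - (lambda/2) x^2,
# is bounded by (lambda/2) x^2 + I/(2 lambda) - (lambda/2) x^2 = I/(2 lambda),
# which is exactly MLSI(lambda). A numerical spot check:
lam, x, I = 0.7, 1.9, 2.3
H_hwi_bound = x * I ** 0.5 - (lam / 2) * x ** 2
assert H_hwi_bound <= I / (2 * lam) + 1e-12
```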
Theorem 7.5 (Discrete Otto-Villani theorem). Assume that $K$ satisfies MLSI($\lambda$) for some $\lambda > 0$. Then $K$ also satisfies T$_{\mathcal{W}}$($\lambda$).
Proof. It is sufficient to prove that T$_{\mathcal{W}}$($\lambda$) holds for any $\rho \in \mathcal{P}_*(\mathcal{X})$. The inequality for general $\rho$ can then be obtained by an easy approximation argument, taking into account the continuity of $\mathcal{W}$ with respect to the Euclidean metric.
So fix $\rho \in \mathcal{P}_*(\mathcal{X})$ and set $\rho_t = P_t \rho$. First note that as $t \to \infty$, we have
\[ \mathcal{H}(\rho_t) \to 0 \quad \text{and} \quad \mathcal{W}(\rho, \rho_t) \to \mathcal{W}(\rho, \mathbf{1}) \;. \tag{7.3} \]
Indeed, by elementary Markov chain theory, we know that as $t \to \infty$, one has $\rho_t \to \mathbf{1}$ in, say, the Euclidean distance. The claim follows immediately from the continuity of $\mathcal{H}$ and $\mathcal{W}$ with respect to the Euclidean distance, the latter being a consequence of, e.g., Proposition 2.14.
We now define the function $F : \mathbb{R}_+ \to \mathbb{R}_+$ by
\[ F(t) := \mathcal{W}(\rho, \rho_t) + \sqrt{\frac{2}{\lambda}\, \mathcal{H}(\rho_t)} \;. \]
Obviously we have $F(0) = \sqrt{\frac{2}{\lambda} \mathcal{H}(\rho)}$, and by (7.3) we have that $F(t) \to \mathcal{W}(\rho, \mathbf{1})$ as $t \to \infty$. Hence it is sufficient to show that $F$ is non-increasing. To this end we show that its upper right derivative is non-positive. If $\rho_t \neq \mathbf{1}$, we deduce from Proposition 7.1 that
\[ \frac{\mathrm{d}^+}{\mathrm{d}t} F(t) \leq \sqrt{\mathcal{I}(\rho_t)} - \frac{\mathcal{I}(\rho_t)}{\sqrt{2\lambda\, \mathcal{H}(\rho_t)}} \leq 0 \;, \]
where we used MLSI($\lambda$) in the last inequality. If $\rho_t = \mathbf{1}$, then the relation also holds true, since this implies that $\rho_r = \mathbf{1}$ for all $r \geq t$. $\square$
In a classical continuous setting it is well known that a logarithmic Sobolev inequality implies a Poincaré inequality by linearisation. Let us make this explicit in the present discrete context. Fix $\psi \in \mathbb{R}^{\mathcal{X}}$ satisfying $\sum_x \pi(x) \psi(x) = 0$ and for sufficiently small $\varepsilon > 0$ set $\rho_\varepsilon = \mathbf{1} + \varepsilon \psi \in \mathcal{P}_*(\mathcal{X})$. One easily checks that as $\varepsilon \to 0$ we have:
\[ \frac{1}{\varepsilon^2}\, \mathcal{H}(\rho_\varepsilon) \to \frac{1}{2} \|\psi\|_\pi^2 \;, \qquad \frac{1}{\varepsilon^2}\, \mathcal{I}(\rho_\varepsilon) \to \|\nabla \psi\|_{\mathbf{1}}^2 \;. \]
Thus, assuming MLSI($\lambda$) holds and applying it to $\rho_\varepsilon$, we get the Poincaré inequality P($\lambda$). In [37] it has been shown that the Poincaré inequality can also be obtained from Talagrand's inequality by linearisation. The same is true for the modified Talagrand inequality involving the distance $\mathcal{W}$.
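The two linearisation limits above can be verified numerically. This Python sketch uses an arbitrary 3-state symmetric kernel and an arbitrary mean-zero $\psi$ purely as illustration (neither appears in the paper):

```python
import numpy as np

# Illustrative symmetric kernel with uniform steady state pi.
K = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
pi = np.full(3, 1.0 / 3)

psi = np.array([1.0, -2.0, 1.0])       # mean zero under the uniform pi
assert abs(np.sum(pi * psi)) < 1e-12

def entropy(rho):
    return float(np.sum(pi * rho * np.log(rho)))

def fisher(rho):
    d = rho[:, None] - rho[None, :]
    dl = np.log(rho)[:, None] - np.log(rho)[None, :]
    return float(0.5 * np.sum(d * dl * K * pi[:, None]))

# ||psi||^2_pi and ||grad psi||^2_1 (the logarithmic mean at rho = 1 is 1).
norm_sq = float(np.sum(pi * psi ** 2))
grad_sq = float(0.5 * np.sum((psi[:, None] - psi[None, :]) ** 2 * K * pi[:, None]))

eps = 1e-4
rho_eps = 1 + eps * psi
# H(rho_eps)/eps^2 -> ||psi||^2_pi / 2  and  I(rho_eps)/eps^2 -> ||grad psi||^2_1.
assert abs(entropy(rho_eps) / eps ** 2 - 0.5 * norm_sq) < 1e-3
assert abs(fisher(rho_eps) / eps ** 2 - grad_sq) < 1e-3
```

Shrinking `eps` further makes both difference quotients converge to the stated limits.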
Proposition 7.6. Assume that $K$ satisfies T$_{\mathcal{W}}$($\lambda$) for some $\lambda > 0$. Then $K$ also satisfies P($\lambda$). In particular, $\mathrm{Ric}(K) \geq \lambda$ implies P($\lambda$).
Proof. Assume that T$_{\mathcal{W}}$($\lambda$) holds and let us show P($\lambda$). The second assertion of the proposition then follows from Theorem 7.4 and Theorem 7.5. So fix $\psi \in \mathbb{R}^{\mathcal{X}}$ satisfying $\sum_x \pi(x) \psi(x) = 0$ and for sufficiently small $\varepsilon > 0$ set $\rho_\varepsilon = \mathbf{1} + \varepsilon \psi \in \mathcal{P}_*(\mathcal{X})$. Let $(\rho^\varepsilon, V^\varepsilon) \in \mathcal{CE}_1(\rho_\varepsilon, \mathbf{1})$ be an action minimizing curve. Using the continuity equation, we now write
\[ \sum_x \pi(x) \psi(x)^2 = \frac{1}{\varepsilon} \sum_x \psi(x) \big( \rho_\varepsilon(x) - 1 \big) \pi(x) = \frac{1}{2\varepsilon} \int_0^1 \sum_{x,y} \nabla \psi(x,y)\, V_t^\varepsilon(x,y)\, K(x,y)\, \pi(x) \, \mathrm{d}t \;. \]
Using Hölder's inequality we can estimate
\[ \sum_x \pi(x) \psi(x)^2 \leq \frac{1}{\varepsilon} \left( \int_0^1 \| \nabla \psi \|_{\rho_t^\varepsilon}^2 \, \mathrm{d}t \right)^{\frac12} \left( \int_0^1 \mathcal{A}(\rho_t^\varepsilon, V_t^\varepsilon) \, \mathrm{d}t \right)^{\frac12} = \left( \frac{1}{2} \sum_{x,y} \big( \nabla \psi(x,y) \big)^2 g_\varepsilon(x,y)\, K(x,y)\, \pi(x) \right)^{\frac12} \frac{1}{\varepsilon}\, \mathcal{W}(\rho_\varepsilon, \mathbf{1}) \;, \]
where $g_\varepsilon \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ is defined by $g_\varepsilon(x,y) = \int_0^1 \hat\rho_t^\varepsilon(x,y) \, \mathrm{d}t$. Using T$_{\mathcal{W}}$($\lambda$) we arrive at
\[ \| \psi \|_\pi^2 \leq \| \nabla \psi \|_{g_\varepsilon}\, \frac{1}{\varepsilon} \sqrt{\frac{2}{\lambda}\, \mathcal{H}(\rho_\varepsilon)} \;. \]
The proof will be finished if we show that, as $\varepsilon$ goes to $0$,
\[ \frac{1}{\varepsilon} \sqrt{\frac{2}{\lambda}\, \mathcal{H}(\rho_\varepsilon)} \to \frac{1}{\sqrt{\lambda}}\, \| \psi \|_\pi \;, \qquad \| \nabla \psi \|_{g_\varepsilon} \to \| \nabla \psi \|_{\mathbf{1}} \;. \]
As before, the first statement is easily checked. For the second statement it is sufficient to show that $\rho_t^\varepsilon \to \mathbf{1}$ uniformly in $t$ as $\varepsilon \to 0$, as this implies that $g_\varepsilon \to 1$. Since $\mathcal{W}(\rho_\varepsilon, \mathbf{1}) \to 0$ as $\varepsilon \to 0$, this follows immediately from the estimate
\[ \mathcal{W}(\rho_\varepsilon, \mathbf{1}) \geq \sup_t\, \mathcal{W}(\rho_t^\varepsilon, \mathbf{1}) \geq \sup_t \sum_x \pi(x) \big| \rho_t^\varepsilon(x) - 1 \big| \;, \]
where we used that $(\rho_t^\varepsilon)_{t \in [0,1]}$ is a geodesic and the fact that $\mathcal{W}$ is an upper bound for the total variation distance (see Proposition 2.12). $\square$
In the following result we use the probabilistic notation
\[ \mathbb{E}_\pi[\psi] = \sum_{x \in \mathcal{X}} \pi(x) \psi(x) \]
for functions $\psi : \mathcal{X} \to \mathbb{R}$.

Proposition 7.7. Assume that $K$ satisfies T$_{\mathcal{W}}$($\lambda$) for some $\lambda > 0$. Then the T$_1$($2\lambda$) inequality holds with respect to the graph distance:
\[ W_{1,g}(\rho, \mathbf{1}) \leq \sqrt{\frac{1}{\lambda}\, \mathcal{H}(\rho)} \;. \tag{7.4} \]
Furthermore, the sub-Gaussian inequality
\[ \mathbb{E}_\pi \big[ e^{t (\psi - \mathbb{E}_\pi[\psi])} \big] \leq \exp\Big( \frac{t^2}{4\lambda} \Big) \tag{7.5} \]
holds for all $t > 0$ and every function $\psi : \mathcal{X} \to \mathbb{R}$ that is 1-Lipschitz with respect to the graph distance on $\mathcal{X}$.
Proof. The T$_1$-inequality (7.4) follows immediately from Proposition 2.12. The inequalities (7.4) and (7.5) are equivalent, as has been shown in [8]. $\square$

Arguing again exactly as in [37], we infer that a modified Talagrand inequality implies a modified log-Sobolev inequality (with some loss in the constant), provided that the non-local Ricci curvature is not too negative.
Proposition 7.8. Suppose that $K$ satisfies T$_{\mathcal{W}}$($\lambda$) for some $\lambda > 0$ and that $\mathrm{Ric}(K) \geq \kappa$ for some $\kappa > -\lambda$. Then $K$ satisfies MLSI($\lambda'$), where
\[ \lambda' = \max \left\{ \frac{\lambda}{4} \Big( 1 + \frac{\kappa}{\lambda} \Big)^2 , \; \kappa \right\} \;. \]
Proof. This is an immediate consequence of the HWI($\kappa$) inequality and an elementary computation (see [37, Corollary 3.1]). $\square$
As an application of the results proved in this section, we will show how non-local Ricci curvature bounds can be used to recover functional inequalities with sharp constants in an important example.

Example 7.9 (Discrete hypercube). In Example 5.7 and Proposition 6.3 we proved that the Markov kernel $K_n$ associated with the simple random walk on the discrete hypercube $Q_n = \{0,1\}^n$ has non-local Ricci curvature bounded from below by $\frac{2}{n}$. Applying Theorem 7.4 and Proposition 7.7 in this setting we obtain the following result. We shall write $y \sim x$ if $K(x,y) > 0$.
Corollary 7.10. The simple random walk on $Q_n$ has the following properties:
(1) the modified log-Sobolev inequality MLSI($\frac{2}{n}$) holds, i.e., for all $\rho \in \mathcal{P}_*(Q_n)$ we have
\[ \sum_{x \in Q_n} \pi(x)\, \rho(x) \log \rho(x) \leq \frac{1}{8} \sum_{x \in Q_n,\, y \sim x} \big( \rho(x) - \rho(y) \big) \big( \log \rho(x) - \log \rho(y) \big)\, \pi(x) \;; \]
(2) the Poincaré inequality P($\frac{2}{n}$) holds, i.e., for all $\psi : Q_n \to \mathbb{R}$ with $\sum_x \pi(x) \psi(x) = 0$ we have
\[ \sum_{x \in Q_n} \pi(x)\, \psi(x)^2 \leq \frac{1}{4} \sum_{x \in Q_n,\, y \sim x} \big( \psi(x) - \psi(y) \big)^2\, \pi(x) \;; \]
(3) the sub-Gaussian inequality (7.5) holds with $\lambda = \frac{2}{n}$.
In all cases the constants are optimal (see [11, Example 3.7] and [9, Proposition 2.3], respectively). Moreover, the optimality in (3) implies that the constant $\lambda = \frac{2}{n}$ in the modified Talagrand inequality for the discrete cube is sharp as well.
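For small $n$ the constants of Corollary 7.10 can be observed directly. The following Python sketch takes $n = 3$ and uses the Hamming weight as an example of a 1-Lipschitz function (both are illustrative choices); it checks the spectral gap behind P($\frac{2}{n}$) and the sub-Gaussian bound (7.5):

```python
import itertools
import numpy as np

n = 3
states = list(itertools.product([0, 1], repeat=n))
N = len(states)
pi = np.full(N, 1.0 / N)   # uniform stationary measure on Q_n

# Simple random walk: jump to each of the n neighbours with probability 1/n.
K = np.zeros((N, N))
for i, x in enumerate(states):
    for j, y in enumerate(states):
        if sum(a != b for a, b in zip(x, y)) == 1:
            K[i, j] = 1.0 / n

# For a symmetric kernel with uniform pi, the optimal Poincare constant
# is the spectral gap of I - K; the sharp value here is 2/n.
gap = np.sort(np.linalg.eigvalsh(np.eye(N) - K))[1]
assert abs(gap - 2.0 / n) < 1e-10

# Sub-Gaussian bound (7.5) with lambda = 2/n for the 1-Lipschitz
# function psi = Hamming weight.
psi = np.array([sum(x) for x in states], dtype=float)
lam = 2.0 / n
for t in [0.3, 1.0, 2.5]:
    mgf = float(np.sum(pi * np.exp(t * (psi - np.sum(pi * psi)))))
    assert mgf <= np.exp(t ** 2 / (4 * lam)) + 1e-12
```

For the Hamming weight the moment generating function equals $\cosh(t/2)^n$, so the bound $e^{n t^2/8}$ is tight as $n \to \infty$ after rescaling, in line with the sharpness statement above.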
We finish the paper by remarking that modified logarithmic Sobolev inequalities for appropriately rescaled product chains on the discrete hypercube $\{-1, 1\}^n$ can be used to prove a similar inequality for Poisson measures by passing to the limit $n \to \infty$ (see [25, Section 5.4] for an argument along these lines involving a slightly different modified log-Sobolev inequality). All of the functional inequalities in Theorem 1.5 are compatible with this limit. However, the sub-Gaussian estimate will (of course) not hold for the limiting Poisson law. This does not contradict the results in this section, since the sub-Gaussian estimates here are obtained using the lower bound for $\mathcal{W}$ in terms of $W_1$, which relies on the normalisation assumption $\sum_y K(x,y) = 1$; this normalisation does not hold in the Poissonian limit.
References
[1] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, second edition, 2008.
[2] L. Ambrosio, N. Gigli, and G. Savaré. Calculus and heat flow in metric measure spaces and applications to spaces with Ricci bounds from below. Preprint at arXiv:1106.2090, 2011.
[3] L. Ambrosio, N. Gigli, and G. Savaré. Metric measure spaces with Riemannian Ricci curvature bounded from below. Preprint at arXiv:1109.0222, 2011.
[4] C. Ané and M. Ledoux. On logarithmic Sobolev inequalities for continuous time random walks on graphs. Probab. Theory Related Fields, 116(4):573–602, 2000.
[5] D. Bakry and M. Émery. Diffusions hypercontractives. In Séminaire de probabilités, XIX, 1983/84, volume 1123 of Lecture Notes in Math., pages 177–206. Springer, Berlin, 1985.
[6] F. Bauer, J. Jost, and S. Liu. Ollivier-Ricci curvature and the spectrum of the normalized graph Laplace operator. Preprint at arXiv:1105.3803, 2011.
[7] J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numer. Math., 84(3):375–393, 2000.
[8] S. G. Bobkov and F. Götze. Discrete isoperimetric and Poincaré-type inequalities. Probab. Theory Related Fields, 114(2):245–277, 1999.
[9] S. G. Bobkov, C. Houdré, and P. Tetali. The subgaussian constant and concentration inequalities. Israel J. Math., 156:255–283, 2006.
[10] S. G. Bobkov and M. Ledoux. On modified logarithmic Sobolev inequalities for Bernoulli and Poisson measures. J. Funct. Anal., 156(2):347–365, 1998.
[11] S. G. Bobkov and P. Tetali. Modified logarithmic Sobolev inequalities in discrete settings. J. Theoret. Probab., 19(2):289–336, 2006.
[12] A.-I. Bonciocat and K.-Th. Sturm. Mass transportation and rough curvature bounds for discrete spaces. J. Funct. Anal., 256(9):2944–2966, 2009.
[13] G. Buttazzo. Semicontinuity, relaxation and integral representation in the calculus of variations. Pitman Research Notes in Mathematics Series. Longman Scientific and Technical, Harlow, 1989.
[14] P. Caputo, P. Dai Pra, and G. Posta. Convex entropy decay via the Bochner-Bakry-Émery approach. Ann. Inst. Henri Poincaré Probab. Stat., 45(3):734–753, 2009.
[15] S.-N. Chow, W. Huang, Y. Li, and H. Zhou. Fokker-Planck equations for a free energy functional or Markov process on a graph. Arch. Ration. Mech. Anal., Online First.
[16] S. Daneri and G. Savaré. Eulerian calculus for the displacement convexity in the Wasserstein distance. SIAM J. Math. Anal., 40(3):1104–1122, 2008.
[17] J. Dolbeault, B. Nazaret, and G. Savaré. A new class of transport distances between measures. Calc. Var. Partial Differential Equations, 34(2):193–231, 2009.
[18] M. Erbar. The heat equation on manifolds as a gradient flow in the Wasserstein space. Ann. Inst. Henri Poincaré Probab. Stat., 46(1):1–23, 2010.
[19] S. Fang, J. Shao, and K.-Th. Sturm. Wasserstein space over the Wiener space. Probab. Theory Related Fields, 146(3-4):535–565, 2010.
[20] N. Gigli, K. Kuwada, and S.-I. Ohta. Heat flow on Alexandrov spaces. Preprint at arXiv:1008.1319, 2010.
[21] N. Gozlan and C. Léonard. Transport inequalities. A survey. Markov Process. Related Fields, 16(4):635–736, 2010.
[22] B. Hua, J. Jost, and S. Liu. Geometric analysis aspects of infinite semiplanar graphs with nonnegative curvature. Preprint at arXiv:1107.2826, 2011.
[23] R. Jordan, D. Kinderlehrer, and F. Otto. The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal., 29(1):1–17, 1998.
[24] J. Jost and S. Liu. Ollivier's Ricci curvature, local clustering and curvature dimension inequalities on graphs. Preprint at arXiv:1103.4037, 2011.
[25] M. Ledoux. The concentration of measure phenomenon, volume 89 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2001.
[26] Y. Lin and S.-T. Yau. Ricci curvature and eigenvalue estimate on locally finite graphs. Math. Res. Lett., 17(2):343–356, 2010.
[27] J. Lott and C. Villani. Ricci curvature for metric-measure spaces via optimal transport. Ann. of Math. (2), 169(3):903–991, 2009.
[28] J. Maas. Gradient flows of the entropy for finite Markov chains. J. Funct. Anal., 261(8):2250–2292, 2011.
[29] R. J. McCann. A convexity principle for interacting gases. Adv. Math., 128(1):153–179, 1997.
[30] A. Mielke. Geodesic convexity of the relative entropy in reversible Markov chains. Preprint, 2011.
[31] A. Mielke. A gradient structure for reaction-diffusion systems and for energy-drift-diffusion systems. Nonlinearity, 24(4):1329–1346, 2011.
[32] S.-I. Ohta and K.-Th. Sturm. Heat flow on Finsler manifolds. Comm. Pure Appl. Math., 62(10):1386–1433, 2009.
[33] Y. Ollivier. Ricci curvature of metric spaces. C. R. Math. Acad. Sci. Paris, 345(11):643–646, 2007.
[34] Y. Ollivier. Ricci curvature of Markov chains on metric spaces. J. Funct. Anal., 256(3):810–864, 2009.
[35] Y. Ollivier and C. Villani. A curved Brunn-Minkowski inequality on the discrete hypercube. Preprint at arXiv:1011.4779, 2010.
[36] F. Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26(1-2):101–174, 2001.
[37] F. Otto and C. Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal., 173(2):361–400, 2000.
[38] F. Otto and M. Westdickenberg. Eulerian calculus for the contraction in the Wasserstein distance. SIAM J. Math. Anal., 37(4):1227–1255 (electronic), 2005.
[39] M.-K. von Renesse and K.-Th. Sturm. Transport inequalities, gradient estimates, entropy, and Ricci curvature. Comm. Pure Appl. Math., 58(7):923–940, 2005.
[40] M. Sammer and P. Tetali. Concentration on the discrete torus using transportation. Combin. Probab. Comput., 18(5):835–860, 2009.
[41] K.-Th. Sturm. On the geometry of metric measure spaces. I and II. Acta Math., 196(1):65–177, 2006.
[42] C. Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.
[43] C. Villani. Optimal transport, old and new, volume 338 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 2009.
University of Bonn, Institute for Applied Mathematics, Endenicher Allee
60, 53115 Bonn, Germany
E-mail address: erbar@iam.uni-bonn.de
E-mail address: maas@iam.uni-bonn.de