Optimal Transport: Discretization and Algorithms
Contents
1. Introduction
2. Optimal transport theory
2.1. The problems of Monge and Kantorovich
2.2. Kantorovich duality
2.3. Kantorovich's functional
3. Discrete optimal transport
3.1. Formulation of discrete optimal transport
3.2. Linear assignment via coordinate ascent
3.3. Discrete optimal transport via entropic regularization
4. Semi-discrete optimal transport
4.1. Formulation of semi-discrete optimal transport
4.2. Semi-discrete optimal transport via coordinate decrements
4.3. Semi-discrete optimal transport via Newton's method
4.4. Semi-discrete entropic transport
5. Appendix
5.1. Convex analysis
5.2. Coarea formula
References
QUENTIN MÉRIGOT AND BORIS THIBERT
1. Introduction
The problem of optimal transport, introduced by Gaspard Monge in 1781
[76], was motivated by military applications. The goal was to find the most
economical way to transport a certain amount of sand from a quarry to
a construction site. The source and target distributions of sand are seen
as probability measures, denoted µ and ν, and c(x, y) denotes the cost of
transporting a grain of sand from the position x to the position y, and the
goal is to solve the non-convex optimization problem
\[
(\mathrm{MP}) = \min_{T_\#\mu = \nu} \int c(x, T(x))\,\mathrm{d}\mu(x). \tag{1.1}
\]
and ψ^c(x) := min_y c(x, y) + ψ(y) is the c-transform of ψ, a notion closely related
to the Legendre-Fenchel transform in convex analysis. The function K
is called the Kantorovich functional. Kantorovich's duality theorem asserts
that the values of (1.2) and (1.3) (or (1.4)) agree under mild assumptions.
where h(r) = r(log r − 1) and η > 0 is a small parameter, under the constraints
\[
\forall i,\ \sum_j \gamma_{i,j} = \mu_i, \qquad \forall j,\ \sum_i \gamma_{i,j} = \nu_j. \tag{1.7}
\]
This idea has been introduced in the field of optimal transport by Galichon
and Salanié [50] and by Cuturi [35], see [83, Remark 4.5] for a brief historical
account. Adding the entropy of the transport plan makes the problem (1.6)
strongly convex and smooth. The dual problem can be solved efficiently
using Sinkhorn-Knopp’s algorithm, which involves computing repeatedly the
smoothed c-transform
\[
\psi^{c,\eta}(x_i) = \eta \log(\mu_i) - \eta \log \sum_j e^{\frac{1}{\eta}\left(-c(x_i, y_j) - \psi(y_j)\right)}. \tag{1.8}
\]
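The alternating dual ascent suggested by this smoothed c-transform can be sketched in code. The following is a minimal log-domain Sinkhorn-Knopp iteration on a toy discrete problem; the data, the value of η, and the iteration count are illustrative choices, not taken from the text.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn(C, mu, nu, eta=0.5, iters=500):
    """Log-domain Sinkhorn-Knopp: alternately apply the smoothed
    c-transform (1.8) to the dual potentials phi (on X) and psi (on Y)."""
    phi, psi = np.zeros(len(mu)), np.zeros(len(nu))
    for _ in range(iters):
        # phi_i = eta log(mu_i) - eta log sum_j exp((-c(x_i,y_j) - psi_j)/eta)
        phi = eta * np.log(mu) - eta * logsumexp((-C - psi[None, :]) / eta, axis=1)
        # symmetric update enforcing the second marginal constraint exactly
        psi = eta * logsumexp((phi[:, None] - C) / eta, axis=0) - eta * np.log(nu)
    # primal plan recovered from the optimality condition (3.40)
    gamma = np.exp((phi[:, None] - psi[None, :] - C) / eta)
    return phi, psi, gamma

x = np.array([0.0, 0.5, 1.0])
C = (x[:, None] - x[None, :]) ** 2          # quadratic cost on the line
mu = np.full(3, 1 / 3)
nu = np.array([0.2, 0.3, 0.5])
phi, psi, gamma = sinkhorn(C, mu, nu)
# the marginals of gamma approximate mu and nu
```

Each ψ-update enforces the second marginal constraint exactly, so in practice only the first marginal needs to be monitored for convergence.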
C. Distance costs. When the cost c satisfies the triangle inequality, the
dual problem (1.3) can be further simplified:
\[
\max_{\mathrm{Lip}_c(\psi) \leq 1} \int \psi\,\mathrm{d}\mu - \int \psi\,\mathrm{d}\nu, \tag{1.9}
\]
where the maximum is taken over functions satisfying |ψ(x) − ψ(y)| ≤ c(x, y)
for all x, y. The equality between the values of (1.2) and (1.9) is called
Kantorovich-Rubinstein’s theorem. This leads to very efficient algorithms
when the 1-Lipschitz constraint can be enforced using only local informa-
tion, thus reducing the number of constraints. This is possible when the
space is discrete and the distance is induced by a graph, or when c is the
Euclidean norm or more generally a Riemannian metric. In the latter case,
the maximum in (1.9) can be replaced by a supremum over C^1 functions
ψ satisfying ‖∇ψ‖_∞ ≤ 1 [94, 9]. Note that the case of distance costs is
particularly easy because the c-transform of a 1-Lipschitz function is trivial:
ψ^c = −ψ.
Monge’s problem exhibits several difficulties, one of which is that both the
transport constraint (T# µ = ν) and the functional are non-convex. Note also
that there might exist no transport map between µ and ν. For instance, if
µ = δ_x for some x ∈ X, then T_#µ = δ_{T(x)} is a Dirac mass for any map T. In
particular, if card(spt(ν)) ≥ 2, there exists no transport map between µ and ν.
Kantorovich’s problem.
Definition 3 (Marginals). The marginals of a measure γ on a product space
X × Y are the measures Π_{X#}γ and Π_{Y#}γ, where Π_X : X × Y → X and
Π_Y : X × Y → Y are the two projection maps.
Definition 4 (Transport plan). A transport plan between two probability
measures µ, ν on two metric spaces X and Y is a probability measure γ
on the product space X × Y whose marginals are µ and ν. The space of
transport plans is denoted Γ(µ, ν), i.e.
Γ(µ, ν) = {γ ∈ P(X × Y ) | ΠX# γ = µ, ΠY # γ = ν} .
Note that Γ(µ, ν) is a convex set.
Example 3 (Product measure). Note that the set of transport plans Γ(µ, ν)
is never empty, as it contains the measure µ ⊗ ν.
Definition 5 (Kantorovich’s problem). Consider two compact metric spaces
X, Y , two probability measures µ ∈ P(X), ν ∈ P(Y ) and a cost function
c ∈ C 0 (X × Y ). Kantorovich’s problem is the following optimization problem
\[
(\mathrm{KP}) := \inf \left\{ \int_{X \times Y} c(x, y)\,\mathrm{d}\gamma(x, y) \;\middle|\; \gamma \in \Gamma(\mu, \nu) \right\}. \tag{2.16}
\]
Remark 1. The infimum in Kantorovich's problem is at most the infimum
in Monge's problem. Indeed, to any transport map T between µ and ν one
can associate a transport plan, by letting γT = (id, T )# µ. One can easily
check that ΠX# γT = µ and ΠY # γT = ν so that γT ∈ Γ(µ, ν) is a transport
plan between µ and ν. Moreover, by the definition of push-forward,
\[
\langle c|\gamma_T\rangle = \langle c|(\mathrm{id}, T)_\#\mu\rangle = \langle c \circ (\mathrm{id}, T)|\mu\rangle = \int_X c(x, T(x))\,\mathrm{d}\mu,
\]
thus showing that (KP) ≤ (MP).
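In the discrete setting, the construction of Remark 1 is one line of bookkeeping. The following sketch, on made-up toy data, builds γ_T from a map T and checks that it is a transport plan with the same cost.

```python
import numpy as np

# Toy data: X = Y = three points on the line, T a bijection given as an
# index array (all values here are illustrative).
N = 3
mu = np.full(N, 1 / N)                      # source measure
T = np.array([2, 0, 1])                     # transport map: T(x_i) = y_{T[i]}
pts = np.arange(N, dtype=float)
C = np.abs(pts[:, None] - pts[None, :])     # cost c(x, y) = |x - y|

# gamma_T = (id, T)_# mu puts the mass mu_i on the pair (x_i, T(x_i))
gamma_T = np.zeros((N, N))
gamma_T[np.arange(N), T] = mu

marg_X = gamma_T.sum(axis=1)                # Pi_X# gamma_T
marg_Y = gamma_T.sum(axis=0)                # Pi_Y# gamma_T = T_# mu
plan_cost = (C * gamma_T).sum()             # <c|gamma_T>
map_cost = (C[np.arange(N), T] * mu).sum()  # Monge cost of T
```

Since T is a bijection here, both marginals are uniform and the two costs coincide, as in the display above.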
Proposition 1. Kantorovich’s problem (KP) admits a minimizer.
Proof. The definition of Π_{X#}γ = µ can be expanded into
\[
\forall \varphi \in C^0(X), \quad \langle \varphi \otimes 1|\gamma\rangle = \langle \varphi|\mu\rangle,
\]
from which it is easy to see that the set Γ(µ, ν) is weakly closed, and therefore
weakly compact as a subset of P(X × Y), which is weakly compact by
Banach-Alaoglu's theorem. We conclude the existence proof by remarking
that the functional that is minimized in (KP), namely γ ↦ ⟨c|γ⟩, is weakly
continuous by definition.
2.2. Kantorovich duality.
Derivation of the dual problem. The primal Kantorovich problem (KP) can
be reformulated by introducing Lagrange multipliers for the constraints.
Namely, we use that for any γ ∈ M_+(X × Y),
\[
\sup_{\varphi \in C^0(X)} -\langle \varphi \otimes 1 | \gamma\rangle + \langle \varphi|\mu\rangle =
\begin{cases} 0 & \text{if } \Pi_{X\#}\gamma = \mu \\ +\infty & \text{if not,} \end{cases}
\]
\[
\sup_{\psi \in C^0(Y)} \langle 1 \otimes \psi | \gamma\rangle - \langle \psi|\nu\rangle =
\begin{cases} 0 & \text{if } \Pi_{Y\#}\gamma = \nu \\ +\infty & \text{if not,} \end{cases}
\]
to deduce, denoting (ϕ ⊖ ψ)(x, y) := ϕ(x) − ψ(y),
\[
\sup_{\varphi \in C^0(X),\, \psi \in C^0(Y)} \langle \varphi|\mu\rangle - \langle \psi|\nu\rangle - \langle \varphi \ominus \psi | \gamma\rangle =
\begin{cases} 0 & \text{if } \gamma \in \Gamma(\mu, \nu) \\ +\infty & \text{if not.} \end{cases}
\]
This leads to the following formulation of the Kantorovich problem:
\[
(\mathrm{KP}) = \inf_{\gamma \in M_+(X\times Y)}\ \sup_{(\varphi,\psi) \in C^0(X)\times C^0(Y)} \langle c - (\varphi \ominus \psi)|\gamma\rangle + \langle \varphi|\mu\rangle - \langle \psi|\nu\rangle.
\]
Note that we will often omit the assumptions that γ ∈ M_+(X × Y) and that ϕ, ψ
are continuous, when the context is clear. The dual problem can further be
simplified by remarking that
\[
\inf_{\gamma \geq 0} \langle c - \varphi \ominus \psi|\gamma\rangle =
\begin{cases} 0 & \text{if } \varphi \ominus \psi \leq c \\ -\infty & \text{if not.} \end{cases}
\]
Existence of solution for the dual problem. Kantorovich’s dual problem (DP)
consists in maximizing a concave (actually linear) functional under linear
inequality constraints. It can also easily be turned into an unconstrained
maximization problem. The idea is quite simple: given a certain ψ ∈ C^0(Y),
one wishes to select ϕ on X which is as large as possible (to maximize the
term ⟨ϕ|µ⟩ in (DP)) while satisfying the constraint ϕ ⊖ ψ ≤ c. This constraint
can be rewritten as
\[
\forall x \in X,\quad \varphi(x) \leq \min_{y \in Y} c(x, y) + \psi(y).
\]
Thanks to this notion of c-transform, one can reformulate the dual problem
(DP) as an unconstrained maximization problem:
\[
(\mathrm{DP}) = \sup_{\psi \in C^0(Y)} \int_X \psi^c\,\mathrm{d}\mu - \int_Y \psi\,\mathrm{d}\nu. \tag{2.20}
\]
By taking x̃ = x, one gets ψ^{ccc}(x) ≥ ψ^c(x), while taking ỹ = y gives us
ψ^{ccc}(x) ≤ ψ^c(x). The last point is obtained similarly.
Proof of Proposition 3. Let (ϕn , ψn ) be a maximizing sequence for (DP), i.e.
ϕ_n ⊖ ψ_n ≤ c and lim_{n→+∞} ⟨ϕ_n|µ⟩ − ⟨ψ_n|ν⟩ = (DP). Define ϕ̂_n = ψ_n^c and
ψ̂_n = ϕ̂_n^c. Then ϕ̂_n ⊖ ψ̂_n ≤ c, ϕ_n ≤ ϕ̂_n and ψ_n ≥ ψ̂_n, which implies
\[
\langle \varphi_n|\mu\rangle - \langle \psi_n|\nu\rangle \leq \langle \hat\varphi_n|\mu\rangle - \langle \psi_n|\nu\rangle \leq \langle \hat\varphi_n|\mu\rangle - \langle \hat\psi_n|\nu\rangle,
\]
implying that (ϕ̂_n, ψ̂_n) is also a maximizing sequence. Our goal is now to
show that this sequence admits a converging subsequence. We first note that
we can assume that ϕ̂_n(x_0) = 0 for all n, where x_0 is a given point in X: if
this is not the case, we replace (ϕ̂_n, ψ̂_n) by (ϕ̂_n − ϕ̂_n(x_0), ψ̂_n − ϕ̂_n(x_0)), which
is also admissible and has the same dual value. In addition, by Lemma 4, the
sequences (ϕ̂_n)_n and (ψ̂_n)_n are equicontinuous. By Arzelà-Ascoli's theorem,
we deduce that they admit subsequences converging respectively to ϕ ∈
C^0(X) and ψ ∈ C^0(Y), which are then maximizers for (DP).
Strong duality and stability of optimal transport plans. We will prove strong
duality first in the case where µ, ν are finitely supported, and will then use a
density argument to deduce the general case. As a byproduct of this theorem,
we get a stability result for optimal transport plans (i.e. a limit of optimal
transport plans is also optimal).
Theorem 5 (Strong duality). Let X, Y be compact metric spaces and c ∈
C 0 (X × Y ). Then the maximum is attained in (DP) and (KP) = (DP).
where the last inequality holds because x belongs to the support of µ. Then,
there exists Nk such that for any N > Nk , hϕk |µN i > 0, implying the
existence of x_N ∈ X such that x_N ∈ spt(µ_N) and d(x_N, x) ≤ 1/k. By a
diagonal argument, this allows us to construct a sequence of points (x_N)_{N∈ℕ}
such that x_N ∈ spt(µ_N) and lim_{N→+∞} x_N = x.
Lemma 10. Let X be a compact space and µ ∈ P(X). Then, there exists a
sequence of finitely supported probability measures weakly converging to µ.
Proof. For any ε > 0, by compactness there exist N points x_1, . . . , x_N such
that X ⊆ ∪_i B(x_i, ε). We define a partition K_1, . . . , K_N of X recursively by
K_i = B(x_i, ε) \ (K_1 ∪ · · · ∪ K_{i−1}) and we introduce
\[
\mu_\varepsilon := \sum_{1 \leq i \leq N} \mu(K_i)\, \delta_{x_i}.
\]
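A concrete instance of this construction on X = [0, 1]² can be sketched as follows, approximating the cells K_i by assigning the mass of a fine grid to the nearest center; the grid resolution and number of centers are this sketch's own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = rng.random((10, 2))     # the points x_1, ..., x_N covering X = [0,1]^2

# fine grid standing in for X; each grid point carries mass 1/len(grid)
t = np.linspace(0.0, 1.0, 50)
gx, gy = np.meshgrid(t, t)
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)

# assign every grid point to its nearest center: this realizes a partition
# K_1, ..., K_N analogous to the one in the proof (up to grid approximation)
d2 = ((grid[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
labels = d2.argmin(axis=1)
weights = np.bincount(labels, minlength=len(centers)) / len(grid)
# mu_eps = sum_i weights[i] * delta_{centers[i]} is a probability measure
```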
exists j ∈ {1, . . . , M} such that γ_{ij} > 0. Using γ_{ij} π_{ij} = 0, we deduce that
π_{ij} = 0, so that ϕ_i − ψ_j = c(x_i, y_j), giving
\[
\hat\varphi(x_i) = \min_{k\in\{1,\dots,M\}} c(x_i, y_k) + \psi_k = c(x_i, y_j) + \psi_j = \varphi_i.
\]
Similarly, one can show that ψ̂(y_j) = ψ_j for all j ∈ {1, . . . , M}. Finally,
define γ = Σ_{ij} γ_{ij} δ_{(x_i,y_j)} ∈ Γ(µ, ν). Then one can check that ϕ̂ ⊖ ψ̂ ≤ c with
equality γ-a.e., so that (KP) = (DP) by Proposition 8.
Proof of Theorem 5. By Lemma 10, there exists a sequence µk ∈ P(X) (resp.
νk ∈ P(Y )) of finitely supported measures which converge weakly to µ (resp.
ν). We denote (KP)k and (DP)k the primal and dual Kantorovich problems
between µk and νk . By Proposition 3, there exists a solution (ϕk , ψk ) of
(DP)_k, such that ϕ_k = ψ_k^c and ψ_k = ϕ_k^c. Moreover, since strong duality
holds for finitely supported measures (Lemma 11), we see (Proposition 8)
that any optimal transport plan γ_k for (KP)_k is supported on the set
S_k = {(x, y) ∈ X × Y | ϕ_k(x) − ψ_k(y) = c(x, y)}.
Adding a constant if necessary, we can also assume that ϕk (x0 ) = 0 for some
point x0 ∈ X. As c-concave functions, ϕk and ψk have the same modulus
of continuity as the cost function c (see Lemma 4), and they are uniformly
bounded (using ϕk (x0 ) = 0). Using Arzelà-Ascoli theorem, we can therefore
assume that up to subsequences, (ϕk ) (resp. (ψk )) converges to some ϕ
(resp. ψ) uniformly. Then, one easily sees that ϕ ⊖ ψ ≤ c so that (ϕ, ψ) are
admissible for the dual problem (DP).
By compactness of P(X × Y ), we can assume that the sequence γk ∈
Γ(µk , νk ) converges to some γ ∈ Γ(µ, ν). Moreover, by Lemma 9, every pair
(x, y) ∈ spt(γ) can be approximated by a sequence of pairs (xk , yk ) ∈ spt(γk )
i.e. limk→∞ (xk , yk ) = (x, y). Since γk is supported on Sk one has c(xk , yk ) =
ϕ_k(x_k) − ψ_k(y_k), which gives in the limit c(x, y) = ϕ(x) − ψ(y). We have
just shown that for every point pair (x, y) in spt(γ), c(x, y) = ϕ(x) − ψ(y)
where ϕ, ψ is admissible. By Proposition 8, this shows that γ and (ϕ, ψ) are
optimal for their respective problems and that (KP) = (DP).
Corollary 6 is a direct consequence of Proposition 8 and of the strong
duality (KP) = (DP).
Solution of Monge’s problem for Twisted costs. We now show how to use
Kantorovich duality to prove the existence of optimal transport maps when
the source measure is absolutely continuous on a compact subset of Rd and
when the cost function satisfies the following condition:
Definition 8 (Twisted cost). Let ΩX , ΩY ⊆ Rd be open subsets, and c ∈
C 1 (ΩX × ΩY ). The cost function satisfies the twist condition if
∀x0 ∈ ΩX , the map y ∈ ΩY 7→ v := ∇x c(x0 , y) ∈ Rd is injective, (2.23)
where ∇_x c(x_0, y) denotes the gradient of x ↦ c(x, y) at x = x_0. Given
x0 ∈ ΩX and v ∈ Rd , we denote yc (x0 , v) the unique point (if it exists)
such that ∇x c(x0 , yc (x0 , v)) = v. The map v 7→ yc (x0 , v) is often called the
c-exponential map at x0 .
Example 4 (Quadratic cost). Let c(x, y) = kx − yk2 . Then, for any x0 ∈ X,
the map y 7→ ∇x c(x0 , y) = 2(x0 − y) is injective, so that c satisfies the twist
condition. Moreover, given v ∈ R^d, the unique y such that ∇_x c(x_0, y) =
2(x_0 − y) = v is y = x_0 − v/2, implying that y_c(x_0, v) = x_0 − v/2.
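The computation in Example 4 can be checked numerically; the function names below are this sketch's own.

```python
import numpy as np

def grad_x_c(x0, y):
    # gradient in x of the quadratic cost c(x, y) = ||x - y||^2, at x = x0
    return 2.0 * (x0 - y)

def y_c(x0, v):
    # c-exponential map for the quadratic cost: the unique y with grad_x_c = v
    return x0 - 0.5 * v

x0 = np.array([1.0, -2.0])
y = np.array([0.5, 3.0])
v = grad_x_c(x0, y)
# y_c(x0, .) inverts y -> grad_x c(x0, y), certifying injectivity on this example
recovered = y_c(x0, v)
```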
The following theorem is due to Brenier [19] in the case of the quadratic
cost (i.e. c(x, y) = kx − yk2 ) and Gangbo-McCann in the general case of
twisted costs [51].
Given X ⊆ ΩX ⊂ Rd , we define P ac (X) as the set of probability measures
on ΩX that are absolutely continuous with respect to the Lebesgue measure,
and with support included in X.
Theorem 12 (Brenier [19], Gangbo-McCann [51]). Let c ∈ C 1 (ΩX ×ΩY ) be a
twisted cost, let X ⊆ ΩX , Y ⊆ ΩY be compact sets, and let (µ, ν) ∈ P ac (X)×
P(Y ). Then, there exists a c-concave function ϕ ∈ Lip(X) such that ν =
T# µ where T (x) = yc (x, ∇ϕ(x)). Moreover, the only optimal transport plan
between µ and ν is γT .
The theorem in [53] holds for the quadratic cost on Rd . It was re-
cently generalized to other cost functions [5].
• Berman [15] proves a global estimate, not assuming the regularity of
Tµ0 but with a worse Hölder exponent, of the form
\[
\|T_\mu - T_{\mu_0}\|^2_{L^2(\rho)} \leq C\, W_1(\mu, \mu_0)^{1/2^{d-1}},
\]
assuming that ρ is bounded from below on a compact convex do-
main of R^d, when the cost is quadratic. The constant C then only depends
on X, Y and ρ. Recently a similar bound with an exponent
independent of the dimension was obtained by Mérigot, Delalande
and Chazal [71]:
\[
\|T_\mu - T_{\mu_0}\|^2_{L^2(\rho)} \leq C\, W_1(\mu, \mu_0)^{1/15}.
\]
Proof. As before, without loss of generality, we assume that spt(σ) lies in
the interior of X. Let (ϕk , ψk ) be solutions to (DP)k , which are c-conjugate
to each other, and such that ϕk (x0 ) = 0 for some x0 ∈ X. Then, by stability
of Kantorovich potentials, there exists a subsequence (ϕk , ψk ) (which we
do not relabel) which converges uniformly to (ϕ, ψ). Moreover, (ϕ, ψ) are
Kantorovich potentials for (DP), and are also c-conjugate to each other.
Since ϕ, ϕ_k ∈ Lip(X) are differentiable almost everywhere, there exists a
subset Z ⊆ spt(σ) with σ(Z) = 1 and such that for all x ∈ Z, ∇ϕ_k exists
for all k and ∇ϕ exists. Let x ∈ Z. Using
ϕk (x) − ψk (Tk (x)) = c(x, Tk (x)),
we get that for any cluster point y of the sequence (Tk (x))k ,
\[
\begin{cases}
\varphi(x) - \psi(y) = c(x, y), \\
\varphi(x') - \psi(y) \leq c(x', y) \quad \forall x' \in X,
\end{cases}
\]
where the second inequality is obtained using ϕ_k ⊖ ψ_k ≤ c. Thus, as in the
proof of Brenier-McCann-Gangbo’s theorem, x is a minimizer of c(·, y) −
ϕ, i.e. ∇x c(x, y) = ∇ϕ(x), implying that y = yc (x, ∇ϕ(x)) = T (x). By
compactness, this shows that the whole sequence (T_k(x))_k converges to T(x).
Therefore, Tk converges σ-almost everywhere to T , and L1 (σ) convergence
follows easily.
2.3. Kantorovich’s functional. As already mentioned in Equation (2.20),
Kantorovich's dual problem (DP) can be expressed as an unconstrained
maximization problem:
\[
(\mathrm{DP}) = \max_{\psi \in C^0(Y)} \int_X \psi^c\,\mathrm{d}\mu - \int_Y \psi\,\mathrm{d}\nu.
\]
This motivates the definition of Kantorovich’s functional as follows
Definition 9. The Kantorovich functional is defined on C^0(Y) by
\[
K(\psi) = \int_X \psi^c\,\mathrm{d}\mu - \int_Y \psi\,\mathrm{d}\nu. \tag{2.24}
\]
The Kantorovich dual problem therefore amounts to maximizing the Kantorovich
functional:
\[
(\mathrm{DP}) = \max_{\psi \in C^0(Y)} K(\psi).
\]
where we used Π_{X#}γ = µ to get the second equality and ψ^c(x) ≤ c(x, y) +
ψ(y) to get the inequality. Note also that equality holds if ψ = ψ_0, by
assumption on the support of γ. Hence,
\[
K(\psi) \leq K(\psi_0) + \int (\psi(y) - \psi_0(y))\,\mathrm{d}\gamma(x, y) - \int (\psi - \psi_0)\,\mathrm{d}\nu
= K(\psi_0) + \langle \Pi_{Y\#}\gamma - \nu \,|\, \psi - \psi_0\rangle.
\]
This implies by definition that ΠY # γ−ν lies in the superdifferential ∂ + K(ψ0 ),
giving us the inclusion
D(ψ0 ) := {ΠY # γ − ν | γ ∈ Γψ0 (µ)} ⊆ ∂ + K(ψ0 ).
Note also that the superdifferential of K is non-empty at any ψ_0 ∈ R^Y, so
that K is concave. As a concave function on the finite-dimensional space
R^Y, K is differentiable almost everywhere and one has ∂^+K(ψ) = {∇K(ψ)}
at differentiability points.
We now show that ∂^+K(ψ_0) ⊆ D(ψ_0), using the characterization of the
superdifferential recalled in the Appendix:
\[
\partial^+ K(\psi_0) = \mathrm{conv}\left\{ \lim_{n\to\infty} \nabla K(\psi^n) \;\middle|\; (\psi^n)_{n\in\mathbb{N}} \in S \right\},
\]
where S is the set of sequences (ψ^n)_{n∈ℕ} that converge to ψ_0, such that the
∇K(ψ^n) exist and admit a limit as n → +∞. Let v = lim_{n→∞} ∇K(ψ^n),
where (ψ^n)_{n∈ℕ} belongs to the set S. For every n, there exists γ^n ∈ Γ_{ψ^n}(µ)
such that ∇K(ψ^n) = v^n := Π_{Y#}γ^n − ν. By compactness of P(X × Y), one
can assume (taking a subsequence if necessary) that γ^n weakly converges to
some γ, and it is not difficult to check that γ ∈ Γ_{ψ_0}(µ), ensuring that
v = lim_{n→∞} v^n belongs to D(ψ_0). Thus,
\[
\left\{ \lim_{n\to\infty} \nabla K(\psi^n) \;\middle|\; (\psi^n)_{n\in\mathbb{N}} \in S \right\} \subseteq D(\psi_0).
\]
Taking the convex hull and using the convexity of D(ψ0 ), we get ∂ + K(ψ0 ) ⊆
D(ψ0 ) as desired.
As a corollary of this proposition, we obtain an explicit expression for the
left and right partial derivatives of K, and a characterization of its differentiability.
In this corollary, we use the terminology of semi-discrete optimal
transport (Section 4.1): we will refer to the c-subdifferential at y ∈ Y as the
Laguerre cell associated to y, and we will denote it by Lag_y(ψ):
\[
\mathrm{Lag}_y(\psi) := \{x \in X \mid \forall z \in Y,\ c(x, y) + \psi(y) \leq c(x, z) + \psi(z)\}. \tag{2.27}
\]
We also need to introduce the strict Laguerre cell SLag_y(ψ):
\[
\mathrm{SLag}_y(\psi) := \{x \in X \mid \forall z \in Y,\ c(x, y) + \psi(y) < c(x, z) + \psi(z)\}. \tag{2.28}
\]
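For finitely many sites and the quadratic cost, membership in a Laguerre cell (2.27) is a plain argmin over the sites. A small sketch on random toy data (the point cloud, sites, and prices are illustrative):

```python
import numpy as np

def laguerre_labels(X_pts, Y_pts, psi):
    """x belongs to Lag_y(psi) when c(x, y) + psi(y) is minimal over y;
    here c(x, y) = ||x - y||^2."""
    cost = ((X_pts[:, None, :] - Y_pts[None, :, :]) ** 2).sum(axis=2)
    return (cost + psi[None, :]).argmin(axis=1)

rng = np.random.default_rng(1)
X_pts = rng.random((1000, 2))               # point cloud discretizing X
Y_pts = np.array([[0.25, 0.5], [0.75, 0.5]])
labels = laguerre_labels(X_pts, Y_pts, np.zeros(2))
# with psi = 0 the Laguerre cells are ordinary Voronoi cells; raising psi(y)
# makes site y less attractive and shrinks its cell
labels_raised = laguerre_labels(X_pts, Y_pts, np.array([0.3, 0.0]))
```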
Proof. Using Hahn-Banach's extension theorem, one can easily see that the
superdifferential of κ at t is the projection of the superdifferential of K at ψ^t:
\[
\partial^+\kappa(t) = \left\{ \langle \pi | 1_y \rangle \mid \pi \in \partial^+ K(\psi^t) \right\}.
\]
We only prove the first equality, the second one being similar. Denote Z =
X \ Lagy (ψ t ), so that for any γ ∈ Γψt (µ),
γ(X × {y}) = γ(Lag_y(ψ^t) × {y}) + γ(Z × {y}).
Moreover, by definition of Lag_y(ψ^t), (Z × {y}) ∩ ∂^c ψ^t = ∅. Since γ belongs to
Γ_{ψ^t}(µ), we have spt(γ) ⊆ ∂^c ψ^t so that γ(Z × {y}) = 0. This gives us
\[
\gamma(X \times \{y\}) = \gamma(\mathrm{Lag}_y(\psi^t) \times \{y\}) \leq \gamma(\mathrm{Lag}_y(\psi^t) \times Y) = \mu(\mathrm{Lag}_y(\psi^t)),
\]
where we used Π_{X#}γ = µ to get the last equality. This proves that
\[
\sup\left\{ \gamma(X \times \{y\}) \mid \gamma \in \Gamma_{\psi^t}(\mu) \right\} \leq \mu(\mathrm{Lag}_y(\psi^t)).
\]
Primal and dual problems. We consider in this section that the two sets
X and Y are finite, and we consider two discrete probability measures
µ = Σ_{x∈X} µ_x δ_x and ν = Σ_{y∈Y} ν_y δ_y. This setting occurs frequently in
applications. The set of transport plans is then given by
\[
\Gamma(\mu, \nu) = \left\{ \gamma = \sum_{x,y} \gamma_{x,y}\, \delta_{(x,y)} \;\middle|\; \gamma_{x,y} \geq 0,\ \sum_{y\in Y} \gamma_{x,y} = \mu_x,\ \sum_{x\in X} \gamma_{x,y} = \nu_y \right\}.
\]
As seen in Section 2.2, the dual (DP) of this linear programming problem
amounts to maximizing the Kantorovich functional K (2.24), which in this
setting can be expressed as
\[
K(\psi) = \sum_{x\in X} \min_{y\in Y}\big(c(x, y) + \psi(y)\big)\, \mu_x - \sum_{y\in Y} \psi(y)\, \nu_y, \tag{3.30}
\]
where ψ ∈ RY is a function over the finite set Y . Since strong duality holds
(Theorem 5), one has
\[
(\mathrm{KP}) = (\mathrm{DP}) = \max_{\psi \in \mathbb{R}^Y} K(\psi).
\]
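The discrete problem (KP) is a linear program in the variables γ_{x,y}, and can be handed to an off-the-shelf LP solver. A minimal sketch, on an illustrative 2×2 instance:

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(C, mu, nu):
    """Solve the discrete Kantorovich problem: minimize <c|gamma> over
    gamma >= 0 subject to the two marginal constraints above."""
    n, m = C.shape
    # row-sum constraints: sum_y gamma_{x,y} = mu_x
    A_rows = np.kron(np.eye(n), np.ones((1, m)))
    # column-sum constraints: sum_x gamma_{x,y} = nu_y
    A_cols = np.kron(np.ones((1, n)), np.eye(m))
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([mu, nu])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.x.reshape(n, m), res.fun

C = np.array([[0.0, 1.0], [1.0, 0.0]])
mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
gamma, val = kantorovich_lp(C, mu, nu)  # here the identity coupling is optimal
```

This dense formulation has n·m variables and is only meant to make the constraints concrete; the algorithms discussed in this section exist precisely because generic LP solvers do not scale to large instances.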
Assignment problem. When the two sets X and Y have the same cardinal N
and when µ and ν are uniform probability measures over these sets, namely
\[
\mu = \frac{1}{N}\sum_{x\in X} \delta_x, \qquad \nu = \frac{1}{N}\sum_{y\in Y} \delta_y, \tag{3.31}
\]
This problem and its variants have generated a vast amount
of research, as demonstrated by the bibliography of the book by Burkard,
Dell'Amico and Martello on this topic [21].
Note that the set of bijections from X to Y has cardinality N!, making
it practically impossible to solve (AP) through direct enumeration. Using
Birkhoff's theorem on bistochastic matrices, we will show that the assignment
problem coincides with the Kantorovich problem.
Definition 10 (Bistochastic matrices). An N-by-N bistochastic matrix is a
square matrix M ∈ M_N(R) with non-negative coefficients such that the sum
of any row and any column equals one:
\[
\forall i \in \{1, \dots, N\},\ \sum_j M_{ij} = 1, \qquad \forall j \in \{1, \dots, N\},\ \sum_i M_{ij} = 1.
\]
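Birkhoff's theorem is what licenses solving (AP) with a combinatorial assignment solver instead of the LP over bistochastic matrices: some minimizer of the LP is a permutation matrix. A sketch on a random toy instance, checked against direct enumeration:

```python
import itertools
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
N = 5
C = rng.random((N, N))                  # illustrative random cost matrix

# Hungarian-type solver: returns the optimal bijection sigma (rows -> cols)
rows, cols = linear_sum_assignment(C)
ap_value = C[rows, cols].mean()         # (1/N) sum_x c(x, sigma(x))

# brute-force check over all N! bijections (feasible only for tiny N)
brute = min(C[np.arange(N), list(p)].mean()
            for p in itertools.permutations(range(N)))
```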
Proof. Denote ψ^t = ψ + t1_{y_0}. For t ≥ 0 one has Lag_{y_0}(ψ^t) ⊆ Lag_{y_0}(ψ).
Remark also that for every x ∈ X, one has
\[
\begin{aligned}
x \in \mathrm{Lag}_{y_0}(\psi^t) &\iff \forall z \neq y_0,\ c(x, y_0) + \psi(y_0) + t \leq c(x, z) + \psi(z) \\
&\iff t \leq \min_{z \in Y \setminus \{y_0\}} \big[c(x, z) + \psi(z)\big] - \big(c(x, y_0) + \psi(y_0)\big) \\
&\iff t \leq \mathrm{bid}_{y_0}(\psi, x).
\end{aligned}
\]
This implies that Lag_{y_0}(ψ^t) ≠ ∅ if and only if t ≤ bid_{y_0}(ψ). By Corollary 16,
the upper bound of the superdifferential ∂^+κ(t) of the function κ(t) = K(ψ +
t1_{y_0}) is µ(Lag_{y_0}(ψ^t)) − 1/N. It is non-negative for t ∈ [0, bid_{y_0}(ψ)] and strictly
negative for t > bid_{y_0}(ψ). This directly implies (for instance by (5.76)) that
0 ∈ ∂^+κ(bid_{y_0}(ψ)), so that the largest maximizer of κ is bid_{y_0}(ψ).
Remark 6 (Economic interpretation of the bidding increment). Assume that
Y is a set of houses owned by one seller and X is a set of customers that
want to buy a house. Given a set of prices ψ : Y → R, each customer x ∈ X
will make a compromise between the location of a house y ∈ Y (measured by
c(x, y)) and its price (measured by ψ(y)) by choosing a house among those
minimizing c(x, y) + ψ(y). In other words, x chooses y iff x ∈ Lagy (ψ). Let
y be a given house. The seller of y wants to maximize his profit, hence to
increase ψ(y) as much as possible while keeping (at least) one customer. Let
x ∈ X be a customer interested in the house y (i.e. x ∈ Lagy (ψ)). Then,
bidy,x (ψ) tells us how much it is possible to increase the price of y while
keeping it interesting to x. The best choice for the seller is to increase the
price by the maximum bid, which is the maximum raise so that there remains
at least one customer, giving the definition of bidy (ψ).
Remark 7 (Naive coordinate ascent). A naive algorithm would be to choose
at each step a coordinate y ∈ Y such that Lag_y(ψ) ≠ ∅ and to increase
ψ(y) by the bidding increment bid_y(ψ). Such an algorithm might
get stuck at a point which is not a global maximizer, a phenomenon which is
referred to as jamming in [17, §2]. In practice, this can happen when some
bidding increments bid_y(ψ) vanish, see Remark 8 below. Note that this is a
particular case of the well known fact that coordinate ascent algorithms may
converge to points that are not maximizers, when the maximized functional
is nonsmooth.
In order to tackle the problem of non-convergence of coordinate ascent,
Bertsekas and Eckstein changed the naive algorithm outlined above to impose
that the bids are at least ε > 0. To analyse their algorithm, we introduce
the notion of ε-complementary slackness, where ε can be seen as a tolerance.
Definition 12 (ε-Complementary slackness.). A partial assignment is a cou-
ple (σ, S) where S ⊆ X and σ : S → Y is an injective map. A partial assign-
ment (σ, S) and a price function ψ ∈ RY satisfy ε-complementary slackness
if for every x in S the following inequality holds:
\[
c(x, \sigma(x)) + \psi(\sigma(x)) \leq \min_{y\in Y}\, [c(x, y) + \psi(y)] + \varepsilon. \tag{CS$_\varepsilon$}
\]
Proof. The first inequality just comes from the fact that the plan induced by σ
is a particular transport plan. For the second inequality, by summing the (CS_ε)
condition, one gets
\[
\frac{1}{N}\sum_{x\in X} c(x, \sigma(x)) + \psi(\sigma(x)) \leq \frac{1}{N}\sum_{x\in X} \min_{y\in Y}\big(c(x, y) + \psi(y)\big) + \varepsilon.
\]
This leads to
\[
\frac{1}{N}\sum_{x\in X} c(x, \sigma(x)) \leq K(\psi) + \varepsilon \leq (\mathrm{DP}) + \varepsilon = (\mathrm{KP}) + \varepsilon.
\]
Remark 10. Note that the worst-case computational complexity of this algorithm
is very poor. Indeed, if ε = 10^{−k} and if C = 1, the number of steps in the
worst case is 10^k N^2. It would be highly desirable to replace the factor 1/ε
by log(1/ε). In the next paragraph, we see how this can be achieved using a
scaling technique.
The proof of Theorem 22 relies on the following lemma, whose proof is
straightforward.
Lemma 23. Over the course of the auction algorithm,
(i) the set of selected "houses" σ(S) is non-decreasing w.r.t. inclusion;
(ii) (ψ, σ) always satisfies the ε-complementary slackness condition;
(iii) each price increment is at least ε.
Proof of Theorem 22. Suppose that after i steps the algorithm has not stopped.
Then, there exists a point y_0 in Y that does not belong to σ(S), i.e. whose
price has not increased since the beginning of the algorithm, i.e. ψ(y_0) = 0.
Suppose now that there exists y_1 whose price has been raised n times with
n > C/ε + 1. Then, by Lemma 23.(iii), one has for every x ∈ X
\[
\psi_i(y_0) + c(x, y_0) = c(x, y_0) \leq C < (n-1)\varepsilon \leq \psi_i(y_1) - \varepsilon \leq \psi_i(y_1) + c(x, y_1) - \varepsilon.
\]
This contradicts the fact that y_1 was chosen at a former step. From this, we
deduce that there is no point in Y whose price has been raised n times with
n > C/ε + 1. With at most C/ε + 1 price rises for each of the N objects,
and every step costing N (finding the minimum among N), we deduce the
desired bound.
Auction algorithm with ε-scaling. Following [43], Bertsekas and Eckstein [17]
modified Algorithm 1 using a scaling technique which improves dramatically
both the running time and worst-case complexity of the algorithm. Note that
similar scaling techniques have also been applied to improve other algorithms
for the assignment problem, see e.g. [54, 47].
The modified algorithm can be described as follows: define ψ_0 = 0 and recursively,
let ψ_{k+1} be the prices returned by Auction(ψ_k, ε_k), where ε_k = C/2^k.
One stops when εk < ε, so that the number of runs of the unscaled auction
algorithm is bounded by log2 (C/ε). Bounding carefully the complexity of
each auction run, one gets:
Theorem 24. The auction algorithm with scaling constructs an η-optimal
assignment in time O(N^3 log(C/η)).
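The scaled algorithm can be sketched as follows. This is a simplified Gauss-Seidel variant with our own naming, not a faithful reproduction of Bertsekas and Eckstein's pseudocode; each bid raises a price by the bidding increment plus ε, which maintains ε-complementary slackness, so the final assignment is within Nε of optimal.

```python
import itertools
import numpy as np

def auction(C, psi, eps):
    """One unscaled auction run: each unassigned bidder x takes its best
    object y and raises psi(y) by the bidding increment plus eps, evicting
    the previous owner of y."""
    N = C.shape[0]
    owner = -np.ones(N, dtype=int)          # owner[y] = current buyer of y
    sigma = -np.ones(N, dtype=int)          # sigma[x] = object bought by x
    unassigned = list(range(N))
    while unassigned:
        x = unassigned.pop()
        vals = C[x] + psi
        y = int(vals.argmin())
        v1, v2 = np.partition(vals, 1)[:2]  # best and second-best values
        psi[y] += (v2 - v1) + eps           # each price rise is at least eps
        if owner[y] >= 0:                   # evict the previous owner of y
            unassigned.append(owner[y])
        owner[y], sigma[x] = x, y
    return sigma, psi

def scaled_auction(C, eps_final):
    """eps-scaling driver: run auctions with eps_k = C_max / 2^k, reusing
    the prices of one run to warm-start the next."""
    N = C.shape[0]
    psi, sigma, eps = np.zeros(N), np.arange(N), float(C.max())
    while eps >= eps_final:
        sigma, psi = auction(C, psi, eps)
        eps /= 2.0
    return sigma

rng = np.random.default_rng(3)
N = 6
C = rng.random((N, N))                      # illustrative random instance
sigma = scaled_auction(C, 1e-3)
# exact optimum by enumeration, for comparison on this tiny instance
opt = min(C[np.arange(N), list(p)].sum() for p in itertools.permutations(range(N)))
```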
\[
\forall y \in Y,\quad \psi(y) \leq \psi_0(y) + N(\lambda + \varepsilon)
\]
\[
c(x_k, y_k) + \psi(y_k) \leq \min_{y\in Y} c(x_k, y) + \psi(y) + \varepsilon \leq \psi(y_{k+1}) + c(x_k, y_{k+1}) + \varepsilon. \tag{3.34}
\]
By assumption, the point y_K does not belong to σ(S) and ψ(y_K) = ψ_0(y_K).
This gives us ψ(y_0) ≤ ψ_0(y_0) + K(λ + ε), and we conclude by remarking that
the path (y_0, x_0, . . . , y_K) is simple, i.e. K ≤ N.
Proof of Theorem 24. Lemma 25 implies that during the run k + 1 of the
(unscaled) auction algorithm, the price vector never grows larger than ψ0 +
(unscaled) auction algorithm, the price vector never grows larger than ψ_0 +
(ε + λ)N = ψ_0 + 3εN, with λ := ε_k and ε := λ/2. Since at each step
the price grows by at least ε, there are at most 3N^2 steps in run k + 1.
Taking into account the cost of finding min_{y∈Y} c(x, y) + ψ(y) at each step,
the computational complexity of each auction run is therefore O(N^3). Since
the number of runs is O(log(C/η)), we get the claimed estimate.
Exploiting the geometry of the cost. The first idea is to exploit the geom-
etry of the space in order to reduce the cost of finding the minimum of
c(x, y) + ψ(y), y ∈ Y , which accounts for a cost of N in the complexity
analysis of Theorem 24. The computation of this minimum is similar to
the nearest-neighbor problem in computational geometry, and nearest neighbors
can sometimes be found in log(N) time, after some preprocessing. For
instance, in the case of c(x, y) = ‖x − y‖^2 on R^d, and for ψ ≥ 0, one can
rewrite
\[
c(x, y) + \psi(y) = \|x - y\|^2 + \big(\sqrt{\psi(y)} - 0\big)^2 = \big\|(x, 0) - \big(y, \sqrt{\psi(y)}\big)\big\|^2,
\]
thus showing that finding the smallest value of c(x, y) + ψ(y) over Y amounts
to finding the closest point to (x, 0) in the set {(y, √ψ(y)) | y ∈ Y} ⊆ R^{d+1}.
This idea and variants thereof lead to practical and theoretical improvements,
both for the auction algorithm and for other algorithms for the assignment
problem. We refer to [63, 1] and references therein.
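The lifting trick above can be sketched with a k-d tree; the data below are illustrative (quadratic cost, ψ ≥ 0).

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)
Y = rng.random((1000, 2))               # the points y
psi = rng.random(1000)                  # prices, psi >= 0

# lift Y to R^{d+1}: c(x, y) + psi(y) = ||(x, 0) - (y, sqrt(psi(y)))||^2
lifted = np.hstack([Y, np.sqrt(psi)[:, None]])
tree = cKDTree(lifted)                  # preprocessing: build once

x = np.array([0.3, 0.7])
_, j = tree.query(np.concatenate([x, [0.0]]))   # nearest lifted point to (x, 0)

# brute-force argmin of c(x, y) + psi(y), for comparison
brute = int((((Y - x) ** 2).sum(axis=1) + psi).argmin())
```

Once the tree is built, each query costs roughly log N instead of N, which is exactly the saving exploited in the complexity discussion above.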
Exploiting the graph structure of solutions. When the cost satisfies the twist
condition (2.23) on R^d and the source measure is absolutely continuous,
Theorem 12 guarantees that the solution to Kantorovich's problem is
concentrated on a graph, i.e. dim(spt(γ)) = d, while a priori the dimension
of spt(γ) ⊆ R^{2d} could be as high as 2d. It is natural, in view of the stability of
optimal transport plans (Theorem 7), to hope that this feature remains
true at the discrete level, meaning that one expects that the support of the
discrete solution concentrates on a lower-dimensional graph G. One could
then try to use this phenomenon to prune the search space, i.e. taking the
minimum in c(x, y) + ψ(y) not over the whole space but over points y such
that (x, y) lies "close" to G. In practice, G is unknown but can be estimated
in a coarse-to-fine way. This idea or variants thereof has been used as a
heuristic in several works [70, 78, 10], and has been analyzed more precisely
by Bernhard Schmitzer [89, 90].
For simplicity, we assume throughout that all the points in X and Y carry
some mass, that is min(minx∈X µx , miny∈Y νy ) > 0. As before, we conflate a
transport plan γ ∈ Γ(µ, ν) with its density (γx,y )(x,y)∈X×Y .
(minus) the entropy of the transport plan and acts as a barrier for the non-
negativity constraint:
\[
H(\gamma) = \sum_{x\in X,\, y\in Y} h(\gamma_{x,y}), \qquad \text{where } h(t) =
\begin{cases}
t(\log(t) - 1) & \text{if } t > 0 \\
0 & \text{if } t = 0 \\
+\infty & \text{if } t < 0.
\end{cases} \tag{3.36}
\]
Theorem 26. The problem (KP_η) has a unique solution γ, which belongs
to Γ(µ, ν). Moreover, if min_{x∈X} µ_x > 0 and min_{y∈Y} ν_y > 0, then
∀(x, y) ∈ X × Y, γ_{x,y} > 0.
Proof. From h''(t) = 1/t, one sees that the Hessian D^2 H(γ) is diagonal with
diagonal coefficients 1/γ_{x,y} ≥ 1 since γ_{x,y} ∈ (0, 1].
Proof. The regularized problem (KP_η) amounts to minimizing a continuous
and coercive function over a closed convex set, thus showing existence. Let
us denote by γ* a solution of (KP_η). Then, γ* has a finite entropy, so
that it satisfies the constraint γ*_{x,y} ≥ 0. This implies that γ* is a transport
plan between µ and ν. We now prove by contradiction that the set Z :=
{(x, y) | γ*_{x,y} = 0} is empty. For this purpose, we define a new transport
plan γ^ε ∈ Γ(µ, ν) by γ^ε = (1 − ε)γ* + ε µ ⊗ ν, and we give an upper bound on
Dual formulation. We start by deriving (formally) the dual problem and first
introduce the Lagrangian of (KP_η):
\[
L(\gamma, \varphi, \psi) := \sum_{x,y} \gamma_{x,y}\, c(x, y) + \eta\, h(\gamma_{x,y}) + \sum_{x\in X} \varphi(x)\Big(\mu_x - \sum_{y\in Y} \gamma_{x,y}\Big) + \sum_{y\in Y} \psi(y)\Big(\sum_{x\in X} \gamma_{x,y} - \nu_y\Big), \tag{3.38}
\]
where ϕ : X → R and ψ : Y → R are the Lagrange multipliers. Then,
\[
(\mathrm{KP}_\eta) = \min_{\gamma} \sup_{\varphi, \psi} L(\gamma, \varphi, \psi).
\]
As always, the dual problem is obtained by inverting the infimum and the
supremum. We also simplify slightly the expressions:
\[
\sup_{\varphi,\psi} \min_{\gamma} L(\gamma, \varphi, \psi) = \sup_{\varphi,\psi} \min_{\gamma} \sum_{x,y} \gamma_{x,y}\big(c(x, y) + \psi(y) - \varphi(x) + \eta(\log(\gamma_{x,y}) - 1)\big) + \sum_{x\in X} \varphi(x)\mu_x - \sum_{y\in Y} \psi(y)\nu_y. \tag{3.39}
\]
Taking the derivative with respect to γx,y , we find that for a given ϕ, ψ, the
optimal γ must satisfy:
\[
c(x, y) + \psi(y) - \varphi(x) + \eta \log(\gamma_{x,y}) = 0, \quad \text{i.e.} \quad \gamma_{x,y} = e^{\frac{1}{\eta}(\varphi(x) - \psi(y) - c(x,y))}. \tag{3.40}
\]
Putting these values in the Equation (3.39) gives the following definition:
Definition 13 (Dual regularized problem). The dual of the regularized op-
timal transport problem is defined by
\[
(\mathrm{DP}_\eta) = \sup_{\varphi, \psi} K^\eta(\varphi, \psi) \tag{3.41}
\]
where
\[
K^\eta(\varphi, \psi) := -\sum_{(x,y)\in X\times Y} \eta\, e^{\frac{1}{\eta}(\varphi(x) - \psi(y) - c(x,y))} + \sum_{x\in X} \varphi(x)\mu_x - \sum_{y\in Y} \psi(y)\nu_y. \tag{3.42}
\]
We can now state the strong duality result
Theorem 28 (Strong duality). Strong duality holds and the maximum in
the dual problem is reached, i.e. there exist ϕ ∈ R^X and ψ ∈ R^Y such that
(KPη ) = (DPη ) = Kη (ϕ, ψ).
Corollary 29. If (ϕ, ψ) is a solution to the dual problem (DP_η), then the
solution γ of (KP_η) is given by
\[
\gamma_{x,y} = e^{\frac{1}{\eta}(\varphi(x) - \psi(y) - c(x,y))}.
\]
Corollary 29 is a direct consequence of the relation (3.40). This holds be-
cause, unlike the original linear programming formulation of optimal trans-
port, the regularized problem (KPη ) is smooth and strictly convex.
Proof of Theorem 28. Weak duality (KP_η) ≥ (DP_η) always holds. To prove
the strong duality, we denote by γ* the solution to (KP_η), and we note
that by Theorem 26, γ*_{x,y} > 0 for all (x, y) ∈ X × Y. This implies that the
The last equality follows from the fact that γ ∗ satisfies the constraints and
is a solution to (KPη ). Thus (DPη ) = (KPη ).
Proof. To prove (i), consider ϕ ∈ RX the maximizer of Kη (·, ψ). Taking the
derivative of Kη with respect to the variable ϕ(x) gives us
\[
\mu_x = e^{\frac{\varphi(x)}{\eta}} \sum_{y\in Y} e^{-\frac{1}{\eta}(\psi(y) + c(x,y))},
\]
Remark 12. Note the similarity between the formula for Kantorovich func-
tional derived from regularized transport (3.45) and the formula for the Kan-
torovich functional without regularization (2.24). Note also that K^η is
invariant by addition of a constant, namely K^η(ψ + λ1_Y) = K^η(ψ) for any
λ ∈ R, where 1_Y = Σ_{y∈Y} 1_y is the constant function equal to one.
Unlike the standard Laguerre cell Lagy (ψ) defined in (2.27), which is a set,
RLagηy (ψ) is a function. The family (RLagηy (ψ))y∈Y is a partition of unity,
meaning that the sum over y of RLagηy (ψ) equals one. One can loosely
think of the regularized Laguerre cells as smoothed indicator functions of
the (standard) Laguerre cells. In particular,
lim_{η→0} RLagη_y(ψ)(x) = 0 if x ∉ Lag_y(ψ),  and  lim_{η→0} RLagη_y(ψ)(x) = 1 if x ∈ SLag_y(ψ),
where SLag_y(ψ) is the strict Laguerre cell introduced in (2.28). We also introduce the two quantities
Gη_y(ψ) := Σ_{x∈X} RLagη_y(ψ)(x) μx  and  Gη_{yz}(ψ) := (1/η) Σ_{x∈X} RLagη_y(ψ)(x) RLagη_z(ψ)(x) μx.
Informally, Gη_y(ψ) measures the quantity of mass of μ within the regularized Laguerre cell RLagη_y(ψ).
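The regularized Laguerre cells and the quantities Gη_y amount to a stabilized softmin of c(x, y) + ψ(y) over y; they can be computed in a few lines. The helper names `rlag` and `G_eta` below are ours, chosen for illustration.

```python
import numpy as np

def rlag(C, psi, eta):
    """RLag^eta_y(psi)(x): softmin weights of c(x,y)+psi(y) over y.
    C has shape (n_x, n_y); each row of the output sums to one (partition of unity)."""
    A = -(C + psi[None, :]) / eta
    A -= A.max(axis=1, keepdims=True)   # stabilization: all exponents stay <= 0
    P = np.exp(A)
    return P / P.sum(axis=1, keepdims=True)

def G_eta(C, psi, eta, mu):
    """G^eta_y(psi) = sum_x RLag^eta_y(psi)(x) mu_x: mass captured by the cell of y."""
    return rlag(C, psi, eta).T @ mu
```

As η → 0 each row of `rlag` concentrates on the argmin of c(x, y) + ψ(y), recovering the hard Laguerre assignment.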
Theorem 32.
• The regularized Kantorovich functional Kη is C∞ and concave, with first and second-order partial derivatives given by
∀y ∈ Y,  ∂Kη/∂1_y(ψ) = Gη_y(ψ) − νy,
∀y ≠ z ∈ Y,  ∂²Kη/∂1_z∂1_y(ψ) = Gη_{yz}(ψ).
• The function Kη is strictly concave on the orthogonal complement of the constant functions. More precisely, for every ψ ∈ R^Y one has
∀v ∈ R^Y \ {0} s.t. Σ_{y∈Y} v(y) = 0,  D²Kη(ψ)(v, v) < 0.
The relation
Σ_{y∈Y} Gη_y(ψ) = 1
gives the desired formula for the second-order derivatives when z = y. The Hessian of Kη is therefore symmetric and diagonally dominant, with negative diagonal coefficients. This implies that the Hessian is negative semidefinite, hence that Kη is concave. Let us now show that ker H = R1_Y, where H = D²Kη(ψ).
Consider v ∈ ker H and let y0 ∈ Y be a point where v attains its maximum. Then, using Hv = 0, and in particular (Hv)(y0) = 0, one has
0 = Σ_{y≠y0} H_{y,y0} v(y) + H_{y0,y0} v(y0) = Σ_{y≠y0} H_{y,y0} (v(y) − v(y0)),
where the last equality follows from H_{y0,y0} = −Σ_{y≠y0} H_{y,y0}. Since for every y ≠ y0 one has H_{y,y0} > 0 and v(y0) − v(y) ≥ 0, this implies that v(y) = v(y0). Therefore ker H ⊆ R1_Y. The reverse inclusion is obvious, and therefore Kη is strictly concave on the orthogonal complement of the constant functions.
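The structure used in this proof (symmetric Hessian, rows summing to zero, strictly positive off-diagonal entries, kernel equal to the constants) can be checked numerically. The sketch below assembles D²Kη for the discrete semi-dual, assuming — as in the semi-discrete formula appearing later in the text — that the off-diagonal entries are (1/η) Σ_x μx RLagη_y(ψ)(x) RLagη_z(ψ)(x); the name `hessian_K` is illustrative.

```python
import numpy as np

def hessian_K(C, psi, eta, mu):
    """D^2 K_eta(psi) for the discrete semi-dual: off-diagonal entries
    (1/eta) * sum_x mu_x P_xy P_xz, diagonal chosen so every row sums to zero
    (invariance of K_eta under addition of constants)."""
    A = -(C + psi[None, :]) / eta
    A -= A.max(axis=1, keepdims=True)
    P = np.exp(A)
    P /= P.sum(axis=1, keepdims=True)       # P[x, y] = RLag^eta_y(psi)(x)
    H = (P * mu[:, None]).T @ P / eta       # H_yz = (1/eta) sum_x mu_x P_xy P_xz
    np.fill_diagonal(H, 0.0)
    np.fill_diagonal(H, -H.sum(axis=1))     # rows sum to zero: H 1_Y = 0
    return H
```

On random data the resulting matrix is negative semidefinite, with a one-dimensional kernel spanned by the constant function, matching the theorem.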
To prove the last claim, we note that if ψ maximizes Kη(·), then (ψ^{c,η}, ψ) maximizes Kη(·, ·). By Corollary 29, the optimal transport plan γ is
γ_{x,y} = e^{(ϕ(x)−ψ(y)−c(x,y))/η} = e^{−(ψ(y)+c(x,y))/η} μx / ( Σ_{z∈Y} e^{−(ψ(z)+c(x,z))/η} ) = RLagη_y(ψ)(x) μx.
The book of Peyré and Cuturi presents these difficulties in more detail and explains how to circumvent them [83]. In addition to the works already cited, we refer to the PhD work of Feydy [29, 45], and especially to the implementation of regularized optimal transport in the library GeomLoss².
In order to prove this theorem, we will make use of the following elementary lemma, which gives an upper bound on the L¹ distance between two Gibbs kernels e^{u_i}/Z_i, i ∈ {0, 1}, as a function of ‖u1 − u0‖_{o,∞}.
Lemma 36. Let u0, u1 be two functions on Y and denote g_i = e^{u_i}/Z_i, where Z_i = Σ_{y∈Y} e^{u_i(y)}. Then,
Σ_{y∈Y} |g1(y) − g0(y)| ≤ 2(1 − e^{−2‖u0−u1‖_{o,∞}}).
Proof. Note that by definition the Gibbs kernel g_i does not change if a constant is added to u_i, so that we can assume that
ε := ‖u0 − u1‖_{o,∞} = ‖u0 − u1‖_∞.
Using the inequality u0 − ε ≤ u1 ≤ u0 + ε, one easily shows that
e^{−2ε} e^{u0}/Z0 ≤ e^{u1}/Z1 ≤ e^{2ε} e^{u0}/Z0,
thus implying e^{−2ε} g0 ≤ g1 ≤ e^{2ε} g0. This gives
(e^{−2ε} − 1) g0 ≤ g1 − g0  and  (e^{−2ε} − 1) g1 ≤ g0 − g1,
thus implying
|g1 − g0| ≤ (1 − e^{−2ε}) max(g0, g1) ≤ (1 − e^{−2ε})(g0 + g1).
Summing this inequality over Y and using Σ_Y g_i = 1, we obtain the desired inequality.
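Lemma 36 is easy to test numerically. The snippet below checks the bound on random inputs, taking ‖u‖_{o,∞} = (max u − min u)/2 for the oscillation seminorm — an assumption about the notation, consistent with its use in the proof (the infimum over added constants of the sup norm).

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs(u):
    """Gibbs kernel e^u / Z, computed stably."""
    w = np.exp(u - u.max())
    return w / w.sum()

# ||u||_{o,inf} = inf_c ||u - c||_inf = (max u - min u)/2
violations = []
for _ in range(200):
    u0, u1 = rng.normal(size=7), rng.normal(size=7)
    eps = (np.max(u1 - u0) - np.min(u1 - u0)) / 2
    lhs = np.abs(gibbs(u1) - gibbs(u0)).sum()
    violations.append(lhs - 2 * (1 - np.exp(-2 * eps)))
```

Every trial should respect the lemma's bound (up to floating-point error).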
Proof of Theorem 35. Consider ψ0, ψ1 ∈ R^Y and ψt = ψ0 + tv with v = ψ1 − ψ0. Without loss of generality, we assume that the functions ψ0, ψ1 are translated by a constant so that ‖ψ0 − ψ1‖_∞ = ‖ψ0 − ψ1‖_{o,∞}. We will first give an upper bound on ‖ψ1^{c,η} − ψ0^{c,η}‖_{o,∞}, and to do that we will give an upper bound on
A(x, x′) = (ψ1^{c,η}(x) − ψ0^{c,η}(x)) − (ψ1^{c,η}(x′) − ψ0^{c,η}(x′))
uniformly in x, x′ ∈ X. For this purpose, we introduce
B(t, x, x′) = −η log Σ_{y∈Y} e^{(−c(x,y)−ψt(y))/η} + η log Σ_{y∈Y} e^{(−c(x′,y)−ψt(y))/η},
and
g_{x,t}(y) = e^{(−c(x,y)−ψt(y))/η} / Σ_{z∈Y} e^{(−c(x,z)−ψt(z))/η}.
² https://www.kernel-operations.io/geomloss/
Then, by the previous lemma (Lemma 36) and setting u_{x,t}(y) = −(1/η)(c(x, y) + ψt(y)), so that g_{x,t} = e^{u_{x,t}}/Z_{x,t} with Z_{x,t} = Σ_{y∈Y} e^{u_{x,t}(y)}, we obtain
A(x, x′) ≤ 2‖v‖_∞ ∫₀¹ (1 − e^{−2‖u_{x,t} − u_{x′,t}‖_{o,∞}}) dt ≤ 2‖ψ1 − ψ0‖_{o,∞} (1 − e^{−2‖u_{x,t} − u_{x′,t}‖_{o,∞}}).
In addition,
‖u_{x,t} − u_{x′,t}‖_{o,∞} ≤ ‖c‖_{o,∞}/η.
We therefore obtain
‖ψ1^{c,η} − ψ0^{c,η}‖_{o,∞} ≤ (1/2) sup_{x,x′∈X} A(x, x′) ≤ ‖ψ1 − ψ0‖_{o,∞} (1 − e^{−2‖c‖_{o,∞}/η}).
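This contraction of the soft c-transform ψ ↦ ψ^{c,η} can be verified numerically. Following the proof rather than the bare statement, the factor below is built from D = max_{x,x′} ½ osc_y(c(x, ·) − c(x′, ·)), which is the quantity the argument actually controls; all names and data are illustrative.

```python
import numpy as np

def c_eta_transform(psi, C, eta):
    """Soft c-transform: psi^{c,eta}(x) = -eta * log sum_y exp(-(c(x,y)+psi(y))/eta)."""
    A = -(C + psi[None, :]) / eta
    m = A.max(axis=1)
    return -eta * (m + np.log(np.exp(A - m[:, None]).sum(axis=1)))

def half_osc(u):
    """Oscillation seminorm ||u||_{o,inf} = (max u - min u)/2."""
    return (np.max(u) - np.min(u)) / 2

rng = np.random.default_rng(3)
C = rng.random((6, 7))
eta = 0.5
# quantity controlled in the proof: max over x, x' of (1/2) osc_y (c(x,.) - c(x',.))
D = max(half_osc(C[i] - C[j]) for i in range(6) for j in range(6))
violations = []
for _ in range(200):
    p0, p1 = rng.normal(size=7), rng.normal(size=7)
    lhs = half_osc(c_eta_transform(p1, C, eta) - c_eta_transform(p0, C, eta))
    rhs = (1 - np.exp(-2 * D / eta)) * half_osc(p1 - p0)
    violations.append(lhs - rhs)
```

The factor 1 − e^{−2D/η} degrades toward 1 as η → 0, which is the usual explanation for the slow convergence of Sinkhorn iterations at small regularization.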
Remark 17. For the quadratic cost c(x, y) = ‖x − y‖², one has
c(x, y) + ψ(y) ≤ c(x, z) + ψ(z)  ⟺  ⟨x|z − y⟩ ≤ ½((ψ(z) + ‖z‖²) − (ψ(y) + ‖y‖²)),
which easily implies that the Laguerre cells are convex polyhedra intersected with the domain ΩX. Introducing ψ̃(z) = ½(ψ(z) + ‖z‖²), one has
Lag_y(ψ) = {x ∈ ΩX | ∀z ∈ Y, ⟨x|z − y⟩ ≤ ψ̃(z) − ψ̃(y)}.
As a direct consequence, the intersection of two distinct Laguerre cells is contained in a hyperplane and is therefore Lebesgue-negligible. If in addition ψ ≡ 0, then the Laguerre tessellation coincides with the Voronoi tessellation. The shape of the Voronoi and Laguerre tessellations is depicted in Figure 1.
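For the quadratic cost, membership in a Laguerre cell is thus equivalent to a finite set of halfspace inequalities, which is cheap to verify. The sketch below (illustrative data; the helper name `in_laguerre_cell` is ours) compares the argmin assignment with the halfspace description above.

```python
import numpy as np

rng = np.random.default_rng(1)
Ypts = rng.normal(size=(5, 2))                  # sites y in R^2
psi = rng.normal(size=5)                        # Kantorovich potentials
psit = 0.5 * (psi + (Ypts ** 2).sum(axis=1))    # psi~(y) = (psi(y) + |y|^2)/2

def in_laguerre_cell(x, i):
    """Halfspace description: <x | z - y_i> <= psi~(z) - psi~(y_i) for all z in Y."""
    return all(x @ (Ypts[j] - Ypts[i]) <= psit[j] - psit[i] + 1e-9
               for j in range(len(Ypts)))

pts = rng.normal(size=(200, 2))
assignment = [int(np.argmin(((x - Ypts) ** 2).sum(axis=1) + psi)) for x in pts]
consistent = all(in_laguerre_cell(x, i) for x, i in zip(pts, assignment))
```

Every sample point assigned to site y_i by the argmin of ‖x − y‖² + ψ(y) indeed satisfies the halfspace inequalities of the cell of y_i.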
The following proposition shows that Laguerre tessellations can be used
to build optimal transport maps.
Proposition 37. Under the twist condition (Def. 8), the intersection of two distinct Laguerre cells Lag_y(ψ) ∩ Lag_z(ψ) (y ≠ z) is Lebesgue-negligible, and the map
Tψ : x ∈ ΩX ↦ arg min_{y∈Y} c(x, y) + ψ(y)
We will often use the notation R, which measures the oscillation of the cost function:
R := max_{X×Y} c − min_{X×Y} c.  (4.56)
0, implying in particular that the Laguerre cell Lag_{y0}(ψ) is non-empty and contains a point x ∈ X. Then, by definition of the cell, one has for all y ∈ Y \ {y0}, c(x, y0) + ψ(y0) ≤ c(x, y) + ψ(y), thus showing that ψ(y0) ≤ min_Y ψ + R. Point (v) is a consequence of Point (vi).
It remains to establish that each of the maps Gy is continuous. For this purpose, we consider a sequence (ψn)_{n∈N} in R^Y converging to some ψ ∈ R^Y. We first note that, as in the proof of Proposition 37, the set
S = {x ∈ X | ∃y ≠ z ∈ Y s.t. c(x, y) + ψ(y) = c(x, z) + ψ(z)}
is Lebesgue-negligible and therefore also ρ-negligible. Defining χ = 1_{Lag_y(ψ)} and χn = 1_{Lag_y(ψn)}, we have
Gy(ψn) = ∫ χn dρ  and  Gy(ψ) = ∫ χ dρ.
Proof. Fix some y0 ∈ Y such that νy0 ≠ 0, and consider the set
K = {ψ ∈ R^Y | ψ(y0) = 0 and ∀y ∈ Y \ {y0}, Gy(ψ) ≤ νy and ψ(y) ≤ R},
where R is defined as in (4.56). Given ψ ∈ K, one has
Gy0(ψ) = 1 − Σ_{y≠y0} Gy(ψ) ≥ νy0 > 0,
implying by (v) that min_{y∈Y} ψ ≥ ψ(y0) − R. The set K is therefore bounded and closed (by continuity of the functions Gy), and therefore compact. We consider ψ* a minimizer over the set K of the function J(ψ) = Σ_{y∈Y} ψ(y).
Assume that Gy(ψ*) < νy for some y ∈ Y \ {y0}. Then, by continuity of Gy, there exists some t > 0 such that Gy(ψ* − t1_y) < νy. Then, by property (ii), we have
∀z ≠ y, Gz(ψ* − t1_y) ≤ Gz(ψ*) ≤ νz,
thus showing that ψ* − t1_y ∈ K. Since J(ψ* − t1_y) = J(ψ*) − t < J(ψ*), we get a contradiction. We have thus shown that ∀y ∈ Y \ {y0}, Gy(ψ*) = νy, and using (iv) and ν ∈ P(Y) we obtain
Gy0(ψ*) = 1 − Σ_{y∈Y\{y0}} Gy(ψ*) = 1 − Σ_{y∈Y\{y0}} νy = νy0,
The price of bread at y is then decreased so that the amount of bread sold
equals the production capacity of y, i.e. one finds ty > 0 such that
Gy (ψ (k) − ty 1y ) = νy
and then updates ψ (k+1) = ψ (k) − ty 1y .
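This coordinate-decrement scheme can be sketched on a 1-D toy problem, with ρ approximated by a uniform grid and the decrement t_y found by bisection. All names, data, and tolerances below are illustrative, not the authors' implementation.

```python
import numpy as np

# 1-D toy setting: rho = uniform on [0,1] approximated by a grid, c(x,y) = |x-y|^2
Xg = np.linspace(0.0, 1.0, 5001)
mug = np.full(Xg.size, 1.0 / Xg.size)
Ys = np.array([0.1, 0.35, 0.6, 0.9])
nu = np.array([0.4, 0.1, 0.2, 0.3])
Cg = (Xg[:, None] - Ys[None, :]) ** 2

def G(psi):
    """G_y(psi): mass of the (discretized) Laguerre cell of y."""
    cells = np.argmin(Cg + psi[None, :], axis=1)
    return np.bincount(cells, weights=mug, minlength=Ys.size)

delta = 1e-3
psi = np.zeros(Ys.size)
psi[1:] = 10.0            # start with the cell of y_0 carrying all the mass
for _ in range(2000):
    g = G(psi)
    y = 1 + int(np.argmax(nu[1:] - g[1:]))   # most undersupplied site; y_0 never touched
    if nu[y] - g[y] <= delta:
        break
    # bisection for t with G_y(psi - t 1_y) ~ nu_y (G_y grows as t grows)
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        trial = psi.copy()
        trial[y] -= mid
        if G(trial)[y] < nu[y]:
            lo = mid
        else:
            hi = mid
    psi[y] -= hi
```

Decreasing ψ(y) enlarges only the cell of y, so the other cells shrink monotonically; the loop stops once every deficit ν_y − G_y(ψ) is below the tolerance δ.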
Remark 21 (Origin and extensions). This algorithm was introduced by Oliker
and Prussner, for the purpose of solving Monge-Ampère equations with
Dirichlet boundary conditions in [79]. In the context of optimal transport,
the first use of Algorithm 3 seems to be in an article of Caffarelli, Kochengin
and Oliker [26] (see also [24]), in the setting of the reflector problem, namely
c(x, y) = −log(1 − ⟨x|y⟩) on X = Y = S^{d−1}. Since then, the convergence of this algorithm has been generalized to other costs and/or more general assumptions on the probability density ρ; we refer the reader to [64, 42] and to the references therein.
C^{1,1} estimates for the Kantorovich functional. The proof of convergence of the Oliker-Prussner algorithm relies on the Lipschitz regularity of the map G when ρ is bounded, proven in the next proposition. (Since ∇K = G − ν, this proposition also implies that Kantorovich's functional K has a Lipschitz gradient, improving on the C¹ estimate of Theorem 40.)
Proposition 41. Assume that c ∈ C²(ΩX × ΩY) satisfies the twist condition, and assume also that ρ ∈ P^{ac}(X) ∩ L∞(X). Then for every y ∈ Y, the map Gy : R^Y → R defined in (4.54) is globally Lipschitz.
Remark 22. The proof of this proposition comes with an estimate of the Lipschitz constant: namely, it shows that |Gy(ψ) − Gy(ϕ)| ≤ L_G ‖ϕ − ψ‖_∞ with
L_G = c(d) N ‖ρ‖_∞ (1/κ)(1 + (M/κ) diam(X)) diam(X)^{d−1},
where
κ = min_{y≠z∈Y} min_X ‖∇x c(·, y) − ∇x c(·, z)‖,  (4.60)
In particular, for t ≥ 0, setting c_{yz} = c(·, y) − c(·, z) and a = ψ(z) − ψ(y),
This establishes (4.61) in the case t ≥ 0, and the case t ≤ 0 can be treated similarly.
Si = {x ∈ S | ∇f(x) ∈ Vi}.
We will now estimate the volume of each patch Si using the coarea formula recalled in Theorem 56 of the appendix. To apply this formula, we consider Πi : Si ⊆ R^d → {ui}^⊥, the orthogonal projection onto the hyperplane Hi = {ui}^⊥. We need to estimate the Jacobian JΠi(x) (see (5.77)). Since Πi is linear, we have DΠi = Πi. Moreover, for any tangent vector v at x ∈ Si, one
We now give an upper bound on Card(Si ∩ (y + Rui)). Let x ∈ Si and use Taylor's formula to get
f(x + t ui) ≥ f(x) + t⟨∇f(x)|ui⟩ − (M/2) t² ≥ (3/4) κ t − (M/2) t²,
so that f(x + t ui) > 0 as long as t ∈ (0, t*) with t* = 3κ/(2M). One has a similar bound for negative t. This directly implies that the number of intersection points between Si and y + Rui is at most 1 + diam(X)/t*. Since the number n of directions ui only depends on the dimension d, we have
vol^{d−1}(S) ≤ Σ_{1≤i≤n} vol^{d−1}(Si)
≤ c(d) Σ_{1≤i≤n} vol^{d−1}(Hi ∩ Πi(X)) (1 + (M/κ) diam(X))
≤ c(d) (1 + (M/κ) diam(X)) diam(X)^{d−1}.
set c_{yz}^{−1}(s):
vol^{d−1}(c_{yz}^{−1}(s) ∩ X) ≤ c(d) (1 + (M/κ) diam(X)) diam(X)^{d−1},
which yields
|Gy(ψ + t1_z) − Gy(ψ)| ≤ L̂_G |t|,  (4.62)
with
L̂_G = (c(d)/κ) (1 + (M/κ) diam(X)) diam(X)^{d−1} ‖ρ‖_∞.
Take ψ, ψ̃ ∈ R^Y. Order the points in Y, i.e. let Y = {y1, …, yN}, and define recursively ψ⁰ = ψ and
ψ^k = ψ^{k−1} + (ψ̃(y_k) − ψ(y_k)) 1_{y_k}  for 1 ≤ k ≤ N.
Then ψ^N = ψ̃ and, for k ≥ 1, ψ^k and ψ^{k−1} differ only by the value at y_k. Thus, applying (4.62),
|Gy(ψ̃) − Gy(ψ)| ≤ Σ_{1≤k≤N} |Gy(ψ^k) − Gy(ψ^{k−1})| ≤ L̂_G Σ_{1≤k≤N} |ψ(y_k) − ψ̃(y_k)| ≤ L_G ‖ψ − ψ̃‖_∞,
with L_G = N L̂_G.
Convergence of the Oliker-Prussner algorithm. Now that we have established the Lipschitz continuity of Gy, the convergence of Algorithm 3 follows easily, using arguments similar to those used to establish the convergence of the auction algorithm.
Theorem 44 (Oliker-Prussner). Assume that the cost c ∈ C²(ΩX × ΩY) is twisted (Def. 8) and that ρ ∈ P^{ac}(X) ∩ L∞(X). Then,
• The Oliker-Prussner algorithm converges in a finite number of steps k ≤ CN³/δ, where C is a constant that depends on X, Y, ρ and c.
• Furthermore, at step k, one has
∀1 ≤ i ≤ N, |Gi(ψ^{(k)}) − νi| ≤ δ.
where Hyz is defined in Equation (4.51) and Hyzw(ψ∞) := Hyz(ψ∞) ∩ Hzw(ψ∞), which by assumption has zero (d − 1)-dimensional Hausdorff measure. We now prove that lim inf_{n→∞} χ_{Ln}(x) χ_X(Fn(x)) ≥ χ_{L∞∩X}(x) on H \ S. If x ∉ L∞ ∩ X, then χ_{L∞∩X}(x) = 0 and there is nothing to prove. We therefore consider x ∈ (L∞ ∩ X) \ S, meaning by definition of S that x belongs to the interior int(X) and that
∀w ∈ Y \ {z, y}, c(x, y) + ψ∞(y) < c(x, w) + ψ∞(w).
Since Fn(x) converges to x, this implies that for n large enough one has Fn(x) ∈ int(X) and
∀w ∈ Y \ {z, y}, c(Fn(x), y) + ψ∞(y) < c(Fn(x), w) + ψ∞(w).
By definition, this means that Fn(x) belongs to Lag_{yz}(ψn), and therefore x ∈ Ln by definition of Ln. Thus
lim inf_{n→+∞} χ_{Ln}(x) χ_X(Fn(x)) = 1 ≥ χ_{L∞∩X}(x).
Proof of Theorem 45. Lemma 42 shows that for any distinct points y ≠ z ∈ Y and any ψ ∈ R^Y one has
Gy(ψ + t1_z) = Gy(ψ) + ∫₀ᵗ Gyz(ψ + s1_z) ds.
Moreover, by Lemma 46, we know that the function Gyz is continuous. The fundamental theorem of calculus implies that f : t ↦ Gy(ψ + t1_z) is differentiable, and that f′(0) = ∂Gy/∂1_z(ψ) = Gyz(ψ). To compute the partial derivative of Gy with respect to 1_y, we note that, by invariance of Gy under addition of a constant,
Gy(ψ + t1_y) = Gy(ψ − t Σ_{z≠y} 1_z).
The right-hand side of this expression is differentiable with respect to t, so that the left-hand side is also differentiable, and the chain rule gives
∂Gy/∂1_y(ψ) = − Σ_{z∈Y\{y}} Gyz(ψ).
By the twist hypothesis and the inverse function theorem, H_{y_i y_j}(ψ) is a (d − 1)-dimensional submanifold. In addition, ρ(x) > 0 because x belongs to Z. This implies that
H_{ij} = ∫_{Lag_{y_i y_j}(ψ)} ρ(x′) / ‖∇x c(x′, y_i) − ∇x c(x′, y_j)‖ dvol^{d−1}(x′)
      ≥ ∫_{H_{y_i y_j}(ψ) ∩ B(x,r)} ρ(x′) / ‖∇x c(x′, y_i) − ∇x c(x′, y_j)‖ dvol^{d−1}(x′) > 0.
1 = (1, …, 1) ∈ R^N
(3) (Strict monotonicity) The matrix DG(ψ) is symmetric nonpositive, and
∀ψ ∈ Sε, ∀v ∈ {1}^⊥ \ {0}, ⟨DG(ψ)v|v⟩ < 0.
Then Algorithm 4 terminates in a finite number of steps. More precisely, the iterates (ψ^{(k)}) of Algorithm 4 satisfy, for some τ* > 0,
‖G(ψ^{(k+1)}) − ν‖ ≤ (1 − τ*/2) ‖G(ψ^{(k)}) − ν‖.
Proof.
Estimates. Let ν ∈ R^N be such that Σ_i νi = 1. We assume that ψ^{(0)} ∈
‖G(ψ) − ν‖/A ≤ ‖v(ψ)‖ ≤ ‖G(ψ) − ν‖/a ≤ M.  (4.66)
In particular, the function F : (ψ, τ) ∈ S × [0, 1] ↦ ψτ is continuous. Since S × [0, 1] is compact, K := F(S × [0, 1]) is also compact. Then, by uniform continuity of DG over K, we see that there exists an increasing function ω such that lim_{t→0} ω(t) = 0 and ‖DG(ψ) − DG(ψ′)‖ ≤ ω(‖ψ − ψ′‖) for all ψ, ψ′ ∈ K. Since G is of class C¹, a Taylor expansion in τ gives
To establish the first inequality, we used that ψ and ψt belong to the compact set K, and for the second that ω is increasing and that t ∈ [0, τ].
Linear convergence. We first show the existence of τ1* > 0 such that for all ψ ∈ S and τ ∈ (0, τ1*), one has ψτ ∈ S. By definition of ε, for every i ∈ {1, …, N} one has νi ≥ 2ε and Gi(ψ) ≥ ε. Using (4.67) and (4.68), one deduces a lower bound on Gi(ψτ):
If we choose τ1* > 0 small enough so that M ω(τ1* M) ≤ ε, this implies that ψτ ∈ S for all ψ ∈ S and τ ∈ [0, τ1*].
We now prove that there exists τ2* > 0 such that for τ ∈ [0, τ2*], one has ‖G(ψτ) − ν‖ ≤ (1 − τ/2) ‖G(ψ) − ν‖. From Equation (4.67), we have G(ψτ) − ν = (1 − τ)(G(ψ) − ν) + R(τ), and it is therefore sufficient to prove
‖R(τ)‖ ≤ (τ/2) ‖G(ψ) − ν‖.
With the upper bound on R(τ) given in Equation (4.68), combined with the two bounds on ‖v(ψ)‖ of Equation (4.66), this condition will hold provided that τ is such that ω(τM)/a ≤ 1/2.
These two bounds directly imply that the steps τ^{(k)} chosen in Algorithm 4 always satisfy τ^{(k)} ≥ τ* with τ* = ½ min(τ1*, τ2*), so that
‖G(ψ^{(k+1)}) − ν‖ ≤ (1 − τ*/2) ‖G(ψ^{(k)}) − ν‖.
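A damped Newton iteration in the spirit of Algorithm 4 can be sketched on the entropic semi-dual, where G and its Jacobian DG are available in closed form. The backtracking rule below halves τ until the residual decreases by the factor (1 − τ/2); all names, data, and parameters are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, eta = 400, 6, 0.5
Xp = rng.random((n, 2))
mu = np.full(n, 1.0 / n)                 # discretized source measure
Yp = rng.random((N, 2))
nu = np.full(N, 1.0 / N)                 # target masses
C = ((Xp[:, None, :] - Yp[None, :, :]) ** 2).sum(axis=-1)

def G_and_DG(psi):
    """Masses G(psi) of the regularized Laguerre cells and the Jacobian DG(psi)."""
    A = -(C + psi[None, :]) / eta
    A -= A.max(axis=1, keepdims=True)
    P = np.exp(A)
    P /= P.sum(axis=1, keepdims=True)
    DG = (P * mu[:, None]).T @ P / eta
    np.fill_diagonal(DG, 0.0)
    np.fill_diagonal(DG, -DG.sum(axis=1))   # rows sum to zero: DG 1 = 0
    return P.T @ mu, DG

psi = np.zeros(N)
for _ in range(50):
    g, DG = G_and_DG(psi)
    r = g - nu
    nr = np.linalg.norm(r)
    if nr < 1e-11:
        break
    # Newton direction: DG v = -r (solvable since sum(r) = 0 and ker DG = constants)
    v = np.linalg.lstsq(DG, -r, rcond=None)[0]
    tau = 1.0
    while tau > 1e-8 and np.linalg.norm(G_and_DG(psi + tau * v)[0] - nu) > (1 - tau / 2) * nr:
        tau /= 2
    psi += tau * v
```

Because the Jacobian is singular exactly along the constants, the least-squares solve picks the mean-zero Newton direction, and the damping τ guarantees the monotone residual decrease used in the proof above.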
Σ_y ρ_y = ρ  and  ∫ ρ_y = ν_y.  (4.70)
Then, the entropy of γ, with respect to vol^d ⊗ vol⁰, is the sum of the entropies of the ρ_y. This leads to the following definition.
(KP)η = min { ⟨c|γ⟩ + η Σ_{y∈Y} H(ρ_y) | ρ_y ∈ L¹(X), s.t. Σ_{y∈Y} ρ_y = ρ, ∫_X ρ_y = ν_y }.
(DP)η = sup_{(ϕ,ψ)∈L¹(X)×R^Y}  ⟨ϕ|ρ⟩ − ⟨ψ|ν⟩ − η Σ_{y∈Y} ∫_X e^{−(c(x,y)+ψ(y)−ϕ(x))/η} dx.  (4.71)
(DP)η = sup_{ψ∈R^Y} Kη(ψ),  (4.72)
where
Kη(ψ) := −η ∫_X log( Σ_{y∈Y} e^{−(c(x,y)+ψ(y))/η} ) ρ(x) dx − ⟨ψ|ν⟩ + η H(ρ)
and
RLagη_y(ψ) = e^{−(c(·,y)+ψ(y))/η} / ( Σ_{z∈Y} e^{−(c(·,z)+ψ(z))/η} ).  (4.73)
We skip the proof of this theorem, which follows closely that of Theorem 32 in the discrete case.
Strong convergence of Kη to K. The next proposition shows that for twisted costs, Gη converges to G locally uniformly (i.e. Kη converges to K in C¹). Its proof follows closely the proof of the Lipschitz estimate of Proposition 41.
Proposition 53. Assume that c ∈ C²(ΩX × ΩY) is twisted (Def. 8), that X ⊆ ΩX is compact, that Y ⊆ ΩY is finite, and that ρ ∈ P^{ac}(X) ∩ L∞(X). Then:
(i) Gη converges pointwise to G as η → 0, i.e.
∀y ∈ Y,  RLagη_y(ψ) → 1_{Lag_y(ψ)} in L¹(X) as η → 0.
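Point (i) is easy to illustrate numerically in one dimension: the L¹ distance between RLagη_y(ψ) and the indicator of Lag_y(ψ) shrinks as η decreases. The grid, sites, and potentials below are illustrative, with ρ approximated by uniform quadrature weights.

```python
import numpy as np

Xg = np.linspace(0.0, 1.0, 10001)
w = np.full(Xg.size, 1.0 / Xg.size)     # quadrature weights for rho = uniform on [0,1]
Ys = np.array([0.2, 0.5, 0.8])
psi = np.array([0.0, 0.05, -0.02])
C = (Xg[:, None] - Ys[None, :]) ** 2

# hard Laguerre cells: indicator of the argmin of c(x,y) + psi(y)
hard = np.argmin(C + psi[None, :], axis=1)[:, None] == np.arange(Ys.size)[None, :]
errs = []
for eta in [0.1, 0.01, 0.001]:
    A = -(C + psi[None, :]) / eta
    A -= A.max(axis=1, keepdims=True)
    P = np.exp(A)
    P /= P.sum(axis=1, keepdims=True)   # regularized Laguerre cells RLag^eta_y
    errs.append(float((np.abs(P - hard) * w[:, None]).sum()))
```

The error is concentrated in bands of width O(η) around the cell boundaries, which is consistent with the coarea-based estimates used in the proof.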
∂²Kη/∂1_z∂1_y(ψ) = (1/η) ⟨RLagη_y(ψ) RLagη_z(ψ) | ρ⟩.
To get such an upper bound, as in Proposition 41, we will apply the coarea formula using the function
f(x) = c(x, z) + ψ(z) − (c(x, y) + ψ(y)).
We note that
RLagη_y(ψ)(x) RLagη_z(ψ)(x) = e^{−(c(x,y)+ψ(y)+c(x,z)+ψ(z))/η} / ( Σ_{w∈Y} e^{−(c(x,w)+ψ(w))/η} )².
When f(x) ≥ 0, we use the equality c(x, z) + ψ(z) = c(x, y) + ψ(y) + f(x) to obtain the upper bound
e^{−(c(x,y)+ψ(y)+c(x,z)+ψ(z))/η} / ( Σ_{w∈Y} e^{−(c(x,w)+ψ(w))/η} )² = e^{−2(c(x,y)+ψ(y))/η} e^{−f(x)/η} / ( Σ_{w∈Y} e^{−(c(x,w)+ψ(w))/η} )² ≤ e^{−f(x)/η}.
A similar upper bound holds for the diagonal elements, thus ensuring that Gη is L-Lipschitz with L independent of η. Point (iii) follows at once from the pointwise convergence and the uniform Lipschitz estimate.
To finish this section, we show that under the genericity assumption in-
troduced in Section 4.3, the Hessian of Kantorovich’s regularized functional
D2 Kη converges pointwise to D2 K as η converges to 0.
Proof. We let f(x) = c(x, z) + ψ(z) − (c(x, y) + ψ(y)) and ε > 0. From the proof of Proposition 53, one has, for every x ∈ X \ f^{−1}([−ε, ε]),
RLagη_y(ψ)(x) RLagη_z(ψ)(x) ≤ e^{−ε/η}.
This implies that
lim_{η→0} ∫_{X \ f^{−1}([−ε,ε])} RLagη_y(ψ)(x) RLagη_z(ψ)(x) ρ(x) dx = 0,
where
Gη,ε_{yz}(ψ) := (1/η) ∫_{X ∩ f^{−1}([−ε,ε])} RLagη_y(ψ)(x) RLagη_z(ψ)(x) ρ(x) dx.
Remark that
RLagη_y(ψ)(Φ(t, x)) RLagη_z(ψ)(Φ(t, x)) = e^{−(c(Φ(t,x),y)+ψ(y)+c(Φ(t,x),z)+ψ(z))/η} / ( Σ_{w∈Y} e^{−(c(Φ(t,x),w)+ψ(w))/η} )²
= χη(t, x) e^{−|f(Φ(t,x))|/η}
= χη(t, x) e^{−|t|/η},
where we put
χη(t, x) := e^{−(2/η) min(c(Φ(t,x),y)+ψ(y), c(Φ(t,x),z)+ψ(z))} / ( Σ_{w∈Y} e^{−(c(Φ(t,x),w)+ψ(w))/η} )².
For t ≥ 0, one has
χη(t, x) = ( Σ_{w∈Y} e^{−(c(Φ(t,x),w)+ψ(w)−(c(Φ(t,x),y)+ψ(y)))/η} )^{−2} = (1 + e^{−t/η} + rη(t, x))^{−2}
with
rη(t, x) = Σ_{w∈Y\{y,z}} e^{−(c(Φ(t,x),w)+ψ(w)−(c(Φ(t,x),y)+ψ(y)))/η}.
Now, by assumption on the point x, for any w ∉ {y, z} one has c(x, y) + ψ(y) < c(x, w) + ψ(w), so that rη(t, x) is negligible. A similar computation can be done for t ≤ 0, giving us the estimate
gη(x) ∼_{η→0} ∫_{−ε}^{ε} ( e^{−|t|/η} / ( η (1 + e^{−|t|/η})² ) ) ρ(Φ(t, x)) JΦ(t, x) dt → ρ(x) JΦ(0, x) as η → 0.
On the other hand, one can show that for almost every x in M but not in Lag_{yz}(ψ), |χη(t, x)| tends to zero when η goes to zero, thus implying that the sequence (gη(x)) also converges to 0. In other words, for almost every x,
gη(x) → ρ(x) JΦ(0, x) 1_{Lag_{yz}(ψ)}(x) = ( ρ(x)/‖∇f(x)‖ ) 1_{Lag_{yz}(ψ)}(x)  as η → 0.
From the relation ‖∇f(x)‖ = ‖∇x c(x, y) − ∇x c(x, z)‖ we get, as desired,
5. Appendix
5.1. Convex analysis. We recall a few relevant definitions and facts from
convex analysis (adapted to concave functions).
∀x ∈ R^N, ∂⁺F(x) ≠ ∅.
In particular, for n = 1, one has JΦ(x) = ‖∇Φ₁(x)‖, and for n = 2 one gets
References
1. Pankaj K Agarwal and R Sharathkumar, Approximation algorithms for bipartite
matching with metric and geometric costs, Proceedings of the forty-sixth annual ACM
symposium on Theory of computing, ACM, 2014, pp. 555–564.
2. Martial Agueh and Guillaume Carlier, Barycenters in the wasserstein space, SIAM
Journal on Mathematical Analysis 43 (2011), no. 2, 904–924.
3. Jason Altschuler, Jonathan Weed, and Philippe Rigollet, Near-linear time approxi-
mation algorithms for optimal transport via sinkhorn iteration, Advances in Neural
Information Processing Systems, 2017, pp. 1964–1974.
4. Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré, Gradient flows: in metric spaces
and in the space of probability measures, Springer Science & Business Media, 2008.
5. Luigi Ambrosio, Federico Glaudo, and Dario Trevisan, On the optimal map in the
2-dimensional random matching problem, arXiv preprint arXiv:1903.12153, 2019.
6. Franz Aurenhammer, Friedrich Hoffmann, and Boris Aronov, Minkowski-type theorems
and least-squares clustering, Algorithmica 20 (1998), no. 1, 61–76.
7. J-D Benamou, Yann Brenier, and Kevin Guittet, The monge–kantorovitch mass trans-
fer and its computational fluid mechanics formulation, International Journal for Nu-
merical methods in fluids 40 (2002), no. 1-2, 21–30.
8. Jean-David Benamou and Yann Brenier, Mixed L2 -Wasserstein optimal mapping be-
tween prescribed density functions, Journal of Optimization Theory and Applications
111 (2001), no. 2, 255–271.
9. Jean-David Benamou and Guillaume Carlier, Augmented lagrangian methods for
transport optimization, mean field games and degenerate elliptic equations, Journal
of Optimization Theory and Applications 167 (2015), no. 1, 1–26.
10. Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel
Peyré, Iterative bregman projections for regularized transportation problems, SIAM
Journal on Scientific Computing 37 (2015), no. 2, A1111–A1138.
11. Jean-David Benamou, Guillaume Carlier, and Luca Nenna, A numerical method to
solve multi-marginal optimal transport problems with coulomb cost, Splitting Methods
in Communication, Imaging, Science, and Engineering, Springer, 2016, pp. 577–601.
12. Jean-David Benamou and Vincent Duval, Minimal convex extensions and finite dif-
ference discretisation of the quadratic monge–kantorovich problem, European Journal
of Applied Mathematics 30 (2019), no. 6, 1041–1078.
13. Jean-David Benamou, Brittany D Froese, and Adam M Oberman, Numerical solution
of the optimal transportation problem using the monge–ampère equation, Journal of
Computational Physics 260 (2014), 107–126.
14. Robert J. Berman, The Sinkhorn algorithm, parabolic optimal transport and geometric
Monge-Ampère equations, arXiv preprint arXiv:1712.03082, 2017.
15. Robert J Berman, Convergence rates for discretized monge-ampère equations and quantitative stability of optimal transport, arXiv preprint arXiv:1803.00785, 2018.
16. D.P. Bertsekas, A new algorithm for the assignment problem, Mathematical Program-
ming 21 (1981), no. 1, 152–171.
17. D.P. Bertsekas and J. Eckstein, Dual coordinate step methods for linear network flow
problems, Mathematical Programming 42 (1988), no. 1, 203–243.
18. Garrett Birkhoff, Tres observaciones sobre el algebra lineal, Univ. Nac. Tucuman, Ser.
A 5 (1946), 147–154.
19. Yann Brenier, Polar factorization and monotone rearrangement of vector-valued func-
tions, Communications on pure and applied mathematics 44 (1991), no. 4, 375–417.
20. , Minimal geodesics on groups of volume-preserving maps and generalized so-
lutions of the euler equations, Communications on Pure and Applied Mathematics: A
Journal Issued by the Courant Institute of Mathematical Sciences 52 (1999), no. 4,
411–452.
21. R.E. Burkard, M. Dell’Amico, and S. Martello, Assignment problems, Society for
Industrial Mathematics, 2009.
22. Giuseppe Buttazzo, Luigi De Pascale, and Paola Gori-Giorgi, Optimal-transport for-
mulation of electronic density-functional theory, Physical Review A 85 (2012), no. 6,
062502.
23. Giuseppe Buttazzo, Chloé Jimenez, and Edouard Oudet, An optimization problem for
mass transportation with congested dynamics, SIAM Journal on Control and Opti-
mization 48 (2009), no. 3, 1961–1976.
24. LA Caffarelli and VI Oliker, Weak solutions of one inverse problem in geometric optics,
Journal of Mathematical Sciences 154 (2008), no. 1, 39–49.
25. Luis Caffarelli and Robert J McCann, Free boundaries in optimal transport and monge-
ampere obstacle problems, Annals of mathematics 171 (2010), no. 2, 673–730.
26. Luis A Caffarelli, Sergey A Kochengin, and Vladimir I Oliker, Problem of reflector
design with given far-field scattering data, Monge Ampère Equation: Applications to
Geometry and Optimization: NSF-CBMS Conference on the Monge Ampère Equa-
tion, Applications to Geometry and Optimization, July 9-13, 1997, Florida Atlantic
University, vol. 226, American Mathematical Soc., 1999, p. 13.
27. Guillaume Carlier, Victor Chernozhukov, Alfred Galichon, et al., Vector quantile re-
gression: an optimal transport approach, The Annals of Statistics 44 (2016), no. 3,
1165–1192.
28. Jose A Carrillo, Katy Craig, Li Wang, and Chaozhen Wei, Primal dual methods for
wasserstein gradient flows, arXiv preprint arXiv:1901.08081, 2019.
29. Benjamin Charlier, Jean Feydy, Joan Alexis Glaunes, and Alain Trouvé, An efficient
kernel product for automatic differentiation libraries, with applications to measure
transport, Working version, 2017.
30. Victor Chernozhukov, Alfred Galichon, Marc Hallin, Marc Henry, et al., Monge–
kantorovich depth, quantiles, ranks and signs, The Annals of Statistics 45 (2017),
no. 1, 223–256.
31. Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard, An
interpolating distance between optimal transport and fisher–rao metrics, Foundations
of Computational Mathematics 18 (2018), no. 1, 1–44.
32. Codina Cotar, Gero Friesecke, and Claudia Klüppelberg, Density functional theory
and optimal transportation with coulomb cost, Communications on Pure and Applied
Mathematics 66 (2013), no. 4, 548–599.
33. Keenan Crane, Clarisse Weischedel, and Max Wardetzky, Geodesics in heat: A new
approach to computing distance based on heat flow, ACM Transactions on Graphics
(TOG) 32 (2013), no. 5, 152.
34. Michael JP Cullen and R James Purser, An extended lagrangian theory of semi-
geostrophic frontogenesis, Journal of the atmospheric sciences 41 (1984), no. 9, 1477–
1497.
35. Marco Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Ad-
vances in neural information processing systems, 2013, pp. 2292–2300.
36. Marco Cuturi and Gabriel Peyré, Semidual regularized optimal transport, SIAM Re-
view 60 (2018), no. 4, 941–965.
37. Pedro Machado Manhães de Castro, Quentin Mérigot, and Boris Thibert, Far-field
reflector problem and intersection of paraboloids, Numerische Mathematik 134 (2016),
no. 2, 389–411.
38. Fernando De Goes, Katherine Breeden, Victor Ostromoukhov, and Mathieu Des-
brun, Blue noise through optimal transport, ACM Transactions on Graphics (TOG)
31 (2012), no. 6, 171.
39. Fernando de Goes, Corentin Wallez, Jin Huang, Dmitry Pavlov, and Mathieu Desbrun,
Power particles: an incompressible fluid solver based on power diagrams., ACM Trans.
Graph. 34 (2015), no. 4, 50–1.
40. Frédéric De Gournay, Jonas Kahn, and Léo Lebrat, 3/4-discrete optimal transport,
arXiv preprint arXiv:1806.09537, 2018.
41. , Differentiation and regularity of semi-discrete optimal transport with respect
to the parameters of the discrete measure, Numerische Mathematik 141 (2019), no. 2,
429–453.
42. Roberto De Leo, Cristian E Gutiérrez, and Henok Mawi, On the numerical solution
of the far field refractor problem, Nonlinear Analysis 157 (2017), 123–145.
43. J. Edmonds and R.M. Karp, Theoretical improvements in algorithmic efficiency for
network flow problems, Journal of the ACM (JACM) 19 (1972), no. 2, 248–264.
44. Matthias Erbar, Martin Rumpf, Bernhard Schmitzer, and Stefan Simon, Com-
putation of optimal transport on discrete metric measure spaces, arXiv preprint
arXiv:1707.06859, 2017.
45. Jean Feydy, Pierre Roussillon, Alain Trouvé, and Pietro Gori, Fast and scalable opti-
mal transport for brain tractograms, International Conference on Medical Image Com-
puting and Computer-Assisted Intervention, Springer, 2019, pp. 636–644.
46. Brittany D Froese, A numerical method for the elliptic Monge–Ampère equation with
transport boundary conditions, SIAM Journal on Scientific Computing 34 (2012),
no. 3, A1432–A1459.
47. H.N. Gabow and R.E. Tarjan, Faster scaling algorithms for network problems, SIAM
Journal on Computing 18 (1989), 1013.
48. Bruno Galerne, Arthur Leclaire, and Julien Rabin, A texture synthesis model based
on semi-discrete optimal transport in patch space, SIAM Journal on Imaging Sciences
11 (2018), no. 4, 2456–2493.
49. Alfred Galichon, Optimal transport methods in economics, Princeton University Press,
2018.
50. Alfred Galichon and Bernard Salanié, Matching with trade-offs: Revealed preferences
over competing characteristics, Tech. report, CEPR Discussion Papers, 2010.
51. Wilfrid Gangbo and Robert J McCann, The geometry of optimal transportation, Acta
Mathematica 177 (1996), no. 2, 113–161.
52. Aude Genevay, Marco Cuturi, Gabriel Peyré, and Francis Bach, Stochastic optimiza-
tion for large-scale optimal transport, Advances in neural information processing sys-
tems, 2016, pp. 3440–3448.
53. Nicola Gigli, On hölder continuity-in-time of the optimal transport map towards mea-
sures along a curve, Proceedings of the Edinburgh Mathematical Society 54 (2011),
no. 2, 401–409.
54. A.V. Goldberg, Efficient graph algorithms for sequential and parallel computers, Ph.D.
thesis, Massachussetts Institute of Technology, 1987.
55. Xianfeng Gu, Feng Luo, Jian Sun, and Shing-Tung Yau, Variational principles for
minkowski type problems, discrete optimal transport, and discrete monge–ampère equa-
tions, Asian Journal of Mathematics 20 (2016), no. 2, 383–398.
56. Nestor Guillen, A primer on generated jacobian equations: Geometry, optics, econom-
ics, Notices of the American Mathematical Society 66 (2019), no. 9.
57. Cristian E Gutiérrez and Haim Brezis, The monge-ampere equation, vol. 44, Springer,
2001.
58. Valentin Hartmann and Dominic Schuhmacher, Semi-discrete optimal transport – the
case p = 1, arXiv preprint arXiv:1706.07650, 2017.
59. Morgane Henry, Emmanuel Maitre, and Valérie Perrier, Optimal transport using
helmholtz-hodge decomposition and first-order primal-dual algorithms, 2015 IEEE In-
ternational Conference on Image Processing (ICIP), IEEE, 2015, pp. 4748–4752.
60. Romain Hug, Analyse mathématique et convergence d’un algorithme pour le trans-
port optimal dynamique: cas des plans de transports non réguliers, ou soumis à des
contraintes, Thèse de doctorat de l’Université Grenoble-Alpes, 2016.
61. Jan-Christian Hütter and Philippe Rigollet, Minimax rates of estimation for smooth
optimal transport maps, arXiv preprint arXiv:1905.05828, 2019.
62. Richard Jordan, David Kinderlehrer, and Felix Otto, The variational formulation of
the fokker–planck equation, SIAM journal on mathematical analysis 29 (1998), no. 1,
1–17.
63. Michael Kerber, Dmitriy Morozov, and Arnur Nigmetov, Geometry helps to compare
persistence diagrams, Journal of Experimental Algorithmics (JEA) 22 (2017), 1–4.
64. Jun Kitagawa, An iterative scheme for solving the optimal transportation problem,
Calculus of Variations and Partial Differential Equations 51 (2014), no. 1-2, 243–263.
65. Jun Kitagawa, Quentin Mérigot, and Boris Thibert, Convergence of a newton al-
gorithm for semi-discrete optimal transport, Journal of the European Mathematical
Society (2019), OnlineFirst.
66. Stanislav Kondratyev, Léonard Monsaingeon, Dmitry Vorotnikov, et al., A new opti-
mal transport distance on the space of finite radon measures, Advances in Differential
Equations 21 (2016), no. 11/12, 1117–1164.
67. Hugo Lavenant, Unconditional convergence for discretizations of dynamical optimal
transport, arXiv preprint arXiv:1909.08790, 2019.
68. Bruno Lévy, A numerical algorithm for l2 semi-discrete optimal transport in 3d,
ESAIM: Mathematical Modelling and Numerical Analysis 49 (2015), no. 6, 1693–
1715.
69. Damiano Lombardi and Emmanuel Maitre, Eulerian models and algorithms for un-
balanced optimal transport, ESAIM: Mathematical Modelling and Numerical Analysis
49 (2015), no. 6, 1717–1744.
70. Quentin Mérigot, A multiscale approach to optimal transport, Computer Graphics
Forum 30 (2011), no. 5, 1583–1592.
71. Quentin Mérigot, Alex Delalande, and Frédéric Chazal, Quantitative stability of op-
timal transport maps and linearization of the 2-wasserstein space, arXiv preprint
arXiv:1910.05954 (2019).
72. Quentin Mérigot, Jocelyn Meyron, and Boris Thibert, An algorithm for optimal trans-
port between a simplex soup and a point cloud, SIAM Journal on Imaging Sciences 11
(2018), no. 2, 1363–1389.
73. Quentin Mérigot and Jean-Marie Mirebeau, Minimal geodesics along volume-
preserving maps, through semidiscrete optimal transport, SIAM Journal on Numerical
Analysis 54 (2016), no. 6, 3465–3492.
74. Jocelyn Meyron, Quentin Mérigot, and Boris Thibert, Light in power: a general and
parameter-free algorithm for caustic design, ACM Transactions on Graphics (TOG)
37 (2019), no. 6, 224.
75. Jean-Marie Mirebeau, Discretization of the 3d Monge-Ampère operator, between wide
stencils and power diagrams, ESAIM: Mathematical Modelling and Numerical Anal-
ysis 49 (2015), no. 5, 1511–1523.
76. Gaspard Monge, Mémoire sur la théorie des déblais et des remblais, 1781.
77. Michael Neilan, Abner J Salgado, and Wujun Zhang, The Monge–Ampère equation,
arXiv preprint arXiv:1901.05108, 2019.
78. Adam M Oberman and Yuanlong Ruan, An efficient linear programming method for
optimal transportation, arXiv preprint arXiv:1509.03668, 2015.
79. VI Oliker and LD Prussner, On the numerical solution of the equation
(∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² = f and its discretizations, I, Numerische
Mathematik 54 (1989), no. 3, 271–293.
80. Vladimir Oliker, Mathematical aspects of design of beam shaping surfaces in geomet-
rical optics, Trends in Nonlinear Analysis, Springer, 2003, pp. 193–224.
81. Nicolas Papadakis, Gabriel Peyré, and Edouard Oudet, Optimal transport with prox-
imal splitting, SIAM Journal on Imaging Sciences 7 (2014), no. 1, 212–238.
82. Brendan Pass, Multi-marginal optimal transport: theory and applications, ESAIM:
Mathematical Modelling and Numerical Analysis 49 (2015), no. 6, 1771–1790.
83. Gabriel Peyré and Marco Cuturi, Computational optimal transport, Foundations and
Trends® in Machine Learning 11 (2019), no. 5-6, 355–607.
84. Svetlozar T Rachev and Ludger Rüschendorf, Mass transportation problems: Volume
I: Theory, vol. 1, Springer Science & Business Media, 1998.
85. ———, Mass transportation problems: Applications, Springer Science & Business
Media, 2006.
86. R Tyrrell Rockafellar, Convex analysis, vol. 28, Princeton University Press, 1970.
87. Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas, The earth mover's distance as a
metric for image retrieval, International Journal of Computer Vision 40 (2000), no. 2,
99–121.
88. Filippo Santambrogio, Optimal transport for applied mathematicians, Springer, 2015.
89. Bernhard Schmitzer, A sparse multiscale algorithm for dense optimal transport, Jour-
nal of Mathematical Imaging and Vision 56 (2016), no. 2, 238–259.
90. ———, Stabilized sparse scaling algorithms for entropy regularized transport problems,
SIAM Journal on Scientific Computing 41 (2019), no. 3, A1443–A1481.
91. Richard Sinkhorn, A relationship between arbitrary positive matrices and doubly sto-
chastic matrices, The Annals of Mathematical Statistics 35 (1964), no. 2, 876–879.
92. Richard Sinkhorn and Paul Knopp, Concerning nonnegative matrices and doubly sto-
chastic matrices, Pacific Journal of Mathematics 21 (1967), no. 2, 343–348.
93. Justin Solomon, Fernando De Goes, Gabriel Peyré, Marco Cuturi, Adrian Butscher,
Andy Nguyen, Tao Du, and Leonidas Guibas, Convolutional Wasserstein distances:
Efficient optimal transportation on geometric domains, ACM Transactions on Graph-
ics (TOG) 34 (2015), no. 4, 66.
94. Justin Solomon, Raif Rustamov, Leonidas Guibas, and Adrian Butscher, Earth
mover’s distances on discrete surfaces, ACM Transactions on Graphics (TOG) 33
(2014), no. 4, 67.
95. Neil S Trudinger, On the local theory of prescribed Jacobian equations, Discrete &
Continuous Dynamical Systems-A 34 (2014), no. 4, 1663–1681.
96. François-Xavier Vialard, An elementary introduction to entropic regularization and
proximal methods for numerical optimal transport, lecture notes, May 2019.
97. Cédric Villani, Topics in optimal transportation, no. 58, American Mathematical Soc.,
2003.
98. ———, Optimal transport: old and new, vol. 338, Springer Science & Business Media,
2008.
99. Xu-Jia Wang, On the design of a reflector antenna ii, Calculus of Variations and
Partial Differential Equations 20 (2004), no. 3, 329–341.