Professional Documents
Culture Documents
Survey Paper
a r t i c l e i n f o a b s t r a c t
Article history: This article gives an introduction to optimal transport, a mathematical theory that makes it possible to
Received 6 October 2017 measure distances between functions (or distances between more general objects), to interpolate between
Revised 12 January 2018
objects or to enforce mass/volume conservation in certain computational physics simulations. Optimal
Accepted 16 January 2018
transport is a rich scientific domain, with active research communities, both on its theoretical aspects and
Available online xxx
on more applicative considerations, such as geometry processing and machine learning. This article aims
Keywords: at explaining the main principles behind the theory of optimal transport, introduce the different involved
Mathematics notions, and more importantly, how they relate, to let the reader grasp an intuition of the elegant theory
Physics that structures them. Then we will consider a specific setting, called semi-discrete, where a continuous
Optimal transport function is transported to a discrete sum of Dirac masses. Studying this specific setting naturally leads
Numerical optimization to an efficient computational algorithm, that uses classical notions of computational geometry, such as a
generalization of Voronoi diagrams called Laguerre diagrams.
© 2018 Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.cag.2018.01.009
0097-8493/© 2018 Elsevier Ltd. All rights reserved.
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
Fig. 1. Comparing functions: one would like to say that f1 is nearer to f2 than f3 ,
but the classical L2 norm “does not see” that the graph of f2 corresponds to the
graph of f1 slightly shifted along the x axis. Fig. 3. Given two terrains defined by their height functions u and v, symbolized
here as gray levels, Monge’s problem consists in transforming one terrain into the
other one by moving matter through an application T. This application needs to
satisfy a mass conservation constraint.
80 2.1. Intuition
2.2. Monge’s problem with measures 108
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
124 From now on, we will use measures μ and ν to represent the
125 “current landscape” and the “target landscape”. These measures
of solutions for problem (M). The problem is not symmetric be- 171
126 are supported by sets X and Y, that may be different sets (in the
cause T needs to be a map, therefore the source and target land- 172
127 present example, X is a subset of R2 and Y is a discrete set of
scape do not play the same role. Thus, it is possible to merge earth 173
128 points). Using measures instead of function not only makes it pos-
(if T (x1 ) = T (x2 ) for two different points x1 and x2 ), but it is not 174
129 sible to study our “transport to discrete set of factories” problem,
possible to split earth (for that, we would need a “map” T that 175
130 but also it can be used to formalize computer objects (meshes) and
could send the same point x to two different points y1 and y2 ). 176
131 directly leads to a computational algorithm. This algorithm is very
The problem is illustrated in Fig. 5: suppose that you want to com- 177
132 elegant because it is a verbatim computer translation of the math-
pute the optimal transport between a segment L1 (that symbol- 178
133 ematical theory (see Section 7.6). In this particular setting, trans-
izes a “wall of earth”) and two parallel segments L2 and L3 (that 179
134 lating from the mathematical language to the algorithmic setting
symbolize two “trenches” with a depth that correspond to half the 180
135 does not require to make any approximation. This is made possi-
height of the wall of earth). Now we want to transport the wall 181
136 ble by the generality of the notion of measure.
of earth to the trenches, to make the landscape flat. To do so, it 182
137 The reader who wishes to learn more on measure theory may
is possible to decompose L1 into segments of length h, sent alter- 183
138 refer to the textbook [16]. To keep the length of this article rea-
natively towards L2 and L3 (Fig. 5 on the left). For any length h, 184
139 sonable, we will not give here the formal definition of a measure.
it is always possible to find a better map T, that is a lower value 185
140 In our context, one can think of a measure as a “function” that can
of the functional in (M), by subdividing L1 into smaller segments 186
141 be only queried using integrals and that can be “concentrated” on
(Fig. 5 on the right). The best way to proceed consists in sending 187
142 very small sets (points). The following table can be used to intu-
from each point of L1 half the earth to L2 and half the earth to L3 , 188
143 itively translate from the “language of functions” to the “language
which cannot be represented by a map. Thus, the best solution of 189
144 of measures” :
problem (M) is not a map. In a more general setting, this problem 190
Function u Measure μ appears each time the source measure μ has mass concentrated 191
μ(B ) or B dμ
on a manifold of dimension d − 1 [17] (like the segment L1 in the 192
B u (x )dx
present example). 193
B f (x )u (x )dx B f ( x )d μ
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
Fig. 6. Four examples of transport plans in 1D. (A) a segment is translated. (B) a segment is split into two segments. (C) a Dirac mass is split into two Dirac masses; (D) a
Dirac mass is spread along two segments. The first two examples (A and B) have the form (Id × T)μ where T is a transport map. The third and fourth ones (C and D) have
no corresponding transport map, because each of them splits a Dirac mass.
216 optimal transport plans (μ, ν ). Recalling the definition of the (Id × T)μ because it splits the mass concentrated in L1 into L2 and 250
217 pushforward (previous subsection), these two constraints can also L3 . 251
218 be written as: Now the theoretical questions are: 252
(PX )γ = μ ⇐⇒ ∀B ⊂ X, dμ = dγ • when does an optimal transport plan exist ? 253
B
B×Y • when does it admit an associated optimal transport map ? 254
(PY )γ = ν ⇐⇒ ∀B ⊂ Y, dν = dγ . (1) A standard approach to tackle this type of existence problem is 255
B X×B
to find a certain regularity both in the functional and in the space 256
219 Intuitively, the first constraint (PX )γ = μ means that everything of the admissible transport plans, that is, proving that the func- 257
220 that comes from a subset B of X should correspond to the amount tional is sufficiently “smooth” and finding a compact set of admis- 258
221 of matter (initially) contained by B in the source landscape, and sible transport plans. Since the set of admissible transport plans 259
222 the second one (PY )γ = ν means that everything that goes into contains at least the product measure μν , it is non-empty, and 260
223 a subset B of Y should correspond to the (prescribed) amount of existence can be proven thanks to a topological argument that ex- 261
224 matter contained by B in the target landscape ν . ploits the regularity of the functional and the compactness of the 262
225 We now examine the relation between the relaxed problem set. Once the existence of a transport plan is established, the other 263
226 (K) and the initial problem (M). One can easily check that among question concerns the existence of an associated transport map. 264
227 the optimal transport plans, those with the form (Id × T)μ corre- Unfortunately, problem (K) does not directly show the structure 265
228 spond to a transport map T: required by this reasoning path. However, one can observe that 266
(K) is a linear optimization problem subject to linear constraints. 267
229 Observation 1. If (Id × T)μ is a transport plan, then T pushes μ
This suggests using certain tools, such as the dual formulation, that 268
230 onto ν .
was also developed by Kantorovich. With this dual formulation, it 269
231 Proof. (Id × T)μ is in (μ, ν ), thus (PY )(Id × T )μ = ν, or is possible to exhibit an interesting structure of the problem, that 270
232 ( (PY ) ◦ (Id × T ) )μ = ν, and finally T μ = ν . can be used to answer both questions (existence of a transport 271
plan, and existence of an associated transport map). 272
233 We can now observe that if a transport plan γ has the form
234 γ = (Id × T )μ, then problem (K) becomes: 4. Kantorovich’s dual problem 273
Kantorovich’s duality applies to problem (K), in its most general 274
min c (x, y )d ( (Id × T )μ ) = min c (x, T (x ))dμ
form (with measures). To facilitate understanding, we will consider 275
X×Y X
instead a discrete version of problem (K), where the involved enti- 276
235 (one retrieves the initial Monge problem). ties are vectors and matrices (instead of measures and operators). 277
236 To further help grasping an intuition of this notion of trans- This makes it easy to better discern the structure of (K), that is the 278
237 port plan, we show four 1D examples in Fig. 6 (the transport plan linear nature of both the functional and the constraints. This also 279
238 is then 1D × 1D = 2D). Intuitively, the transport plan γ may be makes it easier for the reader to understand how to construct the 280
239 thought of as a “table” indexed by x and y that indicates the quan- dual by manipulating simpler objects (matrices and vectors), how- 281
240 tity of matter transported from x to y. More exactly3 , the measure ever the structure of the reasoning is the same in the general case. 282
241 γ is non-zero on subsets of X × Y that contain points (x, y) such
242 that some matter is transported from x to y. Whenever γ derives 4.1. The discrete Kantorovich problem 283
243 from a transport map T, that is if γ has the form (Id × T)μ, then
244 we can consider γ as the “graph of T” like in the first two exam- In Fig. 7, we show a discrete version of the 1D transport be- 284
245 ples of Fig. 6 (A) and (B)4 in Fig. 6. tween two segments of Fig. 6. The measures μ and ν are replaced 285
246 The transport plans of the two other examples (C) and (D) have with vectors U = (ui )i=1...m and V = (v j )i=1...n . The transport plan 286
247 no associated transport map, because they split Dirac masses. The γ becomes a set of coefficients γ ij . Each coefficient γ ij indicates 287
248 transport plan associated with Fig. 5 has the same nature (but the quantity of matter that will be transported from ui to v j . The 288
249 this time in 2D × 2D = 4D). It cannot be written with the form discrete Kantorovich problem can be written as follows: 289
Px γ = U
3
We recall that one cannot evaluate a measure γ at a point (x, y), we can just
minC, γ subject to Py γ = V (2)
γ
compute integrals with γ . γi, j ≥ 0 ∀i, j
4
Note that the measure μ is supposed to be absolutely continuous with respect
to the Lebesgue measure. This is required, because for instance in example (B) of
where γ is the vector of Rm×n with all coefficients γ ij (that is, 290
Fig. 6, the transport map T is undefined at the center of the segment. The absolute the matrix γ ij “unrolled” into a vector), and C the vector of Rm×n 291
continuity requirement allows one to remove from X any subset with zero measure. with the coefficients cij indicating the transport cost between point 292
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
293 i and point j (for instance, the Euclidean cost). The objective func- 326
294 tion is simply the dot product, denoted by C, γ , of the cost vector
= sup [ϕ , U + ψ , V ]. (6)
295 C and the vector γ . The objective function is linear in γ . The con-
ϕ, ψ
296 straints on the marginals (1) impose in this discrete version that
Px t ϕ + Py t ψ ≤ C
297 the sums of the γ ij coefficients over the columns correspond to the
298 ui coefficients (Fig. 7B) and the sums over the rows correspond to The first step (4) consists in exchanging the “inf” and “sup”. 327
299 the v j coefficients (Fig. 7C). Intuitively, everything that leaves point This exchange is possible thanks to a generalization of the clas- 328
300 i should correspond to ui , that is the quantity of matter initially sical minimax theorem due to von Neumann (see, for example 329
301 present in i in the source landscape, and everything that arrives [18, Theorem 2.7]). Then we rearrange the terms (5). By reinter- 330
302 at a point j should correspond to v j , that is the quantity of mat- preting this equation as a constrained optimization problem (sim- 331
303 ter desired at j in the target landscape. As one can easily notice, ilarly to what we did in the previous paragraph), we finally ob- 332
304 in this form, both constraints are linear in γ . They can be written tain the constrained optimization problem in (6). In the constraint 333
305 with two matrices Px and Py , of dimensions m × mn and n × mn, Px t ϕ + Py t ψ ≤ C, the inequality is to be considered component- 334
306 respectively. wise. Finally, the problem (6) can be rewritten as: 335
sup [ϕ , U + ψ , V ]
307 4.2. Constructing the Kantorovich dual in the discrete setting ϕ ,ψ
subject to ϕi + ψ j ≤ ci j , ∀i, j. (7)
308 We introduce, arbitrarily for now, the following function L de-
309 fined by: As compared to the primal problem (2) that depends on m × n 336
variables (all the coefficients γ ij of the optimal transport plan for 337
L(ϕ , ψ ) = C, γ − ϕ , Px γ − U − ψ , Py γ − V
all couples of points (i, j)), this dual problem depends on m + n 338
310 that takes as arguments two vectors, ϕ in Rm and ψ in Rn . The variables (the components ϕ i and ψ j attached to the source points 339
311 function L is constructed from the objective function C, γ from and target points). We will see later how to further reduce the 340
312 which we subtracted the dot products of ϕ and ψ with the vectors number of variables, but before then, we go back to the general 341
313 that correspond to the degree of violation of the constraints. One continuous setting (that is, with functions, measures and opera- 342
314 can observe that: tors). 343
Py γ = V The classical image that gives an intuitive meaning to this dual 348
problem is to consider that instead of transporting earth by our- 349
319 There is equality, because to minimize sup[L(ϕ , ψ )], γ has no selves, we are now hiring a company that will do the work on 350
320 other choice than satisfying the constraints (see the previous ob- our behalf. The company has a special way of determining the 351
321 servation). Thus, we obtain a new expression (left-hand side) of price: the function ϕ (x) corresponds to what they charge for load- 352
322 the discrete Kantorovich problem (right-hand side). We now fur- ing earth at x, and ψ (y) corresponds to what they charge for un- 353
323 ther examine it, and replace L by its expression: loading earth at y. The company aims at maximizing its profit (this 354
is why the dual problem is a “sup” rather than an “inf)”, but it can- 355
inf sup
C, γ −ϕ , Px γ − U
(3) not charge more than what it would cost us if we were doing the 356
γ ≥0 ϕ ,ψ −ψ , Py γ − V work by ourselves (hence the constraint). 357
324 The existence of solutions for (DK) remains difficult to study, 358
C, γ −ϕ , Px γ − U because the set of functions ϕ , ψ that satisfy the constraint is not 359
= sup inf (4)
ϕ ,ψ γ ≥0 −ψ , Py γ − V
325 5
The functions ϕ and ψ need to be taken in L1 (μ) and L1 (ν ). The proof of the
equivalence with problem (K) requires more precautions than in the discrete case,
= sup inf
γ , C − Px ϕ − Py ψ+
t t
(5) in particular step (4) (exchanging sup and inf), that uses a result of convex analysis
ϕ ,ψ γ ≥0 ϕ , U + ψ , V (due to Rockafellar), see [12] chapter 5.
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
360 compact. However, it is possible to reveal more structure of the when it exists, the map T from X to Y which associated transport 398
361 problem, by introducing the notion of c-transform, that makes it plan (Id × T)μ minimizes the functional of the Monge problem. A 399
362 possible to exhibit a set of admissible functions with sufficient reg- result characterizes the support of γ , that is the subset ∂ c ϕ ⊂ X × Y 400
363 ularity: of the pairs of points (x, y) connected by the transport plan: 401
364 Definition 1. Theorem 1. Let ϕ a c-concave function. For all (x, y) ∈ ∂ c ϕ , we have 402
365 • For f any function on Y with values in R ∪ {−∞} and not ∇ϕ (x ) − ∇x c(x, y ) = 0,
366 identically −∞, we define its c-transform by where ∂ c ϕ = {(x, y ) | ϕ (z ) ≤ ϕ (x ) + (c (z, y ) − c (x, y )), ∀z ∈ X }7 de- 403
f (x ) = inf [c (x, y ) − f (y )],
c
x ∈ X. notes the so-called c -superdifferential of ϕ . 404
y∈Y
Proof. See [12] chapters 9 and 10. 405
367 • If a function ϕ is such that there exists a function f such that
368 ϕ = f c , then ϕ is said to be c-concave; In order to give an idea of the relation between the c- 406
369 • c (X) denotes the set of c-concave functions on X. superdifferential and the associated transport map T, we present 407
below a heuristic argument: consider a point (x, y) in the c- 408
370 We now show two properties of (DK) that will allow us to re- superdifferential ∂ c ϕ , then for all z ∈ X we have 409
371 strict the problem to the class of c-concave functions to search for
372 ϕ and ψ : c (x, y ) − ϕ (x ) ≤ c (z, y ) − ϕ (z ). (9)
Now, by using (9), we can compute the derivative at x with re- 410
373 Observation 2. If the pair (ϕ , ψ ) is admissible for (DK), then the
spect to an arbitrary direction w 411
374 pair (ψ c , ψ ) is admissible as well.
ϕ (x + tw ) − ϕ (x ) c (x + tw, y ) − c (x, y )
375 Proof. lim ≤ lim+
t→0+ t t→0 t
∀(x, y ) ∈ X × Y, ϕ (x ) + ψ (y ) ≤ c(x, y ) and we obtain ∇ϕ (x ) · w ≤ ∇x c (x, y ) · w. We can do the same 412
ψ c (x ) = inf [c(x, y ) − ψ (y )] derivation along direction −w instead of w, and then we get 413
y∈Y
∇ϕ (x ) · w = ∇x c(x, y ) · w, ∀w ∈ X.
ψ c (x ) + ψ (y ) = inf c (x, y ) − ψ (y ) + ψ (y ) 414
y ∈Y In the particular case of the L2 cost, that is with c (x, y ) = 415
≤ (c(x, y ) − ψ (y )) + ψ (y ) 1/2x − y2 , this relation becomes ∀(x, y ) ∈ ∂ c ϕ , ∇ϕ (x ) + y − x = 416
≤ c (x, y ). 0, thus, when the optimal transport map T exists, it is given by 417
376 T (x ) = x − ∇ϕ (x ) = ∇ (x /2 − ϕ (x )). 2
377 Observation 3. If the pair (ϕ , ψ ) is admissible for (DK), then one Not only this gives an expression of T in function of ϕ , which is of 418
378 obtains a better pair by replacing ϕ with ψ c : high interest to us if we want to compute the transport explicitly. 419
In addition, this makes it possible to characterize T as the gradient 420
379 Proof.
of a convex function (see also Brenier’s polar factorization theorem 421
ψ (x ) =
c
inf [c (x, y ) − ψ (y )] [19]). This convexity property is interesting, because it means that 422
y∈Y ⇒ ψ ( x ) ≤ ϕ ( x ).
c
∀y ∈ Y, ϕ (x ) ≤ c (x, y ) − ψ (y ) two “transported particles” x1 →T(x1 ) et x2 →T(x2 ) will never col- 423
lide. We now see how to prove these two assertions (T gradient of 424
380 a convex function and absence of collision) in the case of the L2 425
381 In terms of the previous intuitive image, this means that by re- transport (with (c (x, y ) = 1/2x − y2 ). 426
382 placing ϕ with ψ c , the company can charge more while the price Observation 4. If c(x, y) = 1/2x − y2 and ϕ ∈ c (X), then ϕ̄ : x → 427
383 remains acceptable for the client (that is, the constraint is satis- ϕ̄ (x ) = x2 /2 − ϕ (x ) is a convex function (it is an equivalence if 428
384 fied). Thus, we have: X = Y = Rd , see [1]). 429
inf(K ) = sup ϕ dμ + ϕ c dν Proof. 430
ϕ ∈ c ( X ) X Y
ϕ (x ) = ψ c (x )
= sup ψ dμ + ψ dν
c
ψ ∈c (Y ) X Y
x − y2
= inf − ψ (y )
385 We do not give here the detailed proof for existence. The reader y 2
386 is referred to [12], Chapter 4. The idea is that we are now in a
much better situation, since the set of admissible functions c (X) x2 y2
387 = inf −x·y+ − ψ (y ) .
388 is compact6 . y 2 2
389 The optimal value gives an interesting information, that is the Then, 431
390 minimum cost of transforming μ into ν . This can be used to define
x2 y2
391 a distance between distributions, and also gives a way to compare −ϕ̄ (x ) = ϕ (x ) − 2
= inf −x · y + 2
− ψ (y ) .
y
392 different distributions, which is of practical interest for some ap-
393 plications. Or equivalently, 432
y
ϕ̄ (x ) = − ψ (y )
2
sup x · y − 2
.
394 5. From the Kantorovich dual to the optimal transport map y
y2
395 5.1. The c-superdifferential The function x → x · y − 2 − ψ (y ) is affine in x, therefore the 433
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
where JT denotes the Jacobian matrix of T and det denotes the de- 465
terminant. 466
If μ and ν have densities u and v respectively, that is ∀B, μ(B ) = 467
B u (x )dx and ν (B ) = B v (x )dx, then one can (formally) consider 468
(10) pointwise in X: 469
(1 − t )x1 + tT (x1 ) = (1 − t )x2 + tT (x2 ). 6.2. The discrete → discrete case 484
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
Fig. 9. Different types of transport, with continuous and discrete measures μ and ν .
to its value at each point yj . The integral ∫Y ψ (y)dν becomes the 538
dot product
j ψ j ν j . Let us now replace the c-transform ψ c with 539
its expression, which gives (16). The integral in the left term can 540
be reorganized, by grouping the points of X for which the same 541
point yj minimizes c (x, y j ) − ψ j , which gives (17), where the La- 542
guerre cell Lagcψ (y j ) is defined by: 543
Lagcψ (y j ) = x ∈ X | c(x, y j ) − ψ j ≤ c(x, yk ) − ψk , ∀k = j .
(18)
Thus, the functional F that corresponds to the dual Kantorovich 544
Fig. 10. Semi-discrete transport: gray levels symbolize the quantity of a resource
problem becomes a function that depends on n variables (ψ j )nj=1 . 545
that will be transported to 4 factories. Each factory will be allocated a part of the
terrain in function of the quantity of resource that it should collect. In this case, the set c (Y) can be characterized as the subset of Rn 546
given by (thanks to Theorems 3 and 4) 547
516 7.1. Semi-discrete Monge problem (Y ) = (ψ1 , . . . , ψn ) : μ(Lagψ (y j )) > 0, ∀ j .
c c
518 A transport map T associates with each point x of X one of the of the L2 cost c (x, y ) = 1/2x − y2 , it corresponds to the power 553
519 points yj . Thus, it is possible to partition X, by associating to each diagram, that was studied by Aurenhammer at the end of the 80’s 554
[29]. One of its particularities is that the boundaries of the cells are 555
520 yj the region T −1 (y j ) that contains all the points transported to-
rectilinear, making it reasonably easy to design computational al- 556
521 wards yj by T. The constraint imposes that the quantity of collected
gorithms to construct them. For other costs, the boundaries are in 557
522 resource over each region T −1 (y j ) corresponds to the prescribed
general not rectilinear, and computing the diagram is much more 558
523 quantity ν j .
difficult. Several implementations in 2D exist for some costs (L1 ) in 559
524 Let us now examine the form of the dual Kantorovich problem.
[30]. An interesting alternative is to use a piecewise-linear approx- 560
525 In terms of measure, the source measure μ has a density u, and
imation on a background mesh and study convergence in function 561
526 the target measure ν is a sum of Dirac masses ν = nj=1 ν j δy j , sup-
of mesh size. 562
527 ported by the set of points Y = (y j )nj=1 . We recall that in its general
528 form, the dual Kantorovich problem is written as follows: Observation 6. Let ψ = (ψ1 , . . . , ψn ) and let κ = (κ , . . . , κ ) be a 563
constant of Rn . We consider φ = ψ + κ, then for all j ∈ {1, . . . , n} 564
sup ψ c ( x )d μ + ψ ( y )d ν . (13)
ψ ∈c (Y ) X Y Lagcψ (y j ) = Lagcφ (y j ).
529 In our semi-discrete case, the functional becomes a function of 7.2. Concavity of F 565
530 n variables, with the following form:
F ( ψ ) = F ( ψ1 , ψ2 , . . . ψn ) (14) The objective function F is a particular case of the Kantorovich 566
dual, and naturally inherits its properties, such as its concavity 567
531
(that we did not discuss yet). This property is interesting both from 568
n
a theoretical point of view, to study the existence and uniqueness 569
= ψ c (x )u(x )dx + ψ jν j (15) of solutions, and from a practical point of view, to design efficient 570
X j=1 numerical solution mechanisms. In the semi-discrete case, the con- 571
532
cavity of F is easier to prove than in the general case. We summa- 572
n
rize here the proof by Aurenhammer et al. [27], that leads to an 573
= inf c (x, y j ) − ψ j u(x )dx + ψ jν j (16) efficient algorithm [5], [31], [32]. 574
y j ∈Y
X j=1
533 Theorem 2. The objective function F of the semi-discrete Kantorovich 575
n
n dual problem (13) is concave in c (Y). 576
= c (x, y j ) − ψ j u(x )dx + ψ jν j. (17)
Proof. Consider the function G defined by: 577
j=1Lagc (y )
ψ j
j=1
G(A, [ψ1 , . . . ψn ]) = c (x, yA(x ) ) − ψA(x ) u(x )dx, (19)
534 The first step (15) takes into account the nature of the mea-
535 sures μ and ν . In particular, one can notice that the measure ν is X
536 completely defined by the scalars ν j associated with the points yj , and parameterized by a surjective assignment A : X → [1 . . . n], that 578
537 and the function ψ is defined by the scalars ψ j that correspond is, a function that associates with each point x of X the index j of 579
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
Fig. 11. The objective function of the dual Kantorovich problem is concave, because its graph is the lower envelope of a family of affine functions.
580 one of the points yj . If we denote A−1 ( j ) = {x|A(x ) = j}, then G can 612
581 be also written as:
= yk | c(x, y j ) − ψ j = c(x, yk ) − ψk (24)
G(A, ψ ) = c (x, y j ) − ψ j u(x )dx
j A−1 ( j ) 613
= c (x, y j )u(x )dx − ψ j u(x )dx.
j A−1 ( j ) j A−1 ( j ) = yj . (25)
(20) In the first step (22), we replace ψc
by its definition, then we 614
582 The first term does not depend on the ψ j , and the second one is use the fact that x belongs to the Laguerre cell of yj (23), and fi- 615
583 a linear combination of the ψ j coefficients, thus, for a given fixed nally, the only point of Y that satisfies (24) is yj because we have 616
584 assignment A, ψ →G(A, ψ ) is an affine function of ψ . Fig. 11 de- supposed x inside the Laguerre cell of yj . 617
585 picts the appearance of the graph of G for different assignment. To summarize, the optimal transport map T moves each point x 618
586 The horizontal axis symbolizes the components of the vector ψ to the point yj associated with the Laguerre cell Lagcψ (y j ) that con- 619
587 (of dimension n) and the vertical axis the value of G(A, ψ ). For a tains x. The vector (ψ1 , . . . ,ψn ) is the unique vector, up to constant 620
588 given assignment A, the graph of G is an hyperplane (symbolized (see Observation 6), that maximizes the discrete dual Kantorovich 621
589 here by a straight line). function F such that ψ is c-concave. It is possible to show that ψ 622
590 Among all the possible assignments A, we distinguish Aψ that is c-concave if and only if no Laguerre cell is empty of matter, that 623
591 associates with a point x the index j of the Laguerre cell x belongs is the integral of u is non-zero on each Laguerre cell. Indeed, we 624
592 to10 , that is: have the following results: 625
Aψ (x ) = arg min c (x, y j ) − ψ j .
j Theorem 3. Let Y = {y1 , . . . , yn } be a set of n points. Let ψ be any 626
function defined on Y such that Lagcψ (yi ) are not empty sets. Then ψ 627
593 For a fixed vector ψ = ψ 0,
among all the possible assign-
is a c-concave function. 628
ments A, the assignment Aψ minimizes the value G(A, ψ 0 ), be-
0
594
595 cause it minimizes the integrand pointwise (see Fig. 11 on the Proof. By definition, for i = 1, 2, . . . , k 629
596 left. Thus, since G is affine with respect to ψ , the graph of the
597 function ψ → G(Aψ , ψ ) is the lower envelope of a family of hy- (ψ c )c (yi ) = inf [c(x, yi ) − ψ c (x )]
x∈X
598 perplanes (symbolized as straight lines in Fig. 11 on the right),
ψ j
604 Let us now examine the c-superdifferential ∂ c ψ , that is, the set
605 of points (x ∈ X, y ∈ Y) connected by the optimal transport map. We = ψ ( yi ).
606 recall thatthe c-superdifferential ∂ c ψ can be defined alternatively This allows to conclude that ψ is a c-concave function (see 630
607 as ∂ c ψ = (x, y j ) | ψ c (x ) + ψ j = c (x, y j ) . Consider a point x of X [1], Proposition 1.34). 631
608 that belongs to the Laguerre cell Lagcψ (y j ). The c-superdifferential
609 [∂ c ψ ](x) at x is defined by: Moreover, the converse of the theorem is also true: 632
[∂ c
ψ ](x ) = {yk | ψ (x ) + ψk = c(x, yk )}
c
(21) Theorem 4. Let Y = {y1 , . . . , yn } be a set of n points. Let ψ be a c- 633
610 concave function defined on Y. Then the sets Lagcψ (yi ) are not empty 634
for all i = 1, . . . , n. 635
= yk | inf
y
[c (x, yl ) − ψl ] + ψk = c (x, yk ) (22)
l
10
A is undefined on the set of Laguerre cell boundaries, this does not cause any ∀x ∈ X, ∃ j ∈ {1, . . . , n} with j = i0 , and
j > 0 such that
difficulty because this set has zero measure. c (x, yi0 ) − ψ (yi0 ) ≥ c (x, y j ) − ψ (y j ) +
j .
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
639 We will write x = x j . Thus, the components of the gradient vanish, this implies that the quan- 662
tity of matter obtained at yj , that is the integral of u on the La- 663
guerre cell of yj , corresponds to the prescribed quantity of matter,
c ( x j , yi0 ) − inf c (x j , y j ) − ψ (y j ) 664
y j ∈Y that is ν j (see also [31], Theorem 3). Moreover, since ν j are sup- 665
posed to be strictly positive, from (18), this maximum is reached 666
≥ c ( x j , yi0 ) − c ( x j , yi0 ) − ψ ( yi0 ) −
j
to the interior of c (Y). 667
= ψ ( yi0 ) +
j . (26)
7.5. Second order derivatives of the objective function 668
640 Therefore,
641 This contradicts the fact that ψ is a c-concave function, because and 674
1
∂ G 2 u (x )
= lim inf c (x, y1 ) − ψ1 , . . . , c (x, y j ) − ψ j (ψ ) = dS(x ). (28)
t→0 t X ∂ ψl ∂ ψ j Lagc (yl )∩Lagc (y j ) yl − y j
ψ ψ
−t, . . . , c (x, yn ) − ψn ]
7.6. A computational algorithm for L2 semi-discrete optimal transport 684
− inf [c (x, yi ) − ψi ]u(x )dx .
i
With the definition of F(ψ ), the expression of its first order 685
652 Let x ∈ Lagcψ (ym ), then for t small enough we have
derivatives (gradient
∇ F) and second order derivatives (Hessian 686
matrix ∇ 2 F = ∂ 2 F /∂ ψi ∂ ψ j ), we are now equipped to design 687
inf c (x, y1 ) − ψ1 , . . . , c (x, y j ) − ψ j − t, . . . , c (x, yk ) − ψn ij
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
Fig. 12. (A) transport between a uniform density and a random pointset; (B) transport between a varying density and the same pointset; (C) intersections between meshes
used to compute the coefficients; (D) transport between a measure supported by a surface and a 3D pointset.
695 will receive the prescribed quantity of matter ν j . Clearly, the pre- guerre cell of yj . The Laguerre diagram is completely determined 700
696 scriptions should be balanced with the available resource, that is by the vector ψ that maximizes F. 701
697 X u (x )dx = j ν j . The algorithm computes for each point of the Step 2 needs a criterion for convergence. The classical con- 702
698 target measure the subset of X that is affected to it through the vergence criterion for a Newton algorithm uses the norm of the 703
699 optimal transport, T −1 (y j ) = Lagcψ (y j ), that corresponds to the La- gradient of F. In our case, the components of the gradient of F 704
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
Fig. 14. Computing the transport between objects of different dimension, from a sphere to a cube. The sphere is sampled with 10 million points. The cross-section reveals
the formation of a singularity that has some similarities with the medial axis of the cube.
Fig. 15. An application of optimal transport to fluid simulation: numerical simulation of the Taylor–Rayleigh instability using the Gallouët–Mérigot scheme.
705 have a geometric meaning, since ∂ F/∂ ψ j corresponds to the dif- such as GEOGRAM11 et CGAL12 . Then one needs to compute the in- 721
706 ference between the prescribed quantity ν j associated with j and tersection between each Laguerre cell and the mesh that supports 722
707 the quantity of matter present in the Laguerre cell of yj given by the density u. This can be done with specialized algorithms [31], 723
708 Lagc (y )u (x )dx. Thus, we can decide to stop the algorithm as soon also available in GEOGRAM. 724
ψ j
709 as the largest absolute value of a component becomes smaller than Step 4 finds the Newton step p by solving a linear system. We 725
710 a certain percentage of the smallest prescription. Thus we con- use the Conjugate Gradient algorithm [37] with the Jacobi precon- 726
711 sider that convergence is reached if max |∇ Fj | < min j ν j , for a user- ditioner. In our empirical experiments below, we stopped the con- 727
712 defined (typically 1% in the examples below). jugate iterations as soon as ∇ 2 F p + ∇ F /∇ F < 10−3 . We vali- 728
713 Step 3 computes the coefficients of the gradient and the Hes- dated this threshold on small to medium-size problems by com- 729
714 sian matrix of F, using (27) and (28). These computations involve paring with the results of a direct solver (SuperLU). 730
715 integrals over the Laguerre cells and over their boundaries. For the Step 5 determines the descent parameter α . A result due to 731
716 L2 cost c (x, y ) = 1/2x − y2 , the boundaries of the Laguerre cells Kitagawa et al. [32] ensures the convergence of the Newton algo- 732
717 are rectilinear, which dramatically simplifies the computations of rithm if the measure of the smallest Laguerre cell remains larger 733
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
734 than a certain threshold (that is, half the smallest prescription ν j ).
735 There is also a condition on the norm of the gradient ∇ F that
736 we do not repeat here (the reader is referred to Kitagawa, Mérigot
737 and Thibert’s original article for more details). In our implementa-
738 tion, starting with α = 1, we iteratively divide α by two until both
739 conditions are satisfied.
740 Let us now make one step backwards and think about the orig-
741 inal definition of Monge’s problem (M). We wish to stress that the
742 initial constraint (local mass conservation) that characterizes trans-
743 port maps was terribly difficult. It is remarkable that after sev-
744 eral rewrites (Kantorovich relaxation, duality, c-concavity), the final
745 problem becomes as simple as optimizing a regular (C2 ) concave
746 function, for which computing the gradient and Hessian is easy in
747 the semi-discrete case and boils down to evaluating volumes and
748 areas in a Laguerre diagram. We wish also to stress that the com-
749 putational algorithm did not require to make any approximation or
750 discretization. The discrete, computer version is a particular setting
751 of the general theory, that fully describes not only transport be-
752 tween smooth objects (functions), but also transport between less
753 regular objects, such as pointsets and triangulated meshes. This is
754 made possible by the rich mathematical vocabulary (measures) on
755 which optimal transport theory acts. Thus, the computational algo-
756 rithm is an elegant, direct verbatim translation of the theory into
757 a computer program.
758 We now show some computational results and list possible ap-
759 plications of this algorithm.
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009
JID: CAG
ARTICLE IN PRESS [m5G;February 12, 2018;20:21]
808 site to the centroid result in gravity being applied to the centroid [10] Gallouët TO, Mérigot Q. A Lagrangian scheme à la Brenier for the 869
809 while the incompressibility constraint is enforced through the con- incompressible euler equations. Found Comput Math 2017. doi:10.1007/ 870
s10208- 017- 9355- y. 871
810 trolled volumes of the Laguerre cells13 . Using some efficient geo- [11] de Goes F, Wallez C, Huang J, Pavlov D, Desbrun M. Power particles: an 872
811 metric algorithms for the Laguerre diagram and its intersections incompressible fluid solver based on power diagrams. ACM Trans Graph 873
812 [31], the Gallouët–Mérigot numerical scheme for incompressible 2015;34(4):50:1–50:11. doi:10.1145/2766901. 874
[12] Villani C. Optimal transport : old and new. In: Grundlehren der mathema- 875
813 Euler scheme can be also applied in 3D to simulate the behavior tischen Wissenschaften. Berlin: Springer; 2009. ISBN978-3-540-71049-3, URL: 876
814 of non-mixing incompressible fluids (Fig. 16). The 3D version of the http://opac.inria.fr/record=b1129524. 877
[13] Santambrogio F. Introduction to optimal transport theory. In: Optimal trans- Q4
815 Taylor–Rayleigh instability is shown in Fig. 17, with 10 million La- 878
port, theory and applications; 2014. ISBN 9781107689497; [math.PR] http: 879
816 guerre cells. More complicated vortices are generated (shown here
//arxiv.org/abs/1009.3856. 880
817 in cross-section). [14] Caffarelli L. The Monge–Ampère equation and optimal transportation, an ele- 881
818 The applications in computer animation and computational mentary review. In: Optimal transportation and applications (Martina Franca, 882
2001). In: Lecture Notes in Mathematics; 2003. p. 1–10. 883
819 physics seem to be promising, because the semi-discrete algorithm
[15] Ambrosio L, Gigli N. A users guide to optimal transport. Modelling and opti- 884
820 behaves very well in practice. It is now reasonable to design simu- misation of flows on networks. Lecture Notes in Mathematics; 2013. p. 1–155. 885
821 lation algorithms that solve a transport problem in each time step [16] Tao T. An introduction to measure theory. American Mathematical Society; 886
822 (as our early computational fluid dynamics of the previous para- 2011. ISBN 978-0-8218-6919-2. 887
[17] McCann RJ. Existence and uniqueness of monotone measure-preserving maps. 888
823 graph do). With our optimized implementation that uses a multi- Duke Math J 1995;80(2):309–23. 889
824 core processor for the geometric part of the algorithm and a GPU [18] Tuy H. Convex analysis and global optimization. Springer optimization and its 890
825 for solving the linear systems, the semi-discrete algorithm takes no applications, vol. 110. Springer, [Cham]; 2016. ISBN 978-3-319-31482-2; 978- 891
3-319-31484-6 Second edition [of MR1615096]; URL: https://doi.org/10.1007/ 892
826 more than a few tens of seconds to solve a problem with 10 mil- 978- 3- 319- 31484- 6. 893
827 lions unknown. This opens the door to numerical solution mecha- [19] Brenier Y. Polar factorization and monotone rearrangement of vector-valued 894
828 nisms for difficult problems. For instance, in astrophysics, the Early functions. Commun Pure Appl Math 1991;44:375–417. 895
[20] Benamou J-D, Brenier Y. A computational fluid mechanics solution to the 896
829 Universe Reconstruction problem [38] consists in going “backward Monge–Kantorovich mass transfer problem. Numer Math 20 0 0;84(3):375–93. 897
830 in time” from observation data on the repartition of galaxy clus- doi:10.10 07/s0 02110 050 0 02. 898
831 ters, to “play the Big-Bang movie” backward. Our numerical exper- [21] Papadakis N, Peyré G, Oudet E. Optimal transport with proximal splitting. SIAM 899
J Imaging Sci 2014;7(1):212–38. doi:10.1137/130920058. 900
832 iments tend to confirm Brenier’s point of view that he expressed in
[22] Burkard R, Dell’Amico M, Martello S. Assignment problems. SIAM; 2009. ISBN 901
833 the 20 0 0’s, that semi-discrete optimal transport can be an efficient 978-1-61197-222-1. 902
834 way of solving this problem. [23] Leonard C. A survey of the Schrödinger problem and some of its connections 903
with optimal transport. arXiv2013;[math.PR] http://arxiv.org/abs/1308.0215; 904
835 To ensure that our results are reproducible, the source-code as-
[24] Cuturi M. Sinkhorn distances: Lightspeed computation of optimal transport. In: 905
836 sociated with the numerical solution mechanism used in all these Proceedings of the Twenty Seventh Advances in Neural Information Processing 906
837 experiments is available in the EXPLORAGRAM component of the Systems 26. Lake Tahoe, Nevada, United States; 2013. p. 2292–300. 907
838 GEOGRAM programming library14 . [25] Gangbo W, McCann RJ. The geometry of optimal transportation. Acta Math 908
1996;177(2):113–61. doi:10.1007/BF02392620. 909
[26] Alexandrov AD. Intrinsic geometry of convex surfaces (translation of the 1948 910
839 Acknowledgments Russian original). CRC Press; 2005. 911
[27] Aurenhammer F, Hoffmann F, Aronov B. Minkowski-type theorems and 912
least-squares partitioning. In: Symposium on Computational Geometry; 1992. 913
840 This research is supported by EXPLORAGRAM (Inria Exploratory p. 350–7. 914
841 Research Grant). The authors wish to thank Quentin Mérigot, Yann [28] Gu X, Luo F, Sun J, Yau S-T. Variational principles for Minkowski type prob- 915
842 Brenier, Jean-David Benamou, Nicolas Bonneel and Lénaïc Chizat lems, discrete optimal transport, and discrete Monge–Ampere equations. Asian 916
J Math 2016;20(2):383–98. URL: https://doi.org/10.4310/AJM.2016.v20.n2.a7. 917
843 for many discussions. [29] Aurenhammer F. Power diagrams: Properties, algorithms and applications. 918
SIAM J Comput 1987;16(1):78–96. 919
844 References [30] CGAL, the computational geometry algorithms library. CGAL User and Refer- 920 Q5
ence Manual. CGAL Editorial Board; 2011. 921
[31] Lévy B. A numerical algorithm for L2 semi-discrete optimal transport in 3d. 922 Q6
845 [1] Santambrogio F. Optimal transport for applied mathematicians. Progress
846 in nonlinear differential equations and their applications, vol. 87. ESAIM M2AN (Math Model Anal) 2015. 923
847 Birkhäuser/Springer, Cham; 2015. Calculus of variations, PDEs, and mod- [32] Kitagawa J, Mérigot Q, Thibert B. A newton algorithm for semi-discrete optimal 924
848 eling; ISBN 978-3-319-20827-5; 978-3-319-20828-2, doi: https://doi.org/ transport. CoRR 2016;abs/1603.05579. http://arxiv.org/abs/1603.05579. 925
849 10.1007/978- 3- 319- 20828- 2 [33] Serrin J. Mathematical principles of classical fluid mechanics. In: Handbuch 926
Q2850 [2] Peyré G., Cuturi M.. Computational optimal transport. Preprint 2018; der Physik (herausgegeben von S. Flügge), Bd. 8/1, Strömungsmechanik I 927
851 URL:https://optimaltransport.github.io/. (Mitherausgeber C. Truesdell). Berlin-Göttingen-Heidelberg: Springer-Verlag; 928
Q3852 [3] Monge G. Mémoire sur la théorie des déblais et des remblais. Histoire de 1959. p. 125–263. 929
853 l’AcadȨmie Royale des Sciences (1781) 1784:666–704. [34] Nocedal J, Wright SJ. Numerical Optimization. 2nd. New York: Springer; 2006. 930
854 [4] Mémoli F. Gromov–Wasserstein distances and the metric approach to object [35] Bowyer A. Computing Dirichlet tessellations. Comput J 1981;24(2):162–6. 931
855 matching. Found Comput Math 2011;11(4):417–87. doi:10.1093/comjnl/24.2.162. 932
856 [5] Mérigot Q. A multiscale approach to optimal transport. Comput Graph Forum [36] Watson D. Computing the n-dimensional Delaunay tessellation with applica- 933
857 2011;30(5):1583–92. tion to Voronoi polytopes. Comput J 1981;24(2):167–72. 934
858 [6] Bonneel N, van de Panne M, Paris S, Heidrich W. Displacement interpolation [37] Hestenes MR, Stiefel E. Methods of conjugate gradients for solving linear sys- 935
859 using Lagrangian mass transport. ACM Trans Graph 2011;30(6):158. tems. J Res Natl Bur Stand 1952;49(6):409–36. 936
860 [7] Mérigot Q, Meyron J, Thibert B. Light in power: a general and parameter-free [38] Brenier Y, Frisch U, Henon M, Loeper G, Matarrese S, Mohayaee R, et al. Recon- 937
861 algorithm for caustic design. CoRR 2017;abs/1708.04820. URL: http://arxiv.org/ struction of the early universe as a convex optimization problem. arXiv2003; 938
862 abs/1708.04820. URL:https://arxiv.org/abs/astro-ph/0304214. 939
863 [8] Schwartzburg Y, Testuz R, Tagliasacchi A, Pauly M. High-contrast computa-
864 tional caustic design. ACM Trans Graph 2014;33(4):74:1–74:11. doi:10.1145/
865 2601097.2601200.
866 [9] Benamou J.-D, Carlier G, Mérigot Q, Oudet E. Discretization of functionals in-
867 volving the Monge–ampÈre operator. arXiv2014; [math.NA] http://arxiv.org/
868 abs/1408.4336.
13
In fact, the “elastic” works the other way round: instead of pulling the site to-
wards the centroid, it indirectly “pulls down” the centroid by transmitting the in-
fluence of gravity from the site.
14
http://alice.loria.fr/software/geogram/doc/html/index.html
Please cite this article as: B. Lévy, E.L. Schwindt, Notions of optimal transport theory and how to implement them on a computer,
Computers & Graphics (2018), https://doi.org/10.1016/j.cag.2018.01.009