
Some applications may require the comparison of distributions with different total masses.

One approach is to allow for a partial match, where "dirt" from the more massive distribution
is rearranged to make the less massive one, and any leftover "dirt" is discarded at no cost.
Under this approach, the EMD is no longer a true distance between distributions.

Another approach is to allow for mass to be created or destroyed, on a global and/or local
level, as an alternative to transportation, but with a cost penalty. In that case one must
specify a real parameter σ, the ratio between the cost of creating or destroying one unit of
"dirt" and the cost of transporting it by a unit distance. This is equivalent to minimizing the
sum of the earth-moving cost plus σ times the L1 distance between the rearranged pile and
the second distribution.

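Notationally, if $\phi : P' \to Q'$ is a partial function which is a bijection on subsets
$P' \subseteq P$ and $Q' \subseteq Q$, then one is interested in the distance function

$$d(P, Q) = \|P' - Q'\|_1 + \sigma \left( \|P \setminus P'\|_1 + \|Q \setminus Q'\|_1 \right)$$

where $\setminus$ denotes set minus. Here, $P' = \operatorname{dom} \phi$ would be the portion
of the earth that was moved; thus $P \setminus P'$ would be the portion not moved, and
$\|P \setminus P'\|_1$ the size of the pile not moved. By symmetry, one contemplates
$Q' = \operatorname{ran} \phi$ as the pile at the destination that 'got there' from P, as
compared to the total Q that we want to have there. Formally, this distance indicates how
much an injective correspondence differs from an isomorphism.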
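
As an illustration, the penalized matching can be sketched for tiny sets of unit-mass points on a line. This is a brute-force sketch, not an efficient algorithm. The function name `partial_match_cost` and the restriction to unit masses in one dimension are assumptions made for the example. Each pile is a list of point positions, a partial bijection pairs some points of P with distinct points of Q, and every unpaired point on either side is charged σ.

```python
from itertools import combinations, permutations


def partial_match_cost(P, Q, sigma):
    """Brute-force minimum, over partial bijections between P and Q, of
    (total distance moved) + sigma * (number of unmatched points).

    P and Q are lists of positions of unit-mass points on a line.
    Hypothetical helper for illustration; exponential time, tiny inputs only.
    """
    # Empty matching: every point is destroyed or created at cost sigma.
    best = sigma * (len(P) + len(Q))
    for k in range(1, min(len(P), len(Q)) + 1):
        # Choose which k points of P are moved ...
        for p_idx in combinations(range(len(P)), k):
            # ... and which k distinct points of Q they are sent to.
            for q_idx in permutations(range(len(Q)), k):
                moved = sum(abs(P[i] - Q[j]) for i, j in zip(p_idx, q_idx))
                unmatched = (len(P) - k) + (len(Q) - k)
                best = min(best, moved + sigma * unmatched)
    return best


if __name__ == "__main__":
    P = [0.0, 1.0, 10.0]
    Q = [0.5, 1.5]
    # Large sigma: moving is preferred, only the far point is destroyed.
    print(partial_match_cost(P, Q, sigma=2.0))
    # Small sigma: destroying and creating is cheaper than moving anything.
    print(partial_match_cost(P, Q, sigma=0.125))
```

In this sketch, moving a point over a distance greater than 2σ is never worthwhile, because destroying it at the source and creating it at the destination costs only 2σ. For realistic sizes the same objective would normally be handed to an assignment or linear programming solver instead of being enumerated.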
The EMD can be extended naturally to the case where more than two distributions are
compared. In this case, the "distance" between the many distributions is defined as the
optimal value of a linear program. This generalized EMD may be computed exactly using a
greedy algorithm, and the resulting functional has been shown to be Minkowski additive and
convex monotone.[4]
