
Taxonomy of principal distances and divergences

Euclidean geometry
Euclid, Pythagoras
Euclidean distance (Pythagoras' theorem, circa 500 BC): $d_2(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
Manhattan distance (city block, taxi cab): $d_1(p, q) = \sum_i |p_i - q_i|$
Minkowski distance ($L_k$-norm; H. Minkowski, 1864-1909): $d_k(p, q) = \sqrt[k]{\sum_i |p_i - q_i|^k}$
Mahalanobis metric (1936): $d_\Sigma = \sqrt{(p - q)^\top \Sigma^{-1} (p - q)}$
Hamming distance: $|\{i : p_i \neq q_i\}|$
Hausdorff set distance: $d_H(X, Y) = \max\{\sup_x \rho(x, Y), \sup_y \rho(X, y)\}$

Hyperbolic/spherical geometry
Bolyai (1802-1860), Lobachevsky (1792-1856)

Statistical geometry
Physics entropy ($J\,K^{-1}$; Boltzmann-Gibbs, 1878): $-k \int p \log p \, d\mu$
Information entropy (C. Shannon, 1948): $H(p) = -\int p \log p \, d\mu$
Additive entropy: cross-entropy, conditional entropy, mutual information (chain rules)
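The Euclidean-geometry distances listed above can be sketched in a few lines of Python. This is a minimal illustration for finite-dimensional points and finite point sets, not a reference implementation; the function names and the choice of the Euclidean distance as the default ground distance `rho` in the Hausdorff example are assumptions made here.

```python
def minkowski(p, q, k):
    """Minkowski distance d_k(p, q) = (sum_i |p_i - q_i|^k)^(1/k), for k >= 1."""
    return sum(abs(a - b) ** k for a, b in zip(p, q)) ** (1.0 / k)


def manhattan(p, q):
    """Manhattan distance d_1: the k = 1 case of the Minkowski distance."""
    return minkowski(p, q, 1)


def euclidean(p, q):
    """Euclidean distance d_2: the k = 2 case of the Minkowski distance."""
    return minkowski(p, q, 2)


def hamming(p, q):
    """Hamming distance: number of coordinates where p and q differ."""
    return sum(1 for a, b in zip(p, q) if a != b)


def hausdorff(X, Y, rho=euclidean):
    """Hausdorff distance between two finite, non-empty point sets.

    rho(x, S) is taken as the minimum of rho(x, s) over s in S, so the
    suprema in the formula become maxima over finite sets.
    """
    def point_to_set(x, S):
        return min(rho(x, s) for s in S)

    return max(max(point_to_set(x, Y) for x in X),
               max(point_to_set(y, X) for y in Y))


p, q = (0.0, 0.0), (3.0, 4.0)
print(manhattan(p, q))                       # 7.0
print(euclidean(p, q))                       # 5.0
print(hamming((1, 0, 1), (1, 1, 0)))         # 2
print(hausdorff([p], [p, q]))                # 5.0
```

The Mahalanobis metric is omitted because it needs a matrix inverse, which is more conveniently done with a linear algebra library.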

Lévy-Prokhorov distance: $LP_\rho(p, q) = \inf_{\epsilon > 0} \{p(A) \le q(A^\epsilon) + \epsilon \ \forall A \in \mathcal{B}(\mathcal{X})\}$, where $A^\epsilon = \{y \in \mathcal{X}, \exists x \in A : \rho(x, y) < \epsilon\}$
Quadratic distance: $d_Q = \sqrt{(p - q)^\top Q (p - q)}$

Non-Euclidean geometries
Cone geometry (E. Vinberg)

$H(p) = \mathrm{KL}(p\|u)$
Kullback-Leibler divergence (relative entropy, 1951): $\mathrm{KL}(p\|q) = \int p \log \frac{p}{q} \, d\mu = E_p[\log \frac{P}{Q}]$
I-projection
Jeffrey divergence (Jensen-Shannon)
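For discrete distributions the integral in the Kullback-Leibler divergence becomes a sum, which makes the relative-entropy family easy to try out. The sketch below assumes probability vectors given as Python lists, with `q` positive wherever `p` is positive; the function names are chosen here for illustration. The Jeffrey divergence is taken as the symmetrized sum KL(p||q) + KL(q||p), and the Jensen-Shannon divergence as the average KL divergence to the midpoint distribution. For a uniform distribution u over n outcomes, the code also checks the identity H(p) = log n - KL(p||u), which is one way to read the link between Shannon entropy and KL.

```python
import math


def kl(p, q):
    """KL(p||q) = sum_i p_i log(p_i / q_i), with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffreys(p, q):
    """Symmetrized KL divergence: KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)


def jensen_shannon(p, q):
    """Average KL divergence of p and q to their midpoint (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
u = [1.0 / 3] * 3

print(kl(p, q), kl(q, p))        # asymmetric in general
print(jeffreys(p, q))            # symmetric
print(jensen_shannon(p, q))      # symmetric and bounded by log 2
print(entropy(p), math.log(3) - kl(p, u))  # the two values agree
```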

Symplectic geometry
J.M. Souriau
J.L. Koszul

Riemannian geometry
Riemannian metric tensor (B. Riemann, 1826-1866): $\int \sqrt{g_{ij} \frac{dx^i}{ds} \frac{dx^j}{ds}} \, ds$
Finsler metric tensor: $g_{ij} = \frac{1}{2} \frac{\partial^2 F^2(x, y)}{\partial y^i \partial y^j}$
Fisher information (local entropy; R. A. Fisher, 1890-1962): $I(\theta) = E[(\partial_\theta \ln p(X|\theta))^2]$
Fisher-Rao distance (C. R. Rao): $ds^2 = g_{ij} \, d\theta^i d\theta^j = d\theta^\top I(\theta) \, d\theta$, $\rho_{FR}(p, q) = \min_\gamma \int_0^1 \sqrt{\dot\gamma(t)^\top I(\theta) \dot\gamma(t)} \, dt$

Conformal geometry
Conformal divergence: $D_\rho(p : q) = \rho(p) D(p : q)$
Conformal Riemannian metric: $g_p\langle\,\rangle = e^{\phi} g$

Probability simplex
Aitchison distance
Hilbert log-ratio metric

Kolmogorov: $K(p\|q) = \int |q - p| \, d\mu$ (Kolmogorov-Smirnov: $\max |p - q|$)
Matsushita distance (1956): $M_\alpha(p, q) = \sqrt[\alpha]{\int |q^{\frac{1}{\alpha}} - p^{\frac{1}{\alpha}}| \, d\mu}$
$\alpha = -1$
Chernoff divergence (1952): $C_\alpha(p\|q) = -\ln \int p^\alpha q^{1-\alpha} \, d\mu$, $C(p, q) = \max_{\alpha \in (0,1)} C_\alpha(p\|q)$
Bhattacharyya (Bhat.) distance (1967): $d(p, q) = -\log \int \sqrt{p} \sqrt{q} \, d\mu$
Hellinger: $H(p\|q) = \sqrt{\int (\sqrt{p} - \sqrt{q})^2 \, d\mu}$
$\times\, \alpha(1 - \alpha) = 2 (1 - \int \sqrt{f g} \, d\mu)$
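The Bhattacharyya, Hellinger, and Chernoff quantities are closely related, and a discrete sketch makes the relations easy to check numerically. The code below assumes probability vectors as Python lists and replaces the maximization over alpha in the Chernoff divergence by a simple grid search, which is an approximation chosen here for brevity. It illustrates two facts: the squared Hellinger distance equals 2(1 - BC), where BC is the Bhattacharyya coefficient (the sum of sqrt(p_i q_i)), and the Chernoff alpha-divergence at alpha = 1/2 is the Bhattacharyya distance.

```python
import math


def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_i sqrt(p_i q_i)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def bhattacharyya_distance(p, q):
    """d(p, q) = -log BC(p, q)."""
    return -math.log(bhattacharyya_coefficient(p, q))


def hellinger(p, q):
    """H(p||q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2)."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)))


def chernoff_alpha(p, q, alpha):
    """C_alpha(p||q) = -ln sum_i p_i^alpha q_i^(1 - alpha), for 0 < alpha < 1."""
    return -math.log(sum(pi ** alpha * qi ** (1 - alpha)
                         for pi, qi in zip(p, q)))


def chernoff(p, q, steps=1000):
    """Grid-search approximation of max over alpha in (0, 1) of C_alpha."""
    return max(chernoff_alpha(p, q, i / steps) for i in range(1, steps))


p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

print(hellinger(p, q) ** 2, 2 * (1 - bhattacharyya_coefficient(p, q)))
print(chernoff_alpha(p, q, 0.5), bhattacharyya_distance(p, q))
print(chernoff(p, q))  # at least as large as the Bhattacharyya distance
```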
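Affine differential geometry (K. Nomizu)
Constant sectional curvature

Pearson $\chi^2$ test (K. Pearson, 1857-1936): $\chi^2(p\|q) = \int \frac{(q - p)^2}{p} \, d\mu$
Rényi divergence (1961; additive entropy): $R_\alpha(p|q) = \frac{1}{\alpha(\alpha - 1)} \ln \int p^\alpha q^{1-\alpha} \, d\mu$
$H_\alpha = \frac{1}{\alpha(1 - \alpha)} \log \int f^\alpha \, d\mu$
$\alpha = 0$

Lev M. Bregman
Logarithmic divergence: $L_{G,\alpha}(\theta_1 : \theta_2) = \frac{1}{\alpha} \log(1 + \alpha \nabla G(\theta_2)^\top (\theta_1 - \theta_2)) + G(\theta_2) - G(\theta_1)$
$\alpha \to 0$, $F = -G$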
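The limit noted for the logarithmic divergence can be checked numerically: since (1/alpha) log(1 + alpha x) tends to x as alpha tends to 0, the logarithmic divergence tends to the Bregman divergence B_F(theta_1||theta_2) = F(theta_1) - F(theta_2) - (theta_1 - theta_2)^T grad F(theta_2) with F = -G. The sketch below uses a toy potential G(theta) = -||theta||^2 / 2, which is a hypothetical choice made here only to illustrate the limit; whether it satisfies the conditions needed for a well-behaved divergence at finite alpha is not checked.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def log_divergence(G, grad_G, t1, t2, alpha):
    """(1/alpha) log(1 + alpha grad G(t2).(t1 - t2)) + G(t2) - G(t1)."""
    diff = [a - b for a, b in zip(t1, t2)]
    return math.log(1 + alpha * dot(grad_G(t2), diff)) / alpha + G(t2) - G(t1)


def bregman(F, grad_F, t1, t2):
    """B_F(t1||t2) = F(t1) - F(t2) - (t1 - t2).grad F(t2)."""
    diff = [a - b for a, b in zip(t1, t2)]
    return F(t1) - F(t2) - dot(diff, grad_F(t2))


# Toy potential (an assumption for illustration): G = -||theta||^2 / 2, F = -G.
def G(t):
    return -0.5 * dot(t, t)


def grad_G(t):
    return [-x for x in t]


def F(t):
    return 0.5 * dot(t, t)


def grad_F(t):
    return list(t)


t1, t2 = [1.0, 2.0], [0.5, 1.0]
for alpha in (1e-1, 1e-3, 1e-5):
    print(alpha, log_divergence(G, grad_G, t1, t2, alpha))
print("Bregman limit:", bregman(F, grad_F, t1, t2))  # 0.625
```

With this quadratic F the Bregman divergence is half the squared Euclidean distance between the two parameters, so the printed values approach 0.625 as alpha shrinks.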
Bregman divergences (1967): $B_F(\theta_1\|\theta_2) = F(\theta_1) - F(\theta_2) - (\theta_1 - \theta_2)^\top \nabla F(\theta_2)$
Dual divergence (Legendre): $D_{F^*}(\nabla F(\theta_1)\|\nabla F(\theta_2)) = D_F(\theta_2\|\theta_1)$
Generalized Pythagoras' theorem (generalized projection)
Itakura-Saito divergence (Burg entropy; F. Itakura): $IS(p|q) = \sum_i (\frac{p_i}{q_i} - \log \frac{p_i}{q_i} - 1)$
Bregman-Csiszár divergence (1991): $F_\alpha(x) = \begin{cases} x - \log x - 1 & \alpha = 0 \\ x \log x - x + 1 & \alpha = 1 \\ \frac{1}{\alpha(1 - \alpha)} (-x^\alpha + \alpha x - \alpha + 1) & 0 < \alpha < 1 \end{cases}$

Csiszár's $f$-divergence (Ali & Silvey 1966, Csiszár 1967): $D_f(p\|q) = \int p f(\frac{q}{p}) \, d\mu$
Dual divergence ($*$-conjugate, $f^*(y) = y f(1/y)$): $D_{f^*}(p\|q) = D_f(q\|p)$
Kullback-Leibler, Vajda, Neyman, L. LeCam
Amari $\alpha$-divergence (1985): $f_\alpha(x) = \begin{cases} x \log x & \alpha = 1 \\ -\log x & \alpha = -1 \\ \frac{4}{1 - \alpha^2} (1 - x^{\frac{1 + \alpha}{2}}) & -1 < \alpha < 1 \end{cases}$

Information geometries
Hessian manifolds (H. Shima)
Dually flat space: $\nabla$, $\nabla^*$
duality...
Generalized $f$-means (M. Nagumo, B. De Finetti)

Burbea-Rao or Jensen divergence (incl. Jensen-Shannon; J. Jensen): $J_F(p; q) = \frac{F(p) + F(q)}{2} - F(\frac{p + q}{2})$

Sharma-Mittal entropies: $h_{\alpha,\beta}(p) = \frac{1}{1 - \beta} ((\int p^\alpha \, d\mu)^{\frac{1 - \beta}{1 - \alpha}} - 1)$
$\beta = 1$
$\beta \to \alpha$
Non-additive entropy
Tsallis entropy (1998; non-additive entropy): $T_\alpha(p) = \frac{1}{1 - \alpha} (\int p^\alpha \, d\mu - 1)$, $T_\alpha(p\|q) = \frac{1}{1 - \alpha} (1 - \int \frac{p^\alpha}{q^{\alpha - 1}} \, d\mu)$

Quantum & matrix geometry
Frobenius & Hilbert-Schmidt norm
Quantum entropy (Von Neumann, 1927): $S(\rho) = -k \, \mathrm{Tr}(\rho \log \rho)$
Quantum $f$-divergences (Dénes Petz)
Von Neumann divergence: $D(P\|Q) = \mathrm{Tr}(P (\log P - \log Q) - P + Q)$
Log Det divergence: $D(P\|Q) = \langle P, Q^{-1} \rangle - \log \det P Q^{-1} - \dim P$

Integral probability metrics (IPMs)
Stein discrepancies
MMD: Maximum Mean Discrepancy

Gromov-Hausdorff distance (between compact metric spaces): $d_{GH}(X, Y) = \inf_{\phi_X : X \to Z, \phi_Y : Y \to Z} \{\rho_H^Z(\phi_X(X), \phi_Y(Y))\}$, with $\phi_X, \phi_Y$ isometric embeddings

G. Monge, L. Kantorovich, M. Fréchet
Earth mover distance ($\rho = L_1$; EMD, 1998)
Wasserstein distances: $W_{\alpha,\rho}(p, q) = (\inf_{\gamma \in \Gamma(p, q)} \int \rho(x, y)^\alpha \, d\gamma(x, y))^{\frac{1}{\alpha}}$
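In one dimension the Wasserstein distance has a simple closed form that avoids solving the optimal transport problem explicitly. The sketch below is a minimal illustration under assumptions made here: both distributions are empirical, with the same number of equally weighted samples, the ground distance is rho(x, y) = |x - y|, and alpha >= 1. Under those assumptions the optimal coupling matches the sorted samples in order. The function name is hypothetical.

```python
def wasserstein_1d(xs, ys, alpha=1.0):
    """W_alpha between two equal-size, equally weighted 1D samples.

    Assumes rho(x, y) = |x - y| and alpha >= 1, for which the optimal
    coupling pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    n = len(xs)
    cost = sum(abs(a - b) ** alpha
               for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / alpha)


xs = [2.0, 0.0]
ys = [1.0, 5.0]
print(wasserstein_1d(xs, ys, alpha=1.0))  # 2.0
print(wasserstein_1d(xs, ys, alpha=2.0))  # sqrt(5)
```

With alpha = 1 this coincides with the earth mover distance for the L1 ground distance on the line. For weighted samples or higher dimensions a general optimal transport solver is needed instead.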
Optimal transport geometry
Sinkhorn divergence (h-regularized OT)

©2023 Frank Nielsen
