Applications of Information Geometry
Information Geometry and Its Applications
Shun‐ichi Amari RIKEN Brain Science Institute
1. Divergence Function and Dually Flat Riemannian Structure
2. Invariant Geometry on Manifold of Probability Distributions
3. Geometry and Statistical Inference (semi‐parametrics)
4. Applications to Machine Learning and Signal Processing
Information Geometry: Manifolds of Probability Distributions, $M = \{p(x)\}$
[Figure: Information Geometry linked to Combinatorics, Physics, Information Sciences, Mathematics, AI, and Vision]
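Manifold of Probability Distributions
$x \in \{1, 2, 3\}$, $S_n = \{p(x)\}$
$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$
$M = \{p(x; \xi)\}$
[Figure: the simplex with vertices $p_1$, $p_2$, $p_3$ containing the submanifold $M$]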
Manifold and Coordinate System
coordinate transformation
Examples of Coordinate systems
Euclidean space
Gaussian distributions: $S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$
Discrete Distributions
Positive measures
Divergence $D[z : y]$ on $M$
$D[z : y] \ge 0$
$D[z : y] = 0$ iff $z = y$
Not necessarily symmetric: $D[z : y] \ne D[y : z]$ in general
Taylor expansion: $D[z : z + dz] = \frac{1}{2} g_{ij}\, dz^i dz^j$, with $(g_{ij})$ positive‐definite
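As a numerical sanity check of the Taylor expansion, the sketch below compares the KL‐divergence between two nearby discrete distributions with the quadratic form $\frac{1}{2} g_{ij}\,dz^i dz^j$. It assumes the coordinates $z = (p_1, \dots, p_n)$ with $p_0 = 1 - \sum_i p_i$, in which the Fisher metric is $g_{ij} = \delta_{ij}/p_i + 1/p_0$; the helper names are illustrative.

```python
import math

def kl(p, q):
    """KL-divergence D[p : q] = sum_x p(x) log(p(x)/q(x)) for full probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def full(z):
    """Coordinates z = (p_1, ..., p_n) -> full vector (p_0, p_1, ..., p_n)."""
    return [1.0 - sum(z)] + list(z)

def quadratic_form(z, dz):
    """0.5 * g_ij dz^i dz^j with g_ij = delta_ij / p_i + 1 / p_0 (Fisher metric)."""
    p0 = 1.0 - sum(z)
    diagonal_part = sum(d * d / zi for zi, d in zip(z, dz))
    p0_part = sum(dz) ** 2 / p0          # p_0 moves by -sum(dz)
    return 0.5 * (diagonal_part + p0_part)

z = [0.2, 0.3]                 # p_1, p_2, so p_0 = 0.5
dz = [1e-3, -2e-3]             # a small displacement
exact = kl(full(z), full([zi + di for zi, di in zip(z, dz)]))
approx = quadratic_form(z, dz)
print(exact, approx)           # agree to leading (second) order in |dz|
```

The remaining gap is of third order in $|dz|$, so shrinking `dz` makes the relative difference shrink roughly linearly.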
Various Divergences
Euclidean
f‐divergence
KL‐divergence
(α‐β)‐divergence
Kullback‐Leibler Divergence (quasi‐distance)
$D[p(x) : q(x)] = \sum_x p(x)\log\frac{p(x)}{q(x)}$
$D[p(x) : q(x)] \ge 0$, $= 0$ iff $p(x) = q(x)$
$D[p : q] \ne D[q : p]$
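A minimal sketch of the KL‐divergence for discrete distributions, illustrating the three properties listed above; the two example distributions are arbitrary.

```python
import math

def kl(p, q):
    """D[p : q] = sum_x p(x) log(p(x)/q(x)); terms with p(x) = 0 contribute 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]
print(kl(p, q), kl(q, p), kl(p, p))   # positive, positive but different, zero
```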
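$(\alpha, \beta)$‐divergence
$D_{\alpha,\beta}[p : q] = \frac{1}{\alpha\beta(\alpha+\beta)}\sum_i \left\{\alpha\, p_i^{\alpha+\beta} + \beta\, q_i^{\alpha+\beta} - (\alpha+\beta)\, p_i^{\alpha} q_i^{\beta}\right\}$
$\alpha = 1$: β‐divergence
$\alpha + \beta = 1$: α‐divergence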
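A sketch of the $(\alpha, \beta)$‐divergence for positive arrays, assuming the normalized form above (valid for $\alpha, \beta, \alpha + \beta \ne 0$). The check confirms that $\alpha + \beta = 1$ gives $\frac{1}{\alpha(1-\alpha)}\sum_i\{\alpha p_i + (1-\alpha) q_i - p_i^{\alpha} q_i^{1-\alpha}\}$; note that this exponent convention is not the $\pm 1$ convention used for the α‐divergence later on.

```python
def ab_divergence(p, q, alpha, beta):
    """(alpha, beta)-divergence of positive arrays; assumes alpha, beta, alpha + beta != 0."""
    s = alpha + beta
    total = sum(alpha * pi ** s + beta * qi ** s - s * pi ** alpha * qi ** beta
                for pi, qi in zip(p, q))
    return total / (alpha * beta * s)

def alpha_divergence(p, q, a):
    """Alpha-divergence in the exponent convention a + b = 1 (0 < a < 1 here)."""
    return sum(a * pi + (1 - a) * qi - pi ** a * qi ** (1 - a)
               for pi, qi in zip(p, q)) / (a * (1 - a))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.1, 0.5]
print(ab_divergence(p, q, 0.3, 0.7), alpha_divergence(p, q, 0.3))   # equal
```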
Manifold with Convex Function
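$S$: coordinates $\theta = (\theta^1, \theta^2, \dots, \theta^n)$
$\psi(\theta)$: convex function, e.g. $\psi(\theta) = \frac{1}{2}\sum_i (\theta^i)^2$
$D[\theta : \theta + d\theta] = \frac{1}{2} g_{ij}\, d\theta^i d\theta^j$
$g_{ij} = \partial_i \partial_j \psi(\theta)$, $\partial_i = \partial/\partial\theta^i$
$\psi(\theta) + \varphi(\eta) - \theta^i \eta_i = 0$
Legendre transformation: $\eta_i = \partial_i \psi(\theta)$, $\theta^i = \partial^i \varphi(\eta)$, $\partial^i = \partial/\partial\eta_i$
$\varphi(\eta) = \max_{\theta}\{\theta^i \eta_i - \psi(\theta)\}$
$D[\theta : \theta'] = \psi(\theta) + \varphi(\eta') - \theta^i \eta'_i$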
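Proof: $D[\theta : \theta'] = \psi(\theta) - \psi(\theta') - \eta'_i(\theta^i - \theta'^i) \ge 0$, with equality iff $\theta = \theta'$, because a strictly convex $\psi$ lies above its tangent plane at $\theta'$.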
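A sketch of the Bregman divergence generated by a convex function, checking two facts stated above: $\psi(\theta) = \frac{1}{2}\sum_i(\theta^i)^2$ gives half the squared Euclidean distance, and the Legendre form $\psi(\theta) + \varphi(\eta') - \theta^i\eta'_i$ agrees with the tangent‐plane form. The one‐dimensional $\psi(\theta) = \log(1 + e^{\theta})$, with dual $\varphi(\eta) = \eta\log\eta + (1-\eta)\log(1-\eta)$, is an illustrative choice not taken from the slides.

```python
import math

def bregman(psi, grad_psi, theta, theta_p):
    """D[theta : theta'] = psi(theta) - psi(theta') - grad psi(theta') . (theta - theta')."""
    g = grad_psi(theta_p)
    return psi(theta) - psi(theta_p) - sum(gi * (t - tp) for gi, t, tp in zip(g, theta, theta_p))

def half_sq(th):
    return 0.5 * sum(t * t for t in th)

def half_sq_grad(th):
    return list(th)

def softplus(th):
    """Illustrative convex function psi(theta) = log(1 + e^theta), one dimension."""
    return math.log1p(math.exp(th[0]))

def softplus_grad(th):
    """eta = d psi / d theta = 1 / (1 + e^(-theta))."""
    return [1.0 / (1.0 + math.exp(-th[0]))]

def phi_dual(eta):
    """Legendre dual of softplus: phi(eta) = eta log eta + (1 - eta) log(1 - eta)."""
    return eta * math.log(eta) + (1 - eta) * math.log(1 - eta)

theta, theta_p = [0.8], [-0.4]
d = bregman(softplus, softplus_grad, theta, theta_p)
eta_p = softplus_grad(theta_p)[0]
d_dual = softplus(theta) + phi_dual(eta_p) - theta[0] * eta_p
print(d, d_dual)                                          # the two forms agree
print(bregman(half_sq, half_sq_grad, [1.0, 2.0], [0.0, 0.0]))   # 0.5 * |theta - theta'|^2
```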
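θ‐geodesic (e‐geodesic): $\theta(t) = t\,\theta_1 + (1 - t)\,\theta_2$
“dually orthogonal”: $\langle \partial_i, \partial^j \rangle = \delta_i^j$, where $\partial_i = \partial/\partial\theta^i$, $\partial^i = \partial/\partial\eta_i$
$X\langle Y, Z\rangle = \langle \nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$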
Bi‐orthogonality
Dually flat manifold
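θ‐coordinates, η‐coordinates
potential functions $\psi(\theta)$, $\varphi(\eta)$
$g_{ij} = \frac{\partial^2 \psi}{\partial\theta^i \partial\theta^j}$, $g^{ij} = \frac{\partial^2 \varphi}{\partial\eta_i \partial\eta_j}$
$\psi(\theta) + \varphi(\eta) - \theta^i \eta_i = 0$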
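For a concrete dually flat manifold, the sketch below takes (as an assumed example) the discrete exponential family with $\psi(\theta) = \log(1 + \sum_i e^{\theta^i})$ and $\varphi(\eta) = \sum_{i=0}^{n}\eta_i\log\eta_i$, $\eta_0 = 1 - \sum_i \eta_i$, and checks numerically that the two Hessians $(g_{ij})$ and $(g^{ij})$ are mutually inverse.

```python
import numpy as np

def hessian_psi(theta):
    """g_ij = d^2 psi / d theta^i d theta^j for psi(theta) = log(1 + sum_i exp(theta^i))."""
    eta = np.exp(theta) / (1.0 + np.exp(theta).sum())   # eta_i = d psi / d theta^i
    return np.diag(eta) - np.outer(eta, eta)

def hessian_phi(eta):
    """g^ij = d^2 phi / d eta_i d eta_j = delta_ij / eta_i + 1 / eta_0."""
    eta0 = 1.0 - eta.sum()
    return np.diag(1.0 / eta) + 1.0 / eta0              # scalar added to every entry

theta = np.array([0.3, -1.2])
eta = np.exp(theta) / (1.0 + np.exp(theta).sum())
G = hessian_psi(theta)
G_dual = hessian_phi(eta)
print(G @ G_dual)        # numerically the identity matrix
```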
Gaussian: $\theta$: natural parameter; $\eta$: expectation parameter; $\varphi(\eta)$: negative entropy
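The Gaussian case can be checked directly. Writing $p(x) = \exp\{\theta^1 x + \theta^2 x^2 - \psi(\theta)\}$ gives $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$, $\eta = (\mu, \mu^2 + \sigma^2)$ and $\psi(\theta) = -\frac{(\theta^1)^2}{4\theta^2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta^2}\right)$. The sketch below (illustrative values of $\mu$, $\sigma$) verifies $\partial_i\psi = \eta_i$ by finite differences and that $\varphi(\eta) = \theta^i\eta_i - \psi(\theta)$ equals the negative entropy $-\frac{1}{2}\log(2\pi e\sigma^2)$.

```python
import math

def natural_params(mu, sigma):
    """theta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return mu / sigma ** 2, -1.0 / (2 * sigma ** 2)

def expectation_params(mu, sigma):
    """eta = (E[x], E[x^2]) = (mu, mu^2 + sigma^2)."""
    return mu, mu ** 2 + sigma ** 2

def psi(t1, t2):
    """Log-normalizer psi(theta) = -theta1^2 / (4 theta2) + 0.5 log(-pi / theta2)."""
    return -t1 ** 2 / (4 * t2) + 0.5 * math.log(-math.pi / t2)

mu, sigma = 1.5, 0.7
t1, t2 = natural_params(mu, sigma)
e1, e2 = expectation_params(mu, sigma)
phi = t1 * e1 + t2 * e2 - psi(t1, t2)                     # Legendre dual phi(eta)
neg_entropy = -0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

h = 1e-6                                                  # central-difference step
grad1 = (psi(t1 + h, t2) - psi(t1 - h, t2)) / (2 * h)     # should equal eta_1 = mu
grad2 = (psi(t1, t2 + h) - psi(t1, t2 - h)) / (2 * h)     # should equal eta_2
print(phi, neg_entropy, grad1, grad2)
```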
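$x$: discrete, $X = \{0, 1, \dots, n\}$
$S_n = \{p(x) \mid x \in X\}$: exponential family
$p(x) = \sum_{i=0}^{n} p_i\,\delta_i(x) = \exp\left[\sum_{i=1}^{n}\theta^i x_i - \psi(\theta)\right] = \exp[\theta \cdot x - \psi(\theta)]$
$\theta^i = \log(p_i / p_0)$, $x_i = \delta_i(x)$, $\psi(\theta) = -\log p_0$
$\eta_i = E[x_i] = p_i$, $\varphi(\eta) = \sum_{i=0}^{n} p_i \log p_i$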
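A sketch of the discrete exponential family with these coordinates (the distributions are illustrative). It checks $\psi(\theta) + \varphi(\eta) - \theta\cdot\eta = 0$ for dual coordinates of one point, and that the divergence $\psi(\theta_q) + \varphi(\eta_p) - \theta_q\cdot\eta_p$ equals the KL‐divergence $D[p : q]$; in this convention the order of the arguments is swapped relative to $D[\theta : \theta']$.

```python
import math

p = [0.5, 0.2, 0.3]            # (p_0, p_1, p_2), illustrative
q = [0.25, 0.25, 0.5]

def theta_of(dist):
    """theta^i = log(p_i / p_0), i = 1..n."""
    return [math.log(x / dist[0]) for x in dist[1:]]

def psi(theta):
    """psi(theta) = log(1 + sum_i exp(theta^i)) = -log p_0."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def phi(eta):
    """phi(eta) = sum_{i=0}^n p_i log p_i with eta_i = p_i, p_0 = 1 - sum_i eta_i."""
    full = [1.0 - sum(eta)] + list(eta)
    return sum(x * math.log(x) for x in full)

def kl(a, b):
    return sum(x * math.log(x / y) for x, y in zip(a, b))

th_p, eta_p = theta_of(p), p[1:]
th_q = theta_of(q)
legendre_gap = psi(th_p) + phi(eta_p) - sum(t * e for t, e in zip(th_p, eta_p))
divergence = psi(th_q) + phi(eta_p) - sum(t * e for t, e in zip(th_q, eta_p))
print(legendre_gap, divergence, kl(p, q))   # gap ~ 0, last two agree
```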
Two geodesics
Tangent directions
Function space of probability distributions $\{p(x)\}$: topology
Exponential Family
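Pythagorean Theorem (dually flat manifold)
$D[P : Q] + D[Q : R] = D[P : R]$ when the θ‐geodesic connecting P and Q is orthogonal at Q to the η‐geodesic connecting Q and R
$D[P : Q] = \psi(\theta_P) + \varphi(\eta_Q) - \theta_P \cdot \eta_Q$
$D[P : Q] + D[Q : R] - D[P : R] = -(\theta_P - \theta_Q)\cdot(\eta_Q - \eta_R)$
Orthogonality at Q: $(\theta_P - \theta_Q)\cdot(\eta_Q - \eta_R) = 0$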
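The three‐point identity behind the Pythagorean theorem can be verified numerically. This sketch assumes the discrete exponential family as the dually flat manifold and uses $D[P : Q] = \psi(\theta_P) + \varphi(\eta_Q) - \theta_P\cdot\eta_Q$; the points P, Q, R are arbitrary, so the check is of the identity, which reduces to the theorem whenever the inner product vanishes.

```python
import math

def coords(dist):
    """(theta, eta) of dist = (p_0, ..., p_n) in the discrete exponential family."""
    theta = [math.log(x / dist[0]) for x in dist[1:]]
    return theta, list(dist[1:])

def psi(theta):
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def phi(eta):
    full = [1.0 - sum(eta)] + list(eta)
    return sum(x * math.log(x) for x in full)

def D(P, Q):
    """D[P : Q] = psi(theta_P) + phi(eta_Q) - theta_P . eta_Q."""
    tP, _ = coords(P)
    _, eQ = coords(Q)
    return psi(tP) + phi(eQ) - sum(a * b for a, b in zip(tP, eQ))

P, Q, R = [0.5, 0.2, 0.3], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]
(tP, eP), (tQ, eQ), (tR, eR) = coords(P), coords(Q), coords(R)
lhs = D(P, Q) + D(Q, R) - D(P, R)
rhs = -sum((a - b) * (c - d) for a, b, c, d in zip(tP, tQ, eQ, eR))
print(lhs, rhs)   # equal; both vanish exactly when PQ and QR are orthogonal at Q
```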
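Projection Theorem
$q = \arg\min_{s \in M} D[s : p]$
[Figure: point $p$ projected onto the submanifold $M$ along an e‐geodesic, meeting $M$ at $q$; $s$ is a generic point of $M$]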
Projection Theorem
$\min_{Q \in M} D[P : Q]$: $Q$ = m‐geodesic projection of P to M, unique when M is e‐flat
$\min_{Q \in M} D[Q : P]$: $Q'$ = e‐geodesic projection of P to M, unique when M is m‐flat
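A sketch of the m‐geodesic projection onto an e‐flat $M$, taking D to be the KL‐divergence $D[P : Q] = \sum_x P(x)\log\frac{P(x)}{Q(x)}$. The family $q_\theta(x) \propto e^{\theta x}$ on $\{0, 1, 2, 3\}$ and the point P are illustrative assumptions. Since $D[P : q_\theta] = \text{const} - \theta E_P[x] + \psi(\theta)$ is convex in $\theta$, its minimizer is where $E_{q_\theta}[x] = E_P[x]$, found here by bisection.

```python
import math

X = [0, 1, 2, 3]

def member(theta):
    """q_theta(x) = exp(theta x - psi(theta)): a one-parameter exponential (e-flat) family."""
    w = [math.exp(theta * x) for x in X]
    z = sum(w)
    return [wi / z for wi in w]

def mean(dist):
    return sum(x * px for x, px in zip(X, dist))

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def m_projection(p, lo=-20.0, hi=20.0, iters=100):
    """Solve E_q[x] = E_p[x] for theta by bisection (the mean increases with theta)."""
    target = mean(p)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(member(mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

P = [0.1, 0.5, 0.1, 0.3]           # a distribution outside the family
t_star = m_projection(P)
best = kl(P, member(t_star))
print(t_star, best)
```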
Convex function – Bregman divergence
– Dually flat Riemannian divergence
Invariant under different representation
$y = y(x)$, $\tilde p(y, \xi)$
$\int |p(x, \xi_1) - p(x, \xi_2)|^2\, dx \ne \int |\tilde p(y, \xi_1) - \tilde p(y, \xi_2)|^2\, dy$ in general
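To see the lack of invariance concretely, this sketch re‐represents two Gaussians through an assumed affine map $y = a x + b$, under which Gaussians stay Gaussian. The squared $L^2$ distance between the densities is divided by $|a|$, while the KL‐divergence is unchanged. The closed form uses $\int N(x; m_1, v_1)\,N(x; m_2, v_2)\,dx = N(m_1; m_2, v_1 + v_2)$.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def overlap(ma, va, mb, vb):
    """Integral of N(x; ma, va) * N(x; mb, vb) over x."""
    return normal_pdf(ma, mb, va + vb)

def l2_sq(m1, v1, m2, v2):
    """Integral of (p1 - p2)^2 for two Gaussian densities."""
    return overlap(m1, v1, m1, v1) + overlap(m2, v2, m2, v2) - 2 * overlap(m1, v1, m2, v2)

def kl_gauss(m1, v1, m2, v2):
    """KL[N(m1, v1) : N(m2, v2)]."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)

m1, v1, m2, v2 = 0.0, 1.0, 1.0, 2.0
a, b = 3.0, -1.0                          # illustrative re-representation y = a x + b
before = (l2_sq(m1, v1, m2, v2), kl_gauss(m1, v1, m2, v2))
after = (l2_sq(a * m1 + b, a * a * v1, a * m2 + b, a * a * v2),
         kl_gauss(a * m1 + b, a * a * v1, a * m2 + b, a * a * v2))
print(before, after)   # L2 distance shrinks by 1/a, KL stays the same
```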
Invariant divergence (Chentsov; Amari‐Nagaoka)
$S = \{p(x, \xi)\}$ (manifold of probability distributions)
$y = k(x)$: sufficient statistics
$D[p_X(x) : q_X(x)] = D[p_Y(y) : q_Y(y)]$
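Invariance: characterization of f‐divergence (Csiszár)
$x \in \{1, \dots, n\}$, $p = (p_1, \dots, p_n)$; partition of $\{1, \dots, n\}$ into $A_1, A_2, \dots, A_m$
$\bar p = (\bar p_{A_1}, \dots, \bar p_{A_m})$, $\bar p_A = \sum_{i \in A} p_i$
$D[p : q] \ge D[\bar p : \bar q]$
$D[p : q] = D[\bar p : \bar q]$ iff $p_i = c_A q_i$ for $i \in A$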
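A sketch checking both statements for the KL‐divergence on an illustrative five‐point example: coarse‐graining over a partition never increases the divergence, and it preserves the divergence when $p_i = c_A q_i$ inside every block $A$.

```python
import math

def kl(p, q):
    """D[p : q] = sum_i p_i log(p_i / q_i)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def coarse_grain(p, partition):
    """bar p_A = sum_{i in A} p_i for each block A of the partition."""
    return [sum(p[i] for i in block) for block in partition]

partition = [[0, 1], [2, 3, 4]]        # A_1 = {0, 1}, A_2 = {2, 3, 4} (illustrative)
p = [0.1, 0.2, 0.3, 0.25, 0.15]
q = [0.3, 0.1, 0.2, 0.2, 0.2]

# q2 keeps p's shape inside each block (p_i = c_A q2_i) but uses block weights (0.6, 0.4).
q2 = list(p)
for block, weight in zip(partition, [0.6, 0.4]):
    mass = sum(p[i] for i in block)
    for i in block:
        q2[i] = p[i] * weight / mass

fine = (kl(p, q), kl(p, q2))
coarse = (kl(coarse_grain(p, partition), coarse_grain(q, partition)),
          kl(coarse_grain(p, partition), coarse_grain(q2, partition)))
print(fine, coarse)   # first pair: fine >= coarse; second pair: equal
```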
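Invariance ⇒ f‐divergence (Csiszár f‐divergence; Ali‐Silvey; Morimoto)
$D_f[p : q] = \sum_i p_i f\left(\frac{q_i}{p_i}\right)$, $f(u)$: convex, $f(1) = 0$
$D_{cf}[p : q] = c\,D_f[p : q]$
$\tilde f(u) = f(u) - c(u - 1)$ gives the same divergence, since $\sum_i p_i\left(\frac{q_i}{p_i} - 1\right) = \sum_i q_i - \sum_i p_i = 0$
Standardization: $f(1) = f'(1) = 0$, $f''(1) = 1$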
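A sketch of the f‐divergence $D_f[p : q] = \sum_i p_i f(q_i/p_i)$ on probability vectors, checking the two freedoms just listed: scaling $f$ by $c$ scales $D_f$, and adding a multiple of $(u - 1)$ changes nothing. The choice $f(u) = -\log u$, which yields the KL‐divergence $D[p : q]$, and the example vectors are illustrative.

```python
import math

def f_divergence(f, p, q):
    """D_f[p : q] = sum_i p_i f(q_i / p_i)."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))

def f_kl(u):
    """f(u) = -log u gives D_f[p : q] = KL[p : q]."""
    return -math.log(u)

def shifted(f, c):
    """tilde f(u) = f(u) - c (u - 1)."""
    return lambda u: f(u) - c * (u - 1)

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
d = f_divergence(f_kl, p, q)
d_shift = f_divergence(shifted(f_kl, 2.5), p, q)
d_scaled = f_divergence(lambda u: 3.0 * f_kl(u), p, q)
kl_direct = sum(a * math.log(a / b) for a, b in zip(p, q))
print(d, d_shift, d_scaled, kl_direct)
```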
Theorem
An invariant separable divergence belongs to the class of f‐divergence.
Separable divergence: $D[p : q] = \sum_i k(p_i, q_i)$, with $k(p_i, q_i) = p_i f\left(\frac{q_i}{p_i}\right)$
[Figure: in $S = \{p\}$, the space of probability distributions, invariant divergences (f‐divergence; Fisher information metric, alpha connection) and flat divergences (Bregman divergences from convex functions; dually flat space) meet, for $n > 1$, in the KL‐divergence $D[p : q] = \int p(x)\log\frac{p(x)}{q(x)}\,dx$]
α‐Divergence: why? Flat and invariant in $\tilde S_n$
$f(u) = \frac{4}{1 - \alpha^2}\left\{1 - u^{\frac{1+\alpha}{2}}\right\} - \frac{2}{1 - \alpha}(1 - u)$, $\alpha \ne \pm 1$
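KL‐divergence
$f(u) = u\log u - (u - 1)$, applied as $\sum_i q_i f\left(\frac{p_i}{q_i}\right)$ (equivalently $f(u) = -\log u + u - 1$ in the form $\sum_i p_i f\left(\frac{q_i}{p_i}\right)$):
$D[p : q] = \sum_i \left\{p_i \log\frac{p_i}{q_i} - p_i + q_i\right\}$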
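Space of positive measures: vectors, matrices, arrays
$\tilde S = \{\tilde p\}$, $\tilde p_i > 0$ ($\sum_i \tilde p_i = 1$ need not hold)
f‐divergence
Bregman divergence
α‐divergence
f‐divergence of $\tilde S$ (standardized $f$): $D_f[\tilde p : \tilde q] = \sum_i \tilde p_i f\left(\frac{\tilde q_i}{\tilde p_i}\right) \ge 0$, and $D_f[\tilde p : \tilde q] = 0$ iff $\tilde p = \tilde q$
KL‐divergence: $D[\tilde p : \tilde q] = \sum_i \left\{\tilde p_i \log\frac{\tilde p_i}{\tilde q_i} - \tilde p_i + \tilde q_i\right\}$
$\tilde S$: dually flat
$S$: not dually flat (except $\alpha = \pm 1$)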
α‐representation: $r_i = \frac{2}{1 - \alpha}\,\tilde p_i^{\frac{1-\alpha}{2}}$
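A sketch of the α‐divergence on positive measures, obtained from the standardized $f(u) = \frac{4}{1-\alpha^2}\left\{\frac{1-\alpha}{2} + \frac{1+\alpha}{2}u - u^{\frac{1+\alpha}{2}}\right\}$ in the form $\sum_i \tilde p_i f(\tilde q_i / \tilde p_i)$. The checks (on illustrative unnormalized vectors) are that it vanishes at $\tilde p = \tilde q$, that $\alpha = 0$ gives $2\sum_i(\sqrt{\tilde p_i} - \sqrt{\tilde q_i})^2$, and that $\alpha \to -1$ approaches the KL‐divergence for positive measures.

```python
import math

def alpha_divergence(p, q, alpha):
    """Alpha-divergence on positive measures, alpha != +-1:
    4 / (1 - alpha^2) * sum{a p + b q - p^a q^b}, a = (1 - alpha)/2, b = (1 + alpha)/2."""
    c = 4.0 / (1.0 - alpha ** 2)
    a, b = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    return c * sum(a * pi + b * qi - pi ** a * qi ** b for pi, qi in zip(p, q))

def extended_kl(p, q):
    """KL-divergence on positive measures: sum{p_i log(p_i / q_i) - p_i + q_i}."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))

p = [0.5, 2.0, 1.2]        # positive measures: entries need not sum to 1
q = [1.0, 1.5, 0.7]
hellinger_like = 2.0 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
print(alpha_divergence(p, q, 0.0), hellinger_like)
print(alpha_divergence(p, q, -1.0 + 1e-6), extended_kl(p, q))
```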
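Metric and Connections Induced by Divergence (Eguchi)
Riemannian metric: $g_{ij}(z) = \partial_i \partial_j D[z : y]\big|_{y=z}$, so that $D[z : y] \approx \frac{1}{2} g_{ij}(z)(z_i - y_i)(z_j - y_j)$
Affine connections: $\Gamma_{ijk}(z) = -\partial_i \partial_j \partial'_k D[z : y]\big|_{y=z}$, $\Gamma^*_{ijk}(z) = -\partial'_i \partial'_j \partial_k D[z : y]\big|_{y=z}$
$\partial_i = \partial/\partial z_i$, $\partial'_i = \partial/\partial y_i$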
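Eguchi's metric formula can be applied numerically to any smooth divergence. This sketch assumes $D[z : y] = \mathrm{KL}[p_z : p_y]$ for the discrete exponential family in θ‐coordinates, takes $\partial_i\partial_j D[z : y]|_{y=z}$ by central finite differences, and compares the result with the Fisher metric $\eta_i\delta_{ij} - \eta_i\eta_j$. The function names and step size are illustrative.

```python
import numpy as np

def kl_theta(z, y):
    """D[z : y] = KL[p_z : p_y] for the discrete exponential family in theta-coordinates."""
    pz = np.append(1.0, np.exp(z))
    pz /= pz.sum()
    py = np.append(1.0, np.exp(y))
    py /= py.sum()
    return float(np.sum(pz * np.log(pz / py)))

def eguchi_metric(D, z, h=1e-4):
    """g_ij(z) = d^2 / dz_i dz_j D[z : y] at y = z, by central finite differences."""
    n = len(z)
    g = np.zeros((n, n))
    E = np.eye(n) * h
    for i in range(n):
        for j in range(n):
            g[i, j] = (D(z + E[i] + E[j], z) - D(z + E[i] - E[j], z)
                       - D(z - E[i] + E[j], z) + D(z - E[i] - E[j], z)) / (4 * h * h)
    return g

z = np.array([0.3, -0.5])
eta = np.exp(z) / (1.0 + np.exp(z).sum())
fisher = np.diag(eta) - np.outer(eta, eta)
g = eguchi_metric(kl_theta, z)
print(g, fisher, sep="\n")
```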
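Invariant geometrical structure $S = \{p(x, \xi)\}$: alpha‐geometry (derived from invariant divergence)
$T_{ijk} = E[\partial_i l\, \partial_j l\, \partial_k l]$, $l = \log p(x, \xi)$, $\partial_i = \partial/\partial\xi^i$
α‐connection: $\Gamma^{\alpha}_{ijk} = [i, j; k] - \frac{\alpha}{2} T_{ijk}$, with $[i, j; k]$ the Levi‐Civita connection
$\nabla^{\alpha}$, $\nabla^{-\alpha}$: dually coupled, $X\langle Y, Z\rangle = \langle \nabla^{\alpha}_X Y, Z\rangle + \langle Y, \nabla^{-\alpha}_X Z\rangle$
Duality: $X\langle Y, Z\rangle = \langle \nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$
$\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji}$
$\Gamma^*_{ijk} - \Gamma_{ijk} = T_{ijk}$
$\{M, g, T\}$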
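A one‐dimensional sketch of the α‐geometry, assuming the Bernoulli family in its natural parameter, $p(x; \theta) = \exp\{\theta x - \log(1 + e^{\theta})\}$ with $x \in \{0, 1\}$. It computes $g = E[(\partial l)^2]$ and $T = E[(\partial l)^3]$ by exact expectation, and $\Gamma^{\alpha} = \Gamma^{0} - \frac{\alpha}{2}T$ with the Levi‐Civita term $\Gamma^0 = \frac{1}{2}\partial g$ taken numerically. In this parameter the e‐connection ($\alpha = 1$) vanishes, and $\alpha = -1$ gives $T$.

```python
import math

def eta_of(theta):
    """Expectation parameter eta = E[x] = 1 / (1 + e^(-theta))."""
    return 1.0 / (1.0 + math.exp(-theta))

def score(x, theta):
    """d l / d theta for l = log p(x; theta) = theta x - log(1 + e^theta)."""
    return x - eta_of(theta)

def expect(fn, theta):
    """Exact expectation of fn(x) over x in {0, 1} under p(x; theta)."""
    eta = eta_of(theta)
    return (1.0 - eta) * fn(0) + eta * fn(1)

def metric(theta):
    """g_11 = E[(d l)^2]."""
    return expect(lambda x: score(x, theta) ** 2, theta)

def skewness(theta):
    """T_111 = E[(d l)^3]."""
    return expect(lambda x: score(x, theta) ** 3, theta)

def gamma_alpha(alpha, theta, h=1e-5):
    """Gamma^alpha_111 = Gamma^0_111 - (alpha/2) T_111, where the Levi-Civita
    coefficient Gamma^0_111 = (1/2) dg_11/dtheta uses central differences."""
    levi_civita = 0.5 * (metric(theta + h) - metric(theta - h)) / (2 * h)
    return levi_civita - 0.5 * alpha * skewness(theta)

theta = 0.7
g, T = metric(theta), skewness(theta)
print(g, T, gamma_alpha(1.0, theta), gamma_alpha(-1.0, theta))
```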
Riemannian Structure
$ds^2 = g_{ij}(\theta)\, d\theta^i d\theta^j = d\theta^T G(\theta)\, d\theta$, $G(\theta) = (g_{ij})$
Euclidean: $G = E$ (identity matrix)
Fisher information
Affine Connection
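covariant derivative $\nabla_X Y$
geodesic: $\nabla_{\dot X}\dot X = 0$, $X = X(t)$
$s = \int \sqrt{g_{ij}(\theta)\, d\theta^i d\theta^j}$
$\langle X, Y\rangle = g_{ij} X^i Y^j$
$X\langle Y, Z\rangle = \langle \nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$
[Figure: parallel transport of the vectors X and Y]
Riemannian geometry: $\nabla = \nabla^*$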
Dual Affine Connections $(\nabla, \nabla^*)$
e‐geodesic: $\log r(x, t) = t\log p(x) + (1 - t)\log q(x) - c(t)$
m‐geodesic: $r(x, t) = t\,p(x) + (1 - t)\,q(x)$
[Figure: e‐ and m‐geodesics connecting $q(x)$ and $p(x)$]
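A sketch of both geodesics between two discrete distributions (illustrative values). The m‐geodesic mixes probabilities linearly; the e‐geodesic mixes log‐probabilities linearly and renormalizes, with $c(t) = \log\sum_x p(x)^t q(x)^{1-t}$, which is negative for $0 < t < 1$ when $p \ne q$.

```python
import math

def m_geodesic(p, q, t):
    """m-geodesic: r(x, t) = t p(x) + (1 - t) q(x)."""
    return [t * a + (1 - t) * b for a, b in zip(p, q)]

def e_geodesic(p, q, t):
    """e-geodesic: log r(x, t) = t log p(x) + (1 - t) log q(x) - c(t).

    Returns (r, c) where c(t) = log sum_x p(x)^t q(x)^(1 - t) normalizes r."""
    w = [a ** t * b ** (1 - t) for a, b in zip(p, q)]
    total = sum(w)
    return [wi / total for wi in w], math.log(total)

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
r_m = m_geodesic(p, q, 0.5)
r_e, c = e_geodesic(p, q, 0.5)
print(r_m, r_e, c)
```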
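Mathematical structure of $S = \{p(x, \xi)\}$
$g_{ij} = E[\partial_i l\, \partial_j l]$, $T_{ijk} = E[\partial_i l\, \partial_j l\, \partial_k l]$
$\{M, g, T\}$
$l = \log p(x, \xi)$, $\partial_i = \partial/\partial\xi^i$
α‐connection: $\nabla^{\alpha}$, $\nabla^{-\alpha}$ dually coupled, $X\langle Y, Z\rangle = \langle \nabla^{\alpha}_X Y, Z\rangle + \langle Y, \nabla^{-\alpha}_X Z\rangle$
α‐geometry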
Dual Foliations
k‐cut
Two neurons: $\{p_{00}, p_{01}, p_{10}, p_{11}\}$
$x_1$: 0011000101101
$x_2$: 0100100110100
$x_3$: 0101101001010
firing rates: $r_1$, $r_2$; $r_{12}$
correlation or covariance?
Correlations of Neural Firing
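$p(x_1, x_2)$, $x_1, x_2 \in \{0, 1\}$: $\{p_{00}, p_{10}, p_{01}, p_{11}\}$, $p_{00} + p_{10} + p_{01} + p_{11} = 1$
firing rates: $r_1 = p_{1\cdot} = p_{10} + p_{11}$, $r_2 = p_{\cdot 1} = p_{01} + p_{11}$
correlations: $\theta = \log\frac{p_{11}\,p_{00}}{p_{10}\,p_{01}}$
$\{(r_1, r_2), \theta\}$: orthogonal coordinates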
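A sketch that estimates these quantities from the spike trains $x_1$ and $x_2$ shown earlier, treating each time bin as an independent sample (an assumption made only for illustration; 13 bins give very rough estimates). Here $r_{12} = p_{11}$ is the joint firing rate, so the covariance is $r_{12} - r_1 r_2$, while $\theta$ is the log odds ratio.

```python
import math

x1 = "0011000101101"
x2 = "0100100110100"

def joint_probs(a, b):
    """Empirical p_ij = fraction of time bins with (x1, x2) = (i, j)."""
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for u, v in zip(a, b):
        counts[(int(u), int(v))] += 1
    return {k: c / len(a) for k, c in counts.items()}

p = joint_probs(x1, x2)
r1 = p[(1, 0)] + p[(1, 1)]            # firing rate of neuron 1
r2 = p[(0, 1)] + p[(1, 1)]            # firing rate of neuron 2
r12 = p[(1, 1)]                       # joint firing rate
cov = r12 - r1 * r2                   # covariance of x1 and x2
theta = math.log(p[(1, 1)] * p[(0, 0)] / (p[(1, 0)] * p[(0, 1)]))
print(r1, r2, r12, cov, theta)
```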
Independent Distributions
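$x_1, x_2 \in \{0, 1\}$, $S = \{p(x_1, x_2)\}$, $M = \{q(x_1)\,q(x_2)\}$
two neuron case: $(r_1, r_2, r_{12})$; $(\theta_1, \theta_2, \theta_{12})$
$\theta_{12} = \log\frac{p_{00}\,p_{11}}{p_{01}\,p_{10}} = \log\frac{r_{12}(1 - r_1 - r_2 + r_{12})}{(r_1 - r_{12})(r_2 - r_{12})}$
$r_{12} = f(r_1, r_2, \theta_{12})$
$r_{12}(t) = f(r_1(t), r_2(t), \theta_{12})$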
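The function $f$ in $r_{12} = f(r_1, r_2, \theta_{12})$ is implicit; a minimal sketch recovers it by bisection, using that $\theta_{12}$ increases with $r_{12}$ on the admissible interval $\max(0, r_1 + r_2 - 1) < r_{12} < \min(r_1, r_2)$. Holding $\theta_{12}$ fixed while $r_1, r_2$ change then traces $r_{12}(t)$. The function names are illustrative.

```python
import math

def theta12(r1, r2, r12):
    """theta_12 = log[r12 (1 - r1 - r2 + r12) / ((r1 - r12)(r2 - r12))]."""
    return math.log(r12 * (1 - r1 - r2 + r12) / ((r1 - r12) * (r2 - r12)))

def r12_from(r1, r2, th, iters=100):
    """Solve theta12(r1, r2, r12) = th for r12 by bisection on the admissible interval."""
    lo, hi = max(0.0, r1 + r2 - 1.0), min(r1, r2)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if theta12(r1, r2, mid) < th:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# theta_12 = 0 means independence, so r12 = r1 r2
print(r12_from(0.3, 0.6, 0.0))
# keep theta_12 fixed while the firing rates change: r12(t) = f(r1(t), r2(t), theta_12)
print([r12_from(0.2 + 0.1 * t, 0.4, math.log(2.0)) for t in range(3)])
```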
Decomposition of KL-divergence
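$D[p : r] = D[p : q] + D[q : r]$
$p, q$: same marginals ($r_1$, $r_2$)
$r, q$: same correlations ($q$ and $r$ independent)
$D[p : r] = \sum_x p(x)\log\frac{p(x)}{r(x)}$
[Figure: $p$ off the independent submanifold along the correlation direction; $q$ and $r$ on the submanifold of independent distributions]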
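A numerical sketch of this decomposition for two neurons (the joint distribution and the second independent distribution are illustrative). Here $q$ is the product of $p$'s marginals, that is, the m‐projection of $p$ onto the independent distributions, so $D[p : q]$ is the mutual information that measures the correlation.

```python
import itertools
import math

def kl(p, r):
    return sum(p[k] * math.log(p[k] / r[k]) for k in p if p[k] > 0)

def product(m1, m2):
    """Independent distribution with firing probabilities m1, m2 for the two neurons."""
    return {(a, b): (m1 if a else 1 - m1) * (m2 if b else 1 - m2)
            for a, b in itertools.product((0, 1), repeat=2)}

p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}   # correlated
r1 = p[(1, 0)] + p[(1, 1)]
r2 = p[(0, 1)] + p[(1, 1)]
q = product(r1, r2)          # same marginals as p, no correlation
r = product(0.7, 0.2)        # another independent distribution
lhs = kl(p, r)
rhs = kl(p, q) + kl(q, r)
print(lhs, rhs)
```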
pairwise correlations
independent distributions
$r_{ij} = r_i r_j$, $r_{ijk} = r_i r_j r_k$, ...
higher-order correlations
Orthogonal
higher‐order correlations
$x_i = 1(u_i)$, where $1(\cdot)$ is the unit step function
$u_i$: Gaussian, with covariances $E[u_i u_j]$
Synfiring
$p(x) = p(x_1, \dots, x_n)$
$r = \frac{1}{n}\sum_i x_i$: distribution $q(r)$
[Figure: distribution $q(r)$ of the fraction $r$ of firing neurons]
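The slides do not give a formula for $q(r)$. The sketch below assumes the thresholded‐Gaussian picture of the previous slide with a common input: $u_i = \sqrt{\rho}\,v + \sqrt{1-\rho}\,w_i$, with $v, w_i$ independent standard normals, and $x_i = 1$ when $u_i > h$. The parameters `rho` and `h` are hypothetical. Given $v$ the neurons are independent, so $q(k/n)$ is a binomial averaged over $v$ (trapezoidal rule on $[-8, 8]$); positive `rho` widens $q(r)$ compared with independent firing.

```python
import math

def std_normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def firing_prob(v, rho, h):
    """P(u_i > h | v) for u_i = sqrt(rho) v + sqrt(1 - rho) w_i, w_i ~ N(0, 1)."""
    z = (h - math.sqrt(rho) * v) / math.sqrt(1.0 - rho)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def q_of_r(n, rho, h, grid=4001, span=8.0):
    """q[k] = P(r = k / n): binomial(n, F(v)) averaged over the common input v."""
    dv = 2.0 * span / (grid - 1)
    q = [0.0] * (n + 1)
    for m in range(grid):
        v = -span + m * dv
        w = std_normal_pdf(v) * dv * (0.5 if m in (0, grid - 1) else 1.0)
        f = firing_prob(v, rho, h)
        for k in range(n + 1):
            q[k] += w * math.comb(n, k) * f ** k * (1.0 - f) ** (n - k)
    return q

def variance_of_r(q):
    n = len(q) - 1
    mean = sum(k / n * qk for k, qk in enumerate(q))
    return sum((k / n - mean) ** 2 * qk for k, qk in enumerate(q))

q_indep = q_of_r(20, rho=0.0, h=0.0)
q_corr = q_of_r(20, rho=0.3, h=0.0)
print(variance_of_r(q_indep), variance_of_r(q_corr))
```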
Input‐output Analysis
Gross product consumption
Relations among industries
(K. Tsuda and R. Morioka)
Mathematical Problems
$M$: submanifold of $S$? (Hong Van Le)
$\{M, g\} \to \{M, g, T\}$: dually flat? (J. Armstrong)
Affine differential geometry
Hessian manifold
Almost complex structure