
Information Geometry and Its Applications
Shun‐ichi Amari    RIKEN Brain Science Institute

1. Divergence Function and Dually Flat Riemannian Structure
2. Invariant Geometry on Manifold of Probability Distributions
3. Geometry and Statistical Inference
   semi-parametrics
4. Applications to Machine Learning and Signal Processing
Information Geometry
-- Manifolds of Probability Distributions

$M = \{p(x)\}$
Information Geometry

Related fields: Systems Theory, Information Theory, Statistics, Neural Networks, Combinatorics, Physics, Information Sciences, Math., AI, Vision, Optimization

Riemannian Manifold, Dual Affine Connections, Manifold of Probability Distributions


Information Geometry ?

Gaussian distributions

$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$

$S = \{p(x; \theta)\}$, $\theta = (\mu, \sigma)$


Manifold of Probability Distributions
$x = 1, 2, 3$; $S_n = \{p(x)\}$
$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$
$M = \{p(x; \theta)\}$

[Figure: simplex with axes $p_1$, $p_2$, $p_3$]
Manifold and Coordinate System

coordinate transformation
Examples of Coordinate systems

Euclidean space
Gaussian distributions

$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$

Discrete Distributions

Positive measures
Divergence: $D[z : y]$ between points $y$, $z$ of a manifold $M$

$D[z : y] \ge 0$

$D[z : y] = 0$ iff $z = y$

Not necessarily symmetric: $D[z : y] \ne D[y : z]$

Taylor expansion: $D[z : z + dz] = \frac{1}{2} \sum g_{ij}\, dz_i\, dz_j$, with $(g_{ij})$ positive-definite
Various Divergences

Euclidean

f‐divergence

KL‐divergence

(α‐β)‐divergence
Kullback‐Leibler Divergence
quasi‐distance

$D[p(x) : q(x)] = \sum_x p(x) \log \frac{p(x)}{q(x)}$

$D[p(x) : q(x)] \ge 0$, $= 0$ iff $p(x) = q(x)$
$D[p : q] \ne D[q : p]$
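The two properties listed for the KL-divergence (non-negativity and asymmetry) can be checked on a small discrete example. This is a minimal sketch; the function name `kl_divergence` and the two test distributions are illustrative choices, not part of the slides.

```python
import numpy as np

def kl_divergence(p, q):
    """D[p : q] = sum_x p(x) log(p(x) / q(x)), assuming q(x) > 0 wherever p(x) > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two illustrative distributions on three outcomes
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

d_pq = kl_divergence(p, q)  # D[p : q]
d_qp = kl_divergence(q, p)  # D[q : p], generally a different number
print(d_pq, d_qp)
```

Both values are positive and they differ, which is why the slide calls the KL-divergence a quasi-distance.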
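$(\alpha, \beta)$-divergence

$D_{\alpha,\beta}[p : q] = \sum_i \left\{ \frac{\alpha}{\alpha+\beta}\, p_i^{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\, q_i^{\alpha+\beta} - p_i^{\alpha} q_i^{\beta} \right\}$

$\alpha + \beta = 1$: $\alpha$-divergence
$\alpha = 1$: $\beta$-divergence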
Manifold with Convex Function
$S$: coordinates $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$
$\psi(\theta)$: convex function

$\psi(\theta) = \frac{1}{2} \sum \theta_i^2$

negative entropy: $\varphi(p) = \int p(x) \log p(x)\, dx$


energy

mathematical programming, control systems


physics, engineering, vision, economics
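Riemannian metric and flatness (affine structure)
$\{S, \psi(\theta), \theta\}$
Bregman divergence
$D[\theta', \theta] = \psi(\theta') - \psi(\theta) - (\theta' - \theta) \cdot \mathrm{grad}\, \psi(\theta)$

$D[\theta, \theta + d\theta] = \frac{1}{2} \sum g_{ij}(\theta)\, d\theta_i\, d\theta_j$

$g_{ij} = \partial_i \partial_j \psi(\theta)$, $\partial_i = \frac{\partial}{\partial \theta_i}$

Flatness (affine) $\theta$: geodesic (not Levi-Civita)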
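A Bregman divergence is fully determined by the convex function and its gradient, so it can be written as one generic routine. The sketch below is illustrative only (the helper name `bregman` and the sample vectors are assumptions). It evaluates the two convex functions mentioned earlier in the slides: the quadratic function and the negative entropy.

```python
import numpy as np

def bregman(psi, grad_psi, a, b):
    """D[a, b] = psi(a) - psi(b) - (a - b) . grad psi(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(psi(a) - psi(b) - np.dot(a - b, grad_psi(b)))

# psi(theta) = 1/2 sum theta_i^2 gives half the squared Euclidean distance
sq = lambda t: 0.5 * np.dot(t, t)
sq_grad = lambda t: t

# Negative entropy sum p_i log p_i gives the KL-divergence on normalized vectors
negent = lambda p: np.sum(p * np.log(p))
negent_grad = lambda p: np.log(p) + 1.0

a = np.array([0.2, 0.5, 0.3])
b = np.array([0.4, 0.4, 0.2])

d_euclid = bregman(sq, sq_grad, a, b)
d_kl = bregman(negent, negent_grad, a, b)
print(d_euclid, d_kl)
```

Because `a` and `b` both sum to 1, the extra term `sum(b) - sum(a)` vanishes and `d_kl` coincides with `sum(a * log(a / b))`.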


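Legendre Transformation

$\eta_i = \partial_i \psi(\theta)$, $\partial_i = \frac{\partial}{\partial \theta_i}$

$\theta \leftrightarrow \eta$: one-to-one

$\psi(\theta) + \varphi(\eta) - \sum_i \theta_i \eta_i = 0$

$\theta_i = \partial^i \varphi(\eta)$, $\partial^i = \frac{\partial}{\partial \eta_i}$

$\varphi(\eta) = \max_{\theta} \left\{ \sum_i \theta_i \eta_i - \psi(\theta) \right\}$

$D[\theta, \theta'] = \psi(\theta) + \varphi(\eta') - \theta \cdot \eta'$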
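Proof
Claim: $D[\theta, \theta'] = \psi(\theta) + \varphi(\eta') - \theta \cdot \eta'$

Bregman form: $D[\theta, \theta'] = \psi(\theta) - \psi(\theta') - (\theta - \theta') \cdot \mathrm{grad}\, \psi(\theta')$

Legendre relation: $\psi(\theta') + \varphi(\eta') - \sum_i \theta'_i \eta'_i = 0$

Hence $-\psi(\theta') = \varphi(\eta') - \sum_i \theta'_i \eta'_i$

Since $\mathrm{grad}\, \psi(\theta') = \eta'$, substituting into the Bregman form cancels the $\theta' \cdot \eta'$ terms and leaves $\psi(\theta) + \varphi(\eta') - \theta \cdot \eta'$.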


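Two affine coordinate systems $\theta$, $\eta$

$\theta$: geodesic (e-geodesic)

$\eta$: dual geodesic (m-geodesic)

"dually orthogonal": $\langle \partial_i, \partial^j \rangle = \delta_i^j$

$\partial_i = \frac{\partial}{\partial \theta_i}$, $\partial^i = \frac{\partial}{\partial \eta_i}$

$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$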
Bi‐orthogonality
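Dually flat manifold
$\theta$-coordinates $\leftrightarrow$ $\eta$-coordinates
potential functions $\psi(\theta)$, $\varphi(\eta)$

$g_{ij} = \frac{\partial^2 \psi(\theta)}{\partial \theta_i\, \partial \theta_j}$, $g^{ij} = \frac{\partial^2 \varphi(\eta)}{\partial \eta_i\, \partial \eta_j}$

$\psi(\theta) + \varphi(\eta) - \sum_i \theta_i \eta_i = 0$

exponential family: $p(x, \theta) = \exp\left\{ \sum_i \theta_i x_i - \psi(\theta) \right\}$

$\psi$: cumulant generating function
$\varphi$: negative entropy
canonical divergence: $D(P : P') = \psi(\theta) + \varphi(\eta') - \sum_i \theta_i \eta'_i$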
Exponential Family

$p(x, \theta) = \exp\{\theta \cdot x - \psi(\theta)\}$, $\psi(\theta)$: convex function, free-energy

Gaussian:

Negative entropy: $\varphi(\eta)$

natural parameter: $\theta$
expectation parameter: $\eta$
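$x$: discrete, $X = \{0, 1, \ldots, n\}$

$S_n = \{p(x) \mid x \in X\}$: exponential family

$p(x) = \sum_{i=0}^{n} p_i \delta_i(x) = \exp\left[ \sum_{i=1}^{n} \theta_i x_i - \psi(\theta) \right] = \exp\{\theta \cdot x - \psi(\theta)\}$

$\theta_i = \log(p_i / p_0)$; $x_i = \delta_i(x)$; $\psi(\theta) = -\log p_0$

$\eta_i = E[x_i] = p_i$; $\varphi(\eta) = \sum p_i \log p_i$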
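The two coordinate systems of the discrete family and the Legendre relation between the two potentials can be verified numerically. The sketch below uses an arbitrary three-outcome distribution (an illustrative choice) and follows the slide's definitions of the natural coordinates, the expectation coordinates, and the two potentials.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # p_0, p_1, p_2 (illustrative values, n = 2)

theta = np.log(p[1:] / p[0])    # natural coordinates theta_i = log(p_i / p_0)
psi = -np.log(p[0])             # psi(theta) = -log p_0
eta = p[1:]                     # expectation coordinates eta_i = E[x_i] = p_i
phi = np.sum(p * np.log(p))     # phi(eta) = sum_i p_i log p_i (negative entropy)

# psi and eta can also be computed from theta alone
psi_from_theta = np.log1p(np.sum(np.exp(theta)))               # log(1 + sum exp(theta_i))
eta_from_theta = np.exp(theta) / (1.0 + np.sum(np.exp(theta)))  # gradient of psi

# Legendre relation: psi(theta) + phi(eta) - theta . eta = 0
legendre_gap = psi + phi - np.dot(theta, eta)
print(legendre_gap)
```

The gradient of `psi` with respect to `theta` reproduces `eta`, which is the first line of the Legendre transformation.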
Two geodesics

Tangent directions
Function space of probability distributions:  topology
{p(x)}
Exponential Family
Pythagorean Theorem
(dually flat manifold)
$D[P : Q] + D[Q : R] = D[P : R]$

Euclidean space: self-dual, $\theta = \eta$

$\psi(\theta) = \frac{1}{2} \sum \theta_i^2$

Proof

$D[P : Q] = \psi(\theta_P) + \varphi(\eta_Q) - \theta_P \cdot \eta_Q$

$D[P : Q] + D[Q : R] - D[P : R] = -(\theta_P - \theta_Q) \cdot (\eta_Q - \eta_R)$
The theorem holds when $(\theta_P - \theta_Q) \cdot (\eta_Q - \eta_R) = 0$
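The identity used in the proof holds for any three points, not only orthogonal ones, so it can be checked directly. This sketch uses the discrete exponential family with the canonical divergence $D[P : Q] = \psi(\theta_P) + \varphi(\eta_Q) - \theta_P \cdot \eta_Q$; the helper names `coords` and `canonical_div` and the three distributions are illustrative assumptions.

```python
import numpy as np

def coords(p):
    """Return (theta, eta, psi, phi) for a discrete distribution p = (p_0, ..., p_n)."""
    p = np.asarray(p, dtype=float)
    theta = np.log(p[1:] / p[0])
    eta = p[1:]
    psi = -np.log(p[0])
    phi = np.sum(p * np.log(p))
    return theta, eta, psi, phi

def canonical_div(P, Q):
    """D[P : Q] = psi(theta_P) + phi(eta_Q) - theta_P . eta_Q."""
    th_p, _, psi_p, _ = coords(P)
    _, eta_q, _, phi_q = coords(Q)
    return float(psi_p + phi_q - np.dot(th_p, eta_q))

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.5, 0.3])
R = np.array([0.6, 0.1, 0.3])

lhs = canonical_div(P, Q) + canonical_div(Q, R) - canonical_div(P, R)

th_p = coords(P)[0]
th_q, eta_q = coords(Q)[0], coords(Q)[1]
eta_r = coords(R)[1]
rhs = -np.dot(th_p - th_q, eta_q - eta_r)
print(lhs, rhs)
```

With this convention the canonical divergence of the discrete family equals the KL-divergence with its arguments exchanged, `sum(Q * log(Q / P))`.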
Projection Theorem

[Figure: point $p$ in $S$, submanifold $M$ containing $s$ and the projection $q$]

m-geodesic: $q = \arg\min_{s \in M} D[p : s]$
e-geodesic: $q = \arg\min_{s \in M} D[s : p]$
Projection Theorem

$\min_{Q \in M} D[P : Q]$

$Q$ = m-geodesic projection of $P$ to $M$
unique when $M$ is e-flat

$\min_{Q \in M} D[Q : P]$

$Q'$ = e-geodesic projection of $P$ to $M$
unique when $M$ is m-flat
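As a concrete case of the first statement, take $D$ to be the KL-divergence and $M$ the family of independent distributions of two binary variables, which is e-flat. The m-projection of $P$ is then the product of its marginals. The sketch below compares that candidate against a brute-force grid search; the function names, the example $P$, and the grid resolution are illustrative assumptions.

```python
import numpy as np

def kl(p, q):
    """KL-divergence between strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def product(a, b):
    """Independent distribution with P(x1 = 1) = a and P(x2 = 1) = b.
    Entries are ordered p00, p01, p10, p11."""
    return np.array([(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b])

P = np.array([0.4, 0.1, 0.2, 0.3])   # p00, p01, p10, p11
r1 = P[2] + P[3]                     # marginal P(x1 = 1)
r2 = P[1] + P[3]                     # marginal P(x2 = 1)
Q_star = product(r1, r2)             # candidate m-projection of P onto M

# Brute-force search over the independent family
grid = np.linspace(0.01, 0.99, 99)
best = min((kl(P, product(a, b)), a, b) for a in grid for b in grid)
print(best, kl(P, Q_star))
```

The grid minimizer lands on the marginals of `P`, and no grid point achieves a smaller divergence than `Q_star`.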
Convex function – Bregman divergence
– Dually flat Riemannian divergence

Dually flat R‐manifold – convex function – canonical divergence


KL‐divergence
Exponential family – Bregman divergence
Banerjee et al
Invariance $S = \{p(x, \theta)\}$

Invariant under different representation

$y = y(x)$, $\bar{p}(y, \theta)$

$\int |p(x, \theta_1) - p(x, \theta_2)|^2\, dx \ne \int |\bar{p}(y, \theta_1) - \bar{p}(y, \theta_2)|^2\, dy$
2
Invariant divergence (manifold of probability distributions; $S = \{p(x, \theta)\}$)
Chentsov, Amari-Nagaoka

$y = k(x)$: sufficient statistics

$D[p_X(x) : q_X(x)] = D[p_Y(y) : q_Y(y)]$
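Invariance
--- characterization of f-divergence
Csiszar

[Figure: outcomes $i = 1, \ldots, n$ of $p$ and $q$ grouped into subsets $A = 1, 2, \ldots, m$, giving $\bar{p}$ and $\bar{q}$]

$\bar{p}_A = \sum_{i \in A} p_i$

$D[p : q] \ge D[\bar{p} : \bar{q}]$
$D[p : q] = D[\bar{p} : \bar{q}]$ iff $p_i = c_A q_i$; $i \in A$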

Invariance ⇒ f-divergence
Csiszar f-divergence (Ali-Silvey, Morimoto)

$D_f[p : q] = \sum_i p_i f\left( \frac{q_i}{p_i} \right)$

$f(u)$: convex, $f(1) = 0$

$D_{cf}[p : q] = c\, D_f[p : q]$

$\tilde{f}(u) = f(u) - c(u - 1)$

$f(1) = f'(1) = 0$; $f''(1) = 1$

[Figure: plot of $f(u)$ versus $u$, with $u = 1$ marked]
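The f-divergence formula and its two structural properties can be exercised on a small example: on probability distributions, adding a linear term $c(u - 1)$ to $f$ leaves the divergence unchanged, and merging outcomes cannot increase it. The sketch uses $f(u) = -\log u$ as an illustrative convex function (it yields $\sum p_i \log(p_i / q_i)$); the names, the constant $c = 3$, and the partition are assumptions for the example.

```python
import numpy as np

def f_divergence(f, p, q):
    """D_f[p : q] = sum_i p_i f(q_i / p_i) for strictly positive p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * f(q / p)))

f_kl = lambda u: -np.log(u)                      # convex, f(1) = 0
f_shift = lambda u: -np.log(u) - 3.0 * (u - 1)   # f(u) - c(u - 1) with c = 3

p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.25, 0.25, 0.25, 0.25])

d_fine = f_divergence(f_kl, p, q)
d_shift = f_divergence(f_shift, p, q)   # same value: sum_i p_i (q_i / p_i - 1) = 0

# Coarse graining: merge outcomes {0, 1} and {2, 3}
p_bar = np.array([p[0] + p[1], p[2] + p[3]])
q_bar = np.array([q[0] + q[1], q[2] + q[3]])
d_coarse = f_divergence(f_kl, p_bar, q_bar)
print(d_fine, d_shift, d_coarse)
```

Here `d_coarse` is strictly smaller than `d_fine` because the ratios `p_i / q_i` are not constant inside the merged groups.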
Theorem
An invariant separable divergence belongs to the class of f‐divergence.

Separable divergence: $D[p : q] = \sum_i k(p_i, q_i)$
$k(p_i, q_i) = p_i f\left( \frac{q_i}{p_i} \right)$
divergence (n > 1)

S  {p} : space of probability distributions

invariance: invariant divergence, f-divergence, Fisher inf metric, Alpha connection
dually flat space: flat divergence, convex functions, Bregman

KL-divergence: $D[p : q] = \int p(x) \log \left\{ \frac{p(x)}{q(x)} \right\} dx$
$\alpha$-Divergence: why?
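flat & invariant in $\tilde{S}_{n+1}$

$f_\alpha(u) = \frac{4}{1 - \alpha^2} \left\{ 1 - u^{\frac{1+\alpha}{2}} \right\} - \frac{2}{1 - \alpha} (1 - u)$, $\alpha \ne \pm 1$

KL-divergence
$f(u) = u \log u - (u - 1)$
$D[\tilde{p} : \tilde{q}] = \sum_i \left\{ \tilde{p}_i \log \frac{\tilde{p}_i}{\tilde{q}_i} - \tilde{p}_i + \tilde{q}_i \right\}$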
Space of positive measures: $\tilde{S}$
vectors, matrices, arrays
$\tilde{S} = \{\tilde{p}\}$, $\tilde{p}_i > 0$ ($\sum \tilde{p}_i = 1$ does not necessarily hold)

f‐divergence
Bregman divergence

α‐divergence

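f-divergence of $\tilde{S}$

$D_f[\tilde{p} : \tilde{q}] = \sum_i \tilde{p}_i f\left( \frac{\tilde{q}_i}{\tilde{p}_i} \right) \ge 0$

$D_f[\tilde{p} : \tilde{q}] = 0 \Leftrightarrow \tilde{p} = \tilde{q}$

not invariant under $\tilde{f}(u) = f(u) - c(u - 1)$

$\alpha$-divergence

$D_\alpha[\tilde{p} : \tilde{q}] = \frac{4}{1 - \alpha^2} \sum_i \left\{ \frac{1-\alpha}{2}\, \tilde{p}_i + \frac{1+\alpha}{2}\, \tilde{q}_i - \tilde{p}_i^{\frac{1-\alpha}{2}}\, \tilde{q}_i^{\frac{1+\alpha}{2}} \right\}$

KL-divergence

$D[\tilde{p} : \tilde{q}] = \sum_i \left\{ \tilde{p}_i \log \frac{\tilde{p}_i}{\tilde{q}_i} - \tilde{p}_i + \tilde{q}_i \right\}$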


$\tilde{S}$: dually flat
$S$: not dually flat (except $\alpha = \pm 1$)

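Metric and Connections Induced by Divergence
(Eguchi)
Riemannian metric
$g_{ij}(z) = \partial_i \partial_j D[z : y] \big|_{y=z}$: $D[z : y] = \frac{1}{2} \sum g_{ij}(z) (z_i - y_i)(z_j - y_j)$

affine connections $\{\nabla, \nabla^*\}$

$\Gamma_{ijk}(z) = -\partial_i \partial_j \partial'_k D[z : y] \big|_{y=z}$
$\Gamma^*_{ijk}(z) = -\partial'_i \partial'_j \partial_k D[z : y] \big|_{y=z}$
$\partial_i = \frac{\partial}{\partial z_i}$, $\partial'_i = \frac{\partial}{\partial y_i}$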
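The induced metric can be approximated by finite differences of the divergence itself. The sketch below applies this to the KL-divergence between Bernoulli distributions with success probability $z$, where the result should match the Fisher information $1 / (z(1 - z))$. The one-dimensional setting, function names, and step size are illustrative assumptions.

```python
import numpy as np

def kl_bernoulli(z, y):
    """KL-divergence D[z : y] between Bernoulli(z) and Bernoulli(y)."""
    return z * np.log(z / y) + (1 - z) * np.log((1 - z) / (1 - y))

def induced_metric(div, z, h=1e-3):
    """g(z) = d^2/dz^2 D[z : y] evaluated at y = z,
    approximated by a central second difference in the first argument."""
    return (div(z + h, z) - 2.0 * div(z, z) + div(z - h, z)) / h ** 2

z0 = 0.3
g_numeric = induced_metric(kl_bernoulli, z0)
g_fisher = 1.0 / (z0 * (1.0 - z0))   # Fisher information of Bernoulli(z0)
print(g_numeric, g_fisher)
```

The same second-difference idea extends to several coordinates by differencing along pairs of axes.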
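Invariant geometrical structure $S = \{p(x, \theta)\}$
alpha-geometry
(derived from invariant divergence)

$g_{ij}(\theta) = E[\partial_i l\, \partial_j l]$: Fisher information

$T_{ijk}(\theta) = E[\partial_i l\, \partial_j l\, \partial_k l]$

$l = \log p(x, \theta)$; $\partial_i = \frac{\partial}{\partial \theta_i}$

α-connection
$\Gamma^{(\alpha)}_{ijk} = [i, j; k] - \frac{\alpha}{2} T_{ijk}$; Levi-Civita: $\alpha = 0$

$\nabla^{(\alpha)} \leftrightarrow \nabla^{(-\alpha)}$: dually coupled

$X \langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$

Duality: $X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$

$\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji}$

$\Gamma^*_{ijk} = \Gamma_{ijk} + T_{ijk}$

$\{M, g, T\}$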
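For a one-parameter example, the Fisher information $g$ and the cubic tensor $T$ are just the second and third moments of the score. The sketch below evaluates them exactly for a Bernoulli variable written as an exponential family $p(x, \theta) = \exp\{\theta x - \psi(\theta)\}$ with $\psi(\theta) = \log(1 + e^\theta)$; the model and the value of $\theta$ are illustrative choices.

```python
import numpy as np

theta = 0.7
eta = 1.0 / (1.0 + np.exp(-theta))   # E[x] = psi'(theta)

xs = np.array([0.0, 1.0])            # possible outcomes
probs = np.array([1.0 - eta, eta])   # p(x, theta) for x = 0, 1
score = xs - eta                     # d/dtheta log p(x, theta) = x - psi'(theta)

g = float(np.sum(probs * score ** 2))   # Fisher information E[(d l)^2]
T = float(np.sum(probs * score ** 3))   # cubic tensor E[(d l)^3]
print(g, T)
```

In this family `g` equals the second derivative of `psi` and `T` equals the third, that is `eta * (1 - eta)` and `eta * (1 - eta) * (1 - 2 * eta)`.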
Riemannian Structure

$ds^2 = \sum g_{ij}(\theta)\, d\theta_i\, d\theta_j = d\theta^T G(\theta)\, d\theta$

$G(\theta) = (g_{ij})$
Euclidean: $G = E$

Fisher information
Affine Connection
covariant derivative: $\nabla_X Y$, $\Pi_c X = Y$
geodesic: $\nabla_X X = 0$, $X = X(t)$

minimal distance: $s = \int \sqrt{\sum g_{ij}(\theta)\, d\theta_i\, d\theta_j}$

non-metric: minimal distance $\ne$ straight line
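Duality

$\langle X, Y \rangle = \langle \Pi X, \Pi^* Y \rangle$, $\langle X, Y \rangle = \sum g_{ij} X^i Y^j$

$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$

Riemannian geometry: $\nabla = \nabla^*$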
Dual Affine Connections $(\nabla, \nabla^*)$, $(\Pi, \Pi^*)$

e-geodesic
$\log r(x, t) = t \log p(x) + (1 - t) \log q(x) + c(t)$

m-geodesic
$r(x, t) = t\, p(x) + (1 - t)\, q(x)$
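The two geodesics connecting $p$ and $q$ are easy to compute for discrete distributions: the m-geodesic mixes probabilities linearly, while the e-geodesic mixes log-probabilities and renormalizes, which is the role of $c(t)$. The function names and the two endpoint distributions below are illustrative.

```python
import numpy as np

def m_geodesic(p, q, t):
    """Mixture geodesic: r(x, t) = t p(x) + (1 - t) q(x)."""
    return t * p + (1.0 - t) * q

def e_geodesic(p, q, t):
    """Exponential geodesic: log r(x, t) = t log p(x) + (1 - t) log q(x) + c(t),
    where c(t) is fixed by requiring r to sum to 1."""
    r = np.exp(t * np.log(p) + (1.0 - t) * np.log(q))
    return r / r.sum()

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])

r_m = m_geodesic(p, q, 0.5)   # arithmetic midpoint
r_e = e_geodesic(p, q, 0.5)   # normalized geometric midpoint
print(r_m, r_e)
```

Both curves share the endpoints `q` at `t = 0` and `p` at `t = 1` but pass through different midpoints.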
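Mathematical structure of $S = \{p(x, \theta)\}$

$g_{ij}(\theta) = E[\partial_i l\, \partial_j l]$
$T_{ijk}(\theta) = E[\partial_i l\, \partial_j l\, \partial_k l]$
$\{M, g, T\}$

$l = \log p(x, \theta)$; $\partial_i = \frac{\partial}{\partial \theta_i}$

α-connection

$\Gamma^{(\alpha)}_{ijk} = [i, j; k] - \frac{\alpha}{2} T_{ijk}$

$\nabla^{(\alpha)} \leftrightarrow \nabla^{(-\alpha)}$: dually coupled

$X \langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$
α-geometry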
Dual Foliations

k‐cut
Two neurons: $\{p_{00}, p_{01}, p_{10}, p_{11}\}$
$x_1$: 0011000101101
$x_2$: 0100100110100
$x_3$: 0101101001010

firing rates: $r_1, r_2$; $r_{12}$
correlation: covariance?
Correlations of Neural Firing

$\{p(x_1, x_2)\}$: $\{p_{00}, p_{10}, p_{01}, p_{11}\}$

firing rates:
$r_1 = p_{1\cdot} = p_{10} + p_{11}$
$r_2 = p_{\cdot 1} = p_{01} + p_{11}$

correlations:
$\theta = \log \frac{p_{11}\, p_{00}}{p_{10}\, p_{01}}$

$\{(r_1, r_2), \theta\}$: orthogonal coordinates
Independent Distributions
$x_1, x_2 = 0, 1$
$S = \{p(x_1, x_2)\}$
$M = \{q(x_1)\, q(x_2)\}$
two neuron case

$r_1, r_2, r_{12}$; $\theta_1, \theta_2, \theta_{12}$
$\theta_{12} = \log \frac{p_{00}\, p_{11}}{p_{01}\, p_{10}} = \log \frac{r_{12} (1 + r_{12} - r_1 - r_2)}{(r_1 - r_{12})(r_2 - r_{12})}$
$r_{12} = f(r_1, r_2, \theta)$
$r_{12}(t) = f(r_1(t), r_2(t), \theta)$
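The map between the joint probabilities and the mixed coordinates $(r_1, r_2, \theta)$ can be inverted numerically, which gives one concrete form of $r_{12} = f(r_1, r_2, \theta)$. The sketch below solves for $r_{12}$ by bisection, using the fact that $\theta$ increases with $r_{12}$ on the admissible interval. The function names and the choice of bisection are assumptions, not the slides' method.

```python
import numpy as np

def to_mixed(p00, p01, p10, p11):
    """Firing rates r1, r2 and log-odds-ratio theta of a two-neuron distribution."""
    r1 = p10 + p11
    r2 = p01 + p11
    theta = np.log(p11 * p00 / (p10 * p01))
    return r1, r2, theta

def theta_of(r1, r2, r12):
    """theta as a function of the rates and the joint firing rate r12."""
    return np.log(r12 * (1 + r12 - r1 - r2) / ((r1 - r12) * (r2 - r12)))

def r12_from(r1, r2, theta, iters=60):
    """Solve theta_of(r1, r2, r12) = theta for r12 by bisection."""
    lo = max(0.0, r1 + r2 - 1.0)   # r12 must keep p00 positive
    hi = min(r1, r2)               # r12 must keep p10 and p01 positive
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if theta_of(r1, r2, mid) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r1, r2, theta = to_mixed(0.4, 0.1, 0.2, 0.3)
r12 = r12_from(r1, r2, theta)          # recovers p11
r12_indep = r12_from(r1, r2, 0.0)      # theta = 0 gives the independent value r1 * r2
print(r12, r12_indep)
```

Holding `theta` fixed while `r1` and `r2` vary in time gives the curve $r_{12}(t)$ on the last line of the slide.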
Decomposition of KL-divergence

$D[p : r] = D[p : q] + D[q : r]$

[Figure: points $p$, $q$, $r$, with labels "correlations" and "independent"]

$p$, $q$: same marginals $(\eta_1, \eta_2)$
$r$, $q$: same correlations $\theta$

$D[p : r] = \sum_x p(x) \log \frac{p(x)}{r(x)}$
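The decomposition can be confirmed numerically for two binary neurons when $r$ is an independent distribution: $q$ is then the independent distribution with the same marginals as $p$. The sketch below is illustrative; the function names and the particular $p$ and $r$ are assumptions.

```python
import numpy as np

def kl(p, q):
    """KL-divergence between strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def product(a, b):
    """Independent distribution with P(x1 = 1) = a and P(x2 = 1) = b.
    Entries are ordered p00, p01, p10, p11."""
    return np.array([(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b])

p = np.array([0.4, 0.1, 0.2, 0.3])      # correlated distribution
q = product(p[2] + p[3], p[1] + p[3])   # same marginals as p, zero correlation
r = product(0.7, 0.2)                   # independent, so same correlation as q

total = kl(p, r)
parts = kl(p, q) + kl(q, r)
print(total, parts)
```

The first part, `kl(p, q)`, measures only the correlation in `p`; the second, `kl(q, r)`, measures only the difference in firing rates.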
pairwise correlations

covariance: $c_{ij} = r_{ij} - r_i r_j$ (not orthogonal)

independent distributions
$r_{ij} = r_i r_j$, $r_{ijk} = r_i r_j r_k$, $\ldots$

How to generate correlated spikes?


(Niebur, Neural Computation [2007])

higher-order correlations
Orthogonal higher-order correlations

$\theta = (\theta_i, \theta_{ij}; \ldots, \theta_{1 \cdots n})$

$r = (r_i, r_{ij}; \ldots, r_{1 \cdots n})$
Population and Synfire
Neurons: $x_1, x_2, \ldots, x_n$

$x_i = 1(u_i)$

$u_i$: Gaussian, $E[u_i u_j] = \alpha$
Synfiring
$p(x) = p(x_1, \ldots, x_n)$
$r = \frac{1}{n} \sum x_i$, $q(r)$

[Figure: $q(r)$ plotted against $r$]
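A small simulation shows how a common correlation among the $u_i$ changes the distribution $q(r)$ of the population rate. The details below are assumptions made for the sketch and are not taken from the slides: each $u_i$ has unit variance, the firing threshold is $u_i > 0$, and the correlation $\alpha$ is produced by a shared Gaussian input.

```python
import numpy as np

def population_rates(n, alpha, trials, rng):
    """Sample r = (1/n) sum_i x_i, with x_i = 1 if u_i > 0 and 0 otherwise.
    The u_i are unit-variance Gaussians with E[u_i u_j] = alpha for i != j,
    built from one shared input plus independent private inputs (assumed model)."""
    common = rng.standard_normal((trials, 1))    # input shared by all neurons
    private = rng.standard_normal((trials, n))   # independent input per neuron
    u = np.sqrt(alpha) * common + np.sqrt(1.0 - alpha) * private
    x = (u > 0).astype(float)
    return x.mean(axis=1)

rng = np.random.default_rng(0)
r_corr = population_rates(n=500, alpha=0.3, trials=2000, rng=rng)
r_indep = population_rates(n=500, alpha=0.0, trials=2000, rng=rng)

print(r_corr.var(), r_indep.var())
```

With independent neurons the rate concentrates near 0.5 as `n` grows, while with correlated inputs the spread of `r` stays wide, so a histogram of `r_corr` is a rough picture of a broad $q(r)$.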
Input‐output Analysis
Gross product consumption
Relations among industries
(K. Tsuda and R. Morioka)
Mathematical Problems
$M$: submanifold of $S$? (Hong van Le)

$\{M, g\} \Rightarrow \{M, g, T\}$: dually flat (J. Armstrong)

Affine differential geometry
Hessian manifold
Almost complex structure
