Applications of Information Geometry

Information Geometry and Its Applications
Shun‐ichi Amari RIKEN Brain Science Institute
1. Divergence Function and Dually Flat Riemannian Structure
2. Invariant Geometry on Manifold of Probability Distributions
3. Geometry and Statistical Inference: semi-parametrics
4. Applications to Machine Learning and Signal Processing
Information Geometry: Manifolds of Probability Distributions
$M = \{p(x)\}$
Information Geometry
Related fields: Systems Theory, Information Theory, Statistics, Neural Networks, Combinatorics, Physics, Information Sciences, Math., AI, Vision, Optimization
Riemannian Manifold
Dual Affine Connections
Manifold of Probability Distributions

Information Geometry?
Gaussian distributions
$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$
$\{p(x)\}$
$\theta = (\mu, \sigma)$
$S = \{p(x; \theta)\}$

Manifold of Probability Distributions
$x = 1, 2, 3$; $S_n = \{p(x)\}$
$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$
$M = \{p(x; \theta)\}$
[Figure: the simplex with axes $p_1$, $p_2$, $p_3$ and a submanifold $M$]
Manifold and Coordinate System
coordinate transformation
Examples of Coordinate systems
Euclidean space
Gaussian distributions: $S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$
Discrete Distributions
Positive measures
Divergence: $D[z : y]$ between points $z$, $y$ of a manifold $M$
$D[z : y] \ge 0$
$D[z : y] = 0$ iff $z = y$
Not necessarily symmetric: $D[z : y] \ne D[y : z]$
Taylor expansion: $D[z : z + dz] = \frac{1}{2} \sum g_{ij}\, dz_i\, dz_j$, with $(g_{ij})$ positive-definite
Various Divergences
Euclidean
f‐divergence
KL‐divergence
(α‐β)‐divergence
Kullback-Leibler Divergence
quasi-distance
$D[p(x) : q(x)] = \sum_x p(x) \log \frac{p(x)}{q(x)}$
$D[p(x) : q(x)] \ge 0$, $= 0$ iff $p(x) = q(x)$
$D[p : q] \ne D[q : p]$
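As a minimal sketch of the two properties listed for the KL-divergence (non-negativity and asymmetry), the following Python evaluates $D[p:q]$ for discrete distributions. The function name `kl_divergence` and the example vectors are illustrative assumptions, not part of the slides.

```python
import math

def kl_divergence(p, q):
    """D[p : q] = sum_x p(x) log(p(x) / q(x)) for discrete distributions."""
    # Terms with p(x) = 0 contribute nothing (convention 0 log 0 = 0).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(d_pq, d_qp)            # two different positive numbers
print(kl_divergence(p, p))   # 0.0
```

The two printed values differ, which is why the slide calls the KL-divergence a quasi-distance rather than a distance.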
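$(\alpha, \beta)$-divergence
$D_{\alpha,\beta}[p : q] = \sum_i \left\{ \frac{\alpha}{\alpha+\beta}\, p_i^{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\, q_i^{\alpha+\beta} - p_i^{\alpha} q_i^{\beta} \right\}$
$\alpha + \beta = 1$: $\alpha$-divergence
$\alpha = 1$: $\beta$-divergence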
Manifold with Convex Function
$S$: coordinates $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$
$\psi(\theta)$: convex function
$\psi(\theta) = \frac{1}{2} \sum_i (\theta_i)^2$
negative entropy: $\varphi(p) = \int p(x) \log p(x)\, dx$
energy
mathematical programming, control systems
physics, engineering, vision, economics
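Riemannian metric and flatness (affine structure)
$\{S, \psi(\theta), \theta\}$
Bregman divergence: $D[\theta : \theta'] = \psi(\theta) - \psi(\theta') - (\theta - \theta') \cdot \operatorname{grad} \psi(\theta')$
$D[\theta : \theta + d\theta] = \frac{1}{2} \sum g_{ij}(\theta)\, d\theta_i\, d\theta_j$
$g_{ij} = \partial_i \partial_j \psi(\theta)$, $\partial_i = \frac{\partial}{\partial \theta_i}$
Flatness (affine): $\theta$: geodesic (not Levi-Civita)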
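The Bregman divergence of a convex function can be sketched in a few lines of Python. The snippet below assumes the form $D[\theta : \theta'] = \psi(\theta) - \psi(\theta') - (\theta - \theta') \cdot \operatorname{grad}\psi(\theta')$; the helper names and the choice $\psi(\theta) = \frac{1}{2}\sum_i \theta_i^2$ are illustrative assumptions. With that $\psi$ the divergence reduces to half the squared Euclidean distance.

```python
def bregman(psi, grad_psi, theta, theta_prime):
    """D[theta : theta'] = psi(theta) - psi(theta') - (theta - theta') . grad psi(theta')."""
    g = grad_psi(theta_prime)
    inner = sum((a - b) * gi for a, b, gi in zip(theta, theta_prime, g))
    return psi(theta) - psi(theta_prime) - inner

# Illustrative convex function (an assumption): psi(theta) = 1/2 * sum_i theta_i^2.
def psi_quadratic(theta):
    return 0.5 * sum(t * t for t in theta)

def grad_psi_quadratic(theta):
    return list(theta)

a = [1.0, 2.0]
b = [0.0, 0.0]
print(bregman(psi_quadratic, grad_psi_quadratic, a, b))  # 2.5 = half of |a - b|^2
```

Any other differentiable convex function can be passed in the same way, provided its gradient is supplied.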

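Legendre Transformation
$\eta_i = \partial_i \psi(\theta)$, $\partial_i = \frac{\partial}{\partial \theta_i}$
$\theta \leftrightarrow \eta$: one-to-one
$\psi(\theta) + \varphi(\eta) - \sum_i \theta_i \eta_i = 0$
$\theta_i = \partial^i \varphi(\eta)$, $\partial^i = \frac{\partial}{\partial \eta_i}$
$\varphi(\eta) = \max_{\theta} \left\{ \sum_i \theta_i \eta_i - \psi(\theta) \right\}$
$D[\theta : \theta'] = \psi(\theta) + \varphi(\eta') - \theta \cdot \eta'$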
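Proof
$D[\theta : \theta'] = \psi(\theta) + \varphi(\eta') - \theta \cdot \eta'$
$D[\theta : \theta'] = \psi(\theta) - \psi(\theta') - (\theta - \theta') \cdot \operatorname{grad} \psi(\theta')$
$\psi(\theta') + \varphi(\eta') - \sum_i \theta'_i \eta'_i = 0$
$\varphi(\eta') = -\psi(\theta') + \sum_i \theta'_i \eta'_i$, with $\eta' = \operatorname{grad} \psi(\theta')$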

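Two affine coordinate systems $\theta$, $\eta$
$\theta$: geodesic (e-geodesic)
$\eta$: dual geodesic (m-geodesic)
"dually orthogonal": $\langle \partial_i, \partial^j \rangle = \delta_i^j$
$\partial_i = \frac{\partial}{\partial \theta_i}$, $\partial^i = \frac{\partial}{\partial \eta_i}$
$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$
Bi-orthogonality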
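Dually flat manifold
$\theta$-coordinates $\leftrightarrow$ $\eta$-coordinates
potential functions $\psi(\theta)$, $\varphi(\eta)$
$g_{ij} = \frac{\partial^2 \psi(\theta)}{\partial \theta_i \partial \theta_j}$, $g^{ij} = \frac{\partial^2 \varphi(\eta)}{\partial \eta_i \partial \eta_j}$
$\psi(\theta) + \varphi(\eta) - \sum_i \theta_i \eta_i = 0$
exponential family: $p(x, \theta) = \exp\left\{ \sum_i \theta_i x_i - \psi(\theta) \right\}$
$\psi$: cumulant generating function
$\varphi$: negative entropy
canonical divergence: $D(P : P') = \psi(\theta) + \varphi(\eta') - \sum_i \theta_i \eta'_i$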
Exponential Family
$p(x, \theta) = \exp\{\theta \cdot x - \psi(\theta)\}$
$\psi(\theta)$: convex function, free energy
Gaussian:
$\varphi(\eta)$: negative entropy
$\theta$: natural parameter
$\eta$: expectation parameter
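$x$: discrete, $X = \{0, 1, \ldots, n\}$
$S_n = \{p(x) \mid x \in X\}$: exponential family
$p(x) = \sum_{i=0}^{n} p_i \delta_i(x) = \exp\left[ \sum_{i=1}^{n} \theta_i x_i - \psi(\theta) \right] = \exp\{\theta \cdot x - \psi(\theta)\}$
$\theta_i = \log(p_i / p_0)$; $x_i = \delta_i(x)$; $\psi(\theta) = -\log p_0$
$\eta_i = E[x_i] = p_i$; $\varphi(\eta) = \sum_i p_i \log p_i$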
Two geodesics
Tangent directions
Function space of probability distributions: topology
{p(x)}
Exponential Family
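For the discrete distributions $S_n$ viewed as an exponential family, the two coordinate systems and potentials can be computed directly. The sketch below assumes $\theta_i = \log(p_i/p_0)$, $\eta_i = p_i$, $\psi(\theta) = -\log p_0$ and $\varphi(\eta) = \sum_i p_i \log p_i$; the function names and example vectors are hypothetical. It checks the Legendre relation $\psi + \varphi - \sum_i \theta_i \eta_i = 0$ numerically and evaluates the canonical divergence $\psi(\theta_P) + \varphi(\eta_Q) - \theta_P \cdot \eta_Q$, which for this family coincides with $\sum_x q(x) \log\{q(x)/p(x)\}$.

```python
import math

def theta_coords(p):
    """Natural coordinates theta_i = log(p_i / p_0), i = 1..n."""
    return [math.log(pi / p[0]) for pi in p[1:]]

def eta_coords(p):
    """Expectation coordinates eta_i = E[x_i] = p_i, i = 1..n."""
    return list(p[1:])

def psi(theta):
    """psi(theta) = log(1 + sum_i exp(theta_i)), which equals -log p_0."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def phi(p):
    """Negative entropy sum_i p_i log p_i, summed over i = 0..n."""
    return sum(pi * math.log(pi) for pi in p)

def canonical_divergence(p, q):
    """psi(theta_P) + phi(eta_Q) - theta_P . eta_Q."""
    theta_p = theta_coords(p)
    return psi(theta_p) + phi(q) - sum(t * e for t, e in zip(theta_p, eta_coords(q)))

# Hypothetical example distributions on {0, 1, 2}.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

legendre_gap = psi(theta_coords(p)) + phi(p) - sum(
    t * e for t, e in zip(theta_coords(p), eta_coords(p)))
kl_q_p = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))

print(legendre_gap)                        # zero up to rounding
print(canonical_divergence(p, q), kl_q_p)  # agree up to rounding
```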
Pythagorean Theorem (dually flat manifold)
$D[P : Q] + D[Q : R] = D[P : R]$
Euclidean space: self-dual, $\theta = \eta$, $\psi(\theta) = \frac{1}{2} \sum_i (\theta_i)^2$
Proof
$D[P : Q] = \psi(\theta_P) + \varphi(\eta_Q) - \theta_P \cdot \eta_Q$
$D[P : Q] + D[Q : R] - D[P : R] = -(\theta_P - \theta_Q) \cdot (\eta_Q - \eta_R)$
Orthogonality: $(\theta_P - \theta_Q) \cdot (\eta_Q - \eta_R) = 0$
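The algebra behind the Pythagorean relation can be checked numerically. The sketch below takes the canonical divergence $D[P:Q] = \psi(\theta_P) + \varphi(\eta_Q) - \theta_P \cdot \eta_Q$ on the discrete exponential family with three outcomes (an assumed concrete model; all names are hypothetical) and builds a triple $P, Q, R$ in which $P$ differs from $Q$ only in $\theta_1$ and $R$ differs from $Q$ only in $\eta_2$, so that $(\theta_P - \theta_Q) \cdot (\eta_Q - \eta_R) = 0$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def psi(theta):
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def eta_from_theta(theta):
    z = 1.0 + sum(math.exp(t) for t in theta)
    return [math.exp(t) / z for t in theta]

def theta_from_eta(eta):
    p0 = 1.0 - sum(eta)
    return [math.log(e / p0) for e in eta]

def phi(eta):
    p0 = 1.0 - sum(eta)
    return p0 * math.log(p0) + sum(e * math.log(e) for e in eta)

def canonical_divergence(theta_p, theta_q):
    eta_q = eta_from_theta(theta_q)
    return psi(theta_p) + phi(eta_q) - dot(theta_p, eta_q)

# Orthogonal triple: P - Q along theta_1, Q - R along eta_2.
theta_Q = [0.2, -0.4]
eta_Q = eta_from_theta(theta_Q)
theta_P = [theta_Q[0] + 0.7, theta_Q[1]]
theta_R = theta_from_eta([eta_Q[0], eta_Q[1] + 0.1])

gap = (canonical_divergence(theta_P, theta_Q)
       + canonical_divergence(theta_Q, theta_R)
       - canonical_divergence(theta_P, theta_R))
print(gap)  # zero up to rounding

# General triple: the gap equals -(theta_P - theta_Q) . (eta_Q - eta_R).
theta_A, theta_B, theta_C = [0.5, 1.0], [-0.3, 0.2], [0.9, -0.8]
gap_general = (canonical_divergence(theta_A, theta_B)
               + canonical_divergence(theta_B, theta_C)
               - canonical_divergence(theta_A, theta_C))
eta_B, eta_C = eta_from_theta(theta_B), eta_from_theta(theta_C)
rhs = -dot([a - b for a, b in zip(theta_A, theta_B)],
           [b - c for b, c in zip(eta_B, eta_C)])
print(gap_general, rhs)  # agree up to rounding
```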
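Projection Theorem
[Figure: a point $p$ in $S$ projected onto a submanifold $M$; $s$ is a generic point of $M$ and $q$ the projection]
m-geodesic projection: $q = \arg\min_{s \in M} D[p : s]$
e-geodesic projection: $q = \arg\min_{s \in M} D[s : p]$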
Projection Theorem
$\min_{Q \in M} D[P : Q]$: $Q$ = m-geodesic projection of $P$ to $M$; unique when $M$ is e-flat
$\min_{Q \in M} D[Q : P]$: $Q'$ = e-geodesic projection of $P$ to $M$; unique when $M$ is m-flat
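A small numerical illustration of the m-projection, taking $D$ to be the KL-divergence and $M$ to be the family of independent distributions of two binary variables (an e-flat family). In that setting the minimiser of $D[P : Q]$ over $Q \in M$ is the product of the marginals of $P$. The function names, the example $P$ and the grid search are illustrative assumptions, not the slides' own procedure.

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def independent(a, b):
    """Joint [p00, p01, p10, p11] of independent bits with P(x1=1)=a, P(x2=1)=b."""
    return [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]

def m_projection(p):
    """Product of the marginals of p = [p00, p01, p10, p11]."""
    a = p[2] + p[3]   # P(x1 = 1)
    b = p[1] + p[3]   # P(x2 = 1)
    return independent(a, b)

p = [0.4, 0.1, 0.2, 0.3]           # hypothetical correlated joint distribution
q = m_projection(p)
best = kl(p, q)

# Brute-force comparison against a grid of other independent distributions.
grid = [i / 50 for i in range(1, 50)]
grid_min = min(kl(p, independent(a, b)) for a in grid for b in grid)
print(best, grid_min)  # no grid point does better than the projection
```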
Convex function – Bregman divergence
– Dually flat Riemannian divergence
Dually flat R‐manifold – convex function – canonical divergence

KL‐divergence
Exponential family – Bregman divergence
Banerjee et al
Invariance: $S = \{p(x, \xi)\}$
Invariant under different representation: $y = y(x)$, $\{\bar{p}(y, \xi)\}$
$\int |p(x, \xi_1) - p(x, \xi_2)|^2\, dx \ne \int |\bar{p}(y, \xi_1) - \bar{p}(y, \xi_2)|^2\, dy$
Invariant divergence (manifold of probability distributions; Chentsov, Amari-Nagaoka)
$S = \{p(x, \xi)\}$
$y = k(x)$: sufficient statistics
$D[p_X(x) : q_X(x)] = D[p_Y(y) : q_Y(y)]$
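Invariance: characterization of f-divergence (Csiszar)
[Figure: the outcomes $1, \ldots, n$ of $p$ and $q$ grouped into classes $1, 2, \ldots, m$]
$\bar{p} = (\bar{p}_A)$, $\bar{p}_A = \sum_{i \in A} p_i$
$D[p : q] \ge D[\bar{p}_A : \bar{q}_A]$
$D[p : q] = D[\bar{p}_A : \bar{q}_A]$ when $p_i = c\, q_i$, $i \in A$
Invariance ⇒ f-divergence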
Csiszar f-divergence (Ali-Silvey, Morimoto)
$D_f[p : q] = \sum_i p_i f\left( \frac{q_i}{p_i} \right)$
$f(u)$: convex, $f(1) = 0$
$D_{cf}[p : q] = c\, D_f[p : q]$
$\bar{f}(u) = f(u) - c(u - 1)$
$f(1) = f'(1) = 0$; $f''(1) = 1$
[Figure: graph of $f(u)$ against $u$, marked at $u = 1$]
Theorem
An invariant separable divergence belongs to the class of f-divergences.
Separable divergence: $D[p : q] = \sum_i k(p_i, q_i)$
$k(p_i, q_i) = p_i f\left( \frac{q_i}{p_i} \right)$: f-divergence ($n > 1$)
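The invariance property that singles out f-divergences can be sketched numerically: merging outcomes into classes never increases $D_f[p:q] = \sum_i p_i f(q_i/p_i)$, and leaves it unchanged when $p_i = c\,q_i$ inside each merged class. The convex function, the example vectors and the partitions below are illustrative assumptions.

```python
import math

def f_divergence(p, q, f):
    """D_f[p : q] = sum_i p_i f(q_i / p_i), for strictly positive p and q."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))

def coarse_grain(p, partition):
    """Sum the probabilities inside each class of the partition."""
    return [sum(p[i] for i in block) for block in partition]

def f_example(u):
    # A convex function with f(1) = 0 (chosen for illustration).
    return u * math.log(u) - (u - 1.0)

p = [0.1, 0.2, 0.3, 0.4]
q = [0.2, 0.4, 0.1, 0.3]
fine = f_divergence(p, q, f_example)

# Classes {0,1} and {2,3}: p_i / q_i is not constant on {2,3}, so information is lost.
lossy = [[0, 1], [2, 3]]
coarse_lossy = f_divergence(coarse_grain(p, lossy), coarse_grain(q, lossy), f_example)

# Only {0,1} is merged, where p_i = 0.5 * q_i: the divergence is preserved.
lossless = [[0, 1], [2], [3]]
coarse_lossless = f_divergence(coarse_grain(p, lossless), coarse_grain(q, lossless), f_example)

print(fine, coarse_lossy, coarse_lossless)
```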
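$S = \{p\}$: space of probability distributions
invariance: invariant divergence, f-divergence, Fisher information metric, alpha connection
dually flat space: flat divergence, convex functions, Bregman divergence
KL-divergence: $D[p : q] = \int p(x) \log \left\{ \frac{p(x)}{q(x)} \right\} dx$
$\alpha$-Divergence: why?
flat and invariant in $\tilde{S}_{n+1}$
$f_\alpha(u) = \frac{4}{1 - \alpha^2} \left\{ 1 - u^{\frac{1+\alpha}{2}} \right\} - \frac{2}{1 - \alpha} (1 - u)$, $\alpha \ne \pm 1$
KL-divergence: $f(u) = u \log u - (u - 1)$
$D[\tilde{p} : \tilde{q}] = \sum_i \left\{ \tilde{p}_i \log \frac{\tilde{p}_i}{\tilde{q}_i} - \tilde{p}_i + \tilde{q}_i \right\}$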
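Space of positive measures: vectors, matrices, arrays
$\tilde{S} = \{\tilde{p}\}$, $\tilde{p}_i > 0$ ($\sum_i \tilde{p}_i = 1$ does not necessarily hold)
f-divergence
Bregman divergence
α-divergence
f-divergence of $\tilde{S}$
$D_f[\tilde{p} : \tilde{q}] = \sum_i \tilde{p}_i f\left( \frac{\tilde{q}_i}{\tilde{p}_i} \right) \ge 0$
$D_f[\tilde{p} : \tilde{q}] = 0 \iff \tilde{p} = \tilde{q}$
not invariant under $\bar{f}(u) = f(u) - c(u - 1)$
$\alpha$-divergence
$D_\alpha[\tilde{p} : \tilde{q}] = \frac{4}{1 - \alpha^2} \sum_i \left\{ \frac{1 - \alpha}{2} \tilde{p}_i + \frac{1 + \alpha}{2} \tilde{q}_i - \tilde{p}_i^{\frac{1-\alpha}{2}} \tilde{q}_i^{\frac{1+\alpha}{2}} \right\}$
KL-divergence
$D[\tilde{p} : \tilde{q}] = \sum_i \left\{ \tilde{p}_i \log \frac{\tilde{p}_i}{\tilde{q}_i} - \tilde{p}_i + \tilde{q}_i \right\}$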

$\tilde{S}$: dually flat
$S$: not dually flat (except $\alpha = \pm 1$)
p i 1
2
r i
1
1
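Metric and Connections Induced by Divergence (Eguchi)
Riemannian metric: $g_{ij}(z) = \partial_i \partial_j D[z : y]\,|_{y=z}$; to second order, $D[z : y] = \frac{1}{2} \sum g_{ij}(z) (z_i - y_i)(z_j - y_j)$
affine connections $\{\nabla, \nabla^*\}$
$\Gamma_{ijk}(z) = -\partial_i \partial_j \partial'_k D[z : y]\,|_{y=z}$
$\Gamma^*_{ijk}(z) = -\partial'_i \partial'_j \partial_k D[z : y]\,|_{y=z}$
$\partial_i = \frac{\partial}{\partial z_i}$, $\partial'_i = \frac{\partial}{\partial y_i}$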
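Invariant geometrical structure: $S = \{p(x, \xi)\}$
alpha-geometry (derived from invariant divergence)
$g_{ij}(\xi) = E[\partial_i l\, \partial_j l]$: Fisher information
$T_{ijk}(\xi) = E[\partial_i l\, \partial_j l\, \partial_k l]$
$l = \log p(x, \xi)$; $\partial_i = \frac{\partial}{\partial \xi_i}$
α-connection: $\Gamma^{(\alpha)}_{ijk} = [i, j; k] - \frac{\alpha}{2} T_{ijk}$, where $[i, j; k]$ is the Levi-Civita connection
$\nabla^{(\alpha)} \leftrightarrow \nabla^{(-\alpha)}$: dually coupled
$X \langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$
Duality: $X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$
$\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji}$
$\Gamma^*_{ijk} = \Gamma_{ijk} + T_{ijk}$
$\{M, g, T\}$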
Riemannian Structure
$ds^2 = \sum g_{ij}(\theta)\, d\theta^i d\theta^j = d\theta^T G(\theta)\, d\theta$
$G(\theta) = (g_{ij})$
Euclidean: $G = E$
Fisher information
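As a hedged illustration of how the Fisher information arises as the metric induced by a divergence, the sketch below differentiates the KL-divergence between two Bernoulli distributions twice in its first argument and evaluates at the diagonal, using a central finite difference. The Bernoulli model, the step size `h` and the function names are assumptions made for the example. The result is compared with the closed form $E[(\partial l)^2] = 1/\{z(1-z)\}$.

```python
import math

def kl_bernoulli(z, y):
    """KL-divergence between Bernoulli(z) and Bernoulli(y)."""
    return z * math.log(z / y) + (1 - z) * math.log((1 - z) / (1 - y))

def metric_from_divergence(divergence, z, h=1e-4):
    """g(z) = d^2/dz^2 D[z : y] at y = z, by a central second difference."""
    return (divergence(z + h, z) - 2.0 * divergence(z, z) + divergence(z - h, z)) / (h * h)

def fisher_bernoulli(z):
    """E[(d/dz log p(x; z))^2] = z * (1/z)^2 + (1 - z) * (1/(1 - z))^2."""
    return 1.0 / (z * (1.0 - z))

z = 0.3
print(metric_from_divergence(kl_bernoulli, z), fisher_bernoulli(z))  # both near 4.76
```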
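Affine Connection
covariant derivative: $\nabla_X Y$; parallel transport along a curve $c$: $\Pi_c X = Y$
geodesic: $\nabla_X X = 0$, $X = X(t)$
$s = \int \sqrt{\sum g_{ij}(\theta)\, d\theta^i d\theta^j}$: minimal distance (metric)
straight line (non-metric)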
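Duality
$\langle X, Y \rangle = \langle \Pi X, \Pi^* Y \rangle$, $\langle X, Y \rangle = \sum g_{ij} X^i Y^j$
$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$
[Figure: vectors $X$ and $Y$ transported by $\Pi$ and $\Pi^*$]
Riemannian geometry: $\nabla = \nabla^*$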
Dual Affine Connections $(\nabla, \nabla^*)$
e-geodesic: $\log r(x, t) = t \log p(x) + (1 - t) \log q(x) + c(t)$
m-geodesic: $r(x, t) = t\, p(x) + (1 - t)\, q(x)$
[Figure: the two geodesics joining $q(x)$ and $p(x)$]
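The two geodesics between discrete distributions $p$ and $q$ can be sketched as follows: the m-geodesic mixes probabilities linearly, while the e-geodesic mixes log-probabilities linearly and renormalises, so that $c(t) = -\log \sum_x p(x)^t q(x)^{1-t}$. Function names and the example vectors are illustrative assumptions.

```python
import math

def m_geodesic(p, q, t):
    """r(x, t) = t p(x) + (1 - t) q(x)."""
    return [t * pi + (1 - t) * qi for pi, qi in zip(p, q)]

def e_geodesic(p, q, t):
    """log r(x, t) = t log p(x) + (1 - t) log q(x) + c(t), with c(t) fixed by normalisation."""
    unnormalised = [(pi ** t) * (qi ** (1 - t)) for pi, qi in zip(p, q)]
    z = sum(unnormalised)          # c(t) = -log z
    return [u / z for u in unnormalised]

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

mid_m = m_geodesic(p, q, 0.5)
mid_e = e_geodesic(p, q, 0.5)
print(mid_m)   # arithmetic mean of p and q
print(mid_e)   # normalised geometric mean of p and q
```

The two midpoints generally differ, which reflects the fact that the e- and m-connections define different notions of a straight line.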
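Mathematical structure of $S = \{p(x, \xi)\}$
$g_{ij}(\xi) = E[\partial_i l\, \partial_j l]$
$T_{ijk}(\xi) = E[\partial_i l\, \partial_j l\, \partial_k l]$
$\{M, g, T\}$
$l = \log p(x, \xi)$; $\partial_i = \frac{\partial}{\partial \xi_i}$
α-connection: $\Gamma^{(\alpha)}_{ijk} = [i, j; k] - \frac{\alpha}{2} T_{ijk}$
$\nabla^{(\alpha)} \leftrightarrow \nabla^{(-\alpha)}$: dually coupled
$X \langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$
α-geometry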
Dual Foliations
k‐cut
Two neurons: $\{p_{00}, p_{01}, p_{10}, p_{11}\}$
x1: 0011000101101
x2: 0100100110100
x3: 0101101001010
firing rates: $r_1$, $r_2$; $r_{12}$
correlation: covariance?
Correlations of Neural Firing
[Figure: two neurons $x_1$, $x_2$]
$\{p(x_1, x_2)\}$: $\{p_{00}, p_{10}, p_{01}, p_{11}\}$
firing rates: $r_1 = p_{1\cdot} = p_{10} + p_{11}$, $r_2 = p_{\cdot 1} = p_{01} + p_{11}$
correlations: $\theta = \log \frac{p_{11} p_{00}}{p_{10} p_{01}}$
$\{(r_1, r_2), \theta\}$: orthogonal coordinates
Independent Distributions
$x_1, x_2 = 0, 1$
$S = \{p(x_1, x_2)\}$
$M = \{q(x_1) q(x_2)\}$
two neuron case: $r_1, r_2, r_{12}$; $\theta_1, \theta_2, \theta_{12}$
$\theta_{12} = \log \frac{p_{00}\, p_{11}}{p_{01}\, p_{10}} = \log \frac{r_{12} (1 + r_{12} - r_1 - r_2)}{(r_1 - r_{12})(r_2 - r_{12})}$
$r_{12} = f(r_1, r_2, \theta)$
$r_{12}(t) = f(r_1(t), r_2(t), \theta)$
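The change of coordinates for the two-neuron case can be sketched as follows. The forward map computes $(r_1, r_2, \theta_{12})$ from the joint probabilities; the inverse $r_{12} = f(r_1, r_2, \theta)$ is obtained here by bisection, using the fact that $\theta_{12}$ increases monotonically with $r_{12}$ on its admissible interval. The bisection approach and all function names are assumptions of this sketch, not taken from the slides.

```python
import math

def rates_and_theta(p00, p01, p10, p11):
    """Firing rates r1, r2 and the interaction theta_12 from the joint probabilities."""
    r1 = p10 + p11
    r2 = p01 + p11
    theta = math.log(p00 * p11 / (p01 * p10))
    return r1, r2, theta

def theta12(r1, r2, r12):
    """theta_12 = log[ r12 (1 + r12 - r1 - r2) / ((r1 - r12)(r2 - r12)) ]."""
    return math.log(r12 * (1 + r12 - r1 - r2) / ((r1 - r12) * (r2 - r12)))

def joint_rate(r1, r2, theta, iterations=100):
    """Solve theta12(r1, r2, r12) = theta for r12 by bisection."""
    lo = max(0.0, r1 + r2 - 1.0)   # lower end of the admissible interval
    hi = min(r1, r2)               # upper end of the admissible interval
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if theta12(r1, r2, mid) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r1, r2, theta = rates_and_theta(0.4, 0.1, 0.2, 0.3)
print(r1, r2, theta)              # theta equals log 6 for this example
print(joint_rate(r1, r2, theta))  # recovers p11 = 0.3
print(joint_rate(r1, r2, 0.0))    # theta = 0 gives r12 = r1 * r2 (independence)
```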
Decomposition of KL-divergence
$D[p : r] = D[p : q] + D[q : r]$
$p$, $q$: same marginals $\eta_1$, $\eta_2$
$r$, $q$: same correlations $\theta$
[Figure: points $p$, $q$, $r$, with labels "correlations" and "independent"]
$D[p : r] = \sum_x p(x) \log \frac{p(x)}{r(x)}$
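The decomposition can be verified numerically for two binary variables. In the sketch below, $q$ is the independent distribution with the same marginals as $p$, and $r$ is another independent distribution, so $q$ and $r$ share the same (zero) interaction; the example numbers and function names are hypothetical.

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def independent(a, b):
    """Joint [p00, p01, p10, p11] of independent bits with P(x1=1)=a, P(x2=1)=b."""
    return [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]

p = [0.4, 0.1, 0.2, 0.3]                      # correlated joint distribution
q = independent(p[2] + p[3], p[1] + p[3])     # same marginals as p, no correlation
r = independent(0.7, 0.2)                     # different marginals, no correlation

lhs = kl(p, r)
rhs = kl(p, q) + kl(q, r)
print(lhs, rhs)   # equal up to rounding
```

Here $D[p:q]$ measures the contribution of the correlation and $D[q:r]$ the contribution of the differing firing rates.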
pairwise correlations
covariance: $c_{ij} = r_{ij} - r_i r_j$: not orthogonal
independent distributions: $r_{ij} = r_i r_j$, $r_{ijk} = r_i r_j r_k$, $\ldots$
How to generate correlated spikes? (Niebur, Neural Computation [2007])
higher-order correlations
Orthogonal higher-order correlations
$\theta = (\theta_i, \theta_{ij}; \ldots, \theta_{1 \cdots n})$
$r = (r_i, r_{ij}; \ldots, r_{1 \cdots n})$
Population and Synfire
Neurons $x_1, x_2, \ldots, x_n$
$x_i = 1[u_i]$
$u_i$: Gaussian, $E[u_i u_j] = \alpha$
Synfiring
$p(x) = p(x_1, \ldots, x_n)$
$r = \frac{1}{n} \sum_i x_i$, with distribution $q(r)$
[Figure: $q(r)$ plotted against $r$]
Input-output Analysis
Gross product consumption
Relations among industries
(K. Tsuda and R. Morioka)
Mathematical Problems
M submanifold of S ?
Hong van Le
{M, g} {M, g, T} dually flat J. Armstrong
Affine differential geometry
Hessian manifold
Almost complex structure
