JOSE C. PRINCIPE
U. OF FLORIDA, EEL 6935
352-392-2662
principe@cnel.ufl.edu
copyright 2009

Renyi's Entropy

History: Alfred Renyi was looking for the most general definition of information measures that would preserve additivity for independent events and be compatible with the axioms of probability.
Applying this definition to I(P) we get

$$I(P) = g^{-1}\left(\sum_{k=1}^{N} p_k\, g(I_k)\right)$$

When the postulate of additivity for independent events is applied, only two forms of g(x) are possible:

$$g(x) = cx \qquad\qquad g(x) = c\,2^{(1-\alpha)x}$$

The first form gives Shannon information and the second gives

$$I_\alpha(P) = \frac{1}{1-\alpha}\log\left(\sum_{k=1}^{N} p_k^\alpha\right)$$

for nonnegative α different from 1. This gives a parametric family of information measures that are today called Renyi's entropies. In the limit α → 1, Renyi's entropy reduces to Shannon's entropy, as seen by applying L'Hôpital's rule:

$$\lim_{\alpha\to 1} H_\alpha(X) = \lim_{\alpha\to 1}\frac{1}{1-\alpha}\log\sum_{k=1}^{N} p_k^\alpha = \lim_{\alpha\to 1}\frac{\sum_{k=1}^{N} p_k^\alpha\log p_k \Big/ \sum_{k=1}^{N} p_k^\alpha}{-1} = -\sum_{k=1}^{N} p_k\log p_k = H_S(X)$$
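As a quick numerical check of the definition and the α → 1 limit, here is a small sketch (my code, not from the slides; the function names are mine):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """H_alpha(P) = 1/(1-alpha) * log(sum_k p_k^alpha), alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def shannon_entropy(p):
    """H_S(P) = -sum_k p_k log p_k (natural log), ignoring zero-probability atoms."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))
```

For `p = [0.5, 0.25, 0.125, 0.125]`, `renyi_entropy(p, 0.999)` is numerically indistinguishable from `shannon_entropy(p)`, and the values decrease monotonically as α grows, as the theory predicts.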
Meaning of α

If we compare the two definitions we see that, instead of weighting the log p_k by the probabilities, here the log is external to the sum and its argument is the α power of the PMF, i.e.

$$H_\alpha(X) = \frac{1}{1-\alpha}\log\left(V_\alpha(X)\right) = -\log\sqrt[\alpha-1]{V_\alpha(X)} = -\log\sqrt[\alpha-1]{E\!\left[p^{\alpha-1}(X)\right]}, \qquad V_\alpha(X) = \sum_k p_k^\alpha$$

A geometric view will help here. All the PMFs of an r.v. with N outcomes exist on what is called the simplex, in an N-dimensional space.
$$\sum_{k=1}^{N} p_k^\alpha = \|p\|_\alpha^\alpha \qquad \text{(entropy α-norm)}$$

[Figure: the simplex of PMFs, shown for p = (p1, p2) in 2-D and (p1, p2, p3) in 3-D, with the α-norm measured from the origin.]
page 3 of 35
JOSE C. PRINCIPE
The norm measures exactly the distance of the PMF to the origin, and
U. OF FLORIDA
α is the designated l-norm (Euclidean norm is l=2). Changing α
EEL 6935
changes the metric distance in the simplex.
352-392-2662
[ 0 01] [0 01] [ 001 ]
principe@cnel.ufl.edu
copyright 2009
Properties of Renyi's entropies
Renyi's quadratic entropy

We will make heavy use of Renyi's entropy with α = 2, called the quadratic entropy:

$$H_2(X) = -\log\left(\sum_k p_k^2\right)$$

It has been used in physics, in signal processing, and in economics. We also want to stress that the argument of the log, which is the 2-norm of the PMF, V2(X), has meaning in itself. In fact it is E[p(X)], or, if we make the change of variables ξ_k = p(x_k), it is the mean of the transformed variable when the PMF is the transformation.

[Figure: p(x) versus x, contrasting the mean of X with the mean of p(X).]
Extensions to continuous variables

It is possible to show that Renyi's entropy measure extends to continuous r.v. and reads

$$H_\alpha(X) = \lim_{n\to\infty}\left(I_\alpha(P_n) - \log n\right) = \frac{1}{1-\alpha}\log\int p^\alpha(x)\,dx$$

$$H_2(X) = -\log\int p^2(x)\,dx$$
Estimation of Renyi's quadratic entropy

We will use here ideas from density estimation called kernel density estimation. This is an old method (Rosenblatt) that is now called Parzen density estimation, because Parzen proved many important statistical properties of the estimator. The idea is very simple: place a kernel over the samples and sum with the proper normalization, i.e.

$$\hat{p}_X(x) = \frac{1}{N\sigma}\sum_{i=1}^{N}\kappa\!\left(\frac{x - x_i}{\sigma}\right)$$

where the kernel κ satisfies

1. $\sup_x |\kappa(x)| < \infty$
2. $\int_{\mathbb{R}} |\kappa(x)|\,dx < \infty$
3. $\lim_{x\to\infty} x\,\kappa(x) = 0$
4. $\kappa(x) \ge 0, \quad \int_{\mathbb{R}}\kappa(x)\,dx = 1$

Parzen proved that the estimator is asymptotically unbiased and consistent, with good efficiency. The free parameter σ is called the kernel size. Normally a Gaussian is used, and σ becomes its standard deviation.
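A minimal sketch of the Parzen estimator with a Gaussian kernel (my code; vectorized over evaluation points):

```python
import numpy as np

def parzen_pdf(x, samples, sigma):
    """p_hat(x) = (1/(N*sigma)) * sum_i kappa((x - x_i)/sigma), Gaussian kappa."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    xi = np.asarray(samples, dtype=float)[None, :]
    # standard normal kernel evaluated at all (point, sample) pairs
    kappa = np.exp(-0.5 * ((x - xi) / sigma) ** 2) / np.sqrt(2.0 * np.pi)
    return kappa.sum(axis=1) / (xi.size * sigma)
```

By construction the estimate is nonnegative and integrates to one, since each kernel bump does.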
But notice that here we are not interested in estimating the PDF, which is a function; we are interested in a single number, the 2-norm of the PDF. For a Gaussian kernel, substituting the estimator yields immediately

$$\hat{H}_2(X) = -\log\int_{-\infty}^{\infty}\left(\frac{1}{N}\sum_{i=1}^{N} G_\sigma(x - x_i)\right)^2 dx$$
$$= -\log\frac{1}{N^2}\int_{-\infty}^{\infty}\left(\sum_{i=1}^{N}\sum_{j=1}^{N} G_\sigma(x - x_j)\,G_\sigma(x - x_i)\right)dx$$
$$= -\log\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\int_{-\infty}^{\infty} G_\sigma(x - x_j)\,G_\sigma(x - x_i)\,dx$$
$$= -\log\left(\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(x_j - x_i)\right)$$

We call this estimator for V2(X) the Information Potential (IP).

Let us look at the derivation: we never had to compute the integral explicitly, since the integral of the product of two Gaussians is the value of a Gaussian (with a larger kernel size) evaluated at the difference of their arguments. Let us also look at the expression: the algorithm is O(N²), and there is a free parameter σ that the user has to select from the data. Cross-validation or Silverman's rule suffices for most cases:

$$\sigma_{opt} = \sigma_X\left(4 N^{-1}(2d+1)^{-1}\right)^{1/(d+4)}$$
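The IP estimator above can be sketched in a few lines (my code, 1-D samples and Gaussian kernel assumed; the `sqrt(2)` inflation of the kernel size comes from the Gaussian convolution step of the derivation):

```python
import numpy as np

def information_potential(x, sigma):
    """V2_hat = (1/N^2) * sum_ij G_{sigma*sqrt(2)}(x_j - x_i)."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]            # all pairwise differences, O(N^2)
    s = sigma * np.sqrt(2.0)               # kernel size grows by sqrt(2) after convolution
    return (np.exp(-d**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))).mean()

def quadratic_entropy(x, sigma):
    """H2_hat = -log V2_hat."""
    return -np.log(information_potential(x, sigma))

def silverman_sigma(x, d=1):
    """Silverman's rule: sigma_opt = sigma_X * (4 N^-1 (2d+1)^-1)^(1/(d+4))."""
    n = np.asarray(x).size
    return np.std(x) * (4.0 / (n * (2 * d + 1))) ** (1.0 / (d + 4))
```

For standard Gaussian data the estimate lands close to the true value $H_2 = \log(2\sqrt{\pi})$, with a small positive bias induced by the kernel smoothing.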
Extended Estimator for Renyi's Entropy

We can come up with still another estimator for Renyi's entropy of any order

$$H_\alpha(X) \stackrel{\Delta}{=} \frac{1}{1-\alpha}\log\int_{-\infty}^{\infty} p_X^\alpha(x)\,dx = \frac{1}{1-\alpha}\log E_X\!\left[p_X^{\alpha-1}(X)\right]$$

by approximating the expected value by the empirical mean

$$H_\alpha(X) \approx \frac{1}{1-\alpha}\log\frac{1}{N}\sum_{j=1}^{N} p_X^{\alpha-1}(x_j)$$

and substituting the Parzen estimate of the PDF to yield

$$\hat{H}_\alpha(X) = \frac{1}{1-\alpha}\log\frac{1}{N}\sum_{j=1}^{N}\left(\frac{1}{N}\sum_{i=1}^{N}\kappa_\sigma(x_j - x_i)\right)^{\alpha-1} = \frac{1}{1-\alpha}\log\frac{1}{N^\alpha}\sum_{j=1}^{N}\left(\sum_{i=1}^{N}\kappa_\sigma(x_j - x_i)\right)^{\alpha-1}$$

We will now present a set of properties of these estimators.
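A sketch of this α-order estimator (my code, Gaussian kernel assumed; here κ_σ already includes the 1/σ normalization, so the inner sum divided by N is the Parzen estimate at x_j):

```python
import numpy as np

def renyi_entropy_estimate(x, sigma, alpha):
    """H_alpha_hat = 1/(1-alpha) * log( (1/N^alpha) sum_j (sum_i k_sigma(x_j - x_i))^(alpha-1) )."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    inner = k.sum(axis=1)                  # sum_i kappa_sigma(x_j - x_i), one value per x_j
    return np.log(np.sum(inner ** (alpha - 1)) / n**alpha) / (1 - alpha)
```

Because the estimator is (minus the log of) a power mean of the Parzen density values, it inherits the monotone decrease of Renyi's entropy in α.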
Properties of the Estimators

Property 2.1: The estimator of Eq. (2.18) for Renyi's quadratic entropy using Gaussian kernels only differs from the IP of Eq. (2.14) by a factor of √2 in the kernel size.

Property 2.2: For any Parzen kernel that obeys the relation

$$\kappa_{new}(x_j - x_i) = \int_{-\infty}^{\infty}\kappa_{old}(x - x_i)\,\kappa_{old}(x - x_j)\,dx \qquad (2.19)$$

the estimator of Renyi's quadratic entropy of Eq. (2.18) matches the estimator of Eq. (2.14) using the IP.

Property 2.3: The kernel size must be a parameter that satisfies the scaling property $\kappa_{c\sigma}(x) = \kappa_\sigma(x/c)/c$ for any positive factor c.

Property 2.4: The entropy estimator in Eq. (2.18) is invariant to the mean of the underlying density of the samples, as is the actual entropy.

Property 2.5: In the limit α → 1, the estimator of Eq. (2.18) approaches Shannon's entropy estimated using Parzen windowing with the expectation approximated by the sample mean.
Properties of the Estimators

Property 2.6: In order to maintain consistency with the scaling property of the actual entropy, if the entropy of a random variable X is estimated from samples {x1, …, xN} using a kernel size of σ, the entropy estimate of the samples {cx1, …, cxN} of the random variable cX must be estimated using a kernel size of |c|σ.

Property 2.7: When estimating the joint entropy of an n-dimensional random vector X from its samples {x1, …, xN}, use a multi-dimensional kernel that is the product of single-dimensional kernels. This way, the estimates of the joint and marginal entropies remain consistent with each other.

Theorem 2.1: The entropy estimator in Eq. (2.18) is consistent if the Parzen windowing and the sample mean are consistent for the actual PDF of the iid samples.

Theorem 2.2: If the maximum value of the kernel κσ(ξ) is achieved when ξ = 0, then the minimum value of the entropy estimator in Eq. (2.18) is achieved when all samples are equal to each other, i.e., x1 = … = xN = c.
Properties of the Estimators

Theorem 2.3: If the kernel function κσ(·) is continuous, differentiable, symmetric, and unimodal, then the global minimum of the entropy estimator in Eq. (2.18) described in Theorem 2.2 is smooth, i.e., it has a zero gradient and a positive semi-definite Hessian matrix.

Property 2.8: If the kernel function satisfies the conditions in Theorem 2.3, then in the limit, as the kernel size tends to infinity, the quadratic entropy estimator approaches the logarithm of a scaled and biased version of the sample variance.

Property 2.9: In the case of joint entropy estimation, if the multi-dimensional kernel is rotation invariant, the entropy estimator in Eq. (2.18) is invariant under rotations, as is the actual entropy of a random vector X. Notice that the condition on the joint kernel function requires hyperspherically symmetric kernels.
Properties of the Estimators

Theorem 2.4: $\lim_{N\to\infty}\hat{H}_\alpha(X) = H_\alpha(\hat{X}) \ge H_\alpha(X)$, where $\hat{X}$ is a random variable with the PDF $f_X(\cdot) * \kappa_\sigma(\cdot)$. The equality (in the inequality portion) occurs if and only if (iff) the kernel size is zero. This result is also valid on the average for the finite-sample case.

Bias of the Information Potential

$$a = E\!\left[K_\sigma(x_i - x_j)\,K_\sigma(x_j - x_l)\right] - E\!\left[K_\sigma(x_i - x_j)\right]E\!\left[K_\sigma(x_j - x_l)\right]$$
$$b = \mathrm{Var}\!\left[K_\sigma(x_i - x_j)\right]$$
There is a useful physical interpretation of the IP. Think of the gravitational or the electrostatic field. Assume the samples are particles (information particles) of unit mass that interact with each other with the rules specified by the Parzen kernel used. This analogy is exact!

The information potential field is given by the kernel density estimation

$$\hat{V}_2(x_j) \stackrel{\Delta}{=} \frac{1}{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(x_j - x_i)$$

and the Information Potential is the total potential

$$\hat{V}_2(X) = \frac{1}{N}\sum_{j=1}^{N}\hat{V}_2(x_j)$$

Now, if there is a potential, there are forces in the space of the samples that can be easily computed as

$$\frac{\partial}{\partial x_j}\hat{V}_2(x_j) = \frac{1}{N}\sum_{i=1}^{N} G'_{\sigma\sqrt{2}}(x_j - x_i) = \frac{1}{2N\sigma^2}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(x_j - x_i)(x_i - x_j)$$

$$F_2(x_j; x_i) \stackrel{\Delta}{=} \frac{1}{N} G'_{\sigma\sqrt{2}}(x_j - x_i), \qquad F_2(x_j) \stackrel{\Delta}{=} \frac{\partial}{\partial x_j} V_2(x_j) = \sum_{i=1}^{N} F_2(x_j; x_i)$$
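The force expression above is easy to vectorize (my code, 1-D samples and Gaussian kernel assumed). Because the pairwise interactions are antisymmetric, the forces sum to zero over all particles, the analog of Newton's third law:

```python
import numpy as np

def information_forces(x, sigma):
    """F2(x_j) = (1/(2 N sigma^2)) * sum_i G_{sigma*sqrt(2)}(x_j - x_i) * (x_i - x_j)."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]            # d[j, i] = x_j - x_i
    s = sigma * np.sqrt(2.0)
    g = np.exp(-d**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
    return (g * (-d)).sum(axis=1) / (2 * x.size * sigma**2)
```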
[Figure: interaction between one information particle at the center of 2-D space and its neighbors, illustrating the potential field created by the kernel.]

The total IP is the average of all pairwise interactions:

$$\hat{V}(X) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\hat{V}_{ij} = \frac{1}{N}\sum_{i=1}^{N}\hat{V}(i)$$
Example of forces pulling samples apart when the entropy is maximized. The lines show the direction of the force and their size is the intensity of the pull.

[Figure: sample positions in the unit square with force vectors attached to each sample; both axes run from 0 to 1.]
Extension to any α and kernel

The framework of information potentials and forces extends to any Parzen kernel, where the shape of the kernel creates the interaction field. Changing α also changes the field. The α information potential field is

$$\hat{V}_\alpha(x_j) \stackrel{\Delta}{=} \frac{1}{N^{\alpha-1}}\left(\sum_{i=1}^{N}\kappa_\sigma(x_j - x_i)\right)^{\alpha-1}, \qquad \hat{V}_\alpha(X) = \frac{1}{N}\sum_{j=1}^{N}\hat{V}_\alpha(x_j)$$
Renyi's divergence

Renyi proposed a divergence measure from his entropy definition which is different from the Kullback-Leibler divergence discussed in Chapter 1:

$$D_\alpha(f\,\|\,g) \stackrel{\Delta}{=} \frac{1}{\alpha-1}\log\int_{-\infty}^{\infty} f(x)\left(\frac{f(x)}{g(x)}\right)^{\alpha-1} dx$$

with properties

i. $D_\alpha(f\,\|\,g) \ge 0,\ \forall f, g,\ \alpha > 0$

Writing the divergence as an expectation over f and approximating it by the empirical mean over samples $\{x_j^f\}$ drawn from f gives

$$D_\alpha(f\,\|\,g) = \frac{1}{\alpha-1}\log E_f\!\left[\left(\frac{f(x)}{g(x)}\right)^{\alpha-1}\right] \approx \frac{1}{\alpha-1}\log\frac{1}{N_f}\sum_{j=1}^{N_f}\left(\frac{\hat{f}(x_j^f)}{\hat{g}(x_j^f)}\right)^{\alpha-1}$$

and substituting Parzen estimates built from the $N_f$ samples of f and the $N_g$ samples of g yields

$$\hat{D}_\alpha(f\,\|\,g) = \frac{1}{\alpha-1}\log\frac{1}{N_f}\sum_{j=1}^{N_f}\left(\frac{\frac{1}{N_f}\sum_{i=1}^{N_f}\kappa(x_j^f - x_i^f)}{\frac{1}{N_g}\sum_{i=1}^{N_g}\kappa(x_j^f - x_i^g)}\right)^{\alpha-1}$$
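A sketch of this plug-in estimator (my code, Gaussian kernel and 1-D samples assumed; `renyi_divergence_estimate` is an illustrative name):

```python
import numpy as np

def gauss(d, sigma):
    return np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def renyi_divergence_estimate(xf, xg, sigma, alpha):
    """Plug-in D_alpha(f||g): Parzen estimates of f and g evaluated at the f-samples."""
    xf = np.asarray(xf, dtype=float)
    xg = np.asarray(xg, dtype=float)
    f_at = gauss(xf[:, None] - xf[None, :], sigma).mean(axis=1)   # f_hat(x_j^f)
    g_at = gauss(xf[:, None] - xg[None, :], sigma).mean(axis=1)   # g_hat(x_j^f)
    return np.log(((f_at / g_at) ** (alpha - 1)).mean()) / (alpha - 1)
```

When both sample sets are identical the density ratio is exactly one and the estimate is zero; well-separated sample sets give a large positive value.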
Renyi's Mutual Information

Renyi's mutual information is also the divergence between the joint and the product of the marginals:

$$I_\alpha(X) \stackrel{\Delta}{=} \frac{1}{\alpha-1}\log\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\frac{p_X^\alpha(x_1,\dots,x_n)}{\left(\prod_{o=1}^{n} p_{X_o}(x_o)\right)^{\alpha-1}}\,dx_1\cdots dx_n$$
Measuring dissimilarity in probability space is a complex issue, and there are many other divergence measures that can be used. If we transform the simplex into a hypersphere preserving the Fisher information, the coordinates of each PMF become $\sqrt{p_k}$. The geodesic distance between PMFs f and g then satisfies $\cos D_G = \sum_k\sqrt{f_k g_k}$. This is related to the Bhattacharyya distance (which is Renyi's divergence with α = 1/2):

$$D_B(f, g) = -\ln\left(\int\sqrt{f(x)\,g(x)}\,dx\right)$$

Now, if we measure the distance between f and g in a linear projection space (the chordal distance) we get $D_H = \left(\sum_k(\sqrt{f_k} - \sqrt{g_k})^2\right)^{0.5}$. This resembles the Hellinger distance, which is also related to Renyi's divergence:

$$D_H(f, g) = \left[\int\left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 dx\right]^{1/2} = \left[2\left(1 - \int\sqrt{f(x)\,g(x)}\,dx\right)\right]^{1/2}$$

Since we have an estimator for quadratic Renyi's entropy (the IP), we will use α = 2 instead of α = 1/2 and define two quadratic divergences between f and g, called the Euclidean distance and the Cauchy-Schwarz divergence.
Euclidean distance between PDFs

We will drop the square root and define the distance as

$$D_{ED}(f, g) = \int\left(f(x) - g(x)\right)^2 dx = \int f(x)^2\,dx - 2\int f(x)\,g(x)\,dx + \int g(x)^2\,dx$$

Notice that each of the three terms can be estimated with the IP. The middle term involves samples from the two PDFs, and so it will be called the Cross Information Potential (CIP): it measures the potential created by one PDF at the locations specified by the samples of the other PDF.

The extension to multidimensional random variables is also possible:

$$I_{ED}(X_1,\dots,X_k) = D_{ED}\!\left(f_X(x_1,\dots,x_k),\ \prod_{i=1}^{k} f_{X_i}(x_i)\right)$$

Notice that $I_{ED}$ is zero if the r.v. are independent.
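The three-term decomposition maps directly onto IP/CIP computations (my sketch, Gaussian kernels and 1-D samples assumed). With the same kernel matrix viewpoint, $D_{ED}$ is a quadratic form in a positive semi-definite Gram matrix, so the estimate is always nonnegative:

```python
import numpy as np

def gauss(d, sigma):
    return np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def cip(x, y, sigma):
    """Cross Information Potential: (1/(Nx*Ny)) * sum_ij G_{sigma*sqrt(2)}(x_i - y_j)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return gauss(x[:, None] - y[None, :], sigma * np.sqrt(2.0)).mean()

def euclidean_divergence(x, y, sigma):
    """D_ED_hat = V_f + V_g - 2 V_c, every term an IP/CIP."""
    return cip(x, x, sigma) + cip(y, y, sigma) - 2.0 * cip(x, y, sigma)
```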
Cauchy-Schwarz divergence

The other divergence is related to the Bhattacharyya distance, but can be formally derived from the Cauchy-Schwarz inequality:

$$D_{CS}(f, g) = \log\frac{\int f(x)^2\,dx\int g(x)^2\,dx}{\left(\int f(x)\,g(x)\,dx\right)^2} \ \ge 0$$

with equality iff f = g almost everywhere.
There is a very interesting interpretation of the D_CS in terms of Renyi's entropy. Lutwak defines Renyi's relative entropy as

$$D_{R\alpha}(f, g) = \log\frac{\left(\int_{\mathbb{R}} g^{\alpha-1}(x)\,f(x)\,dx\right)^{1/(1-\alpha)}\left(\int_{\mathbb{R}} g^{\alpha}(x)\,dx\right)^{1/\alpha}}{\left(\int_{\mathbb{R}} f^{\alpha}(x)\,dx\right)^{1/(\alpha(1-\alpha))}}$$
Geometric Interpretation of Quadratic Mutual Information

Let us define

$$V_J = \iint f_{X_1X_2}(x_1, x_2)^2\,dx_1 dx_2$$
$$V_M = \iint\left(f_{X_1}(x_1)\,f_{X_2}(x_2)\right)^2 dx_1 dx_2$$
$$V_c = \iint f_{X_1X_2}(x_1, x_2)\,f_{X_1}(x_1)\,f_{X_2}(x_2)\,dx_1 dx_2$$

Viewing $f_{X_1X_2}$ and $f_{X_1}f_{X_2}$ as vectors with squared lengths $V_J$ and $V_M$ separated by an angle θ,

$$V_c = \cos\theta\,\sqrt{V_J V_M}, \qquad I_{CS} = -\log\left(\cos^2\theta\right)$$

[Figure: the joint PDF and the product of marginals drawn as two vectors at angle θ; I_ED is the Euclidean distance between their tips and I_s is the K-L divergence.]
Example

[Figure: a 2×2 discrete joint PMF P_X(i, j) with its marginals; I_s, I_ED, and I_CS are plotted while P(1,1) varies over [0, 0.6] and P(2,1) varies over [0, 0.4].]

For discrete variables the two quadratic mutual informations read

$$I_{ED}(X_1, X_2) = \sum_{i=1}^{n}\sum_{j=1}^{m}\left(P_X(i, j) - P_{X_1}(i)\,P_{X_2}(j)\right)^2$$

$$I_{CS}(X_1, X_2) = \log\frac{\left(\sum_{i=1}^{n}\sum_{j=1}^{m} P_X(i, j)^2\right)\left(\sum_{i=1}^{n}\sum_{j=1}^{m}\left(P_{X_1}(i)\,P_{X_2}(j)\right)^2\right)}{\left(\sum_{i=1}^{n}\sum_{j=1}^{m} P_X(i, j)\,P_{X_1}(i)\,P_{X_2}(j)\right)^2}$$
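The discrete formulas are a few lines of numpy (my code; `qmi_discrete` is an illustrative name). Both quantities vanish when the joint factors into its marginals:

```python
import numpy as np

def qmi_discrete(P):
    """I_ED and I_CS for a discrete joint PMF P given as an n x m matrix."""
    P = np.asarray(P, dtype=float)
    p1 = P.sum(axis=1)                     # marginal of X1
    p2 = P.sum(axis=0)                     # marginal of X2
    M = np.outer(p1, p2)                   # product of the marginals
    i_ed = ((P - M) ** 2).sum()
    i_cs = np.log((P**2).sum() * (M**2).sum() / (P * M).sum() ** 2)
    return i_ed, i_cs
```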
Information Potential and Forces in the Joint Spaces

The important lesson to be learned from the IP framework and the distance measures defined is that each PDF creates its own potential, which interacts with the others. This is an additive process, so the forces are also additive, and everything becomes relatively simple:

$$\hat{D}_{ED}(f, g) = \hat{V}_{ED} = \hat{V}_f + \hat{V}_g - 2\hat{V}_c, \qquad \frac{\partial\hat{V}_{ED}}{\partial x_i} = \frac{\partial\hat{V}_f}{\partial x_i} + \frac{\partial\hat{V}_g}{\partial x_i} - 2\frac{\partial\hat{V}_c}{\partial x_i}$$

$$\hat{D}_{CS}(f, g) = \hat{V}_{CS} = \log\frac{\hat{V}_f\hat{V}_g}{\hat{V}_c^2}, \qquad \frac{\partial\hat{V}_{CS}}{\partial x_i} = \frac{1}{\hat{V}_f}\frac{\partial\hat{V}_f}{\partial x_i} + \frac{1}{\hat{V}_g}\frac{\partial\hat{V}_g}{\partial x_i} - \frac{2}{\hat{V}_c}\frac{\partial\hat{V}_c}{\partial x_i}$$
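The D_CS plug-in follows the same IP/CIP pattern (my sketch, Gaussian kernels and 1-D samples assumed). For equal sample counts the Cauchy-Schwarz inequality applied to the combined Gram matrix guarantees $\hat{V}_c^2 \le \hat{V}_f\hat{V}_g$, so the estimate is nonnegative:

```python
import numpy as np

def gauss(d, sigma):
    return np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def cip(x, y, sigma):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return gauss(x[:, None] - y[None, :], sigma * np.sqrt(2.0)).mean()

def cauchy_schwarz_divergence(x, y, sigma):
    """D_CS_hat = log( V_f V_g / V_c^2 ) with IP/CIP plug-in estimates."""
    return np.log(cip(x, x, sigma) * cip(y, y, sigma) / cip(x, y, sigma) ** 2)
```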
Quadratic Mutual Information

The QMIs are a bit more detailed because now we have to deal with the joint, the marginals, and their product. But everything is still additive. The three PDFs are estimated as

$$\hat{f}_{X_1X_2}(x_1, x_2) = \frac{1}{N}\sum_{i=1}^{N} G_\sigma(x_1 - x_1(i))\,G_\sigma(x_2 - x_2(i))$$
$$\hat{f}_{X_1}(x_1) = \frac{1}{N}\sum_{i=1}^{N} G_\sigma(x_1 - x_1(i))$$
$$\hat{f}_{X_2}(x_2) = \frac{1}{N}\sum_{i=1}^{N} G_\sigma(x_2 - x_2(i))$$
Let us exemplify for $V_c$:

$$\hat{V}_c = \iint\left[\frac{1}{N}\sum_{k=1}^{N} G_\sigma(x_1 - x_1(k))\,G_\sigma(x_2 - x_2(k))\right]\left[\frac{1}{N}\sum_{i=1}^{N} G_\sigma(x_1 - x_1(i))\right]\left[\frac{1}{N}\sum_{j=1}^{N} G_\sigma(x_2 - x_2(j))\right]dx_1 dx_2$$

$$= \frac{1}{N^3}\sum_{k=1}^{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\int G_\sigma(x_1 - x_1(i))\,G_\sigma(x_1 - x_1(k))\,dx_1\int G_\sigma(x_2 - x_2(k))\,G_\sigma(x_2 - x_2(j))\,dx_2$$

$$= \frac{1}{N}\sum_{k=1}^{N}\left[\frac{1}{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(x_1(k) - x_1(i))\right]\left[\frac{1}{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(x_2(k) - x_2(j))\right] \qquad (2.102)$$

Proceeding in the same way for the other terms, the Cauchy-Schwarz QMI estimator becomes

$$\hat{I}_{CS}(X_1, X_2) = \log\frac{\left(\dfrac{1}{N^2}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(x_1(i) - x_1(j))\,G_{\sigma\sqrt{2}}(x_2(i) - x_2(j))\right)\hat{V}_1\hat{V}_2}{\left(\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left(\frac{1}{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(x_1(i) - x_1(j))\right)\left(\frac{1}{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(x_2(i) - x_2(j))\right)\right)^2}$$

where $\hat{V}_1$ and $\hat{V}_2$ are the marginal IPs.
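The whole estimator reduces to elementwise operations on two pairwise Gaussian matrices (my sketch, 1-D marginals and Gaussian kernels assumed; `qmi_cs` is an illustrative name):

```python
import numpy as np

def gauss(d, sigma):
    return np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def qmi_cs(x1, x2, sigma):
    """Cauchy-Schwarz QMI from paired samples (x1(i), x2(i))."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    s = sigma * np.sqrt(2.0)
    g1 = gauss(x1[:, None] - x1[None, :], s)       # marginal interactions in x1
    g2 = gauss(x2[:, None] - x2[None, :], s)       # marginal interactions in x2
    vj = (g1 * g2).mean()                          # joint potential V_J
    vm = g1.mean() * g2.mean()                     # product-of-marginals potential V_M
    vc = (g1.mean(axis=1) * g2.mean(axis=1)).mean()  # cross potential V_c
    return np.log(vj * vm / vc**2)
```

On independent sample streams the estimate stays near zero, while a deterministic relation between the two marginals drives it up.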
The interactions are always based on the marginals, but there are three levels: the level of the individual marginal samples (joint space), the level of the marginal sample and marginal fields (the CIP), and the product of the two marginal fields. This framework can be generalized to any number of variables:

$$\hat{I}_{ED}(X_1,\dots,X_K) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\prod_{k=1}^{K}\hat{V}_k(ij) - \frac{2}{N}\sum_{i=1}^{N}\prod_{k=1}^{K}\hat{V}_k(i) + \prod_{k=1}^{K}\hat{V}_k$$

$$\hat{I}_{CS}(X_1,\dots,X_K) = \log\frac{\left(\dfrac{1}{N^2}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{N}\prod_{k=1}^{K}\hat{V}_k(ij)\right)\displaystyle\prod_{k=1}^{K}\hat{V}_k}{\left(\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\prod_{k=1}^{K}\hat{V}_k(i)\right)^2}$$

The information forces for each field are also easily derived:

$$\hat{F}_{ED}(i) = \frac{\partial\hat{I}_{ED}}{\partial x_k(i)} = \frac{\partial\hat{V}_J}{\partial x_k(i)} - 2\frac{\partial\hat{V}_C}{\partial x_k(i)} + \frac{\partial\hat{V}_k}{\partial x_k(i)}$$

$$\hat{F}_{CS}(i) = \frac{\partial\hat{I}_{CS}}{\partial x_k(i)} = \frac{1}{\hat{V}_J}\frac{\partial\hat{V}_J}{\partial x_k(i)} - \frac{2}{\hat{V}_C}\frac{\partial\hat{V}_C}{\partial x_k(i)} + \frac{1}{\hat{V}_k}\frac{\partial\hat{V}_k}{\partial x_k(i)}$$
Fast Information and Cross Information Potential Calculations

One of the problems with the IP estimator methodology is the computational complexity, which is O(N²) or O(N³). There are two ways to approximate the IP and CIP with arbitrary accuracy: the Fast Gauss Transform (FGT) and the incomplete Cholesky decomposition. They both reduce the complexity to O(NM) with M << N.

The FGT takes advantage of the fast decay of the Gaussian function and can efficiently compute weighted sums of scalar Gaussians

$$S(y_i) = \sum_{j=1}^{N} w_j\, e^{-(y_j - y_i)^2/4\sigma^2}, \qquad i = 1,\dots,M$$

The savings come from the shifting property of the Gaussian function, which reads

$$e^{-\left(\frac{y_j - y_i}{\sigma}\right)^2} = e^{-\left(\frac{(y_j - y_c) - (y_i - y_c)}{\sigma}\right)^2} = \sum_{n=0}^{\infty}\frac{1}{n!}\left(\frac{y_i - y_c}{\sigma}\right)^n h_n\!\left(\frac{y_j - y_c}{\sigma}\right)$$

where the $h_n$ are the Hermite functions $h_n(x) = (-1)^n\dfrac{d^n}{dx^n}\,e^{-x^2}$.
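A single-center sketch of the idea (my code; a real FGT partitions the data into boxes with one expansion center each, which I skip here). The moment coefficients are computed once over the sources, so the cost is O((N + M)p) instead of O(NM):

```python
import math
import numpy as np

def hermite_functions(x, p):
    """h_n(x) = (-1)^n d^n/dx^n exp(-x^2), via h_{n+1} = 2x h_n - 2n h_{n-1}."""
    h = [np.exp(-x**2), 2.0 * x * np.exp(-x**2)]
    for n in range(1, p):
        h.append(2.0 * x * h[-1] - 2.0 * n * h[-2])
    return h[:p + 1]

def fgt_sum(targets, sources, weights, sigma, p=20):
    """Truncated Hermite expansion of S(y_i) = sum_j w_j exp(-(y_j - y_i)^2 / (4 sigma^2))."""
    s = 2.0 * sigma                        # exp(-(y_j - y_i)^2 / s^2) with s = 2 sigma
    yc = sources.mean()                    # single expansion center
    sj = (sources - yc) / s
    # moment-like coefficients C_n = sum_j w_j sj^n / n!, computed once
    cn = [(weights * sj**n).sum() / math.factorial(n) for n in range(p + 1)]
    t = (targets - yc) / s
    h = hermite_functions(t, p)            # Hermite functions at all targets
    return sum(c * hn for c, hn in zip(cn, h))
```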
In d dimensions, the Gaussian is factored around an expansion center c as

$$\exp\!\left(-\frac{\|y_j - y_i\|^2}{4\sigma^2}\right) = \exp\!\left(-\frac{\|y_j - c\|^2}{4\sigma^2}\right)\exp\!\left(-\frac{\|y_i - c\|^2}{4\sigma^2}\right)\exp\!\left(\frac{2\,(y_j - c)\cdot(y_i - c)}{4\sigma^2}\right)$$

and the cross term is approximated by a Taylor expansion as

$$\exp\!\left(\frac{2\,(y_j - c)\cdot(y_i - c)}{4\sigma^2}\right) = \sum_{|\alpha|\le p}\frac{2^{|\alpha|}}{\alpha!}\left(\frac{y_j - c}{2\sigma}\right)^{\alpha}\left(\frac{y_i - c}{2\sigma}\right)^{\alpha} + \varepsilon(p)$$

where α is a multi-index with $\alpha! = \alpha_1!\,\alpha_2!\cdots\alpha_d!$ and $|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_d$, and ε(p) is the truncation error.
Incomplete Cholesky Decomposition

It turns out that the matrix created by the pairwise evaluation of the Gaussian kernel is full rank, but its eigenspectrum decays very fast. Hence we can take advantage of this fact: an N×N symmetric positive definite matrix K can be approximated as $K \approx \tilde{G}\tilde{G}^T$, where $\tilde{G}$ is a lower triangular N×D matrix with D << N. In terms of the incomplete Cholesky factors $\tilde{G}_{XX}$ and $\tilde{G}_{YY}$ of the Gram matrices of the two variables, the quadratic divergences become

$$\hat{I}_{ED} = \frac{1}{N^2}\,\mathbf{1}_{D_x}^T\!\left(\tilde{G}_{XX}^T\tilde{G}_{YY}\circ\tilde{G}_{XX}^T\tilde{G}_{YY}\right)\mathbf{1}_{D_y} + \frac{1}{N^4}\left\|\mathbf{1}_N^T\tilde{G}_{XX}\right\|^2\left\|\mathbf{1}_N^T\tilde{G}_{YY}\right\|^2 - \frac{2}{N^3}\left(\mathbf{1}_N^T\tilde{G}_{XX}\right)\!\left(\tilde{G}_{XX}^T\tilde{G}_{YY}\right)\!\left(\tilde{G}_{YY}^T\mathbf{1}_N\right)$$

$$\hat{I}_{CS} = \log\frac{\left(\mathbf{1}_{D_x}^T\!\left(\tilde{G}_{XX}^T\tilde{G}_{YY}\circ\tilde{G}_{XX}^T\tilde{G}_{YY}\right)\mathbf{1}_{D_y}\right)\left\|\mathbf{1}_N^T\tilde{G}_{XX}\right\|^2\left\|\mathbf{1}_N^T\tilde{G}_{YY}\right\|^2}{\left(\left(\mathbf{1}_N^T\tilde{G}_{XX}\right)\!\left(\tilde{G}_{XX}^T\tilde{G}_{YY}\right)\!\left(\tilde{G}_{YY}^T\mathbf{1}_N\right)\right)^2}$$
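A minimal greedy pivoted (incomplete) Cholesky sketch (my code; pivot order and the trace-based stopping rule are one common choice, not necessarily the one used in the course software):

```python
import numpy as np

def incomplete_cholesky(K, tol=1e-8):
    """Greedy pivoted Cholesky: returns G with K ~= G @ G.T,
    stopping when the trace of the residual K - G G^T falls below tol."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()    # diagonal of the residual
    G = np.zeros((n, 0))
    while d.sum() > tol and G.shape[1] < n:
        i = int(np.argmax(d))              # pivot: largest residual diagonal entry
        g = (K[:, i] - G @ G[i]) / np.sqrt(d[i])
        G = np.column_stack([G, g])
        d = np.maximum(d - g**2, 0.0)      # update residual diagonal
    return G
```

For a Gaussian Gram matrix the residual trace collapses after a handful of columns, which is exactly the fast eigenspectrum decay the slide refers to.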
However, the CIP does not correspond to a positive definite matrix, so we need to extend $K_{XX}$ to create a positive definite matrix

$$K_{ZZ} = \begin{bmatrix} K_{XX} & K_{XY} \\ K_{XY}^T & K_{YY} \end{bmatrix}$$

where $K_{XY}$ denotes the Gram matrix for the CIP. The calculation of the CIP becomes

$$\hat{V}(X; Y) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\kappa(x_i - y_j) = \frac{1}{N^2}\,e_1^T K_{ZZ}\,e_2 = \frac{1}{N^2}\,e_1^T\tilde{G}_{ZZ}\tilde{G}_{ZZ}^T e_2$$

with $e_1 = [\underbrace{1,\dots,1}_{N},\underbrace{0,\dots,0}_{N}]^T$ and $e_2 = [\underbrace{0,\dots,0}_{N},\underbrace{1,\dots,1}_{N}]^T$.