1 Entropy

1.1

Definition (Entropy) For a discrete random variable $X \sim p(x)$ with alphabet $\mathcal{X}$,
\[ H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x) = -\mathbb{E}_p \log p(X). \]
The entropy measures the expected uncertainty in X. It has the following properties:
- $H(X) \ge 0$: entropy is always non-negative, and $H(X) = 0$ iff X is deterministic.
- Since $H_b(X) = \log_b(a) \, H_a(X)$, we don't need to specify the base of the logarithm.
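A minimal Python sketch that checks these properties numerically (the `entropy` helper and the example pmf are illustrative, not from the notes):

    import math

    def entropy(pmf, base=2):
        # H(X) = -sum_x p(x) log p(x); terms with p(x) = 0 contribute nothing
        return -sum(p * math.log(p, base) for p in pmf if p > 0)

    p = [0.5, 0.25, 0.25]
    print(entropy(p))                      # 1.5 bits, always non-negative
    print(entropy([1.0]))                  # 0: a deterministic X has no uncertainty
    # Changing base only rescales: H_b(X) = log_b(a) H_a(X)
    print(entropy(p, base=math.e), math.log(2) * entropy(p, base=2))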
1.2 Joint and Conditional Entropy

\[ H(X, Y) = -\mathbb{E}_{p(x,y)} \log p(X, Y) \]
\[ H(Y \mid X) = -\mathbb{E}_{p(x,y)} \log p(Y \mid X) = \sum_{x \in \mathcal{X}} p(x) \, H(Y \mid X = x) \]
1.3 Chain rule

\[ H(X, Y) = H(X) + H(Y \mid X) \]
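A quick numeric check of the chain rule; the 2x2 joint pmf below is an arbitrary example:

    import math

    # Arbitrary 2x2 joint pmf p(x, y); rows index x, columns index y.
    p_xy = [[0.3, 0.2],
            [0.1, 0.4]]

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    p_x = [sum(row) for row in p_xy]
    h_joint = H([p for row in p_xy for p in row])
    # H(Y|X) = sum_x p(x) H(Y | X = x)
    h_y_given_x = sum(px * H([p / px for p in row]) for row, px in zip(p_xy, p_x))
    print(h_joint, H(p_x) + h_y_given_x)   # equal: H(X,Y) = H(X) + H(Y|X)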
2 Relative Entropy and Mutual Information

2.1 Relative Entropy

\[ D(p \,\|\, q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} \]

2.2 Mutual Information

\[ I(X; Y) = D(p(x, y) \,\|\, p(x) p(y)) = \mathbb{E}_{p(x,y)} \log \frac{p(X, Y)}{p(X) p(Y)} = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} \]

Mutual information and entropy are related as follows:
\[ I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \]
\[ I(X; Y) = H(X) + H(Y) - H(X, Y) \]
\[ I(X; X) = H(X) \quad \text{(self-information)} \]
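The two characterizations of mutual information can be checked against each other numerically; the joint pmf is again an arbitrary example:

    import math

    # Arbitrary 2x2 joint pmf p(x, y).
    p_xy = [[0.3, 0.2],
            [0.1, 0.4]]
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # I(X;Y) as D(p(x,y) || p(x)p(y))
    i_kl = sum(p_xy[x][y] * math.log2(p_xy[x][y] / (p_x[x] * p_y[y]))
               for x in range(2) for y in range(2) if p_xy[x][y] > 0)
    # I(X;Y) as H(X) - H(X|Y)
    h_x_given_y = sum(p_y[y] * H([p_xy[x][y] / p_y[y] for x in range(2)])
                      for y in range(2))
    print(i_kl, H(p_x) - h_x_given_y)      # the two expressions agree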
3 Chain Rules

3.1 Chain Rule for Entropy

The entropy of a collection of random variables is the sum of the conditional entropies:

Theorem (Chain rule for entropy) Let $(X_1, X_2, \ldots, X_n) \sim p(x_1, x_2, \ldots, x_n)$. Then
\[ H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1). \]
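A sketch verifying the chain rule numerically on a randomly generated three-variable joint pmf (the construction and helper are illustrative):

    import math, itertools, random

    random.seed(0)
    # Random joint pmf over three binary variables X1, X2, X3.
    keys = list(itertools.product((0, 1), repeat=3))
    w = [random.random() for _ in keys]
    p = {k: v / sum(w) for k, v in zip(keys, w)}

    def H_cond(i):
        # H(X_{i+1} | X_1, ..., X_i), computed directly from conditional pmfs
        groups = {}
        for k, v in p.items():
            g = groups.setdefault(k[:i], {})
            g[k[i]] = g.get(k[i], 0.0) + v
        h = 0.0
        for cond in groups.values():
            s = sum(cond.values())         # probability of the prefix
            h -= sum(v * math.log2(v / s) for v in cond.values() if v > 0)
        return h

    h_joint = -sum(v * math.log2(v) for v in p.values())
    print(h_joint, sum(H_cond(i) for i in range(3)))   # equal, per the chain rule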
3.2 Conditional Mutual Information

\[ I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z) = \mathbb{E}_{p(x,y,z)} \log \frac{p(X, Y \mid Z)}{p(X \mid Z) \, p(Y \mid Z)} \]

Theorem (Chain rule for information)
\[ I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1) \]
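The identity between the two forms of conditional mutual information can be checked the same way (randomly generated joint pmf, illustrative):

    import math, itertools, random

    random.seed(1)
    # Random joint pmf p(x, y, z) over binary X, Y, Z.
    keys = list(itertools.product((0, 1), repeat=3))
    w = [random.random() for _ in keys]
    p = {k: v / sum(w) for k, v in zip(keys, w)}

    def marg(idx):
        # marginal pmf of the coordinates in idx, e.g. marg([0, 2]) = p(x, z)
        m = {}
        for k, v in p.items():
            kk = tuple(k[i] for i in idx)
            m[kk] = m.get(kk, 0.0) + v
        return m

    p_z, p_xz, p_yz = marg([2]), marg([0, 2]), marg([1, 2])

    # I(X;Y|Z) as H(X|Z) - H(X|Y,Z) ...
    h_x_z = -sum(v * math.log2(v / p_z[k[1:]]) for k, v in p_xz.items())
    h_x_yz = -sum(v * math.log2(v / p_yz[(k[1], k[2])]) for k, v in p.items())
    # ... and directly as E log [ p(x,y|z) / (p(x|z) p(y|z)) ]
    i_direct = sum(v * math.log2(v * p_z[(k[2],)] /
                                 (p_xz[(k[0], k[2])] * p_yz[(k[1], k[2])]))
                   for k, v in p.items())
    print(h_x_z - h_x_yz, i_direct)        # the two expressions agree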
3.3 Conditional Relative Entropy

\[ D(p(y \mid x) \,\|\, q(y \mid x)) \triangleq \mathbb{E}_{p(x,y)} \log \frac{p(Y \mid X)}{q(Y \mid X)} = \sum_{x} p(x) \sum_{y} p(y \mid x) \log \frac{p(y \mid x)}{q(y \mid x)} \]
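A small worked computation of conditional relative entropy, with arbitrary example distributions:

    import math

    # Arbitrary p(x), p(y|x), q(y|x) on binary alphabets.
    p_x = [0.4, 0.6]
    p_y_x = [[0.7, 0.3],
             [0.2, 0.8]]                   # rows: x, columns: y
    q_y_x = [[0.5, 0.5],
             [0.4, 0.6]]

    D = sum(p_x[x] * p_y_x[x][y] * math.log2(p_y_x[x][y] / q_y_x[x][y])
            for x in range(2) for y in range(2))
    print(D)                               # D(p(y|x) || q(y|x)) in bits; >= 0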
4 Jensen's Inequality

Recall that a convex function on an interval is one for which every chord lies (on or) above the graph of the function on that interval. A function f is concave if $-f$ is convex.

Theorem (Jensen's inequality) If f is a convex function and X is a random variable, then
\[ \mathbb{E} f(X) \ge f(\mathbb{E} X). \]
If f is strictly convex, equality implies that X is deterministic.
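A Monte Carlo illustration of Jensen's inequality (the uniform distribution and $f(t) = t^2$ are arbitrary choices):

    import random

    random.seed(0)
    xs = [random.uniform(0, 4) for _ in range(100_000)]
    f = lambda t: t * t                    # a convex function

    Ef = sum(f(x) for x in xs) / len(xs)   # E f(X)
    fE = f(sum(xs) / len(xs))              # f(E X)
    print(Ef, fE, Ef >= fE)                # True; for f(t) = t^2 the gap is Var(X)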
4.1 Consequences
4.2 Some Inequalities

Theorem
\[ H(X) \le \log |\mathcal{X}|, \]
with equality iff X has a uniform distribution over $\mathcal{X}$.
Theorem (Conditioning reduces entropy)
\[ H(X \mid Y) \le H(X), \]
with equality iff X and Y are independent.
Theorem (Independence bound on entropy)
\[ H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i), \]
with equality iff the $X_i$ are independent.
Theorem (Log sum inequality) For non-negative numbers $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$,
\[ \sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left( \sum_{i=1}^{n} a_i \right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}, \]
with equality iff $a_i / b_i$ is constant.
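A numeric spot-check of the log sum inequality on arbitrary example sequences:

    import math

    a = [0.5, 1.0, 2.5]                    # arbitrary non-negative sequences
    b = [1.0, 1.0, 1.0]

    lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
    rhs = sum(a) * math.log(sum(a) / sum(b))
    print(lhs, rhs, lhs >= rhs)            # True; equality needs a_i/b_i constant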
Theorem (Convexity of relative entropy) $D(p \,\|\, q)$ is convex in the pair $(p, q)$: for pmfs $(p_1, q_1)$ and $(p_2, q_2)$ we have, for all $0 \le \lambda \le 1$,
\[ D(\lambda p_1 + (1 - \lambda) p_2 \,\|\, \lambda q_1 + (1 - \lambda) q_2) \le \lambda D(p_1 \,\|\, q_1) + (1 - \lambda) D(p_2 \,\|\, q_2). \]
Theorem (Concavity of entropy) For $X \sim p(x)$, $H(p) := H_p(X)$ is a concave function of $p(x)$.

Theorem Let $(X, Y) \sim p(x, y) = p(x) p(y \mid x)$. Then $I(X; Y)$ is a concave function of $p(x)$ for fixed $p(y \mid x)$, and a convex function of $p(y \mid x)$ for fixed $p(x)$.
6 Data-Processing Inequality

6.1 Markov Chain

Definition (Markov chain) Random variables X, Y, Z form a Markov chain $X \to Y \to Z$ if the conditional distribution of Z depends only on Y, i.e.,
\[ p(x, y, z) = p(x) \, p(y \mid x) \, p(z \mid y). \]
If $Z = f(Y)$, then $X \to Y \to Z$.
6.2 Data-Processing Inequality

Theorem (Data-processing inequality) If $X \to Y \to Z$, then
\[ I(X; Y) \ge I(X; Z). \]
In particular, no processing of Y, deterministic or random, can increase the information that Y contains about X.
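A sketch of the inequality on a small chain built from two hypothetical binary channel matrices (requires numpy; the channels are arbitrary examples):

    import numpy as np

    # Markov chain X -> Y -> Z with hypothetical binary channels.
    p_x = np.array([0.3, 0.7])
    W1 = np.array([[0.9, 0.1],
                   [0.2, 0.8]])            # p(y|x), rows sum to 1
    W2 = np.array([[0.7, 0.3],
                   [0.4, 0.6]])            # p(z|y)

    def mi(joint):
        # mutual information (bits) between the two coordinates of a joint pmf
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        mask = joint > 0
        return (joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum()

    p_xy = p_x[:, None] * W1               # p(x, y) = p(x) p(y|x)
    p_xz = p_xy @ W2                       # p(x, z) = sum_y p(x, y) p(z|y)
    print(mi(p_xy), mi(p_xz))              # I(X;Y) >= I(X;Z)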
7 Sufficient Statistics

7.1

7.2

Definition (Sufficient Statistic) A function T(X) is said to be a sufficient statistic relative to the family $\{f_\theta(x)\}$ if the conditional distribution of X, given T(X) = t, is independent of $\theta$ for any distribution on $\theta$ (Fisher-Neyman):
\[ f_\theta(x) = f(x \mid t) \, f_\theta(t), \]
i.e., $\theta \to T(X) \to X$ forms a Markov chain.
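A brute-force check of sufficiency in the classic Bernoulli example, where $T(X) = \sum_i X_i$ (this example is standard but not taken from the extracted notes):

    import itertools

    # For i.i.d. Bernoulli(theta) samples, T(X) = sum(X) is sufficient.
    n = 4

    def p_seq(x, theta):
        k = sum(x)
        return theta**k * (1 - theta)**(n - k)

    for theta in (0.2, 0.7):
        p_t = {}                           # p_theta(T = t)
        for x in itertools.product((0, 1), repeat=n):
            p_t[sum(x)] = p_t.get(sum(x), 0.0) + p_seq(x, theta)
        x0 = (1, 1, 0, 0)                  # a particular sequence with T = 2
        # p(x0 | T = 2) is 1/C(4,2) = 1/6 regardless of theta
        print(theta, p_seq(x0, theta) / p_t[sum(x0)])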
8 Fano's Inequality

8.1 Fano's Inequality and Estimation Error

Theorem (Fano's inequality) Suppose we estimate X from an observation Y by $\hat{X} = g(Y)$, with error probability $P_e = \Pr(\hat{X} \ne X)$. Then
\[ H(P_e) + P_e \log(|\mathcal{X}| - 1) \ge H(X \mid Y). \]
This can be weakened to
\[ P_e \ge \frac{H(X \mid Y) - 1}{\log |\mathcal{X}|}. \]
8.2
8.3
Suppose there is no observation Y, so that X must simply be guessed. Order $X \in \{1, 2, \ldots, m\}$ such that $p_1 \ge p_2 \ge \cdots \ge p_m$. Then $\hat{X} = 1$ is the optimal estimate of X, with $P_e = 1 - p_1$, and Fano's inequality becomes
\[ H(P_e) + P_e \log(m - 1) \ge H(X). \]
The pmf
\[ (p_1, p_2, \ldots, p_m) = \left( 1 - P_e, \; \frac{P_e}{m - 1}, \; \ldots, \; \frac{P_e}{m - 1} \right) \]
achieves this bound with equality.
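A numeric confirmation that this pmf meets Fano's bound with equality, for arbitrary example values of m and $P_e$:

    import math

    m, Pe = 5, 0.3                         # arbitrary example values
    p = [1 - Pe] + [Pe / (m - 1)] * (m - 1)

    H_X = -sum(pi * math.log2(pi) for pi in p)
    H_Pe = -(Pe * math.log2(Pe) + (1 - Pe) * math.log2(1 - Pe))
    print(H_Pe + Pe * math.log2(m - 1), H_X)   # equal: the bound is tight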
8.4