
Information Theory

Engineering Part IIA: 3F1 - Signals and Systems, Michaelmas 2005

Handout 2

Andrea Lecchini Visintini

30 November 2005


The definition of Entropy


Let X be a random variable which takes on values {x_1, x_2, ..., x_M} with probabilities {p_1, p_2, ..., p_M}. The Entropy of X, denoted H(X), is a measure of how uncertain we are of the outcome of X.

The Entropy is in the form of an average: it is the average uncertainty of the events {X = x_i} and is a function of the probabilities p_i,

H(X) = H({p_1, p_2, ..., p_M}).

The Entropy can also be seen as a measure of information, because the loss of uncertainty when the realization of X is revealed equals an increase of information from the point of view of the observer:

Entropy = measure of the average loss of uncertainty (or of the average surprise) when the realization of X is revealed.

The formula of the Entropy can be derived from first principles.

The axioms for H


(1) If the distribution is uniform, i.e. p_i = 1/M, then H({p_1, p_2, ..., p_M}) should increase with M,

e.g. winning the toss of a coin is less of a surprise than winning the National Lottery (even if the prize is the same).

(2) If the observation of the outcome of X is broken down into successive observations, then H(X) should be a weighted sum of the uncertainties of the individual observations. (This says that the Entropy is an average.) For example, for M = 5 with the outcomes grouped into {x_1, x_2} and {x_3, x_4, x_5}:

H({p_1, p_2, p_3, p_4, p_5}) = H({p_1 + p_2, p_3 + p_4 + p_5})
    + (p_1 + p_2) H({p_1/(p_1 + p_2), p_2/(p_1 + p_2)})
    + (p_3 + p_4 + p_5) H({p_3/(p_3 + p_4 + p_5), p_4/(p_3 + p_4 + p_5), p_5/(p_3 + p_4 + p_5)})

(This grouping property is checked numerically in the sketch after the axioms.)

(3) H({p_1, p_2, ..., p_M}) should be a continuous function of the p_i.
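Anticipating the formula derived below (with K = 1 and base-2 logarithms), axiom (2) can be checked numerically. The following is a minimal sketch in Python; the distribution and the helper name `entropy` are arbitrary choices for illustration, not part of the handout.

```python
import math

def entropy(p):
    """Entropy in bits of a probability distribution given as a list."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# An arbitrary distribution on M = 5 outcomes (probabilities sum to 1).
p = [0.1, 0.2, 0.3, 0.25, 0.15]

# Left-hand side of axiom (2): entropy of the full distribution.
lhs = entropy(p)

# Right-hand side: entropy of the grouped distribution plus the weighted
# entropies of the renormalized distributions within each group.
g1, g2 = p[0] + p[1], p[2] + p[3] + p[4]
rhs = (entropy([g1, g2])
       + g1 * entropy([p[0] / g1, p[1] / g1])
       + g2 * entropy([p[2] / g2, p[3] / g2, p[4] / g2]))

print(lhs, rhs)  # the two values agree (up to floating-point rounding)
```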


Theorem (Shannon 1948)
The only H({p_1, p_2, ..., p_M}) satisfying the three axioms is of the form

H({p_1, p_2, ..., p_M}) = -K \sum_{i=1}^{M} p_i \log_2(p_i)

where K is a positive constant. The choice of K is arbitrary and allows one to change the base of the logarithm, since log_a(x) = log_a(b) log_b(x). The choice of K actually amounts to a choice of a unit of measure. The convention is to choose K = 1 and base 2; the unit of measure is then 1 bit.

Theorem
H({p_1, p_2, ..., p_M}) is maximized when the p_i are equal.

Proof
Step 1: show that H({p_1, p_2, ..., p_M}) ≤ \log_2(M) (examples paper).
Step 2: notice that the value \log_2(M) is attained when the p_i are equal:

H({1/M, ..., 1/M}) = -\sum_{i=1}^{M} (1/M) \log_2(1/M) = \log_2(M).
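As a quick numerical illustration of the last Theorem (not part of the handout), the sketch below compares the entropies of a few distributions on M = 4 outcomes with \log_2(4) = 2 bits; the distributions themselves are arbitrary examples.

```python
import math

def entropy(p):
    """Entropy in bits, H = -sum p_i log2 p_i, with the convention 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

M = 4
distributions = {
    "uniform":       [0.25, 0.25, 0.25, 0.25],
    "mildly skewed": [0.40, 0.30, 0.20, 0.10],
    "very skewed":   [0.97, 0.01, 0.01, 0.01],
}

for name, p in distributions.items():
    print(f"{name:14s} H = {entropy(p):.4f} bits  (log2(M) = {math.log2(M):.4f})")
# Only the uniform distribution attains the maximum log2(M) = 2 bits.
```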

Example: the toss of a biased coin, H({p, 1 - p})


[Figure: H({p, 1 - p}) plotted against p for 0 ≤ p ≤ 1; the maximum of 1 bit is attained at p = 0.5.]
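The curve in the figure can be reproduced with a few lines of Python; this is an illustrative sketch, and the use of matplotlib is an assumption rather than something prescribed by the handout.

```python
import math
import matplotlib.pyplot as plt

def binary_entropy(p):
    """H({p, 1-p}) in bits; the boundary cases p = 0 and p = 1 give 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

ps = [i / 100 for i in range(101)]
plt.plot(ps, [binary_entropy(p) for p in ps])
plt.xlabel("p")
plt.ylabel("H({p, 1-p})  [bits]")
plt.show()  # the curve peaks at 1 bit for p = 0.5, the fair coin
```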


Joint and conditional Entropy


Let X and Y be random variables which take values in {x_1, ..., x_M} and {y_1, ..., y_N} and have joint probability distribution {p(x_i, y_j)}, i = 1, ..., M; j = 1, ..., N. The joint Entropy is the Entropy of the joint distribution:

H(X, Y) = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log_2 p(x_i, y_j).

Theorem
The following inequality holds:

H(X, Y) ≤ H(X) + H(Y)

with equality if and only if X and Y are independent.

Proof
Firstly we calculate an expression for the right-hand side:

H(X) + H(Y) = -\sum_{i=1}^{M} p(x_i) \log_2 p(x_i) - \sum_{j=1}^{N} p(y_j) \log_2 p(y_j)

            = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log_2 p(x_i) - \sum_{j=1}^{N} \sum_{i=1}^{M} p(x_i, y_j) \log_2 p(y_j)

            = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) [\log_2 p(x_i) + \log_2 p(y_j)]

            = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log_2 ( p(x_i) p(y_j) ).

Notice that if p(x_i, y_j) = p(x_i) p(y_j) then H(X) + H(Y) = H(X, Y). At this point we have proved the "if" implication of the equality. As for the inequality, we have

H(X, Y) - [H(X) + H(Y)] = \sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log_2 ( p(x_i) p(y_j) / p(x_i, y_j) )

                        = \sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log_2(e) \ln ( p(x_i) p(y_j) / p(x_i, y_j) )

                        ≤ \log_2(e) \sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) ( p(x_i) p(y_j) / p(x_i, y_j) - 1 )      since ln(x) ≤ x - 1

                        = \log_2(e) ( \sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i) p(y_j) - \sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) ) = 0.

Now we have proved the inequality. The "only if" implication of the equality is proved by noticing that the equality in ln(x) ≤ x - 1 holds only if x = 1.

[Figure: plot of ln(x) and x - 1 for x > 0, illustrating ln(x) ≤ x - 1 with equality only at x = 1.]
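As a numerical aside (not from the handout), the inequality can be checked on an arbitrary joint distribution, and the equality case on a product distribution; the helper below assumes the joint probabilities are given as a nested list.

```python
import math

def H(probs):
    """Entropy in bits of a flat list of probabilities (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An arbitrary 2x3 joint distribution p(x_i, y_j), chosen for illustration.
joint = [[0.10, 0.20, 0.05],
         [0.25, 0.10, 0.30]]

px = [sum(row) for row in joint]         # marginal distribution of X
py = [sum(col) for col in zip(*joint)]   # marginal distribution of Y
H_joint = H([p for row in joint for p in row])

print(H_joint, H(px) + H(py))            # H(X,Y) <= H(X) + H(Y)

# For a product distribution p(x_i, y_j) = p(x_i) p(y_j) the two sides coincide.
product = [[a * b for b in py] for a in px]
print(H([p for row in product for p in row]), H(px) + H(py))
```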


The conditional Entropy is the average of the Entropies of the conditional distributions:

H(Y|X) = -\sum_{i=1}^{M} p(x_i) \sum_{j=1}^{N} p(y_j|x_i) \log_2 p(y_j|x_i).

Theorem
The following identity holds:

H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

Proof

H(X, Y) = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log_2 ( p(y_j|x_i) p(x_i) )

        = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log_2 p(x_i) - \sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log_2 p(y_j|x_i)

        = H(X) + H(Y|X).

The second identity follows in the same way by writing p(x_i, y_j) = p(x_i|y_j) p(y_j).

Theorem
The following inequality holds:

H(Y|X) ≤ H(Y)

with equality if and only if X and Y are independent.

Proof
The claim follows from the previous Theorems.

The Theorems say that the conditional Entropy is the uncertainty which remains after the revelation of the outcome of one random variable. Moreover, the revelation of the outcome of one random variable cannot increase the uncertainty about the other one.

Example

p(x_i, y_j) |  x_1    x_2    x_3  | p(y_j)
y_1         |  0.1    0.2    0.05 | 0.35
y_2         |  0.15   0.1    0.05 | 0.3
y_3         |  0.05   0.15   0.15 | 0.35
p(x_i)      |  0.3    0.45   0.25 |

H(Y) = -0.35 \log_2(0.35) - 0.3 \log_2(0.3) - 0.35 \log_2(0.35) = 1.5813

H(Y|x_1) = -(0.1/0.3) \log_2(0.1/0.3) - (0.15/0.3) \log_2(0.15/0.3) - (0.05/0.3) \log_2(0.05/0.3) = 1.4591

H(Y|x_2) = 1.5305

H(Y|x_3) = 1.371

H(Y|X) = 0.3 H(Y|x_1) + 0.45 H(Y|x_2) + 0.25 H(Y|x_3) = 1.4692

(in agreement with H(Y|X) ≤ H(Y) = 1.5813)
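The numbers in the example can be reproduced with a short script (an illustrative sketch, not part of the handout); it also confirms the identity H(X,Y) = H(X) + H(Y|X) and the inequality H(Y|X) ≤ H(Y).

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution from the example: rows are y_1..y_3, columns are x_1..x_3.
joint = [[0.10, 0.20, 0.05],
         [0.15, 0.10, 0.05],
         [0.05, 0.15, 0.15]]

px = [sum(col) for col in zip(*joint)]   # p(x_i) = (0.3, 0.45, 0.25)
py = [sum(row) for row in joint]         # p(y_j) = (0.35, 0.3, 0.35)

# H(Y|X) = sum_i p(x_i) H(Y | X = x_i), using the renormalized columns.
H_Y_given_X = sum(pxi * H([joint[j][i] / pxi for j in range(3)])
                  for i, pxi in enumerate(px))

H_XY = H([p for row in joint for p in row])

print(round(H(py), 4), round(H_Y_given_X, 4))         # H(Y) = 1.5813 >= H(Y|X) = 1.4692
print(round(H_XY, 4), round(H(px) + H_Y_given_X, 4))  # H(X,Y) = H(X) + H(Y|X)
```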


Mutual Information
The decrease in uncertainty due to the revelation of the outcome of one of two jointly distributed random variables is called Mutual Information and is given by

I(X|Y) = H(X) - H(X|Y).

The quantity I(X|Y) is the average information conveyed about X by Y. Quite surprisingly, I(X|Y) is symmetric, i.e. I(X|Y) = I(Y|X) (proof in the examples paper).

Example
Two coins are available, one unbiased and the other two-headed. One coin is selected at random and tossed. How much information is conveyed about the identity of the coin by the outcome of the toss?

X: selection, Y: toss.

p(x_i, y_j)  | Head  Tail | p(x_i)
Unbiased     | 1/4   1/4  | 1/2
Two-headed   | 1/2   0    | 1/2
p(y_j)       | 3/4   1/4  |

I(X|Y) = H(X) - H(X|Y)

H(X) = -(1/2) \log_2(1/2) - (1/2) \log_2(1/2) = 1

H(X|Y) = (3/4) H(X|Head) + (1/4) H(X|Tail)
       = (3/4) [ -(1/3) \log_2(1/3) - (2/3) \log_2(2/3) ] + (1/4) · 0 = 0.6887

I(Y|X) = H(Y) - H(Y|X)

H(Y) = -(3/4) \log_2(3/4) - (1/4) \log_2(1/4) = 0.8113

H(Y|X) = (1/2) H(Y|Unbiased) + (1/2) H(Y|Two-headed)
       = (1/2) [ -(1/2) \log_2(1/2) - (1/2) \log_2(1/2) ] + (1/2) · 0 = 0.5

I(X|Y) = 1 - 0.6887 = 0.3113 and I(Y|X) = 0.8113 - 0.5 = 0.3113, confirming the symmetry.
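A short script (an illustrative sketch, not part of the handout) reproduces the coin example and shows that the two ways of computing the Mutual Information agree; it uses the identity H(X|Y) = H(X,Y) - H(Y) proved earlier.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution p(x_i, y_j): rows = coin (Unbiased, Two-headed), columns = (Head, Tail).
joint = [[0.25, 0.25],
         [0.50, 0.00]]

px = [sum(row) for row in joint]         # p(coin) = (1/2, 1/2)
py = [sum(col) for col in zip(*joint)]   # p(toss) = (3/4, 1/4)

H_XY = H([p for row in joint for p in row])

# I(X|Y) = H(X) - H(X|Y) and I(Y|X) = H(Y) - H(Y|X), both via H(X,Y).
I_X_given_Y = H(px) - (H_XY - H(py))
I_Y_given_X = H(py) - (H_XY - H(px))

print(round(I_X_given_Y, 4), round(I_Y_given_X, 4))   # both 0.3113 bits
```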
