
# Entropy, Relative Entropy and Mutual Information


Published by: Nitin Sharma on Feb 17, 2011


Chapter 2
Entropy, Relative Entropy and Mutual Information
This chapter introduces most of the basic definitions required for the subsequent development of the theory. It is irresistible to play with their relationships and interpretations, taking faith in their later utility. After defining entropy and mutual information, we establish chain rules, the non-negativity of mutual information, the data processing inequality, and finally investigate the extent to which the second law of thermodynamics holds for Markov processes.

The concept of information is too broad to be captured completely by a single definition. However, for any probability distribution, we define a quantity called the entropy, which has many properties that agree with the intuitive notion of what a measure of information should be. This notion is extended to define mutual information, which is a measure of the amount of information one random variable contains about another. Entropy then becomes the self-information of a random variable. Mutual information is a special case of a more general quantity called relative entropy, which is a measure of the distance between two probability distributions. All these quantities are closely related and share a number of simple properties. We derive some of these properties in this chapter.

In later chapters, we show how these quantities arise as natural answers to a number of questions in communication, statistics, complexity and gambling. That will be the ultimate test of the value of these definitions.
2.1 ENTROPY
Elements of Information Theory, Thomas M. Cover and Joy A. Thomas. Copyright 1991 John Wiley & Sons, Inc. Print ISBN 0-471-06259-6, Online ISBN 0-471-20061-1.

We will first introduce the concept of entropy, which is a measure of the uncertainty of a random variable. Let $X$ be a discrete random variable with alphabet $\mathcal{X}$ and probability mass function $p(x) = \Pr\{X = x\}$, $x \in \mathcal{X}$. We denote the probability mass function by $p(x)$ rather than $p_X(x)$ for convenience. Thus, $p(x)$ and $p(y)$ refer to two different random variables, and are in fact different probability mass functions, $p_X(x)$ and $p_Y(y)$ respectively.

Definition: The entropy $H(X)$ of a discrete random variable $X$ is defined by
$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x). \tag{2.1}$$
We also write $H(p)$ for the above quantity. The log is to the base 2 and entropy is expressed in bits. For example, the entropy of a fair coin toss is 1 bit. We will use the convention that $0 \log 0 = 0$, which is easily justified by continuity since $x \log x \to 0$ as $x \to 0$. Thus adding terms of zero probability does not change the entropy.

If the base of the logarithm is $b$, we will denote the entropy as $H_b(X)$. If the base of the logarithm is $e$, then the entropy is measured in nats. Unless otherwise specified, we will take all logarithms to base 2, and hence all the entropies will be measured in bits.

Note that entropy is a functional of the distribution of $X$. It does not depend on the actual values taken by the random variable $X$, but only on the probabilities.

We shall denote expectation by $E$. Thus if $X \sim p(x)$, then the expected value of the random variable $g(X)$ is written

$$E_p g(X) = \sum_{x \in \mathcal{X}} g(x) p(x), \tag{2.2}$$

or more simply as $E g(X)$ when the probability mass function is understood from the context.

We shall take a peculiar interest in the eerily self-referential expectation of $g(X)$ under $p(x)$ when $g(X) = \log \frac{1}{p(X)}$.
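The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the text: the function names `expectation` and `entropy`, the dictionary representation of a probability mass function, and the sample coins are all assumptions made for the example. It computes entropy as the expectation of $\log \frac{1}{p(X)}$ and skips zero-probability outcomes, which implements the convention $0 \log 0 = 0$.

```python
import math


def expectation(pmf, g):
    """E_p g(X): sum of g(x) * p(x) over outcomes with positive probability.

    `pmf` is a dict {outcome: probability} (a representation assumed here).
    """
    return sum(g(x) * p for x, p in pmf.items() if p > 0.0)


def entropy(pmf, base=2.0):
    """H(X) = E_p log(1 / p(X)), in bits by default.

    Outcomes with p(x) = 0 are never passed to the logarithm, which
    matches the convention 0 log 0 = 0.
    """
    return expectation(pmf, lambda x: math.log(1.0 / pmf[x], base))


# Hypothetical distributions used only for illustration.
fair_coin = {"heads": 0.5, "tails": 0.5}
padded_coin = {"heads": 0.5, "tails": 0.5, "edge": 0.0}  # extra zero-probability outcome
relabeled_coin = {0: 0.5, 1: 0.5}                        # same probabilities, other values

print(entropy(fair_coin))
print(entropy(padded_coin))
print(entropy(relabeled_coin))
```

All three calls should give the same value, since adding a zero-probability outcome or relabeling the outcomes leaves the probabilities, and hence the entropy, unchanged.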
Remark: The entropy of $X$ can also be interpreted as the expected value of $\log \frac{1}{p(X)}$, where $X$ is drawn according to the probability mass function $p(x)$. Thus

$$H(X) = E_p \log \frac{1}{p(X)}. \tag{2.3}$$

This definition of entropy is related to the definition of entropy in thermodynamics; some of the connections will be explored later. It is possible to derive the definition of entropy axiomatically by defining certain properties that the entropy of a random variable must satisfy. This approach is illustrated in a problem at the end of the chapter. We will not use the axiomatic approach to justify the definition of entropy; instead, we will show that it arises as the answer to a number of natural questions such as "What is the average length of the shortest description of the random variable?" First, we derive some immediate consequences of the definition.

Lemma 2.1.1: $H(X) \ge 0$.

Proof: $0 \le p(x) \le 1$ implies $\log(1/p(x)) \ge 0$. $\Box$

Lemma 2.1.2: $H_b(X) = (\log_b a) H_a(X)$.

Proof: $\log_b p = \log_b a \, \log_a p$. $\Box$

The second property of entropy enables us to change the base of the logarithm in the definition. Entropy can be changed from one base to another by multiplying by the appropriate factor.
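As a numerical illustration of the two lemmas, the following sketch checks non-negativity and the change of base $H_b(X) = (\log_b a) H_a(X)$ with $b = e$ and $a = 2$, converting bits to nats. The helper name `entropy` and the sample distribution are assumptions chosen for the example, not part of the text.

```python
import math


def entropy(probs, base=2.0):
    """Entropy of a list of probabilities, skipping zero terms (0 log 0 = 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)


# Hypothetical distribution, chosen only for illustration.
probs = [0.7, 0.2, 0.1]

h_bits = entropy(probs, base=2.0)     # H_2(X), in bits
h_nats = entropy(probs, base=math.e)  # H_e(X), in nats

# Lemma 2.1.2 with b = e and a = 2: H_e(X) = (log_e 2) * H_2(X).
converted = math.log(2.0) * h_bits

print(h_bits, h_nats, converted)
```

Up to floating-point rounding, `h_nats` and `converted` should agree, and both entropies should be non-negative as Lemma 2.1.1 states.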
Example 2.1.1: Let

$$X = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1 - p. \end{cases} \tag{2.4}$$

Then

$$H(X) = -p \log p - (1 - p) \log(1 - p) \overset{\text{def}}{=} H(p). \tag{2.5}$$

In particular, $H(X) = 1$ bit when $p = 1/2$. The graph of the function $H(p)$ is shown in Figure 2.1. The figure illustrates some of the basic properties of entropy: it is a concave function of the distribution and equals 0 when $p = 0$ or 1. This makes sense, because when $p = 0$ or 1, the variable is not random and there is no uncertainty. Similarly, the uncertainty is maximum when $p = 1/2$, which also corresponds to the maximum value of the entropy.
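The properties read off the graph can be checked numerically. The sketch below evaluates the entropy of a two-outcome variable, $-p \log p - (1 - p) \log(1 - p)$ in bits, on a grid of values of $p$. The function name `binary_entropy` and the grid resolution are assumptions made for this illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a variable taking two values with probabilities p and 1 - p."""
    if p < 0.0 or p > 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0 or p == 1.0:
        return 0.0  # convention 0 log 0 = 0: no uncertainty at the endpoints
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Evaluate on a grid p = 0.00, 0.01, ..., 1.00 (resolution chosen arbitrarily).
grid = [i / 100 for i in range(101)]
values = [binary_entropy(p) for p in grid]

# Location of the largest value on the grid.
best_p = grid[values.index(max(values))]

print(best_p, max(values))
print(binary_entropy(0.0), binary_entropy(1.0))
```

On this grid the maximum is attained at $p = 1/2$, and the function vanishes at both endpoints, in line with the discussion of the figure.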
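Example 2.1.2: Let

$$X = \begin{cases} a & \text{with probability } 1/2, \\ b & \text{with probability } 1/4, \\ c & \text{with probability } 1/8, \\ d & \text{with probability } 1/8. \end{cases} \tag{2.6}$$

The entropy of $X$ is

$$H(X) = -\frac{1}{2} \log \frac{1}{2} - \frac{1}{4} \log \frac{1}{4} - \frac{1}{8} \log \frac{1}{8} - \frac{1}{8} \log \frac{1}{8} = \frac{7}{4} \text{ bits}. \tag{2.7}$$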
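The arithmetic in this example can be reproduced exactly, because every probability is a power of $1/2$ and so each $\log_2 \frac{1}{p}$ is an integer. The sketch below uses Python's `fractions.Fraction` for exact rational arithmetic; the variable names are assumptions for illustration, and the integer-logarithm shortcut is valid only for probabilities of the form $2^{-k}$.

```python
from fractions import Fraction

# Probabilities of the four outcomes in Example 2.1.2.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]


def log2_inverse_dyadic(p):
    """Return k such that p = 2**(-k); valid only for probabilities of that form."""
    k = p.denominator.bit_length() - 1
    if p != Fraction(1, 2 ** k):
        raise ValueError("probability is not a power of 1/2")
    return k


# Each term of the entropy sum is p * log2(1/p): 1/2, 1/2, 3/8, 3/8.
terms = [p * log2_inverse_dyadic(p) for p in probs]
h = sum(terms)

print(terms)
print(h)  # exact rational value of the entropy in bits
```

The four terms are $\frac{1}{2} \cdot 1$, $\frac{1}{4} \cdot 2$, $\frac{1}{8} \cdot 3$ and $\frac{1}{8} \cdot 3$, which sum to $\frac{7}{4}$ bits.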
