
1

EE231A: Information Theory
Rick Wesel
wesel@ee.ucla.edu
2
Information Theory
Lecture 1
A. Introduction
B. Entropy
C. Relative Entropy
D. Mutual Information
3
Part A: Introduction
4
Two major themes
1) How much can we compress data (with a known probabilistic distribution)?
For lossless compression, the answer is entropy.
For lossy compression, the answer is the rate-distortion function R(D).
2) How much information (i.e. fully compressed data) can we send reliably over a channel (with a known probabilistic structure)?
The answer is capacity.
5
Practical Significance
The way things stand today, the results on channel capacity and lossless compression have a direct quantitative impact on real systems, while the results on lossy compression only suggest qualitatively how to do compression in many cases.
It turns out that channels and text have a well-defined probabilistic structure, while many data sources such as images, music, and videos do not.
6
An application of Ergodic Theory
From a mathematical or probabilistic point of view, information theory may be thought of as an application of ergodic theory (i.e. the law of large numbers).
7
Part B: Entropy
8
Entropy
Entropy H(X): the number of bits required to describe X on the average.
Entropy has units of bits:
H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)
We also write H(p) if, for the same X, multiple distributions are possible.
Here \log denotes \log_2, and equivalently
H(X) = E_p[-\log p(X)]
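As a concrete illustration (my own sketch, not part of the original slides; the helper name entropy_bits is made up), this definition is straightforward to compute in Python:

    from math import log2

    def entropy_bits(pmf):
        """Entropy H(X) = -sum_x p(x) log2 p(x), in bits.
        pmf maps outcomes to probabilities; terms with p(x) = 0
        are skipped, following the convention 0 log 0 = 0."""
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    print(entropy_bits({"heads": 0.5, "tails": 0.5}))  # a fair coin: 1.0 bit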
9
Entropy using Natural Logarithm
Sometimes we might compute entropy using the base-e logarithm instead of base 2.
In this case, the entropy is in units of nats:
H_e(X) = -\sum_{x \in \mathcal{X}} p(x) \ln p(x)
10
Properties of entropy
Lemma 2.1.1: H(X) \ge 0. Why? Because each term p(x)(-\log p(x)) \ge 0 (since 0 \le p(x) \le 1), so
\sum_x p(x)(-\log p(x)) \ge 0.
Lemma 2.1.2 (changing bases): Define H_b(X) = -\sum_x p(x) \log_b p(x). Then
H_b(X) = (\log_b a) H_a(X),
since \log_b p(x) = (\log_b a) \log_a p(x).
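A quick numerical check of Lemma 2.1.2 (my own illustration, with b = 2 and a = e): entropy in bits should be \log_2(e) times the entropy in nats.

    from math import e, log, log2

    pmf = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}

    h_bits = -sum(p * log2(p) for p in pmf.values())  # base-2 entropy (bits)
    h_nats = -sum(p * log(p) for p in pmf.values())   # base-e entropy (nats)

    # Lemma 2.1.2 with b = 2, a = e: H_2(X) = (log_2 e) * H_e(X)
    print(h_bits, log2(e) * h_nats)  # both print 1.75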
11
Example of entropy
Recall \log(1/p) = -\log p.

x     1    2    3    4
P(x)  1/2  1/4  1/8  1/8
C(x)  1    01   000  111

H(X) = \tfrac{1}{2}\log 2 + \tfrac{1}{4}\log 4 + 2 \cdot \tfrac{1}{8}\log 8 = 1\tfrac{3}{4} bits
12
The names don't matter
Again, \log(1/p) = -\log p.

x     a    b    c    d
P(x)  1/2  1/4  1/8  1/8
C(x)  1    01   000  111

H(X) = \tfrac{1}{2}\log 2 + \tfrac{1}{4}\log 4 + 2 \cdot \tfrac{1}{8}\log 8 = 1\tfrac{3}{4} bits
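To make the point concrete (my own sketch, not from the slides), the entropy of this distribution is 1.75 bits no matter how the outcomes are labeled:

    from math import log2

    def entropy_bits(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    numbers = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}
    letters = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

    print(entropy_bits(numbers))  # 1.75 bits
    print(entropy_bits(letters))  # 1.75 bits -- the names don't matter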
13
Joint entropy
Joint entropy is the number of bits to describe both X and Y on the average:
H(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x,y) = E_{p(x,y)}[-\log p(X,Y)]
14
Dimensionality doesn't change entropy for discrete distributions
Note that for discrete alphabets, whether we describe the probabilities with one or two dimensions doesn't really matter. It's still the negative of the sum of p \log p.

      X=1  X=2
Y=1   1/2  1/4
Y=2   1/8  1/8
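A small check of this point (an illustrative sketch, not from the slides): the joint entropy of the 2x2 table above equals the entropy of the same four probabilities listed in one dimension.

    from math import log2

    def entropy_bits(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    # Joint pmf p(x, y) from the table, keyed by (x, y) pairs.
    joint = {(1, 1): 0.5, (2, 1): 0.25, (1, 2): 0.125, (2, 2): 0.125}
    flat = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}  # same numbers, one dimension

    print(entropy_bits(joint))  # 1.75 bits
    print(entropy_bits(flat))   # 1.75 bits -- dimensionality doesn't matter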
15
Conditional entropy
Conditional entropy is the number of bits to describe Y given that X is already known exactly, averaged over possible X values:
H(Y|X) = \sum_x p(x) H(Y|X=x)
       = -\sum_x \sum_y p(x) p(y|x) \log p(y|x)
       = -\sum_x \sum_y p(x,y) \log p(y|x)
       = E_{p(x,y)}[-\log p(Y|X)]
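Continuing the 2x2 example from the previous slide (my own sketch), H(Y|X) can be computed directly from this definition:

    from math import log2

    # Joint pmf p(x, y) from the earlier 2x2 table.
    joint = {(1, 1): 0.5, (2, 1): 0.25, (1, 2): 0.125, (2, 2): 0.125}

    # Marginal p(x).
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p

    # H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x).
    h_y_given_x = -sum(p * log2(p / px[x]) for (x, y), p in joint.items())
    print(h_y_given_x)  # about 0.796 bits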
16
Chain rule for entropy
H(X,Y) = H(X) + H(Y|X)
H(X) is the information required to describe X on the average; H(Y|X) is the information required to describe Y on the average, given that X is known (averaged over X's).
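A numerical sanity check of the chain rule on the same 2x2 example (illustrative, not from the slides):

    from math import log2

    def entropy_bits(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    joint = {(1, 1): 0.5, (2, 1): 0.25, (1, 2): 0.125, (2, 2): 0.125}

    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p

    h_xy = entropy_bits(joint)   # H(X,Y) = 1.75 bits
    h_x = entropy_bits(px)       # H(X)
    h_y_given_x = -sum(p * log2(p / px[x]) for (x, y), p in joint.items())

    print(h_xy, h_x + h_y_given_x)  # both 1.75: H(X,Y) = H(X) + H(Y|X)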
17
Proof of chain rule
H(X) + H(Y|X) = -\sum_x p(x) \log p(x) - \sum_x \sum_y p(x,y) \log p(y|x)
= -\sum_x \sum_y p(x) p(y|x) \log p(x) - \sum_x \sum_y p(x,y) \log p(y|x)
= -\sum_x \sum_y p(x,y) \log p(x) - \sum_x \sum_y p(x,y) \log p(y|x)
= -\sum_x \sum_y p(x,y) [\log p(x) + \log p(y|x)]
= -\sum_x \sum_y p(x,y) \log p(x,y)
= H(X,Y)
By the same argument with the roles of X and Y exchanged, H(X,Y) = H(Y) + H(X|Y).
18
General chain rule
H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1)
Example:
H(X_1, X_2, X_3) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_2, X_1)
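An illustrative check of the three-variable case, using an arbitrary joint pmf of my own (not from the slides); the conditional entropies are computed from their definitions rather than by subtraction:

    from math import log2

    def entropy_bits(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    def marginal(pmf, idx):
        """Marginal pmf over the coordinates listed in idx."""
        out = {}
        for outcome, p in pmf.items():
            key = tuple(outcome[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    # An arbitrary joint pmf p(x1, x2, x3) on binary variables.
    joint = {(0, 0, 0): 0.20, (0, 0, 1): 0.10, (0, 1, 0): 0.15, (0, 1, 1): 0.05,
             (1, 0, 0): 0.10, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.25}

    p1 = marginal(joint, (0,))
    p12 = marginal(joint, (0, 1))

    # H(X2|X1) and H(X3|X1,X2) from the definition H(A|B) = -sum p(a,b) log2 p(a|b).
    h2_given_1 = -sum(p * log2(p / p1[(x1,)]) for (x1, x2), p in p12.items())
    h3_given_12 = -sum(p * log2(p / p12[(x1, x2)]) for (x1, x2, x3), p in joint.items())

    print(entropy_bits(joint), entropy_bits(p1) + h2_given_1 + h3_given_12)  # equal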
19
Proof of General chain rule
Proof by induction.
Base case: H(X_1, X_2) = H(X_1) + H(X_2 \mid X_1).
Suppose H(X_1, X_2, \ldots, X_{n-1}) = \sum_{i=1}^{n-1} H(X_i \mid X_{i-1}, \ldots, X_1).
Then
H(X_1, X_2, \ldots, X_n) \overset{(a)}{=} H(X_1, X_2, \ldots, X_{n-1}) + H(X_n \mid X_1, X_2, \ldots, X_{n-1})
= \sum_{i=1}^{n-1} H(X_i \mid X_{i-1}, \ldots, X_1) + H(X_n \mid X_1, \ldots, X_{n-1})
= \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1)
For (a), the proof goes exactly like H(X,Y) = H(X) + H(Y|X) with X replaced by X_1, \ldots, X_{n-1} and Y replaced by X_n.
20
Part C: Relative Entropy
21
Relative entropy
A way to compare how close two distributions are.
The "penalty" for compressing using the wrong distribution.
Specifically, if X \sim p(x) but we represent it in a way that would be efficient if X \sim q(x), our representation will require
H(p) + D(p\|q) bits,
where H(p) is the entropy and D(p\|q) is the relative entropy between p and q.
22
Relative entropy (cont.)
For two distributions p(x) and q(x),
D(p\|q) = -\sum_x p(x) \log q(x) - \left( -\sum_x p(x) \log p(x) \right)
The first term is the bits required when assuming q(x) while X \sim p(x); the second term is the bits required under the correct assumption X \sim p(x). Equivalently,
D(p\|q) = E_X\left[ \log \frac{p(X)}{q(X)} \right] = E_X[-\log q(X)] - E_X[-\log p(X)]
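A small numerical sketch (my own, with made-up distributions) of D(p||q) as the extra bits paid for assuming the wrong distribution:

    from math import log2

    def kl_bits(p, q):
        """D(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
        return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

    p = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}
    q = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}  # code designed for a uniform source

    h_p = -sum(px * log2(px) for px in p.values())        # 1.75 bits
    cross = -sum(px * log2(q[x]) for x, px in p.items())  # bits used when assuming q

    print(h_p, kl_bits(p, q), cross)  # 1.75 + 0.25 = 2.0 bits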


23
Conditional relative entropy
D(p(y|x) \| q(y|x)) = E_X E_{Y|X=x}\left[ \log \frac{p(y|x)}{q(y|x)} \right]
= \sum_x p(x) \sum_y p(y|x) \log \frac{p(y|x)}{q(y|x)}
= \sum_{x,y} p(x,y) \log \frac{p(y|x)}{q(y|x)}
= E_{p(x,y)}[\log p(Y|X) - \log q(Y|X)]
24
Chain rule for relative entropy
D(p(x,y) \| q(x,y)) = E_{p(x,y)}[-\log q(X,Y)] - E_{p(x,y)}[-\log p(X,Y)]
= E_{p(x,y)}[-\log q(X) q(Y|X)] - E_{p(x,y)}[-\log p(X) p(Y|X)]
= E[-\log q(X)] - E[-\log p(X)] + E[-\log q(Y|X)] - E[-\log p(Y|X)]
= D(p(x) \| q(x)) + D(p(y|x) \| q(y|x))
That is, D(p(x,y) \| q(x,y)) = D(p(x) \| q(x)) + D(p(y|x) \| q(y|x)).
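An illustrative numerical check of this chain rule with two made-up joint distributions (my own example, not from the slides):

    from math import log2

    # Two joint pmfs on the same 2x2 alphabet, keyed by (x, y).
    p = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.30}
    q = {(0, 0): 0.20, (0, 1): 0.20, (1, 0): 0.30, (1, 1): 0.30}

    def marg_x(pmf):
        out = {}
        for (x, y), v in pmf.items():
            out[x] = out.get(x, 0.0) + v
        return out

    px, qx = marg_x(p), marg_x(q)

    d_joint = sum(v * log2(v / q[xy]) for xy, v in p.items())  # D(p(x,y)||q(x,y))
    d_marg = sum(v * log2(v / qx[x]) for x, v in px.items())   # D(p(x)||q(x))
    # D(p(y|x)||q(y|x)) = sum_{x,y} p(x,y) log2 [ p(y|x) / q(y|x) ]
    d_cond = sum(v * log2((v / px[x]) / (q[(x, y)] / qx[x]))
                 for (x, y), v in p.items())

    print(d_joint, d_marg + d_cond)  # equal, as the chain rule says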
25
Part D: Mutual Information
26
Mutual information
I(X;Y) describes how much information X contains about Y and vice versa.
The formal definition is (2.28) on page 20, but that is not the first expression to learn.
27
Operational Definition of Mutual Information
I(X;Y) = I(Y;X)
= H(X) - H(X|Y)
= H(Y) - H(Y|X)
= H(X) + H(Y) - H(X,Y)
Why? Because H(X) + H(Y) - H(X,Y) = H(X) + H(Y) - [H(Y) + H(X|Y)] = H(X) - H(X|Y), using the chain rule H(X,Y) = H(Y) + H(X|Y).
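A numerical sketch on the earlier 2x2 joint pmf (my own illustration) showing that these forms agree, with the conditional entropies computed from their definitions:

    from math import log2

    def entropy_bits(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    joint = {(1, 1): 0.5, (2, 1): 0.25, (1, 2): 0.125, (2, 2): 0.125}

    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p

    h_x, h_y, h_xy = entropy_bits(px), entropy_bits(py), entropy_bits(joint)
    h_x_given_y = -sum(p * log2(p / py[y]) for (x, y), p in joint.items())
    h_y_given_x = -sum(p * log2(p / px[x]) for (x, y), p in joint.items())

    # H(X) - H(X|Y), H(Y) - H(Y|X), and H(X) + H(Y) - H(X,Y) all equal I(X;Y).
    print(h_x - h_x_given_y, h_y - h_y_given_x, h_x + h_y - h_xy)  # about 0.0157 bits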
28
Venn diagram
[Figure: two overlapping circles for H(X) and H(Y); their intersection is I(X;Y), the non-overlapping pieces are H(X|Y) and H(Y|X), and the whole union is H(X,Y).]
The Venn diagram only works for two R.V.s.
29
A special case of D(p||q)
What if q(x,y) = p(x) p(y)?
D(p(x,y) \| p(x)p(y)) = E_{p(x,y)}[-\log p(X) p(Y)] - E_{p(x,y)}[-\log p(X,Y)]
= H(X) + H(Y) - H(X,Y)
= I(X;Y)
30
Mutual information and relative entropy
So I(X;Y) measures the penalty associated with compressing two dependent R.V.s as if they were independent, i.e. I(X;Y) measures how much information X has about Y and vice versa.
I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x) p(y)} = D(p(x,y) \| p(x) p(y))
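A short check (illustrative, not from the slides) that the relative-entropy form matches the entropy form on the 2x2 example:

    from math import log2

    def entropy_bits(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    joint = {(1, 1): 0.5, (2, 1): 0.25, (1, 2): 0.125, (2, 2): 0.125}

    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p

    # I(X;Y) as D(p(x,y) || p(x)p(y)).
    i_kl = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items())
    # I(X;Y) as H(X) + H(Y) - H(X,Y).
    i_ent = entropy_bits(px) + entropy_bits(py) - entropy_bits(joint)

    print(i_kl, i_ent)  # both about 0.0157 bits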
31
Mutual information and entropy
Let's use the formal definition to show I(X;Y) = H(X) - H(X|Y).
Proof:
I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x) p(y)}
= \sum_{x,y} p(x,y) \log \frac{p(x|y)}{p(x)}
= -\sum_{x,y} p(x,y) \log p(x) - \left( -\sum_{x,y} p(x,y) \log p(x|y) \right)
= H(X) - H(X|Y)
32
Conditional mutual information
I(X;Y|Z) = H(X|Z) - H(X|Y,Z)
Chain rule for mutual information:
I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, X_{i-2}, \ldots, X_1)
Proof: I(X_1, X_2, \ldots, X_n; Y) can be written either as H(Y) - H(Y \mid X_1, \ldots, X_n) or as H(X_1, \ldots, X_n) - H(X_1, \ldots, X_n \mid Y). Which decomposition is helpful? Using the second one,
I(X_1, X_2, \ldots, X_n; Y) = H(X_1, \ldots, X_n) - H(X_1, \ldots, X_n \mid Y)
= \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1) - \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1, Y)
= \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1)
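As an illustration of the n = 2 case of this chain rule, I(X1, X2; Y) = I(X1; Y) + I(X2; Y | X1), computed from a made-up joint pmf (my own example, not from the slides):

    from math import log2

    def entropy_bits(pmf):
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    def marginal(pmf, idx):
        out = {}
        for outcome, p in pmf.items():
            key = tuple(outcome[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    # Arbitrary joint pmf p(x1, x2, y) on binary variables, keyed by (x1, x2, y).
    joint = {(0, 0, 0): 0.20, (0, 0, 1): 0.10, (0, 1, 0): 0.15, (0, 1, 1): 0.05,
             (1, 0, 0): 0.10, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.25}

    h = entropy_bits
    H_x1, H_y = h(marginal(joint, (0,))), h(marginal(joint, (2,)))
    H_x1x2, H_x1y = h(marginal(joint, (0, 1))), h(marginal(joint, (0, 2)))
    H_all = h(joint)

    i_x1x2_y = H_x1x2 + H_y - H_all                       # I(X1,X2;Y)
    i_x1_y = H_x1 + H_y - H_x1y                           # I(X1;Y)
    i_x2_y_given_x1 = (H_x1x2 - H_x1) - (H_all - H_x1y)   # H(X2|X1) - H(X2|X1,Y)

    print(i_x1x2_y, i_x1_y + i_x2_y_given_x1)  # equal, as the chain rule says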
