
Information & Entropy



Messages vs. Information

• Form vs. Content
• Sentence vs. Meaning
• e.g. …



What historical context motivated the
invention of the "information" concept (and
an associated mathematical theory)?
Claude Shannon and his colleagues were quite
interested in how best to encode a digital message.
Why?
There were two competing considerations:
• To save resources, by making the encodings as concise as possible.
• To make the encoding very redundant, even if it would be long and expensive to send. Real communication isn't perfect, and if even a single bit of a very concise encoding somehow got flipped, it might become impossible to reconstruct the original message.
What did Shannon invent?

• What is the capacity of a system to carry messages, deal with noise, and so on?
• He imagined an arbitrary source S capable of
transmitting one of a fixed set of messages,
where message i has a probability Pi of being
transmitted.
• He developed a theory to quantify the amount
of "information" being transmitted by S.
Why is the information entropy equation a
good representation of this concept?
• So far, "information" is a bit of a vague idea, but we have
some ideas for how we want to use it. So, one way to
proceed is to think about all the properties we'd like an
information equation to have, and then just guess a function
that satisfies them.
• If some message has probability 1 of being transmitted,
then we learn absolutely nothing when we observe that
message - we already knew what it was going to be! So it
seems reasonable that the information associated with this S
should be 0. Sure enough, if one probability is 1 and the rest
are 0, plugging into Shannon's H(S) gives you 0.
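A minimal numeric check of this claim in Python (a sketch; the formula for H(S) is stated formally on a later slide, and the usual convention 0 · log2(1/0) = 0 is assumed):

```python
from math import log2

# Degenerate source: one message has probability 1, the rest 0.
probs = [1.0, 0.0, 0.0, 0.0]

# Shannon's H(S), skipping zero-probability terms (convention 0*log2(1/0) = 0).
H = sum(p * log2(1 / p) for p in probs if p > 0)
print(H)   # 0.0 -- observing the certain message tells us nothing
```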
Information vs. Uncertainty
• Uncertainty, surprise, and information all describe the same quantity
• Information ↑ and uncertainty ↑ as probability ↓: the less probable a message, the more information it carries



Information vs. Uncertainty
• Information definition

I(s_k) = \log_2\left(\frac{1}{p_k}\right) = -\log_2 p_k \quad \text{(bits)}

• where p_k is the probability of symbol s_k


• Measured in bits
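As a small sketch, the definition translates directly into a one-line Python helper (the function name `information` is ours, not from the slides):

```python
from math import log2

def information(p_k):
    """Information I(s_k) in bits of a symbol with probability p_k (0 < p_k <= 1)."""
    return log2(1.0 / p_k)     # equivalently -log2(p_k)

print(information(0.5))        # 1.0 bit: a fair coin flip carries one bit
print(information(1.0))        # 0.0 bits: a certain symbol carries no information
```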



Information Examples
• E.g. 1: The probability of the English letter E is 0.105; determine its information.

• E.g. 2: The probability of the English letter x is 0.002; determine its information.
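A worked sketch of both examples using the definition I(s_k) = -log2(p_k) (values rounded; the probabilities are those quoted above):

```python
from math import log2

p_E, p_x = 0.105, 0.002        # probabilities quoted in E.g. 1 and E.g. 2

I_E = -log2(p_E)               # information of the letter E
I_x = -log2(p_x)               # information of the letter x

print(f"I(E) = {I_E:.2f} bits")   # I(E) = 3.25 bits (common letter, little surprise)
print(f"I(x) = {I_x:.2f} bits")   # I(x) = 8.97 bits (rare letter, much more surprise)
```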



Information Properties
I(s_k) = \log_2\left(\frac{1}{p_k}\right) \quad \text{(bits)}

• 4 properties:

1. I(s_k) = 0 for p_k = 1
2. I(s_k) > 0 for 0 < p_k < 1
3. I(s_k) > I(s_i) for p_k < p_i
4. I(s_k s_i) = I(s_k) + I(s_i), if s_k and s_i are statistically independent (checked below)
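The fourth property holds because independent symbols have joint probability p_k · p_i, and the logarithm turns that product into a sum. A quick sketch with hypothetical probabilities:

```python
from math import log2

def information(p):
    return -log2(p)            # I(s) = log2(1/p), in bits

p_k, p_i = 0.3, 0.2            # hypothetical probabilities of two independent symbols

# Joint probability of the pair is the product p_k * p_i.
print(information(p_k * p_i))                  # ≈ 4.06 bits
print(information(p_k) + information(p_i))     # ≈ 4.06 bits (same value)
```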



Entropy
• Average Information of a source
H = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right) \quad \text{(bits per symbol)}

• Measured in bits per symbol


symbols: s_0, s_1, …, s_{K-1}
probabilities: p_0, p_1, …, p_{K-1}
where K is the number of symbols of the source
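A sketch of the definition in Python: entropy is just the probability-weighted average of the per-symbol information (function names and the example distribution are ours):

```python
from math import log2

def information(p):
    return log2(1 / p)                     # I(s_k) in bits

def entropy(probs):
    """H = sum over k of p_k * log2(1/p_k), in bits per symbol."""
    return sum(p * information(p) for p in probs if p > 0)

# A hypothetical 3-symbol source
probs = [0.5, 0.3, 0.2]
print(f"H = {entropy(probs):.3f} bits/symbol")   # H = 1.485 bits/symbol
```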
ENTROPY
If S is encoding its messages optimally, H(S) is precisely the expected number of bits it will need to encode a given message, or, equivalently, the average number of bits it will use per message over many messages.
Any sensible encoding scheme will use fewer bits for messages with high probability and more bits for messages with low probability. The per-message quantity log_2(1/p_i) behaves exactly this way: it gets smaller as the probability gets bigger, and vice versa.
In fact, log_2(1/p_i) turns out to be the optimal number of bits to use for a message with transmission probability p_i, and you can also think of it as the amount of information inherent to that message.
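To make the "expected number of bits" reading concrete, here is a sketch that builds a Huffman code (a standard optimal prefix code, not discussed in the slides themselves) for a hypothetical source whose probabilities are powers of ½; in that case the average code length matches H(S) exactly:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Code lengths (in bits) of a Huffman code for the given probabilities."""
    # Heap entries: (probability, tie-breaker, indices of symbols in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1                 # each merge adds one bit to every member
        heapq.heappush(heap, (p1 + p2, tie, syms1 + syms2))
        tie += 1
    return lengths

probs = [1/2, 1/4, 1/8, 1/8]                # hypothetical dyadic source
H = sum(p * log2(1 / p) for p in probs)
lengths = huffman_lengths(probs)            # -> [1, 2, 3, 3]
avg_bits = sum(p * n for p, n in zip(probs, lengths))

print(f"H(S)            = {H:.2f} bits/message")        # 1.75
print(f"Huffman average = {avg_bits:.2f} bits/message")  # 1.75
```

For general (non-dyadic) probabilities the Huffman average can exceed H(S) by up to one bit per message; H(S) remains the lower bound.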
Entropy Example
• Suppose a source emits 4 symbols: A, B, C, and D. The probabilities are 3/8, 1/4, 1/4, and 1/8, respectively. Furthermore, the symbols occur independently. Determine the entropy of the source.
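A worked sketch of this example (value rounded):

```python
from math import log2

probs = {"A": 3/8, "B": 1/4, "C": 1/4, "D": 1/8}

H = sum(p * log2(1 / p) for p in probs.values())
print(f"H = {H:.3f} bits/symbol")   # H = 1.906 bits/symbol
```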



Entropy Properties
0 \le H = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right) \le \log_2 K

where K is the number of symbols of the source

Discussion
• If entropy = 0, then p_k = 1 for some k: the source always emits the same symbol, so observing it tells us nothing.

• If entropy = log_2 K, then p_k = 1/K for all k, meaning every symbol is equally probable and the uncertainty about the source is at its maximum (see the check below).
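Both bounds can be checked numerically; a sketch with three hypothetical 4-symbol distributions:

```python
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

K = 4
print(entropy([1, 0, 0, 0]))          # 0.0         -> lower bound: one certain symbol
print(entropy([0.7, 0.1, 0.1, 0.1]))  # ≈ 1.357     -> strictly between 0 and log2(K)
print(entropy([1/4] * 4), log2(K))    # 2.0 2.0     -> upper bound: all symbols equally likely
```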



Maximum Entropy Examples
2 symbols: probabilities are ½, ½ respectively

If all of the messages are equally probable, we expect H(S) to be maximized.

2 symbols: probabilities are ¼, ¾ respectively
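A worked sketch of the two cases (values rounded); as expected, the equiprobable source attains the maximum log_2 2 = 1 bit:

```python
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs)

print(f"{entropy([1/2, 1/2]):.3f} bits")   # 1.000 bits (maximum for K = 2 symbols)
print(f"{entropy([1/4, 3/4]):.3f} bits")   # 0.811 bits (less: the source is more predictable)
```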



Illustration of Entropy
Consider a binary source emitting 0 and 1 with probabilities p_0 and p_1, respectively.
The more lopsided the probability distribution is, the less we
"learn" by observing a message, and vice versa.

H(p_0) = -p_0 \log_2 p_0 - p_1 \log_2 p_1
       = -p_0 \log_2 p_0 - (1 - p_0) \log_2 (1 - p_0)

This lopsided case is, of course, the opposite of the equally probable case above: H(p_0) is largest, 1 bit, when p_0 = ½, and falls toward 0 as the distribution becomes more lopsided.
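A sketch tabulating H(p_0) over a grid of values shows exactly this shape: symmetric around p_0 = ½, zero at the extremes, and maximal (1 bit) for the fair case:

```python
from math import log2

def binary_entropy(p0):
    """H(p0) = -p0*log2(p0) - (1 - p0)*log2(1 - p0), with H(0) = H(1) = 0."""
    if p0 in (0.0, 1.0):
        return 0.0
    p1 = 1.0 - p0
    return -p0 * log2(p0) - p1 * log2(p1)

for p0 in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p0 = {p0:.2f}  ->  H = {binary_entropy(p0):.3f} bits")
# The printed values rise from 0 to a peak of 1.000 bit at p0 = 0.5, then fall back to 0.
```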



Summary

• Information: I(s_k) = \log_2\left(\frac{1}{p_k}\right) \quad \text{(bits)}

• Entropy (average information)


H = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right) \quad \text{(bits / symbol)}

