
Information & Entropy



Messages vs. Information

• Form vs. Content
• Sentence vs. Meaning
• e.g. …



What historical context motivated the
invention of the "information" concept (and
an associated mathematical theory)?
Claude Shannon and his colleagues were quite
interested in how best to encode a digital message.
Why?
There were two competing considerations:
• To save resources, by making the encodings as concise as possible.
• To make the encoding very redundant, even if it would be long and expensive to send. Real communication isn't perfect, and if even a single bit of a very concise encoding somehow got flipped, it might become impossible to reconstruct the original message.
What did Shannon invent?

• What is the capacity of a system to carry messages, deal with noise, and so on?
• He imagined an arbitrary source S capable of
transmitting one of a fixed set of messages,
where message i has a probability Pi of being
transmitted.
• He developed a theory to quantify the amount
of "information" being transmitted by S.
Why is the information entropy equation a
good representation of this concept?
• So far, "information" is a bit of a vague idea, but we have
some ideas for how we want to use it. So, one way to
proceed is to think about all the properties we'd like an
information equation to have, and then just guess a function
that satisfies them.
• If some message has probability 1 of being transmitted,
then we learn absolutely nothing when we observe that
message - we already knew what it was going to be! So it
seems reasonable that the information associated with this S
should be 0. Sure enough, if one probability is 1 and the rest
are 0, plugging into Shannon's H(S) gives you 0.
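A minimal numeric check of this claim in Python (a sketch; the formula for H(S) is stated formally on a later slide, and the usual convention 0 · log2(1/0) = 0 is assumed):

```python
from math import log2

# Degenerate source: one message has probability 1, the rest 0.
probs = [1.0, 0.0, 0.0, 0.0]

# Shannon's H(S), skipping zero-probability terms (convention 0*log2(1/0) = 0).
H = sum(p * log2(1 / p) for p in probs if p > 0)
print(H)   # 0.0 -- observing the certain message tells us nothing
```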
Information vs. Uncertainty
• Uncertainty, surprise, and information all describe the same quantity
• Information ↑ and uncertainty ↑ as probability ↓: the less probable a message, the more information it carries



Information vs. Uncertainty
• Information definition

I(s_k) = \log_2\left(\frac{1}{p_k}\right) = -\log_2 p_k \quad \text{(bits)}

• where p_k is the probability of symbol s_k


• Measured in bits
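As a small sketch, the definition translates directly into a one-line Python helper (the function name `information` is ours, not from the slides):

```python
from math import log2

def information(p_k):
    """Information I(s_k) in bits of a symbol with probability p_k (0 < p_k <= 1)."""
    return log2(1.0 / p_k)     # equivalently -log2(p_k)

print(information(0.5))        # 1.0 bit: a fair coin flip carries one bit
print(information(1.0))        # 0.0 bits: a certain symbol carries no information
```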



Information Examples
• E.g. 1: The probability of the English letter E is 0.105; determine its information.

• E.g. 2: The probability of the English letter x is 0.002; determine its information.
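A worked sketch of both examples using the definition I(s_k) = -log2(p_k) (values rounded; the probabilities are those quoted above):

```python
from math import log2

p_E, p_x = 0.105, 0.002        # probabilities quoted in E.g. 1 and E.g. 2

I_E = -log2(p_E)               # information of the letter E
I_x = -log2(p_x)               # information of the letter x

print(f"I(E) = {I_E:.2f} bits")   # I(E) = 3.25 bits (common letter, little surprise)
print(f"I(x) = {I_x:.2f} bits")   # I(x) = 8.97 bits (rare letter, much more surprise)
```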



Information Properties
I(s_k) = \log_2\left(\frac{1}{p_k}\right) \quad \text{(bits)}

• 4 properties:

1. I(s_k) = 0 for p_k = 1
2. I(s_k) > 0 for 0 < p_k < 1
3. I(s_k) > I(s_i) for p_k < p_i
4. I(s_k s_i) = I(s_k) + I(s_i), if s_k and s_i are statistically independent (checked below)
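The fourth property holds because independent symbols have joint probability p_k · p_i, and the logarithm turns that product into a sum. A quick sketch with hypothetical probabilities:

```python
from math import log2

def information(p):
    return -log2(p)            # I(s) = log2(1/p), in bits

p_k, p_i = 0.3, 0.2            # hypothetical probabilities of two independent symbols

# Joint probability of the pair is the product p_k * p_i.
print(information(p_k * p_i))                  # ≈ 4.06 bits
print(information(p_k) + information(p_i))     # ≈ 4.06 bits (same value)
```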



Entropy
• Average Information of a source
H = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right) \quad \text{(bits per symbol)}

• Measured in bits per symbol


symbols: s_0, s_1, …, s_{K-1}
probabilities: p_0, p_1, …, p_{K-1}
where K is the number of symbols of the source
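A sketch of the definition in Python: entropy is just the probability-weighted average of the per-symbol information (function names and the example distribution are ours):

```python
from math import log2

def information(p):
    return log2(1 / p)                     # I(s_k) in bits

def entropy(probs):
    """H = sum over k of p_k * log2(1/p_k), in bits per symbol."""
    return sum(p * information(p) for p in probs if p > 0)

# A hypothetical 3-symbol source
probs = [0.5, 0.3, 0.2]
print(f"H = {entropy(probs):.3f} bits/symbol")   # H = 1.485 bits/symbol
```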
ENTROPY
If S is encoding its messages optimally, H(S) is precisely the expected number of bits it will need to encode a given message, or, equivalently, the average number of bits it will use per message over many messages.
Any sensible encoding scheme will use fewer bits for messages with high probability and more bits for messages with low probability. The per-message quantity log_2(1/p_i) behaves exactly this way: it gets smaller as the probability gets bigger, and vice versa.
In fact, log_2(1/p_i) turns out to be the optimal number of bits to use for a message with transmission probability p_i, and you can also think of it as the amount of information inherent to that message.
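To make the "expected number of bits" reading concrete, here is a sketch that builds a Huffman code (a standard optimal prefix code, not discussed in the slides themselves) for a hypothetical source whose probabilities are powers of ½; in that case the average code length matches H(S) exactly:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Code lengths (in bits) of a Huffman code for the given probabilities."""
    # Heap entries: (probability, tie-breaker, indices of symbols in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1                 # each merge adds one bit to every member
        heapq.heappush(heap, (p1 + p2, tie, syms1 + syms2))
        tie += 1
    return lengths

probs = [1/2, 1/4, 1/8, 1/8]                # hypothetical dyadic source
H = sum(p * log2(1 / p) for p in probs)
lengths = huffman_lengths(probs)            # -> [1, 2, 3, 3]
avg_bits = sum(p * n for p, n in zip(probs, lengths))

print(f"H(S)            = {H:.2f} bits/message")        # 1.75
print(f"Huffman average = {avg_bits:.2f} bits/message")  # 1.75
```

For general (non-dyadic) probabilities the Huffman average can exceed H(S) by up to one bit per message; H(S) remains the lower bound.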
Entropy Example
• Suppose a source emits 4 symbols: A, B, C, and D. The probabilities are 3/8, 1/4, 1/4, and 1/8, respectively. Furthermore, the symbols occur independently. Determine the entropy of the source.
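A worked sketch of this example (value rounded):

```python
from math import log2

probs = {"A": 3/8, "B": 1/4, "C": 1/4, "D": 1/8}

H = sum(p * log2(1 / p) for p in probs.values())
print(f"H = {H:.3f} bits/symbol")   # H = 1.906 bits/symbol
```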



Entropy Properties
0 \le H = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right) \le \log_2 K

where K is the number of symbols of the source

Discussion
• If entropy = 0, then p_k = 1 for some k: the source always emits the same symbol, so observing it tells us nothing.

• If entropy = log_2 K, then p_k = 1/K for all k, meaning every symbol is equally probable and the uncertainty about the source is at its maximum (see the check below).
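Both bounds can be checked numerically; a sketch with three hypothetical 4-symbol distributions:

```python
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

K = 4
print(entropy([1, 0, 0, 0]))          # 0.0         -> lower bound: one certain symbol
print(entropy([0.7, 0.1, 0.1, 0.1]))  # ≈ 1.357     -> strictly between 0 and log2(K)
print(entropy([1/4] * 4), log2(K))    # 2.0 2.0     -> upper bound: all symbols equally likely
```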



Maximum Entropy Examples
2 symbols: probabilities are ½, ½ respectively

If all of the messages are equally probable, we expect H(S) to be maximized.

2 symbols: probabilities are ¼, ¾ respectively
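A worked sketch of the two cases (values rounded); as expected, the equiprobable source attains the maximum log_2 2 = 1 bit:

```python
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs)

print(f"{entropy([1/2, 1/2]):.3f} bits")   # 1.000 bits (maximum for K = 2 symbols)
print(f"{entropy([1/4, 3/4]):.3f} bits")   # 0.811 bits (less: the source is more predictable)
```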



Illustration of Entropy
Consider a binary source emitting 0 and 1 with probabilities p_0 and p_1, respectively.
The more lopsided the probability distribution is, the less we
"learn" by observing a message, and vice versa.

H(p_0) = -p_0 \log_2 p_0 - p_1 \log_2 p_1
       = -p_0 \log_2 p_0 - (1 - p_0) \log_2 (1 - p_0)

This lopsided case is, of course, the opposite of the equally probable case above: H(p_0) is largest, 1 bit, when p_0 = ½, and falls toward 0 as the distribution becomes more lopsided.
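A sketch tabulating H(p_0) over a grid of values shows exactly this shape: symmetric around p_0 = ½, zero at the extremes, and maximal (1 bit) for the fair case:

```python
from math import log2

def binary_entropy(p0):
    """H(p0) = -p0*log2(p0) - (1 - p0)*log2(1 - p0), with H(0) = H(1) = 0."""
    if p0 in (0.0, 1.0):
        return 0.0
    p1 = 1.0 - p0
    return -p0 * log2(p0) - p1 * log2(p1)

for p0 in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p0 = {p0:.2f}  ->  H = {binary_entropy(p0):.3f} bits")
# The printed values rise from 0 to a peak of 1.000 bit at p0 = 0.5, then fall back to 0.
```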



Summary

• Information: I(s_k) = \log_2\left(\frac{1}{p_k}\right) \quad \text{(bits)}

• Entropy (average information)


H = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right) \quad \text{(bits / symbol)}

