
TSBK01 Image Coding and Data Compression

Lecture 2:
Basic Information Theory

Jörgen Ahlberg
Div. of Sensor Technology
Swedish Defence Research Agency (FOI)
Today
1. What is information theory about?
2. Stochastic (information) sources.
3. Information and entropy.
4. Entropy for stochastic sources.
5. The source coding theorem.
Part 1: Information Theory

Claude Shannon: "A Mathematical Theory of Communication",
Bell System Technical Journal, 1948.

Sometimes referred to as "Shannon-Weaver", since the standalone
publication has a foreword by Weaver. Be careful!
Quotes about Shannon
 "What is information? Sidestepping questions about meaning,
Shannon showed that it is a measurable commodity."

 "Today, Shannon's insights help shape virtually all systems that
store, process, or transmit information in digital form, from
compact discs to computers, from facsimile machines to deep space
probes."

 "Information theory has also infiltrated fields outside
communications, including linguistics, psychology, economics,
biology, even the arts."
A Communication System

Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink (receiver)

 Source: any source of information.
 Source coder: change to an efficient representation, i.e., data compression.
 Channel coder: change to an efficient representation for transmission, i.e., error control coding.
 Channel: anything transmitting or storing information – a radio link, a cable, a disk, a CD, a piece of paper, …
 Channel decoder: recover from channel distortion.
 Source decoder: uncompress.
Fundamental Entities

Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink (receiver)

 H: The information content of the source.
 R: Rate from the source coder.
 C: Channel capacity.
Fundamental Theorems

Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink (receiver)

Shannon 1: Error-free transmission is possible if R ≥ H (the source
coding theorem, simplified) and C ≥ R (the channel coding theorem,
simplified).

Shannon 2: Source coding and channel coding can be optimized
independently, and binary symbols can be used as the intermediate
format. Assumption: Arbitrarily long delays.
Part 2: Stochastic sources
 A source outputs symbols X1, X2, …
 Each symbol takes its value from an alphabet A = (a1, a2, …).
 Model: P(X1, …, XN) assumed to be known for all combinations.

Example 1: A text is a sequence of symbols, each taking its value
from the alphabet A = (a, …, z, A, …, Z, 1, 2, …, 9, !, ?, …).

Example 2: A (digitized) grayscale image is a sequence of symbols,
each taking its value from the alphabet A = (0, 1) or A = (0, …, 255).

Source → X1, X2, …
Two Special Cases
1. The Memoryless Source
 Each symbol is independent of the previous ones.
 P(X1, X2, …, Xn) = P(X1) · P(X2) · … · P(Xn)
2. The Markov Source
 Each symbol depends on the previous one.
 P(X1, X2, …, Xn) = P(X1) · P(X2|X1) · P(X3|X2) · … · P(Xn|Xn-1)
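
To make the two factorizations concrete, a minimal Python sketch that
evaluates P(X1, …, Xn) for a short sequence under both models. The
memoryless probabilities are assumed example values; the conditional
probabilities are those of the ternary state-diagram example on the
following slides.

# Sketch: joint probability of a symbol sequence under the two models above.
# The memoryless probabilities are illustrative assumptions.

p = {"a": 0.5, "b": 0.3, "c": 0.2}          # P(X) for the memoryless source
p_cond = {                                   # P(X_k | X_{k-1}) for the Markov source
    "a": {"a": 0.3, "b": 0.7, "c": 0.0},
    "b": {"a": 0.0, "b": 0.0, "c": 1.0},
    "c": {"a": 0.5, "b": 0.2, "c": 0.3},
}

def prob_memoryless(seq):
    """P(X1,...,Xn) = P(X1) * P(X2) * ... * P(Xn)."""
    prob = 1.0
    for x in seq:
        prob *= p[x]
    return prob

def prob_markov(seq, p_first):
    """P(X1,...,Xn) = P(X1) * P(X2|X1) * ... * P(Xn|Xn-1)."""
    prob = p_first[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= p_cond[prev][cur]
    return prob

seq = "abcab"
print(prob_memoryless(seq))          # 0.5*0.3*0.2*0.5*0.3
print(prob_markov(seq, p_first=p))   # 0.5*0.7*1.0*0.5*0.7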
The Markov Source
 A symbol depends only on the previous symbol, so the source can
be modelled by a state diagram.

[State diagram: a ternary source with alphabet A = (a, b, c).
Transition probabilities: a→a 0.3, a→b 0.7, b→c 1.0, c→a 0.5,
c→b 0.2, c→c 0.3.]
The Markov Source
 Assume we are in state a, i.e., Xk = a.
 The probabilities for the next symbol are:

    P(Xk+1 = a | Xk = a) = 0.3
    P(Xk+1 = b | Xk = a) = 0.7
    P(Xk+1 = c | Xk = a) = 0
The Markov Source
 So, if Xk+1 = b, we know that Xk+2 will equal c:

    P(Xk+2 = a | Xk+1 = b) = 0
    P(Xk+2 = b | Xk+1 = b) = 0
    P(Xk+2 = c | Xk+1 = b) = 1
The Markov Source
 If all the states can be reached, the stationary probabilities
for the states can be calculated from the given transition
probabilities. (Stationary probabilities? That's the probabilities
πi = P(Xk = ai) for any k when Xk-1, Xk-2, … are not given.)
 Markov models can be used to represent sources with dependencies
more than one step back.
– Use a state diagram with several symbols in each state.
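
A minimal sketch of how the stationary probabilities πi can be
computed for the ternary example, assuming the transition probabilities
read off the state diagram above. It iterates π ← πP until convergence;
solving the linear system πP = π, Σ πi = 1 works just as well.

# Stationary probabilities of the ternary Markov example by fixed-point iteration.
# Transition matrix P[i][j] = P(next = j | current = i); rows sum to 1.
# (Probabilities read off the state diagram above; treat them as an assumption.)
states = ["a", "b", "c"]
P = [
    [0.3, 0.7, 0.0],   # from a
    [0.0, 0.0, 1.0],   # from b
    [0.5, 0.2, 0.3],   # from c
]

pi = [1.0 / 3] * 3                     # start from the uniform distribution
for _ in range(1000):                  # iterate pi <- pi * P
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

print({s: round(x, 4) for s, x in zip(states, pi)})
# The result satisfies pi = pi * P and sums to 1.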
Analysis and Synthesis
 Stochastic models can be used for analysing
a source.
– Find a model that well represents the real-world
source, and then analyse the model instead of
the real world.
 Stochastic models can be used for
synthesizing a source.
– Use a random number generator in each step of
a Markov model to generate a sequence
simulating the source.
Show plastic slides!
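
A sketch of the synthesis idea: drive the Markov model with a random
number generator to produce a symbol sequence that simulates the
source (same assumed transition probabilities as above).

import random

# Synthesize a sequence from the ternary Markov example by running the state
# diagram with a random number generator (transition probabilities as assumed above).
transitions = {
    "a": (("a", "b", "c"), (0.3, 0.7, 0.0)),
    "b": (("a", "b", "c"), (0.0, 0.0, 1.0)),
    "c": (("a", "b", "c"), (0.5, 0.2, 0.3)),
}

def synthesize(n, start="a", seed=0):
    """Generate n symbols from the Markov model, starting in 'start'."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(n):
        symbols, probs = transitions[state]
        state = rng.choices(symbols, weights=probs, k=1)[0]
        out.append(state)
    return "".join(out)

print(synthesize(40))   # a 40-symbol sequence; for long sequences the symbol
                        # frequencies approach the stationary probabilities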
Part 3: Information and Entropy
 Assume a binary memoryless source, e.g., a flip of
a coin. How much information do we receive when
we are told that the outcome is heads?
– If it’s a fair coin, i.e., P(heads) = P (tails) = 0.5, we say
that the amount of information is 1 bit.
– If we already know that it will be (or was) heads, i.e.,
P(heads) = 1, the amount of information is zero!
– If the coin is not fair, e.g., P(heads) = 0.9, the amount of
information is more than zero but less than one bit!
– Intuitively, the amount of information received is the
same if P(heads) = 0.9 or P(heads) = 0.1.
Self Information
 So, let’s look at it the way Shannon did.
 Assume a memoryless source with
– alphabet A = (a1, …, an)
– symbol probabilities (p1, …, pn).
 How much information do we get when
finding out that the next symbol is ai?
 According to Shannon, the self information of ai is

    i(ai) = -log pi
Why?
Assume two independent events A and B, with probabilities
P(A) = pA and P(B) = pB.

For both events to happen, the probability is pA · pB.
However, the amounts of information should be added, not multiplied.

Logarithms satisfy this, since log(pA · pB) = log pA + log pB!

But we want the information to increase with decreasing
probability, so let's use the negative logarithm.
Self Information

Example 1: [worked example on the slide]

Example 2: [worked example on the slide]

Which logarithm? Pick the one you like! If you pick the natural
logarithm, you'll measure in nats; if you pick the base-10 logarithm,
you'll get Hartleys; if you pick the base-2 logarithm (like everyone
else), you'll get bits.
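
To make the choice of logarithm concrete, a small sketch that computes
the self information of the coin outcomes discussed earlier in bits,
nats and Hartleys.

import math

def self_information(p, base=2):
    """i(a) = -log_base P(a): bits for base 2, nats for e, Hartleys for 10."""
    return -math.log(p, base)

for p in (0.5, 0.9, 0.1):
    print(f"P = {p}: {self_information(p):.3f} bits, "
          f"{self_information(p, math.e):.3f} nats, "
          f"{self_information(p, 10):.3f} Hartleys")
# P = 0.5 gives exactly 1 bit; P = 0.9 gives ~0.152 bits; P = 0.1 gives ~3.32 bits.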
Self Information

On average over all the symbols, we get:

    H(X) = Σi pi · i(ai) = - Σi pi log pi

H(X) is called the first order entropy of the source.

This can be regarded as the degree of uncertainty about the
following symbol.
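
A minimal sketch of the first order entropy for an assumed example
distribution:

import math

def entropy(probs):
    """First order entropy H(X) = -sum p_i log2 p_i (terms with p_i = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distribution over a 4-symbol alphabet (an assumed example):
probs = [0.5, 0.25, 0.125, 0.125]
print(f"H(X)    = {entropy(probs):.3f} bits/symbol")            # 1.750
print(f"log2 N  = {math.log2(len(probs)):.3f} bits/symbol")     # 2.000 for N = 4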
Entropy
Example: Binary Memoryless Source

BMS → 01101000…

Let P(1) = p and P(0) = 1 - p. Then

    H(X) = -p log p - (1-p) log (1-p)

(often denoted h(p)).

[Plot: h(p) for 0 ≤ p ≤ 1, rising from 0 at p = 0 to 1 bit at
p = 0.5 and falling back to 0 at p = 1.]

The uncertainty (information) is greatest when p = 0.5.
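
A small sketch tabulating h(p) for a few values of p, confirming the
maximum of 1 bit at p = 0.5:

import math

def h(p):
    """Binary entropy in bits: h(p) = -p log2 p - (1-p) log2 (1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}: h(p) = {h(p):.3f} bits")
# The maximum, h(0.5) = 1 bit, is where the uncertainty is greatest.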
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached
when all symbols are equiprobable, i.e.,
pi = 1/N.
3. The difference log N – H is called the
redundancy of the source.
Part 4: Entropy for Memory Sources

 Assume a block of source symbols (X1, …, Xn) and define the
block entropy:

    H(X1, …, Xn) = - Σ P(x1, …, xn) log P(x1, …, xn)

That is, the summation is done over all possible combinations of
n symbols.

 The entropy for a memory source is defined as:

    lim n→∞ (1/n) H(X1, …, Xn)

That is, let the block length go towards infinity and divide by n
to get the number of bits per symbol.
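
To make the definition concrete, a sketch that enumerates all blocks
of length n for the ternary Markov example (assumed transition and
stationary probabilities as before), computes H(X1, …, Xn) exactly,
and divides by n; the values approach the Markov source entropy
derived on the next slide.

import math
from itertools import product

states = ["a", "b", "c"]
P = {"a": {"a": 0.3, "b": 0.7, "c": 0.0},
     "b": {"a": 0.0, "b": 0.0, "c": 1.0},
     "c": {"a": 0.5, "b": 0.2, "c": 0.3}}
pi = {"a": 50/169, "b": 49/169, "c": 70/169}   # stationary probabilities (assumed model)

def block_entropy(n):
    """H(X1,...,Xn) = -sum over all n-symbol blocks of P(block) log2 P(block)."""
    H = 0.0
    for block in product(states, repeat=n):
        p = pi[block[0]]
        for prev, cur in zip(block, block[1:]):
            p *= P[prev][cur]
        if p > 0:
            H -= p * math.log2(p)
    return H

for n in (1, 2, 4, 8):
    print(f"n = {n}: H(X1..Xn)/n = {block_entropy(n)/n:.4f} bits/symbol")
# The values decrease towards the Markov entropy (about 0.88 bits/symbol here).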
Entropy for a Markov Source

The entropy for a state Sk can be expressed as

    Hk = - Σl Pkl log Pkl

where Pkl is the transition probability from state k to state l.

Averaging over all states with their stationary probabilities πk,
we get the entropy for the Markov source as

    HM = Σk πk Hk
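
A sketch evaluating the two formulas for the ternary example, with the
same assumed transition and stationary probabilities:

import math

P = {"a": {"a": 0.3, "b": 0.7},
     "b": {"c": 1.0},
     "c": {"a": 0.5, "b": 0.2, "c": 0.3}}
pi = {"a": 50/169, "b": 49/169, "c": 70/169}   # stationary probabilities (assumed)

def state_entropy(k):
    """H_k = -sum_l P_kl log2 P_kl over the outgoing transitions of state k."""
    return -sum(p * math.log2(p) for p in P[k].values() if p > 0)

H_M = sum(pi[k] * state_entropy(k) for k in P)
for k in P:
    print(f"H_{k} = {state_entropy(k):.4f} bits")
print(f"H_M = {H_M:.4f} bits/symbol")   # about 0.876 bits/symbol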
The Run-length Source
 Certain sources generate long runs or bursts of equal symbols.

 Example: a two-state source (states A and B) that stays in its
current state with probability 1 - α and switches with probability α,
so a run ends with probability α at each symbol.

 Probability for a burst of length r: P(r) = (1-α)^(r-1) · α

 Entropy: HR = - Σ r=1…∞ P(r) log P(r)

 If the average run length is μ (= 1/α), then HR/μ = HM.
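
A numerical check of the last relation, assuming geometric run lengths
with parameter α: the run-length entropy HR, computed by truncating
the infinite sum, divided by the average run length 1/α equals the
per-symbol entropy HM = h(α).

import math

def h(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

alpha = 0.1                       # probability that a run ends at each symbol (assumed)
mean_run = 1 / alpha              # average run length

# H_R = -sum_{r>=1} P(r) log2 P(r), with P(r) = (1-alpha)^(r-1) * alpha,
# truncated where the tail is negligible.
H_R = 0.0
for r in range(1, 2000):
    P_r = (1 - alpha) ** (r - 1) * alpha
    H_R -= P_r * math.log2(P_r)

print(f"H_R            = {H_R:.4f} bits per run")
print(f"H_R / mean_run = {H_R / mean_run:.4f} bits per symbol")
print(f"H_M = h(alpha) = {h(alpha):.4f} bits per symbol")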
Part 5: The Source Coding Theorem

The entropy is the smallest number of bits per symbol allowing
error-free representation of the source.

Why is this? Let's take a look at typical sequences!
Typical Sequences
 Assume a long sequence from a binary memoryless source with
P(1) = p.
 Among n bits, there will be approximately w = n · p ones.
 Thus, there are M = (n choose w) such typical sequences!
 Only these sequences are interesting. All other sequences will
appear with a probability that gets smaller the larger n is.
How many are the typical sequences?

    log M = log (n choose w) ≈ n · H(X)

(by Stirling's approximation), i.e., H(X) bits/symbol.

Enumeration of the typical sequences needs log M bits, i.e.,
approximately

    (1/n) log M ≈ H(X)

bits per symbol!
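
A quick numerical check of the counting argument, with p = 0.1 as an
assumed example value: for w = n·p, the number of typical sequences
M = (n choose w) satisfies (1/n) log2 M ≈ H(X), and the approximation
improves as n grows.

import math

def h(p):
    """Entropy of a binary memoryless source with P(1) = p, in bits/symbol."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1                                    # assumed P(1) of the binary source
for n in (100, 1000, 10000):
    w = round(n * p)                       # expected number of ones
    M = math.comb(n, w)                    # number of typical sequences
    bits_per_symbol = math.log2(M) / n     # enumeration cost per symbol
    print(f"n = {n:5d}: (1/n) log2 M = {bits_per_symbol:.4f}, H(X) = {h(p):.4f}")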
How many bits do we need?

Thus, we need H(X) bits per symbol


to code any typical sequence!
The Source Coding Theorem
 Does tell us
– that we can represent the output from a source
X using H(X) bits/symbol.
– that we cannot do better.
 Does not tell us
– how to do it.
Summary
 The mathematical model of communication.
– Source, source coder, channel coder, channel,…
– Rate, entropy, channel capacity.
 Information theoretical entities
– Information, self-information, uncertainty, entropy.
 Sources
– BMS, Markov, RL
 The Source Coding Theorem
