Lecture 2:
Basic Information Theory
Jörgen Ahlberg
Div. of Sensor Technology
Swedish Defence Research Agency (FOI)
Today
1. What is information theory about?
2. Stochastic (information) sources.
3. Information and entropy.
4. Entropy for stochastic sources.
5. The source coding theorem.
Part 1: Information Theory
Claude Shannon:
"A Mathematical Theory of Communication",
Bell System Technical Journal, 1948.
The Fundamental Theorems
[Block diagram: Source → Source coder → Channel coder → Channel; H is the source entropy, R the source coder rate, and C the channel capacity.]
Shannon 1:
– Error-free transmission is possible if R ≥ H (the source coding theorem, simplified) and C ≥ R (the channel coding theorem, simplified).
Shannon 2:
– Source coding and channel coding can be optimized independently, and binary symbols can be used as the intermediate format.
– Assumption: arbitrarily long delays.
Part 2: Stochastic sources
A source outputs symbols X1, X2, ...
Each symbol takes its value from an alphabet A = (a1, a2, …).
Model: P(X1, …, XN) assumed to be known for all combinations.
Example 1: A text is a sequence of symbols, each taking its value from the alphabet A = (a, …, z, A, …, Z, 1, 2, …, 9, !, ?, …).
Example 2: A (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, …, 255).
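As a small illustration (not from the slides), a Python sketch of a memoryless model for a text source: the symbol probabilities are estimated as relative frequencies, and the example string is made up.

from collections import Counter

# Toy memoryless model of a text source: P(symbol) estimated as the
# relative frequency of the symbol in a sample text (the text is made up).
text = "in the beginning was the word"
counts = Counter(text)
probs = {sym: n / len(text) for sym, n in counts.items()}
print(probs[' '], probs['e'])   # estimated probabilities for space and 'e'
# Under the memoryless assumption, P(X1, ..., XN) is simply the product
# of the individual symbol probabilities.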
The Markov Source
[State diagram: three states a, b, c with transition probabilities on the arcs.]
P(Xk+1 = a | Xk = a) = 0.3
P(Xk+1 = b | Xk = a) = 0.7
P(Xk+1 = c | Xk = a) = 0
The Markov Source
So, if Xk+1 = b, we know that Xk+2 will equal c:
[Same state diagram; the arc out of b leads to c with probability 1.]
P(Xk+2 = a | Xk+1 = b) = 0
P(Xk+2 = b | Xk+1 = b) = 0
P(Xk+2 = c | Xk+1 = b) = 1
The Markov Source
If all the states can be reached, the stationary probabilities for the states can be calculated from the given transition probabilities.
Stationary probabilities? Those are the probabilities πi = P(Xk = ai) for any k when Xk-1, Xk-2, … are not given.
Markov models can be used to represent sources with dependencies more than one step back.
– Use a state diagram with several symbols in each state.
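A Python sketch (not from the slides) of how the stationary probabilities can be computed numerically from a transition matrix, using the three-state chain above; the rows for states a and b follow the probabilities given earlier, while the row for state c is an assumed completion using the remaining arc weights 0.5, 0.2 and 0.3.

import numpy as np

# Transition matrix P[i, j] = P(X_{k+1} = state j | X_k = state i)
# for the states (a, b, c). The rows for a and b match the probabilities
# above; the row for c is an assumption for the sake of the example.
P = np.array([[0.3, 0.7, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.2, 0.3]])

# Stationary probabilities solve pi @ P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)   # approximately [0.296, 0.290, 0.414] for this assumed chain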
Analysis and Synthesis
Stochastic models can be used for analysing
a source.
– Find a model that represents the real-world source well, and then analyse the model instead of the real world.
Stochastic models can be used for
synthesizing a source.
– Use a random number generator in each step of
a Markov model to generate a sequence
simulating the source.
Show plastic slides!
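A Python sketch of the synthesis idea: in each step a random number generator draws the next state according to the transition probabilities of the current state, so the generated sequence simulates the source. The transition matrix is the same partly assumed one as above.

import numpy as np

rng = np.random.default_rng(0)
states = ['a', 'b', 'c']
P = np.array([[0.3, 0.7, 0.0],      # transitions out of a (from the diagram)
              [0.0, 0.0, 1.0],      # transitions out of b (from the diagram)
              [0.5, 0.2, 0.3]])     # transitions out of c (assumed)

def synthesize(n_symbols, start=0):
    # Draw each new state from the transition probabilities of the current one.
    seq, state = [], start
    for _ in range(n_symbols):
        seq.append(states[state])
        state = rng.choice(len(states), p=P[state])
    return ''.join(seq)

print(synthesize(20))   # e.g. something like 'abcabcaabc...'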
Part 3: Information and Entropy
Assume a binary memoryless source, e.g., a flip of
a coin. How much information do we receive when
we are told that the outcome is heads?
– If it’s a fair coin, i.e., P(heads) = P(tails) = 0.5, we say
that the amount of information is 1 bit.
– If we already know that it will be (or was) heads, i.e.,
P(heads) = 1, the amount of information is zero!
– If the coin is not fair, e.g., P(heads) = 0.9, the amount of
information is more than zero but less than one bit!
– Intuitively, the amount of information received is the
same if P(heads) = 0.9 or P(heads) = 0.1.
Self Information
So, let’s look at it the way Shannon did.
Assume a memoryless source with
– alphabet A = (a1, …, an)
– symbol probabilities (p1, …, pn).
How much information do we get when
finding out that the next symbol is ai?
According to Shannon, the self information of ai is
i(ai) = - log pi = log(1/pi).
Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB.
The probability that both occur is pA · pB, and the total information should be the sum of the two individual amounts: i(A and B) = i(A) + i(B).
The logarithm is what turns the product into a sum: - log(pA · pB) = - log pA - log pB.
Which logarithm? Pick the one you like! If you pick the natural log, you’ll measure in nats; if you pick the 10-log, you’ll get Hartleys; if you pick the 2-log (like everyone else), you’ll get bits.
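A small Python sketch (not from the slides) of self information for a few probabilities, also showing how the choice of logarithm only changes the unit:

import math

def self_information(p, base=2):
    # Self information -log(p); base 2 gives bits, math.e nats, 10 Hartleys.
    return -math.log(p, base)

print(self_information(0.5))        # outcome with probability 0.5: 1.0 bit
print(self_information(0.9))        # outcome with probability 0.9: about 0.152 bits
print(self_information(0.1))        # outcome with probability 0.1: about 3.32 bits
print(self_information(0.5, 10))    # the same 0.5 outcome measured in Hartleys: about 0.301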
Self Information
Let the symbols have probabilities (p1, …, pn). Then the average self information per symbol, often denoted H(X) and called the entropy, is
H(X) = Σ pi · i(ai) = - Σ pi log pi.
The uncertainty (information) is greatest when the symbols are equally probable.
[Plot: the binary entropy function for a symbol probability p between 0 and 1; the maximum, 1 bit, is reached at p = 0.5.]
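A Python sketch of the entropy as the average self information, evaluated for a few example distributions:

import numpy as np

def entropy(probs):
    # Entropy H = -sum(p_i log2 p_i) in bits/symbol; zero-probability symbols are skipped.
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.5, 0.5]))    # 1.0 bit, the maximum for two symbols
print(entropy([0.9, 0.1]))    # about 0.469 bits, the same as for [0.1, 0.9]
print(entropy([0.25] * 4))    # 2.0 bits = log2(4), equiprobable symbols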
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached
when all symbols are equiprobable, i.e.,
pi = 1/N.
3. The difference log N – H is called the
redundancy of the source.
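A quick worked example of properties 1-3 in Python, with assumed symbol probabilities:

import math

p = [0.9, 0.1]                          # assumed probabilities, N = 2 symbols
H = -sum(pi * math.log2(pi) for pi in p)
redundancy = math.log2(len(p)) - H      # log N - H
print(H, redundancy)                    # about 0.469 bits and 0.531 bits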
Part 4: Entropy for Memory Sources
Consider a block of n symbols (X1, …, Xn) from the source and define the block entropy
H(X1, …, Xn) = - Σ P(x1, …, xn) log P(x1, …, xn),
that is, the summation is done over all possible combinations of n symbols.
The entropy of the source is then defined as the per-symbol limit
H = lim (1/n) · H(X1, …, Xn) as n → ∞, measured in bits/symbol.
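As an illustration (not from the slides): for a stationary Markov source, the per-symbol limit above reduces to the stationary average of the per-state conditional entropies, i.e., the sum over i of πi · H(Xk+1 | Xk = ai). A Python sketch for the partly assumed three-state chain used earlier:

import numpy as np

P = np.array([[0.3, 0.7, 0.0],      # same (partly assumed) transition matrix as before
              [0.0, 0.0, 1.0],
              [0.5, 0.2, 0.3]])

# Stationary distribution: solve pi @ P = pi with sum(pi) = 1.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi, *_ = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)

def row_entropy(row):
    # Conditional entropy of the next symbol given the current state.
    row = row[row > 0]
    return float(-np.sum(row * np.log2(row)))

H_rate = sum(pi[i] * row_entropy(P[i]) for i in range(3))
print(H_rate)   # about 0.88 bits/symbol for this assumed chain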