
TSBK01 Image Coding and Data Compression

Lecture 2: Basic Information Theory

Jörgen Ahlberg, Div. of Sensor Technology, Swedish Defence Research Agency (FOI)

Today
1. What is information theory about?
2. Stochastic (information) sources.
3. Information and entropy.
4. Entropy for stochastic sources.
5. The source coding theorem.

Part 1: Information Theory


Claude Shannon: A Mathematical Theory of Communication
Bell System Technical Journal, 1948

Sometimes referred to as Shannon-Weaver, since the standalone publication has a foreword by Weaver. Be careful!

Quotes about Shannon


What is information? Sidestepping questions about meaning, Shannon showed that it is a measurable commodity. Today, Shannon's insight helps shape virtually all systems that store, process, or transmit information in digital form, from compact discs to computers, from facsimile machines to deep space probes. Information theory has also infiltrated fields outside communications, including linguistics, psychology, economics, biology, even the arts.

[Block diagram: Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink/receiver.]

The source is any source of information. The source coder changes the data to an efficient representation, i.e., data compression. The channel coder changes the data to an efficient representation for transmission, i.e., error control coding. The channel is anything transmitting or storing information: a radio link, a cable, a disk, a CD, a piece of paper, ... The channel decoder recovers from channel distortion, and the source decoder uncompresses.

Fundamental Entities
[Same block diagram as above: Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink/receiver.]

H: The information content of the source.
R: Rate from the source coder.
C: Channel capacity.

Fundamental Theorems

Shannon 1: Error-free transmission is possible if R ≥ H and C ≥ R.


Shannon 2: Source coding and channel coding can be optimized independently, and binary symbols can be used as intermediate format. Assumption: arbitrarily long delays. (This is the source coding theorem and the channel coding theorem, in simplified form.)

Part 2: Stochastic Sources


A source outputs symbols X1, X2, ... Each symbol takes its value from an alphabet A = (a1, a2, ...). The model P(X1, ..., XN) is assumed to be known for all combinations.

Example 1: A text is a sequence of symbols, each taking its value from the alphabet A = (a, ..., z, A, ..., Z, 1, 2, ..., 9, !, ?, ...).

Example 2: A (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, ..., 255).

Two Special Cases


1. The Memoryless Source
Each symbol is independent of the previous ones:
P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn)

2. The Markov Source
Each symbol depends on the previous one:
P(X1, X2, ..., Xn) = P(X1) P(X2|X1) P(X3|X2) ... P(Xn|Xn-1)
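A minimal Python sketch of the two factorizations. The alphabet, probabilities, and test sequence are made-up illustration values; the Markov transition table loosely follows the ternary example on the following slides, with the transitions out of state c assumed.

# Sketch: probability of a symbol sequence under the two source models.
# Alphabet, probabilities and the test sequence are made-up illustration values.

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}              # memoryless symbol probabilities

def prob_memoryless(seq):
    # P(x1,...,xn) = P(x1) P(x2) ... P(xn)
    prob = 1.0
    for x in seq:
        prob *= p[x]
    return prob

p1 = {'a': 0.5, 'b': 0.3, 'c': 0.2}             # distribution of the first symbol
p_cond = {                                       # p_cond[prev][x] = P(x | prev)
    'a': {'a': 0.3, 'b': 0.7, 'c': 0.0},         # as on the state diagram
    'b': {'a': 0.0, 'b': 0.0, 'c': 1.0},
    'c': {'a': 0.5, 'b': 0.2, 'c': 0.3},         # assumed split for state c
}

def prob_markov(seq):
    # P(x1,...,xn) = P(x1) P(x2|x1) ... P(xn|x_{n-1})
    prob = p1[seq[0]]
    for prev, x in zip(seq, seq[1:]):
        prob *= p_cond[prev][x]
    return prob

print(prob_memoryless("abcab"), prob_markov("abcab"))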

The Markov Source


A symbol depends only on the previous symbol, so the source can be modelled by a state diagram.
[State diagram with states a, b, c and transition probabilities 0.3, 0.7, 1.0, 0.5, 0.2, 0.3 on the arcs.]

A ternary source with alphabet A = (a, b, c).

The Markov Source


Assume we are in state a, i.e., Xk = a. The probabilities for the next symbol are:
P(Xk+1 = a | Xk = a) = 0.3
P(Xk+1 = b | Xk = a) = 0.7
P(Xk+1 = c | Xk = a) = 0

The Markov Source


So, if Xk+1 = b, we know that Xk+2 will equal c.
P(Xk+2 = a | Xk+1 = b) = 0
P(Xk+2 = b | Xk+1 = b) = 0
P(Xk+2 = c | Xk+1 = b) = 1

The Markov Source


If all the states can be reached, the stationary probabilities for the states can be calculated from the given transition probabilities. Stationary probabilities? Those are the probabilities πi = P(Xk = ai) for any k, when Xk-1, Xk-2, ... are not given.

Markov models can also be used to represent sources with dependencies more than one step back: use a state diagram with several symbols in each state.
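A small numerical sketch of this calculation, assuming the ternary example's transition probabilities (the row for state c is an assumption, since only the transitions out of a and b are spelled out on the slides): the stationary probabilities solve π P = π together with Σ πi = 1.

import numpy as np

# Transition matrix P: rows = current state, columns = next state (order a, b, c).
# Rows for a and b follow the slides; the row for c is an assumed 0.5/0.2/0.3 split.
P = np.array([[0.3, 0.7, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.2, 0.3]])

# Solve pi P = pi subject to sum(pi) = 1, i.e. a small linear system.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)          # stationary probabilities for states a, b, c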

Analysis and Synthesis


Stochastic models can be used for analysing a source.
Find a model that well represents the real-world source, and then analyse the model instead of the real world.

Stochastic models can be used for synthesizing a source.


Use a random number generator in each step of a Markov model to generate a sequence simulating the source.
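A possible sketch of such a synthesis for the ternary Markov example (the transitions out of state c are again an assumption):

import random

# Each entry lists (next symbol, transition probability) for a state.
transitions = {
    'a': [('a', 0.3), ('b', 0.7)],
    'b': [('c', 1.0)],
    'c': [('a', 0.5), ('b', 0.2), ('c', 0.3)],   # assumed split for state c
}

def synthesize(n, state='a'):
    # Walk the state diagram, drawing each transition with a random number generator.
    out = []
    for _ in range(n):
        symbols, probs = zip(*transitions[state])
        state = random.choices(symbols, weights=probs)[0]
        out.append(state)
    return ''.join(out)

print(synthesize(40))    # prints a random symbol sequence simulating the source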

Show plastic slides!

Part 3: Information and Entropy


Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.

Self Information
So, let's look at it the way Shannon did. Assume a memoryless source with
alphabet A = (a1, ..., an) and symbol probabilities (p1, ..., pn).

How much information do we get when finding out that the next symbol is ai? According to Shannon, the self-information of ai is
i(ai) = - log pi

Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB. For both the events to happen, the probability is pA pB. However, the amount of information should be added, not multiplied.

Logarithms satisfy this! Now, we want the information to increase with decreasing probabilities, so let's use the negative logarithm.
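A quick numeric check of this reasoning (the two probabilities are arbitrary example values):

from math import log2

def self_information(p):
    # Self-information in bits: i = -log2(p).
    return -log2(p)

pA, pB = 0.5, 0.125                       # two independent events (arbitrary values)
joint = self_information(pA * pB)         # information in "both A and B happened"
separate = self_information(pA) + self_information(pB)
print(joint, separate)                    # both equal 4.0 bits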

Self Information
Example 1:

Example 2:

Which logarithm? Pick the one you like! If you pick the natural log, you'll measure in nats; if you pick the 10-log, you'll get Hartleys; if you pick the 2-log (like everyone else), you'll get bits.

Self Information
On average over all the symbols, we get:
H(X) = Σi pi i(ai) = - Σi pi log pi

H(X) is called the first-order entropy of the source.

This can be regarded as the degree of uncertainty about the following symbol.
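A minimal sketch of the formula, with made-up symbol probabilities:

from math import log2

def entropy(probs):
    # First-order entropy H(X) = -sum(p * log2(p)), in bits per symbol.
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits/symbol
print(entropy([0.25] * 4))          # 2.0 bits/symbol = log2(4), the uniform case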

Entropy
Example: Binary Memoryless Source

A BMS outputs a sequence of bits, e.g., 01101000 ...

Let p = P(X = 1) = 1 - P(X = 0). Then the entropy is
h(p) = - p log p - (1 - p) log(1 - p)

[Plot of h(p) against p: the curve rises from 0 at p = 0 to its maximum of 1 bit at p = 0.5 and falls back to 0 at p = 1. The uncertainty (information) is greatest when p = 0.5.]
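A small sketch evaluating the binary entropy function at a few points; it confirms that the uncertainty peaks at p = 0.5 (1 bit) and drops towards zero as p approaches 0 or 1.

from math import log2

def h(p):
    # Binary entropy function, in bits.
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(p, round(h(p), 3))
# 0.5 gives 1.000 bit; 0.1 and 0.9 both give about 0.469 bits.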

Entropy: Three properties


1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.
3. The difference log N - H is called the redundancy of the source.

Part 4: Entropy for Memory Sources


Assume a block of source symbols (X1, ..., Xn) and define the block entropy:
H(X1, ..., Xn) = - Σ P(x1, ..., xn) log P(x1, ..., xn)

That is, the summation is done over all possible combinations of n symbols.

The entropy for a memory source is defined as:
H = lim n→∞ (1/n) H(X1, ..., Xn)

That is, let the block length go towards infinity, and divide by n to get the number of bits per symbol.
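The limit can be illustrated numerically. The sketch below enumerates all length-n blocks of the ternary Markov example (same assumed transition matrix as before, with the row for state c a guess) and shows H(X1, ..., Xn)/n shrinking towards the entropy rate as n grows.

from itertools import product
from math import log2
import numpy as np

# Assumed ternary Markov example (state order a, b, c); the row for c is a guess.
P = np.array([[0.3, 0.7, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.2, 0.3]])
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0, 0, 0, 1.0]), rcond=None)[0]   # stationary probs

def block_entropy(n):
    # H(X1,...,Xn) = -sum over all length-n sequences of P(seq) log2 P(seq).
    H = 0.0
    for seq in product(range(3), repeat=n):
        prob = pi[seq[0]]
        for prev, nxt in zip(seq, seq[1:]):
            prob *= P[prev, nxt]
        if prob > 0:
            H -= prob * log2(prob)
    return H

for n in (1, 2, 4, 8):
    print(n, block_entropy(n) / n)     # decreases towards the entropy rate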

Entropy for a Markov Source


The entropy for a state Sk can be expressed as
H(Sk) = - Σl Pkl log Pkl

where Pkl is the transition probability from state k to state l.

Averaging over all states, we get the entropy for the Markov source as
HM = Σk πk H(Sk)
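A sketch of these two formulas for the ternary example (same assumed transition matrix as before; the stationary probabilities are computed as on the earlier slide):

from math import log2
import numpy as np

P = np.array([[0.3, 0.7, 0.0],          # assumed ternary example, state order a, b, c
              [0.0, 0.0, 1.0],
              [0.5, 0.2, 0.3]])
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0, 0, 0, 1.0]), rcond=None)[0]   # stationary probs

def state_entropy(k):
    # H(S_k) = -sum_l P_kl log2 P_kl
    return -sum(p * log2(p) for p in P[k] if p > 0)

H_M = sum(pi[k] * state_entropy(k) for k in range(3))   # H_M = sum_k pi_k H(S_k)
print([round(state_entropy(k), 3) for k in range(3)], round(H_M, 3))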

The Run-length Source


Certain sources generate long runs or bursts of equal symbols.

[Example: a two-state Markov model where each state returns to itself with probability 1 - q, i.e., the current symbol is repeated with probability 1 - q and a new run starts with probability q.]

Probability for a burst of length r: P(r) = q (1 - q)^(r-1)
Entropy of the run lengths: HR = - Σ r=1..∞ P(r) log P(r)
If the average run length is r̄, then HR / r̄ = HM.
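A numeric sketch of the last relation, with an assumed switching probability q = 0.1 (q is just a placeholder name for the probability that a run ends):

from math import log2

q = 0.1                                   # assumed probability of starting a new run

def P(r):
    # Geometric run-length distribution: P(r) = q * (1 - q)**(r - 1)
    return q * (1 - q) ** (r - 1)

# Run-length entropy HR (the infinite sum is truncated where terms are negligible).
HR = -sum(P(r) * log2(P(r)) for r in range(1, 2000))
mean_run = sum(r * P(r) for r in range(1, 2000))          # equals 1/q
HM = -q * log2(q) - (1 - q) * log2(1 - q)                 # entropy of the Markov source

print(HR / mean_run, HM)                                  # both about 0.469 bits/symbol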

Part 5: The Source Coding Theorem

The entropy is the smallest number of bits allowing error-free representation of the source.

Why is this? Let's take a look at typical sequences!

Typical Sequences
Assume a long sequence from a binary memoryless source with P(1) = p. Among n bits, there will be approximately w = n p ones. Thus, there are about M = (n choose w) such typical sequences. Only these sequences are interesting; all other sequences appear with smaller and smaller probability as n grows.

How many typical sequences are there?
M = (n choose w) ≈ 2^(n h(p))   (by Stirling's approximation)

How many bits do we need?
Enumeration of the typical sequences needs log2 M bits, i.e.,
(1/n) log2 M ≈ h(p) bits per symbol.

Thus, we need H(X) bits per symbol to code any typical sequence!
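A sketch checking the counting argument: log2 of the number of typical sequences, divided by n, approaches the entropy h(p) as n grows (p = 0.1 is an arbitrary example value).

from math import comb, log2

p = 0.1                                    # example P(1) for the binary memoryless source
h = -p * log2(p) - (1 - p) * log2(1 - p)   # first-order entropy, about 0.469 bits

for n in (10, 100, 1000, 2000):
    w = round(n * p)                       # approximate number of ones in a typical sequence
    M = comb(n, w)                         # number of typical sequences
    print(n, log2(M) / n)                  # bits per symbol needed to enumerate them
print("H(X) =", h)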

The Source Coding Theorem


Does tell us
- that we can represent the output from a source X using H(X) bits/symbol,
- that we cannot do better.

Does not tell us
- how to do it.

Summary
The mathematical model of communication: source, source coder, channel coder, channel, ...
Information-theoretical entities: rate, entropy, channel capacity, information, self-information, uncertainty.
Sources: BMS, Markov, run-length (RL).
The Source Coding Theorem.
