INFORMATION THEORY
Historical Notes
• Claude E. Shannon (1916–2001), in his 1948 paper, established almost everything regarding source coding and channel coding.
• His focus was on the communication aspects of information.
• He was the first to use the term “bit.”
Motivation: What is information?
• By definition, information is knowledge about something, which may or may not be perceived by an observer.
• Examples: music, a story, the news, etc.
• Information has meaning.
• Information propagates.
• Information can be corrupted…
Information Theory tells us…
• Information ≠ knowledge
• Information: reduction in uncertainty
• Example:
1) Flip a Coin
2) Roll a Die
• For n independent repetitions of a source with distribution P: H(P^n) = n H(P)
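As a quick illustration (a hypothetical Python sketch, not part of the original slides), the entropy of the coin and die examples above, and a check that n independent coin flips carry n·H(P) bits:

import math

def entropy(probs):
    # Shannon entropy in bits: H(P) = -sum p log2 p (terms with p = 0 contribute 0)
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # fair coin: 1.0 bit per flip
print(entropy([1/6] * 6))         # fair die: log2(6) ~ 2.585 bits per roll

# n independent flips give 2**n equally likely outcomes, so H(P^n) = n * H(P)
n = 3
print(entropy([0.5 ** n] * 2 ** n), n * entropy([0.5, 0.5]))   # both print 3.0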
• TAMIL:
• 248 characters (247+1)
• Assuming independence between successive characters:
– Uniform character distribution: log2(248) ≈ 7.95 bits/character
• Entropy of English is much lower!
Source Coding
Source coding means an efficient representation of the data generated by a discrete source
Performed by the source encoder
Statistics of the source must be known (e.g. symbol probabilities, so that coding priorities can be assigned)
Morse code: E → "." , Q → "- - . -"
Functional Requirements:
Codewords are in binary form
Code is uniquely decodable
Source Coding
Average codeword length: L = Σ (k = 0 to K−1) pk lk
Kraft Inequality
Example
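A minimal sketch (hypothetical helpers, not from the slides) checking the Kraft inequality for binary codeword lengths — Σ 2^(−lk) ≤ 1 must hold for any uniquely decodable code — and evaluating the average codeword length L defined above:

def kraft_sum(lengths):
    # Kraft inequality for a binary code: sum over k of 2**(-l_k) must be <= 1
    return sum(2.0 ** -l for l in lengths)

def average_length(probs, lengths):
    # L = sum over k of p_k * l_k (bits per symbol)
    return sum(p * l for p, l in zip(probs, lengths))

# Hypothetical code with lengths {1, 2, 3, 3}: Kraft sum is exactly 1,
# so a prefix code with these lengths exists (e.g. 0, 10, 110, 111)
lengths = [1, 2, 3, 3]
probs = [0.5, 0.25, 0.125, 0.125]
print(kraft_sum(lengths))               # 1.0
print(average_length(probs, lengths))   # 1.75 bits/symbol (equals the entropy here)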
Huffman Coding
Example
Huffman coding is not unique
Example 2
Alternate Table
100% Efficient
Example
Arithmetic Coding
Limitations of Huffman Code
• Huffman coding achieves the entropy bound (100% efficiency) only if all symbol probabilities are of the form 2^(-n).
Σ (k = 0 to K−1) p(yk | xj) = 1, for all j
Discrete Memoryless Channels
p(xj), j = 0, …, J−1 : a priori probabilities of the input symbols
p(yk | xj) : transition probabilities of the channel
Discrete Memoryless Channels
• Example: Binary symmetric channel with transition probability matrix
[ 1−p    p  ]
[  p    1−p ]
Conditional Entropy
H(X) = Σ (j = 0 to J−1) Σ (k = 0 to K−1) p(xj, yk) log2 [ 1 / p(xj) ]
Conditional Entropy
• Source alphabet X, source entropy H(X)
• Conditional entropy H(X | Y): the uncertainty remaining about X after the observation of Y
H(X | Y) = Σ (k = 0 to K−1) Σ (j = 0 to J−1) p(xj, yk) log2 [ 1 / p(xj | yk) ]
Mutual Information
I(X ; Y) = H(X) − H(X | Y)
         = Σ (k = 0 to K−1) Σ (j = 0 to J−1) p(xj, yk) log2 [ 1 / p(xj) ] − Σ (k = 0 to K−1) Σ (j = 0 to J−1) p(xj, yk) log2 [ 1 / p(xj | yk) ]
I(X ; Y) = Σ (k = 0 to K−1) Σ (j = 0 to J−1) p(xj, yk) log2 [ p(xj | yk) / p(xj) ]
[Figure: relationship between H(X), H(Y), and I(X ; Y)]
Σ (j = 0 to J−1) p(xj) = 1
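A minimal sketch (hypothetical Python helpers, not from the slides) that computes H(X), H(X | Y) and I(X ; Y) = H(X) − H(X | Y) directly from a joint distribution p(xj, yk), here applied to a binary symmetric channel with crossover probability 0.1 and equiprobable inputs:

import math

def entropy_x(joint):
    # joint[j][k] = p(x_j, y_k);  H(X) = -sum_j p(x_j) log2 p(x_j)
    px = [sum(row) for row in joint]
    return -sum(p * math.log2(p) for p in px if p > 0)

def cond_entropy_x_given_y(joint):
    # H(X | Y) = sum_{j,k} p(x_j, y_k) log2 [ 1 / p(x_j | y_k) ]
    py = [sum(joint[j][k] for j in range(len(joint))) for k in range(len(joint[0]))]
    h = 0.0
    for j, row in enumerate(joint):
        for k, pjk in enumerate(row):
            if pjk > 0:
                h += pjk * math.log2(py[k] / pjk)   # 1/p(x|y) = p(y) / p(x, y)
    return h

def mutual_information(joint):
    # I(X ; Y) = H(X) - H(X | Y)
    return entropy_x(joint) - cond_entropy_x_given_y(joint)

p = 0.1                                   # assumed crossover probability
joint = [[0.5 * (1 - p), 0.5 * p],        # rows: x = 0, 1;  columns: y = 0, 1
         [0.5 * p, 0.5 * (1 - p)]]
print(mutual_information(joint))          # 1 - H(0.1) ~ 0.531 bits per channel use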
Discrete Memoryless Channels
Example: Binary symmetric channel
C = 1 + p log2 p + (1 − p) log2(1 − p) = 1 − H(p)
• H(p) is maximum for p = 0.5 (where C = 0)
• C is maximum when p = 0 (noise-free channel)
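A brief numerical sketch (hypothetical, not from the slides) of the BSC capacity C = 1 − H(p):

import math

def binary_entropy(p):
    # H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    # Capacity of the binary symmetric channel: C = 1 - H(p) bits per channel use
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.1, 0.25, 0.5):
    print(p, bsc_capacity(p))   # 1.0 at p = 0 (noise-free), 0.0 at p = 0.5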
Binary Erasure Channel (BEC)
Symmetric
Example
Properties
Channel Coding
ECC
Rate (optimum)
Noisy Channel Coding Theorem, or simply the Channel Coding Theorem
Example
Average probability of error for repetition codes
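A short sketch (hypothetical helper, not from the slides) of the average error probability of an n-fold repetition code over a BSC with crossover probability p under majority-vote decoding, Pe = Σ (i > n/2) C(n, i) p^i (1 − p)^(n − i):

from math import comb

def repetition_error_prob(n, p):
    # Majority-vote decoding fails when more than half of the n transmitted
    # copies are flipped by the BSC (n is assumed odd, so ties cannot occur)
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(n // 2 + 1, n + 1))

p = 0.01                                   # assumed crossover probability
for n in (1, 3, 5, 7):
    print(n, repetition_error_prob(n, p))  # error falls fast, but the rate 1/n falls too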
Information Capacity Theorem
Capacity
Example
Shannon’s Information Capacity Theorem
The information capacity of a continuous channel of bandwidth B Hertz, perturbed by additive white Gaussian noise of power spectral density N0/2 and limited in bandwidth to B, is given by:
C = B log2( 1 + P / (N0 B) )
Setting the bit rate Rb equal to C, so that the signal power is P = Eb C:
C = B log2( 1 + Eb C / (N0 B) )
C/B = log2 [ 1 + (Eb/N0) (C/B) ]
Shannon limit
Eb/N0 = [ 2^(C/B) − 1 ] / (C/B)
As B → ∞ (i.e. C/B → 0): Eb/N0 → ln 2 = 0.693 ≈ −1.6 dB
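A small numerical sketch (hypothetical values, not from the slides) of the capacity formula and the Shannon limit, showing that the required Eb/N0 approaches ln 2 ≈ −1.59 dB as C/B → 0:

import math

def capacity(B, P, N0):
    # C = B log2(1 + P / (N0 * B))  [bits/s]
    return B * math.log2(1 + P / (N0 * B))

def required_ebno_db(r):
    # r = C/B in bits/s/Hz;  Eb/N0 = (2**r - 1) / r, expressed in dB
    return 10 * math.log10((2 ** r - 1) / r)

B, N0 = 3000.0, 1e-9          # hypothetical bandwidth [Hz] and noise density [W/Hz]
P = 1000 * N0 * B             # signal power chosen so that SNR = 30 dB
print(capacity(B, P, N0))     # ~ 29.9 kbit/s

for r in (2.0, 1.0, 0.1, 0.001):
    print(r, required_ebno_db(r))   # tends to 10*log10(ln 2) ~ -1.59 dB as r -> 0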
[Figure: bandwidth efficiency Rb/B [bits/s/Hz] versus Eb/N0 [dB]; the Shannon limit at −1.6 dB separates the unattainable region from the practical region.]
Bandwidth efficiency plane
[Figure: bandwidth efficiency plane, Rb/B [bits/s/Hz] versus Eb/N0 [dB]; the R = C curve bounds the unattainable region (R > C); operating points for M = 2, 4, 8, 16, 64, 256 are marked, with larger M toward the bandwidth-limited region.]
• Encoding algorithm
– Order the symbols by decreasing probability
– Starting from the bottom, assign 0 to the least probable symbol and 1 to the next least probable
– Combine the two least probable symbols into one composite symbol
– Reorder the list with the composite symbol
– Repeat Step 2 until only two symbols remain in the list
• Huffman tree
– Nodes: symbols or composite symbols; the root is the final composite symbol, the leaves are the original symbols
– Branches: from each node, 0 defines one branch while 1 defines the other
• Decoding algorithm (see the sketch after the example below)
– Start at the root and follow the branches according to the bits received
– When a leaf is reached, a symbol has just been decoded
Huffman Coding Example
Symbols and probabilities: A 0.35, B 0.17, C 0.17, D 0.16, E 0.15
Merging steps:
D + E → DE (0.31)        list: A 0.35, DE 0.31, B 0.17, C 0.17
B + C → BC (0.34)        list: A 0.35, BC 0.34, DE 0.31
BC + DE → BCDE (0.65)    list: BCDE 0.65, A 0.35
Huffman Tree
Root → 1: BCDE (0.65), 0: A (0.35); BCDE → 1: BC, 0: DE; leaves: B, C, D, E
Huffman Codes
A  0
B  111
C  110
D  101
E  100
Average code-word length =
0.35 x 1 + 0.65 x 3 = 2.30 bits per symbol
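A compact sketch (hypothetical implementation, not from the slides) of the encoding and decoding procedure described above, applied to the five-symbol example; ties may be broken differently than in the slides, but the average length is still 2.30 bits/symbol:

import heapq
from itertools import count

def huffman_code(probs):
    # Build a binary Huffman code; probs maps symbol -> probability.
    # Repeatedly merge the two least probable entries, prefixing their
    # codewords with 0 (least probable) and 1 (next least probable).
    tiebreak = count()                      # keeps heap comparisons well defined
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

def huffman_decode(bits, code):
    # Walk the received bits; emit a symbol whenever a complete codeword is seen
    inverse = {w: s for s, w in code.items()}
    decoded, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    return "".join(decoded)

probs = {"A": 0.35, "B": 0.17, "C": 0.17, "D": 0.16, "E": 0.15}
code = huffman_code(probs)
print(code)                                          # e.g. A -> 0, D -> 101, E -> 100
print(sum(probs[s] * len(code[s]) for s in probs))   # 2.30 bits per symbol
print(huffman_decode("0101100", code))               # "ADE"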
Huffman coding
• In Huffman coding, each symbol of a given alphabet is assigned a sequence of bits according to the symbol's probability
L = 2.2 bits/symbol
H(P) = 2.12193 bits/symbol
Huffman coding
• Huffman coding is not unique: the average length L remains unchanged, but the variance of the codeword lengths differs
σ² = Σ (k = 0 to K−1) pk (lk − L)²
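A small sketch (hypothetical source, not from the slides) illustrating this: for a five-symbol source with probabilities {0.4, 0.2, 0.2, 0.1, 0.1}, whose entropy matches the 2.12193 bits/symbol quoted above, two valid Huffman length assignments give the same L = 2.2 but different variances:

def avg_and_var(probs, lengths):
    # L = sum p_k l_k ;  sigma^2 = sum p_k (l_k - L)^2
    L = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))
    return L, var

probs = [0.4, 0.2, 0.2, 0.1, 0.1]            # hypothetical five-symbol source
print(avg_and_var(probs, [2, 2, 2, 3, 3]))   # ~ (2.2, 0.16)  low-variance assignment
print(avg_and_var(probs, [1, 2, 3, 4, 4]))   # ~ (2.2, 1.36)  high-variance assignment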
original image
decompose
Sequence of 8 by 8 blocks; different planes (RGB, YUV, etc.) are treated separately
transform
Transformed blocks reduce redundancy and concentrate the signal energy into a few coefficients; discrete cosine transformation (DCT)
quantise
Blocks with discarded information; the goal is to smooth the picture and discard information that will not be missed, e.g. high frequencies
entropy code
Block Transform Encoding
[Diagram: 8×8 block → DCT → quantise → zig-zag scan → run-length code → entropy code → bit stream 010111000111…]
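A minimal per-block sketch of this pipeline (hypothetical code using NumPy and SciPy's DCT, not from the slides): 2-D DCT of an 8×8 block, uniform quantisation with a single step size, and a zig-zag-style scan; the run-length and entropy coding stages are omitted:

import numpy as np
from scipy.fft import dctn, idctn

def zigzag_indices(n=8):
    # Visit coefficients along anti-diagonals (zig-zag-style order)
    idx = [(i, j) for i in range(n) for j in range(n)]
    return sorted(idx, key=lambda ij: (ij[0] + ij[1],
                                       ij[1] if (ij[0] + ij[1]) % 2 else ij[0]))

def encode_block(block, q=16):
    # 2-D DCT of a level-shifted 8x8 block, uniform quantisation, zig-zag scan
    coeffs = dctn(block.astype(float) - 128.0, norm="ortho")
    quant = np.round(coeffs / q).astype(int)
    return [int(quant[i, j]) for i, j in zigzag_indices(block.shape[0])]

def decode_block(scanned, q=16, n=8):
    # Inverse of encode_block, up to the quantisation error
    quant = np.zeros((n, n))
    for v, (i, j) in zip(scanned, zigzag_indices(n)):
        quant[i, j] = v
    return idctn(quant * q, norm="ortho") + 128.0

block = np.tile(np.arange(140, 172, 4), (8, 1))   # hypothetical smooth 8x8 block
scanned = encode_block(block)
print(scanned[:10])                               # few low-frequency values, then zeros
print(np.round(decode_block(scanned)))            # close to the original block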
Block Encoding
Original image block:          After DCT:
139 144 149 153                1260   -1  -12   -5
144 151 153 156                 -23  -17   -6   -3
150 155 160 163                 -11   -9   -2    2
159 161 162 160                  -7   -2    0    1
After quantisation:
 79   0  -1   0
 -2  -1   0   0
 -1  -1   0   0
  0   0   0   0
Zig-zag scan:
79 0 -2 -1 -1 -1 0 0 -1 0 0 0 0 0 0 0
Run-length code (zero run, value):
(0, 79) (1, -2) (0, -1) (0, -1) (0, -1) (2, -1) (0, 0)
Huffman code: 10011011100011…
Block Transform Decoding
[Diagram: the same pipeline traversed in reverse — bit stream 010111000111… → entropy decode → run-length decode → inverse zig-zag → dequantise → inverse DCT → reconstructed block.]
Result of Coding and Decoding
Errors (difference between the original and the reconstructed block):
-5  -2   0   1
-4   1   1   2
-5  -1   3   5
-1   0   1  -2
Linear Prediction (Introduction):
• The objective of linear prediction is to estimate the output sequence from a linear combination of input samples, past output samples, or both:
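A minimal sketch (hypothetical, not from the slides) of a forward linear predictor: each sample is estimated from a linear combination of the p previous samples, with the coefficients fitted by least squares:

import numpy as np

def fit_linear_predictor(x, p):
    # Estimate x[n] from the p previous samples:
    #   x_hat[n] = a_1 x[n-1] + ... + a_p x[n-p]
    # The coefficients a_k are fitted by least squares over the whole record.
    rows = np.array([x[n - p:n][::-1] for n in range(p, len(x))])
    a, *_ = np.linalg.lstsq(rows, x[p:], rcond=None)
    return a

def predict(x, a):
    p = len(a)
    return np.array([x[n - p:n][::-1] @ a for n in range(p, len(x))])

rng = np.random.default_rng(0)
n = np.arange(200)
x = np.sin(2 * np.pi * 0.01 * n) + 0.01 * rng.standard_normal(200)   # hypothetical signal
a = fit_linear_predictor(x, p=2)
residual = x[2:] - predict(x, a)
print(a)                                # near [2 cos(w), -1] for a sinusoid
print(np.std(residual), np.std(x))      # prediction residual is much smaller than the signal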