(EEEg4172)
Chapter II
Information Theory and Coding
Instructor:
Fisiha Abayneh
Acknowledgement:
Materials prepared by Welelaw Y. & Habib M. on topics
related to this chapter are used.
2.1 Information Theory Basics
Outline:
Introduction
Measure of Information
Discrete Memoryless Channels
Mutual Information
Channel Capacity
Measure of Information
Information Source:
An information source is an object that produces an event, the outcome of
which is selected at random according to a specific probability distribution.
A practical information source in a communication system is a device that
produces messages and it can be either analog or discrete.
A discrete information source is a source that has only a finite set of
symbols as possible outputs.
The set of source symbols is called the source alphabet and the elements of
the set are called symbols or letters.
Measure of Information Cont’d……
Information Source:
Information sources can be classified as:
Sources with memory, or
Memoryless sources.
A source with memory is one for which a current symbol depends on the
previous symbols.
A memoryless source is one for which each symbol produced is
independent of the previous symbols.
A discrete memoryless source (DMS) can be characterized by the set of its
symbols, the probability assignment to the symbols, and the rate at which
the source generates symbols.
Measure of Information Cont’d……
The amount of information carried by a symbol xi with probability of
occurrence P(xi) is defined as I(xi) = log2 [1/P(xi)] = -log2 P(xi) bits.
It satisfies the following properties:
i. I(xi) = 0 for P(xi) = 1
ii. I(xi) ≥ 0
iii. I(xi) > I(xj) if P(xi) < P(xj)
iv. I(xi, xj) = I(xi) + I(xj) if xi and xj are independent
Measure of Information Cont’d……
Average Information or Entropy:
In a practical communication system, we usually transmit long sequences
of symbols from an information source.
Thus, we are more interested in the average information that a source
produces than the information content of a single symbol.
The mean value of I(xi) over the alphabet of source X with m different
symbols is given by:
H(X) = E[I(xi)] = Σ (i=1 to m) P(xi) I(xi)
     = Σ (i=1 to m) P(xi) log2 [1/P(xi)]  bits/symbol
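As a numerical illustration of the entropy formula above, the following Python sketch computes H(X) for a four-symbol DMS (the distributions are illustrative, not from the slides):

```python
import math

def entropy(probs):
    """Entropy H(X) = sum of P(xi) * log2(1/P(xi)), in bits/symbol.
    Terms with P(xi) = 0 contribute nothing, since p*log2(1/p) -> 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A four-symbol DMS: skewed distributions give lower entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol (the maximum, log2 4)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```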
Measure of Information Cont’d……
Average Information or Entropy:
The quantity H(X) is called the entropy of the source X.
It is a measure of the average information content per source symbol.
The source entropy H(X) can be considered the average amount of
uncertainty within source X that is resolved by use of the alphabet.
The source entropy H(X) satisfies the following relations:
0 ≤ H(X) ≤ log2 m
where m is the size of the alphabet of source X.
The lower bound corresponds to no uncertainty, which occurs when one symbol has
probability P(xi) = 1 while P(xj) = 0 for all j ≠ i, so X emits xi at all times. The upper
bound corresponds to maximum uncertainty, which occurs when P(xj) = 1/m for all j
(i.e. when all symbols are equally likely to be generated by X).
Measure of Information Cont’d……
Information Rate:
If the source emits symbols at a rate of r symbols/second, the information
rate R of the source is given by:
R = rH(X) bits/second
Discrete Memoryless Channels
Channel Representation:
A communication channel is the path or medium through which the
symbols flow from a source to a receiver.
A discrete memoryless channel (DMC) is a statistical model with an input
X and output Y as shown in the figure below.
Discrete Memoryless Channels Cont’d…..
Channel Representation:
The channel is discrete when the alphabets of X and Y are both finite.
It is memoryless when the current output depends on only the current input
and not on any of the previous inputs.
In the DMC shown above, the input X consists of input symbols x1, x2, …,
xm and the output Y consists of output symbols y1, y2, …, yn.
Each possible input-to-output path is indicated along with a conditional
probability P(yj/xi), called the channel transition probability.
Discrete Memoryless Channels Cont’d…..
Channel Matrix:
A channel is completely specified by the complete set of transition
probabilities.
The channel matrix, denoted by [P(Y/X)], is given by:
[P(Y/X)] = [ P(y1/x1)  P(y2/x1)  …  P(yn/x1)
             P(y1/x2)  P(y2/x2)  …  P(yn/x2)
                …         …              …
             P(y1/xm)  P(y2/xm)  …  P(yn/xm) ]
Since each input to the channel results in some output, each row of the
channel matrix must sum to unity, i.e.:
Σ (j=1 to n) P(yj/xi) = 1 for all i
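The row-sum property is easy to verify programmatically; a minimal sketch with an illustrative 2-input, 3-output channel matrix:

```python
def check_channel_matrix(P):
    """Verify each row of a channel matrix [P(Y/X)] sums to unity."""
    return all(abs(sum(row) - 1.0) < 1e-9 for row in P)

# Illustrative 2-input, 3-output DMC.
P = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
print(check_channel_matrix(P))  # True
```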
Discrete Memoryless Channels Cont’d…..
Special Channels:
1. Lossless Channel
A channel described by a channel matrix with only one non-zero element in
each column is called a lossless channel.
An example of a lossless channel is shown in the figure below.
Discrete Memoryless Channels Cont’d…..
Special Channels:
2. Deterministic Channel
A channel described by a channel matrix with only one non-zero unity
element in each row is called a deterministic channel.
An example of a deterministic channel is shown in the figure below.
Discrete Memoryless Channels Cont’d…..
Special Channels:
3. Noiseless Channel
A channel is called noiseless if it is both lossless and deterministic.
A noiseless channel is shown in the figure below.
The channel matrix has only one element in each row and each column,
and this element is unity.
Discrete Memoryless Channels Cont’d…..
Special Channels:
4. Binary Symmetric Channel
The binary symmetric channel (BSC) is defined by the channel matrix and
channel diagram given below, where p is the probability that a transmitted
bit is received in error (the crossover probability):
[P(Y/X)] = [ 1-p    p
              p    1-p ]
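The behaviour of a BSC can be simulated directly; a short Python sketch (the crossover probability p = 0.1 and the bit pattern are illustrative):

```python
import random

def bsc_transmit(bits, p, rng=random.Random(42)):
    """Pass a bit sequence through a binary symmetric channel:
    each bit is flipped independently with crossover probability p."""
    return [b ^ (rng.random() < p) for b in bits]

bits = [0, 1] * 5000
received = bsc_transmit(bits, p=0.1)
error_rate = sum(b != r for b, r in zip(bits, received)) / len(bits)
print(round(error_rate, 2))  # close to the crossover probability 0.1
```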
Mutual Information
Conditional and Joint Entropies:
Using the input probabilities P(xi), output probabilities P(yj), transition
probabilities P(yj/xi), and joint probabilities P(xi, yj), we can define the
following entropy functions for a channel with m inputs and n outputs:
H(X) = -Σi P(xi) log2 P(xi)
H(Y) = -Σj P(yj) log2 P(yj)
H(X/Y) = -Σj Σi P(xi, yj) log2 P(xi/yj)
H(Y/X) = -Σi Σj P(xi, yj) log2 P(yj/xi)
H(X,Y) = -Σi Σj P(xi, yj) log2 P(xi, yj)
Mutual Information
Conditional and Joint Entropies:
The above entropies can be interpreted as the average uncertainties of the
inputs and outputs.
Two useful relationships among the above various entropies are:
H(X,Y) = H(X/Y) + H(Y)
H(X,Y) = H(Y/X) + H(X)
Mutual Information:
The mutual information I(X;Y) of a channel is defined by:
I(X;Y) = H(X) - H(X/Y) bits/symbol
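Mutual information can also be computed directly from the joint probabilities P(xi, yj); a Python sketch with two illustrative joint distributions:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum over i,j of P(xi,yj) * log2( P(xi,yj) / (P(xi)P(yj)) )."""
    px = [sum(row) for row in joint]           # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]     # marginal of Y (column sums)
    I = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                I += pxy * math.log2(pxy / (px[i] * py[j]))
    return I

# Independent X and Y carry no mutual information.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# A noiseless mapping: observing Y resolves X completely, I = H(X) = 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```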
Mutual Information Cont’d…..
Mutual Information:
The mutual information of a channel satisfies the following properties:
I(X;Y) = I(Y;X)
I(X;Y) ≥ 0
I(X;Y) = H(Y) - H(Y/X)
I(X;Y) = H(X) + H(Y) - H(X,Y)
Channel Capacity
The channel capacity per symbol of a DMC is defined as:
Cs = max over {P(xi)} of I(X;Y) bits/symbol
where the maximization is over all possible input probability distributions
{P(xi)} on X.
If r symbols are transmitted per second, the channel capacity per second is:
C = rCs b/s
Additive White Gaussian Noise(AWGN) Channel
Differential Entropy:
The average amount of information per sample value of a continuous signal
x(t) is measured by the differential entropy:
H(X) = -∫ f(x) log2 f(x) dx bits/sample
where f(x) is the probability density function of X.
Additive White Gaussian Noise(AWGN) Channel
In an additive white Gaussian noise (AWGN ) channel, the channel output Y is given
by:
Y= X + n
where X is the channel input and n is an additive band-limited white Gaussian noise
with zero mean and variance σ2.
The capacity Cs of an AWGN channel is given by:
Cs = (1/2) log2 (1 + S/N) bits/sample
where S is the average signal power and N = σ² is the noise power.
Additive White Gaussian Noise(AWGN) Channel
If the channel bandwidth B Hz is fixed, then the output y(t) is also a band-limited
signal completely characterized by its sample values taken at the Nyquist rate of
2B samples/s.
Then the capacity C (b/s) of the AWGN channel is given by:
C = 2B × Cs = B log2 (1 + S/N) b/s
This is the well-known Shannon-Hartley law.
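The bandwidth-SNR trade-off C = B log2(1 + S/N) is easy to evaluate numerically; a short sketch (the 3 kHz bandwidth and 30 dB SNR figures are illustrative):

```python
import math

def awgn_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity C = B * log2(1 + S/N) in bits/second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# A 3 kHz telephone-grade channel at 30 dB SNR (S/N = 1000).
print(round(awgn_capacity(3000, 1000)))  # 29902 b/s
```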
Examples on Information Theory and Coding
Example-1:
A DMS X has four symbols x1, x2, x3 and x4 with probabilities P(x1) = 0.4,
P(x2) = 0.3, P(x3) = 0.2 and P(x4) = 0.1.
a. Calculate H(X).
b. Find the amount of information contained in the messages
x1x2x3x4 and x4x3x3x2.
Examples on Information Theory and Coding Cont’d…
Solution:
a. H(X) = -Σ (i=1 to 4) P(xi) log2 [P(xi)]
   = -(0.4 log2 0.4 + 0.3 log2 0.3 + 0.2 log2 0.2 + 0.1 log2 0.1)
   = 1.85 bits/symbol
b. P(x1x2x3x4) = (0.4)(0.3)(0.2)(0.1) = 0.0024
   I(x1x2x3x4) = -log2 0.0024 = 8.7 bits
Similarly,
   P(x4x3x3x2) = (0.1)(0.2)²(0.3) = 0.0012
   I(x4x3x3x2) = -log2 0.0012 = 9.7 bits
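These figures can be checked quickly in Python, using the source probabilities P(x1) = 0.4, P(x2) = 0.3, P(x3) = 0.2 and P(x4) = 0.1 implied by the computed values:

```python
import math

p = {'x1': 0.4, 'x2': 0.3, 'x3': 0.2, 'x4': 0.1}

# Entropy of the source.
H = sum(pi * math.log2(1 / pi) for pi in p.values())
print(round(H, 2))  # 1.85 bits/symbol

# Information in each message: the symbols are independent, so the
# probabilities multiply and the information contents add.
I1 = math.log2(1 / math.prod(p[s] for s in ('x1', 'x2', 'x3', 'x4')))
I2 = math.log2(1 / math.prod(p[s] for s in ('x4', 'x3', 'x3', 'x2')))
print(round(I1, 1), round(I2, 1))  # 8.7 9.7
```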
Examples on Information Theory and Coding Cont’d….
Example-2:
A high-resolution black-and-white TV picture consists of about 2×10⁶
picture elements and 16 different brightness levels. Pictures are repeated at
the rate of 32 per second. All picture elements are assumed to be
independent, and all levels have equal likelihood of occurrence. Calculate
the average rate of information conveyed by this TV picture source.
Solution:
H(X) = Σ (i=1 to 16) (1/16) log2 16 = 4 bits/element
The rate at which picture elements are generated is
r = 2×10⁶ × 32 = 64×10⁶ elements/s.
Hence, the average rate of information conveyed is
R = rH(X) = 64×10⁶ × 4 = 256×10⁶ bits/s = 256 Mb/s.

Example-3:
A binary channel has input probabilities P(x1) = P(x2) = 0.5 and channel matrix
[P(Y/X)] = [ 0.9  0.1
             0.2  0.8 ]
b. Find the output probabilities P(y1) and P(y2).
c. Find the joint probabilities P(x1, y2) and P(x2, y1).

Solution:
b. [P(Y)] = [P(X)][P(Y/X)]
   = [0.5  0.5] [ 0.9  0.1
                  0.2  0.8 ]
   = [0.55  0.45] = [P(y1)  P(y2)]
Hence, P(y1) = 0.55 and P(y2) = 0.45.
Examples on Information Theory and Coding Cont’d….
Solution:
c. [P(X,Y)] = [P(X)]d [P(Y/X)]
where [P(X)]d is the diagonal matrix whose diagonal elements are the
input probabilities:
   [P(X,Y)] = [ 0.5   0    [ 0.9  0.1
                 0   0.5 ]   0.2  0.8 ]
   = [ 0.45  0.05    = [ P(x1,y1)  P(x1,y2)
       0.10  0.40 ]      P(x2,y1)  P(x2,y2) ]
Hence, P(x1, y2) = 0.05 and P(x2, y1) = 0.1.
Examples on Information Theory and Coding Cont’d….
Example-4:
An information source can be modeled as a band-limited process with a
bandwidth of 6 kHz. This process is sampled at a rate higher than the
Nyquist rate to provide a guard band of 2 kHz. It is observed that the
resulting samples take values in the set {-4, -3, -1, 2, 4, 7} with
probabilities 0.2, 0.1, 0.15, 0.05, 0.3 and 0.2, respectively. What is the
entropy of the discrete-time source in bits/sample? What is the entropy in
bits/s?
Examples on Information Theory and Coding Cont’d….
Solution:
The entropy of the source is:
H(X) = Σ (i=1 to 6) P(xi) log2 [1/P(xi)] = 2.4087 bits/sample
The 2 kHz guard band implies a sampling rate of
fs = 2×6000 + 2000 = 14,000 samples/s.
Hence, the entropy in bits/s is:
H = 14,000 × 2.4087 ≈ 33,722 bits/s
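The computation can be verified numerically; note that the 14,000 samples/s figure below is an inference from the stated 2 kHz guard band:

```python
import math

probs = [0.2, 0.1, 0.15, 0.05, 0.3, 0.2]
H = sum(p * math.log2(1 / p) for p in probs)
print(round(H, 4))  # 2.4087 bits/sample

# Sampling rate inferred from the 2 kHz guard band.
fs = 2 * 6000 + 2000  # 14000 samples/s
print(round(fs * H))  # 33722 bits/s
```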
Source Coding
Outline:
Code length and code efficiency
Source coding theorem
Classification of codes
Kraft inequality
Entropy coding
Code Length and Code Efficiency
Let X be a DMS with finite entropy H(X) and an alphabet {x1, x2, …..xm}
with corresponding probabilities of occurrence P(xi) (i=1, 2, …., m).
An encoder assigns a binary code (sequence of bits) of length ni bits for
each symbol xi in the alphabet of the DMS.
A binary code that represents a symbol xi is known as a code word.
The length of a code word (the code length) is the number of binary digits in
the code word, denoted ni.
The average code word length L per source symbol is given by:
L = Σ (i=1 to m) P(xi) ni bits/symbol
Code Length and Code Efficiency
The parameter L represents the average number of bits per source symbol
used in the source coding process.
The code efficiency η is defined as:
η = Lmin / L
where Lmin is the minimum possible value of L.
When η approaches unity, the code is said to be efficient.
The code redundancy γ is defined as:
γ = 1 - η
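A quick numerical sketch of these definitions, using an illustrative prefix-free code whose lengths happen to meet the entropy bound (so Lmin = H(X), per the source coding theorem that follows):

```python
import math

def average_length(probs, lengths):
    """Average code word length L = sum of P(xi) * ni, in bits/symbol."""
    return sum(p * n for p, n in zip(probs, lengths))

probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]  # e.g. the code words 0, 10, 110, 111

L = average_length(probs, lengths)
H = sum(p * math.log2(1 / p) for p in probs)  # entropy = Lmin here
print(L)      # 1.75 bits/symbol
print(H / L)  # efficiency 1.0, so redundancy 1 - 1.0 = 0.0
```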
Source Coding Theorem
The source coding theorem states that for a DMS X with entropy H(X), the
average code word length L per symbol is bounded as:
L ≥ H(X)
Furthermore, L can be made as close to H(X) as desired for some suitably
chosen code, so that Lmin = H(X). Thus, the code efficiency can be
rewritten as:
η = H(X) / L
Classifications of Codes
There are several types of codes.
Let's consider the table given below, which lists six binary codes for the
four symbols of a source X, to describe the different types of codes.

Symbol   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
x1       00       00       0        0        0        1
x2       01       01       1        10       01       01
x3       00       10       00       110      011      001
x4       11       11       11       111      0111     0001
1. Fixed-Length Codes:
A fixed-length code is one whose code word length is fixed.
Code 1 and Code 2 of the above table are fixed-length codes with
length 2.
Classifications of Codes
2. Variable-Length Codes:
A variable-length code is one whose code word length is not fixed.
All codes of the above table except codes 1 and 2 are variable-length
codes.
3. Distinct Codes:
A code is distinct if each code word is distinguishable from the other
code words.
All codes of the above table except code 1 are distinct codes.
Classifications of Codes
4. Prefix-Free Codes:
A code is said to be a prefix-free code when the code words are distinct
and no code word is a prefix for another code word.
Codes 2, 4 and 6 in the above table are prefix free codes.
5. Uniquely Decodable Codes:
A distinct code is uniquely decodable if the original source sequence
can be reconstructed perfectly from the encoded binary sequences.
Note that code 3 of Table above is not a uniquely decodable code.
For example, the binary sequence 1001 may correspond to the source
sequence x2x3x2 or x2x1x1x2.
Classifications of Codes
5. Uniquely Decodable Codes continued…
A sufficient condition to ensure that a code is uniquely decodable is
that no code word is a prefix of another.
Thus, the prefix-free codes 2, 4, and 6 are uniquely decodable codes.
Note that the prefix-free condition is not a necessary condition for
codes to be uniquely decodable.
For example, code 5 of Table above does not satisfy the prefix-free
condition, and yet it is uniquely decodable since the bit 0 indicates the
beginning of each code word of the code.
i.e: when the decoder finds a 0, it knows a new code word is starting; but it
has to wait until the next 0 arrives to identify which code word was received.
Classifications of Codes
6. Instantaneous Codes:
A uniquely decodable code is called an instantaneous code if the end
of any code word is recognizable without examining subsequent code
symbols.
Instantaneous codes have the property, mentioned previously, that no code
word is a prefix of another code word.
For this reason, prefix-free codes are also called instantaneous codes.
7. Optimal Codes:
A code is said to be optimal if it is instantaneous and has the minimum
average length L for a given source with a given probability
assignment for the source symbols.
Kraft Inequality
Let X be a DMS with alphabet {xi} (i = 1, 2, ..., m). Assume that the length
of the binary code word assigned to xi is ni.
A necessary and sufficient condition for the existence of an instantaneous
binary code is:
K = Σ (i=1 to m) 2^(-ni) ≤ 1
This is known as the Kraft inequality.
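Checking the Kraft inequality K = Σ 2^(-ni) ≤ 1 is a one-line computation; the code word lengths below are illustrative:

```python
def kraft_sum(lengths):
    """Kraft sum K = sum of 2**(-ni) over all code word lengths ni."""
    return sum(2.0 ** -n for n in lengths)

# Lengths 1, 2, 3, 3 satisfy the inequality with equality, so an
# instantaneous code exists (e.g. 0, 10, 110, 111).
print(kraft_sum([1, 2, 3, 3]))  # 1.0
# Lengths 1, 1, 2 violate it: no instantaneous binary code exists.
print(kraft_sum([1, 1, 2]))     # 1.25
```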
Entropy Coding
The design of a variable-length code such that its average code word length
approaches the entropy of the DMS is often referred to as entropy coding.
There are two commonly known types of entropy coding, which are
presented in this section:
i. Shannon-Fano coding
ii. Huffman coding
Entropy Coding
1. Shannon-Fano Coding:
An efficient code can be obtained by the following simple procedure,
known as Shannon-Fano algorithm:
1. List the source symbols in order of decreasing probability.
2. Partition the set into two sets that are as close to equi-probable as possible,
and assign 0 to the symbols in the upper set and 1 to the symbols in the lower
set.
3. Continue this process, each time partitioning the sets with as nearly equal
probabilities as possible and assigning 0s to the symbols of the upper sets and
1s to those of the lower sets, until further partitioning is not possible.
4. Collect the 0s and 1s assigned to each symbol, in order from the first step to
the last, to form its code word.
Note that in Shannon-Fano encoding, ambiguity may arise in the
choice of approximately equiprobable sets.
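The steps above can be sketched in Python. When no exactly equiprobable partition exists, some split must be chosen (this is precisely the ambiguity just noted); this sketch takes the first split that minimizes the probability difference:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs, sorted by
    decreasing probability (step 1). Returns {symbol: code word}."""
    codes = {s: '' for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Steps 2-3: find the split whose upper-set probability is
        # closest to half of the group's total probability.
        running, best_k, best_diff = 0.0, 1, float('inf')
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, best_k = diff, k
        upper, lower = group[:best_k], group[best_k:]
        for s, _ in upper:
            codes[s] += '0'   # 0s to the upper set
        for s, _ in lower:
            codes[s] += '1'   # 1s to the lower set
        split(upper)
        split(lower)

    split(symbols)
    return codes

src = [('x1', 0.4), ('x2', 0.2), ('x3', 0.2), ('x4', 0.1), ('x5', 0.1)]
codes = shannon_fano(src)
print(codes)
L = sum(dict(src)[s] * len(c) for s, c in codes.items())
print(round(L, 2))  # average code word length: 2.2 bits/symbol
```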
Entropy Coding
2. Huffman Coding:
Huffman encoding employs an algorithm that results in an optimum code.
Thus, among all uniquely decodable codes for a given source, it has the
highest efficiency.
The Huffman encoding procedure is as follows:
1. List the source symbols in order of decreasing probability.
2. Combine the probabilities of the two symbols having the lowest probabilities,
and reorder the resultant probabilities. This step is known as reduction.
3. Repeat the above step until only two ordered probabilities are left.
4. Start encoding with the last reduction, which contains only two ordered
probabilities. Assign 0 for the 1st and 1 for the 2nd probability.
5. Now go back one step, and assign 0 and 1 (as a second digit) to the two
probabilities that were combined in the last reduction, keeping the digit
assigned in the previous step for each probability that was not combined.
6. Repeat this process until the first column is reached. The resulting digit
sequences are the code words for the symbols.
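The procedure above can be sketched with a priority queue. The 0/1 assignment convention may differ from the slides' column-by-column method, but the resulting code is optimal, so its average length is the same:

```python
import heapq
from itertools import count

def huffman(probs):
    """probs: {symbol: probability}. Returns {symbol: code word}."""
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tick), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Combine the two lowest-probability groups (one "reduction").
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Prepend a bit to every code word in each merged group.
        for s in c1: c1[s] = '0' + c1[s]
        for s in c2: c2[s] = '1' + c2[s]
        heapq.heappush(heap, (p1 + p2, next(tick), {**c1, **c2}))
    return heap[0][2]

probs = {'x1': 0.4, 'x2': 0.2, 'x3': 0.2, 'x4': 0.1, 'x5': 0.1}
codes = huffman(probs)
print(codes)
L = sum(probs[s] * len(c) for s, c in codes.items())
print(round(L, 2))  # average code word length: 2.2 bits/symbol
```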
Channel Coding
Outline:
Introduction
Channel coding theorem
Channel coding types
Linear block codes
Convolutional codes
Channel Coding
Figure below shows a basic block diagram representation of channel
coding.
Channel Coding Theorem
The channel coding theorem for a DMC is stated in two parts as
follows:
Let a DMS X have entropy H(X) bits/symbol and produce symbols every Ts
seconds, and let a DMC have capacity Cs bits/symbol and be used every Tc
seconds. Then:
1. If H(X)/Ts ≤ Cs/Tc,
there exists a coding scheme for which the source output can be
transmitted over the channel with an arbitrarily small probability of error.
2. Conversely, if H(X)/Ts > Cs/Tc,
it is not possible to transmit information over the channel with an
arbitrarily small probability of error.
Note that the channel coding theorem only asserts the existence
of such codes; it does not tell us how to construct them.
Channel Coding Types
Block Codes
A block code is a code in which all of its words have the same length.
In block codes, the binary message or data sequence of the source is
divided into sequential blocks, each k bits long.
The channel encoder maps the k-bit blocks to other n-bit blocks.
The channel encoder performs a mapping
T: U→V
Where U is a set of binary data words of length k and V is a set of binary
code words of length n with n>k.
Block Codes
The n-bit blocks produced by the channel encoder are also known as
channel symbols, and n > k.
The resultant block code is called an (n, k) block code.
The k-bit blocks form 2k distinct message sequences referred to as k-tuples.
The n-bit blocks can form as many as 2n distinct sequences referred to as n-
tuples. The set of all n-tuples defined over K ={0,1} is denoted by Kn.
Each of the 2k data words is mapped to a unique code word.
The ratio k/n is called the code rate.
Block Codes
Binary Field:
The set K = {0, 1} is known as a binary field. The binary field has two
operations, addition and multiplication, defined such that the results of all
operations remain in K.
The rules of addition and multiplication are as follows:
Addition: 0+0=0, 1+1=0, 0+1=1+0=1
Multiplication: 0*0=0, 1*1=1, 0*1=1*0=0
Linear Block Codes
Linearity:
Let a = (a1,a2, ... ,an), and b = (b1,b2, ... ,bn) be two code words in a
code C.
The sum of a and b, denoted by a + b, is defined by (a1+b1, a2+b2, ..., an+bn),
where the component-wise additions are performed in the binary field
(i.e. modulo-2).
A code C is called linear if the sum of two code words is also a code word
in C.
A linear code C must contain the zero code word 0 = (0,0, ... ,0), since
a+a =0.
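Both conditions can be checked exhaustively for a small code; here a sketch using the (3, 2) even-parity code (an illustrative linear code, not one from the slides):

```python
def xor_words(a, b):
    """Component-wise modulo-2 sum of two code words (tuples of bits)."""
    return tuple(x ^ y for x, y in zip(a, b))

# The (3, 2) even-parity code: the third bit makes the bit-sum even.
code = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}

# Linearity: the sum of any two code words is again a code word.
print(all(xor_words(a, b) in code for a in code for b in code))  # True
# A linear code must contain the zero code word, since a + a = 0.
print((0, 0, 0) in code)  # True
```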
Convolutional Codes
In this type of encoding, every code word of the channel symbols is made
to be the weighted sum of various input message symbols.
The operation is similar to a convolution used in LTI systems to find the
output when the input and impulse response are known.
In general, the resulting code word is a convolution of the current input bits
with the previous input bits stored in the encoder's memory.
Convolutional encoders offer greater simplicity of implementation, but not
more protection, than an equivalent block code.
They are used in GSM mobile, voiceband modems, satellite and military
communications.
References:
Additional Reading:
https://en.wikipedia.org/wiki/Information_theory
https://en.wikipedia.org/wiki/Coding_theory
https://en.wikipedia.org/wiki/Noisy-channel_coding_theorem