(EEEg4172)
Chapter II
Information Theory and Coding
Instructor:
Fisiha Abayneh
Acknowledgement:
Materials prepared by Welelaw Y. & Habib M. on topics
related to this chapter are used.
2.1 Information Theory Basics
Outline:
Introduction
Measure of Information
Discrete Memoryless Channels
Mutual Information
Channel Capacity
Measure of Information
Information Source:
An information source is an object that produces an event, the outcome of
which is selected at random according to a specific probability distribution.
A practical information source in a communication system is a device that
produces messages and it can be either analog or discrete.
A discrete information source is a source that has only a finite set of
symbols as possible outputs.
The set of source symbols is called the source alphabet and the elements of
the set are called symbols or letters.
Measure of Information Cont’d……
Information Source:
Information sources can be classified as:
Sources with memory, or
Memoryless sources.
A source with memory is one for which a current symbol depends on the
previous symbols.
A memoryless source is one for which each symbol produced is
independent of the previous symbols.
A discrete memoryless source (DMS) can be characterized by the set of its
symbols, the probability assignment to the symbols, and the rate at which
the source generates symbols.
Measure of Information Cont’d……
The amount of information carried by a symbol xi with probability of
occurrence P(xi) is defined as I(xi) = log2 [1/P(xi)] = -log2 P(xi) bits.
It satisfies the following properties:
i. I(xi) = 0 for P(xi) = 1
ii. I(xi) ≥ 0
iii. I(xi) > I(xj) if P(xi) < P(xj)
iv. I(xi, xj) = I(xi) + I(xj) if xi and xj are independent
Measure of Information Cont’d……
Average Information or Entropy:
In a practical communication system, we usually transmit long sequences
of symbols from an information source.
Thus, we are more interested in the average information that a source
produces than the information content of a single symbol.
The mean value of I(xi) over the alphabet of source X with m different
symbols is given by:
H(X) = E[I(xi)] = Σ (i=1 to m) P(xi) I(xi)
     = Σ (i=1 to m) P(xi) log2 [1/P(xi)]  bits/symbol
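As a numerical illustration of the entropy formula above, the following Python sketch computes H(X) for a four-symbol DMS (the distributions are illustrative, not from the slides):

```python
import math

def entropy(probs):
    """Entropy H(X) = sum of P(xi) * log2(1/P(xi)), in bits/symbol.
    Terms with P(xi) = 0 contribute nothing, since p*log2(1/p) -> 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A four-symbol DMS: skewed distributions give lower entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol (the maximum, log2 4)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```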
Measure of Information Cont’d……
Average Information or Entropy:
The quantity H(X) is called the entropy of the source X.
It is a measure of the average information content per source symbol.
The source entropy H(X) can be considered the average amount of
uncertainty within source X that is resolved by use of the alphabet.
The source entropy H(X) satisfies the following relations:
0 ≤ H(X) ≤ log2 m
where m is the size of the alphabet of source X.
The lower bound corresponds to no uncertainty, which occurs when one symbol has
probability P(xi) = 1 while P(xj) = 0 for all j ≠ i, so X emits xi at all times. The upper
bound corresponds to maximum uncertainty, which occurs when P(xj) = 1/m for all j
(i.e. when all symbols are equally likely to be generated by X).
Measure of Information Cont’d……
Information Rate:
If the source emits symbols at a rate of r symbols/second, the information
rate R of the source is given by:
R = rH(X) bits/second
Discrete Memoryless Channels
Channel Representation:
A communication channel is the path or medium through which the
symbols flow from a source to a receiver.
A discrete memoryless channel (DMC) is a statistical model with an input
X and output Y as shown in the figure below.
Discrete Memoryless Channels Cont’d…..
Channel Representation:
The channel is discrete when the alphabets of X and Y are both finite.
It is memoryless when the current output depends on only the current input
and not on any of the previous inputs.
In the DMC shown above, the input X consists of input symbols x1, x2, …,
xm and the output Y consists of output symbols y1, y2, …, yn.
Each possible input-to-output path is indicated along with a conditional
probability P(yj/xi), called the channel transition probability.
Discrete Memoryless Channels Cont’d…..
Channel Matrix:
A channel is completely specified by the complete set of transition
probabilities.
The channel matrix, denoted by [P(Y/X)], is given by:
[P(Y/X)] = [ P(y1/x1)  P(y2/x1)  …  P(yn/x1)
             P(y1/x2)  P(y2/x2)  …  P(yn/x2)
                …         …              …
             P(y1/xm)  P(y2/xm)  …  P(yn/xm) ]
Since each input to the channel results in some output, each row of the
channel matrix must sum to unity, i.e.:
Σ (j=1 to n) P(yj/xi) = 1 for all i
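The row-sum property is easy to verify programmatically; a minimal sketch with an illustrative 2-input, 3-output channel matrix:

```python
def check_channel_matrix(P):
    """Verify each row of a channel matrix [P(Y/X)] sums to unity."""
    return all(abs(sum(row) - 1.0) < 1e-9 for row in P)

# Illustrative 2-input, 3-output DMC.
P = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
print(check_channel_matrix(P))  # True
```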
Discrete Memoryless Channels Cont’d…..
Special Channels:
1. Lossless Channel
A channel described by a channel matrix with only one non-zero element in
each column is called a lossless channel.
An example of a lossless channel is shown in the figure below.
Discrete Memoryless Channels Cont’d…..
Special Channels:
2. Deterministic Channel
A channel described by a channel matrix with only one non-zero unity
element in each row is called a deterministic channel.
An example of a deterministic channel is shown in the figure below.
Discrete Memoryless Channels Cont’d…..
Special Channels:
3. Noiseless Channel
A channel is called noiseless if it is both lossless and deterministic.
A noiseless channel is shown in the figure below.
The channel matrix has only one element in each row and each column,
and this element is unity.
Discrete Memoryless Channels Cont’d…..
Special Channels:
4. Binary Symmetric Channel
The binary symmetric channel (BSC) is defined by the channel matrix and
channel diagram given below, where p is the probability that a transmitted
bit is received in error (the crossover probability):
[P(Y/X)] = [ 1-p    p
              p    1-p ]
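The behaviour of a BSC can be simulated directly; a short Python sketch (the crossover probability p = 0.1 and the bit pattern are illustrative):

```python
import random

def bsc_transmit(bits, p, rng=random.Random(42)):
    """Pass a bit sequence through a binary symmetric channel:
    each bit is flipped independently with crossover probability p."""
    return [b ^ (rng.random() < p) for b in bits]

bits = [0, 1] * 5000
received = bsc_transmit(bits, p=0.1)
error_rate = sum(b != r for b, r in zip(bits, received)) / len(bits)
print(round(error_rate, 2))  # close to the crossover probability 0.1
```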
Mutual Information
Conditional and Joint Entropies:
Using the input probabilities P(xi), output probabilities P(yj), transition
probabilities P(yj/xi), and joint probabilities P(xi, yj), we can define the
following entropy functions for a channel with m inputs and n outputs:
H(X) = -Σi P(xi) log2 P(xi)
H(Y) = -Σj P(yj) log2 P(yj)
H(X/Y) = -Σj Σi P(xi, yj) log2 P(xi/yj)
H(Y/X) = -Σi Σj P(xi, yj) log2 P(yj/xi)
H(X,Y) = -Σi Σj P(xi, yj) log2 P(xi, yj)
Mutual Information
Conditional and Joint Entropies:
The above entropies can be interpreted as the average uncertainties of the
inputs and outputs.
Two useful relationships among the above various entropies are:
H(X,Y) = H(X/Y) + H(Y)
H(X,Y) = H(Y/X) + H(X)
Mutual Information:
The mutual information I(X;Y) of a channel is defined by:
I(X;Y) = H(X) - H(X/Y) bits/symbol
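Mutual information can also be computed directly from the joint probabilities P(xi, yj); a Python sketch with two illustrative joint distributions:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum over i,j of P(xi,yj) * log2( P(xi,yj) / (P(xi)P(yj)) )."""
    px = [sum(row) for row in joint]           # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]     # marginal of Y (column sums)
    I = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                I += pxy * math.log2(pxy / (px[i] * py[j]))
    return I

# Independent X and Y carry no mutual information.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# A noiseless mapping: observing Y resolves X completely, I = H(X) = 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```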
Mutual Information Cont’d…..
Mutual Information:
The mutual information of a channel satisfies the following properties:
I(X;Y) = I(Y;X)
I(X;Y) ≥ 0
I(X;Y) = H(Y) - H(Y/X)
I(X;Y) = H(X) + H(Y) - H(X,Y)
Channel Capacity
The channel capacity per symbol of a DMC is defined as:
Cs = max over {P(xi)} of I(X;Y) bits/symbol
where the maximization is over all possible input probability distributions
{P(xi)} on X.
If r symbols are transmitted per second, the channel capacity per second is:
C = rCs b/s
Additive White Gaussian Noise(AWGN) Channel
Differential Entropy:
The average amount of information per sample value of a continuous signal
x(t) is measured by the differential entropy:
H(X) = -∫ f(x) log2 f(x) dx bits/sample
where f(x) is the probability density function of X.
Additive White Gaussian Noise(AWGN) Channel
In an additive white Gaussian noise (AWGN ) channel, the channel output Y is given
by:
Y= X + n
where X is the channel input and n is an additive band-limited white Gaussian noise
with zero mean and variance σ2.
The capacity Cs of an AWGN channel is given by:
Cs = (1/2) log2 (1 + S/N) bits/sample
where S is the average signal power and N = σ² is the noise power.
Additive White Gaussian Noise(AWGN) Channel
If the channel bandwidth B Hz is fixed, then the output y(t) is also a band-limited
signal completely characterized by its sample values taken at the Nyquist rate of
2B samples/s.
Then the capacity C (b/s) of the AWGN channel is given by:
C = 2B × Cs = B log2 (1 + S/N) b/s
This is the well-known Shannon-Hartley law.
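The bandwidth-SNR trade-off C = B log2(1 + S/N) is easy to evaluate numerically; a short sketch (the 3 kHz bandwidth and 30 dB SNR figures are illustrative):

```python
import math

def awgn_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity C = B * log2(1 + S/N) in bits/second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# A 3 kHz telephone-grade channel at 30 dB SNR (S/N = 1000).
print(round(awgn_capacity(3000, 1000)))  # 29902 b/s
```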
Examples on Information Theory and Coding
Example-1:
A DMS X has four symbols x1, x2, x3 and x4 with probabilities P(x1) = 0.4,
P(x2) = 0.3, P(x3) = 0.2 and P(x4) = 0.1.
a. Calculate H(X).
b. Find the amount of information contained in the messages
x1x2x3x4 and x4x3x3x2.
Examples on Information Theory and Coding Cont’d…
Solution:
a. H(X) = -Σ (i=1 to 4) P(xi) log2 [P(xi)]
   = -(0.4 log2 0.4 + 0.3 log2 0.3 + 0.2 log2 0.2 + 0.1 log2 0.1)
   = 1.85 bits/symbol
b. P(x1x2x3x4) = (0.4)(0.3)(0.2)(0.1) = 0.0024
   I(x1x2x3x4) = -log2 0.0024 = 8.7 bits
Similarly,
   P(x4x3x3x2) = (0.1)(0.2)²(0.3) = 0.0012
   I(x4x3x3x2) = -log2 0.0012 = 9.7 bits
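These figures can be checked quickly in Python, using the source probabilities P(x1) = 0.4, P(x2) = 0.3, P(x3) = 0.2 and P(x4) = 0.1 implied by the computed values:

```python
import math

p = {'x1': 0.4, 'x2': 0.3, 'x3': 0.2, 'x4': 0.1}

# Entropy of the source.
H = sum(pi * math.log2(1 / pi) for pi in p.values())
print(round(H, 2))  # 1.85 bits/symbol

# Information in each message: the symbols are independent, so the
# probabilities multiply and the information contents add.
I1 = math.log2(1 / math.prod(p[s] for s in ('x1', 'x2', 'x3', 'x4')))
I2 = math.log2(1 / math.prod(p[s] for s in ('x4', 'x3', 'x3', 'x2')))
print(round(I1, 1), round(I2, 1))  # 8.7 9.7
```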
Examples on Information Theory and Coding Cont’d….
Example-2:
A high-resolution black-and-white TV picture consists of about 2×10⁶
picture elements and 16 different brightness levels. Pictures are repeated at
the rate of 32 per second. All picture elements are assumed to be
independent, and all levels have equal likelihood of occurrence. Calculate
the average rate of information conveyed by this TV picture source.
Solution:
H(X) = Σ (i=1 to 16) (1/16) log2 16 = 4 bits/element
The rate at which picture elements are generated is
r = 2×10⁶ × 32 = 64×10⁶ elements/s.
Hence, the average rate of information conveyed is
R = rH(X) = 64×10⁶ × 4 = 256×10⁶ bits/s = 256 Mb/s.

Example-3:
A binary channel has input probabilities P(x1) = P(x2) = 0.5 and channel matrix
[P(Y/X)] = [ 0.9  0.1
             0.2  0.8 ]
b. Find the output probabilities P(y1) and P(y2).
c. Find the joint probabilities P(x1, y2) and P(x2, y1).

Solution:
b. [P(Y)] = [P(X)][P(Y/X)]
   = [0.5  0.5] [ 0.9  0.1
                  0.2  0.8 ]
   = [0.55  0.45] = [P(y1)  P(y2)]
Hence, P(y1) = 0.55 and P(y2) = 0.45.
Examples on Information Theory and Coding Cont’d….
Solution:
c. [P(X,Y)] = [P(X)]d [P(Y/X)]
where [P(X)]d is the diagonal matrix whose diagonal elements are the
input probabilities:
   [P(X,Y)] = [ 0.5   0    [ 0.9  0.1
                 0   0.5 ]   0.2  0.8 ]
   = [ 0.45  0.05    = [ P(x1,y1)  P(x1,y2)
       0.10  0.40 ]      P(x2,y1)  P(x2,y2) ]
Hence, P(x1, y2) = 0.05 and P(x2, y1) = 0.1.
Examples on Information Theory and Coding Cont’d….
Example-4:
An information source can be modeled as a band-limited process with a
bandwidth of 6 kHz. This process is sampled at a rate higher than the
Nyquist rate to provide a guard band of 2 kHz. It is observed that the
resulting samples take values in the set {-4, -3, -1, 2, 4, 7} with
probabilities 0.2, 0.1, 0.15, 0.05, 0.3 and 0.2, respectively. What is the
entropy of the discrete-time source in bits/sample? What is the entropy in
bits/s?
Examples on Information Theory and Coding Cont’d….
Solution:
The entropy of the source is:
H(X) = Σ (i=1 to 6) P(xi) log2 [1/P(xi)] = 2.4087 bits/sample
The 2 kHz guard band implies a sampling rate of
fs = 2×6000 + 2000 = 14,000 samples/s.
Hence, the entropy in bits/s is:
H = 14,000 × 2.4087 ≈ 33,722 bits/s
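The computation can be verified numerically; note that the 14,000 samples/s figure below is an inference from the stated 2 kHz guard band:

```python
import math

probs = [0.2, 0.1, 0.15, 0.05, 0.3, 0.2]
H = sum(p * math.log2(1 / p) for p in probs)
print(round(H, 4))  # 2.4087 bits/sample

# Sampling rate inferred from the 2 kHz guard band.
fs = 2 * 6000 + 2000  # 14000 samples/s
print(round(fs * H))  # 33722 bits/s
```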
Source Coding
Outline:
Code length and code efficiency
Source coding theorem
Classification of codes
Kraft inequality
Entropy coding
Code Length and Code Efficiency
Let X be a DMS with finite entropy H(X) and an alphabet {x1, x2, …..xm}
with corresponding probabilities of occurrence P(xi) (i=1, 2, …., m).
An encoder assigns a binary code (sequence of bits) of length ni bits for
each symbol xi in the alphabet of the DMS.
A binary code that represents a symbol xi is known as a code word.
The length of a code word (the code length) is the number of binary digits in
the code word, denoted ni.
The average code word length L per source symbol is given by:
L = Σ (i=1 to m) P(xi) ni bits/symbol
Code Length and Code Efficiency
The parameter L represents the average number of bits per source symbol
used in the source coding process.
The code efficiency η is defined as:
η = Lmin / L
where Lmin is the minimum possible value of L.
When η approaches unity, the code is said to be efficient.
The code redundancy γ is defined as:
γ = 1 - η
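A quick numerical sketch of these definitions, using an illustrative prefix-free code whose lengths happen to meet the entropy bound (so Lmin = H(X), per the source coding theorem that follows):

```python
import math

def average_length(probs, lengths):
    """Average code word length L = sum of P(xi) * ni, in bits/symbol."""
    return sum(p * n for p, n in zip(probs, lengths))

probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]  # e.g. the code words 0, 10, 110, 111

L = average_length(probs, lengths)
H = sum(p * math.log2(1 / p) for p in probs)  # entropy = Lmin here
print(L)      # 1.75 bits/symbol
print(H / L)  # efficiency 1.0, so redundancy 1 - 1.0 = 0.0
```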
Source Coding Theorem
The source coding theorem states that for a DMS X with entropy H(X), the
average code word length L per symbol is bounded as:
L ≥ H(X)
Furthermore, L can be made as close to H(X) as desired for some suitably
chosen code, so that Lmin = H(X). Thus, the code efficiency can be
rewritten as:
η = H(X) / L
Classifications of Codes
There are several types of codes.
Let's consider the table given below, which lists six binary codes for the
four symbols of a source X, to describe the different types of codes.

Symbol   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
x1       00       00       0        0        0        1
x2       01       01       1        10       01       01
x3       00       10       00       110      011      001
x4       11       11       11       111      0111     0001
1. Fixed-Length Codes:
A fixed-length code is one whose code word length is fixed.
Code 1 and Code 2 of the above table are fixed-length codes with
length 2.
Classifications of Codes
2. Variable-Length Codes:
A variable-length code is one whose code word length is not fixed.
All codes of the above table except codes 1 and 2 are variable-length
codes.
3. Distinct Codes:
A code is distinct if each code word is distinguishable from the other
code words.
All codes of the above table except code 1 are distinct codes.
Classifications of Codes
4. Prefix-Free Codes:
A code is said to be a prefix-free code when the code words are distinct
and no code word is a prefix for another code word.
Codes 2, 4 and 6 in the above table are prefix free codes.
5. Uniquely Decodable Codes:
A distinct code is uniquely decodable if the original source sequence
can be reconstructed perfectly from the encoded binary sequences.
Note that code 3 of Table above is not a uniquely decodable code.
For example, the binary sequence 1001 may correspond to the source
sequence x2x3x2 or x2x1x1x2.
Classifications of Codes
5. Uniquely Decodable Codes continued…
A sufficient condition to ensure that a code is uniquely decodable is
that no code word is a prefix of another.
Thus, the prefix-free codes 2, 4, and 6 are uniquely decodable codes.
Note that the prefix-free condition is not a necessary condition for
codes to be uniquely decodable.
For example, code 5 of Table above does not satisfy the prefix-free
condition, and yet it is uniquely decodable since the bit 0 indicates the
beginning of each code word of the code.
i.e: when the decoder finds a 0, it knows a new code word is starting; but it
has to wait until the next 0 arrives to identify which code word was received.
Classifications of Codes
6. Instantaneous Codes:
A uniquely decodable code is called an instantaneous code if the end
of any code word is recognizable without examining subsequent code
symbols.
Instantaneous codes have the property, mentioned previously, that no code
word is a prefix of another code word.
For this reason, prefix-free codes are also called instantaneous codes.
7. Optimal Codes:
A code is said to be optimal if it is instantaneous and has the minimum
average length L for a given source with a given probability
assignment for the source symbols.
Kraft Inequality
Let X be a DMS with alphabet {xi} (i = 1, 2, ..., m). Assume that the length
of the binary code word assigned to xi is ni.
A necessary and sufficient condition for the existence of an instantaneous
binary code is:
K = Σ (i=1 to m) 2^(-ni) ≤ 1
This is known as the Kraft inequality.
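Checking the Kraft inequality K = Σ 2^(-ni) ≤ 1 is a one-line computation; the code word lengths below are illustrative:

```python
def kraft_sum(lengths):
    """Kraft sum K = sum of 2**(-ni) over all code word lengths ni."""
    return sum(2.0 ** -n for n in lengths)

# Lengths 1, 2, 3, 3 satisfy the inequality with equality, so an
# instantaneous code exists (e.g. 0, 10, 110, 111).
print(kraft_sum([1, 2, 3, 3]))  # 1.0
# Lengths 1, 1, 2 violate it: no instantaneous binary code exists.
print(kraft_sum([1, 1, 2]))     # 1.25
```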
Entropy Coding
The design of a variable-length code such that its average code word length
approaches the entropy of the DMS is often referred to as entropy coding.
There are two commonly known types of entropy coding, which are
presented in this section:
i. Shannon-Fano coding
ii. Huffman coding
Entropy Coding
1. Shannon-Fano Coding:
An efficient code can be obtained by the following simple procedure,
known as Shannon-Fano algorithm:
1. List the source symbols in order of decreasing probability.
2. Partition the set into two sets that are as close to equi-probable as possible,
and assign 0 to the symbols in the upper set and 1 to the symbols in the lower
set.
3. Continue this process, each time partitioning the sets with as nearly equal
probabilities as possible and assigning 0s to the symbols of the upper sets and
1s to those of the lower sets, until further partitioning is not possible.
4. Collect the 0s and 1s assigned to each symbol, in order from the first step to
the last, to form its code word.
Note that in Shannon-Fano encoding, ambiguity may arise in the
choice of approximately equiprobable sets.
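The steps above can be sketched in Python. When no exactly equiprobable partition exists, some split must be chosen (this is precisely the ambiguity just noted); this sketch takes the first split that minimizes the probability difference:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs, sorted by
    decreasing probability (step 1). Returns {symbol: code word}."""
    codes = {s: '' for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Steps 2-3: find the split whose upper-set probability is
        # closest to half of the group's total probability.
        running, best_k, best_diff = 0.0, 1, float('inf')
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, best_k = diff, k
        upper, lower = group[:best_k], group[best_k:]
        for s, _ in upper:
            codes[s] += '0'   # 0s to the upper set
        for s, _ in lower:
            codes[s] += '1'   # 1s to the lower set
        split(upper)
        split(lower)

    split(symbols)
    return codes

src = [('x1', 0.4), ('x2', 0.2), ('x3', 0.2), ('x4', 0.1), ('x5', 0.1)]
codes = shannon_fano(src)
print(codes)
L = sum(dict(src)[s] * len(c) for s, c in codes.items())
print(round(L, 2))  # average code word length: 2.2 bits/symbol
```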
Entropy Coding
2. Huffman Coding:
Huffman encoding employs an algorithm that results in an optimum code.
Thus, among all uniquely decodable codes for a given source, it has the
highest efficiency.
The Huffman encoding procedure is as follows:
1. List the source symbols in order of decreasing probability.
2. Combine the probabilities of the two symbols having the lowest probabilities,
and reorder the resultant probabilities. This step is known as reduction.
3. Repeat the above step until only two ordered probabilities are left.
4. Start encoding with the last reduction, which contains only two ordered
probabilities. Assign 0 for the 1st and 1 for the 2nd probability.
5. Now go back one step, and assign 0 and 1 (as a second digit) to the two
probabilities that were combined in the last reduction, keeping the digit
assigned in the previous step for each probability that was not combined.
6. Repeat this process until the first column is reached. The resulting digit
sequences are the code words for the symbols.
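The procedure above can be sketched with a priority queue. The 0/1 assignment convention may differ from the slides' column-by-column method, but the resulting code is optimal, so its average length is the same:

```python
import heapq
from itertools import count

def huffman(probs):
    """probs: {symbol: probability}. Returns {symbol: code word}."""
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tick), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Combine the two lowest-probability groups (one "reduction").
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Prepend a bit to every code word in each merged group.
        for s in c1: c1[s] = '0' + c1[s]
        for s in c2: c2[s] = '1' + c2[s]
        heapq.heappush(heap, (p1 + p2, next(tick), {**c1, **c2}))
    return heap[0][2]

probs = {'x1': 0.4, 'x2': 0.2, 'x3': 0.2, 'x4': 0.1, 'x5': 0.1}
codes = huffman(probs)
print(codes)
L = sum(probs[s] * len(c) for s, c in codes.items())
print(round(L, 2))  # average code word length: 2.2 bits/symbol
```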
Channel Coding
Outline:
Introduction
Channel coding theorem
Channel coding types
Linear block codes
Convolutional codes
Channel Coding
Figure below shows a basic block diagram representation of channel
coding.
Channel Coding Theorem
The channel coding theorem for a DMC is stated in two parts as
follows:
Let a DMS X have entropy H(X) bits/symbol and produce symbols every Ts
seconds, and let a DMC have capacity Cs bits/symbol and be used every Tc
seconds. Then:
1. If H(X)/Ts ≤ Cs/Tc,
there exists a coding scheme for which the source output can be
transmitted over the channel with an arbitrarily small probability of error.
2. Conversely, if H(X)/Ts > Cs/Tc,
it is not possible to transmit information over the channel with an
arbitrarily small probability of error.
Note that the channel coding theorem only asserts the existence
of such codes; it does not tell us how to construct them.
Channel Coding Types
Block Codes
A block code is a code in which all of its words have the same length.
In block codes, the binary message or data sequence of the source is
divided into sequential blocks, each k bits long.
The channel encoder maps the k-bit blocks to other n-bit blocks.
The channel encoder performs a mapping
T: U→V
Where U is a set of binary data words of length k and V is a set of binary
code words of length n with n>k.
Block Codes
The n-bit blocks produced by the channel encoder are also known as
channel symbols, and n > k.
The resultant block code is called an (n, k) block code.
The k-bit blocks form 2k distinct message sequences referred to as k-tuples.
The n-bit blocks can form as many as 2n distinct sequences referred to as n-
tuples. The set of all n-tuples defined over K ={0,1} is denoted by Kn.
Each of the 2k data words is mapped to a unique code word.
The ratio k/n is called the code rate.
Block Codes
Binary Field:
The set K = {0, 1} is known as a binary field. The binary field has two
operations, addition and multiplication, defined such that the results of all
operations remain in K.
The rules of addition and multiplication are as follows:
Addition: 0+0=0, 1+1=0, 0+1=1+0=1
Multiplication: 0*0=0, 1*1=1, 0*1=1*0=0
Linear Block Codes
Linearity:
Let a = (a1,a2, ... ,an), and b = (b1,b2, ... ,bn) be two code words in a
code C.
The sum of a and b, denoted by a + b, is defined by (a1+b1, a2+b2, ..., an+bn),
where the component-wise additions are performed in the binary field
(i.e. modulo-2).
A code C is called linear if the sum of two code words is also a code word
in C.
A linear code C must contain the zero code word 0 = (0,0, ... ,0), since
a+a =0.
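Both conditions can be checked exhaustively for a small code; here a sketch using the (3, 2) even-parity code (an illustrative linear code, not one from the slides):

```python
def xor_words(a, b):
    """Component-wise modulo-2 sum of two code words (tuples of bits)."""
    return tuple(x ^ y for x, y in zip(a, b))

# The (3, 2) even-parity code: the third bit makes the bit-sum even.
code = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}

# Linearity: the sum of any two code words is again a code word.
print(all(xor_words(a, b) in code for a in code for b in code))  # True
# A linear code must contain the zero code word, since a + a = 0.
print((0, 0, 0) in code)  # True
```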
Convolutional Codes
In this type of encoding, every code word of the channel symbols is made
to be the weighted sum of various input message symbols.
The operation is similar to a convolution used in LTI systems to find the
output when the input and impulse response are known.
In general, the resulting code word is a convolution of the current input bits
with the previous input bits stored in the encoder's memory.
Convolutional encoders offer greater simplicity of implementation, but not
more protection, than an equivalent block code.
They are used in GSM mobile, voiceband modems, satellite and military
communications.
References:
Additional Reading:
https://en.wikipedia.org/wiki/Information_theory
https://en.wikipedia.org/wiki/Coding_theory
https://en.wikipedia.org/wiki/Noisy-channel_coding_theorem