
Communication Systems

(EEEg4172)

Chapter II
Information Theory and Coding

Instructor:
Fisiha Abayneh

Acknowledgement:
Materials prepared by Welelaw Y. & Habib M. in topics
related to this chapter are used.
1
Chapter II
Information Theory and Coding

2.1 Information Theory basics


2.2 Source Coding
2.3 Channel Coding

2
2.1 Information Theory Basics

Outline:
 Introduction
 Measure of Information
 Discrete Memoryless Channels
 Mutual Information
 Channel Capacity

3
Introduction

 Information theory is a field of study which deals with


quantification, storage, and communication of information.
 It was originally proposed by Claude E. Shannon in his
famous paper entitled “A Mathematical Theory of
Communication”.
 In general, information theory provides:
 a quantitative measure of information contained in message
signals.
 a way to determine the capacity of a communication system to
transfer information from source to destination.

4
Measure of Information

Information Source:
 An information source is an object that produces an event, the outcome of
which is selected at random according to a specific probability distribution.
 A practical information source in a communication system is a device that
produces messages and it can be either analog or discrete.
 A discrete information source is a source that has only a finite set of
symbols as possible outputs.
 The set of source symbols is called the source alphabet and the elements of
the set are called symbols or letters.

5
Measure of Information Cont’d……
Information Source:
 Information sources can be classified as:
 Sources with memory, or
 Memoryless sources.
 A source with memory is one for which a current symbol depends on the
previous symbols.
 A memoryless source is one for which each symbol produced is
independent of the previous symbols.
 A discrete memoryless source (DMS) can be characterized by the set of its
symbols, the probability assignment to the symbols, and the rate at which
the source generates symbols.
6
Measure of Information Cont’d……

Information Content of a DMS:


 The amount of information contained in an event is closely related to its
uncertainty.
 Messages about events with a high probability of occurrence convey
relatively little information.
 If an event is certain, it conveys zero information.
 Thus, a mathematical measure of information should be a function of the
probability of the outcome and should satisfy the following axioms:
1. Information should be proportional to the uncertainty of an outcome.
2. Information contained in independent outcomes should add.

7
Measure of Information Cont’d……

Information Content of a Symbol:


 Consider a DMS, denoted by X, with alphabets {x1, x2, …, xm}.
 The information content of a symbol xi, denoted by I(xi), is defined by:
I(xi) = log_b[1 / P(xi)] = -log_b P(xi)

where P(xi) is the probability of occurrence of symbol xi.


 The unit of I(xi) depends on the base b:
 If b = 2, the unit is bits/symbol (also called the shannon)
 If b = 10, the unit is hartleys (decimal digits)
 If b = e, the unit is nats/symbol

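A minimal Python sketch of this definition; the function name and the example probabilities are illustrative assumptions, not part of the slides.

import math

def self_information(p, base=2):
    # I(x) = -log_b P(x): information content of a symbol with probability p.
    return -math.log(p, base)

# A symbol with probability 0.5 carries exactly 1 bit; rarer symbols carry more.
for p in (0.5, 0.25, 0.01):
    print(p,
          self_information(p, 2), "bits,",
          self_information(p, 10), "hartleys,",
          self_information(p, math.e), "nats")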
8
Measure of Information Cont’d……

Information Content of a Symbol:


 Note that I(xi) satisfies the following properties.

i.   I(xi) = 0 for P(xi) = 1
ii.  I(xi) ≥ 0
iii. I(xi) > I(xj) if P(xi) < P(xj)
iv.  I(xi, xj) = I(xi) + I(xj) if xi and xj are independent

9
Measure of Information Cont’d……
Average Information or Entropy:
 In a practical communication system, we usually transmit long sequences
of symbols from an information source.
 Thus, we are more interested in the average information that a source
produces than the information content of a single symbol.
 The mean value of I(xi) over the alphabet of source X with m different
symbols is given by:
H(X) = E[I(xi)] = Σ_{i=1}^{m} P(xi) I(xi)
     = -Σ_{i=1}^{m} P(xi) log2 P(xi)   bits/symbol

10
Measure of Information Cont’d……
Average Information or Entropy:
 The quantity H(X) is called the entropy of the source X.
 It is a measure of the average information content per source symbol.
 The source entropy H(X) can be considered the average amount of
uncertainty within source X that is resolved by use of the alphabet.
 The source entropy H(X) satisfies the following relations:
0  H ( X )  log2
m
where m is the size of the
alphabet of source X .
 The lower bound corresponds to no uncertainty, which occurs when one symbol has
probability p(xi)=1 while p(xj)= 0 for j≠i, so X emits xi at all times. The upper bound
corresponds to the maximum uncertainty which occurs when p(xj)= 1/m for all i (i.e: when
all symbols have equal probability to be generated by X.
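A minimal Python sketch of the entropy computation, assuming the symbol probabilities are given as a list that sums to one (the values below are illustrative); it also checks the bound 0 ≤ H(X) ≤ log2 m stated above.

import math

def entropy(probs):
    # H(X) = -sum P(xi) log2 P(xi) in bits/symbol; terms with P(xi) = 0 contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.4, 0.3, 0.2, 0.1]              # assumed DMS probabilities
H = entropy(probs)
print(H)                                  # about 1.85 bits/symbol
print(0 <= H <= math.log2(len(probs)))    # entropy bound: True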
11
Measure of Information Cont’d……
Average Information or Entropy:

12
Discrete Memoryless Channels
Channel Representation:
 A communication channel is the path or medium through which the
symbols flow from a source to a receiver.
 A discrete memoryless channel (DMC) is a statistical model with an input
X and output Y as shown in the figure below.

13
Discrete Memoryless Channels Cont’d…..

Channel Representation:
 The channel is discrete when the alphabets of X and Y are both finite.
 It is memoryless when the current output depends on only the current input
and not on any of the previous inputs.
 In the DMC shown above, the input X consists of input symbols x1, x2, …,
xm and the output Y consists of output symbols y1, y2, …, yn.
 Each possible input-to-output path is indicated along with a conditional
probability P(yj/xi), which is called a channel transition probability.

14
Discrete Memoryless Channels Cont’d…..
Channel Matrix:
 A channel is completely specified by the complete set of transition
probabilities.
 The channel matrix, denoted by [P(Y/X)], is the m × n matrix whose (i, j) entry is
the transition probability P(yj/xi):

[P(Y/X)] = [ P(y1/x1)  P(y2/x1)  ...  P(yn/x1) ]
           [ P(y1/x2)  P(y2/x2)  ...  P(yn/x2) ]
           [   ...       ...     ...     ...   ]
           [ P(y1/xm)  P(y2/xm)  ...  P(yn/xm) ]

 Since each input to the channel results in some output, each row of the
channel matrix must sum to unity, i.e:

Σ_{j=1}^{n} P(yj/xi) = 1   for all i

15
Discrete Memoryless Channels Cont’d…..
Special Channels:
1. Lossless Channel
 A channel described by a channel matrix with only one non-zero element in
each column is called a lossless channel.
 An example of a lossless channel is shown in the figure below.

16
Discrete Memoryless Channels Cont’d…..
Special Channels:
2. Deterministic Channel
 A channel described by a channel matrix with only one non-zero unity
element in each row is called a deterministic channel.
 An example of a deterministic channel is shown in the figure below.

17
Discrete Memoryless Channels Cont’d…..
Special Channels:
3. Noiseless Channel
 A channel is called noiseless if it is both lossless and deterministic.
 A noiseless channel is shown in the figure below.
 The channel matrix has only one element in each row and each
column, and this element is unity.

18
Discrete Memoryless Channels Cont’d…..
Special Channels:
4. Binary Symmetric Channel
 The binary symmetric channel (BSC) is defined by the channel matrix and
channel diagram given below.

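As a small sketch (function and variable names assumed here), the BSC with crossover probability p has the 2 × 2 channel matrix below, and each of its rows sums to unity as required of any channel matrix.

def bsc_matrix(p):
    # Channel matrix [P(Y/X)] of a binary symmetric channel with crossover probability p.
    return [[1 - p, p],
            [p, 1 - p]]

P = bsc_matrix(0.1)
print(P)                                               # [[0.9, 0.1], [0.1, 0.9]]
print(all(abs(sum(row) - 1.0) < 1e-12 for row in P))   # each row sums to 1 -> True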
19
Mutual Information
Conditional and Joint Entropies:
 Using the input probabilities P(xi), output probabilities P(yj), transition
probabilities P(yj/xi), and joint probabilities P(xi, yj), we can define the
following entropy functions for a channel with m inputs and n outputs:

H(X) = -Σ_i P(xi) log2 P(xi)
H(Y) = -Σ_j P(yj) log2 P(yj)
H(X/Y) = -Σ_j Σ_i P(xi, yj) log2 P(xi/yj)
H(Y/X) = -Σ_j Σ_i P(xi, yj) log2 P(yj/xi)
H(X, Y) = -Σ_j Σ_i P(xi, yj) log2 P(xi, yj)

20
Mutual Information
Conditional and Joint Entropies:
 The above entropies can be interpreted as the average uncertainties of the
inputs and outputs.
 Two useful relationships among the above entropies are:

H(X, Y) = H(X/Y) + H(Y)
H(X, Y) = H(Y/X) + H(X)

Mutual Information:
 The mutual information I(X;Y) of a channel is defined by:

I(X; Y) = H(X) - H(X/Y)   bits/symbol

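A short Python sketch, assuming the joint probabilities P(xi, yj) are given as a matrix (the numbers below are illustrative); it computes the marginal entropies, the joint entropy, and the mutual information via the equivalent identity I(X;Y) = H(X) + H(Y) - H(X,Y).

import math

def H(probs):
    # Entropy in bits of a list of probabilities; zero entries are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed joint distribution P(xi, yj) for a channel with 2 inputs and 2 outputs.
P_xy = [[0.45, 0.05],
        [0.10, 0.40]]

P_x = [sum(row) for row in P_xy]              # marginal P(xi)
P_y = [sum(col) for col in zip(*P_xy)]        # marginal P(yj)
H_XY = H([p for row in P_xy for p in row])    # joint entropy H(X, Y)

I_XY = H(P_x) + H(P_y) - H_XY                 # mutual information I(X; Y)
print(I_XY)                                   # about 0.40 bits per channel use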
21
Mutual Information Cont’d…..
Mutual Information:

22
Channel Capacity

23
Channel Capacity Cont’d….

24
Channel Capacity Cont’d….

25
Additive White Gaussian Noise(AWGN) Channel

 In a continuous channel an information source produces a continuous signal


x(t).
 The set of possible signals is considered as an ensemble of waveforms
generated by some ergodic random process.

Differential Entropy:
 The average amount of information per sample value of x(t) is measured by:

H(X) = -∫ fX(x) log2 fX(x) dx   bits/sample

where fX(x) is the probability density function of X.
 The entropy H(X) defined by the equation above is known as the differential
entropy of X.

26
Additive White Gaussian Noise(AWGN) Channel

 The average mutual information in a continuous channel is defined (by analogy
with the discrete case) as:

I(X; Y) = H(X) - H(X/Y) = H(Y) - H(Y/X)

27
Additive White Gaussian Noise(AWGN) Channel

 In an additive white Gaussian noise (AWGN ) channel, the channel output Y is given
by:
Y= X + n
 where X is the channel input and n is an additive band-limited white Gaussian noise
with zero mean and variance σ2.
 The capacity Cs of an AWGN channel is given by:

Cs = (1/2) log2(1 + S/N)   bits/sample

where S/N is the signal-to-noise ratio at the channel output.

28
Additive White Gaussian Noise(AWGN) Channel

 If the channel bandwidth B Hz is fixed, then the output y(t) is also a band-limited
signal completely characterized by its periodic sample values taken at the Nyquist
rate 2B samples/s.
 Then the capacity C (b/s) of the AWGN channel is given by:

C = 2B · Cs = B log2(1 + S/N)   b/s

 This equation is known as the Shannon-Hartley law.

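A one-line numeric check of the Shannon-Hartley law; the bandwidth and signal-to-noise ratio below are assumed values used only for illustration.

import math

def awgn_capacity(bandwidth_hz, snr_linear):
    # Shannon-Hartley law: C = B log2(1 + S/N) in bits/s.
    return bandwidth_hz * math.log2(1 + snr_linear)

# Example: a 3 kHz channel with a 30 dB SNR (S/N = 1000).
print(awgn_capacity(3000, 1000))   # roughly 29.9 kbits/s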
29
Examples on Information Theory and Coding
Example-1:

A DMS X has four symbols x1, x2, x3, x4 with probabilities
P(x1) = 0.4, P(x2) = 0.3, P(x3) = 0.2 and P(x4) = 0.1

a. Calculate H(X)

b. Find the amount of information contained in the messages
   x1x2x3x4 and x4x3x3x2

30
Examples on Information Theory and Coding Cont’d…
Solution:
a. H(X) = -Σ_{i=1}^{4} P(xi) log2 P(xi)
        = -0.4 log2 0.4 - 0.3 log2 0.3 - 0.2 log2 0.2 - 0.1 log2 0.1
        = 1.85 bits/symbol

b. P(x1x2x3x4) = (0.4)(0.3)(0.2)(0.1) = 0.0024

   I(x1x2x3x4) = -log2 0.0024 = 8.7 bits

   Similarly,
   P(x4x3x3x2) = (0.1)(0.2)^2(0.3) = 0.0012

   I(x4x3x3x2) = -log2 0.0012 = 9.7 bits

31
Examples on Information Theory and Coding Cont’d….
Example-2:
A high-resolution black-and-white TV picture consists of about 2 × 10^6
picture elements and 16 different brightness levels. Pictures are repeated at
the rate of 32 per second. All picture elements are assumed to be
independent and all levels have equal likelihood of occurrence. Calculate
the average rate of information conveyed by this TV picture source.
Solution:
H(X) = Σ_{i=1}^{16} (1/16) log2 16 = 4 bits/element

r = 2 × 10^6 × 32 = 64 × 10^6 elements/s

R = r H(X) = 64 × 10^6 × 4 = 256 × 10^6 bits/s = 256 Mbps


32
Examples on Information Theory and Coding Cont’d….
Example-3:
Consider a binary symmetric channel shown below.

a. Find the channel matrix of the channel.

b. Find P(y1) and P(y2) when P(x1) = P(x2) = 0.5.

c. Find the joint probabilities P(x1, y2) and P(x2, y1) when P(x1) = P(x2) = 0.5.
33
Examples on Information Theory and Coding Cont’d….
Solution:
a. The channel matrix is given by:

   [P(Y/X)] = [ P(y1/x1)  P(y2/x1) ] = [ 0.9  0.1 ]
              [ P(y1/x2)  P(y2/x2) ]   [ 0.2  0.8 ]

b. [P(Y)] = [P(X)] [P(Y/X)]
          = [0.5  0.5] [ 0.9  0.1 ]
                       [ 0.2  0.8 ]
          = [0.55  0.45] = [P(y1)  P(y2)]

   Hence P(y1) = 0.55 and P(y2) = 0.45

34
Examples on Information Theory and Coding Cont’d….
Solution:

c. P ( X , Y )  P ( X ) d P (Y / X )
0.5 0 0.9 0.1
  0.2 0.8
 0 0 . 5  
0.45 0.05  P ( x1 , y1 ) P ( x1 , y2 ) 
   
 0. 1 0 .4   P ( x 2 , y1 ) P ( x 2 , y 2 
)
 P ( x1 , y2 )  0.05 and P ( x2 , y1 )  0.1

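These matrix products are easy to check numerically; the sketch below, assuming NumPy is available, reproduces parts (b) and (c) of the solution.

import numpy as np

P_YgX = np.array([[0.9, 0.1],
                  [0.2, 0.8]])     # channel matrix [P(Y/X)]
P_X = np.array([0.5, 0.5])         # input probabilities [P(x1), P(x2)]

P_Y = P_X @ P_YgX                  # output probabilities [P(y1), P(y2)]
P_XY = np.diag(P_X) @ P_YgX        # joint probabilities [P(xi, yj)]

print(P_Y)    # [0.55 0.45]
print(P_XY)   # [[0.45 0.05]
              #  [0.1  0.4 ]]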
35
Examples on Information Theory and Coding Cont’d….
Example-4:
An information source can be modeled as a bandlimited process with a
bandwidth of 6kHz. This process is sampled at a rate higher than the
Nyquist rate to provide a guard band of 2kHz. It is observed that the
resulting samples take values in the set {-4, -3, -1, 2, 4, 7} with
probabilities 0.2, 0.1, 0.15, 0.05, 0.3 and 0.2 respectively. What is the
entropy of the discrete-time source in bits/sample? What is the entropy in
bits/sec?

36
Examples on Information Theory and Coding Cont’d….
Solution:
The entropy of the source is
H(X) = -Σ_{i=1}^{6} P(xi) log2 P(xi) = 2.4087 bits/sample

The sampling rate is
fs = 2 × (6 kHz) + 2 kHz = 14 kHz

This means that 14000 samples are taken each second.
The entropy of the source in bits per second is therefore:
H(X) = 2.4087 (bits/sample) × 14000 (samples/sec)
     = 33721.8 bits/sec ≈ 33.72 kbps

37
Chapter II
Information Theory and Coding

2.1 Information Theory basics


2.2 Source Coding
2.3 Channel Coding

38
Source Coding

Outline:-
 Code length and code efficiency
 Source coding theorem
 Classification of codes
 Kraft inequality
 Entropy coding

39
Source Coding

 A conversion of the output of a DMS into a sequence of binary symbols


(binary code word) is called source coding.
 The device that performs this conversion is called the source encoder.

 An objective of source coding is to minimize the average bit rate required


for representation of the source by reducing the redundancy of the
information source.

40
Code Length and Code Efficiency

 Let X be a DMS with finite entropy H(X) and an alphabet {x1, x2, …..xm}
with corresponding probabilities of occurrence P(xi) (i=1, 2, …., m).
 An encoder assigns a binary code (sequence of bits) of length ni bits for
each symbol xi in the alphabet of the DMS.
 A binary code that represents a symbol xi is known as a code word.
 The length of a code word (code length) is the number of binary digits in
the code word (or simply ni).
 The average code word length L, per source symbol is given by:
L = Σ_{i=1}^{m} P(xi) ni

41
Code Length and Code Efficiency

 The parameter L represents the average number of bits per source symbol
used in the source coding process.
 The code efficiency η is defined as:

η = Lmin / L,   where Lmin is the minimum possible value of L

 When η approaches unity, the code is said to be efficient.
 The code redundancy γ is defined as:

γ = 1 - η

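A small sketch of these definitions, assuming the symbol probabilities and assigned code word lengths are given as parallel lists (values illustrative), and taking Lmin = H(X) as justified by the source coding theorem on the next slide.

import math

def average_length(probs, lengths):
    # L = sum of P(xi) * ni, the average code word length in bits/symbol.
    return sum(p * n for p, n in zip(probs, lengths))

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs   = [0.4, 0.3, 0.2, 0.1]    # assumed DMS probabilities
lengths = [1, 2, 3, 3]            # assumed code word lengths ni

L = average_length(probs, lengths)
eta = entropy(probs) / L          # code efficiency, with Lmin = H(X)
gamma = 1 - eta                   # code redundancy
print(L, eta, gamma)              # 1.9, about 0.97, about 0.03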
42
Source Coding Theorem

 The source coding theorem states that for a DMS X with entropy H(X), the
average code word length L per symbol is bounded as:

L  H (X )

 And further, L can be made as close to H(X) as desired, by employing


efficient coding schemes.
 Thus, with Lmin=H(X), the code efficiency can be written as:

η = H(X) / L

43
Classifications of Codes
 There are several types of codes.
 Let’s consider the table given below to describe different types of codes.

1. Fixed-Length Codes:
 A fixed-length code is one whose code word length is fixed.
 Code 1 and Code 2 of the above table are fixed-length codes with
length 2.
44
Classifications of Codes
2. Variable-Length Codes:
 A variable-length code is one whose code word length is not fixed.
 All codes of the above table except codes 1 and 2 are variable-length
codes.
3. Distinct Codes:
 A code is distinct if each code word is distinguishable from the other
code words.
 All codes of the above table except code 1 are distinct codes.

45
Classifications of Codes
4. Prefix-Free Codes:
 A code is said to be a prefix-free code when the code words are distinct
and no code word is a prefix for another code word.
 Codes 2, 4 and 6 in the above table are prefix free codes.
5. Uniquely Decodable Codes:
 A distinct code is uniquely decodable if the original source sequence
can be reconstructed perfectly from the encoded binary sequences.
 Note that code 3 of Table above is not a uniquely decodable code.
 For example, the binary sequence 1001 may correspond to the source
sequences x2x3x2 or x2x1x1x2.

46
Classifications of Codes
5. Uniquely Decodable Codes continued…
 A sufficient condition to ensure that a code is uniquely decodable is
that no code word is a prefix of another.
 Thus, the prefix-free codes 2, 4, and 6 are uniquely decodable codes.
 Note that the prefix-free condition is not a necessary condition for
codes to be uniquely decodable.
 For example, code 5 of Table above does not satisfy the prefix-free
condition, and yet it is uniquely decodable since the bit 0 indicates the
beginning of each code word of the code.
 i.e: when the decoder finds 0, it knows a new word is starting. But it
has to wait until the next 0 comes to identify which word is received.
47
Classifications of Codes
6. Instantaneous Codes:
 A uniquely decodable code is called an instantaneous code if the end
of any code word is recognizable without examining subsequent code
symbols.
 The instantaneous codes have the property previously mentioned that no code
word is a prefix of another code word
 For this reason, prefix-free codes are sometimes called instantaneous codes.
7. Optimal Codes:
 A code is said to be optimal if it is instantaneous and has minimum
average length L for a given source with a given probability
assignment for the source symbols

48
Kraft Inequality

 Let X be a DMS with alphabet {Xi} (i = 1,2, ... ,m). Assume that the length
of the assigned binary code word corresponding to Xi is ni.
 A necessary and sufficient condition for the existence of an instantaneous
binary code is:

Σ_{i=1}^{m} 2^(-ni) ≤ 1

which is known as the Kraft inequality.


 It is also referred to as the Kraft-McMillan inequality.
 Note that the Kraft inequality assures us of the existence of an
instantaneously decodable code with code word lengths that satisfy the
inequality.

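A minimal check of the Kraft inequality for a proposed set of code word lengths; the length sets below are illustrative.

def satisfies_kraft(lengths):
    # True if the sum over i of 2**(-ni) is at most 1, so an instantaneous binary code exists.
    return sum(2 ** (-n) for n in lengths) <= 1

print(satisfies_kraft([1, 2, 3, 3]))   # True:  1/2 + 1/4 + 1/8 + 1/8 = 1
print(satisfies_kraft([1, 1, 2]))      # False: 1/2 + 1/2 + 1/4 > 1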
49
Entropy Coding

 The design of a variable-length code such that its average code word length
approaches the entropy of the DMS is often referred to as entropy coding.
 Two commonly used entropy coding techniques are presented in this
section. These are:
i. Shannon-Fano coding
ii. Huffman coding

50
Entropy Coding

1. Shannon-Fano Coding:
 An efficient code can be obtained by the following simple procedure,
known as Shannon-Fano algorithm:
1. List the source symbols in order of decreasing probability.
2. Partition the set into two sets that are as close to equi-probable as possible,
and assign 0 to the symbols in the upper set and 1 to the symbols in the lower
set.
3. Continue this process, each time partitioning the sets with as nearly equal
probabilities as possible and assigning 0s to the symbols of the upper sets and
1s to those of the lower sets, until further partitioning is not possible.
4. Read off, for each symbol, the 0s and 1s assigned to it from the first partition
to the last; this sequence is the code word for that symbol.
 Note that in Shannon-Fano encoding the ambiguity may arise in the
choice of approximately equi-probable sets.

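A Python sketch of the Shannon-Fano procedure above. The partitioning rule used here (cutting where the running probability is closest to half of the group total) is one reasonable way to resolve the ambiguity just mentioned; other choices are equally valid.

def shannon_fano(symbols):
    # symbols: list of (symbol, probability) pairs, assumed sorted by decreasing probability.
    # Returns a dict mapping each symbol to its code word (a string of 0s and 1s).
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) < 2:
            return
        total, running, cut, best = sum(p for _, p in group), 0.0, 1, float("inf")
        for i, (_, p) in enumerate(group[:-1], start=1):
            running += p
            if abs(total / 2 - running) < best:
                best, cut = abs(total / 2 - running), i
        upper, lower = group[:cut], group[cut:]
        for s, _ in upper:
            codes[s] += "0"        # 0 for the upper (more probable) set
        for s, _ in lower:
            codes[s] += "1"        # 1 for the lower set
        split(upper)
        split(lower)

    split(symbols)
    return codes

# Illustrative source, already listed in order of decreasing probability:
print(shannon_fano([("x1", 0.4), ("x2", 0.3), ("x3", 0.2), ("x4", 0.1)]))
# {'x1': '0', 'x2': '10', 'x3': '110', 'x4': '111'}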
51
Entropy Coding

1. Shannon-Fano Coding continued…


 An example of Shannon-Fano encoding is shown in Table below.

52
Entropy Coding

2. Huffman Coding:
 Huffman encoding employs an algorithm that results in an optimum code.
Thus, it is the code that has the highest efficiency.
 The Huffman encoding procedure is as follows:
1. List the source symbols in order of decreasing probability.
2. Combine the probabilities of the two symbols that have the lowest probabilities,
and reorder the resultant probabilities. This action is known as reduction.
3. Repeat the above step until only two ordered probabilities are left.
4. Start encoding with the last reduction, which contains only two ordered
probabilities. Assign 0 for the 1st and 1 for the 2nd probability.
5. Now go back one step and append 0 and 1 (as a 2nd digit) to the two
probabilities that were combined in that reduction, keeping the digit already
assigned for each probability that was not combined.
6. Repeat this process until the first column. The resulting digits will be code
words for the symbols.
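A compact Huffman-coding sketch using Python's heapq module. It follows the same merge-the-two-least-probable idea as the procedure above, although the particular 0/1 labels may differ from a hand-worked table; the code word lengths are what matter.

import heapq
from itertools import count

def huffman(probs):
    # probs: dict mapping symbol -> probability; returns dict mapping symbol -> code word.
    tie = count()   # unique tie-breaker so equal probabilities never compare the dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)    # the two lowest probabilities
        p2, _, c2 = heapq.heappop(heap)
        for s in c1:
            c1[s] = "0" + c1[s]            # prepend 0 to every code word in one branch
        for s in c2:
            c2[s] = "1" + c2[s]            # prepend 1 to every code word in the other
        heapq.heappush(heap, (p1 + p2, next(tie), {**c1, **c2}))
    return heap[0][2]

print(huffman({"x1": 0.4, "x2": 0.3, "x3": 0.2, "x4": 0.1}))
# Code word lengths 1, 2, 3, 3 (e.g. x1: '0', x2: '10', x3: '111', x4: '110')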
53
Entropy Coding

2. Huffman Coding continued…


 An example of Huffman encoding is shown in Table below.

54
Chapter II
Information Theory and Coding

2.1 Information Theory basics


2.2 Source Coding
2.3 Channel Coding

55
Channel Coding

Outline:
 Introduction
 Channel coding theorem
 Channel coding types
 Linear block codes
 Convolutional codes

56
Channel Coding

 The channel encoder introduces systematic redundancy into the data stream by
adding bits to the message signal to facilitate the detection and correction
of bit errors in the original binary message at the receiver.
 The channel decoder at the receiver exploits the redundancy to decide
which message bits are actually transmitted.
 The combined objective of the channel encoder and decoder is to minimize
the effect of channel noise.

57
Channel Coding
 Figure below shows a basic block diagram representation of channel
coding.

[Block diagram: Binary message sequence → Channel encoder → Coded sequence →
DMC (noise added) → Channel decoder → Decoded binary sequence]

58
Channel Coding Theorem
 The channel coding theorem for a DMC is stated in two parts as
follows:
 Let a DMS X have an entropy H(X) bits/symbol and produce symbols every Ts
seconds, and let a DMC have a capacity Cs bits/symbol and be used every Tc
seconds. Then:
1. If H(X)/Ts ≤ Cs/Tc, there exists a coding scheme for which the source output
can be transmitted over the channel with an arbitrarily small probability of error.
2. Conversely, if H(X)/Ts > Cs/Tc, it is not possible to transmit information over
the channel with an arbitrarily small probability of error.
 Note that the channel coding theorem only asserts the existence of such
codes; it does not tell us how to construct them.

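A tiny numeric illustration of the condition in the theorem (the source and channel figures below are assumed values only): reliable transmission is possible exactly when the source information rate H(X)/Ts does not exceed Cs/Tc.

def reliable_transmission_possible(H_X, Ts, Cs, Tc):
    # Channel coding theorem condition: H(X)/Ts <= Cs/Tc.
    return H_X / Ts <= Cs / Tc

# Source: 1.85 bits/symbol every 1 ms; channel: 0.5 bits/symbol every 0.2 ms.
print(reliable_transmission_possible(1.85, 1e-3, 0.5, 0.2e-3))   # True (1850 <= 2500)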
59
Channel Coding Types

 Channel coding schemes can be broadly classified into two types:


1. Block code
2. Convolutional code
 Block codes are commonly used in the channel encoding process.
 A brief introduction to both types of codes is provided below.

60
Block Codes

 A block code is a code in which all of its words have the same length.
 In block codes, the binary message or data sequence of the source is
divided into sequential blocks, each k bits long.
 The channel encoder maps the k-bit blocks to other n-bit blocks.
 The channel encoder performs a mapping
T: U → V
where U is the set of binary data words of length k and V is the set of binary
code words of length n, with n > k.

61
Block Codes

 The n-bit blocks produced by the channel encoder are also known as
channel symbols, and n > k.
 The resultant block code is called an (n, k) block code.
 The k-bit blocks form 2^k distinct message sequences, referred to as k-tuples.
 The n-bit blocks can form as many as 2^n distinct sequences, referred to as
n-tuples. The set of all n-tuples defined over K = {0, 1} is denoted by K^n.
 Each of the 2^k data words is mapped to a unique code word.
 The ratio k/n is called the code rate.

62
Block Codes

Binary Field:
 The set K= {0, 1} is known as a binary field. The binary field has two
operations:- addition and multiplication, such that the results of all
operations are in K.
 The rules of addition and multiplication are as follows:
Addition: 0+0=0, 1+1=0, 0+1=1+0=1
Multiplication: 0*0=0, 1*1=1, 0*1=1*0=0

63
Linear Block Codes

Linearity:
 Let a = (a1,a2, ... ,an), and b = (b1,b2, ... ,bn) be two code words in a
code C.
 The sum of a and b, denoted by a + b, is defined by
(a1 + b1, a2 + b2, ..., an + bn).
 A code C is called linear if the sum of two code words is also a code word
in C.
 A linear code C must contain the zero code word 0 = (0,0, ... ,0), since
a+a =0.

64
Linear Block Codes

Hamming Weight and Distance:


 Let c be a code word of length n. The Hamming weight of c, denoted by
w(c), is the number of 1's in c.
 Let a and b be code words of length n. The Hamming distance between a
and b, denoted by d(a,b), is the number of positions in which a and b differ.
 Thus, the Hamming weight of a code word c is the Hamming distance
between c and 0, that is:
w(c) = d(c, 0)
 Similarly, the Hamming distance can be written in terms of Hamming
weight as:
d(a, b) = w(a + b)

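A short sketch of these definitions, representing code words as lists of bits; the modulo-2 addition defined earlier for the binary field is just the XOR of corresponding bits.

def hamming_weight(c):
    # w(c): the number of 1s in the code word c.
    return sum(c)

def hamming_distance(a, b):
    # d(a, b): positions in which a and b differ, computed as w(a + b) with modulo-2 addition.
    return hamming_weight([ai ^ bi for ai, bi in zip(a, b)])

a = [1, 0, 1, 1, 0]
b = [0, 0, 1, 0, 1]
print(hamming_weight(a))        # 3
print(hamming_distance(a, b))   # 3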
65
Linear Block Codes

 There are many types of Linear block codes, such as:-


 Hamming codes
 Repetition codes
 Reed-Solomon codes
 Reed-Muller codes
 Polynomial codes… etc.
 Reading Assignment:
Minimum distance
Error detection and correction capabilities
Generator matrix
Parity check matrix
Syndrome decoding
Cyclic codes

66
Convolutional Codes

 In this type of encoding, each coded output symbol is formed as a (modulo-2)
weighted sum of the current and previous input message symbols.
 The operation is similar to a convolution used in LTI systems to find the
output when the input and impulse response are known.
 In general, the resulting code word is a convolution of the input bit
sequence with the generator (impulse response) sequences of the convolutional encoder.
 Convolutional encoders offer greater simplicity of implementation, but not
more protection, than an equivalent block code.
 They are used in GSM mobile, voiceband modems, satellite and military
communications.

67
References:

 Simon Haykin, Communication Systems, 4th edition. (chapter-9).


 B.P. Lathi, Modern Digital and Analog Communication Systems, 3rd
edition. (Chapter-15)
 Henk C.A van Tilborg, Coding Theory, April 03, 1993.

Additional Reading:
 https://en.wikipedia.org/wiki/Information_theory
 https://en.wikipedia.org/wiki/Coding_theory
 https://en.wikipedia.org/wiki/Noisy-channel_coding_theorem

68
