Part C
Prerequisite: Stat 219
Text Book: B.P. Lathi, “Modern Digital and Analog Communication Systems”, 3rd edition, Oxford University Press, Inc., 1998
Reference: A. Papoulis, “Probability, Random Variables, and Stochastic Processes”, McGraw-Hill, 2005
Could A & B be SI (statistically independent)?
Conditional Channel Entropies
The Joint Channel Entropy
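For reference, the standard definitions of these conditional and joint entropies, written in the notes' base-r notation:

H(A/B) = − Σi Σj p(ai, bj) logr p(ai/bj)
H(B/A) = − Σi Σj p(ai, bj) logr p(bj/ai)
H(A, B) = − Σi Σj p(ai, bj) logr p(ai, bj)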
Mutual Information
p(ai) is the a priori probability of the input symbol ai, and
p(ai/bj) is the a posteriori probability of the input symbol ai upon reception of bj.
The change in this probability (due to channel noise) measures how much the receiver learned from the reception of bj. The receiver can never be absolutely sure what exactly was sent. The difference between the information uncertainty of ai before and after reception of bj is called the mutual information, which is the actual amount of information transmitted via the channel:
I(ai;bj) = I(ai) – I(ai/bj)
For an ideal error-free channel, p(ai/bj) = 1 if i = j, and p(ai/bj) = 0 if i ≠ j.
Therefore, I(ai;bj) = I(ai). [In contrast, I(ai;bj) = 0 if ai & bj were SI.]
In general, I(ai;bj) = logr(1/p(ai)) − logr(1/p(ai/bj)) = logr(p(ai/bj)/p(ai)) = logr(p(ai,bj)/(p(ai) p(bj))).
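As a quick numeric illustration (the probabilities here are made up): if p(ai) = 0.5 and reception of bj raises the posterior to p(ai/bj) = 0.9, then I(ai;bj) = log2(0.9/0.5) ≈ 0.848 bits; that is, 0.848 of the symbol's 1 bit of initial uncertainty is conveyed.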
Channel Mutual Information
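Averaging I(ai;bj) over all symbol pairs gives the channel (average) mutual information, with the standard identities:

I(A; B) = Σi Σj p(ai, bj) logr( p(ai, bj)/(p(ai) p(bj)) )
        = H(A) − H(A/B) = H(B) − H(B/A) = H(A) + H(B) − H(A, B)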
Problem 1
Let the channel probabilities be as given [matrices not recovered]. Find the channel entropies and the mutual information.
Problem 1 - Solution
[worked matrices not recovered]
1.584 bit/symbol
0.666 bit/symbol
Problem 1 - Tips
Note that P(B/A) is the forward transition matrix (FM), and P(A/B) is the backward transition matrix (BM).
Form the joint matrix P(A, B). We can then deduce the FM by dividing each row of the joint matrix by the corresponding input probability: p(bj/ai) = p(ai, bj)/p(ai).
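A minimal Python sketch of these tips, assuming a made-up 2×2 joint matrix (the original problem matrix is not preserved here, so the numbers below are illustrative only):

import numpy as np

# Hypothetical joint probability matrix P(A, B); rows = input symbols ai,
# columns = output symbols bj. The real Problem 1 matrix was not preserved.
JM = np.array([[0.4, 0.1],
               [0.1, 0.4]])

p_a = JM.sum(axis=1)            # input (marginal) probabilities p(ai)
p_b = JM.sum(axis=0)            # output (marginal) probabilities p(bj)
FM = JM / p_a[:, None]          # forward matrix:  p(bj/ai) = p(ai,bj)/p(ai)
BM = JM / p_b[None, :]          # backward matrix: p(ai/bj) = p(ai,bj)/p(bj)

def H(p):
    """Entropy in bits; ignores zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_A  = H(p_a)                   # source entropy H(A)
H_B  = H(p_b)                   # destination entropy H(B)
H_AB = H(JM.ravel())            # joint entropy H(A,B)
I_AB = H_A + H_B - H_AB         # mutual information I(A;B)

print(FM, BM, sep="\n")
print(f"H(A)={H_A:.3f}  H(B)={H_B:.3f}  H(A,B)={H_AB:.3f}  I(A;B)={I_AB:.3f}")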
Problem 2
Let the channel probabilities be as given [matrices not recovered]. Find the required quantities.
Problem 2 - Solution
[worked steps not recovered]
1.684 bit/symbol
Problem 2 - Tips
We can deduce the FM exactly as in Problem 1: p(bj/ai) = p(ai, bj)/p(ai). [Only a fragment (an entry 0.6) of the worked matrix survives.]
Channel Capacity
The channel capacity of a discrete memoryless channel is defined as the
maximum mutual information I(A; B) in any single use of the channel (i.e., signaling
interval), where the maximization is over all possible input probability distributions
{p(ai)} on A. The channel capacity is commonly denoted by C and is written as
C = max over all input distributions {p(ai)} of I(A; B)
The channel capacity C is measured in bits per channel use, or bits per transmission.
Note that the channel capacity C is a function only of the transition probabilities
p(bj/ai), which define the channel. The calculation of C involves maximization of the
mutual information I(A; B) over r variables [i.e., the input probabilities p(a1), . . . , p(ar)]
subject to two constraints:
1) p(ai) ≥ 0 for all i
2) Σi p(ai) = 1
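A minimal numeric sketch of this maximization, sweeping the input distribution of a hypothetical binary channel (the FM values are made up for illustration):

import numpy as np

# Hypothetical forward matrix p(bj/ai) of a binary channel (illustrative values).
FM = np.array([[0.9, 0.1],
               [0.2, 0.8]])

def mutual_information(p_a, FM):
    """I(A;B) in bits for input distribution p_a and forward matrix FM."""
    JM = p_a[:, None] * FM                  # joint matrix p(ai,bj)
    p_b = JM.sum(axis=0)                    # output distribution p(bj)
    prod = p_a[:, None] * p_b[None, :]      # product matrix p(ai)p(bj)
    mask = JM > 0
    return np.sum(JM[mask] * np.log2(JM[mask] / prod[mask]))

# Brute-force search over p(a1); the capacity is C = max I(A;B).
grid = np.linspace(0.001, 0.999, 999)
I_vals = [mutual_information(np.array([p, 1 - p]), FM) for p in grid]
best = int(np.argmax(I_vals))
print(f"C ≈ {I_vals[best]:.4f} bits/use at p(a1) ≈ {grid[best]:.3f}")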
Problem 3 - Solution
Given the channel probabilities, we first form the joint matrix: JM = [not recovered]. From JM:
I(A; B) = H(A) + H(B) − H(A, B) = 0.811 + 0.97 − 1.692 = 0.089 bit/symbol
[or H(B) − H(B/A) = 0.97 − 0.881 = 0.089]
Problem 4 - Solution
2) Efficiency: [terms not recovered] = 94%
3) Redundancy: γ = 1 − 0.94 = 6%
4) [only the probability values 0.7, 0.2, 0.1 survive]
Example 2: Cascaded BSCs
Two cascaded BSCs, each with probability of error p, have an equivalent forward diagram of a single BSC with probability of error 2p(1−p), and an equivalent forward matrix

FM = | (1−p)² + p²   2p(1−p)     |
     | 2p(1−p)       (1−p)² + p² |
Therefore, C = 1- h(2p(1-p)) bits/ transmission at p(a1) = p(a2) = ½
If p = 0.01, h(2×0.01×0.99) = h(0.0198) ≈ 0.1403, then
C = 1 − 0.1403 ≈ 0.8597 bits/trans, which is less than that of Example 1.
This means that the transmitter (Tx) and receiver (Rx) should always be connected by a single channel: cascading channels reduces capacity.
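A short Python sketch of this calculation, reproducing the numbers above (h is the binary entropy function):

import numpy as np

def h(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p = 0.01                        # per-hop crossover probability
p_eq = 2 * p * (1 - p)          # equivalent error probability of two cascaded BSCs
C_single  = 1 - h(p)            # capacity of one BSC
C_cascade = 1 - h(p_eq)         # capacity of the cascade (smaller)
print(f"p_eq = {p_eq:.4f}, C_single = {C_single:.4f}, C_cascade = {C_cascade:.4f}")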
M-ary baseband Transmission
Binary data is usually transmitted using extensions of the binary code. Instead of sending one binary bit (0 or 1) per transmission, M-ary transmission uses M different symbols, each of duration log2(M)·Tb, to transmit log2(M) binary bits per symbol.
For example, 4-ary transmission uses the second extension of the binary code. Four distinct symbols, each of duration 2Tb, such as four pulses with magnitudes +3A, +A, −A & −3A, may be used to represent 11, 10, 00 & 01 (Gray code) respectively.
Example:
4-ary transmission with symbol duration = 2Tb
11 → +3
10 → +1
00 → −1
01 → −3
Binary bit stream: 1 1 0 1 0 0 0 1 1 0 1 1 1 0
4-ary symbol stream: 3, -3, -1, -3, 1, 3, 1
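A minimal Python sketch of this mapping, using the level table from the example above (the helper to_4ary is ours):

# Gray-coded 4-ary level map from the table above (levels in units of A).
LEVELS = {"11": 3, "10": 1, "00": -1, "01": -3}

def to_4ary(bits):
    """Group a binary string into dibits and map each onto its 4-ary level."""
    assert len(bits) % 2 == 0, "bit-stream length must be even"
    return [LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]

print(to_4ary("11010001101110"))   # -> [3, -3, -1, -3, 1, 3, 1]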
Similarly, 8-ary transmission uses the third extension of the binary code: eight distinct symbols, each of duration 3Tb.
Joint and conditional probability density functions can also be defined for continuous random variables X & Y:
fX,Y(x, y) is the joint probability density function of X and Y, and
fX(x/y) is the conditional probability density function of X, given Y.
Differential Entropy and Mutual
Information for Continuous Ensembles
Some of the previously discussed information theory concepts are now extended to continuous random variables and random vectors. The motivation is to pave the way for the description of another fundamental limit in information theory.
Consider a continuous random variable X with probability density function fX(x). By analogy with the entropy of a discrete random variable, the differential entropy of X is defined as
h(X) = − ∫ fX(x) log2 fX(x) dx
with the integral taken over all x.
Example 9.8 Gaussian Distribution
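In summary, the standard result of this example: for a Gaussian random variable X with variance σ²,
h(X) = (1/2) log2(2πeσ²)
and, for a given variance σ², the Gaussian density maximizes the differential entropy. This is the property invoked below for the received samples Yk.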
Mutual Information
By analogy with the discrete case, the mutual information between two continuous random variables X and Y is defined as
I(X; Y) = h(X) − h(X/Y)
where h(X/Y) is the conditional differential entropy of X, given Y.
Information Capacity Theorem
The information capacity theorem for band-limited, power-limited
Gaussian channels is to be considered. Consider a zero-mean stationary
process X(t) that is band-limited to B hertz. Let Xk, k = 1, 2, . . . , K,
denote the continuous random variables obtained by uniform sampling
of the process X(t) at the Nyquist rate of 2B samples per second. These
samples are transmitted in T seconds over a noisy channel, also band-
limited to B hertz. Hence, the number of samples, K = 2BT.
Xk is a sample of the transmitted signal. The channel output is disturbed
by additive white Gaussian noise (AWGN) of zero mean and power
spectral density No/2. The noise is band-limited to B hertz. Let the
continuous random variables Yk, k = 1, 2, . . . , K, denote samples of the received signal, as shown by
Yk = Xk + Nk,   k = 1, 2, . . . , K   (9.84)
The information capacity of a power-limited Gaussian channel is defined
as the maximum of the mutual information between the channel input Xk
and the channel output Yk over all distributions on the input Xk that
satisfy the power constraint of Equation (9.86), namely E[Xk²] = P, where P is the average transmitted power.
Since Xk and Nk are independent random variables, and their sum equals Yk, as in Equation (9.84), the conditional differential entropy of Yk, given Xk, is equal to the differential entropy of Nk: h(Yk/Xk) = h(Nk). Hence, by Equation (9.87), I(Xk; Yk) = h(Yk) − h(Yk/Xk) = h(Yk) − h(Nk).
Since h(Nk) is independent of the distribution of Xk, maximizing I(Xk; Yk) in accordance with Equation (9.87) requires maximizing h(Yk), the differential entropy of sample Yk of the received signal. For h(Yk) to be maximum, Yk has to be a Gaussian random variable (see Example 9.8). That is, the samples of the received signal represent a noiselike process. Next, note that since Nk is Gaussian by assumption, the sample Xk of the transmitted signal must be Gaussian too. Therefore, the maximization specified in Equation (9.87) is achieved by choosing the samples of the transmitted signal from a noiselike process of average power P.
The maximization yields the information capacity of the channel:
C = B log2(1 + P/(NoB)) bits per second
where P is the average transmitted power and the noise variance σ² = NoB.
This formula highlights most intensely the interplay among three key system parameters: channel bandwidth B, average received signal power P, and noise power spectral density at the channel output No/2. The dependence of information capacity C on channel bandwidth B is linear, whereas its dependence on signal-to-noise ratio P/(NoB) is logarithmic. Accordingly, it is easier to increase the information capacity of a communication channel by expanding its bandwidth than by increasing the transmitted power for a prescribed noise variance.
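A short numeric sketch of this trade-off (the bandwidth, power, and noise values are made up for illustration):

import numpy as np

def capacity(B, P, No):
    """Shannon capacity C = B*log2(1 + P/(No*B)) in bits per second."""
    return B * np.log2(1 + P / (No * B))

B, P, No = 3000.0, 1e-3, 1e-9   # hypothetical: 3 kHz, 1 mW, No = 1e-9 W/Hz
print(f"C        = {capacity(B, P, No):,.0f} bit/s")
print(f"double B = {capacity(2 * B, P, No):,.0f} bit/s")   # near-linear gain
print(f"double P = {capacity(B, 2 * P, No):,.0f} bit/s")   # only logarithmic gain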
Channel Coding Theorem
The theorem implies that, for given average transmitted power P and channel bandwidth B:
• Information can be transmitted at the rate of C bits per second, as
defined in the previous Equation, with arbitrarily small probability of
error by employing sufficiently complex encoding systems.
• It is not possible to transmit at a rate higher than C bits per second by
any encoding system without a definite probability of error.