Part C
Prerequisite: Stat 219
Text Book: B.P. Lathi, “Modern Digital and Analog Communication Systems”, 3rd edition, Oxford University Press, Inc., 1998
Reference: A. Papoulis, “Probability, Random Variables, and Stochastic Processes”, McGraw-Hill, 2005
Could A & B be SI (statistically independent)?
Conditional Channel Entropies
The Joint Channel Entropy
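For reference, the standard definitions of these conditional and joint entropies, written in the notes' base-r notation:

H(A/B) = − Σi Σj p(ai, bj) logr p(ai/bj)
H(B/A) = − Σi Σj p(ai, bj) logr p(bj/ai)
H(A, B) = − Σi Σj p(ai, bj) logr p(ai, bj)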
Mutual Information
p(ai) is the a priori probability of the input symbol ai, and
p(ai/bj) is the a posteriori probability of the input symbol ai upon reception of bj.
The change in this probability (due to channel noise) measures how much the receiver learned from the reception of bj. The receiver can never be absolutely sure what exactly was sent. The difference between the information uncertainty of ai before and after reception of bj is called the mutual information, which is the actual amount of information transmitted via the channel:
I(ai;bj) = I(ai) – I(ai/bj)
For an ideal error-free channel, p(ai/bj) = 1 if i = j, and p(ai/bj) = 0 if i ≠ j.
Therefore, I(ai;bj) = I(ai). [In contrast, I(ai;bj) = 0 if ai & bj were SI.]
In general, I(ai;bj) = logr(1/p(ai)) − logr(1/p(ai/bj)) = logr(p(ai/bj)/p(ai)) = logr(p(ai,bj)/(p(ai) p(bj))).
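As a quick numeric illustration (the probabilities here are made up): if p(ai) = 0.5 and reception of bj raises the posterior to p(ai/bj) = 0.9, then I(ai;bj) = log2(0.9/0.5) ≈ 0.848 bits; that is, 0.848 of the symbol's 1 bit of initial uncertainty is conveyed.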
Channel Mutual Information
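Averaging I(ai;bj) over all symbol pairs gives the channel (average) mutual information, with the standard identities:

I(A; B) = Σi Σj p(ai, bj) logr( p(ai, bj)/(p(ai) p(bj)) )
        = H(A) − H(A/B) = H(B) − H(B/A) = H(A) + H(B) − H(A, B)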
Problem 1
Let the channel probabilities be as given [matrices not recovered]. Find the channel entropies and the mutual information.
Problem 1 - Solution
[worked matrices not recovered]
1.584 bit/symbol
0.666 bit/symbol
Problem 1 - Tips
Note that P(B/A) is the forward transition matrix (FM), and P(A/B) is the backward transition matrix (BM).
Form the joint matrix P(A, B). We can then deduce the FM by dividing each row of the joint matrix by the corresponding input probability: p(bj/ai) = p(ai, bj)/p(ai).
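A minimal Python sketch of these tips, assuming a made-up 2×2 joint matrix (the original problem matrix is not preserved here, so the numbers below are illustrative only):

import numpy as np

# Hypothetical joint probability matrix P(A, B); rows = input symbols ai,
# columns = output symbols bj. The real Problem 1 matrix was not preserved.
JM = np.array([[0.4, 0.1],
               [0.1, 0.4]])

p_a = JM.sum(axis=1)            # input (marginal) probabilities p(ai)
p_b = JM.sum(axis=0)            # output (marginal) probabilities p(bj)
FM = JM / p_a[:, None]          # forward matrix:  p(bj/ai) = p(ai,bj)/p(ai)
BM = JM / p_b[None, :]          # backward matrix: p(ai/bj) = p(ai,bj)/p(bj)

def H(p):
    """Entropy in bits; ignores zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_A  = H(p_a)                   # source entropy H(A)
H_B  = H(p_b)                   # destination entropy H(B)
H_AB = H(JM.ravel())            # joint entropy H(A,B)
I_AB = H_A + H_B - H_AB         # mutual information I(A;B)

print(FM, BM, sep="\n")
print(f"H(A)={H_A:.3f}  H(B)={H_B:.3f}  H(A,B)={H_AB:.3f}  I(A;B)={I_AB:.3f}")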
Problem 2
Let the channel probabilities be as given [matrices not recovered]. Find the required quantities.
Problem 2 - Solution
[worked steps not recovered]
1.684 bit/symbol
Problem 2 - Tips
We can deduce the FM exactly as in Problem 1: p(bj/ai) = p(ai, bj)/p(ai). [Only a fragment (an entry 0.6) of the worked matrix survives.]
Channel Capacity
The channel capacity of a discrete memoryless channel is defined as the
maximum mutual information I(A; B) in any single use of the channel (i.e., signaling
interval), where the maximization is over all possible input probability distributions
{p(ai)} on A. The channel capacity is commonly denoted by C and is written as
C = max over all input distributions {p(ai)} of I(A; B)
The channel capacity C is measured in bits per channel use, or bits per transmission.
Note that the channel capacity C is a function only of the transition probabilities
p(bj/ai), which define the channel. The calculation of C involves maximization of the
mutual information I(A; B) over r variables [i.e., the input probabilities p(a1), . . . , p(ar)]
subject to two constraints:
1) p(ai) ≥ 0 for all i
2) Σi p(ai) = 1
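A minimal numeric sketch of this maximization, sweeping the input distribution of a hypothetical binary channel (the FM values are made up for illustration):

import numpy as np

# Hypothetical forward matrix p(bj/ai) of a binary channel (illustrative values).
FM = np.array([[0.9, 0.1],
               [0.2, 0.8]])

def mutual_information(p_a, FM):
    """I(A;B) in bits for input distribution p_a and forward matrix FM."""
    JM = p_a[:, None] * FM                  # joint matrix p(ai,bj)
    p_b = JM.sum(axis=0)                    # output distribution p(bj)
    prod = p_a[:, None] * p_b[None, :]      # product matrix p(ai)p(bj)
    mask = JM > 0
    return np.sum(JM[mask] * np.log2(JM[mask] / prod[mask]))

# Brute-force search over p(a1); the capacity is C = max I(A;B).
grid = np.linspace(0.001, 0.999, 999)
I_vals = [mutual_information(np.array([p, 1 - p]), FM) for p in grid]
best = int(np.argmax(I_vals))
print(f"C ≈ {I_vals[best]:.4f} bits/use at p(a1) ≈ {grid[best]:.3f}")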
Problem 3 - Solution
Given the channel probabilities, we first form the joint matrix: JM = [not recovered]. From JM:
I(A; B) = H(A) + H(B) − H(A, B) = 0.811 + 0.97 − 1.692 = 0.089 bit/symbol
[or H(B) − H(B/A) = 0.97 − 0.881 = 0.089]
Problem 4 - Solution
2) Efficiency: [terms not recovered] = 94%
3) Redundancy: γ = 1 − 0.94 = 6%
4) [only the probability values 0.7, 0.2, 0.1 survive]
Example 2: Cascaded BSCs
Two cascaded BSCs, each with probability of error p, have an equivalent forward diagram of a single BSC with probability of error 2p(1−p), and an equivalent forward matrix

FM = | (1−p)² + p²   2p(1−p)     |
     | 2p(1−p)       (1−p)² + p² |
Therefore, C = 1- h(2p(1-p)) bits/ transmission at p(a1) = p(a2) = ½
If p = 0.01, h(2×0.01×0.99) = h(0.0198) ≈ 0.1403, then
C = 1 − 0.1403 ≈ 0.8597 bits/trans, which is less than that of Example 1.
This means that the transmitter (Tx) and receiver (Rx) should always be connected by a single channel: cascading channels reduces capacity.
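A short Python sketch of this calculation, reproducing the numbers above (h is the binary entropy function):

import numpy as np

def h(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p = 0.01                        # per-hop crossover probability
p_eq = 2 * p * (1 - p)          # equivalent error probability of two cascaded BSCs
C_single  = 1 - h(p)            # capacity of one BSC
C_cascade = 1 - h(p_eq)         # capacity of the cascade (smaller)
print(f"p_eq = {p_eq:.4f}, C_single = {C_single:.4f}, C_cascade = {C_cascade:.4f}")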
M-ary baseband Transmission
Binary data is usually transmitted using extensions of the binary code. Instead of sending one binary bit (0 or 1) per transmission, M-ary transmission uses M different symbols, each of duration log2(M)·Tb, to transmit log2(M) binary bits per symbol.
For example, 4-ary transmission uses the second extension of the binary code. Four distinct symbols, each of duration 2Tb, such as four pulses with magnitudes +3A, +A, −A & −3A, may be used to represent 11, 10, 00 & 01 (Gray code) respectively.
Example:
4-ary transmission with symbol duration = 2Tb
11 → +3
10 → +1
00 → −1
01 → −3
Binary bit stream: 1 1 0 1 0 0 0 1 1 0 1 1 1 0
4-ary symbol stream: 3, -3, -1, -3, 1, 3, 1
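A minimal Python sketch of this mapping, using the level table from the example above (the helper to_4ary is ours):

# Gray-coded 4-ary level map from the table above (levels in units of A).
LEVELS = {"11": 3, "10": 1, "00": -1, "01": -3}

def to_4ary(bits):
    """Group a binary string into dibits and map each onto its 4-ary level."""
    assert len(bits) % 2 == 0, "bit-stream length must be even"
    return [LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]

print(to_4ary("11010001101110"))   # -> [3, -3, -1, -3, 1, 3, 1]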
Similarly, 8-ary transmission uses the third extension of the binary code: eight distinct symbols, each of duration 3Tb.
Joint and conditional probability density functions can also be defined for continuous random variables X & Y:
fX,Y(x, y) is the joint probability density function of X and Y, and
fX(x/y) is the conditional probability density function of X, given Y.
Differential Entropy and Mutual
Information for Continuous Ensembles
Some of the previously discussed information theory concepts are now extended to continuous random variables and random vectors. The motivation is to pave the way for the description of another fundamental limit in information theory.
Consider a continuous random variable X with probability density function fX(x). By analogy with the entropy of a discrete random variable, the differential entropy of X is defined as
h(X) = − ∫ fX(x) log2 fX(x) dx
with the integral taken over all x.
Example 9.8 Gaussian Distribution
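In summary, the standard result of this example: for a Gaussian random variable X with variance σ²,
h(X) = (1/2) log2(2πeσ²)
and, for a given variance σ², the Gaussian density maximizes the differential entropy. This is the property invoked below for the received samples Yk.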
Mutual Information
By analogy with the discrete case, the mutual information between two continuous random variables X and Y is defined as
I(X; Y) = h(X) − h(X/Y)
where h(X/Y) is the conditional differential entropy of X, given Y.
Information Capacity Theorem
The information capacity theorem for band-limited, power-limited
Gaussian channels is to be considered. Consider a zero-mean stationary
process X(t) that is band-limited to B hertz. Let Xk, k = 1, 2, . . . , K,
denote the continuous random variables obtained by uniform sampling
of the process X(t) at the Nyquist rate of 2B samples per second. These
samples are transmitted in T seconds over a noisy channel, also band-
limited to B hertz. Hence, the number of samples, K = 2BT.
Xk is a sample of the transmitted signal. The channel output is disturbed
by additive white Gaussian noise (AWGN) of zero mean and power
spectral density No/2. The noise is band-limited to B hertz. Let the
continuous random variables Yk, k = 1, 2, . . . , K, denote samples of the received signal, as shown by
Yk = Xk + Nk,   k = 1, 2, . . . , K   (9.84)
The information capacity of a power-limited Gaussian channel is defined
as the maximum of the mutual information between the channel input Xk
and the channel output Yk over all distributions on the input Xk that
satisfy the power constraint of Equation (9.86), namely E[Xk²] = P, where P is the average transmitted power.
Since Xk and Nk are independent random variables, and their sum equals Yk, as in Equation (9.84), the conditional differential entropy of Yk, given Xk, is equal to the differential entropy of Nk: h(Yk/Xk) = h(Nk). Hence, by Equation (9.87), I(Xk; Yk) = h(Yk) − h(Yk/Xk) = h(Yk) − h(Nk).
Since h(Nk) is independent of the distribution of Xk, maximizing I(Xk; Yk) in accordance with Equation (9.87) requires maximizing h(Yk), the differential entropy of sample Yk of the received signal. For h(Yk) to be maximum, Yk has to be a Gaussian random variable (see Example 9.8). That is, the samples of the received signal represent a noiselike process. Next, note that since Nk is Gaussian by assumption, the sample Xk of the transmitted signal must be Gaussian too. Therefore, the maximization specified in Equation (9.87) is achieved by choosing the samples of the transmitted signal from a noiselike process of average power P.
The maximization yields the information capacity of the channel:
C = B log2(1 + P/(NoB)) bits per second
where P is the average transmitted power and the noise variance σ² = NoB.
This formula highlights most intensely the interplay among three key system parameters: channel bandwidth B, average received signal power P, and noise power spectral density at the channel output No/2. The dependence of information capacity C on channel bandwidth B is linear, whereas its dependence on signal-to-noise ratio P/(NoB) is logarithmic. Accordingly, it is easier to increase the information capacity of a communication channel by expanding its bandwidth than by increasing the transmitted power for a prescribed noise variance.
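A short numeric sketch of this trade-off (the bandwidth, power, and noise values are made up for illustration):

import numpy as np

def capacity(B, P, No):
    """Shannon capacity C = B*log2(1 + P/(No*B)) in bits per second."""
    return B * np.log2(1 + P / (No * B))

B, P, No = 3000.0, 1e-3, 1e-9   # hypothetical: 3 kHz, 1 mW, No = 1e-9 W/Hz
print(f"C        = {capacity(B, P, No):,.0f} bit/s")
print(f"double B = {capacity(2 * B, P, No):,.0f} bit/s")   # near-linear gain
print(f"double P = {capacity(B, 2 * P, No):,.0f} bit/s")   # only logarithmic gain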
Channel Coding Theorem
The theorem implies that, for given average transmitted power P and channel bandwidth B:
• Information can be transmitted at the rate of C bits per second, as
defined in the previous Equation, with arbitrarily small probability of
error by employing sufficiently complex encoding systems.
• It is not possible to transmit at a rate higher than C bits per second by
any encoding system without a definite probability of error.