
Harvard SEAS ES250 Information Theory

Channel Capacity

1 Preliminaries and Definitions

1.1 Preliminaries and Examples

Communication between A (the sender) and B (the receiver) is successful when both A and B agree
on the content of the message.

A communication channel is modeled as a probabilistic function.

The maximum number of distinguishable signals for n uses of a communication channel grows
exponentially with n at a rate termed the channel capacity.

Our goal is to infer the transmitted message based on the received data with a vanishingly small
probability of error.

Definition (Discrete memoryless channel) A discrete channel consists of an input alphabet X, an output
alphabet Y, and a likelihood function (probability transition matrix) p(y|x).
The channel is said to be memoryless if the probability distribution of the output depends only on
the input at that time and is conditionally independent of previous channel inputs or outputs.

Definition Information channel capacity of a discrete memoryless channel:

C = max_{p(x)} I(X; Y),

where the maximum is taken over all possible input distributions p(x).

Examples of Channel Capacity:

1. Noiseless binary channel : C = 1 bit

2. Noisy channel with non-overlapping outputs : also C = 1 bit

3. Noisy typewriter : C = log(number of keys) − 1 bits

4. Binary symmetric channel : C = 1 − H(p) bits

5. Binary erasure channel : C = 1 − α bits, where α is the fraction of bits erased
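
The binary symmetric and binary erasure capacities above are easy to evaluate numerically. Below is a
minimal sketch in Python (NumPy assumed available); the function names are illustrative, not from any
particular library.

    import numpy as np

    def binary_entropy(p):
        # H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def bsc_capacity(p):
        # Binary symmetric channel with crossover probability p: C = 1 - H(p)
        return 1.0 - binary_entropy(p)

    def bec_capacity(alpha):
        # Binary erasure channel with erasure probability alpha: C = 1 - alpha
        return 1.0 - alpha

    print(bsc_capacity(0.11))   # approximately 0.5 bits per channel use
    print(bec_capacity(0.25))   # 0.75 bits per channel use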

1.2 Definitions and Properties

Definition (Symmetric and weakly symmetric channels) A channel is said to be symmetric if all rows of
the channel transition matrix p(y|x) are permutations of each other, and all columns are permutations
of each other. A channel is said to be weakly symmetric if every row of the transition matrix p(·|x)
is a permutation of every other row and all the column sums Σ_x p(y|x) are equal.

Based on Cover & Thomas, Chapter 7


Theorem For a weakly symmetric channel,

C = log |Y| − H(row of transition matrix),

achieved by a uniform distribution on the input alphabet.
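
As a quick sanity check of this formula, here is a small sketch (Python, NumPy assumed; the code
assumes, rather than verifies, that the weak-symmetry conditions hold) evaluating C = log |Y| − H(row)
for a noisy-typewriter-like channel with four inputs.

    import numpy as np

    def weakly_symmetric_capacity(P):
        # P[x, y] = p(y|x); assumes the weak-symmetry conditions hold.
        # C = log2 |Y| - H(row of the transition matrix); any row will do.
        row = P[0]
        row = row[row > 0]
        return np.log2(P.shape[1]) + np.sum(row * np.log2(row))

    # Noisy-typewriter-like channel: each of 4 inputs goes to itself or its
    # neighbor with probability 1/2, so C = log2(4) - 1 = 1 bit.
    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.5, 0.5, 0.0],
                  [0.0, 0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.0, 0.5]])
    print(weakly_symmetric_capacity(P))   # 1.0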

In general, there is no closed-form solution for the capacity. For simple mathematical models of channels,
however, it is often possible to calculate the capacity analytically. (For more realistic scenarios, numerical
methods may be needed.)
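
One standard numerical approach is the Blahut-Arimoto algorithm, an alternating-maximization scheme
that relies on the concavity of I(X; Y) in p(x) (property 5 below). The following is a minimal sketch
in Python (NumPy assumed); the fixed iteration count and variable names are ad hoc choices, and the
code assumes every output symbol has positive probability under the running input distribution.

    import numpy as np

    def blahut_arimoto(W, n_iter=300):
        # W[x, y] = p(y|x).  Returns an estimate of C (in bits) and the
        # maximizing input distribution r(x).  Minimal sketch: fixed number
        # of iterations, no convergence test.
        nx, ny = W.shape
        r = np.full(nx, 1.0 / nx)                    # current guess for p(x)
        for _ in range(n_iter):
            q = r[:, None] * W                       # proportional to p(x, y)
            q /= q.sum(axis=0, keepdims=True)        # posterior q(x | y)
            # r(x) <- exp( sum_y p(y|x) log q(x|y) ), then renormalize
            r = np.exp(np.sum(W * np.log(np.where(q > 0, q, 1.0)), axis=1))
            r /= r.sum()
        p_y = r @ W                                  # induced output distribution
        ratio = np.where(W > 0, W / np.where(p_y > 0, p_y, 1.0)[None, :], 1.0)
        return np.sum(r[:, None] * W * np.log2(ratio)), r

    # Binary symmetric channel with crossover 0.11: C should be about 0.5 bit,
    # achieved by the uniform input distribution.
    W = np.array([[0.89, 0.11],
                  [0.11, 0.89]])
    C, p_opt = blahut_arimoto(W)
    print(C, p_opt)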

Properties:

1. C ≥ 0 since I(X; Y) ≥ 0.

2. C ≤ log |X| since C = max I(X; Y) ≤ max H(X) = log |X|.

3. C ≤ log |Y| for the same reason.

4. I(X; Y ) is a continuous function of p(x).

5. I(X; Y ) is a concave function of p(x). Since I(X; Y ) is a concave function over a closed convex set,
a local maximum is a global maximum that is finite and attainable.

2 The Channel Coding Theorem

2.1 Definitions

Definition A communication system is defined as follows:

A message W ∈ {1, 2, . . . , M} results in a signal X^n(W), which passes through the channel and
is received as a random sequence Y^n ~ p(y^n | x^n).
The receiver then applies the decoding rule Ŵ = g(Y^n) and declares an error if Ŵ ≠ W.

Definition A discrete channel, denoted by (X, p(y|x), Y), consists of two finite sets X and Y and a
collection of pmfs p(y|x), one for each x ∈ X, such that for every x and y, p(y|x) ≥ 0, and for every
x, Σ_y p(y|x) = 1, with the interpretation that X is the input and Y is the output of the channel.

Definition The nth extension of the discrete memoryless channel (DMC) is the channel (X^n, p(y^n|x^n), Y^n),
where
p(y_k | x^k, y^{k−1}) = p(y_k | x_k), k = 1, 2, . . . , n.

Remark If the channel is used without feedback, meaning that inputs do not depend on past outputs and
hence p(x_k | x^{k−1}, y^{k−1}) = p(x_k | x^{k−1}), then the likelihood for the nth extension of the DMC is just a
product:

p(y^n | x^n) = ∏_{i=1}^{n} p(y_i | x_i).
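
This product form is exactly what one simulates when passing a block through a memoryless channel
without feedback: each output symbol is drawn independently given its own input symbol. A small
illustrative sketch (Python, NumPy assumed):

    import numpy as np

    def dmc_transmit(xn, W, rng):
        # W[x, y] = p(y|x).  Without feedback, each Y_i depends only on x_i,
        # so p(y^n | x^n) factors as the product of per-symbol likelihoods.
        return np.array([rng.choice(W.shape[1], p=W[x]) for x in xn])

    rng = np.random.default_rng(0)
    W = np.array([[0.9, 0.1],           # binary symmetric channel, crossover 0.1
                  [0.1, 0.9]])
    xn = rng.integers(0, 2, size=20)
    yn = dmc_transmit(xn, W, rng)
    print(xn)
    print(yn)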

Definition A code is a function of the source and channel alphabet sizes.

Definition An (M, n) code for the channel (X , p(y|x), Y) consists of the following:


1. An index set {1, 2, . . . , M} over messages W.

2. An encoding function X^n : {1, 2, . . . , M} → X^n, yielding codewords x^n(1), x^n(2), . . . , x^n(M).
   (This set is called the codebook C.)

3. A (deterministic) decoding function

       g : Y^n → {1, 2, . . . , M},

   which is an estimator Ŵ = g(Y^n) of W ∈ {1, 2, . . . , M}.
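
As a concrete (if trivial) instance of this definition, the following hypothetical sketch spells out a
(2, 3) repetition code over a binary alphabet: M = 2 messages and block length n = 3 (Python, NumPy
assumed).

    import numpy as np

    # A (2, 3) repetition code: index set {1, 2}, codebook {000, 111}.
    codebook = {1: np.array([0, 0, 0]),    # x^n(1)
                2: np.array([1, 1, 1])}    # x^n(2)

    def encode(w):
        # Encoding function X^n : {1, 2} -> X^3
        return codebook[w]

    def decode(yn):
        # Decoding function g : Y^3 -> {1, 2}; majority vote, i.e. the
        # minimum-Hamming-distance rule for this codebook.
        return 2 if yn.sum() >= 2 else 1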

Definition (Conditional probability of error)

Let λ_i = Pr{g(Y^n) ≠ i | X^n = x^n(i)} = Σ_{y^n} p(y^n | x^n(i)) · I{g(y^n) ≠ i}

be the conditional probability of error given that index i was sent.

Definition (Maximal probability of error) The maximal probability of error λ^(n) for an (M, n) code is
defined as

λ^(n) = max_{i ∈ {1, 2, . . . , M}} λ_i.

Definition (Average probability of error) The (arithmetic) average probability of error P_e^(n) for an (M, n)
code is:

P_e^(n) = (1/M) Σ_{i=1}^{M} λ_i.

Note that P_e^(n) = Pr{W ≠ g(Y^n)} if W is chosen uniformly.
Also, P_e^(n) ≤ λ^(n); i.e., the average probability of error is no greater than the maximal probability of error.
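
For a concrete feel for these quantities, the sketch below estimates λ_i, λ^(n), and P_e^(n) by simulation
for the toy (2, 3) repetition code used over a BSC with crossover probability 0.1 (a hypothetical setup;
the exact value here is 3p²(1 − p) + p³ = 0.028).

    import numpy as np

    rng = np.random.default_rng(1)
    p = 0.1                                           # BSC crossover probability
    codebook = {1: np.array([0, 0, 0]), 2: np.array([1, 1, 1])}
    decode = lambda yn: 2 if yn.sum() >= 2 else 1     # majority vote

    def conditional_error(i, trials=100_000):
        # Monte Carlo estimate of lambda_i = Pr{ g(Y^n) != i | X^n = x^n(i) }
        flips = rng.random((trials, 3)) < p           # i.i.d. bit flips
        received = codebook[i] ^ flips
        errors = [decode(y) != i for y in received]
        return np.mean(errors)

    lam = [conditional_error(i) for i in (1, 2)]
    print("lambda_i:", lam)
    print("maximal error:", max(lam), " average P_e:", np.mean(lam))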

Definition (Rate of an (M, n) code) The rate R of an (M, n) code is R = (log M)/n bits per transmission.

Definition (Achievability) A rate R is called achievable if there exists a sequence of (⌈2^{nR}⌉, n) codes such
that λ^(n) (i.e., the maximal probability of error) tends to 0 as n → ∞.

Definition (Channel Capacity) The capacity of a channel is the supremum over all achievable rates.
Hence, rates less than capacity yield arbitrarily small probability of error for sufficiently large block
lengths.

2.2 Jointly Typical Sequences


Definition The set A_ε^(n) of jointly typical sequences {(x^n, y^n)} with respect to the distribution p(x, y) is
the set of n-sequences with empirical entropies ε-close to the true entropies:

A_ε^(n) = { (x^n, y^n) ∈ X^n × Y^n :

    | −(1/n) log p(x^n) − H(X) | < ε,

    | −(1/n) log p(y^n) − H(Y) | < ε,

    | −(1/n) log p(x^n, y^n) − H(X, Y) | < ε },


where p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i).

Theorem (Joint AEP) Let (X^n, Y^n) be n-sequences drawn i.i.d. according to p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i). Then:

1. Pr{ (X^n, Y^n) ∈ A_ε^(n) } → 1 as n → ∞.

2. |A_ε^(n)| ≤ 2^{n(H(X,Y) + ε)}.

3. If (X̃^n, Ỹ^n) ~ p(x^n) p(y^n), so that X̃^n and Ỹ^n are independent with the same marginals as
p(x^n, y^n), then we have that:

   Pr{ (X̃^n, Ỹ^n) ∈ A_ε^(n) } ≤ 2^{−n(I(X;Y) − 3ε)}, and

   Pr{ (X̃^n, Ỹ^n) ∈ A_ε^(n) } ≥ (1 − ε) 2^{−n(I(X;Y) + 3ε)} for sufficiently large n.
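
The definition of A_ε^(n) translates directly into code: compute the three empirical entropies and compare
them to the true ones. A minimal sketch (Python, NumPy assumed; the joint pmf is taken to have no zero
entries so that all logarithms are defined):

    import numpy as np

    def is_jointly_typical(xn, yn, pxy, eps):
        # pxy[x, y] = p(x, y); checks the three conditions defining A_eps^(n).
        px, py = pxy.sum(axis=1), pxy.sum(axis=0)
        HX = -np.sum(px * np.log2(px))
        HY = -np.sum(py * np.log2(py))
        HXY = -np.sum(pxy * np.log2(pxy))
        ex = -np.mean(np.log2(px[xn]))               # -(1/n) log p(x^n)
        ey = -np.mean(np.log2(py[yn]))               # -(1/n) log p(y^n)
        exy = -np.mean(np.log2(pxy[xn, yn]))         # -(1/n) log p(x^n, y^n)
        return abs(ex - HX) < eps and abs(ey - HY) < eps and abs(exy - HXY) < eps

    # X ~ Bernoulli(1/2) sent through a BSC(0.1); the true pair (X^n, Y^n)
    # is jointly typical with high probability, per part 1 of the theorem.
    rng = np.random.default_rng(2)
    pxy = np.array([[0.45, 0.05],
                    [0.05, 0.45]])
    xn = rng.integers(0, 2, size=5000)
    yn = (xn ^ (rng.random(5000) < 0.1)).astype(int)
    print(is_jointly_typical(xn, yn, pxy, eps=0.1))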

2.3 Channel Coding Theorem

Theorem (Channel coding theorem) For a DMC, all rates below capacity C are achievable.

Specifically, for every rate R < C, there exists a sequence of (⌈2^{nR}⌉, n) codes with maximum
probability of error λ^(n) → 0.

Conversely, any sequence of (⌈2^{nR}⌉, n) codes with λ^(n) → 0 must have R ≤ C.

Lemma (Fano's inequality) For a discrete memoryless channel with a codebook C and the input message
W uniformly distributed over {1, 2, . . . , 2^{nR}}, we have

H(W | Ŵ) ≤ 1 + P_e^(n) nR.

Lemma (Reuse of DMC) Let Y^n be the result of passing X^n through a discrete memoryless channel of
capacity C. Then

I(X^n; Y^n) ≤ nC for all p(x^n).
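
The standard argument behind this bound uses the chain rule for entropy and the memorylessness of the
channel (used without feedback):

    I(X^n; Y^n) = H(Y^n) − H(Y^n | X^n)
                = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y^{i−1}, X^n)
                = H(Y^n) − Σ_{i=1}^{n} H(Y_i | X_i)
                ≤ Σ_{i=1}^{n} H(Y_i) − Σ_{i=1}^{n} H(Y_i | X_i)
                = Σ_{i=1}^{n} I(X_i; Y_i) ≤ nC,

where the final inequality applies the definition of capacity to each individual use of the channel.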

Theorem A capacity-achieving zero-error code has distinct codewords, and the distribution of the Y_i's
must be i.i.d. with

p*(y) = Σ_x p*(x) p(y|x),

where p*(x) is the distribution on X that achieves capacity.

2.4 Practical Coding Schemes

The channel coding theorem promises the existence of block codes achieving capacity. But it says nothing
about complexity.

From a constructive point of view, the object of coding is to introduce redundancy in such a way that
errors can be detected and corrected.

Repetition code: rate of the code goes to zero with block length.


Error-detecting code: adding one parity bit enables the detection of an odd number of errors.

Linear error-correcting code: vector space structure allows for a parity-check matrix to detect, locate,
and correct multiple errors.

Hamming codes, Reed-Solomon codes, convolutional codes, etc., are all used nowadays in various
communication and data storage systems.
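
As an illustration of the linear, parity-check idea, here is a sketch of the classical (7, 4) Hamming code,
which corrects any single bit error per 7-bit block (Python with NumPy; the bit layout is the standard
one with parity bits at positions 1, 2, and 4).

    import numpy as np

    H = np.array([[1, 0, 1, 0, 1, 0, 1],    # parity-check matrix; column j is
                  [0, 1, 1, 0, 0, 1, 1],    # the binary representation of j = 1..7
                  [0, 0, 0, 1, 1, 1, 1]])

    def encode(d):
        # Place data bits d = (d1, d2, d3, d4) at positions 3, 5, 6, 7 and fill
        # parity positions 1, 2, 4 so that H c^T = 0 (mod 2).
        c = np.zeros(7, dtype=int)
        c[[2, 4, 5, 6]] = d
        c[0] = c[2] ^ c[4] ^ c[6]           # parity over positions 3, 5, 7
        c[1] = c[2] ^ c[5] ^ c[6]           # parity over positions 3, 6, 7
        c[3] = c[4] ^ c[5] ^ c[6]           # parity over positions 5, 6, 7
        return c

    def decode(r):
        # Syndrome decoding: the syndrome, read as a binary number, is the
        # (1-indexed) position of a single flipped bit; 0 means no error detected.
        s = (H @ r) % 2
        pos = s[0] + 2 * s[1] + 4 * s[2]
        if pos:
            r = r.copy()
            r[pos - 1] ^= 1                 # correct the flipped bit
        return r[[2, 4, 5, 6]]              # recover the data bits

    d = np.array([1, 0, 1, 1])
    c = encode(d)
    r = c.copy(); r[5] ^= 1                 # flip one bit in transmission
    print(decode(r), d)                     # decoded data matches the original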

2.5 Feedback Capacity

Definition (Capacity with feedback) The capacity with feedback, C_FB, of a discrete memoryless channel
is the supremum of all rates achievable by feedback codes x_i(W, Y^{i−1}).

Theorem (Feedback capacity)


C_FB = C = max_{p(x)} I(X; Y).

We have C_FB ≥ C, since a code that ignores the feedback is a special case of a feedback code, but we
cannot appeal to the earlier lemma regarding reuse of a discrete memoryless channel to show C_FB ≤ C.

2.6 Source-Channel Separation Theorem

So far we have R > H (compression) and R < C (transmission). It is natural to ask whether the condition
H < C is necessary and sufficient for transmitting a source over a channel:

Theorem (Source-channel coding theorem) If V_1, V_2, . . . , V_n is a finite-alphabet stochastic process
satisfying the AEP and the condition H(V) < C, then there exists a source-channel code with probability
of error Pr(V̂^n ≠ V^n) → 0.

Conversely, for any stationary stochastic process with H(V) > C, the probability of error is bounded
away from zero.
