
CHANNEL CODING

RELIABLE COMMUNICATION
THROUGH NOISY CHANNELS
In real life, channels are always noisy.
Removing redundancy makes the data more
susceptible to corruption by noise.
So we introduce redundancy to detect and
correct errors.
Then why did we remove redundancy in the
first place?
Because here we introduce redundancy in a controlled
manner, so that we know exactly how much
redundancy is needed for a channel with a
particular S/N ratio.
Simplest model of a channel:
Binary Symmetric channel [BSC]:

[BSC transition diagram: 0 → 0 and 1 → 1 with probability 1-p; 0 → 1 and 1 → 0 with crossover probability p.]
Repetition code: A simple way to increase
reliability
We repeat each bit n times.
Consider n = 3.
Probability of error (with $q = 1 - p$):
$P_{err} = p^3 + 3p^2 q$
If p = 0.001, then $P_{err} = 0.001^3 + 3 \times 0.001^2 \times 0.999 = 2.998 \times 10^{-6}$.
But if we require Perr to be less than 10-6, then what
do we do?
Have more repetitions per bit!
For repetition code K5,
$P_{err} = p^5 + 5p^4 q + 10 p^3 q^2$
However, having more repetitions reduces the
information rate of the code.
Information rate $= \dfrac{k}{n} = \dfrac{\text{number of information bits}}{\text{total number of bits in the code word}}$
Let n = 2m+1, then Perr is given by:
$P_{err} = \sum_{i=m+1}^{n} \binom{n}{i} p^i (1-p)^{n-i}$
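As a sanity check, here is a minimal Python sketch (not part of the original notes; the function name repetition_error_prob is mine) that evaluates this binomial sum for odd n and reproduces the numbers quoted above for n = 3 and n = 5.

from math import comb

def repetition_error_prob(n, p):
    # P_err for an n-bit repetition code (n odd) on a BSC with crossover
    # probability p: majority decoding fails when more than half of the
    # n transmitted copies are flipped.
    m = (n - 1) // 2
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1, n + 1))

p = 0.001
print(repetition_error_prob(3, p))   # ~2.998e-06, as computed above
print(repetition_error_prob(5, p))   # ~9.985e-09, i.e. p^5 + 5 p^4 q + 10 p^3 q^2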
Information rate and protection from noise are
usually conflicting requirements: if the
information rate is high, Perr tends to be high as well.
Consider the even parity code: n-1 information
bits and 1 check bit.
R(K) = (n-1)/n = 1 - 1/n ≈ 1 (for large n).
It detects all single errors, but the error-detection
capability degrades with increasing n, since two
errors in the same block go undetected and become
more likely as the block grows.
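The sketch below (illustrative Python; the helper names add_even_parity and parity_ok are mine) shows the behaviour just described: one flipped bit violates the parity check, while a second flip restores even parity and goes undetected.

def add_even_parity(bits):
    # Append a check bit so that the codeword has an even number of 1s.
    return bits + [sum(bits) % 2]

def parity_ok(codeword):
    # True if the received word still has even parity.
    return sum(codeword) % 2 == 0

word = [1, 0, 1, 1]           # n-1 = 4 information bits
code = add_even_parity(word)  # [1, 0, 1, 1, 1]
print(parity_ok(code))        # True: no errors

code[2] ^= 1                  # single error: detected
print(parity_ok(code))        # False

code[4] ^= 1                  # a second error restores even parity: undetected
print(parity_ok(code))        # True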
Consider the following code K4*:
Information bits Code Word
00 0000
01 0111
10 1000
11 1111
What’s the rule here?
First bit is retained and second bit is repeated
thrice.
Decoding is correct if the first bit is not
corrupted and at most one of the remaining
three bits is corrupted.
Therefore: $P_{err} = 1 - (q^4 + 3pq^3) \approx 0.001$
Can we improve on this, without
compromising on the information rate (0.5)?
Yes. Consider K6*
Information bits codeword
000 000000
100 100011
010 010101
001 001110
011 011011
101 101101
110 110110
111 111000
What is the Perr?
The code corrects all single errors, since any two
distinct code words differ in at least 3 bits.
$P_{err} = 1 - (q^6 + 6pq^5) \approx 1.5 \times 10^{-5}$
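Both claims can be checked with a short Python sketch (the helpers flip and nearest are mine, not part of the notes): nearest-codeword decoding recovers every single-bit error for K6*, and the quoted error probability follows from the formula above.

K6 = ["000000", "100011", "010101", "001110",
      "011011", "101101", "110110", "111000"]

def flip(word, i):
    # Flip bit i of a binary string.
    return word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]

def nearest(word):
    # Decode to the codeword closest in Hamming distance (ML decoding).
    return min(K6, key=lambda c: sum(a != b for a, b in zip(c, word)))

# Every single-bit error is decoded back to the transmitted codeword.
assert all(nearest(flip(c, i)) == c for c in K6 for i in range(6))

# P_err as used here: probability of two or more bit flips in the block.
p, q = 0.001, 0.999
print(1 - (q**6 + 6 * p * q**5))   # ~1.5e-05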
How was the improvement achieved?
By increasing the “distance” between the
code words!
Can we design better and better codes?
Can we simultaneously make R(K) large
and Perr vanishingly small?
Hamming Distance:
Given two words $a = a_1 a_2 \ldots a_n$ and $b = b_1 b_2 \ldots b_n$,
the Hamming distance is defined as the
number of positions in which the two
words differ. Denote it by $d(a, b)$.
The Hamming distance satisfies the triangle inequality:
$d(a, b) + d(b, c) \ge d(a, c)$
Importance of Hamming distance:
On receiving a word, we decode it to the code
word which has the smallest Hamming
distance from the received word (maximum
likelihood decoding).
A code detects t errors iff its minimum
distance is greater than t.
A code corrects t errors iff its minimum
distance is greater than 2t.
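As a small illustration of these two criteria (the helper names are mine, and the example codes are built inline), the sketch below computes the minimum distance d of a code and the resulting detection limit d-1 and correction limit (d-1)//2.

from itertools import combinations

def hamming(a, b):
    # Number of positions in which two equal-length words differ.
    return sum(x != y for x, y in zip(a, b))

def min_distance(code):
    return min(hamming(a, b) for a, b in combinations(code, 2))

# Even-parity code with 3 information bits: d = 2 -> detects 1, corrects 0.
parity_code = [f"{i:03b}" + str(bin(i).count("1") % 2) for i in range(8)]
# Repetition code with n = 3: d = 3 -> detects 2, corrects 1.
rep3 = ["000", "111"]

for code in (parity_code, rep3):
    d = min_distance(code)
    print("d =", d, "-> detects", d - 1, "errors, corrects", (d - 1) // 2)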
What limits the information rate of a channel?
(while we keep Perr arbitrarily low)
The channel capacity.
Conditional probability is important for
understanding the concept of channel
capacity.
Entropy (uncertainty) of the input alphabet
under the condition of receiving $y_i$:
$H(X \mid y_i) = -\sum_{j=1}^{n} P(x_j \mid y_i) \log_2 P(x_j \mid y_i)$
Taking the mean value over all output
symbols $y_i$, we obtain the uncertainty about
the input after receiving the output of the
channel:
$H(X \mid Y) = \sum_{i=1}^{n} H(X \mid y_i)\, P(y_i)$
$H(X \mid Y) = -\sum_{i=1}^{n} \sum_{j=1}^{n} P(y_i)\, P(x_j \mid y_i) \log_2 P(x_j \mid y_i)$
$H(X \mid Y) = -\sum_{i=1}^{n} \sum_{j=1}^{n} P(x_j, y_i) \log_2 P(x_j \mid y_i)$
H(X|Y) is called the conditional entropy. This
is the uncertainty about the input after
receiving the output, so it is a measure of the
information loss due to noise.
$I(X;Y) = H(X) - H(X \mid Y)$
I(X;Y) is called the mutual information.
$I(X;Y) = -\sum_{j=1}^{n} P(x_j) \log_2 P(x_j) + \sum_{i=1}^{n}\sum_{j=1}^{n} P(x_j, y_i) \log_2 P(x_j \mid y_i)$
$I(X;Y) = -\sum_{i=1}^{n}\sum_{j=1}^{n} P(x_j, y_i) \log_2 P(x_j) + \sum_{i=1}^{n}\sum_{j=1}^{n} P(x_j, y_i) \log_2 P(x_j \mid y_i)$
$I(X;Y) = \sum_{i=1}^{n}\sum_{j=1}^{n} P(x_j, y_i) \log_2 \frac{P(x_j, y_i)}{P(y_i)\, P(x_j)}$

If X and Y are statistically independent, the
average mutual information I(X;Y) is equal to
zero; otherwise it is positive.
Since I(X;Y) is always non-negative, H(X|Y) is
at most H(X).
H(X|Y) is the uncertainty in X after observing
Y. The information gained by us, I(X;Y), after
observing Y is the self-information of X minus
the uncertainty in X after observing Y.
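The sketch below (illustrative Python; bsc_joint and mutual_information are my names) computes I(X;Y) for a BSC directly from the joint distribution, using the last expression above, and shows that a noisier channel conveys less information.

from math import log2

def bsc_joint(p, p0=0.5):
    # Joint distribution P(x, y) for a BSC with crossover probability p
    # and input probabilities P(X=0) = p0, P(X=1) = 1 - p0.
    q, p1 = 1 - p, 1 - p0
    return {(0, 0): p0 * q, (0, 1): p0 * p, (1, 0): p1 * p, (1, 1): p1 * q}

def mutual_information(joint):
    # I(X;Y) = sum over x, y of P(x,y) log2( P(x,y) / (P(x) P(y)) )
    px = {x: sum(pr for (a, _), pr in joint.items() if a == x) for x in (0, 1)}
    py = {y: sum(pr for (_, b), pr in joint.items() if b == y) for y in (0, 1)}
    return sum(pr * log2(pr / (px[x] * py[y]))
               for (x, y), pr in joint.items() if pr > 0)

print(mutual_information(bsc_joint(p=0.1)))   # ~0.531 bits = 1 - H(0.1, 0.9)
print(mutual_information(bsc_joint(p=0.5)))   # 0.0: the output tells us nothing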
[Diagram: H(X,Y) split into H(X|Y), I(X;Y) and H(Y|X); H(X) = H(X|Y) + I(X;Y) and H(Y) = H(Y|X) + I(X;Y).]
The capacity of an information channel is the
maximum value of the mutual information I(X;Y),
taken over all possible input distributions.
Channel transition probabilities:
             y_i = 0   y_i = 1
P(y_i | 0)      q         p
P(y_i | 1)      p         q

Joint probabilities, with input probabilities p0 = P(0) and p1 = P(1):
             y_i = 0   y_i = 1
P(0, y_i)     p0 q      p0 p
P(1, y_i)     p1 p      p1 q
For a binary input, H(X) attains its maximum value
of 1 bit when the two input symbols are equally likely,
and by the symmetry of the channel this choice also
maximizes the mutual information. Writing the double
sum out term by term:
$I(X;Y) = \sum_{i=1}^{n}\sum_{j=1}^{n} P(x_j, y_i) \log_2 \frac{P(x_j, y_i)}{P(y_i)\, P(x_j)}$
$I(X;Y) = P(0,0)\log_2\frac{P(0 \mid 0)}{P(0)} + P(0,1)\log_2\frac{P(0 \mid 1)}{P(0)} + P(1,0)\log_2\frac{P(1 \mid 0)}{P(1)} + P(1,1)\log_2\frac{P(1 \mid 1)}{P(1)}$

Assuming the probabilities for emission of 0
and 1 to be equal (to ½), we have
$I(X;Y) = \frac{q}{2}\log_2 2q + \frac{p}{2}\log_2 2p + \frac{p}{2}\log_2 2p + \frac{q}{2}\log_2 2q = 1 + p\log_2 p + q\log_2 q$
Therefore, with equiprobable inputs the mutual
information attains its maximum value, which is
the capacity of the BSC:
$C = 1 - H(p, 1-p)$
where $H(p, 1-p) = -p\log_2 p - (1-p)\log_2(1-p)$ is the binary entropy function.
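A minimal numerical check of this formula (the function names are mine):

from math import log2

def binary_entropy(p):
    # H(p, 1-p) in bits.
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    # Capacity of the binary symmetric channel: C = 1 - H(p, 1-p).
    return 1 - binary_entropy(p)

for p in (0.0, 0.001, 0.1, 0.5):
    print(p, bsc_capacity(p))
# p = 0.5 gives C = 0: the output is statistically independent of the input.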
Shannon’s fundamental theorem:
Every binary symmetric channel of capacity
C > 0 can be encoded with arbitrary
reliability and with information rate arbitrarily
close to C: there exist codes $K_n$ with
$\lim_{n\to\infty} P_{err}(K_n) = 0 \quad \text{and} \quad \lim_{n\to\infty} R(K_n) = C$
Shannon’s Noisy Channel Coding theorem:
(i) Let a DMS with an alphabet X have
entropy H(X) and produce symbols once every Ts
seconds. Let a discrete memoryless channel
have capacity C and be used once every Tc
seconds. Then, if
$\frac{H(X)}{T_s} \le \frac{C}{T_c},$
there exists a coding scheme for which the
source output can be transmitted over the
noisy channel and reconstructed with an
arbitrarily low probability of error.
(ii) Conversely, if
$\frac{H(X)}{T_s} > \frac{C}{T_c},$
it is not possible to transmit information over the
channel and reconstruct it with an arbitrarily small
probability of error.
The parameter C/Tc is called the critical rate.
Equivalence of the two statements:
Consider a DMS that emits equally
likely binary symbols (each with probability 0.5)
once every Ts seconds.
The entropy of this source is 1 bit per symbol.
From the channel coding theorem,
$\frac{1}{T_s} \le \frac{C}{T_c}$
$T_c / T_s$ is nothing but the code rate R(K),
also designated as r. Therefore:
$r \le C$
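To make r ≤ C concrete, the sketch below compares the rates of the codes discussed in these notes with the capacity of a BSC with p = 0.001 (the value used in the earlier examples); the block length n = 8 for the parity code is an illustrative choice of mine.

from math import log2

def bsc_capacity(p):
    # C = 1 - H(p, 1-p), valid for 0 < p < 1.
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)

p = 0.001
C = bsc_capacity(p)            # ~0.9886 bits per channel use
rates = {"repetition, n = 3": 1 / 3,
         "repetition, n = 5": 1 / 5,
         "K4* and K6*": 1 / 2,
         "even parity, n = 8": 7 / 8}
for name, r in rates.items():
    print(f"{name}: r = {r:.3f}, below C = {C:.4f}: {r < C}")
# All these rates are below capacity, so arbitrarily reliable transmission at
# these rates is possible in principle; the theorem does not, however, say
# which code achieves it.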