
Introduction to Information Theory
Channel capacity and models

A.J. Han Vinck
University of Duisburg-Essen
May 2012
This lecture
- Some models
- Channel capacity
- Shannon channel coding theorem
- Converse
Some channel models

Input X → channel P(y|x) → output Y

The channel is described by its transition probabilities P(y|x).
Memoryless:
- the output at time i depends only on the input at time i
- the input and output alphabets are finite
Example: binary symmetric channel (BSC)

An error source adds a binary error sequence E to the input, modulo 2: Y = X ⊕ E, where
- E is the binary error sequence with P(1) = 1 - P(0) = p
- X is the binary information sequence
- Y is the binary output sequence

[Transition diagram: 0 → 0 and 1 → 1 with probability 1-p; 0 → 1 and 1 → 0 with probability p.]

From AWGN to BSC: hard decisions on the output of an AWGN channel (signal amplitude A, noise variance σ²) turn it into a BSC.

Homework: calculate the capacity as a function of A and σ².
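Below is a minimal simulation sketch of this AWGN-to-BSC conversion, assuming antipodal signalling ±A, Gaussian noise with standard deviation sigma, and a sign-based hard decision; the function names and the values A = 1.0, sigma = 0.5 are illustrative, not from the slides.

# Sketch (assumption: BPSK over AWGN with a hard decision gives a BSC
# whose crossover probability is p = Q(A/sigma)).
import math
import random

def q_function(x: float) -> float:
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simulate_bsc(bits, A=1.0, sigma=0.5):
    """Transmit +A/-A, add Gaussian noise, decide by sign: effectively Y = X xor E."""
    received = []
    for x in bits:
        s = A if x == 0 else -A           # map bit 0 -> +A, bit 1 -> -A
        r = s + random.gauss(0.0, sigma)  # AWGN sample
        received.append(0 if r >= 0 else 1)
    return received

bits = [random.randint(0, 1) for _ in range(100000)]
out = simulate_bsc(bits)
p_hat = sum(b != y for b, y in zip(bits, out)) / len(bits)
print("measured p:", p_hat, " theoretical p = Q(A/sigma):", q_function(1.0 / 0.5))

The measured crossover rate should match Q(A/σ), which is the p to plug into the BSC capacity formula.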
Other models

Z-channel (optical):
- input 0 (light on) → output 0 with probability 1
- input 1 (light off) → output 1 with probability 1-p, output 0 with probability p
- P(X=0) = P0

Erasure channel (MAC):
- input 0 → output 0 with probability 1-e, erasure E with probability e
- input 1 → output 1 with probability 1-e, erasure E with probability e
- P(X=0) = P0
Erasure with errors:
- input 0 → output 0 with probability 1-p-e, erasure E with probability e, output 1 with probability p
- input 1 → output 1 with probability 1-p-e, erasure E with probability e, output 0 with probability p
Burst error model (Gilbert-Elliott)

Random error channel: the error source produces independent outputs with P(0) = 1 - P(1).

Burst error channel: the outputs depend on the channel state (good or bad):
P(0 | state = bad) = P(1 | state = bad) = 1/2
P(0 | state = good) = 1 - P(1 | state = good) = 0.999
The state alternates between good and bad with transition probabilities P_gb, P_bg, P_gg, P_bb.
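A minimal sketch of such a two-state error source, using the per-state error rates quoted above; the transition probabilities P_gb = 0.01, P_bg = 0.1 and the function name are illustrative assumptions.

# Sketch of a Gilbert-Elliott error source: state-dependent error rates,
# Markov switching between "good" and "bad".
import random

def gilbert_elliott(n, p_gb=0.01, p_bg=0.1, p_err_good=0.001, p_err_bad=0.5):
    """Return a binary error sequence of length n; 1 means a channel error."""
    state = "good"
    errors = []
    for _ in range(n):
        p_err = p_err_bad if state == "bad" else p_err_good
        errors.append(1 if random.random() < p_err else 0)
        # state transition for the next symbol
        if state == "good" and random.random() < p_gb:
            state = "bad"
        elif state == "bad" and random.random() < p_bg:
            state = "good"
    return errors

e = gilbert_elliott(100000)
print("overall error rate:", sum(e) / len(e))

The errors come in clusters while the state is bad, which is exactly the burst behaviour the model is meant to capture.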
Channel capacity

I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)   (Shannon 1948)

[Diagram: H(X) split into I(X;Y) and H(X|Y).]

Notes:
The mutual information I(X;Y) depends on the input probabilities, because the transition probabilities are fixed by the channel. The capacity is therefore obtained by maximizing over the input distribution:

capacity = max over P(x) of I(X;Y)
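As a sketch of this definition, the following evaluates I(X;Y) = H(Y) - H(Y|X) for a fixed transition matrix and a chosen input distribution; capacity is the maximum over the input distribution. The BSC matrix with p = 0.1 is only an illustration.

# Sketch: mutual information for a discrete memoryless channel.
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits; p_y_given_x[i][j] = P(Y=j | X=i)."""
    n_out = len(p_y_given_x[0])
    p_y = [sum(p_x[i] * p_y_given_x[i][j] for i in range(len(p_x)))
           for j in range(n_out)]
    h_y_given_x = sum(p_x[i] * entropy(p_y_given_x[i]) for i in range(len(p_x)))
    return entropy(p_y) - h_y_given_x

# BSC with crossover p = 0.1 and uniform input: I(X;Y) = 1 - h(0.1) ~ 0.531
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(mutual_information([0.5, 0.5], bsc))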
Practical communication system design

message → encoder (code book) → code word in → channel → received word (with errors) → decoder (code book) → estimate

There are 2^k code words of length n;
k is the number of information bits transmitted in n channel uses.
Channel capacity
Definition:
The rate R of a code is the ratio k/n, where k is the number of information bits transmitted in n channel uses.
Shannon showed that:
for R ≤ C, encoding methods exist with decoding error probability → 0 (as the code length n grows).
Encoding and decoding according to Shannon

Code: 2^k binary codewords where P(0) = P(1) = ½
Channel errors: P(0 → 1) = P(1 → 0) = p,
i.e. the number of typical error sequences is ≈ 2^{nh(p)}

Decoder: search around the received sequence, in the space of 2^n binary sequences, for a codeword with ≈ np differences.

Decoding error probability
1. For t errors: P(|t/n - p| > ε) → 0 for n → ∞ (law of large numbers)
2. P(≥ 1 other code word in the decoding region) (codewords random):

P(≥ 1) ≈ (2^k - 1) · 2^{nh(p)} / 2^n ≈ 2^{-n(1 - h(p) - R)} = 2^{-n(C_BSC - R)} → 0
for R = k/n < 1 - h(p) and n → ∞
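A quick numerical sketch of this union bound; the bound is the one on this slide, while the values of n, p and R below are illustrative choices.

# Sketch: evaluate 2^{-n(1 - h(p) - R)} to see the exponential decay with n.
import math

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

p, R = 0.1, 0.4                 # crossover probability and code rate (R < 1 - h(p))
for n in (100, 500, 1000):
    exponent = n * (1 - h(p) - R)
    print(f"n={n}: bound ~ 2^-{exponent:.1f}")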
Channel capacity: the BSC

[Transition diagram X → Y: 0 → 0 and 1 → 1 with probability 1-p, crossovers with probability p.]

I(X;Y) = H(Y) - H(Y|X)
The maximum of H(Y) is 1, since Y is binary.
H(Y|X) = P(X=0) h(p) + P(X=1) h(p) = h(p)

Conclusion: the capacity of the BSC is C_BSC = 1 - h(p)
Homework: draw C_BSC; what happens for p > ½?
Channel capacity: the BSC

[Plot: channel capacity C_BSC versus bit error probability p, for 0 ≤ p ≤ 1.]

Explain the behaviour!
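A small sketch that tabulates C_BSC = 1 - h(p); the grid of p values is an arbitrary choice. The comments hint at the behaviour to explain.

# Sketch: C_BSC = 1 - h(p) is symmetric around p = 1/2 (for p > 1/2 one can
# simply invert the output) and is zero at p = 1/2, where the output is
# independent of the input.
import math

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

for p in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 1.0]:
    print(f"p = {p:0.1f}   C_BSC = {1 - h(p):.3f}")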
Channel capacity: the Z-channel
Application in optical communications

[Diagram: input 0 (light on) → output 0 with probability 1; input 1 (light off) → output 1 with probability 1-p, output 0 with probability p; P(X=0) = P0.]

H(Y) = h(P0 + p(1 - P0))
H(Y|X) = (1 - P0) h(p)

For the capacity, maximize I(X;Y) = H(Y) - H(Y|X) over P0.
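A sketch of that maximization by simple grid search over P0 (a closed form exists, but is not needed here); p = 0.1 and the grid resolution are illustrative choices.

# Sketch: maximize I(X;Y) = h(P0 + p(1-P0)) - (1-P0) h(p) over P0.
import math

def h(x):
    return 0.0 if x in (0.0, 1.0) else -x*math.log2(x) - (1-x)*math.log2(1-x)

def z_channel_capacity(p, steps=10000):
    best = 0.0
    for k in range(steps + 1):
        p0 = k / steps
        i = h(p0 + p * (1 - p0)) - (1 - p0) * h(p)
        best = max(best, i)
    return best

print(z_channel_capacity(0.1))   # noticeably larger than the BSC value 1 - h(0.1) ~ 0.53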

Channel capacity: the erasure channel
Application: CDMA detection

[Diagram: input 0 → output 0 with probability 1-e, erasure E with probability e; input 1 → output 1 with probability 1-e, erasure E with probability e; P(X=0) = P0.]

I(X;Y) = H(X) - H(X|Y)
H(X) = h(P0)
H(X|Y) = e h(P0)

Thus C_erasure = 1 - e
(Check! Draw and compare with the BSC and the Z-channel.)
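A one-line check of this result: I(X;Y) = (1 - e) h(P0) is maximized at P0 = ½, which gives 1 - e. The value e = 0.25 below is just an example.

# Sketch: grid search over P0 confirms the maximum of (1-e) h(P0) is at P0 = 1/2.
import math

def h(x):
    return 0.0 if x in (0.0, 1.0) else -x*math.log2(x) - (1-x)*math.log2(1-x)

e = 0.25
best = max(((1 - e) * h(k / 1000), k / 1000) for k in range(1001))
print(best)   # (0.75, 0.5)  ->  C_erasure = 1 - e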

Capacity and coding for the erasure channel

Code: 2^k binary codewords where P(0) = P(1) = ½
Channel erasures: P(0 → E) = P(1 → E) = e

Decoder: search, in the space of 2^n binary sequences, for a codeword that agrees with the received sequence in the ≈ n(1-e) unerased positions.

Decoding error probability
1. For t erasures: P(|t/n - e| > ε) → 0 for n → ∞ (law of large numbers)
2. P(≥ 1 other candidate codeword agrees in the n(1-e) positions left after ≈ ne positions are erased) (codewords random):

P(≥ 1) ≈ (2^k - 1) · 2^{-n(1-e)} ≈ 2^{-n(1 - e - R)} = 2^{-n(C_erasure - R)} → 0
for R = k/n < 1 - e and n → ∞
Erasure with errors: calculate the capacity!

[Diagram: input 0 → output 0 with probability 1-p-e, erasure E with probability e, output 1 with probability p; input 1 → output 1 with probability 1-p-e, erasure E with probability e, output 0 with probability p.]
Example

Consider the ternary channel with input and output alphabets {0, 1, 2}: inputs 0 and 2 are received without error, while input 1 goes to each of the outputs 0, 1, 2 with probability 1/3.

For P(0) = P(2) = p, P(1) = 1 - 2p:

H(Y) = h(1/3 - 2p/3) + (2/3 + 2p/3);   H(Y|X) = (1 - 2p) log2 3

Q: maximize H(Y) - H(Y|X) as a function of p
Q: is this the capacity?

Hint, use the following: log2 x = ln x / ln 2; d ln x / dx = 1/x
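As a numerical cross-check of the calculus exercise, the sketch below maximizes the expression H(Y) - H(Y|X) from this slide over p by grid search (the grid resolution is an arbitrary choice); whether the result equals the capacity is left as the question above.

# Sketch: grid-search maximization of H(Y) - H(Y|X) over p in [0, 1/2].
import math

def h(x):
    return 0.0 if x in (0.0, 1.0) else -x*math.log2(x) - (1-x)*math.log2(1-x)

def info_rate(p):
    q = 1/3 - 2*p/3                       # q = P(Y = 1)
    return h(q) + (1 - q) - (1 - 2*p) * math.log2(3)

best_p = max((k/1000 for k in range(501)), key=info_rate)
print(best_p, info_rate(best_p))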
Channel models: general diagram

[Diagram: inputs x_1, x_2, …, x_n connected to outputs y_1, y_2, …, y_m with transition probabilities P_{1|1}, P_{2|1}, P_{1|2}, P_{2|2}, …, P_{m|n}.]

Input alphabet X = {x_1, x_2, …, x_n}
Output alphabet Y = {y_1, y_2, …, y_m}
P_{j|i} = P_{Y|X}(y_j | x_i)

In general: calculating capacity needs more theory

The statistical behavior of the channel is completely defined by the channel transition probabilities P_{j|i} = P_{Y|X}(y_j | x_i).

* Clue:
I(X;Y) is convex-∩ (concave) in the input probabilities,
i.e. finding the maximum is simple.
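One standard way to exploit this concavity numerically is the Blahut-Arimoto algorithm, which is not covered on these slides; the sketch below is an illustrative implementation, applied to the Z-channel with p = 0.1 as an example.

# Sketch of the Blahut-Arimoto iteration for C = max_{P(x)} I(X;Y):
# alternate between the posterior q(x|y) and the input distribution r(x).
import math

def blahut_arimoto(W, iters=2000):
    """W[x][y] = P(Y=y | X=x). Returns (capacity estimate in bits, input distribution)."""
    nx, ny = len(W), len(W[0])
    r = [1.0 / nx] * nx                       # current input distribution
    for _ in range(iters):
        # posterior q(x|y) under the current r(x)
        py = [sum(r[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        q = [[(r[x] * W[x][y] / py[y]) if py[y] > 0 else 0.0
              for y in range(ny)] for x in range(nx)]
        # update r(x) proportional to prod_y q(x|y)^W(y|x)
        new_r = []
        for x in range(nx):
            s = sum(W[x][y] * math.log(q[x][y]) for y in range(ny) if W[x][y] > 0)
            new_r.append(math.exp(s))
        total = sum(new_r)
        r = [v / total for v in new_r]
    # capacity estimate = I(X;Y) under the final r(x)
    py = [sum(r[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    c = sum(r[x] * W[x][y] * math.log2(W[x][y] / py[y])
            for x in range(nx) for y in range(ny) if W[x][y] > 0 and r[x] > 0)
    return c, r

print(blahut_arimoto([[1.0, 0.0], [0.1, 0.9]]))   # Z-channel, p = 0.1: about 0.76 bits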
Channel capacity: converse

For R > C the decoding error probability > 0.

[Plot: error probability Pe versus rate k/n; Pe stays positive for k/n > C.]
Converse: for a discrete memoryless channel

I(X^n; Y^n) = H(Y^n) - H(Y^n | X^n) ≤ Σ_{i=1}^{n} H(Y_i) - Σ_{i=1}^{n} H(Y_i | X_i) = Σ_{i=1}^{n} I(X_i; Y_i) ≤ nC

[System: source → message m → encoder → X^n → channel → Y^n → decoder → estimate m'.]

The source generates one out of 2^k equiprobable messages.
Let Pe = probability that m' ≠ m.
Converse: with R := k/n, for any code

k = H(M) = I(M;Y^n) + H(M|Y^n)
  ≤ I(X^n;Y^n) + H(M|Y^n, M')          (X^n is a function of M)
  ≤ I(X^n;Y^n) + H(M|M')               (M' is a function of Y^n)
  ≤ I(X^n;Y^n) + h(Pe) + Pe log2 2^k   (Fano inequality)
  ≤ nC + 1 + k Pe

Hence Pe ≥ 1 - C/R - 1/(nR):
for large n and R > C, the probability of error Pe is bounded away from 0.
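To see what the bound says in numbers, the sketch below evaluates Pe ≥ 1 - C/R - 1/(nR) for a BSC with p = 0.1, so C ≈ 0.531; the rates and block lengths are illustrative choices.

# Sketch: the converse lower bound on the error probability above capacity.
import math

def h(p):
    return -p*math.log2(p) - (1-p)*math.log2(1-p)

C = 1 - h(0.1)
for R in (0.6, 0.8):
    for n in (100, 1000):
        print(f"R={R}, n={n}: Pe >= {1 - C/R - 1/(n*R):.3f}")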
Appendix:
Assume a binary sequence with P(0) = 1 - P(1) = 1 - p,
and let t be the number of 1's in the sequence.
Then, for n → ∞ and ε > 0, the weak law of large numbers gives
Probability( |t/n - p| > ε ) → 0,
i.e. with high probability we expect about pn 1's.
Appendix:
Consequence: n(p - ε) < t < n(p + ε) with high probability.

1. Σ_{t=n(p-ε)}^{n(p+ε)} (n choose t) ≈ 2εn (n choose pn)

2. 2εn (n choose pn) ≈ 2εn · 2^{nh(p)}

3. lim_{n→∞} (1/n) log2 [ 2εn (n choose pn) ] = h(p)

where h(p) = -p log2 p - (1-p) log2 (1-p).

Homework: prove the approximation using ln N! ≈ N ln N for N large,
or use the Stirling approximation: N! ≈ √(2πN) · N^N · e^{-N}.
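A quick numerical illustration (not a proof) that (1/n) log2 (n choose pn) approaches h(p), using exact binomial coefficients; p = 0.3 and the values of n are arbitrary choices.

# Sketch: exponential growth rate of the binomial coefficient versus h(p).
import math

def h(p):
    return -p*math.log2(p) - (1-p)*math.log2(1-p)

p = 0.3
for n in (10, 100, 1000, 10000):
    rate = math.log2(math.comb(n, round(p * n))) / n
    print(f"n={n:6d}: (1/n) log2 C(n,pn) = {rate:.4f}   h(p) = {h(p):.4f}")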
[Plot: the binary entropy function h(p) versus p, for 0 ≤ p ≤ 1.]

Binary entropy: h(p) = -p log2 p - (1-p) log2 (1-p)
Note: h(p) = h(1-p)
Capacity for Additive White Gaussian Noise (AWGN)

[Diagram: Y = X + Noise.]

Cap := sup over p(x) with σ_x² ≤ S/2W of [ H(Y) - H(Noise) ]

Input X is Gaussian with power spectral density (psd) ≤ S/2W;
Noise is Gaussian with psd σ²_noise;
Output Y is Gaussian with psd σ_y² = S/2W + σ²_noise;
W is the (single-sided) bandwidth.

For Gaussian channels: σ_y² = σ_x² + σ²_noise, and for a Gaussian Z with variance σ_z²:
p(z) = (1/√(2πσ_z²)) exp(-z²/(2σ_z²));   H(Z) = ½ log2(2πe σ_z²) bits.

Hence
Cap = ½ log2( 2πe(σ_x² + σ²_noise) ) - ½ log2( 2πe σ²_noise )   bits/transmission
    = ½ log2( (σ_x² + σ²_noise) / σ²_noise )   bits/transmission
Cap = W log2( (S/2W + σ²_noise) / σ²_noise ) = W log2( 1 + S/(2W σ²_noise) )   bits/sec.
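A small sketch evaluating the bits-per-second formula; the bandwidth, signal power and noise psd below are illustrative values, not from the slides.

# Sketch: capacity of the band-limited AWGN channel in bits/second.
import math

def awgn_capacity(W, S, sigma2_noise):
    """Cap = W log2(1 + S / (2 W sigma_noise^2))."""
    return W * math.log2(1 + S / (2 * W * sigma2_noise))

print(awgn_capacity(W=1e6, S=1e-3, sigma2_noise=1e-10))   # ~2.6e6 bits/sec for these values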
Middleton type of burst channel model

Select channel k with probability Q(k); channel k has transition probability p(k).

[Diagram: the input is routed to channel 1, channel 2, …, each a binary channel with its own transition probability.]
Fritzman model:
multiple good states G_1, …, G_n and only one bad state B; closer to an actual real-world channel.

[State diagram: G_1, …, G_n with error probability 0 and B with error probability h; a transition labelled 1-p connects G_n and B.]
Interleaving: from bursty to random

message → encoder → interleaver → bursty channel → interleaver⁻¹ → decoder → message estimate

After de-interleaving, the burst errors look like "random errors" to the decoder.

Note: interleaving introduces encoding and decoding delay.

Homework: compare block and convolutional interleaving w.r.t. delay.
Interleaving: block

Channel models are difficult to derive:
- how should a burst be defined?
- how do random and burst errors combine?
For practical reasons: convert burst errors into random errors.

Read the code bits in row-wise and transmit them column-wise, e.g. for a 5 × 5 block:

1 0 0 1 1
0 1 0 0 1
1 0 0 0 0
0 0 1 1 0
1 0 0 1 1
De-interleaving: block

The received stream, which contains a burst of six errors (marked e):

1 0 0 1 1 0 1 0 0 1 1 e e e e e e 1 1 0 1 0 0 1 1

is written into the array column-wise and read out row-wise. The burst is thereby spread over the rows, so that each row contains at most one or two errors; the highlighted row contains 1 error.
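A sketch of this block interleaver/de-interleaver pair in code, with a 5 × 5 array and a burst of six corrupted symbols as in the example above; the function names and the integer placeholder data are illustrative.

# Sketch: block interleaving spreads a channel burst over many rows.
def interleave(bits, n_rows, n_cols):
    """Read in row-wise, transmit column-wise."""
    assert len(bits) == n_rows * n_cols
    rows = [bits[r*n_cols:(r+1)*n_cols] for r in range(n_rows)]
    return [rows[r][c] for c in range(n_cols) for r in range(n_rows)]

def deinterleave(bits, n_rows, n_cols):
    """Read in column-wise, read out row-wise."""
    assert len(bits) == n_rows * n_cols
    cols = [bits[c*n_rows:(c+1)*n_rows] for c in range(n_cols)]
    return [cols[c][r] for r in range(n_rows) for c in range(n_cols)]

data = list(range(25))                   # stand-in for 5 code words of length 5
tx = interleave(data, 5, 5)
tx[10:16] = ['e'] * 6                    # burst of 6 errors on the channel
rx = deinterleave(tx, 5, 5)
for r in range(5):                       # each row now holds at most 2 errors
    print(rx[r*5:(r+1)*5])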
Interleaving: convolutional

input sequence 0: no delay
input sequence 1: delay of b elements
---
input sequence m-1: delay of (m-1)·b elements

Example: b = 5, m = 3.
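A minimal sketch of such a convolutional interleaver, implemented with per-branch delay lines and a commutator; b = 5 and m = 3 follow the example, while the zero fill and the function name are assumptions.

# Sketch: convolutional interleaver, branch i delays its symbols by i*b elements.
from collections import deque

def convolutional_interleaver(symbols, b=5, m=3, fill=0):
    lines = [deque([fill] * (i * b)) for i in range(m)]   # per-branch delay lines
    out = []
    for t, s in enumerate(symbols):
        i = t % m                        # commutator selects branch i
        lines[i].append(s)
        out.append(lines[i].popleft())
    return out

print(convolutional_interleaver(list(range(12))))

A matching de-interleaver uses the complementary delays (m-1-i)·b on branch i, so that every symbol experiences the same total delay.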