Information theory: channel capacity and models
A.J. Han Vinck
University of Duisburg-Essen
May 2012
This lecture
Some models
Channel capacity
Shannon channel coding theorem
converse
some channel models
Input X --- P(y|x) --- Output Y
transition probabilities
memoryless:
- the output at time i depends only on the input at time i
- input and output alphabets are finite
Example: binary symmetric channel (BSC)
(Diagram: the error source produces E, which is added to the input: Y = X ⊕ E)
E is the binary error sequence s.t. P(1) = 1 − P(0) = p
X is the binary information sequence
Y is the binary output sequence
(Transition diagram: 0 → 0 and 1 → 1 with probability 1 − p, 0 → 1 and 1 → 0 with probability p)
From AWGN to BSC: hard-decision detection of the AWGN channel output gives a BSC with crossover probability p, which depends on the signal amplitude A and the noise variance σ².
Homework: calculate the capacity as a function of A and σ².
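A minimal numeric sketch for this homework, under the assumption of antipodal signalling (±A) with threshold detection at 0, so that p = Q(A/σ):

```python
import math

def bsc_crossover(A, sigma):
    """Crossover probability p = Q(A/sigma) for antipodal +/-A signalling
    in AWGN with standard deviation sigma and threshold detection at 0."""
    return 0.5 * math.erfc(A / (sigma * math.sqrt(2)))

def h2(p):
    """Binary entropy function h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity_from_awgn(A, sigma):
    """Capacity 1 - h(p) of the BSC obtained by hard-decision detection."""
    return 1.0 - h2(bsc_crossover(A, sigma))

print(bsc_capacity_from_awgn(A=1.0, sigma=1.0))  # ≈ 0.37 bits per channel use
```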
Other models

Z-channel (optical):
(Diagram: 0 → 0 (light on) with probability 1; 1 → 1 (light off) with probability 1 − p, 1 → 0 with probability p; P(X=0) = P0)

Erasure channel (MAC):
(Diagram: 0 → 0 and 1 → 1 with probability 1 − e; 0 → E and 1 → E (erasure) with probability e; P(X=0) = P0)

Erasure with errors:
(Diagram: 0 → 0 and 1 → 1 with probability 1 − p − e; 0 → 1 and 1 → 0 with probability p; 0 → E and 1 → E with probability e)
burst error model (Gilbert-Elliott)

Random error channel; outputs independent:
(Diagram: error source with P(0) = 1 − P(1))

Burst error channel; outputs dependent:
(Diagram: error source with state info: good or bad)
P(0 | state = bad) = P(1 | state = bad) = 1/2;
P(0 | state = good) = 1 − P(1 | state = good) = 0.999

(State diagram: states good and bad, with transition probabilities Pgb, Pbg, Pgg, Pbb)
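A minimal simulation sketch of this two-state model; the error probabilities per state follow the slide, while the state transition probabilities p_gb and p_bg are free parameters chosen here only for illustration:

```python
import random

def gilbert_elliott(n, p_gb=0.01, p_bg=0.1,
                    p_err_good=0.001, p_err_bad=0.5, seed=1):
    """Generate n error bits from a two-state Gilbert-Elliott model.
    p_gb: P(good -> bad), p_bg: P(bad -> good);
    p_err_*: error probability in each state."""
    rng = random.Random(seed)
    state = "good"
    errors = []
    for _ in range(n):
        p_err = p_err_good if state == "good" else p_err_bad
        errors.append(1 if rng.random() < p_err else 0)
        # state transition for the next channel use
        if state == "good":
            state = "bad" if rng.random() < p_gb else "good"
        else:
            state = "good" if rng.random() < p_bg else "bad"
    return errors

e = gilbert_elliott(10000)
print(sum(e) / len(e))   # average error rate; the errors appear in bursts
```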
channel capacity:
I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) (Shannon 1948)

notes:
capacity depends on the input probabilities,
because the transition probabilities are fixed

capacity = max over P(x) of I(X;Y)
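A small sketch (assuming numpy) that evaluates I(X;Y) for a given transition matrix P(y|x) and input distribution; the BSC check at the end should give 1 − h(0.1) ≈ 0.531:

```python
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for input distribution p_x (length |X|) and
    channel matrix p_y_given_x with rows P(y|x)."""
    p_x = np.asarray(p_x, dtype=float)
    P = np.asarray(p_y_given_x, dtype=float)
    p_xy = p_x[:, None] * P            # joint distribution P(x, y)
    p_y = p_xy.sum(axis=0)             # output distribution P(y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(
        p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask])))

# BSC with p = 0.1 and uniform input: I(X;Y) = 1 - h(0.1) ≈ 0.531
print(mutual_information([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]))
```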
Practical communication system design

(Block diagram: message → encoder (code book) → code word in → channel → received word with errors → decoder → estimate)

There are 2^k code words of length n.
k is the number of information bits transmitted in n channel uses.
Channel capacity
Definition:
The rate R of a code is the ratio k/n, where
k is the number of information bits transmitted in n channel uses
Shannon showed that:
for R ≤ C encoding methods exist with decoding error probability → 0
Encoding and decoding according to Shannon
Code: 2^k binary codewords where P(0) = P(1) = ½
Channel errors: P(0 → 1) = P(1 → 0) = p
i.e. # error sequences ≈ 2^{nh(p)}
Decoder: search around the received sequence for a codeword with ≈ np differences
(Picture: space of 2^n binary sequences, with decoding regions of radius ≈ np around the codewords)

Decoding error probability:
1. For t errors: P( |t/n − p| > ε ) → 0 for n → ∞ (law of large numbers)
2. P( > 1 code word in the decoding region ) (codewords random):
P(> 1) = (2^k − 1) · 2^{nh(p)} / 2^n ≈ 2^{−n(1 − h(p) − R)} = 2^{−n(C_BSC − R)} → 0
for R = k/n < 1 − h(p) and n → ∞
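A quick numeric sketch of this union bound (approximating 2^k − 1 by 2^k; the values of n, R and p are only illustrative):

```python
import math

def union_bound(n, k, p):
    """Random-coding union bound ≈ 2^k * 2^{n h(p)} / 2^n on the probability
    that another codeword falls inside the decoding region."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    exponent = k + n * h - n          # = -n(1 - h(p) - k/n)
    return 2.0 ** exponent

# rate 0.4 < C_BSC = 1 - h(0.1) ≈ 0.531: the bound decays with n
for n in (100, 200, 500):
    print(n, union_bound(n, k=int(0.4 * n), p=0.1))
```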
channel capacity: the BSC
(Transition diagram: input X, output Y; 0 → 0 and 1 → 1 with probability 1 − p, crossovers with probability p)
I(X;Y) = H(Y) − H(Y|X)
the maximum of H(Y) is 1, since Y is binary
H(Y|X) = P(X=0) h(p) + P(X=1) h(p) = h(p)
Conclusion: the capacity for the BSC is C_BSC = 1 − h(p)
Homework: draw C_BSC; what happens for p > ½?
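A short sketch that tabulates C_BSC = 1 − h(p), which may help with drawing the homework curve:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p = {p:4.2f}   C_BSC = {1 - h2(p):5.3f}")
# C is 1 at p = 0 and at p = 1, and 0 at p = 1/2:
# for p > 1/2 the outputs can simply be inverted before decoding.
```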
channel capacity: the BSC

(Plot: channel capacity C_BSC = 1 − h(p) versus bit error probability p, for 0 ≤ p ≤ 1)

Explain the behaviour!
channel capacity: the Z-channel
Application in optical communications
(Diagram: input X, output Y; 0 → 0 (light on) with probability 1; 1 → 1 (light off) with probability 1 − p, 1 → 0 with probability p)
With P(X=0) = P0:
H(Y) = h( P0 + p(1 − P0) )
H(Y|X) = (1 − P0) h(p)
For capacity, maximize I(X;Y) over P0
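A small numerical sketch that maximizes I(X;Y) = H(Y) − H(Y|X) over P0 by grid search (a closed-form optimum also exists, not shown here):

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x*math.log2(x) - (1-x)*math.log2(1-x)

def z_channel_capacity(p, steps=100000):
    """Grid search: max over P0 of h(P0 + p(1-P0)) - (1-P0) h(p)."""
    best = 0.0
    for i in range(steps + 1):
        p0 = i / steps
        mi = h2(p0 + p * (1 - p0)) - (1 - p0) * h2(p)
        best = max(best, mi)
    return best

print(z_channel_capacity(0.1))   # ≈ 0.763 bits per channel use for p = 0.1
```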
channel capacity: the erasure channel
Application: CDMA detection

(Diagram: input X, output Y; 0 → 0 and 1 → 1 with probability 1 − e; 0 → E and 1 → E (erasure) with probability e)
With P(X=0) = P0:
I(X;Y) = H(X) − H(X|Y)
H(X) = h(P0)
H(X|Y) = e · h(P0)
Thus C_erasure = 1 − e
(check!, draw and compare with the BSC and the Z-channel)
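A sketch that tabulates the three capacities side by side for equal parameter values p = e (the Z-channel value comes from the same grid search as above), as a starting point for the comparison asked for above:

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x*math.log2(x) - (1-x)*math.log2(1-x)

def c_bsc(p):
    return 1 - h2(p)

def c_erasure(e):
    return 1 - e

def c_z(p, steps=10000):
    return max(h2(q + p*(1 - q)) - (1 - q)*h2(p)
               for q in (i/steps for i in range(steps + 1)))

for x in (0.05, 0.1, 0.2, 0.5):
    print(f"p=e={x:4.2f}  BSC={c_bsc(x):.3f}  erasure={c_erasure(x):.3f}  Z={c_z(x):.3f}")
# for the same parameter value the erasure channel loses the least capacity
```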
Capacity and coding for the erasure channel
Code: 2^k binary codewords where P(0) = P(1) = ½
Channel: P(0 → E) = P(1 → E) = e
Decoder: search around the received sequence for a codeword that agrees in the non-erased positions (≈ ne positions are erased)
(Picture: space of 2^n binary sequences)

Decoding error probability:
1. For t erasures: P( |t/n − e| > ε ) → 0 for n → ∞ (law of large numbers)
2. P( > 1 candidate codeword agrees in the n(1 − e) unerased positions after ne positions are erased ) (codewords random):
P(> 1) = (2^k − 1) · 2^{−n(1 − e)} ≈ 2^{−n(1 − e − R)} = 2^{−n(C_erasure − R)} → 0
for R = k/n < 1 − e and n → ∞
Erasure with errors: calculate the capacity!

(Diagram: 0 → 0 and 1 → 1 with probability 1 − p − e; 0 → 1 and 1 → 0 with probability p; 0 → E and 1 → E with probability e)
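A numeric sketch for this exercise: the channel is symmetric, so the uniform input achieves capacity and I(X;Y) can be evaluated directly (the values of p and e are only illustrative):

```python
import math

def H(probs):
    """Entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

p, e = 0.05, 0.1          # example parameter values
# uniform input (capacity-achieving by symmetry); outputs are 0, E, 1
p_y = [0.5 * (1 - p - e) + 0.5 * p, e, 0.5 * p + 0.5 * (1 - p - e)]
H_y_given_x = H([1 - p - e, e, p])       # the same for both inputs
print(H(p_y) - H_y_given_x)              # I(X;Y) = capacity ≈ 0.62 bits
```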
example
Consider the following example
(Diagram: ternary input and output alphabet {0, 1, 2}; inputs 0 and 2 are received correctly, input 1 goes to each of the three outputs with probability 1/3)

For P(0) = P(2) = p, P(1) = 1 − 2p:
H(Y) = h(1/3 − 2p/3) + (2/3 + 2p/3);  H(Y|X) = (1 − 2p) log2 3
Q: maximize H(Y) − H(Y|X) as a function of p
Q: is this the capacity?
hint, use the following: log2 x = ln x / ln 2; d ln x / dx = 1/x
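A quick numerical sketch of the maximization, a grid search over p ∈ [0, 1/2]:

```python
import math

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x*math.log2(x) - (1-x)*math.log2(1-x)

def mi(p):
    """H(Y) - H(Y|X) for P(0) = P(2) = p, P(1) = 1 - 2p."""
    return h2(1/3 - 2*p/3) + (2/3 + 2*p/3) - (1 - 2*p) * math.log2(3)

best_p = max((i/10000 * 0.5 for i in range(10001)), key=mi)
print(best_p, mi(best_p))   # maximizing p and the corresponding I(X;Y)
```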
channel models: general diagram
(Diagram: inputs x1, x2, …, xn connected to outputs y1, y2, …, ym with transition probabilities P11, P21, P12, P22, …, Pmn)

Input alphabet X = {x1, x2, …, xn}
Output alphabet Y = {y1, y2, …, ym}
Pji = P_{Y|X}(yj | xi)
In general: calculating capacity needs more theory

The statistical behavior of the channel is completely defined by the channel transition probabilities Pji = P_{Y|X}(yj | xi)

* clue: I(X;Y) is convex ∩ (concave) in the input probabilities,
i.e. finding a maximum is simple
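One standard numerical method that exploits this concavity is the Blahut-Arimoto algorithm (not part of these slides); a minimal sketch, assuming numpy:

```python
import numpy as np

def blahut_arimoto(P, iters=500):
    """Capacity in bits of a discrete memoryless channel with transition
    matrix P[x, y] = P(y|x) (rows sum to 1), via Blahut-Arimoto."""
    P = np.asarray(P, dtype=float)
    q = np.full(P.shape[0], 1.0 / P.shape[0])    # input distribution, start uniform
    for _ in range(iters):
        r = q @ P                                 # output distribution r(y)
        # c(x) = exp( sum_y P(y|x) ln(P(y|x)/r(y)) ), using 0*log 0 = 0
        ratio = np.divide(P, r, out=np.ones_like(P), where=P > 0)
        c = np.exp((P * np.log(ratio)).sum(axis=1))
        cap = np.log2(np.sum(q * c))              # lower bound, tight at convergence
        q = q * c / np.sum(q * c)                 # re-weight the input distribution
    return float(cap)

# BSC with p = 0.1: capacity = 1 - h(0.1) ≈ 0.531 bits per channel use
print(blahut_arimoto([[0.9, 0.1], [0.1, 0.9]]))
# Z-channel with p = 0.1: ≈ 0.763 bits per channel use
print(blahut_arimoto([[1.0, 0.0], [0.1, 0.9]]))
```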
Channel capacity: converse
For R > C the decoding error probability > 0

(Sketch: error probability Pe versus rate k/n, with Pe bounded away from 0 for k/n > C)
Converse: For a discrete memoryless channel

I(X^n;Y^n) = H(Y^n) − Σ_{i=1}^n H(Y_i|X_i) ≤ Σ_{i=1}^n H(Y_i) − Σ_{i=1}^n H(Y_i|X_i) = Σ_{i=1}^n I(X_i;Y_i) ≤ nC
(Block diagram: source → message m → encoder → X^n → channel → Y^n → decoder → estimate m')
The source generates one out of 2^k equiprobable messages.
Let Pe = probability that m' ≠ m
Converse: with R := k/n, for any code
Pe ≥ 1 − C/R − 1/(nR)
Hence: for large n and R > C, the probability of error Pe > 0
k = H(M) = I(M;Y^n) + H(M|Y^n)
  ≤ I(X^n;Y^n) + H(M|Y^n, M')            (X^n is a function of M)
  ≤ I(X^n;Y^n) + H(M|M')                 (M' is a function of Y^n)
  ≤ I(X^n;Y^n) + h(Pe) + Pe · log2(2^k)  (Fano inequality)
  ≤ nC + 1 + k · Pe
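A tiny numeric sketch of the resulting bound Pe ≥ 1 − C/R − 1/(nR); the values of C, R and n are only illustrative:

```python
def converse_bound(C, R, n):
    """Lower bound on the error probability for rates above capacity."""
    return max(0.0, 1.0 - C / R - 1.0 / (n * R))

# BSC with p = 0.1 has C ≈ 0.531; at rate R = 0.6 > C the error
# probability stays bounded away from 0 as n grows
for n in (10, 100, 1000):
    print(n, converse_bound(0.531, 0.6, n))
```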
Appendix:
Assume: a binary sequence of length n with P(0) = 1 − P(1) = 1 − p;
t is the # of 1's in the sequence.
Then for n → ∞ and ε > 0 (weak law of large numbers):
Probability( |t/n − p| > ε ) → 0
i.e. we expect with high probability ≈ pn 1's
Appendix:
Consequence:
n(p − ε) < t < n(p + ε) with high probability

1. Σ_{t = n(p−ε)}^{n(p+ε)} (n choose t) ≈ 2εn · (n choose pn) ≈ 2εn · 2^{nh(p)}

2. lim_{n→∞} (1/n) · log2 [ 2εn · (n choose pn) ] = h(p)

3. h(p) = −p log2 p − (1 − p) log2 (1 − p)
Homework: prove the approximation using ln N! ≈ N ln N for N large,
or use the Stirling approximation: N! ≈ √(2πN) · N^N · e^{−N}
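A quick numerical check of the approximation (1/n) log2 (n choose pn) → h(p), using lgamma to evaluate the binomial coefficient:

```python
import math

def log2_binom(n, k):
    """log2 of (n choose k), via lgamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

def h2(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.2
for n in (100, 1000, 10000):
    k = int(p * n)
    print(n, log2_binom(n, k) / n, h2(p))   # (1/n) log2 (n choose pn) -> h(p)
```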
(Plot: binary entropy function h(p) versus p, for 0 ≤ p ≤ 1)
Binary entropy: h(p) = −p log2 p − (1 − p) log2 (1 − p)
Note: h(p) = h(1 − p)
Capacity for Additive White Gaussian Noise

(Diagram: input X, output Y = X + Noise)

Cap := sup over p(x) with σ_x² ≤ S/2W of [ H(Y) − H(Noise) ]

W is the (single sided) bandwidth
Input X is Gaussian with power spectral density (psd) ≤ S/2W;
Noise is Gaussian with psd = σ²_noise
Output Y is Gaussian with psd σ_y² = S/2W + σ²_noise
For Gaussian channels: σ_y² = σ_x² + σ²_noise
Cap = ½ log2 ( 2πe (σ_x² + σ²_noise) ) − ½ log2 ( 2πe σ²_noise )  bits/trans.
    = ½ log2 ( (σ_x² + σ²_noise) / σ²_noise )  bits/trans.
Cap = W log2 ( (S/2W + σ²_noise) / σ²_noise )  bits/sec.

using p(z) = (1/√(2πσ_z²)) · e^{−z²/(2σ_z²)} and H(Z) = ½ log2 (2πe σ_z²) bits
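A small sketch evaluating these formulas (W in Hz, S the signal power constraint, σ²_noise the noise variance per dimension; the example values are only illustrative):

```python
import math

def awgn_capacity_per_transmission(sigma_x2, sigma_noise2):
    """Cap = 1/2 log2( (sigma_x^2 + sigma_noise^2) / sigma_noise^2 ) bits/transmission."""
    return 0.5 * math.log2((sigma_x2 + sigma_noise2) / sigma_noise2)

def awgn_capacity_per_second(W, S, sigma_noise2):
    """Cap = W log2( (S/2W + sigma_noise^2) / sigma_noise^2 ) bits/second."""
    return W * math.log2((S / (2 * W) + sigma_noise2) / sigma_noise2)

print(awgn_capacity_per_transmission(sigma_x2=4.0, sigma_noise2=1.0))  # ≈ 1.16 bits
print(awgn_capacity_per_second(W=1000.0, S=2000.0, sigma_noise2=1.0))  # 1000 bits/s
```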
Middleton type of burst channel model

(Diagram: for every transmission, channel k is selected with probability Q(k); channel 1, channel 2, …; channel k is a binary channel with transition probability p(k))
Fritzman model: multiple good states G and only one bad state B;
closer to an actual real-world channel

(State diagram: states G1, …, Gn with error probability 0 and state B with error probability h; transition probability 1 − p shown in the figure)
Interleaving: from bursty to random

(Block diagram: message → encoder → interleaver → bursty channel → interleaver⁻¹ → decoder → message; after de-interleaving the errors look "random")

Note: interleaving introduces encoding and decoding delay
Homework: compare block and convolutional interleaving w.r.t. delay
Interleaving: block
Channel models are difficult to derive:
- burst definition?
- random and burst errors?
For practical reasons: convert burst errors into random errors.

read in row-wise, transmit column-wise:
1 0 0 1 1
0 1 0 0 1
1 0 0 0 0
0 0 1 1 0
1 0 0 1 1
De-interleaving: block

read in column-wise
(annotation in the figure: this row contains 1 error)
1 0 0 1 1
0 1 0 0 1
1 e e e e
e e 1 1 0
1 0 0 1 1

(e marks positions hit by the error burst)
read out row-wise
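A minimal block interleaver/de-interleaver sketch matching the 5 × 5 example; the burst position is chosen only for illustration:

```python
def block_interleave(bits, rows, cols):
    """Write row-wise into a rows x cols array, read out column-wise."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(bits, rows, cols):
    """Write column-wise, read out row-wise (the inverse operation)."""
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]

data = [1,0,0,1,1, 0,1,0,0,1, 1,0,0,0,0, 0,0,1,1,0, 1,0,0,1,1]
tx = block_interleave(data, 5, 5)
# a burst of 5 erasures on the channel ...
rx = tx[:10] + ['e'] * 5 + tx[15:]
# ... is spread out over the rows after de-interleaving: 1 erasure per row
print(block_deinterleave(rx, 5, 5))
```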
Interleaving: convolutional

(Diagram: input sequence 0 with no delay; input sequence 1 with a delay of b elements; …; input sequence m−1 with a delay of (m−1)·b elements; the input "in" and output "out" are switched cyclically over the sequences)

Example: b = 5, m = 3
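A sketch of one possible realization, where each input sequence i is delayed by i·b of its own elements and the input is distributed over the sequences in round-robin order:

```python
from collections import deque

def convolutional_interleave(symbols, b=5, m=3, fill=0):
    """Sequence i (i = 0..m-1) is delayed by i*b of its own elements;
    the input symbols are distributed over the m sequences in turn."""
    delays = [deque([fill] * (i * b)) for i in range(m)]
    out = []
    for t, s in enumerate(symbols):
        line = delays[t % m]      # sequence used for this symbol
        line.append(s)
        out.append(line.popleft())
    return out

# slide example: b = 5, m = 3; early outputs contain fill symbols
print(convolutional_interleave(list(range(1, 31)), b=5, m=3))
```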