
2010 IEEE 26-th Convention of Electrical and Electronics Engineers in Israel

Finite Blocklength Coding for Channels with
Side Information at the Receiver
Amir Ingber and Meir Feder
Department of EE-Systems, The Faculty of Engineering
Tel Aviv University, Tel Aviv 69978, ISRAEL
email: {ingber, meir}@eng.tau.ac.il
Abstract—The communication model of a memoryless channel
that depends on a random state that is known at the receiver only
is studied. The channel can be thought of as a set of underlying
channels with a fixed state, where at each channel use one of
these channels is chosen at random, and this selection is known
to the receiver only. The capacity of such a channel is known,
and is given by the expectation (w.r.t. the random state) of the
capacity of the underlying channels.
In this work we examine the finite-length characteristics of this
channel, and their relation to the characteristics of the underlying
channels. We derive error exponent bounds (random coding and
sphere packing) for the channel and determine their relation to
the corresponding bounds of the underlying channels. We also
determine the channel dispersion and its relation to the dispersion
of the underlying channels. We show that, both for the error
exponent bounds and for the dispersion, the expectation of
these quantities is too optimistic w.r.t. the actual value. Examples
for such channels are discussed.
I. INTRODUCTION
The communication model of a memoryless channel that
depends on a random state is studied. We focus on the
case where the random state, also known as channel state
information (CSI), is known at the receiver only. The channel,
denoted by W, can be thought of as a set of (memoryless)
channels W_S, where S is the random state. Such a model
appears frequently in practice: the ergodic fading channel is
one example, where the fade value is assumed to be known
at the receiver. Sometimes the state S is a result of the
communication scheme design and is inserted intentionally
(for example, in order to attain symmetry properties).
In this work we study the relationship between the finite
blocklength information theoretic properties of the channel W
and those of the underlying channels W_S.
The capacity of this channel is well known, and is generally
given by the expectation (over S) of the capacity of the
underlying channel W_S. We then analyze other information
theoretic properties, such as the error exponent and the channel
dispersion of the channel W, and compare them to the expected
values of these properties of the channel W_S.
The main results can be summarized as follows:
The random coding and sphere packing error exponent
bounds [1] are both given by the expression E_0(ρ) − ρR
(optimized w.r.t. ρ), where E_0(ρ) is a function of the channel.
We show that the function E_0 for the channel W is given by

    E_0(ρ) = −log E[ 2^{−E_0^{(S)}(ρ)} ],   (1)

where E_0^{(S)} is the corresponding E_0 function for the channel
W_S, E[·] denotes expectation (w.r.t. S), and log = log_2.
In [2], error exponents for channels with side information
were considered. However, the focus was on channels with
CSI at the transmitter as well, compound channels and more.
While the case of CSI known at the receiver only is a special
case, the contribution here lies in the simplicity of the relation
(1).
We also discuss the channel dispersion (see [3], [4]), which
quantifies the speed at which the rate approaches capacity with
the block length (when the codeword error rate is fixed). We
show the following relationship between the dispersions of W
and W_S, denoted V and V_S respectively:

    V = E[V_S] + VAR[C_S].   (2)
Both in the error exponent and in the dispersion case, we
show that the expected exponent and the expected dispersion
are too optimistic w.r.t. the actual exponent and dispersion.
Finally, we discuss several examples that involve chan-
nels with side information at the receiver, such as channel
symmetrization, multilevel codes with multi-stage decoding
(MLC-MSD) and bit-interleaved coded modulation (BICM).
II. THE GENERAL COMMUNICATION MODEL
A. Channel Model
Let W be a discrete memoryless channel (DMC)^1 with
input x ∈ X, and output (y, s) ∈ Y × S, where s ∈ S is the
channel state, which is independent of the channel input X:

    W(y, s|x) = P_{Y,S|X}(y, s|x) = P_S(s) P_{Y|S,X}(y|s, x).   (3)
Definition 1: Let W_s be the channel W with the state S fixed
to s:

    W_s(y|x) ≜ P_{Y|S,X}(y|s, x).   (4)
^1 Similar results can be derived for channels with continuous output.
B. Communication Scheme
The communication scheme is defined as follows. Let n be
the codeword length, and let M be a set of 2^{nR} messages.
The encoder and decoder are denoted f_n and g_n respectively,
where
• f_n : M → X^n is the encoder, which maps the input
message m to the channel input x ∈ X^n.
• g_n : Y^n × S^n → M is the decoder, which maps the
channel output and the channel state to an estimate m̂ of
the transmitted message.
• The considered error probability is the codeword error
probability p_e ≜ P(m̂ ≠ m), where the messages m are
drawn randomly from M with equal probability.
The communication scheme is depicted in Figure 1.
We shall be interested in the tradeoff between the rate R,
the codeword length n and the error probability p_e of the best
possible codes.
III. INFORMATION THEORETIC ANALYSIS
Here we shall be interested in the performance of the
optimal codes for the channel W. We review known results
for the capacity, and present the results for the error exponent
and the channel dispersion.
A. Capacity
Since the channel W is simply a DMC with a scalar input
and a vector output, the capacity can be derived directly (see,
e.g. [5]):

    C(W) = max_{p(x)} I(X; Y, S)
         = max_{p(x)} [ I(X; Y|S) + I(X; S) ]
         = max_{p(x)} I(X; Y|S),   (5)

where the last equality holds since X and S are independent.
Note that the capacity can also be written as

    max_{p(x)} E_S[ I(X; Y|S = s) ].   (6)

In this paper we limit ourselves to a fixed input distribution
(e.g. equiprobable). In this case the capacity is given by

    I(X; Y|S) = E_S[ I(X; Y|S = s) ].   (7)

Recalling the definition of the channel conditioned on the state
s, we get

    C(W) = I(X; Y|S)
         = E_S[ I(X; Y|S = s) ]
         = E[ C(W_S) ],   (8)

where C(W_s) is the capacity of the underlying channel W_s.
We conclude that the capacity formula can be interpreted as
an expectation over the capacities of the underlying channels.
Note that when the CSI is available at the transmitter as
well, (8) holds even without the assumption of a fixed prior
on X.
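As a hypothetical numerical illustration of (8), consider a channel W that, with equal probability, behaves as one of two BSCs (crossover probabilities 0.05 and 0.2, chosen here only for illustration) with an equiprobable input. The following Python sketch computes I(X; Y|S) directly from the joint law and compares it with E[C(W_S)]:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Hypothetical example: the state S selects one of two BSCs,
# with crossover probabilities 0.05 and 0.20, each with probability 1/2.
p_states = np.array([0.05, 0.20])
P_S = np.array([0.5, 0.5])
P_X = np.array([0.5, 0.5])           # equiprobable input

# Capacities of the underlying BSCs (equiprobable input is capacity-achieving).
C_s = 1 - h2(p_states)

# I(X; Y|S) computed directly from the joint law P(s, x, y) = P_S(s) P_X(x) W_s(y|x).
I = 0.0
for q, p in zip(P_S, p_states):
    W = np.array([[1 - p, p], [p, 1 - p]])    # W_s(y|x), rows indexed by x
    P_Y = P_X @ W                             # P(y|s)
    for x in range(2):
        for y in range(2):
            I += q * P_X[x] * W[x, y] * np.log2(W[x, y] / P_Y[y])

print(I, np.dot(P_S, C_s))   # both ≈ 0.496 bits: I(X;Y|S) = E[C(W_S)]
```

Both computations yield the same value (≈ 0.496 bits here), as (8) predicts.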
B. Error Exponent
The error exponent of a channel is given by [1]

    E(R) ≜ lim_{n→∞} −(1/n) log(p_e(n, R)),   (9)

where p_e(n, R) is the average codeword error probability for
the best code of length n and rate R (assuming that the limit
exists).
While the exact characterization of the error exponent is
still an open question, two important bounds are known [1]:
the random coding and the sphere packing error exponents,
which are lower and upper bounds, respectively.
The random coding exponent is given by

    E_r(R) = max_{ρ∈[0,1]} max_{P_X(·)} { E_0(ρ, P_X) − ρR },   (10)

where E_0(ρ, P_X) is given by

    E_0(ρ, P_X) = −log Σ_{y∈Y} [ Σ_{x∈X} P_X(x) W(y|x)^{1/(1+ρ)} ]^{1+ρ}.   (11)

The sphere packing bound E_sp(R) is given by

    E_sp(R) = max_{ρ>0} max_{P_X(·)} { E_0(ρ, P_X) − ρR }.   (12)
It can be seen that both exponent bounds are similar. In fact,
they only differ in the optimization region of the parameter ρ,
and they coincide at rates beyond a certain rate called the
critical rate.
We note that both bounds depend on the function E_0(·). For
channels with CSI at the receiver, we derive E_0(·) explicitly.
Following the relationship (8), we wish to find the connections
between E_0(·) and the corresponding E_0 functions of the
conditional channels W_s, denoted E_0^{(s)}.
Theorem 1: Let W be a channel with CSI at the receiver.
Then the function E_0(·) for this channel is given by

    E_0(ρ, P_X) = −log E[ 2^{−E_0^{(S)}(ρ, P_X)} ].   (13)
Proof: When the channel output is (y, s), we get

    E_0(ρ, P_X)
      = −log Σ_{y∈Y, s∈S} [ Σ_{x∈X} P_X(x) W(y, s|x)^{1/(1+ρ)} ]^{1+ρ}
      = −log Σ_{s∈S} P_S(s) Σ_{y∈Y} [ Σ_{x∈X} P_X(x) P_{Y|S,X}(y|s, x)^{1/(1+ρ)} ]^{1+ρ}
      (a)= −log Σ_{s∈S} P_S(s) 2^{−E_0^{(s)}(ρ, P_X)}
      = −log E[ 2^{−E_0^{(S)}(ρ, P_X)} ],

where (a) follows from the definition of E_0^{(s)}.
Fig. 1. Communication scheme for channels with CSI at the receiver (the encoder maps m ∈ M to x ∈ X^n, the channel W outputs y ∈ Y^n, and the random state s ∈ S^n is available at the decoder, which produces m̂).
As a corollary, we get the random coding and the sphere
packing exponents for the channel W according to (10) and
(12).
Following (8), one might think that the error exponent
bounds (for example, E_r(R)) will be given by the expectation
of the exponent function w.r.t. S. This is clearly not the case,
as seen in Theorem 1. In addition, the following can be shown:
Theorem 2: Let Ẽ_r(R) be the average of E_r^{(S)} w.r.t. S:

    Ẽ_r(R) ≜ E[ E_r^{(S)}(R) ].   (14)

Then Ẽ_r(R) always overestimates the true random coding
exponent of W, E_r(R).
Proof: Let Ẽ_0(ρ, P_X) = E[ E_0^{(S)}(ρ, P_X) ]. Since 2^{−(·)} is
convex, it follows from Jensen's inequality and Theorem 1 that

    E_0(ρ, P_X) ≤ Ẽ_0(ρ, P_X).   (15)

We continue with Ẽ_r(R):

    Ẽ_r(R) = E[ E_r^{(S)}(R) ]
           = E[ sup_{P_X, ρ∈[0,1]} { E_0^{(S)}(ρ, P_X) − ρR } ]
           ≥ sup_{P_X, ρ∈[0,1]} E[ E_0^{(S)}(ρ, P_X) − ρR ]
           = sup_{P_X, ρ∈[0,1]} { Ẽ_0(ρ, P_X) − ρR }
           (a)≥ sup_{P_X, ρ∈[0,1]} { E_0(ρ, P_X) − ρR }
           = E_r(R),   (16)

where (a) follows from (15).
Note that the proof of Theorem 2 holds regardless of the
optimization region of ρ; therefore the analogous statement for
the sphere packing exponent follows similarly. We conclude
that the expectation (w.r.t. S) of the error exponent bounds
overestimates the true exponent bounds of W (and also the
true error exponent, above the critical rate).
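Theorem 2 can also be observed numerically. In the following self-contained sketch (same hypothetical two-BSC mixture, fixed equiprobable prior, so the suprema over P_X in (16) run over a single prior), the state-averaged exponent is never smaller than the true exponent of W:

```python
import numpy as np

def E0_dmc(rho, P_X, W):
    """Gallager E_0(rho, P_X) of a DMC given as a |X| x |Y| matrix W[x, y] = W(y|x)."""
    inner = (P_X[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

def Er(R, W, P_X, rhos=np.linspace(0.0, 1.0, 201)):
    """Random coding exponent at rate R for a fixed input prior P_X (grid search over rho)."""
    return max(E0_dmc(rho, P_X, W) - rho * R for rho in rhos)

P_X = np.array([0.5, 0.5])
P_S = np.array([0.5, 0.5])
W_s = [np.array([[1 - p, p], [p, 1 - p]]) for p in (0.05, 0.20)]   # hypothetical BSCs
W_joint = np.hstack([q * W for q, W in zip(P_S, W_s)])             # W(y, s|x)

R = 0.3                                               # a rate below E[C(W_S)]
Er_true = Er(R, W_joint, P_X)                         # E_r(R) of W
Er_avg = np.dot(P_S, [Er(R, W, P_X) for W in W_s])    # E[E_r^{(S)}(R)]
print(Er_true, Er_avg)    # Er_avg >= Er_true, as Theorem 2 states
```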
C. Dispersion
An alternative information theoretic measure for quantifying
coding performance at finite block lengths is the channel
dispersion. Suppose that a fixed codeword error probability
p_e and a codeword length n are given. We can then seek
the maximal achievable rate R given p_e and n. It turns out
that for fixed p_e and n, the gap to the channel capacity is
approximately proportional to Q^{−1}(p_e)/√n (where Q(·) is
the complementary Gaussian cumulative distribution function).
The proportionality constant (squared) is called the channel
dispersion. Formally, define the (operational) channel dispersion
as follows [3]:

Definition 2: The dispersion V(W) of a channel W with
capacity C is defined as

    V(W) = lim_{p_e→0} limsup_{n→∞} n · ( (C − R(n, p_e)) / Q^{−1}(p_e) )²,   (17)

where R(n, p_e) is the highest achievable rate for codeword
error probability p_e and codeword length n.
In 1962, Strassen [4] used the Gaussian approximation to
derive the following result for DMCs:

    R(n, p_e) = C − √(V/n) · Q^{−1}(p_e) + O(log n / n),   (18)

where C is the channel capacity, and the new quantity V is
the (information-theoretic) dispersion, which is given by

    V ≜ VAR(i(X; Y)),   (19)

where i(x; y) is the information spectrum, given by

    i(x; y) ≜ log [ P_{XY}(x, y) / (P_X(x) P_Y(y)) ],   (20)

and the distribution of X is the capacity-achieving distribution
that minimizes V. Strassen's result proves that the dispersion
of DMCs is equal to VAR(i(X; Y)). This result was recently
tightened (and extended to the power-constrained AWGN
channel) in [3]. It is also known that the channel dispersion
and the error exponent are related as follows. For a channel
with capacity C and dispersion V, the error exponent can be
approximated (for rates close to the capacity) by
E(R) ≅ (C − R)² / (2V ln 2). See [3] for details on the early
origins of this approximation by Shannon.
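As a hypothetical illustration of (18)-(20), the dispersion of a BSC with equiprobable input has the closed form V = p(1−p)[log_2((1−p)/p)]², and the normal approximation of the best achievable rate follows directly (scipy's norm.isf serves as Q^{−1}):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical illustration for a BSC(p) with equiprobable (capacity-achieving) input.
p, n, pe = 0.05, 1000, 1e-3

C = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)     # capacity, 1 - h2(p)

# i(x; y) equals log2(2(1-p)) w.p. (1-p) and log2(2p) w.p. p, so its variance is:
V = p * (1 - p) * np.log2((1 - p) / p) ** 2

# Normal approximation (18), ignoring the O(log n / n) term; norm.isf is Q^{-1}.
R_approx = C - np.sqrt(V / n) * norm.isf(pe)
print(C, V, R_approx)
```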
We now explore the dispersion for the case of channels with
side information at the receiver.
Theorem 3: The dispersion of the channel W with CSI at
the receiver is given by

    V(W) = E[V(W_S)] + VAR[C(W_S)],   (21)
where both expectation and variance are taken w.r.t. the
random state S.
Proof: Since W is nothing but a DMC with a vector
output, the proof boils down to the calculation of
VAR[i(X; (Y, S))]. The information spectrum in this case is
given by

    i(x; y, s) = log [ P_{YSX}(y, s, x) / (P_{YS}(y, s) P_X(x)) ]
              (a)= log [ P_{Y|S,X}(y|s, x) / P_{Y|S}(y|s) ]
              ≜ i(x; y|s),   (22)

where (a) follows since X and S are independent.
Suppose that s is fixed, i.e. consider the channel W_s. The
capacity is given by

    C(W_s) = E[ i(X; Y|S) | S = s ]
           = I(X; Y|S = s).   (23)

The dispersion of the channel W_s is given by

    V(W_s) = VAR( i(X; Y|S) | S = s )
           = E[ i²(X; Y|S) | S = s ] − E²[ i(X; Y|S) | S = s ]
           = E[ i²(X; Y|S) | S = s ] − C(W_s)².   (24)

Finally, the dispersion of the original channel W is given
as follows:

    V(W) = VAR( i(X; Y|S) )
        (a)= E[ VAR[ i(X; Y|S) | S ] ] + VAR[ E[ i(X; Y|S) | S ] ]
         = E[V(W_S)] + VAR[C(W_S)],   (25)

where (a) follows from the law of total variance.
A few notes regarding this result:
• Let Ṽ(W) ≜ E[V(W_S)]. As an immediate corollary of
Theorem 3, it can be seen that Ṽ(W) underestimates the
true dispersion of W, V(W). This fact fits the exponent
case: both the expected exponent and the expected dispersion
are too optimistic w.r.t. the true exponent and dispersion.
• The term VAR[C(W_S)] can be viewed as a penalty over
the expected dispersion Ṽ(W), which grows as the capacities
of the underlying channels become more spread.
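Theorem 3 can be checked numerically as well. The following sketch (same hypothetical two-BSC mixture with equiprobable input) computes VAR[i(X; Y, S)] directly from the joint law and compares it with the decomposition (21):

```python
import numpy as np

# Hypothetical two-BSC mixture with equiprobable input: verify (21) by computing
# VAR[i(X; Y, S)] directly and via the decomposition E[V(W_S)] + VAR[C(W_S)].
P_S = np.array([0.5, 0.5])
P_X = np.array([0.5, 0.5])
W_s = [np.array([[1 - p, p], [p, 1 - p]]) for p in (0.05, 0.20)]

C_s, V_s, i_vals, i_probs = [], [], [], []
for q, W in zip(P_S, W_s):
    P_Y = P_X @ W                          # P(y|s)
    probs = P_X[:, None] * W               # P(x, y|s)
    ivals = np.log2(W / P_Y)               # i(x; y|s), as in (22)
    C_s.append((probs * ivals).sum())                          # (23)
    V_s.append((probs * ivals**2).sum() - C_s[-1] ** 2)        # (24)
    i_vals.append(ivals.ravel())
    i_probs.append(q * probs.ravel())      # joint law P(s, x, y)

i_vals, i_probs = np.concatenate(i_vals), np.concatenate(i_probs)
V_direct = (i_probs * i_vals**2).sum() - (i_probs * i_vals).sum() ** 2
V_decomp = np.dot(P_S, V_s) + np.dot(P_S, (np.array(C_s) - np.dot(P_S, C_s)) ** 2)
print(V_direct, V_decomp)   # equal, as (25) states
```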
IV. CODE DESIGN
When designing channel codes, the fact that the output is
two-dimensional may complicate the code design. It would
therefore be of interest to apply some processing on the outputs
Y and S, and feed them to the decoder as a single value. We
seek such a processing method that would not compromise
the achievable performance over the modified channel (not
only in the capacity sense, but in the error probability at finite
codelengths sense as well).
For binary channels this can be done easily by calculating
the log-likelihood ratio (LLR) for each channel output pair
(y, s) (see Figure 2).
For channel outputs s and y, denote the LLR of x given
(y, s) by z:

    z = LLR(y, s) ≜ log [ P_{Y|S,X}(y|s, x = 0) / P_{Y|S,X}(y|s, x = 1) ].   (26)

It is well known that for channels with binary input, the
optimal ML decoder can be implemented to work on the LLR
values only. Therefore, by plugging the LLR calculator at the
channel output and supplying the decoder with the LLRs only,
the performance is not harmed, and we can regard the channel
as a simple DMC with input x and output z = LLR(y, s) for
code design purposes.
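A minimal sketch of this preprocessing, assuming a state-dependent BSC whose crossover probability is determined by the state (the specific values are illustrative only), is given below; the decoder then operates on z alone:

```python
import numpy as np

# Minimal sketch: LLR calculation (26) for a binary-input channel whose
# behaviour depends on a state s known at the receiver. Here W_s is a BSC
# with a state-dependent crossover probability (values assumed for illustration).
crossover = {0: 0.05, 1: 0.20}

def llr(y: int, s: int) -> float:
    """z = log P(y|s, x=0) / P(y|s, x=1) (natural logarithm)."""
    p = crossover[s]
    p_y_given_x0 = 1 - p if y == 0 else p
    p_y_given_x1 = p if y == 0 else 1 - p
    return float(np.log(p_y_given_x0 / p_y_given_x1))

# The decoder sees only z; for instance, a symbol-wise ML decision is sign-based.
y, s = 1, 0
z = llr(y, s)
x_hat = 0 if z > 0 else 1
print(z, x_hat)
```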
V. EXAMPLES
A. Symmetrization of binary channels with equiprobable input
In the design of state-of-the-art channel codes, it is usually
convenient to have channels that are symmetric. In recent years,
methods have been developed for designing very efficient binary
codes, such as LDPC codes. When designing LDPC codes, a
desired property of a binary channel is that its output be
symmetric [6].
Definition 3 (Binary input, output symmetric channels [6]):
A memoryless binary channel U with input alphabet {0, 1}
and output alphabet R is called output-symmetric if, for all
y ∈ R,

    U(y|0) = U(−y|1).   (27)
Consider a general binary channel W with arbitrary out-
put (which is not necessarily symmetric), and suppose that,
for practical reasons, we are interested in coding over this
channel with equiprobable input (which may or may not be
the achieving prior for that channel). The fact that we use
equiprobable input does not make the channel symmetric
according to Definition 3. However, there exists a method
for transforming this channel to a symmetric one, without
compromising on the capacity, error exponent or dispersion:
First, we add the LLR calculation to the channel and regard it
as a part of the channel. This way we get a real-output channel
from any arbitrary channel. Second, before we transmit the
binary codewords on the channel, instead of coding on the
channel directly, we perform a bit-wise XOR operation with
an iid pseudo-random binary vector. It can be shown that by
multiplying the LLR values by −1 wherever the input was
flipped, the LLRs are corrected. It can also be shown that
the channel, with the corrected LLR calculation is symmetric
according to Definition 3. In [7], this method (termed ’channel
adapters’) was used in order to symmetrize the sub-channels of
several coded modulation schemes. It is also shown in [7] that
the capacity is unchanged by the channel adapters. By using
Theorems 1 and 3, it can be verified that the error exponent
bounds and the dispersion remain the same as well.
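The following sketch illustrates the channel-adapter idea on the same hypothetical state-dependent BSC: the codeword is XORed with a pseudo-random vector known at both ends, and the receiver flips the sign of the LLR wherever the corresponding input bit was flipped, recovering LLRs for the original codeword bits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the channel-adapter idea on a hypothetical state-dependent BSC:
# XOR the codeword with a pseudo-random i.i.d. binary vector before transmission,
# then flip the sign of each LLR wherever the corresponding input bit was flipped.
n = 8
crossover = {0: 0.05, 1: 0.20}

codeword = rng.integers(0, 2, size=n)        # binary codeword
scrambler = rng.integers(0, 2, size=n)       # pseudo-random vector, known at both ends
tx = codeword ^ scrambler                    # bit-wise XOR before transmission

state = rng.integers(0, 2, size=n)           # channel state, known at the receiver
p = np.array([crossover[s] for s in state])
flips = (rng.random(n) < p).astype(int)
rx = tx ^ flips                              # state-dependent BSC output

# LLR of the scrambled bit, then the sign correction.
llr_scrambled = np.where(rx == 0, np.log((1 - p) / p), np.log(p / (1 - p)))
llr_corrected = llr_scrambled * np.where(scrambler == 1, -1, 1)

# Symbol-wise decisions on the corrected LLRs differ from the codeword
# exactly where the underlying channel flipped a bit.
decisions = (llr_corrected < 0).astype(int)
print(np.array_equal(decisions ^ flips, codeword))   # True
```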
B. Multilevel Coding and Multistage Decoding (MLC-MSD)
MLC-MSD is a method for using binary codes in order
to achieve capacity on nonbinary channels (see, e.g. [8]).
Fig. 2. Incorporating LLR calculation into the channel (the encoder output passes through W with the random state and the LLR calculator, which feeds z ∈ R^n to the decoder).
In MLC-MSD, the binary encoders work in parallel over the same
block of channel uses, and the decoders work sequentially as
follows: the first decoder assumes the rest of the codewords
are noise and decodes the message from the first encoder.
Every other decoder, in its turn, decodes the message from the
corresponding encoder assuming that the decoded messages
from the previous decoders are correct, and therefore regards
these messages as side information. The effective channels
between each encoder-decoder pair, called sub-channels, are in
fact channels with CSI at the receiver, and can therefore be
analyzed using Theorems 1 and 3. For more details on
finite-length analysis of MLC-MSD, see [9].
C. Bit-Interleaved Coded Modulation (BICM)
BICM [10] is another popular method for channel coding
using binary codes over nonbinary channels (for example,
a channel with output alphabet of size 2^L). It is based on
taking a single binary code, feeding it into a long interleaver,
and then mapping the interleaved coded bits onto the nonbinary
channel alphabet (every L-tuple of consecutive bits is mapped
to a symbol in the channel input alphabet of size 2^L). At the
receiver, the LLRs of all coded bits are calculated according to
the mapping, de-interleaved and fed to the decoder.
Under the assumption of an ideal (i.e. infinite-length)
interleaver, the equivalent BICM channel is modeled as a
binary channel with a random state [10]. The state is chosen
uniformly from {1, ..., L}, and represents the index of the
input bit in the L-tuple. Since the state is known to the receiver
only, this model fits the channel models discussed in the paper.
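A sketch of this equivalent-channel construction is given below, assuming a generic 2^L-ary DMC and the natural binary labeling (both chosen only for illustration; the state here is indexed from 0 to L−1 rather than 1 to L). The binary subchannel associated with bit position i is obtained by averaging the transition law over the remaining, equiprobable bits:

```python
import numpy as np

# Sketch under assumptions: per-state binary subchannels of the BICM model for a
# generic 2^L-ary DMC P[x, y] = P(y|x) with the natural binary labeling.
L = 2
num_x, num_y = 2 ** L, 4
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(num_y), size=num_x)      # hypothetical transition matrix

def bit(x, i):
    """i-th bit of the natural binary label of symbol x."""
    return (x >> i) & 1

# W_i(y|b) = 2^{-(L-1)} * sum of P(y|x) over symbols x whose i-th bit equals b,
# i.e. the remaining bits are averaged out as equiprobable.
subchannels = np.zeros((L, 2, num_y))
for i in range(L):
    for x in range(num_x):
        subchannels[i, bit(x, i)] += P[x] / 2 ** (L - 1)

# The equivalent BICM channel: a state i, uniform and known at the receiver,
# selects the binary subchannel W_i through which the coded bit is sent.
print(subchannels.sum(axis=2))   # every row of every W_i sums to 1
```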
Finite blocklength analysis of BICM should be done
carefully: although the model of a binary channel with a state
known at the receiver allows the derivation of error exponent
and channel dispersion, they do not have the usual meaning of
quantifying the performance of BICM at finite block lengths.
The reason for that is the interleaver: how can one rely on the
existence of an infinite-length interleaver in order to estimate
the finite-length performance?
The solution comes in the form of an explicit finite-length
interleaver. Recently an alternative scheme called Parallel
BICM was proposed [11], where binary codewords are used
in parallel and an interleaver of finite length is used in order
to validate the BICM model of a binary channel with a state
known at the receiver. This allows the proper use of Theorems
1 and 3 (see [11] for the details).
D. Fading Channels
The Rayleigh fading channel, which is popular in wireless
communication, can be modeled as a channel with CSI at the
receiver. The state in this setting is the fade value, which is
usually estimated, so that some version of it is available at the
receiver. When the fading is fast (a.k.a. ergodic fading), the
channel is memoryless and fits the model discussed in the
paper, and Theorems 1 and 3 can be applied.
ACKNOWLEDGMENT
A. Ingber is supported by the Adams Fellowship Program
of the Israel Academy of Sciences and Humanities.
REFERENCES
[1] Robert G. Gallager, Information Theory and Reliable Communication,
John Wiley & Sons, Inc., New York, NY, USA, 1968.
[2] Pierre Moulin and Ying Wang, “Capacity and random-coding exponents
for channel coding with side information,” IEEE Trans. on Information
Theory, vol. 53, pp. 1326–1347, 2007.
[3] Y. Polyanskiy, H.V. Poor, and S. Verdú, “Channel coding rate in the
finite blocklength regime,” IEEE Trans. on Information Theory, vol. 56,
no. 5, pp. 2307–2359, May 2010.
[4] V. Strassen, “Asymptotische Abschätzungen in Shannons Informations-
theorie,” Trans. Third Prague Conf. Information Theory, Czechoslovak
Academy of Sciences, 1962, pp. 689–723.
[5] Thomas M. Cover and Joy A. Thomas, Elements of Information Theory,
John Wiley & sons, 1991.
[6] Thomas J. Richardson, Mohammad Amin Shokrollahi, and Rüdiger L.
Urbanke, “Design of capacity-approaching irregular low-density parity-
check codes,” IEEE Trans. on Information Theory, vol. 47, no. 2, pp.
619–637, 2001.
[7] Jilei Hou, Paul H. Siegel, Laurence B. Milstein, and Henry D. Pfister,
“Capacity-approaching bandwidth-efficient coded modulation schemes
based on low-density parity-check codes,” IEEE Trans. on Information
Theory, vol. 49, no. 9, pp. 2141–2155, 2003.
[8] Udo Wachsmann, Robert F. H. Fischer, and Johannes B. Huber,
“Multilevel codes: Theoretical concepts and practical design rules,” IEEE
Trans. on Information Theory, vol. 45, no. 5, pp. 1361–1391, 1999.
[9] Amir Ingber and Meir Feder, “Capacity and error exponent analysis
of multilevel coding with multistage decoding,” in Proc. IEEE
International Symposium on Information Theory, Seoul, South Korea,
2009, pp. 1799–1803.
[10] Giuseppe Caire, Giorgio Taricco, and Ezio Biglieri, “Bit-interleaved
coded modulation,” IEEE Trans. on Information Theory, vol. 44, no. 3,
pp. 927–946, 1998.
[11] Amir Ingber and Meir Feder, “Parallel bit interleaved coded
modulation,” in Proc. 48th Annual Allerton Conference on
Communication, Control and Computing, Allerton, USA, September 29 –
October 1, 2010.
