Abstract
The distribution of the channel output when the coding rate is above capacity is investigated. It is shown that the typical output sequences satisfy an asymptotic equipartition property (AEP), independently of the specific codebook used, as long as the codebook is typical according to the standard random codebook generation. This equipartition of the typical output sequences is caused by the mixing of input sequences when there are too many of them, namely, when the rate is above capacity. This discovery sheds some light on the optimal design of compress-and-forward relay schemes.
I. INTRODUCTION
A fundamental observation behind Shannon's channel coding theorem is that using a randomly generated codebook (i.i.d. generated according to some $p_0(x)$) at a rate below capacity induces a distribution pattern on the output sequences, from which a decoding scheme with arbitrarily low probability of error can be devised.
In this paper, we are interested in the case when the rate is above capacity. We will show
that such a pattern that can be used for decoding will disappear when there are too many input
sequences, i.e., when the rate is above capacity. Instead, in this case, the output will satisfy an asymptotic equipartition property on the set of typical output sequences (typical with respect to $p_0(y) = \sum_{x} p_0(x) p(y|x)$). Interestingly, this set is independent of the specific codebook used, as long as the codebook is typical according to the random codebook generation. The reason for this equipartition is that the input sequences are too dense, so that different input sequences can contribute to the same output sequence and get mixed up.
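To illustrate this mixing numerically, the following is a minimal simulation sketch (our own illustration, not from the paper; the channel, the parameters, and all names are assumptions): for a binary symmetric channel driven at a rate above capacity, the exact induced output distribution $\Pr(Y^n = y^n \mid \mathcal{C})$ can be computed for small $n$, and $-\frac{1}{n}\log_2 \Pr(Y^n = y^n \mid \mathcal{C})$ concentrates near $H_0(Y)$ for most outputs.

```python
import itertools
import numpy as np

# Toy setup (our own assumption): BSC with crossover 0.1 and uniform p0(x),
# so H0(Y) = 1 bit and capacity C ~ 0.531 bits; the rate R = 0.8 exceeds C.
rng = np.random.default_rng(0)
n, R, delta = 10, 0.8, 0.1
codebook = rng.integers(0, 2, size=(int(2 ** (n * R)), n))  # i.i.d. ~ p0(x)

def p_out(y):
    """Pr(Y^n = y^n | C) = 2^{-nR} * sum_w p(y^n | x^n(w)) for the BSC."""
    flips = (codebook != y).sum(axis=1)          # Hamming distances to y^n
    return np.mean(delta ** flips * (1 - delta) ** (n - flips))

# -(1/n) log2 Pr(Y^n = y^n | C) over all 2^n outputs: above capacity it
# concentrates around H0(Y) = 1 bit for the typical outputs.
rates = np.array([-np.log2(p_out(np.array(y))) / n
                  for y in itertools.product([0, 1], repeat=n)])
print(f"mean = {rates.mean():.3f}, std = {rates.std():.3f}  (H0(Y) = 1)")
```

Re-running the sketch with a different seed, i.e., a different codebook realization, leaves these statistics essentially unchanged, consistent with the claim that the equipartition is codebook-independent.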
This study of the output distribution when the rate is above capacity is motivated by the investigation of optimal compress-and-forward relay schemes. The optimality of the compress-and-forward schemes
is arguably one of the most critical problems in the development of network information theory,
where ambiguity always arises when decoding cannot be done correctly. In the classical approach
of [2], the compression scheme at the relay was only based on the distribution used for generating
the codebook at the source, instead of the specific codebook generated. While many different
codebooks can be generated according to the same distribution, can the knowledge of the specific
codebook be helpful? There have been some discussions on this issue (e.g., [3]). In this paper, we show that the observations at the relay are essentially independent of the specific codebook used at the source, and depend only on the distribution by which the codebook is generated.
To further explore the optimality of the compress-and-forward schemes, we compare the rates
needed to losslessly compress the relay’s observation in two different scenarios: i) the relay uses
the knowledge of the source’s codebook to do the compression; ii) the relay simply ignores this
knowledge. It is shown that the minimum required rates in both scenarios are the same when the
rate of the source’s codebook is above the capacity of the source-to-relay link.
The remainder of the paper is organized as follows. In Section II, we first introduce some standard definitions of strongly typical sequences, and then give a definition of typical codebooks. We summarize our main results in Section III, followed by the proofs of these results in Sections IV, V, and VI. Finally, as an application of the results, the optimality of the compress-and-forward schemes is discussed in Section VII.
II. PRELIMINARIES
Consider a discrete memoryless channel $(\mathcal{X}, p(y|x), \mathcal{Y})$ with capacity $C := \max_{p(x)} I(X;Y)$. Under the random coding framework, a random codebook $\mathcal{C}$ with respect to $p_0(x)$ with rate $R$ and block length $n$ is defined as
$$\mathcal{C} := \left\{ X^n(w) \in \mathcal{X}^n,\ w = 1, \ldots, 2^{nR} \right\}, \qquad (1)$$
where each codeword in $\mathcal{C}$ is an i.i.d. random sequence generated according to a fixed input distribution $p_0(x)$.
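For concreteness, a minimal sketch of generating such a random codebook over a generic finite alphabet (our own illustration; the function name and the dictionary representation of $p_0(x)$ are assumptions):

```python
import numpy as np

def random_codebook(p0: dict, n: int, R: float, seed: int = 0) -> np.ndarray:
    """Draw 2^{nR} codewords of length n, each i.i.d. ~ p0, per (1)."""
    rng = np.random.default_rng(seed)
    symbols = list(p0.keys())
    probs = [p0[s] for s in symbols]
    return rng.choice(symbols, size=(int(2 ** (n * R)), n), p=probs)

# Example: a binary alphabet with uniform p0, block length 10, rate 0.8.
codebook = random_codebook({0: 0.5, 1: 0.5}, n=10, R=0.8)
```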
It is well known that information can be transmitted with arbitrarily small probability of error
for sufficiently large n if R < C. In this paper, however, we are interested in the case where the
rate is above capacity.
A. Strong Typicality
We begin with some standard definitions of strong typicality [4, Ch. 13].

Definition 2.1: The $\epsilon$-strongly typical set with respect to $p_0(x)$, denoted by $A_{\epsilon,0}^{(n)}(X)$, is the set of sequences $x^n \in \mathcal{X}^n$ satisfying:
1. For all $a \in \mathcal{X}$ with $p_0(a) > 0$,
$$\left| \frac{1}{n} N(a|x^n) - p_0(a) \right| < \frac{\epsilon}{|\mathcal{X}|},$$
2. For all $a \in \mathcal{X}$ with $p_0(a) = 0$, $N(a|x^n) = 0$.
Here $N(a|x^n)$ is the number of occurrences of the symbol $a$ in the sequence $x^n$.

Definition 2.2: The $\epsilon$-strongly jointly typical set with respect to $p_0(x,y) = p_0(x)p(y|x)$, denoted by $A_{\epsilon,0}^{(n)}(X,Y)$, is the set of pairs $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ satisfying:
1. For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p_0(a,b) > 0$,
$$\left| \frac{1}{n} N(a,b|x^n,y^n) - p_0(a,b) \right| < \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|},$$
2. For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p_0(a,b) = 0$, $N(a,b|x^n,y^n) = 0$.
Here $N(a,b|x^n,y^n)$ is the number of occurrences of the pair $(a,b)$ in the pair of sequences $(x^n, y^n)$.

Definition 2.3: The $\epsilon$-strongly conditionally typical set of the sequence $x^n$ with respect to the conditional distribution $p(y|x)$, denoted by $A_{\epsilon}^{(n)}(Y|x^n)$, is the set of sequences $y^n \in \mathcal{Y}^n$ satisfying:
1. For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) > 0$,
$$\frac{1}{n} \left| N(a,b|x^n,y^n) - p(b|a)\, N(a|x^n) \right| \le \epsilon \left( 1 + \frac{1}{|\mathcal{Y}|} \right), \qquad (2)$$
2. For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) = 0$, $N(a,b|x^n,y^n) = 0$.
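These definitions are straightforward to check empirically for small $n$. A minimal sketch (our own helper names; $p_0$ is represented as a dictionary and $p(y|x)$ as a nested dictionary, both assumptions):

```python
from collections import Counter

def strongly_typical(x, p0, alphabet, eps):
    """Check membership in A^{(n)}_{eps,0}(X) per Definition 2.1."""
    n, counts = len(x), Counter(x)
    for a in alphabet:
        if p0[a] > 0:
            if abs(counts[a] / n - p0[a]) >= eps / len(alphabet):
                return False
        elif counts[a] > 0:   # p0(a) = 0 but a occurs
            return False
    return True

def cond_typical(y, x, p, alphabet_x, alphabet_y, eps):
    """Check membership in A^{(n)}_eps(Y | x^n) per Definition 2.3."""
    n = len(x)
    pair_counts = Counter(zip(x, y))
    x_counts = Counter(x)
    for a in alphabet_x:
        for b in alphabet_y:
            if p[a][b] > 0:
                dev = abs(pair_counts[(a, b)] - p[a][b] * x_counts[a]) / n
                if dev > eps * (1 + 1 / len(alphabet_y)):
                    return False
            elif pair_counts[(a, b)] > 0:  # p(b|a) = 0 but (a,b) occurs
                return False
    return True
```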
B. Typical Codebooks
Definition 2.4: For the discrete memoryless channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, the channel noise is said to be $\epsilon$-typical if, for any given input $x^n$, the output $Y^n$ is $\epsilon$-strongly conditionally typical with $x^n$ with respect to the channel transition function $p(y|x)$, i.e., $Y^n \in A_{\epsilon}^{(n)}(Y|x^n)$.

Due to the Law of Large Numbers, the channel noise is "typical" with high probability.

Index the sequences in $A_{\epsilon,0}^{(n)}(Y)$ as $y_{\epsilon,0}^n(i)$, $i = 1, \ldots, M_{\epsilon,0}^{(n)}$, where $M_{\epsilon,0}^{(n)} = |A_{\epsilon,0}^{(n)}(Y)|$. Consider the set $F_{\epsilon,0}(i) \subseteq \mathcal{X}^n$, where each sequence in $F_{\epsilon,0}(i)$ is strongly typical and can reach $y_{\epsilon,0}^n(i)$ over a channel with typical noise, i.e.,
$$F_{\epsilon,0}(i) := \left\{ x^n \in A_{\epsilon,0}^{(n)}(X) : y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|x^n) \right\}.$$
For a codebook $\mathcal{C} = \{x^n(w)\}$, let
$$N_{\epsilon,0}(i|\mathcal{C}) := \sum_{w=1}^{2^{nR}} I(x^n(w) \in F_{\epsilon,0}(i))$$
denote the number of codewords in $F_{\epsilon,0}(i)$, and let
$$P_{\epsilon,0}(i) := \Pr\left( y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|\tilde{X}^n) \,\middle|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X) \right),$$
where $\tilde{X}^n$ is drawn i.i.d. according to $p_0(x)$ and $I(A)$ is the indicator function: $I(A) = 1$ if $A$ holds, and $I(A) = 0$ otherwise.
Definition 2.5: A codebook $\mathcal{C} = \{x^n(w) \in \mathcal{X}^n,\ w = 1, \ldots, 2^{nR}\}$ is said to be $\epsilon$-typical if:
1) $x^n(w) \in A_{\epsilon,0}^{(n)}(X)$ for all $w \in \{1, \ldots, 2^{nR}\}$;
2) $\displaystyle \sup_{i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}} \left| \frac{N_{\epsilon,0}(i|\mathcal{C})}{2^{nR}} - P_{\epsilon,0}(i) \right| \le \frac{n^3 R}{2^{nR}}$.
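Reusing the `cond_typical` helper sketched above, condition 2) can be examined empirically for small $n$ by comparing the fraction of codewords that can reach a given typical output with a Monte Carlo estimate of $P_{\epsilon,0}(i)$ (again our own sketch, not from the paper):

```python
def count_reaching(codebook, y, p, alphabet_x, alphabet_y, eps):
    """N_{eps,0}(i|C): number of codewords x^n(w) with y^n in A_eps(Y|x^n(w))."""
    return sum(cond_typical(y, list(x), p, alphabet_x, alphabet_y, eps)
               for x in codebook)

# Condition 2) asks that count_reaching(...) / len(codebook) stay uniformly
# within n^3 R / 2^{nR} of P_{eps,0}(i) over all typical outputs y^n.
```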
III. MAIN RESULTS

Theorem 3.1: Given that an $\epsilon$-typical codebook $\mathcal{C}$ is used and the channel noise is also $\epsilon$-typical, then
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, \mathcal{C}) \doteq 2^{-nH_0(Y)}, \quad \forall i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\},$$
when $R > I_0(X;Y)$, where $\doteq$ denotes equality to the first order in the exponent, and both $H_0(Y)$ and $I_0(X;Y)$ are calculated according to $p_0(x,y) = p_0(x)p(y|x)$.
Throughout this paper, we generate the codebook $\mathcal{C}$ at random according to $p_0(x)$ and reserve only the $\epsilon$-strongly typical codewords. Then we have Theorems 3.2 and 3.3.

Theorem 3.2: For any $\epsilon > 0$, the randomly generated codebook $\mathcal{C}$ is $\epsilon$-typical with probability approaching 1 as $n \to \infty$.

Theorem 3.3: When $R > I_0(X;Y)$,
$$\lim_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}) = H_0(Y),$$
where $H_0(Y)$, $I_0(X;Y)$ and $H_0(Y|X)$ are all calculated according to $p_0(x,y) = p_0(x)p(y|x)$. In contrast, without the codebook information, we have
$$\lim_{n\to\infty} \frac{1}{n} H(Y^n) = H_0(Y) \quad \text{for any } R > 0.$$
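As a concrete numerical illustration (our own example, not from the paper): for a binary symmetric channel with crossover probability $0.1$ and uniform $p_0(x)$, we have $H_0(Y) = 1$ bit and $I_0(X;Y) = 1 - h(0.1) \approx 0.531$ bits, where $h(\cdot)$ is the binary entropy function. Hence, for any codebook rate $R > 0.531$, the output entropy rate is $1$ bit per symbol whether or not the codebook is known, so knowledge of $\mathcal{C}$ offers no compression advantage in this regime.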
IV. PROOF OF THEOREM 3.1

Lemma 4.1: Let $E_\epsilon$ denote the event that the channel noise is $\epsilon$-typical. For any $x^n \in F_{\epsilon,0}(i)$,
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon, X^n = x^n) \ge 2^{-n(H_0(Y|X)+\epsilon_0)}$$
and
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon, X^n = x^n) \le 2^{-n(H_0(Y|X)-\epsilon_0)},$$
where $\epsilon_0 \to 0$ as $\epsilon \to 0$ and $n \to \infty$.
Proof: By the definition of $F_{\epsilon,0}(i)$, for any $x^n \in F_{\epsilon,0}(i)$ we have $x^n \in A_{\epsilon,0}^{(n)}(X)$ and $y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|x^n)$. Then, it follows from the definition of strong typicality that $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y)$, where $\epsilon' \to 0$ as $\epsilon \to 0$. Since strong typicality implies weak typicality, for any $x^n \in F_{\epsilon,0}(i)$ we have
$$\left| -\frac{1}{n} \log p(x^n) - H_0(X) \right| < \epsilon'',$$
$$\left| -\frac{1}{n} \log p(x^n, y_{\epsilon,0}^n(i)) - H_0(X,Y) \right| < \epsilon'',$$
where $\epsilon'' \to 0$ as $\epsilon \to 0$; consequently, $\left| -\frac{1}{n} \log p(y_{\epsilon,0}^n(i)|x^n) - H_0(Y|X) \right| < 2\epsilon''$. Hence,
$$\begin{aligned}
\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon, X^n = x^n)
&= \frac{\Pr(Y^n = y_{\epsilon,0}^n(i),\, E_\epsilon,\, X^n = x^n)}{\Pr(E_\epsilon,\, X^n = x^n)} \\
&= \frac{\Pr(Y^n = y_{\epsilon,0}^n(i),\, X^n = x^n)}{\Pr(E_\epsilon \,|\, X^n = x^n)\, \Pr(X^n = x^n)} \\
&= \frac{p(y_{\epsilon,0}^n(i)\,|\,x^n)}{\Pr(E_\epsilon \,|\, X^n = x^n)} \\
&= (1 + o(1))\, p(y_{\epsilon,0}^n(i)\,|\,x^n) \\
&\le (1 + o(1))\, 2^{-n(H_0(Y|X) - 2\epsilon'')} \\
&= 2^{-n(H_0(Y|X) - \epsilon_0)},
\end{aligned}$$
where $\epsilon_0 := 2\epsilon'' + \frac{\log(1+o(1))}{n}$, and $\epsilon_0 \to 0$ as $\epsilon \to 0$ and $n \to \infty$. Here the second equality holds because, for $x^n \in F_{\epsilon,0}(i)$, the event $\{Y^n = y_{\epsilon,0}^n(i),\ X^n = x^n\}$ implies $E_\epsilon$. Similarly, for any $x^n \in F_{\epsilon,0}(i)$, we have
$$\begin{aligned}
\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon, X^n = x^n)
&= (1 + o(1))\, p(y_{\epsilon,0}^n(i)\,|\,x^n) \\
&\ge 2^{-n\left(H_0(Y|X) + 2\epsilon'' - \frac{\log(1+o(1))}{n}\right)} \\
&\ge 2^{-n(H_0(Y|X) + \epsilon_0)},
\end{aligned}$$
which completes the proof of Lemma 4.1.
Lemma 4.2: If $\mathcal{C}$ is an $\epsilon$-typical codebook, then for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$, $2^{nR} \cdot 2^{-n(I_0(X;Y)+\epsilon_0')} - n^3 R \le N_{\epsilon,0}(i|\mathcal{C}) \le 2^{nR} \cdot 2^{-n(I_0(X;Y)-\epsilon_0')} + n^3 R$, where $\epsilon_0' \to 0$ as $\epsilon \to 0$ and $n \to \infty$.

Proof: By the standard joint typicality bounds, for any $y^n \in A_{\epsilon,0}^{(n)}(Y)$,
$$\Pr\big((\tilde{X}^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)\big) \ge 2^{-n(I_0(X;Y)+\epsilon_1)} \qquad (5)$$
and
$$\Pr\big((\tilde{X}^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)\big) \le 2^{-n(I_0(X;Y)-\epsilon_1)}, \qquad (6)$$
where $\epsilon_1 \to 0$ as $\epsilon \to 0$. In particular, combining (6) with the chain of equalities below yields
$$P_{\epsilon,0}(i) \le 2^{-n(I_0(X;Y)-\epsilon_2')}, \qquad (7)$$
where $\epsilon_2' := \epsilon_1' + \frac{\log(1+o(1))}{n}$ and $\epsilon_2' \to 0$ as $\epsilon \to 0$ and $n \to \infty$.

Furthermore, by the standard definitions of strong typicality, it follows that $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$ implies $x^n \in A_{\epsilon,0}^{(n)}(X)$. Now, we show $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$ also implies $y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|x^n)$. Suppose $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$. Then, we have:
1) For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) = 0$: $p_0(a,b) = 0$ and $N(a,b|x^n, y_{\epsilon,0}^n(i)) = 0$.
2) For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) > 0$ and $p_0(a) = 0$: $p_0(a,b) = 0$ and $N(a,b|x^n, y_{\epsilon,0}^n(i)) = 0$, as well as $N(a|x^n) = 0$.
3) For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) > 0$ and $p_0(a) > 0$: $p_0(a,b) > 0$ and
$$\left| \frac{1}{n} N(a|x^n) - p_0(a) \right| < \frac{\epsilon}{|\mathcal{X}|}, \qquad \left| \frac{1}{n} N(a,b|x^n, y_{\epsilon,0}^n(i)) - p_0(a,b) \right| < \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|}.$$
Thus,
$$\begin{aligned}
\left| \frac{1}{n} N(a,b|x^n, y_{\epsilon,0}^n(i)) - \frac{1}{n} N(a|x^n)\, p(b|a) \right|
&< p_0(a,b) + \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|} - p(b|a)\left(p_0(a) - \frac{\epsilon}{|\mathcal{X}|}\right) \\
&= \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|} + \frac{\epsilon}{|\mathcal{X}|}\, p(b|a) \\
&\le \frac{\epsilon}{|\mathcal{Y}|} + \epsilon \\
&= \epsilon\left(1 + \frac{1}{|\mathcal{Y}|}\right).
\end{aligned}$$
Therefore, $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$ implies that $y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|x^n)$, as well as $x^n \in A_{\epsilon,0}^{(n)}(X)$. Then, we have
$$\begin{aligned}
P_{\epsilon,0}(i) &= \Pr\big(y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|\tilde{X}^n) \,\big|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\big) \\
&\ge \Pr\big((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y) \,\big|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\big) \\
&= \frac{\Pr\big((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y),\ \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\big)}{\Pr\big(\tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\big)} \\
&= (1 + o(1))\, \Pr\big((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)\big) \\
&\ge (1 + o(1))\, 2^{-n(I_0(X;Y)+\epsilon_1)} \\
&= 2^{-n\left(I_0(X;Y)+\epsilon_1 - \frac{\log(1+o(1))}{n}\right)}.
\end{aligned}$$
Therefore, if $\mathcal{C}$ is a typical codebook, by the definition of the typical codebooks and (9), for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,
$$N_{\epsilon,0}(i|\mathcal{C}) \ge 2^{nR} \cdot 2^{-n(I_0(X;Y)+\epsilon_0')} - n^3 R$$
and
$$N_{\epsilon,0}(i|\mathcal{C}) \le 2^{nR} \cdot 2^{-n(I_0(X;Y)-\epsilon_0')} + n^3 R,$$
where $I_0(X;Y)$ is calculated according to $p_0(x)p(y|x)$ and $\epsilon_0'$ goes to 0 as $\epsilon \to 0$ and $n \to \infty$.
Proof: [Proof of Theorem 3.1] Let $E_\epsilon$ denote the event $Y^n \in A_{\epsilon}^{(n)}(Y|x^n)$ for any given input $x^n$. Consider $\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon,\ \mathcal{C}\ \text{is typical})$ for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$. We lower bound this probability as follows:
$$\begin{aligned}
&\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon,\ \mathcal{C}\ \text{is typical}) \\
&= \sum_{w=1}^{2^{nR}} \Pr(X^n = x^n(w) \,|\, E_\epsilon,\ \mathcal{C}\ \text{is typical})\, \Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon,\ \mathcal{C}\ \text{is typical},\ X^n = x^n(w)) \qquad (10) \\
&= \frac{1}{2^{nR}} \sum_{w=1}^{2^{nR}} \Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon,\ X^n = x^n(w)) \qquad (11) \\
&= \frac{1}{2^{nR}} \sum_{w:\ x^n(w) \in F_{\epsilon,0}(i)} \Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon,\ X^n = x^n(w)) \qquad (12) \\
&\ge \frac{1}{2^{nR}}\, N_{\epsilon,0}(i|\mathcal{C}) \cdot 2^{-n(H_0(Y|X)+\epsilon_0)} \qquad (13) \\
&\ge \frac{1}{2^{nR}} \left( 2^{nR} \cdot 2^{-n(I_0(X;Y)+\epsilon_0')} - n^3 R \right) \cdot 2^{-n(H_0(Y|X)+\epsilon_0)} \qquad (14) \\
&= 2^{-n(H_0(Y)+\epsilon_0+\epsilon_0')} \cdot \left( 1 - \frac{n^3 R}{2^{nR}} \cdot 2^{n(I_0(X;Y)+\epsilon_0')} \right).
\end{aligned}$$
(10) follows from the Law of Total Probability and accumulates the contributions from all the codewords in the codebook to the probability for $y_{\epsilon,0}^n(i)$ to be the channel output.
(11) follows from the uniform distribution of the message index $W$.
(12) follows from the condition $E_\epsilon$ and the fact that $\mathcal{C}$ contains only strongly typical codewords.
(13) follows from Lemma 4.1.
(14) follows from Lemma 4.2.
Let $\epsilon \to 0$ as $n \to \infty$. Then for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon,\ \mathcal{C}\ \text{is typical}) \ \dot{\ge}\ 2^{-nH_0(Y)}, \qquad (15)$$
when $R > I_0(X;Y)$. Similarly, we upper bound the probability:
$$\begin{aligned}
&\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon,\ \mathcal{C}\ \text{is typical}) \\
&\le \frac{1}{2^{nR}}\, N_{\epsilon,0}(i|\mathcal{C}) \cdot 2^{-n(H_0(Y|X)-\epsilon_0)} \\
&\le \frac{1}{2^{nR}} \left( 2^{nR} \cdot 2^{-n(I_0(X;Y)-\epsilon_0')} + n^3 R \right) \cdot 2^{-n(H_0(Y|X)-\epsilon_0)} \\
&= 2^{-n(H_0(Y)-\epsilon_0-\epsilon_0')} \cdot \left( 1 + \frac{n^3 R}{2^{nR}} \cdot 2^{n(I_0(X;Y)-\epsilon_0')} \right).
\end{aligned}$$
Therefore, for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \,|\, E_\epsilon,\ \mathcal{C}\ \text{is typical}) \ \dot{\le}\ 2^{-nH_0(Y)}, \qquad (16)$$
when $R > I_0(X;Y)$. Combining (15) and (16), we establish Theorem 3.1.
V. PROOF OF THEOREM 3.2

The proof relies on the Vapnik-Chervonenkis Theorem [5], [6]: for a class $\mathcal{F}$ of subsets with finite VC dimension $\text{VC-d}(\mathcal{F})$, the empirical frequencies of all events in $\mathcal{F}$, based on i.i.d. samples, are simultaneously within $\epsilon$ of their true probabilities with probability at least $1 - \delta$, whenever
$$n > \max\left( \frac{8\,\text{VC-d}(\mathcal{F})}{\epsilon} \log_2 \frac{16e}{\epsilon},\ \frac{4}{\epsilon} \log_2 \frac{2}{\delta} \right). \qquad (18)$$
Let $\mathcal{F}_{\epsilon,0} = \{F_{\epsilon,0}(i),\ i = 1, \ldots, M_{\epsilon,0}^{(n)}\}$. To show Theorem 3.2, a finite VC dimension of $\mathcal{F}_{\epsilon,0}$ is desired in order to employ the Vapnik-Chervonenkis Theorem. For this reason, we introduce Lemma 5.1.
VI. PROOF OF THEOREM 3.3

Define the indicator variable
$$E := I(E_\epsilon),$$
where $E_\epsilon$ denotes the event $Y^n \in A_{\epsilon}^{(n)}(Y|x^n)$ for any given input $x^n$.
$$\cdots = H_0(Y). \qquad (25)$$
Furthermore,
$$\limsup_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}) \le \limsup_{n\to\infty} \frac{1}{n} H(Y^n) = H_0(Y), \qquad (26)$$
and thus, combining (26) with the matching lower bound obtained through (27)-(29),
$$\lim_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}) = H_0(Y),$$
where (27) follows from Lemmas 6.2 and 6.3, (28) follows from the fact that $\mathcal{C} \to X^n \to Y^n$ forms a Markov chain, and (29) follows from Lemma 6.1. This completes the proof of Theorem 3.3.
Proof: [Proof of Lemma 6.2] To prove Lemma 6.2, we begin with Fano's Inequality (see Theorem 2.11.1 in [4]): Let $P_e = \Pr(g(Y) \ne X)$, where $g$ is any function of $Y$. Then
$$H(X|Y) \le 1 + P_e \log |\mathcal{X}|.$$
For the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$ with a codebook $\mathcal{C}$, we estimate the message index $W$ from $Y^n$. Let the estimate be $\hat{W} = g(Y^n)$ and $P_e^{(n)}(\mathcal{C}) = \Pr(W \ne g(Y^n) \,|\, \mathcal{C})$. Then, applying Fano's Inequality, we have
$$H(W|Y^n, \mathcal{C}) \le 1 + P_e^{(n)}(\mathcal{C})\, nR.$$
Then,
$$H(X^n|Y^n, \mathcal{C}) = \sum_{\mathcal{C}} p(\mathcal{C})\, H(W|Y^n, \mathcal{C}) \le \sum_{\mathcal{C}} p(\mathcal{C})\left(1 + P_e^{(n)}(\mathcal{C})\, nR\right).$$
Recall the channel coding theorem, which states that if we randomly generate the codebook according to $p_0(x)$, then when $R < I_0(X;Y)$,
$$\sum_{\mathcal{C}} p(\mathcal{C})\, P_e^{(n)}(\mathcal{C}) \to 0. \qquad (31)$$
$$\cdots = 0.$$
Therefore, to show Lemma 6.3, it suffices to show that $\lim_{n\to\infty} \frac{1}{n} H(X^n|\mathcal{C}) \ge R$. For this purpose, we first define a class of codebooks, the regular codebooks, and focus on characterizing $H(X^n|\mathcal{C})$ for a regular codebook $\mathcal{C}$. Then, we show that a regular codebook appears with high probability when we randomly generate the codebook, and conclude that $\lim_{n\to\infty} \frac{1}{n} H(X^n|\mathcal{C}) \ge R$.
Define
$$N(x^n|\mathcal{C}) = \sum_{w=1}^{2^{nR}} I(x^n(w) = x^n),$$
and let $p(x^n) = \Pr(\tilde{X}^n = x^n \,|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X))$, where $\tilde{X}^n$ is drawn i.i.d. according to $p_0(x)$.
Given a regular $\mathcal{C}$, for any $x^n \in A_{\epsilon,0}^{(n)}(X)$, we have
where the $\epsilon'$ in (33) goes to 0 as $\epsilon \to 0$, and (34) follows from the general assumption that $R < H_0(X)$. Noting that the message index $W$ is uniformly distributed, we have, for a given $\mathcal{C}$ and any $x^n \in A_{\epsilon,0}^{(n)}(X)$,
$$\begin{aligned}
p(x^n|\mathcal{C}) &= \frac{\sum_{w=1}^{2^{nR}} I(x^n(w) = x^n)}{2^{nR}} \\
&= \frac{N(x^n|\mathcal{C})}{2^{nR}} \\
&\le \frac{n^3 R + o(1)}{2^{nR}} \\
&=: 2^{-n(R-\epsilon'')},
\end{aligned}$$
where $\epsilon'' \to 0$ as $n \to \infty$.
Below, we use the Vapnik-Chervonenkis Theorem to show that a regular codebook appears with high probability.

Let $\mathcal{B} = \{\{x^n\} : x^n \in A_{\epsilon,0}^{(n)}(X)\}$. Since $|\mathcal{B}| = |A_{\epsilon,0}^{(n)}(X)| \le 2^{n(H_0(X)+\epsilon)}$, for any $A \subseteq \mathcal{X}^n$,
$$|\{\{x^n\} \cap A : x^n \in A_{\epsilon,0}^{(n)}(X)\}| \le 2^{n(H_0(X)+\epsilon)}.$$
Applying the Vapnik-Chervonenkis Theorem to $\mathcal{B}$, the probability that the empirical frequencies $N(x^n|\mathcal{C})/2^{nR}$ stay uniformly close to $p(x^n)$ satisfies
$$\cdots \ge 1 - \frac{\Delta_\epsilon\, nR}{2^{nR}} \to 1 \quad \text{as } n \to \infty. \qquad (35)$$
Since $\frac{n^3 R}{2^{nR}} \ge \frac{\Delta_\epsilon\, nR}{2^{nR}}$ for sufficiently large $n$, (35) concludes that $\Pr(\mathcal{C}\ \text{is regular}) \to 1$ as $n \to \infty$.
Therefore,
$$\begin{aligned}
H(X^n|\mathcal{C}) &= \sum_{\mathcal{C}} p(\mathcal{C})\, H(X^n|\mathcal{C}) \\
&\ge \sum_{\mathcal{C}\ \text{regular}} p(\mathcal{C})\, H(X^n|\mathcal{C}) \\
&\ge [n(R - \epsilon'')] \sum_{\mathcal{C}\ \text{regular}} p(\mathcal{C}),
\end{aligned}$$
and
$$\lim_{n\to\infty} \frac{1}{n} H(X^n|\mathcal{C}) \ge \lim_{n\to\infty} \frac{1}{n} [n(R - \epsilon'')](1 - o(1)) = R. \qquad (36)$$
VII. THE OPTIMALITY OF THE COMPRESS-AND-FORWARD SCHEMES

[Figure 1: (a) compressing the channel output $Y^n$ at rate $R_1$ without the source's codebook information; (b) compressing $Y^n$ at rate $R_2$ with the codebook information $\mathcal{C}$ available to both the encoder and the decoder.]

Theorem 7.1: For the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$ with a randomly generated codebook of rate $R$:
1) $Y^n$ can be encoded at rate $R_1$ and recovered with arbitrarily low probability of error if $R_1 > H_0(Y)$.
2) Given that the source's codebook information $\mathcal{C}$ is available to both the encoder and decoder and $Y^n$ is encoded at rate $R_2$, the decoding probability of error will be bounded away from zero if $R_2 < H_0(Y)$, which implies that we cannot compress the channel output better even if the source's codebook information is employed.
To show Theorem 7.1, we need the following lemma.
Lemma 7.1: For the compression problem in Figure 1-(b), we can encode $Y^n$ at rate $R_2$ and recover it with probability of error $P_e^{(n)} \to 0$ only if
$$R_2 \ge \lim_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}). \qquad (37)$$
Proof: [Proof of Lemma 7.1] The source code for Figure 1-(b) consists of an encoder mapping $f(Y^n, \mathcal{C})$ and a decoder mapping $g(f(Y^n, \mathcal{C}), \mathcal{C})$. Let $I = f(Y^n, \mathcal{C})$; then $P_e^{(n)} = \Pr(g(I, \mathcal{C}) \ne Y^n)$. By Fano's Inequality, for any source code with $P_e^{(n)} \to 0$, we have
$$H(Y^n|I, \mathcal{C}) \le n\epsilon_n, \qquad (38)$$
where $\epsilon_n \to 0$ as $n \to \infty$.
Therefore, for any source code with rate $R_2$ and $P_e^{(n)} \to 0$, we have the following chain of inequalities:
$$\begin{aligned}
nR_2 &\ge H(I) \qquad (39) \\
&\ge H(I|\mathcal{C}) \\
&= H(Y^n|\mathcal{C}) - H(Y^n|I, \mathcal{C}) \qquad (40) \\
&\ge H(Y^n|\mathcal{C}) - n\epsilon_n, \qquad (41)
\end{aligned}$$
where (39) follows from the fact that $I \in \{1, 2, \ldots, 2^{nR_2}\}$, (40) follows from the fact that $I$ is a function of $Y^n$ and $\mathcal{C}$, and (41) follows from (38). Dividing the inequality $nR_2 \ge H(Y^n|\mathcal{C}) - n\epsilon_n$ by $n$ and taking the limit as $n \to \infty$, we establish Lemma 7.1.
Proof: [Proof of Theorem 7.1]
Proof of Part 1): To show Part 1), we only need to show that the sequence $Y^n$ satisfies the asymptotic equipartition property, i.e., $\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y)) \to 1$ as $n \to \infty$. Then, following the classical approach to proving the source coding theorem, we can conclude that the rate $R_1 > H_0(Y)$ is achievable. By Lemma 6.1, $\Pr((X^n, Y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) \to 1$ as $n \to \infty$. Thus, the sequence $Y^n$ satisfies the asymptotic equipartition property and the rate $R_1 > H_0(Y)$ is achievable.
Proof of Part 2): By Lemma 7.1, given that the codebook information $\mathcal{C}$ is available to both the encoder and decoder and $Y^n$ is encoded at rate $R_2$, $P_e^{(n)} \to 0$ only if $R_2 \ge \lim_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C})$. By Theorem 3.3, $\lim_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}) = H_0(Y)$ when $R > I_0(X;Y)$. Therefore, when $R > I_0(X;Y)$, $P_e^{(n)} \to 0$ only if $R_2 \ge H_0(Y)$, which establishes Part 2).
APPENDIX I
PROOF OF LEMMA 6.1
Proof of Part 1): Let $\tilde{X}^n$ be drawn i.i.d. according to $p_0(x)$ and $\tilde{Y}^n$ be generated from $\tilde{X}^n$ through the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$. Then, we have
$$\begin{aligned}
&\Pr((X^n, Y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) \\
&= \sum_{(x^n,y^n) \in A_{\epsilon,0}^{(n)}(X,Y)} p(x^n)\, p(y^n|x^n) \\
&= \sum_{(x^n,y^n) \in A_{\epsilon,0}^{(n)}(X,Y)} \Pr(\tilde{X}^n = x^n \,|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \cdot \Pr(Y^n = y^n \,|\, X^n = x^n) \\
&= \sum_{(x^n,y^n) \in A_{\epsilon,0}^{(n)}(X,Y)} \Pr(\tilde{X}^n = x^n \,|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \cdot \Pr(\tilde{Y}^n = y^n \,|\, \tilde{X}^n = x^n) \\
&= \sum_{(x^n,y^n) \in A_{\epsilon,0}^{(n)}(X,Y)} \Pr((\tilde{X}^n, \tilde{Y}^n) = (x^n, y^n) \,|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \\
&= \Pr((\tilde{X}^n, \tilde{Y}^n) \in A_{\epsilon,0}^{(n)}(X,Y) \,|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \\
&= \frac{\Pr((\tilde{X}^n, \tilde{Y}^n) \in A_{\epsilon,0}^{(n)}(X,Y))}{\Pr(\tilde{X}^n \in A_{\epsilon,0}^{(n)}(X))} \\
&\to 1, \quad \text{as } n \to \infty.
\end{aligned}$$
Proof of Part 2): Denote the $\epsilon$-weakly typical sets with respect to $p_0(x)$, $p_0(y)$ and $p_0(x,y)$ by $W_{\epsilon,0}^{(n)}(X)$, $W_{\epsilon,0}^{(n)}(Y)$ and $W_{\epsilon,0}^{(n)}(X,Y)$, respectively. Along the same lines as in the proof of Part 1), we can prove that $\Pr((X^n, Y^n) \in W_{\epsilon,0}^{(n)}(X,Y)) \to 1$ as $n \to \infty$, and hence $\Pr(X^n \in W_{\epsilon,0}^{(n)}(X))$ and $\Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y))$ both go to 1 as $n \to \infty$.
Write
$$H(Y^n) = \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)} + \sum_{y^n \notin W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)} =: \phi_1 + \phi_2.$$
For $\phi_1$, we have
$$\begin{aligned}
\phi_1 &= \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)} \\
&\le \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log 2^{n(H_0(Y)+\epsilon)} \\
&= n(H_0(Y) + \epsilon)\, \Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y)) \\
&= n(H_0(Y) + \epsilon)(1 - o(1)),
\end{aligned}$$
where the inequality follows from the fact that $p(y^n) \ge 2^{-n(H_0(Y)+\epsilon)}$ for any $y^n \in W_{\epsilon,0}^{(n)}(Y)$.
For $\phi_2$, we have
$$\begin{aligned}
\phi_2 &= \sum_{y^n \notin W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)} \\
&= -\sum_{y^n \in W_{\epsilon,0}^{(n)c}(Y)} p(y^n) \log p(y^n) \\
&\le -\sum_{y^n \in W_{\epsilon,0}^{(n)c}(Y)} p(y^n) \log \frac{\sum_{y^n \in W_{\epsilon,0}^{(n)c}(Y)} p(y^n)}{|W_{\epsilon,0}^{(n)c}(Y)|} \qquad (42) \\
&= -\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log \frac{\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y))}{|W_{\epsilon,0}^{(n)c}(Y)|} \\
&= -\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log |W_{\epsilon,0}^{(n)c}(Y)| \\
&= o(1) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log |W_{\epsilon,0}^{(n)c}(Y)| \qquad (43) \\
&\le o(1) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log |\mathcal{Y}|^n \\
&= o(1) + n \cdot \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log |\mathcal{Y}| \\
&= n \cdot o(1). \qquad (44)
\end{aligned}$$
(42) follows from the log sum inequality (see Theorem 2.7.1 in [4]), which states that for non-negative numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$,
$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left( \sum_{i=1}^{n} a_i \right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i},$$
with equality if and only if $\frac{a_i}{b_i}$ is equal for all $i$.
(43) and (44) both follow from the fact that $\Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y)) \to 1$ as $n \to \infty$.
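The log sum inequality is easy to verify numerically; a quick sanity-check sketch (our own example, with arbitrary positive vectors):

```python
import numpy as np

# Check: sum_i a_i log(a_i/b_i) >= (sum_i a_i) log(sum_i a_i / sum_i b_i).
rng = np.random.default_rng(1)
a, b = rng.random(5), rng.random(5)       # arbitrary positive numbers
lhs = np.sum(a * np.log(a / b))
rhs = np.sum(a) * np.log(np.sum(a) / np.sum(b))
print(lhs >= rhs)   # True for any non-negative a, b (Jensen's inequality)
```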
Therefore,
$$H(Y^n) = \phi_1 + \phi_2 \le n(H_0(Y) + \epsilon)(1 - o(1)) + n \cdot o(1) = n(H_0(Y) + \epsilon)(1 - o(1)). \qquad (45)$$
Similarly, we have
$$\begin{aligned}
H(Y^n) &\ge \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)} \\
&\ge \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log 2^{n(H_0(Y)-\epsilon)} \\
&= n(H_0(Y) - \epsilon)\, \Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y)) \\
&= n(H_0(Y) - \epsilon)(1 - o(1)). \qquad (46)
\end{aligned}$$
Since $\epsilon > 0$ is arbitrary, (45) and (46) together give $\lim_{n\to\infty} \frac{1}{n} H(Y^n) = H_0(Y)$, which establishes Part 2).
REFERENCES
[1] X. Wu and L.-L. Xie, “AEP of output when rate is above capacity,” in Proc. of the 11th Canadian Workshop on Information
Theory, Ottawa, Canada, May 13-15, 2009.
[2] T. Cover and A. El Gamal, “Capacity theorems for the relay channel,” IEEE Trans. Inform. Theory, vol. 25, pp. 572–584,
1979.
[3] F. Xue, P. R. Kumar and L.-L. Xie, “The conditional entropy of the jointly typical set when coding rate is larger than
Shannon Capacity,” Manuscript, 2006.
[4] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[5] V. N. Vapnik and A. Chervonenkis, “On the uniform convergence of relative frequencies of events to their probabilities,”
Theory of Probability and its Applications, vol. 16, no. 2, pp. 264–280, Jan. 1971.
[6] V. N. Vapnik, Estimation of dependences based on empirical data. New York: Springer-Verlag, 1982.