Asymptotic Equipartition Property of Output when Rate is above Capacity

Xiugang Wu and Liang-Liang Xie
Department of Electrical and Computer Engineering
University of Waterloo, Waterloo, ON, Canada N2L 3G1
Email: x23wu@uwaterloo.ca, llxie@uwaterloo.ca

arXiv:0908.4445v1 [cs.IT] 31 Aug 2009

Abstract

The output distribution, when the rate is above capacity, is investigated. It is shown that there is an
asymptotic equipartition property (AEP) of the typical output sequences, independent of the specific
codebook used, as long as the codebook is typical according to the standard random codebook generation.
This equipartition of the typical output sequences is caused by the mixing of input sequences when there
are too many of them, namely, when the rate is above capacity. This discovery sheds some light on the
optimal design of compress-and-forward relay schemes.

I. INTRODUCTION
A fundamental observation underlying Shannon's channel coding theorem is that using a randomly
generated codebook (i.i.d. generated according to some $p_0(x)$) at a rate below capacity leads
to a distribution pattern of the output sequences from which a decoding scheme with arbitrarily
low probability of error can be devised.
In this paper, we are interested in the case when the rate is above capacity. We will show
that such a pattern that can be used for decoding disappears when there are too many input
sequences, i.e., when the rate is above capacity. Instead, in this case, the output has an
asymptotic equipartition property on the set of typical output sequences (typical with respect to
$p_0(y) = \sum_x p_0(x)p(y|x)$). Interestingly, this set is independent of the specific codebook used, as
long as the codebook is typical according to the random codebook generation. The reason for
this equipartition is that the input sequences are too dense, so that different input sequences can
contribute to the same output sequence and get mixed up.

Part of the work [1] was presented at CWIT 2009.



This study of the output distribution when the rate is above capacity is motivated by the investigation
of the optimal compress-and-forward relay scheme. The optimality of compress-and-forward schemes
is arguably one of the most critical problems in the development of network information theory,
where ambiguity always arises when decoding cannot be done correctly. In the classical approach
of [2], the compression scheme at the relay was based only on the distribution used for generating
the codebook at the source, instead of the specific codebook generated. Since many different
codebooks can be generated according to the same distribution, can the knowledge of the specific
codebook be helpful? There have been some discussions on this issue (e.g., [3]). Here, in this
paper, we show that the observations at the relay are essentially independent of the specific
codebook used at the source, and depend only on the distribution by which the codebook is
generated.
To further explore the optimality of the compress-and-forward schemes, we compare the rates
needed to losslessly compress the relay’s observation in two different scenarios: i) the relay uses
the knowledge of the source’s codebook to do the compression; ii) the relay simply ignores this
knowledge. It is shown that the minimum required rates in both scenarios are the same when the
rate of the source’s codebook is above the capacity of the source-to-relay link.
The remainder of the paper is organized as follows. In Section II, we first introduce
some standard definitions of strongly typical sequences, and then give a definition of typical
codebooks. We summarize our main results in Section III, followed by the proofs of these
results in Sections IV, V and VI. Finally, as an application of the results, the optimality of
compress-and-forward schemes is discussed in Section VII.

II. PRELIMINARIES

Consider a discrete memoryless channel $(\mathcal{X}, p(y|x), \mathcal{Y})$ with capacity $C := \max_{p(x)} I(X;Y)$.
Under the random coding framework, a random codebook $\mathcal{C}$ with respect to $p_0(x)$ with rate $R$
and block length $n$ is defined as
$$\mathcal{C} := \left\{ X^n(w) \in \mathcal{X}^n, \; w = 1, \ldots, 2^{nR} \right\}, \qquad (1)$$
where each codeword in $\mathcal{C}$ is an i.i.d. random sequence generated according to a fixed input
distribution $p_0(x)$.
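As a concrete illustration of this construction (a minimal Python sketch of our own; the binary alphabet, block length and rate below are arbitrary example values):

```python
import numpy as np

def generate_codebook(p0, n, R, rng):
    """Draw the 2^{nR} codewords X^n(w), each i.i.d. according to p0(x)."""
    num_codewords = int(2 ** (n * R))
    alphabet = np.arange(len(p0))
    # Row w is the codeword X^n(w+1); entries are i.i.d. samples from p0.
    return rng.choice(alphabet, size=(num_codewords, n), p=p0)

rng = np.random.default_rng(0)
p0 = np.array([0.5, 0.5])                        # example input distribution p0(x)
C = generate_codebook(p0, n=16, R=0.5, rng=rng)  # 2^{16*0.5} = 256 codewords
```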
It is well known that information can be transmitted with arbitrarily small probability of error
for sufficiently large n if R < C. In this paper, however, we are interested in the case where the
rate is above capacity.

A. Strong Typicality
We begin with some standard definitions on strong typicality [4, Ch. 13].
Definition 2.1: The $\epsilon$-strongly typical set with respect to $p_0(x)$, denoted by $A_{\epsilon,0}^{(n)}(X)$, is the set
of sequences $x^n \in \mathcal{X}^n$ satisfying:
1. For all $a \in \mathcal{X}$ with $p_0(a) > 0$,
$$\left| \frac{1}{n} N(a|x^n) - p_0(a) \right| < \frac{\epsilon}{|\mathcal{X}|},$$
2. For all $a \in \mathcal{X}$ with $p_0(a) = 0$, $N(a|x^n) = 0$.
Here $N(a|x^n)$ is the number of occurrences of $a$ in $x^n$.
Similarly, we can define the $\epsilon$-strongly typical set with respect to $p_0(y)$ and denote it by $A_{\epsilon,0}^{(n)}(Y)$.
Definition 2.2: The $\epsilon$-strongly typical set with respect to $p_0(x,y)$, denoted by $A_{\epsilon,0}^{(n)}(X,Y)$, is
the set of sequences $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ satisfying:
1. For all $(a,b) \in \mathcal{X}\times\mathcal{Y}$ with $p_0(a,b) > 0$,
$$\left| \frac{1}{n} N(a,b|x^n,y^n) - p_0(a,b) \right| < \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|},$$
2. For all $(a,b) \in \mathcal{X}\times\mathcal{Y}$ with $p_0(a,b) = 0$,
$$N(a,b|x^n,y^n) = 0.$$
$N(a,b|x^n,y^n)$ is the number of occurrences of the pair $(a,b)$ in the pair of sequences $(x^n, y^n)$.
Definition 2.3: The $\epsilon$-strongly conditionally typical set with the sequence $x^n$ with respect to
the conditional distribution $p(y|x)$, denoted by $A_\epsilon^{(n)}(Y|x^n)$, is the set of sequences $y^n \in \mathcal{Y}^n$
satisfying:
1. For all $(a,b) \in \mathcal{X}\times\mathcal{Y}$ with $p(b|a) > 0$,
$$\frac{1}{n}\left| N(a,b|x^n,y^n) - p(b|a)N(a|x^n) \right| \le \epsilon\left(1 + \frac{1}{|\mathcal{Y}|}\right), \qquad (2)$$
2. For all $(a,b) \in \mathcal{X}\times\mathcal{Y}$ with $p(b|a) = 0$,
$$N(a,b|x^n,y^n) = 0. \qquad (3)$$
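To make the counting quantities in Definitions 2.1 and 2.3 concrete, the following Python sketch (our own illustration; the helper names are hypothetical, and sequences are numpy integer arrays) computes $N(a|x^n)$ and $N(a,b|x^n,y^n)$ and tests the corresponding typicality conditions:

```python
import numpy as np

def counts(xn, nx):
    """N(a|x^n): number of occurrences of each symbol a in x^n."""
    return np.bincount(xn, minlength=nx)

def joint_counts(xn, yn, nx, ny):
    """N(a,b|x^n,y^n): occurrences of each pair (a,b) in (x^n, y^n)."""
    nab = np.zeros((nx, ny), dtype=int)
    np.add.at(nab, (xn, yn), 1)
    return nab

def is_strongly_typical(xn, p0, eps):
    """Definition 2.1: |N(a|x^n)/n - p0(a)| < eps/|X| when p0(a) > 0,
    and N(a|x^n) = 0 whenever p0(a) = 0."""
    n, N = len(xn), counts(xn, len(p0))
    return (np.all(np.abs(N[p0 > 0] / n - p0[p0 > 0]) < eps / len(p0))
            and np.all(N[p0 == 0] == 0))

def is_cond_typical(yn, xn, p_y_given_x, eps):
    """Definition 2.3: (1/n)|N(a,b) - p(b|a)N(a)| <= eps(1 + 1/|Y|) when
    p(b|a) > 0, and N(a,b) = 0 whenever p(b|a) = 0."""
    nx, ny = p_y_given_x.shape
    nab = joint_counts(xn, yn, nx, ny)
    dev = np.abs(nab - p_y_given_x * counts(xn, nx)[:, None]) / len(xn)
    return (np.all(dev[p_y_given_x > 0] <= eps * (1 + 1 / ny))
            and np.all(nab[p_y_given_x == 0] == 0))
```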

B. Typical Codebooks
Definition 2.4: For the discrete memoryless channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, the channel noise is said
to be $\epsilon$-typical if for any given input $x^n$, the output $Y^n$ is $\epsilon$-strongly conditionally typical with
$x^n$ with respect to the channel transition function $p(y|x)$, i.e., $Y^n \in A_\epsilon^{(n)}(Y|x^n)$.
By the Law of Large Numbers, the channel noise is "typical" with high probability.
Index the sequences in $A_{\epsilon,0}^{(n)}(Y)$ as $y_{\epsilon,0}^n(i)$, $i = 1, \ldots, M_{\epsilon,0}^{(n)}$, where $M_{\epsilon,0}^{(n)} = |A_{\epsilon,0}^{(n)}(Y)|$. Consider
the set $F_{\epsilon,0}(i) \subseteq \mathcal{X}^n$, where each sequence in $F_{\epsilon,0}(i)$ is strongly typical and can reach $y_{\epsilon,0}^n(i)$
over a channel with typical noise, i.e.,
$$F_{\epsilon,0}(i) := \left\{ x^n \in A_{\epsilon,0}^{(n)}(X) : y_{\epsilon,0}^n(i) \in A_\epsilon^{(n)}(Y|x^n) \right\}.$$

The following notation is useful for defining the typical codebooks:
$$P_{\epsilon,0}(i) := \Pr\left(\tilde{X}^n \in F_{\epsilon,0}(i) \,\middle|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right),$$
$$N_{\epsilon,0}(i|\mathcal{C}) := \sum_{w=1}^{2^{nR}} I(x^n(w) \in F_{\epsilon,0}(i)),$$
where $\tilde{X}^n$ is drawn i.i.d. according to $p_0(x)$ and $I(A)$ is the indicator function:
$$I(A) = \begin{cases} 1 & \text{if } A \text{ holds}, \\ 0 & \text{otherwise}. \end{cases}$$

Definition 2.5: A codebook
$$\mathcal{C} = \left\{ x^n(w) \in \mathcal{X}^n, \; w = 1, \ldots, 2^{nR} \right\}$$
is said to be $\epsilon$-typical with respect to $p_0(x)$ if
1) $x^n(w) \in A_{\epsilon,0}^{(n)}(X)$, $\forall w \in \{1, \ldots, 2^{nR}\}$;
2) $\displaystyle \sup_{i \in \{1,\ldots,M_{\epsilon,0}^{(n)}\}} \left| \frac{N_{\epsilon,0}(i|\mathcal{C})}{2^{nR}} - P_{\epsilon,0}(i) \right| \le \frac{n^3 R}{2^{nR}}$.
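Condition 2) can be probed numerically. The sketch below (our own, reusing `is_strongly_typical` and `is_cond_typical` from the previous listing; $P_{\epsilon,0}(i)$ is assumed to be supplied, e.g. from a separate Monte Carlo estimate) counts the codewords of $\mathcal{C}$ that fall in $F_{\epsilon,0}(i)$ and evaluates the deviation that Definition 2.5 bounds:

```python
def N_eps(codebook, y_i, p_y_given_x, p0, eps):
    """N_{eps,0}(i|C): number of codewords of C lying in F_{eps,0}(i)."""
    return sum(is_strongly_typical(xn, p0, eps)
               and is_cond_typical(y_i, xn, p_y_given_x, eps)
               for xn in codebook)

def deviation(codebook, y_i, P_i, p_y_given_x, p0, eps):
    """|N_{eps,0}(i|C)/2^{nR} - P_{eps,0}(i)|; Definition 2.5 requires the
    supremum of this quantity over i to be at most n^3 R / 2^{nR}."""
    return abs(N_eps(codebook, y_i, p_y_given_x, p0, eps) / len(codebook) - P_i)
```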

III. MAIN RESULTS


The main results of this paper are summarized by the following three theorems. Their proofs
are presented in Sections IV, V and VI respectively. The application of these results to the relay
channel will be discussed in Section VII.
Theorem 3.1: Given that an $\epsilon$-typical codebook $\mathcal{C}$ is used and the channel noise is also $\epsilon$-typical, then
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid \mathcal{C}) \doteq 2^{-nH_0(Y)}, \quad \forall i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\},$$
when $R > I_0(X;Y)$, where both $H_0(Y)$ and $I_0(X;Y)$ are calculated according to $p_0(x,y) = p_0(x)p(y|x)$. (Following the notation in [4], we write $a_n \doteq b_n$ if $\lim_{n\to\infty}\frac{1}{n}\log\frac{a_n}{b_n} = 0$; the dotted versions $\dot\ge$ and $\dot\le$ have similar interpretations.)
Throughout this paper, we generate the codebook $\mathcal{C}$ at random according to $p_0(x)$ and reserve
only the $\epsilon$-strongly typical codewords. Then we have Theorems 3.2 and 3.3.
Theorem 3.2: For any $\epsilon > 0$,
$$\Pr(\mathcal{C} \text{ is } \epsilon\text{-typical}) \to 1 \quad \text{as } n \to \infty. \qquad (4)$$


Theorem 3.3: Consider the conditional entropy of the channel output given the source's codebook
information, namely $H(Y^n|\mathcal{C})$. We have
$$\lim_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}) = \begin{cases} H_0(Y) & \text{when } R > I_0(X;Y), \\ R + H_0(Y|X) & \text{when } R < I_0(X;Y), \end{cases}$$
where $H_0(Y)$, $I_0(X;Y)$ and $H_0(Y|X)$ are all calculated according to $p_0(x,y) = p_0(x)p(y|x)$.
In contrast, without the codebook information, we have
$$\lim_{n\to\infty} \frac{1}{n} H(Y^n) = H_0(Y) \quad \text{for any } R > 0.$$
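The two regimes of Theorem 3.3 can be observed numerically on a small example. The sketch below (entirely our own construction; the binary symmetric channel and all parameter values are arbitrary choices) fixes one random codebook over a BSC and estimates $\frac{1}{n}H(Y^n|\mathcal{C})$ by Monte Carlo. With $p_0$ uniform, $H_0(Y) = 1$, $H_0(Y|X) = H(0.1) \approx 0.469$ and $I_0(X;Y) \approx 0.531$ bits:

```python
import numpy as np

def output_entropy_rate(n, R, p_flip, trials, seed=0):
    """Plug-in Monte Carlo estimate of (1/n) H(Y^n | C) for one fixed random
    codebook C used over a binary symmetric channel BSC(p_flip)."""
    rng = np.random.default_rng(seed)
    codebook = rng.integers(0, 2, size=(int(2 ** (n * R)), n))  # p0 = Bern(1/2)
    freq = {}
    for _ in range(trials):
        xw = codebook[rng.integers(len(codebook))]  # uniform message index W
        yn = xw ^ (rng.random(n) < p_flip)          # BSC noise on each symbol
        key = yn.tobytes()
        freq[key] = freq.get(key, 0) + 1
    p = np.array(list(freq.values())) / trials
    return -np.sum(p * np.log2(p)) / n              # plug-in entropy estimate

# Theorem 3.3 predicts roughly 1.0 for R = 0.75 > 0.531, and roughly
# R + 0.469 for R = 0.25 < 0.531, up to finite-n and estimation error.
print(output_entropy_rate(n=12, R=0.75, p_flip=0.1, trials=200_000))
print(output_entropy_rate(n=12, R=0.25, p_flip=0.1, trials=200_000))
```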

IV. AEP OF TYPICAL OUTPUT SEQUENCES


Essentially, Theorem 3.1 states that there exists an asymptotic equipartition property of the
typical output sequences, irrespective of the specific codebook used, as long as the codebook is
a typical codebook. To prove this theorem, we first introduce two lemmas.
Lemma 4.1: Let $E_\epsilon$ denote the event that the output $Y^n \in A_\epsilon^{(n)}(Y|x^n)$ for any given input $x^n$.
For any $x^n \in F_{\epsilon,0}(i)$,
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, X^n = x^n) \ge 2^{-n(H_0(Y|X)+\epsilon_0)}$$
$$\text{and} \quad \Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, X^n = x^n) \le 2^{-n(H_0(Y|X)-\epsilon_0)},$$
where $H_0(Y|X)$ is calculated according to $p_0(x,y) = p_0(x)p(y|x)$ and $\epsilon_0$ goes to 0 as $\epsilon \to 0$
and $n \to \infty$.
Proof: By the definition of $F_{\epsilon,0}(i)$, for any $x^n$ in $F_{\epsilon,0}(i)$ we have $x^n \in A_{\epsilon,0}^{(n)}(X)$ and
$y_{\epsilon,0}^n(i) \in A_\epsilon^{(n)}(Y|x^n)$. Then, it follows from the definition of strong typicality that $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y)$, where $\epsilon' \to 0$ as $\epsilon \to 0$. Since strong typicality implies weak typicality, for any $x^n$
in $F_{\epsilon,0}(i)$, we have
$$\left| -\frac{1}{n}\log p(x^n) - H_0(X) \right| < \epsilon'',$$
$$\left| -\frac{1}{n}\log p(x^n, y_{\epsilon,0}^n(i)) - H_0(X,Y) \right| < \epsilon'',$$
where $\epsilon'' \to 0$ as $\epsilon \to 0$. Thus,
$$\left| -\frac{1}{n}\log p(y_{\epsilon,0}^n(i)|x^n) - H_0(Y|X) \right| < 2\epsilon'',$$
and
$$2^{-n(H_0(Y|X)+2\epsilon'')} \le p(y_{\epsilon,0}^n(i)|x^n) \le 2^{-n(H_0(Y|X)-2\epsilon'')}.$$

Therefore, for any $x^n \in F_{\epsilon,0}(i)$, we have
$$\begin{aligned}
\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, X^n = x^n)
&= \frac{\Pr(Y^n = y_{\epsilon,0}^n(i), E_\epsilon, X^n = x^n)}{\Pr(E_\epsilon, X^n = x^n)} \\
&= \frac{\Pr(Y^n = y_{\epsilon,0}^n(i), X^n = x^n)}{\Pr(E_\epsilon \mid X^n = x^n)\Pr(X^n = x^n)} \\
&= \frac{p(y_{\epsilon,0}^n(i)|x^n)}{\Pr(E_\epsilon \mid X^n = x^n)} \\
&= (1+o(1))\, p(y_{\epsilon,0}^n(i)|x^n) \\
&\le (1+o(1))\, 2^{-n(H_0(Y|X)-2\epsilon'')} \\
&= 2^{-n(H_0(Y|X)-\epsilon_0)},
\end{aligned}$$
where $\epsilon_0 := 2\epsilon'' + \frac{\log(1+o(1))}{n}$ and $\epsilon_0 \to 0$ as $\epsilon \to 0$ and $n \to \infty$. Similarly, for any $x^n \in F_{\epsilon,0}(i)$,
we have
$$\begin{aligned}
\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, X^n = x^n)
&= (1+o(1))\, p(y_{\epsilon,0}^n(i)|x^n) \\
&\ge 2^{-n\left(H_0(Y|X)+2\epsilon'' - \frac{\log(1+o(1))}{n}\right)} \\
&\ge 2^{-n(H_0(Y|X)+\epsilon_0)},
\end{aligned}$$
which finishes the proof of Lemma 4.1.


Lemma 4.2: If $\mathcal{C}$ is a typical codebook, then for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,
$$N_{\epsilon,0}(i|\mathcal{C}) \ge 2^{nR} \cdot 2^{-n(I_0(X;Y)+\epsilon_0')} - n^3 R$$
$$\text{and} \quad N_{\epsilon,0}(i|\mathcal{C}) \le 2^{nR} \cdot 2^{-n(I_0(X;Y)-\epsilon_0')} + n^3 R,$$
where $I_0(X;Y)$ is calculated according to $p_0(x)p(y|x)$ and $\epsilon_0'$ goes to 0 as $\epsilon \to 0$ and $n \to \infty$.


Proof: To prove Lemma 4.2, we need the following standard result on strong typicality
(see Lemma 13.6.2 in [4]):
Let $\tilde{X}^n$ be drawn i.i.d. according to $p_0(x) = \sum_y p_0(x,y)$. For $y^n \in A_{\epsilon,0}^{(n)}(Y)$,
$$\Pr\left((\tilde{X}^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)\right) \ge 2^{-n(I_0(X;Y)+\epsilon_1)} \qquad (5)$$
$$\text{and} \quad \Pr\left((\tilde{X}^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)\right) \le 2^{-n(I_0(X;Y)-\epsilon_1)}, \qquad (6)$$
where $I_0(X;Y)$ is calculated according to $p_0(x,y)$ and $\epsilon_1$ goes to 0 as $\epsilon \to 0$ and $n \to \infty$.


According to the definition of $P_{\epsilon,0}(i)$,
$$P_{\epsilon,0}(i) = \Pr\left(y_{\epsilon,0}^n(i) \in A_\epsilon^{(n)}(Y|\tilde{X}^n) \,\middle|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right),$$
where $\tilde{X}^n$ is drawn i.i.d. according to $p_0(x)$.


Since $\tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)$ and $y_{\epsilon,0}^n(i) \in A_\epsilon^{(n)}(Y|\tilde{X}^n)$ imply that $(\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y)$, where
$\epsilon'$ goes to 0 as $\epsilon \to 0$, we have
$$\begin{aligned}
P_{\epsilon,0}(i) &\le \Pr\left((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y) \,\middle|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right) \\
&= \frac{\Pr\left((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y), \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right)}{\Pr\left(\tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right)} \\
&\le (1+o(1))\, \Pr\left((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y)\right) \\
&\le (1+o(1))\, 2^{-n(I_0(X;Y)-\epsilon_1')} \\
&= 2^{-n\left(I_0(X;Y)-\epsilon_1'-\frac{\log(1+o(1))}{n}\right)} \\
&= 2^{-n(I_0(X;Y)-\epsilon_2')} \qquad (7)
\end{aligned}$$
where $\epsilon_2' := \epsilon_1' + \frac{\log(1+o(1))}{n}$ and $\epsilon_2' \to 0$ as $\epsilon \to 0$ and $n \to \infty$.
Furthermore, by the standard definitions of strong typicality, it follows that $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$ implies $x^n \in A_{\epsilon,0}^{(n)}(X)$. Now, we show $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$ also implies $y_{\epsilon,0}^n(i) \in A_\epsilon^{(n)}(Y|x^n)$. Suppose $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$. Then, we have
1) For all $(a,b) \in \mathcal{X}\times\mathcal{Y}$ with $p(b|a) = 0$: $p_0(a,b) = 0$ and $N(a,b|x^n, y_{\epsilon,0}^n(i)) = 0$.
2) For all $(a,b) \in \mathcal{X}\times\mathcal{Y}$ with $p(b|a) > 0$ and $p_0(a) = 0$: $p_0(a,b) = 0$ and $N(a,b|x^n, y_{\epsilon,0}^n(i)) = 0$, as well as $N(a|x^n) = 0$.
3) For all $(a,b) \in \mathcal{X}\times\mathcal{Y}$ with $p(b|a) > 0$ and $p_0(a) > 0$: $p_0(a,b) > 0$ and
$$\left|\frac{1}{n}N(a|x^n) - p_0(a)\right| < \frac{\epsilon}{|\mathcal{X}|}, \qquad \left|\frac{1}{n}N(a,b|x^n,y_{\epsilon,0}^n(i)) - p_0(a,b)\right| < \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|}.$$
Thus,
$$\begin{aligned}
\left|\frac{1}{n}N(a,b|x^n,y_{\epsilon,0}^n(i)) - \frac{1}{n}N(a|x^n)\,p(b|a)\right|
&< p_0(a,b) + \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|} - p(b|a)\left(p_0(a) - \frac{\epsilon}{|\mathcal{X}|}\right) \\
&= \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|} + \frac{\epsilon}{|\mathcal{X}|}\,p(b|a) \\
&\le \frac{\epsilon}{|\mathcal{Y}|} + \epsilon \\
&= \epsilon\left(1 + \frac{1}{|\mathcal{Y}|}\right).
\end{aligned}$$
Therefore, $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$ implies that $y_{\epsilon,0}^n(i) \in A_\epsilon^{(n)}(Y|x^n)$, as well as $x^n \in A_{\epsilon,0}^{(n)}(X)$.
Then, we have
$$\begin{aligned}
P_{\epsilon,0}(i) &= \Pr\left(y_{\epsilon,0}^n(i) \in A_\epsilon^{(n)}(Y|\tilde{X}^n) \,\middle|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right) \\
&\ge \Pr\left((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y) \,\middle|\, \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right) \\
&= \frac{\Pr\left((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y), \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right)}{\Pr\left(\tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)\right)} \\
&= (1+o(1))\, \Pr\left((\tilde{X}^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)\right) \\
&\ge (1+o(1))\, 2^{-n(I_0(X;Y)+\epsilon_1)} \\
&= 2^{-n\left(I_0(X;Y)+\epsilon_1 - \frac{\log(1+o(1))}{n}\right)} \\
&= 2^{-n(I_0(X;Y)+\epsilon_2)} \qquad (8)
\end{aligned}$$
where $\epsilon_2 := \epsilon_1 - \frac{\log(1+o(1))}{n}$ and $\epsilon_2 \to 0$ as $\epsilon \to 0$ and $n \to \infty$.
Let $\epsilon_0' = \max\{\epsilon_2, \epsilon_2'\}$. Combining (7) and (8), we have
$$2^{-n(I_0(X;Y)+\epsilon_0')} \le P_{\epsilon,0}(i) \le 2^{-n(I_0(X;Y)-\epsilon_0')}. \qquad (9)$$
Therefore, if $\mathcal{C}$ is a typical codebook, by the definition of the typical codebooks and (9), for
any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,
$$N_{\epsilon,0}(i|\mathcal{C}) \ge 2^{nR} \cdot 2^{-n(I_0(X;Y)+\epsilon_0')} - n^3 R$$
$$\text{and} \quad N_{\epsilon,0}(i|\mathcal{C}) \le 2^{nR} \cdot 2^{-n(I_0(X;Y)-\epsilon_0')} + n^3 R,$$
where $I_0(X;Y)$ is calculated according to $p_0(x)p(y|x)$ and $\epsilon_0'$ goes to 0 as $\epsilon \to 0$ and $n \to \infty$.

Proof: [Proof of Theorem 3.1] Let $E_\epsilon$ denote the event $Y^n \in A_\epsilon^{(n)}(Y|x^n)$ for any given
input $x^n$. Consider $\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical})$ for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$. We lower bound
this probability as follows:
$$\begin{aligned}
&\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}) \\
&= \sum_{w=1}^{2^{nR}} \Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}, X^n = x^n(w)) \cdot \Pr(X^n = x^n(w) \mid E_\epsilon, \mathcal{C} \text{ is typical}) \qquad (10)\\
&= \frac{1}{2^{nR}} \sum_{w=1}^{2^{nR}} \Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}, X^n = x^n(w)) \qquad (11)\\
&= \frac{1}{2^{nR}} \sum_{x^n(w) \in F_{\epsilon,0}(i)} \Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}, X^n = x^n(w)) \qquad (12)\\
&\ge \frac{1}{2^{nR}}\, N_{\epsilon,0}(i|\mathcal{C}) \cdot 2^{-n(H_0(Y|X)+\epsilon_0)} \qquad (13)\\
&\ge \frac{1}{2^{nR}} \left(2^{nR} \cdot 2^{-n(I_0(X;Y)+\epsilon_0')} - n^3 R\right) \cdot 2^{-n(H_0(Y|X)+\epsilon_0)} \qquad (14)\\
&= 2^{-n(H_0(Y)+\epsilon_0+\epsilon_0')} \cdot \left(1 - \frac{n^3 R}{2^{nR}} \cdot 2^{n(I_0(X;Y)+\epsilon_0')}\right).
\end{aligned}$$
(10) follows from the Law of Total Probability and accumulates the contributions from all the
codewords in the codebook to the probability for $y_{\epsilon,0}^n(i)$ to be the channel output.
(11) follows from the uniform distribution of the message index $W$.
(12) follows from the condition $E_\epsilon$ and the fact that $\mathcal{C}$ contains only strongly typical
codewords.
(13) follows from Lemma 4.1.
(14) follows from Lemma 4.2.
Let $\epsilon \to 0$ as $n \to \infty$. Then for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}) \;\dot\ge\; 2^{-nH_0(Y)}, \qquad (15)$$
when $R > I_0(X;Y)$.


Similarly, following (12), by Lemmas 4.1 and 4.2, we have
$$\begin{aligned}
&\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}) \\
&\le \frac{1}{2^{nR}}\, N_{\epsilon,0}(i|\mathcal{C}) \cdot 2^{-n(H_0(Y|X)-\epsilon_0)} \\
&\le \frac{1}{2^{nR}} \left(2^{nR} \cdot 2^{-n(I_0(X;Y)-\epsilon_0')} + n^3 R\right) \cdot 2^{-n(H_0(Y|X)-\epsilon_0)} \\
&= 2^{-n(H_0(Y)-\epsilon_0-\epsilon_0')} \cdot \left(1 + \frac{n^3 R}{2^{nR}} \cdot 2^{n(I_0(X;Y)-\epsilon_0')}\right).
\end{aligned}$$
Therefore, for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,
$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}) \;\dot\le\; 2^{-nH_0(Y)}, \qquad (16)$$
when $R > I_0(X;Y)$. Combining (15) and (16), we establish Theorem 3.1.

V. THE PROBABILITY THAT A TYPICAL CODEBOOK APPEARS


In this section, we will show that with high probability, a typical codebook will be generated
by the random codebook generation. We begin with some relevant definitions and the Vapnik-
Chervonenkis Theorem [5], [6]:
A Range Space is a pair $(X, \mathcal{F})$, where $X$ is a set and $\mathcal{F}$ is a family of subsets of $X$. For
any $A \subseteq X$, we define $P_{\mathcal{F}}(A)$, the projection of $\mathcal{F}$ on $A$, as $\{F \cap A : F \in \mathcal{F}\}$. We say that $A$
is shattered by $\mathcal{F}$ if $P_{\mathcal{F}}(A) = 2^A$, i.e., if the projection of $\mathcal{F}$ on $A$ includes all possible subsets
of $A$. The VC-dimension of $\mathcal{F}$, denoted by VC-d$(\mathcal{F})$, is the cardinality of the largest set $A$ that
$\mathcal{F}$ shatters. If arbitrarily large finite sets are shattered, the VC-dimension of $\mathcal{F}$ is infinite.
The Vapnik-Chervonenkis Theorem: If $\mathcal{F}$ is a set of finite VC-dimension and $\{Y_j\}$ is a sequence
of $n$ i.i.d. random variables with common probability distribution $P$, then for every $\epsilon, \delta > 0$,
$$\Pr\left\{ \sup_{F \in \mathcal{F}} \left| \frac{1}{n}\sum_{j=1}^n I(Y_j \in F) - P(F) \right| \le \epsilon \right\} > 1 - \delta \qquad (17)$$
whenever
$$n > \max\left\{ \frac{8\,\text{VC-d}(\mathcal{F})}{\epsilon} \log_2 \frac{16e}{\epsilon}, \; \frac{4}{\epsilon}\log_2\frac{2}{\delta} \right\}. \qquad (18)$$
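As a toy illustration of the theorem (our own example, unrelated to the channel setting): over $X = \{0,\ldots,9\}$ with the uniform distribution, the nested family of prefix sets $\{0,\ldots,k\}$ has VC-dimension 1, and the uniform deviation in (17) shrinks as $n$ grows:

```python
import numpy as np

def uniform_deviation(samples, P, family):
    """sup_{F in family} |(1/n) sum_j I(Y_j in F) - P(F)|, as in (17)."""
    return max(abs(np.isin(samples, F).mean() - P[F].sum()) for F in family)

rng = np.random.default_rng(0)
P = np.full(10, 0.1)                            # uniform distribution on {0,...,9}
family = [np.arange(k + 1) for k in range(10)]  # nested prefix sets, VC-dim 1
for n in (100, 1_000, 10_000):
    samples = rng.choice(10, size=n, p=P)
    print(n, uniform_deviation(samples, P, family))  # decays roughly like 1/sqrt(n)
```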
Let $\mathcal{F}_{\epsilon,0} = \{F_{\epsilon,0}(i),\, i = 1, \ldots, M_{\epsilon,0}^{(n)}\}$. To show Theorem 3.2, a finite VC-dimension of $\mathcal{F}_{\epsilon,0}$
is desired in order to employ the Vapnik-Chervonenkis Theorem. For this reason, we introduce
Lemma 5.1.

Lemma 5.1: For a fixed block length $n$, VC-d$(\mathcal{F}_{\epsilon,0}) \le n(H_0(Y) + \epsilon')$, where $\epsilon' \to 0$ as $\epsilon \to 0$.

Proof: By the Asymptotic Equipartition Property, $|\mathcal{F}_{\epsilon,0}| = M_{\epsilon,0}^{(n)} \le 2^{n(H_0(Y)+\epsilon')}$, where $\epsilon' \to 0$
as $\epsilon \to 0$. Thus, for any $A \subseteq \mathcal{X}^n$,
$$|\{F_{\epsilon,0}(i) \cap A : F_{\epsilon,0}(i) \in \mathcal{F}_{\epsilon,0}\}| \le 2^{n(H_0(Y)+\epsilon')}.$$
Since shattering a set $A$ requires $2^{|A|}$ distinct projections, no set of cardinality larger than $n(H_0(Y)+\epsilon')$ can be shattered, and hence VC-d$(\mathcal{F}_{\epsilon,0}) \le n(H_0(Y)+\epsilon')$.


Proof: [Proof of Theorem 3.2] Since we reserve only the $\epsilon$-strongly typical codewords
when generating the codebook, for any random codebook, the first condition in Definition 2.5
is obviously satisfied. Below, we focus on showing that a random codebook satisfies the second
condition in Definition 2.5 with high probability.
For the given $p_0(x)$, consider all the codewords in a random codebook, $X^n(w)$, $w = 1, \ldots, 2^{nR}$.
They are generated with the common distribution $p(x^n) = \Pr(\tilde{X}^n = x^n \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X))$, where
$\tilde{X}^n$ is drawn i.i.d. according to $p_0(x)$. Since VC-d$(\mathcal{F}_{\epsilon,0})$ is finite for a fixed $n$, we employ the
Vapnik-Chervonenkis Theorem under the range space $(\mathcal{X}^n, \mathcal{F}_{\epsilon,0})$. To satisfy (18), let both $\epsilon$ and $\delta$
in (17) be $\frac{\Delta_\epsilon n R}{2^{nR}}$, where $\Delta_\epsilon := \max\{8\,\text{VC-d}(\mathcal{F}_{\epsilon,0}), 16e\}$. Then the Vapnik-Chervonenkis Theorem
states that
$$\Pr\left\{ \sup_{F_{\epsilon,0}(i) \in \mathcal{F}_{\epsilon,0}} \left| \frac{N_{\epsilon,0}(i|\mathcal{C})}{2^{nR}} - P_{\epsilon,0}(i) \right| \le \frac{\Delta_\epsilon n R}{2^{nR}} \right\} \ge 1 - \frac{\Delta_\epsilon n R}{2^{nR}} \to 1 \quad \text{as } n \to \infty, \qquad (19)$$
where $N_{\epsilon,0}(i|\mathcal{C}) = \sum_{w=1}^{2^{nR}} I(X^n(w) \in F_{\epsilon,0}(i))$. Since $\frac{n^3 R}{2^{nR}} \ge \frac{\Delta_\epsilon n R}{2^{nR}}$ for sufficiently large $n$, (19)
concludes the proof of Theorem 3.2.

VI. PROOF OF THEOREM 3.3


Before proceeding to the proof of Theorem 3.3, we first introduce Lemma 6.1, which will
facilitate the later discussions. The proof of Lemma 6.1 is given in Appendix I.
Lemma 6.1: For the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, generate the codebook at random according to
$p_0(x)$ and reserve only the $\epsilon$-strongly typical codewords. The channel input and output $X^n$ and
$Y^n$ satisfy:
1) $\Pr((X^n, Y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) \to 1$ as $n \to \infty$, for any $\epsilon > 0$;
2) $\lim_{n\to\infty} \frac{1}{n}H(X^n) = H_0(X)$, $\lim_{n\to\infty} \frac{1}{n}H(Y^n) = H_0(Y)$, and $\lim_{n\to\infty} \frac{1}{n}H(X^n,Y^n) = H_0(X,Y)$.
Remark 6.1: Since we reserve only the $\epsilon$-strongly typical codewords when generating the codebook,
in general, the channel input $X^n$ is no longer an i.i.d. random process. However, Lemma 6.1
essentially states that the random process $(X^n, Y^n)$ still satisfies the joint asymptotic equipartition
property and, furthermore, that the entropy rates of the random processes $X^n$, $Y^n$ and $(X^n, Y^n)$ can
still be expressed in single-letter form. This observation will facilitate our later discussions.
Proof: [Proof of Theorem 3.3] We prove Theorem 3.3 by characterizing $\lim_{n\to\infty}\frac{1}{n}H(Y^n|\mathcal{C})$
in two different cases: when $R > I_0(X;Y)$ and when $R < I_0(X;Y)$, respectively.

A. When $R > I_0(X;Y)$


Define an indicator random variable $E$ as
$$E := I(E_\epsilon),$$
where $E_\epsilon$ denotes the event $Y^n \in A_\epsilon^{(n)}(Y|x^n)$ for any given input $x^n$.
When $R > I_0(X;Y)$, we have


$$\begin{aligned}
H(Y^n|\mathcal{C}) &\ge H(Y^n|E, \mathcal{C}) \qquad (20)\\
&= \Pr(E=1)\,H(Y^n|E=1,\mathcal{C}) + \Pr(E=0)\,H(Y^n|E=0,\mathcal{C}) \\
&\ge \Pr(E=1) \cdot H(Y^n|E=1,\mathcal{C}) \\
&= (1-o(1)) \cdot H(Y^n|E=1,\mathcal{C}) \qquad (21)\\
&= (1-o(1)) \cdot \sum_{C} p(C) \cdot H(Y^n|E=1,\mathcal{C}=C) \\
&\ge (1-o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot H(Y^n|E=1,\mathcal{C}=C) \\
&= (1-o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \left(\sum_{y^n} p(y^n|E_\epsilon,C) \log\frac{1}{p(y^n|E_\epsilon,C)}\right) \\
&\ge (1-o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \left(\sum_{y^n \in A_{\epsilon,0}^{(n)}(Y)} p(y^n|E_\epsilon,C) \log\frac{1}{p(y^n|E_\epsilon,C)}\right) \\
&\ge (1-o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \left(\sum_{y^n \in A_{\epsilon,0}^{(n)}(Y)} p(y^n|E_\epsilon,C) \log 2^{n[H_0(Y)-\epsilon^*]}\right) \qquad (22)\\
&= n[H_0(Y)-\epsilon^*] \cdot (1-o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y) \mid E_\epsilon, C) \\
&= n[H_0(Y)-\epsilon^*] \cdot (1-o(1)) \cdot \sum_{C \text{ is typical}} p(C|E_\epsilon) \cdot \Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y) \mid E_\epsilon, C) \\
&= n[H_0(Y)-\epsilon^*] \cdot (1-o(1)) \cdot \Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), \mathcal{C} \text{ is typical} \mid E_\epsilon) \\
&= n[H_0(Y)-\epsilon^*] \cdot (1-o(1)) \cdot (1-o(1)) \qquad (23)\\
&= n[H_0(Y)-\epsilon^*] \cdot (1-o(1)).
\end{aligned}$$
(20) follows from the fact that conditioning reduces entropy.
(21) follows from the fact that $\Pr(E_\epsilon) \to 1$ as $n \to \infty$, for any $\epsilon > 0$.
(22) follows from Theorem 3.1, which upper bounds $p(y^n|E_\epsilon, C)$ by $2^{-n[H_0(Y)-\epsilon^*]}$ for any
$y^n \in A_{\epsilon,0}^{(n)}(Y)$ and typical $C$, where $\epsilon^* \to 0$ as $n \to \infty$.
(23) follows from the fact that
$$\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), \mathcal{C} \text{ is typical} \mid E_\epsilon) \to 1 \quad \text{as } n \to \infty.$$
This can be seen from the following:
$$\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), \mathcal{C} \text{ is typical} \mid E_\epsilon) = \frac{\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), E_\epsilon, \mathcal{C} \text{ is typical})}{\Pr(E_\epsilon)} \ge \frac{\Pr((X^n,Y^n) \in A_{\epsilon,0}^{(n)}(X,Y), E_\epsilon, \mathcal{C} \text{ is typical})}{\Pr(E_\epsilon)}. \qquad (24)$$
Since $\Pr(E_\epsilon)$, $\Pr(\mathcal{C} \text{ is typical})$ and $\Pr((X^n,Y^n) \in A_{\epsilon,0}^{(n)}(X,Y))$ all go to 1, both the
numerator and the denominator of (24) go to 1 as $n \to \infty$. Thus,
$$\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), \mathcal{C} \text{ is typical} \mid E_\epsilon) \to 1 \quad \text{as } n \to \infty.$$

Therefore, when $R > I_0(X;Y)$,
$$\liminf_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}) \ge \liminf_{n\to\infty} \frac{1}{n}\, n[H_0(Y)-\epsilon^*](1-o(1)) = \liminf_{n\to\infty}\, [H_0(Y)-\epsilon^*](1-o(1)) = H_0(Y). \qquad (25)$$
Furthermore,
$$\limsup_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}) \le \limsup_{n\to\infty} \frac{1}{n} H(Y^n) = H_0(Y), \qquad (26)$$
where the last equality follows from Lemma 6.1.
Combining (25) and (26), we have that when $R > I_0(X;Y)$,
$$\lim_{n\to\infty} \frac{1}{n} H(Y^n|\mathcal{C}) = H_0(Y).$$

B. When $R < I_0(X;Y)$

To find $\lim_{n\to\infty}\frac{1}{n}H(Y^n|\mathcal{C})$ when $R < I_0(X;Y)$, we first introduce two lemmas. The proofs
of these two lemmas are given at the end of this section.

Lemma 6.2: When $R < I_0(X;Y)$,
$$\frac{1}{n} H(X^n|\mathcal{C}, Y^n) \to 0, \quad \text{as } n \to \infty.$$

Lemma 6.3:
$$\lim_{n\to\infty} \frac{1}{n} H(X^n|\mathcal{C}) = R.$$
Now, expanding $H(X^n, Y^n|\mathcal{C})$ in two different ways, we have
$$H(X^n,Y^n|\mathcal{C}) = H(X^n|\mathcal{C}) + H(Y^n|X^n,\mathcal{C}) = H(Y^n|\mathcal{C}) + H(X^n|\mathcal{C},Y^n),$$
and thus
$$H(Y^n|\mathcal{C}) = H(X^n|\mathcal{C}) + H(Y^n|X^n,\mathcal{C}) - H(X^n|\mathcal{C},Y^n).$$
Therefore, when $R < I_0(X;Y)$,
$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}H(Y^n|\mathcal{C}) &= \lim_{n\to\infty}\frac{1}{n}H(X^n|\mathcal{C}) + \lim_{n\to\infty}\frac{1}{n}H(Y^n|X^n,\mathcal{C}) - \lim_{n\to\infty}\frac{1}{n}H(X^n|\mathcal{C},Y^n) \\
&= R + \lim_{n\to\infty}\frac{1}{n}H(Y^n|X^n,\mathcal{C}) \qquad (27)\\
&= R + \lim_{n\to\infty}\frac{1}{n}H(Y^n|X^n) \qquad (28)\\
&= R + \lim_{n\to\infty}\frac{1}{n}\left[H(X^n,Y^n) - H(X^n)\right] \\
&= R + H_0(X,Y) - H_0(X) \qquad (29)\\
&= R + H_0(Y|X),
\end{aligned}$$
where (27) follows from Lemmas 6.2 and 6.3, (28) follows from the fact that $\mathcal{C} \to X^n \to Y^n$
forms a Markov chain, and (29) follows from Lemma 6.1. This completes the proof of Theorem
3.3.
Proof: [Proof of Lemma 6.2] To prove Lemma 6.2, we begin with Fano's Inequality (see
Theorem 2.11.1 in [4]):
Let $P_e = \Pr(g(Y) \neq X)$, where $g$ is any function of $Y$. Then
$$1 + P_e \log|\mathcal{X}| \ge H(X|Y). \qquad (30)$$
For the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$ with a codebook $C$, we estimate the message index $W$ from
$Y^n$. Let the estimate be $\hat{W} = g(Y^n)$ and $P_e^{(n)}(C) = \Pr(W \neq g(Y^n) \mid C)$. Then, applying Fano's
Inequality, we have
$$H(W|Y^n, C) \le 1 + P_e^{(n)}(C) \log 2^{nR} = 1 + P_e^{(n)}(C)\, nR.$$
Since given $C$, $X^n$ is a function of $W$, say $X^n = X^n(W)$, we have
$$H(X^n|Y^n, C) \le H(W|Y^n, C) \le 1 + P_e^{(n)}(C)\, nR.$$
Then,
$$H(X^n|Y^n, \mathcal{C}) = \sum_C p(C)\, H(X^n|Y^n, C) \le \sum_C p(C)\left(1 + P_e^{(n)}(C)\, nR\right).$$

Recall the channel coding theorem, which states that if we randomly generate the codebook
according to $p_0(x)$, then when $R < I_0(X;Y)$,
$$\sum_C p(C)\, P_e^{(n)}(C) \to 0. \qquad (31)$$
Therefore, when $R < I_0(X;Y)$,
$$\begin{aligned}
\limsup_{n\to\infty}\frac{1}{n}H(X^n|Y^n,\mathcal{C}) &\le \limsup_{n\to\infty}\frac{1}{n}\sum_C p(C)\left[1 + P_e^{(n)}(C)\, nR\right] \\
&= \limsup_{n\to\infty}\frac{1}{n}\left[1 + nR\sum_C p(C)\, P_e^{(n)}(C)\right] \\
&= \limsup_{n\to\infty}\frac{1}{n} + \limsup_{n\to\infty} R\sum_C p(C)\, P_e^{(n)}(C) \\
&= 0.
\end{aligned}$$

Furthermore, it is obvious that $\frac{1}{n}H(X^n|Y^n,\mathcal{C}) \ge 0$ and hence
$$\frac{1}{n}H(X^n|\mathcal{C},Y^n) \to 0, \quad \text{as } n \to \infty,$$
when $R < I_0(X;Y)$.
Proof: [Proof of Lemma 6.3] Given any $C$, $X^n$ is a function of $W$. Thus, $H(X^n|C) \le H(W|C) = nR$, and
$$\frac{1}{n}H(X^n|\mathcal{C}) = \frac{1}{n}\sum_C p(C)\, H(X^n|C) \le R. \qquad (32)$$
Therefore, to show Lemma 6.3, it suffices to show that $\lim_{n\to\infty}\frac{1}{n}H(X^n|\mathcal{C}) \ge R$. For this
purpose, we first define a class of codebooks as regular codebooks and focus on characterizing
$H(X^n|C)$ for a regular codebook $C$. Then, we show that a regular codebook appears with high
probability when we randomly generate the codebook, and conclude that $\lim_{n\to\infty}\frac{1}{n}H(X^n|\mathcal{C}) \ge R$.
We say a codebook $C$ is regular if
$$\sup_{x^n \in A_{\epsilon,0}^{(n)}(X)} \left| \frac{N(x^n|C)}{2^{nR}} - p(x^n) \right| \le \frac{n^3 R}{2^{nR}},$$
where $N(x^n|C)$ is the number of occurrences of $x^n$ in $C$, defined by
$$N(x^n|C) = \sum_{w=1}^{2^{nR}} I(x^n(w) = x^n),$$
and $p(x^n) = \Pr(\tilde{X}^n = x^n \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X))$, where $\tilde{X}^n$ is drawn i.i.d. according to $p_0(x)$.
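In the same spirit as the typical-codebook check in Section II (a sketch of our own with hypothetical names; `p_xn` maps each strongly typical sequence, serialized as bytes, to its probability $p(x^n)$), the regularity condition can be tested by counting codeword multiplicities:

```python
from collections import Counter

def regularity_deviation(codebook, p_xn):
    """sup_{x^n} |N(x^n|C)/2^{nR} - p(x^n)| over the strongly typical x^n.
    A codebook C is regular when this is at most n^3 R / 2^{nR}."""
    M = len(codebook)
    N = Counter(xn.tobytes() for xn in codebook)  # N(x^n|C) for each codeword
    return max(abs(N.get(key, 0) / M - p) for key, p in p_xn.items())
```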
Given a regular $C$, for any $x^n \in A_{\epsilon,0}^{(n)}(X)$, we have
$$\begin{aligned}
N(x^n|C) &\le 2^{nR}\, p(x^n) + n^3 R \\
&= 2^{nR}\, \Pr(\tilde{X}^n = x^n \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) + n^3 R \\
&\le 2^{nR}\, (1+o(1))\, 2^{-n(H_0(X)-\epsilon')} + n^3 R \qquad (33)\\
&= n^3 R + o(1), \qquad (34)
\end{aligned}$$
where the $\epsilon'$ in (33) goes to 0 as $\epsilon \to 0$, and (34) follows from the general assumption that
$R < H_0(X)$. Noting that the message index $W$ is uniformly distributed, we have for a given $C$
and any $x^n \in A_{\epsilon,0}^{(n)}(X)$,
$$p(x^n|C) = \frac{\sum_{w=1}^{2^{nR}} I(x^n(w) = x^n)}{2^{nR}} = \frac{N(x^n|C)}{2^{nR}} \le \frac{n^3 R + o(1)}{2^{nR}} =: 2^{-n(R-\epsilon'')},$$
where $\epsilon''$ goes to 0 as $n \to \infty$. Therefore,
$$\begin{aligned}
H(X^n|C) &= \sum_{x^n \in A_{\epsilon,0}^{(n)}(X)} p(x^n|C)\log\frac{1}{p(x^n|C)} \\
&\ge \sum_{x^n \in A_{\epsilon,0}^{(n)}(X)} p(x^n|C)\log 2^{n(R-\epsilon'')} \\
&= [n(R-\epsilon'')] \sum_{x^n \in A_{\epsilon,0}^{(n)}(X)} p(x^n|C) \\
&= n(R-\epsilon'').
\end{aligned}$$


Below, we use the Vapnik-Chervonenkis Theorem to show that a regular codebook appears
with high probability.
Let $\mathcal{B} = \{\{x^n\} : x^n \in A_{\epsilon,0}^{(n)}(X)\}$. Since $|\mathcal{B}| = |A_{\epsilon,0}^{(n)}(X)| \le 2^{n(H_0(X)+\epsilon)}$, for any $A \subseteq \mathcal{X}^n$,
$$|\{\{x^n\} \cap A : x^n \in A_{\epsilon,0}^{(n)}(X)\}| \le 2^{n(H_0(X)+\epsilon)},$$
and hence VC-d$(\mathcal{B}) \le n(H_0(X)+\epsilon)$.
Since VC-d$(\mathcal{B})$ is finite for a fixed $n$, we employ the Vapnik-Chervonenkis Theorem under
the range space $(\mathcal{X}^n, \mathcal{B})$. To satisfy (18), let both $\epsilon$ and $\delta$ in (17) be $\frac{\Delta_\epsilon nR}{2^{nR}}$, where $\Delta_\epsilon :=
\max\{8\,\text{VC-d}(\mathcal{B}), 16e\}$. Then the Vapnik-Chervonenkis Theorem states that
$$\Pr\left\{\sup_{x^n \in A_{\epsilon,0}^{(n)}(X)}\left|\frac{N(x^n|C)}{2^{nR}} - p(x^n)\right| \le \frac{\Delta_\epsilon nR}{2^{nR}}\right\} \ge 1 - \frac{\Delta_\epsilon nR}{2^{nR}} \to 1 \quad \text{as } n \to \infty. \qquad (35)$$
Since $\frac{n^3 R}{2^{nR}} \ge \frac{\Delta_\epsilon nR}{2^{nR}}$ for sufficiently large $n$, (35) concludes that $\Pr(C \text{ is regular}) \to 1$ as $n \to \infty$.
Therefore,
$$\begin{aligned}
H(X^n|\mathcal{C}) &= \sum_C p(C)\, H(X^n|C) \\
&\ge \sum_{C \text{ is regular}} p(C)\, H(X^n|C) \\
&\ge [n(R-\epsilon'')]\sum_{C \text{ is regular}} p(C) \\
&= [n(R-\epsilon'')](1-o(1)),
\end{aligned}$$
and
$$\lim_{n\to\infty}\frac{1}{n}H(X^n|\mathcal{C}) \ge \lim_{n\to\infty}\frac{1}{n}[n(R-\epsilon'')](1-o(1)) = \lim_{n\to\infty}(R-\epsilon'')(1-o(1)) = R. \qquad (36)$$
Combining (32) and (36), we finish the proof of Lemma 6.3.


VII. RATE NEEDED TO COMPRESS RELAY'S OBSERVATION


To study the optimality of the compress-and-forward strategy, in this section we investigate
the rate needed for the relay to losslessly compress its observation. In the classical approach of
[2], the compression scheme at the relay was based only on the distribution used for generating
the codebook at the source, without regard to the specific codebook generated. However, since
both the relay and the destination know the exact codebook used at the source, it
is natural to ask whether it is beneficial for the relay to compress its observation based on this
codebook information. This question motivates us to compare the rates needed to compress the
relay's observation in two different scenarios: when the relay uses the knowledge of the source's
codebook and when the relay simply ignores this knowledge.
Specifically, we consider the two compression problems shown in Figure 1, where $Y^n$ is
generated from $X^n$ through the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, and $\mathcal{C}$ in (b) is the source's codebook
information available to both the encoder and decoder. Interestingly, we will show that to perfectly
recover $Y^n$, the minimum required rates in both scenarios are the same when the rate $R$ associated
with $\mathcal{C}$ is greater than the channel capacity.

Fig. 1. Two scenarios where the relay compresses its observation: in (a), $Y^n$ is encoded at rate $R_1$ without the codebook information; in (b), $Y^n$ is encoded at rate $R_2$ with the source's codebook $\mathcal{C}$ available to both the encoder and the decoder.

Formally, we have the following theorem.

Theorem 7.1: For the discrete memoryless channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, generate the codebook at
random according to $p_0(x)$ and reserve only the $\epsilon$-strongly typical codewords. Let $\mathcal{C}$ be the
source's codebook with rate $R$, and let $X^n$ and $Y^n$ be the input and output of the channel, respectively.
When $R > I_0(X;Y)$, to compress the channel output $Y^n$, we have:
1) $Y^n$ can be encoded at rate $R_1$ and recovered with arbitrarily low probability of error if
$R_1 > H_0(Y)$.
2) Given that the source's codebook information $\mathcal{C}$ is available to both the encoder and decoder
and $Y^n$ is encoded at rate $R_2$, the decoding probability of error will be bounded away from zero
if $R_2 < H_0(Y)$. This implies that we cannot compress the channel output better even if the
source's codebook information is employed.
To show Theorem 7.1, we need the following lemma.
Lemma 7.1: For the compression problem in Figure 1-(b), we can encode $Y^n$ at rate $R_2$ and
recover it with probability of error $P_e^{(n)} \to 0$ only if
$$R_2 \ge \lim_{n\to\infty}\frac{1}{n}H(Y^n|\mathcal{C}). \qquad (37)$$
Proof: [Proof of Lemma 7.1] The source code for Figure 1-(b) consists of an encoder
mapping $f(Y^n, \mathcal{C})$ and a decoder mapping $g(f(Y^n,\mathcal{C}), \mathcal{C})$. Let $I = f(Y^n, \mathcal{C})$; then $P_e^{(n)} =
\Pr(g(I,\mathcal{C}) \neq Y^n)$. By Fano's Inequality, for any source code with $P_e^{(n)} \to 0$, we have
$$H(Y^n|I,\mathcal{C}) \le P_e^{(n)} \log|\mathcal{Y}^n| + 1 = P_e^{(n)}\, n\log|\mathcal{Y}| + 1 = n\epsilon_n, \qquad (38)$$
where $\epsilon_n \to 0$ as $n \to \infty$.
Therefore, for any source code with rate $R_2$ and $P_e^{(n)} \to 0$, we have the following chain of
inequalities:
$$\begin{aligned}
nR_2 &\ge H(I) \qquad (39)\\
&\ge H(I|\mathcal{C}) \\
&= H(Y^n, I|\mathcal{C}) - H(Y^n|I,\mathcal{C}) \\
&= H(Y^n|\mathcal{C}) + H(I|Y^n,\mathcal{C}) - H(Y^n|I,\mathcal{C}) \\
&= H(Y^n|\mathcal{C}) - H(Y^n|I,\mathcal{C}) \qquad (40)\\
&\ge H(Y^n|\mathcal{C}) - n\epsilon_n, \qquad (41)
\end{aligned}$$
where (39) follows from the fact that $I \in \{1,2,\ldots,2^{nR_2}\}$, (40) follows from the fact that $I$ is a
function of $Y^n$ and $\mathcal{C}$, and (41) follows from (38). Dividing the inequality $nR_2 \ge H(Y^n|\mathcal{C}) - n\epsilon_n$
by $n$ and taking the limit as $n\to\infty$, we establish Lemma 7.1.
Proof: [Proof of Theorem 7.1]
Proof of Part 1): To show Part 1), we only need to show that the sequence $Y^n$ satisfies the
Asymptotic Equipartition Property, i.e., $\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y)) \to 1$ as $n \to \infty$. Then, following the
classical approach to proving the source coding theorem, we can conclude that any rate $R_1 > H_0(Y)$
is achievable. By Lemma 6.1, $\Pr((X^n,Y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) \to 1$ as $n \to \infty$. Thus, the sequence
$Y^n$ satisfies the Asymptotic Equipartition Property and any rate $R_1 > H_0(Y)$ is achievable.
Proof of Part 2): By Lemma 7.1, given that the codebook information $\mathcal{C}$ is available to both the
encoder and decoder and $Y^n$ is encoded at rate $R_2$, $P_e^{(n)} \to 0$ only if $R_2 \ge \lim_{n\to\infty}\frac{1}{n}H(Y^n|\mathcal{C})$.
By Theorem 3.3, $\lim_{n\to\infty}\frac{1}{n}H(Y^n|\mathcal{C}) = H_0(Y)$ when $R > I_0(X;Y)$. Therefore, when $R >
I_0(X;Y)$, $P_e^{(n)} \to 0$ only if $R_2 \ge H_0(Y)$, which establishes Part 2).
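For a concrete feel of these results (an illustrative computation of our own, not from the paper), consider a binary symmetric channel with crossover probability $0.1$ and $p_0$ uniform, so that
$$I_0(X;Y) = C = 1 - H(0.1) \approx 1 - 0.469 = 0.531 \text{ bits/symbol}, \qquad H_0(Y) = 1 \text{ bit/symbol}.$$
If the source signals at any rate $R > 0.531$, Theorem 7.1 says the relay must spend a full bit per received symbol to describe $Y^n$ losslessly, whether or not it knows $\mathcal{C}$. By contrast, when $R < 0.531$, Theorem 3.3 gives $\lim_{n\to\infty}\frac{1}{n}H(Y^n|\mathcal{C}) = R + H(0.1) \approx R + 0.469 < 1$, so the lower bound of Lemma 7.1 drops below $H_0(Y)$ and the codebook information could potentially be exploited.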

APPENDIX I
PROOF OF LEMMA 6.1
Proof of Part 1): Let $\tilde{X}^n$ be drawn i.i.d. according to $p_0(x)$ and let $\tilde{Y}^n$ be generated from $\tilde{X}^n$
through the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$. Then, we have
$$\begin{aligned}
\Pr((X^n,Y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) &= \sum_{(x^n,y^n)\in A_{\epsilon,0}^{(n)}(X,Y)} p(x^n)\, p(y^n|x^n) \\
&= \sum_{(x^n,y^n)\in A_{\epsilon,0}^{(n)}(X,Y)} \Pr(\tilde{X}^n = x^n \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \cdot \Pr(Y^n = y^n \mid X^n = x^n) \\
&= \sum_{(x^n,y^n)\in A_{\epsilon,0}^{(n)}(X,Y)} \Pr(\tilde{X}^n = x^n \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \cdot \Pr(\tilde{Y}^n = y^n \mid \tilde{X}^n = x^n) \\
&= \sum_{(x^n,y^n)\in A_{\epsilon,0}^{(n)}(X,Y)} \Pr((\tilde{X}^n,\tilde{Y}^n) = (x^n,y^n) \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \\
&= \Pr((\tilde{X}^n,\tilde{Y}^n) \in A_{\epsilon,0}^{(n)}(X,Y) \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \\
&= \frac{\Pr((\tilde{X}^n,\tilde{Y}^n) \in A_{\epsilon,0}^{(n)}(X,Y))}{\Pr(\tilde{X}^n \in A_{\epsilon,0}^{(n)}(X))} \\
&\to 1, \quad \text{as } n\to\infty.
\end{aligned}$$

Proof of Part 2): Denote the $\epsilon$-weakly typical sets with respect to $p_0(x)$, $p_0(y)$ and $p_0(x,y)$ by
$W_{\epsilon,0}^{(n)}(X)$, $W_{\epsilon,0}^{(n)}(Y)$ and $W_{\epsilon,0}^{(n)}(X,Y)$ respectively. Along the same line as in the proof of Part 1),
we can prove that $\Pr((X^n,Y^n) \in W_{\epsilon,0}^{(n)}(X,Y)) \to 1$ as $n\to\infty$, and hence $\Pr(X^n \in W_{\epsilon,0}^{(n)}(X))$
and $\Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y))$ both go to 1 as $n\to\infty$.

Now, consider $H(Y^n)$. We have
$$H(Y^n) = \sum_{y^n} p(y^n)\log\frac{1}{p(y^n)} = \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n)\log\frac{1}{p(y^n)} + \sum_{y^n \notin W_{\epsilon,0}^{(n)}(Y)} p(y^n)\log\frac{1}{p(y^n)} =: \phi_1 + \phi_2.$$
For $\phi_1$, we have
$$\begin{aligned}
\phi_1 &= \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n)\log\frac{1}{p(y^n)} \\
&\le \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n)\log 2^{n(H_0(Y)+\epsilon)} \\
&= n(H_0(Y)+\epsilon)\,\Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y)) \\
&= n(H_0(Y)+\epsilon)(1-o(1)),
\end{aligned}$$
where the inequality follows from the fact that $p(y^n) \ge 2^{-n(H_0(Y)+\epsilon)}$ for any $y^n \in W_{\epsilon,0}^{(n)}(Y)$.
For $\phi_2$, we have
$$\begin{aligned}
\phi_2 &= \sum_{y^n \notin W_{\epsilon,0}^{(n)}(Y)} p(y^n)\log\frac{1}{p(y^n)} = -\sum_{y^n \in W_{\epsilon,0}^{(n)c}(Y)} p(y^n)\log p(y^n) \\
&\le -\sum_{y^n \in W_{\epsilon,0}^{(n)c}(Y)} p(y^n)\log\left(\frac{\sum_{y^n \in W_{\epsilon,0}^{(n)c}(Y)} p(y^n)}{|W_{\epsilon,0}^{(n)c}(Y)|}\right) \qquad (42)\\
&= -\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log\frac{\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y))}{|W_{\epsilon,0}^{(n)c}(Y)|} \\
&= -\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log|W_{\epsilon,0}^{(n)c}(Y)| \\
&= o(1) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log|W_{\epsilon,0}^{(n)c}(Y)| \qquad (43)\\
&\le o(1) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log|\mathcal{Y}|^n \\
&= o(1) + n\cdot\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y))\log|\mathcal{Y}| \\
&= n\cdot o(1). \qquad (44)
\end{aligned}$$

(42) follows from the log sum inequality (see Theorem 2.7.1 in [4]), which states that for
non-negative numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$,
$$\sum_{i=1}^n a_i \log\frac{a_i}{b_i} \ge \left(\sum_{i=1}^n a_i\right)\log\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i},$$
with equality if and only if $\frac{a_i}{b_i}$ is the same for all $i$.
(43) and (44) both follow from the fact that $\Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y)) \to 1$ as $n\to\infty$.
Therefore,
$$H(Y^n) = \phi_1 + \phi_2 \le n(H_0(Y)+\epsilon)(1-o(1)) + n\cdot o(1) = n(H_0(Y)+\epsilon)(1-o(1)). \qquad (45)$$
Similarly, we have
$$H(Y^n) \ge \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n)\log\frac{1}{p(y^n)} \ge \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n)\log 2^{n(H_0(Y)-\epsilon)} = n(H_0(Y)-\epsilon)\,\Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y)) = n(H_0(Y)-\epsilon)(1-o(1)). \qquad (46)$$
Combining (45) and (46), we have $\lim_{n\to\infty}\frac{1}{n}H(Y^n) = H_0(Y)$.
Along the same lines as above, we can also prove that $\lim_{n\to\infty}\frac{1}{n}H(X^n) = H_0(X)$ and
$\lim_{n\to\infty}\frac{1}{n}H(X^n,Y^n) = H_0(X,Y)$, which concludes the proof of Lemma 6.1.

REFERENCES
[1] X. Wu and L.-L. Xie, "AEP of output when rate is above capacity," in Proc. 11th Canadian Workshop on Information Theory, Ottawa, Canada, May 13-15, 2009.
[2] T. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Inform. Theory, vol. 25, pp. 572-584, 1979.
[3] F. Xue, P. R. Kumar, and L.-L. Xie, "The conditional entropy of the jointly typical set when coding rate is larger than Shannon capacity," manuscript, 2006.
[4] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[5] V. N. Vapnik and A. Chervonenkis, "On the uniform convergence of relative frequencies of events to their probabilities," Theory of Probability and its Applications, vol. 16, no. 2, pp. 264-280, Jan. 1971.
[6] V. N. Vapnik, Estimation of Dependences Based on Empirical Data. New York: Springer-Verlag, 1982.
