5, SEPTEMBER 1974
Abstract-The sharp lower bound $f(x)$ on the per-symbol output entropy for a given per-symbol input entropy $x$ is determined for stationary discrete memoryless channels; it is the lower convex envelope of the bound $g(x)$ for a single channel use. The bounds agree for all noiseless channels and all binary channels. However, for nonbinary channels, $g$ is not generally convex so that the bounds differ. Such is the case for the Hamming channels that generalize the binary symmetric channels. The bounds are of interest in connection with multiple-user communication, as exemplified by Wyner’s applications of “Mrs. Gerber’s lemma” (the bound for binary symmetric channels first obtained by Wyner and Ziv). These applications extend from the binary symmetric case to the Hamming case. Doubly stochastic channels are characterized by the property of never decreasing entropy.

I. INTRODUCTION

LET $T$ be the $m \times n$ matrix of transition probabilities of a stationary discrete memoryless channel with input (respectively, output) alphabet of cardinality $n$ (respectively, $m$). Thus $T$ is a stochastic matrix; i.e., $t_{ij} \ge 0$, $\sum_{i=1}^{m} t_{ij} = 1$.

... and denote the convex hull of this set by cv $S_T$. As continuous image of $\Delta_n$, $S_T$ is compact which implies compactness of cv $S_T$.

If $g$ is a real function on a convex domain, the convex envelope $f = \operatorname{env} g$ of $g$ is defined to be the supremum of all affine (linear plus a constant) functions which do not exceed $g$ at any point of the domain (see, e.g., [3]). Then $f$ is also the supremum of all (lower semicontinuous) convex functions not exceeding $g$.

... entropy and a single use of the channel. In a remarkable paper [4], Wyner and Ziv showed that for

$$T = \begin{pmatrix} 1-\delta & \delta \\ \delta & 1-\delta \end{pmatrix},$$

the binary symmetric channel, the function $g_T$ is convex. This enabled them to show that the same function gives the lower bound on the output entropy per symbol for a given per-symbol input entropy and blocks of arbitrary lengths and arbitrary joint input distribution. Several significant applications to multiple-user communication were made [4, pt. II].

In this paper, we consider general discrete stationary memoryless channels and show that a sharp bound for arbitrary blocks is always given by the function $f_T$ defined as follows. For A ≥ 0 let

... (3)

... $(1/K)H(Y_1, \dots, Y_K)$, where $(Y_1, \dots, Y_K)$ denotes the corresponding output is $f_T(x)$.

Proof: Denote $(X_1, \dots, X_i)$ by $X_1^i$ and $(Y_1, \dots, Y_i)$ by $Y_1^i$. Since the channel is memoryless one has

$$P(Y_i = y,\ X_1^{i-1} = \xi) = \sum_{x=1}^{n} P(Y_i = y \mid X_i = x)\, P(X_i = x,\ X_1^{i-1} = \xi)$$
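The convex envelope $\operatorname{env} g$ defined in the Introduction can be approximated numerically from samples of $g$. The following is a minimal sketch, not taken from the paper: it assumes $g$ is sampled on a finite grid and returns the lower convex hull of the samples, which is a piecewise-linear stand-in for the envelope. The names `lower_convex_envelope`, `evaluate_envelope`, and the test function `g` are hypothetical.

```python
def lower_convex_envelope(samples):
    """Vertices of the lower convex hull of sampled points (x, g(x)).

    Hypothetical helper: a piecewise-linear approximation of env g on a grid.
    """
    # Keep the smallest y for each x, then sort by x.
    lowest = {}
    for x, y in samples:
        if x not in lowest or y < lowest[x]:
            lowest[x] = y
    points = sorted(lowest.items())

    hull = []
    for x, y in points:
        # Drop the last vertex while it lies on or above the chord joining
        # the vertex before it to the new point (non-left turn).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull


def evaluate_envelope(hull, x):
    """Linear interpolation of the envelope at x (x inside the sampled range)."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = (x - x1) / (x2 - x1)
            return (1 - t) * y1 + t * y2
    raise ValueError("x outside the sampled range")


def g(x):
    # Illustrative nonconvex function (an assumption, not a channel bound):
    # two parabolas with minima at x = -1 and x = +1.
    return min((x - 1.0) ** 2, (x + 1.0) ** 2)


grid = [i / 10.0 for i in range(-20, 21)]
hull = lower_convex_envelope([(x, g(x)) for x in grid])
```

For this test function the envelope is flat between the two minima, so it lies strictly below `g` near `x = 0`; this is the same mechanism by which $f_T$ can fall below $g_T$ when $g_T$ is not convex.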
... yields the point $(x, f_T(x))$, i.e., achieves equality in (6). If $\theta$ is irrational, then for small $\varepsilon > 0$,

$$x = \theta_\varepsilon x_1 + (1 - \theta_\varepsilon)(x_2 - \varepsilon)$$

with

$$\theta_\varepsilon = \frac{\theta(x_2 - x_1) - \varepsilon}{(x_2 - x_1) - \varepsilon}$$

and

$$y_\varepsilon = \theta_\varepsilon\, g_T(x_1) + (1 - \theta_\varepsilon)\, g_T(x_2 - \varepsilon)$$

is achievable as indicated when $\theta_\varepsilon$ is rational. As $\varepsilon \to 0$, $y_\varepsilon \to f_T(x)$ and since $\theta_\varepsilon \to \theta$ through an interval, in which rationals are dense, the infimum cannot be greater than $f_T(x)$, completing the proof.

Corollary: If $W$ is conditionally independent of $Y_1^K$ given $X_1^K$ then

...

Proof: Conditional on $W = w$, Theorem 1 gives

...

... under $T$ is an interval symmetric about $(a + b)/2$, i.e., is of the form $[(a + b)/2 - \delta,\ (a + b)/2 + \delta]$, with $\delta \ge 0$. By concavity of $h$, the minimum of $h(q)$ is attained at an endpoint of this interval. By (7), the lower end point is furthest away from $\tfrac12$ hence furnishes the minimum. By (8) this point corresponds to the lower endpoint of $\{p \mid h(p) \ge x\}$.

Thus the function $g(x) = \min \{h(q) \mid h(p) \ge x\}$ is described parametrically by

$$0 \le p \le \tfrac12$$
$$q = ap + b(1 - p)$$
$$x = h(p)$$
$$y = h(q).$$

Since $h(p)$ is increasing on $[0, \tfrac12]$, the sign of the second derivative of $g$ is the sign of the determinant

$$\begin{vmatrix} \dfrac{dx}{dp} & \dfrac{dy}{dp} \\[2mm] \dfrac{d^2x}{dp^2} & \dfrac{d^2y}{dp^2} \end{vmatrix}$$
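The determinant criterion for the parametric curve $x = h(p)$, $y = h(q)$ can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's computation: it uses natural logarithms, takes the binary symmetric channel with crossover probability $\delta$ as the case $a = 1 - \delta$, $b = \delta$, and uses the closed forms $h'(p) = \log((1-p)/p)$ and $h''(p) = -1/(p(1-p))$. The function names are hypothetical.

```python
import math


def h1(p):
    """First derivative of the binary entropy h (natural logarithm)."""
    return math.log((1.0 - p) / p)


def h2(p):
    """Second derivative of the binary entropy h."""
    return -1.0 / (p * (1.0 - p))


def curvature_determinant(a, b, p):
    """Determinant (dx/dp)(d2y/dp2) - (dy/dp)(d2x/dp2) for x = h(p), y = h(q).

    Here q = a*p + b*(1 - p), so dq/dp = a - b by the chain rule.
    """
    q = a * p + b * (1.0 - p)
    slope = a - b
    dx, d2x = h1(p), h2(p)
    dy, d2y = slope * h1(q), slope * slope * h2(q)
    return dx * d2y - dy * d2x


# Binary symmetric channel with crossover 0.1 (assumed example values).
delta = 0.1
values = [curvature_determinant(1.0 - delta, delta, p) for p in (0.05, 0.25, 0.45)]
```

A positive determinant at each sampled $p$ in $(0, \tfrac12)$ is consistent with the convexity of $g_T$ for the binary symmetric channel (the Wyner-Ziv result quoted in the Introduction); a finite sample of points is of course only a spot check.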
Proof: For input probabilities $p = (\theta, 1 - \theta)$ one has

$$q = Tp = \left(\theta(1 - a),\ \tfrac12 a\theta + \tfrac12(1 - \theta),\ \tfrac12 a\theta + \tfrac12(1 - \theta)\right)$$
$$h_2(p) = h(\theta)$$
$$h_3(q) = k(\theta) = h((1 - a)\theta) + (1 - (1 - a)\theta)\log 2.$$

Let $h^{-1} : [0, \log 2] \to [0, \tfrac12]$ be the inverse of the restriction of $h$ to the domain $[0, \tfrac12]$.

Then $h_2(p) = x$ implies $\theta \in \{h^{-1}(x),\ 1 - h^{-1}(x)\}$. Define $\psi$ on $[0, \tfrac12]$ by

$$\psi(\alpha) = \min(k(\alpha), k(1 - \alpha)).$$

Then

$$g_T(x) = \min_{h_2(p) \ge x} h_3(q) = \psi(h^{-1}(x)).$$

For a suitable value of $a$, there is a point $\alpha^* \in (0, \tfrac12)$, where $k(\alpha^*) = k(1 - \alpha^*)$ while the derivatives of these functions are different at $\alpha^*$. Then $\psi$ has at $\alpha^*$ a derivative to the left strictly greater than the derivative to the right. As $h^{-1}$ is smooth and has positive derivative, the composite function $g_T$ has the same property at $x^* = h(\alpha^*)$. Hence $g_T$ is not convex.

Let $C_{m,n}$ be the set of all $m \times n$ stochastic matrices, i.e., the set of channels with $n$ inputs and $m$ outputs. $C_{m,n}$ is the product of $n$ copies of $\Delta_m$, corresponding to the independent choice of the columns. Thus $C_{m,n}$ is a convex polytope in $R^{mn}$, of dimension $(m - 1)n$. Its extreme points are the noiseless channels. $C_{m,n}$ has some positive $(m - 1)n$-dimensional Lebesgue measure, i.e., volume.

Lemma 4: The set of channels in $C_{m,n}$ for which $g_T$ is not convex is relatively open in $C_{m,n}$. Hence either 1) this set is empty, or 2) the set has positive volume.

Proof: It suffices to show that the set of channels for which $g_T$ is convex is a closed set. For any fixed $x$ the set $\{p \in \Delta_n \mid h_n(p) \ge x\}$ is compact. The function $(T, p) \to h_m(Tp)$ is jointly continuous in $T$ and $p$. Hence the minimum $g_T(x)$ of $h_m(Tp)$ is continuous in $T$ for each fixed $x$. If $T_\nu \to T$ elementwise, then $g_{T_\nu} \to g_T$ pointwise. The pointwise limit of convex functions is convex, establishing the lemma.

Theorem 4: For $m > 1$, $n > 1$, $m + n > 4$ the channels for which $g_T$ is not convex represent a positive fraction of the volume of $C_{m,n}$.

Proof: If rows of zeros are added to a stochastic matrix $T$, the set $S_T$ and a fortiori the function $g_T$ remain unchanged, since the additional outputs have identically zero probability. By Lemma 4 it is sufficient to exhibit a single channel in each $C_{m,n}$ to establish the claim. If $n = 2$, $m > 2$ such a channel is obtained by adding $m - 3$ rows of zeros to the example of Lemma 3. If $n > 2$, $m \ge 2$ such a channel is obtained by adding $m - 2$ rows of zeros to the $2 \times n$ matrix of Lemma 2.

V. DOUBLY STOCHASTIC CHANNELS

An $n \times n$ transition matrix $T$ (or the channel it defines) is doubly stochastic if $Te_n = e_n$, where $e_n$ is the $n$ vector of ones; that is, if it has unit row and column sums.

The following lemma characterizes the doubly stochastic channels among all $n \times n$ channels in terms of an entropy inequality.

Lemma 5: For a channel with $n \times n$ transition matrix $T$ the following are equivalent:
a) $T$ is doubly stochastic
b) $h_n(Tp) \ge h_n(p)$ for $p \in \Delta_n$
c) $g_T(x) \ge x$, for $x \in [0, \log n]$
d) $f_T(x) \ge x$, for $x \in [0, \log n]$
e) $H(Y_1, \dots, Y_k) \ge H(X_1, \dots, X_k)$, for all $k > 0$, all joint distributions of input sequence $(X_1, \dots, X_k)$, with $(Y_1, \dots, Y_k)$ the corresponding output.

Proof: a) → b): if $T$ is doubly stochastic then by Birkhoff’s theorem [2], it is a convex combination of permutation matrices, i.e.,

$$T = \sum_{i=1}^{n!} \theta_i P_i, \qquad \theta_i \ge 0, \qquad \sum_{i=1}^{n!} \theta_i = 1$$

$P_i$ the permutation matrices. Since $h_n$ is a concave and symmetric function of its arguments one has

$$h_n(Tp) = h_n\Big(\sum_i \theta_i P_i p\Big) \ge \sum_i \theta_i h_n(P_i p) = \sum_i \theta_i h_n(p) = h_n(p).$$

This property of the function $h_n$ is known as “concavity in the sense of I. Schur” [7].

b) → c) by (2).

c) → d) the function $x$ is an affine function nowhere exceeding $g_T$ hence it cannot exceed the envelope $f_T$ of $g_T$ which is the supremum of all such functions.

d) → e) by Theorem 1.

e) → a) apply e) with $k = 1$, $X_1$ uniformly distributed to obtain $h_n((1/n)Te_n) \ge h_n((1/n)e_n) = \log n$, hence

$$h_n\Big(\frac{1}{n} Te_n\Big) = \log n$$

requiring $(1/n)Te_n = (1/n)e_n$ or $Te_n = e_n$.

That doubly stochastic channels are characterized by the fact that they never decrease entropy is valid without the assumption that the channel is stationary.

Theorem 5: For a memoryless channel with $n_t \times n_t$ transition matrix $T_t$ at time $t = 0, 1, 2, \dots$ the following are equivalent: a) for $k \ge 1$, $t_1 < t_2 < \dots < t_k$, inputs $X = (X_{t_1}, \dots, X_{t_k})$ of arbitrary joint distribution and $Y = (Y_{t_1}, \dots, Y_{t_k})$ the corresponding outputs, one has $H(Y) \ge H(X)$. b) $T_t$ is doubly stochastic for all $t$.

Proof: a) → b) apply Lemma 5 with $k = 1$, $t_1 = t$ to conclude that $T_t$ is doubly stochastic.

b) → a) the distribution of $Y$ follows from that of $X$ through the transition matrix $T_{t_1} \times T_{t_2} \times \dots \times T_{t_k}$ (Kronecker product) which is doubly stochastic since all the factors are. Then a) follows by applying Lemma 5 to a single use of this product channel.
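The step a) → b) of Lemma 5 and the Kronecker-product argument of Theorem 5 can be illustrated with a small numerical experiment. The sketch below is an illustration only: it builds a doubly stochastic matrix as a random convex combination of all $n!$ permutation matrices (the Birkhoff representation), uses natural logarithms, and follows the column convention $q = Tp$. All function names are hypothetical.

```python
import itertools
import math
import random


def entropy(p):
    """Shannon entropy in nats; zero entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def random_distribution(n, rng):
    """A random probability vector of length n."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def random_doubly_stochastic(n, rng):
    """Random convex combination of all n! permutation matrices."""
    perms = list(itertools.permutations(range(n)))
    thetas = random_distribution(len(perms), rng)
    T = [[0.0] * n for _ in range(n)]
    for theta, perm in zip(thetas, perms):
        for j, i in enumerate(perm):
            T[i][j] += theta  # input j is sent to output perm[j]
    return T


def apply_channel(T, p):
    """Output distribution q = T p."""
    return [sum(t * x for t, x in zip(row, p)) for row in T]


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


rng = random.Random(0)
T = random_doubly_stochastic(3, rng)
p = random_distribution(3, rng)
entropy_gain = entropy(apply_channel(T, p)) - entropy(p)  # expected >= 0
```

The same check applied to `kron(T1, T2)` with an arbitrary joint input distribution corresponds to the block statement e) for two channel uses, since the Kronecker product of doubly stochastic matrices is again doubly stochastic.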
WITSENHAUSEN: ENTROPY INEQUALITIES FOR DISCRETE CHANNELS 615
Finally, one can characterize, among the doubly stochastic channels, those for which the inequalities c) or d) of Lemma 5 are sharp.

Lemma 6: For an $n$-input $n$-output channel $T$, the following are equivalent:
a) $T$ is doubly stochastic and contains a unit entry.
b) $g_T(x) = x$, for $x \in [0, \log n]$
c) $f_T(x) = x$, for $x \in [0, \log n]$.

Proof: a) → b) if $T$ is doubly stochastic and $t_{ij} = 1$, then all other entries in the $i$th row and $j$th column are zeros and removing these leaves an $(n - 1) \times (n - 1)$ doubly stochastic matrix. Hence for the input probability vector $p$ with $p_j = 1 - \theta$, and all other components $\theta/(n - 1)$ one obtains the output probability vector $q$ with $q_i = 1 - \theta$, and all other components $\theta/(n - 1)$. Varying $\theta$ from 0 to $1 - 1/n$ all points with $0 \le x = y \le \log n$ are seen to belong to $S_T$. By (2), $g_T(x) \le x$ while by Lemma 5, $g_T(x) \ge x$.

b) → c) since $g_T$ is convex it agrees with $f_T$.

c) → a) by Lemma 5, $T$ is doubly stochastic. For $x = 0$ one has $f_T(0) = 0$ and by Lemma 1, $g_T(0) = 0$ or

$$\min_{1 \le i \le n} h_n(\operatorname{col}_i T) = 0.$$

...

... named endpoint yields the minimum of $h_n(Tp)$ it suffices to show that at any point of the arc between the endpoints $dh_n(Tp)/dp_1$ is negative. At such a point $p_1 > p_2 > p_3 > 0$. The relations

$$\sum_{i=1}^{3} p_i = 1$$

and

$$\sum_{i=1}^{3} p_i \log p_i = -x$$

imply

$$dp_2 = -(dp_1 + dp_3)$$
$$dp_3 = \frac{\log p_1 - \log p_2}{\log p_2 - \log p_3}\, dp_1,$$

from which

$$dh_n(Tp) = -\alpha\, dp_1 \left[ \log q_1 - \log q_2 - \frac{\log p_1 - \log p_2}{\log p_2 - \log p_3}\,(\log q_2 - \log q_3) \right]. \qquad (10)$$
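The elimination behind (10) can be written out step by step. The derivation below is a sketch under one assumption that the surviving text does not show explicitly: the output probabilities depend affinely on the input probabilities with $dq_i = \alpha\, dp_i$, as the factor $\alpha$ in (10) suggests.

```latex
% Differentiate the two constraints; the second uses
% d(p log p) = (1 + log p) dp together with sum_i dp_i = 0.
\sum_{i=1}^{3} dp_i = 0, \qquad \sum_{i=1}^{3} \log p_i \, dp_i = 0 .

% Eliminate dp_2 = -(dp_1 + dp_3):
(\log p_1 - \log p_2)\, dp_1 - (\log p_2 - \log p_3)\, dp_3 = 0
\;\Longrightarrow\;
dp_3 = \frac{\log p_1 - \log p_2}{\log p_2 - \log p_3}\, dp_1 .

% Assumption: dq_i = \alpha\, dp_i, hence \sum_i dq_i = 0 and
dh_n(Tp) = -\sum_{i=1}^{3} \log q_i \, dq_i
         = -\alpha \left[ (\log q_1 - \log q_2)\, dp_1
                          - (\log q_2 - \log q_3)\, dp_3 \right] .

% Substituting the expression for dp_3 gives (10).
```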
We have $x = \varphi_n(\beta)$, $y = \varphi_n(\alpha\beta)$, where

$$\varphi_n(\beta) = -\Big(\beta + \frac{1 - \beta}{n}\Big) \log\Big(\beta + \frac{1 - \beta}{n}\Big) - (n - 1)\,\frac{1 - \beta}{n} \log \frac{1 - \beta}{n}$$
$$= \log n - \frac{1}{n}\Big[(1 + (n - 1)\beta) \log(1 + (n - 1)\beta) + (n - 1)(1 - \beta) \log(1 - \beta)\Big]$$
$$= \log n - \tfrac12 (n - 1)\beta^2 + \tfrac16 (n - 1)(n - 2)\beta^3 + O(\beta^4).$$

As $dx/d\beta = \varphi_n'(\beta)$ is negative, the second derivative of $g_T$ has the sign of

$$\frac{dy}{d\beta}\,\frac{d^2x}{d\beta^2} - \frac{dx}{d\beta}\,\frac{d^2y}{d\beta^2}.$$

For small $\beta$, this expression reduces to

$$-\tfrac12 (1 - \alpha)\alpha^2 (n - 1)^2 (n - 2)\beta^2 + O(\beta^3),$$

that is, $g_T$ is strictly concave in a neighborhood of $x = \log n$, so that the convex envelope $f_T$ will be affine in that region.

By the same argument as in Theorem 4 one obtains the following corollary.

Corollary 6.1: For $n > 2$, the channels for which $g_T$ is not convex cover a positive fraction of the volume of the polytope of $n \times n$ doubly stochastic channels.

For a single use of channel $T$, with $h_n(p) \ge x$, let $g_T^*(x)$ be the minimum input-output mutual information. If each column of $T$ has the same entropy $c$, then, for input variable $X$, output $Y$ one has $I(X; Y) = H(Y) - H(Y \mid X) = H(Y) - c$. Thus for such channels $g_T^*(x) = g_T(x) - c = g_T(x) - g_T(0)$. As this applies in particular to the Hamming channels one has the following corollary.

Corollary 6.2: There are channels for which $g_T^*$ is not convex.

REFERENCES
[1] V. Klee and M. Martin, “Semicontinuity of the face-function of a convex set (dedicated to Hugo Hadwiger on his sixtieth birthday),” Comment. Math. Helv., vol. 46, pp. 1-13, 1971.
[2] L. Mirsky, “Results and problems in the theory of doubly-stochastic matrices,” Z. Wahrscheinlichkeitstheorie, vol. 1, pp. 319-334, 1963.
[3] H. S. Witsenhausen, “A minimax control problem for sampled linear systems,” IEEE Trans. Automat. Contr., (Appendices), vol. AC-13, pp. 5-21, Jan. 1968.
[4] A. D. Wyner and J. Ziv, “A theorem on the entropy of certain binary sequences and applications, Parts I and II,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 769-777, Nov. 1973.
[5] P. P. Bergmans, “Random coding theorem for broadcast channels with degraded components,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 197-207, Mar. 1973.
[6] T. M. Cover, “Broadcast channels,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 2-14, Jan. 1972.
[7] A. Ostrowski, “Sur quelques applications des fonctions convexes et concaves au sens de I. Schur,” J. Math. Pures Appl., Ser. IX, vol. 31, pp. 253-293, 1952.
[8] A. D. Wyner, private commun.
[9] H. S. Witsenhausen and A. D. Wyner, in preparation.