IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-20, NO. 5, SEPTEMBER 1974
Entropy Inequalities for Discrete Channels

HANS S. WITSENHAUSEN, MEMBER, IEEE
Abstract-The sharp lower bound $f(x)$ on the per-symbol output entropy for a given per-symbol input entropy $x$ is determined for stationary discrete memoryless channels; it is the lower convex envelope of the bound $g(x)$ for a single channel use. The bounds agree for all noiseless channels and all binary channels. However, for nonbinary channels, $g$ is not generally convex so that the bounds differ. Such is the case for the Hamming channels that generalize the binary symmetric channels. The bounds are of interest in connection with multiple-user communication, as exemplified by Wyner's applications of "Mrs. Gerber's lemma" (the bound for binary symmetric channels first obtained by Wyner and Ziv). These applications extend from the binary symmetric case to the Hamming case. Doubly stochastic channels are characterized by the property of never decreasing entropy.

Manuscript received September 7, 1973; revised April 15, 1974.
The author is with the Bell Laboratories, Murray Hill, N.J. 07974.

I. INTRODUCTION

Let $T$ be the $m \times n$ matrix of transition probabilities of a stationary discrete memoryless channel with input (respectively, output) alphabet of cardinality $n$ (respectively, $m$). Thus $T$ is a stochastic matrix; i.e., $t_{ij} \ge 0$, $\sum_{i=1}^{m} t_{ij} = 1$.

Denote by $\Delta_n$ the probability simplex
$$\Delta_n = \{(p_1, \ldots, p_n) \in R^n \mid p_i \ge 0, \ \textstyle\sum_i p_i = 1\}$$
so that $p \in \Delta_n$ implies $Tp \in \Delta_m$.

Let $h_n : \Delta_n \to R$ denote the entropy function
$$h_n(p) = -\sum_{i=1}^{n} p_i \log p_i$$
and let
$$h(\theta) = -\theta \log \theta - (1 - \theta) \log (1 - \theta),$$
all logarithms in this paper being natural.

For any channel $T$ one may define $g_T : [0, \log n] \to R$ by
$$g_T(x) = \min_{p \in \Delta_n} \{h_m(Tp) \mid h_n(p) \ge x\} \tag{1}$$
which is equivalent to
$$g_T(x) = \min_{p \in \Delta_n} \{h_m(Tp) \mid h_n(p) = x\} \tag{2}$$
because the set $\{p \in \Delta_n \mid h_n(p) \ge x\}$ is compact and convex, and the minimum of the concave function $p \to h_m(Tp)$ can always be attained at an extreme point, where $h_n(p) = x$.

Clearly, $g_T(0) = \min_{1 \le i \le n} h_m(\mathrm{col}_i\, T)$ and $g_T(\log n) = h_m((1/n) T e_n)$, where $\mathrm{col}_i\, T$ denotes the $i$th column of $T$ and $e_n$ the $n$ vector of ones. The function $g_T$ gives the sharp lower bound on the output entropy for a given input entropy and a single use of the channel.

In a remarkable paper [4], Wyner and Ziv showed that for
$$T = \begin{pmatrix} 1-\delta & \delta \\ \delta & 1-\delta \end{pmatrix},$$
the binary symmetric channel, the function $g_T$ is convex. This enabled them to show that the same function gives the lower bound on the output entropy per symbol for a given per-symbol input entropy and blocks of arbitrary lengths and arbitrary joint input distribution. Several significant applications to multiple-user communication were made [4, pt. II].

In this paper, we consider general discrete stationary memoryless channels and show that a sharp bound for arbitrary blocks is always given by the function $f_T$ defined as follows. For $\lambda \ge 0$ let
$$\phi(\lambda) = \min_{p \in \Delta_n} \{h_m(Tp) - \lambda h_n(p)\} \tag{3}$$
and for $0 \le x \le \log n$ let
$$f_T(x) = \sup_{\lambda \ge 0} \{\lambda x + \phi(\lambda)\}. \tag{4}$$
If and only if $g_T$ is convex it coincides with $f_T$. It is shown that $g_T$ is always convex for $m = n = 2$ and also for all noiseless channels. For any $(m,n)$ with $m > 1$, $n > 1$, $m + n > 4$, there are $m \times n$ channels $T$ with $g_T$ not convex.

For doubly stochastic channels ($m = n$, $Te_n = e_n$), the output entropy cannot be less than the input entropy, a property characterizing these channels. A particularly simple class of doubly stochastic channels are the Hamming channels, $T = \alpha I_n + (1 - \alpha) e_n e_n'/n$, which generalize the binary symmetric channel. It turns out that only for $n = 2$ do these channels yield a convex $g_T$.

The results of this paper, besides throwing light on the action of channels upon entropy, are of interest in extending the applications to multiple-user communication in [4]. Both for the degraded broadcast channels considered by Cover [6] and Bergmans [5], and for the problem of source coding with side information, the results of Wyner in [4] generalize from the binary symmetric case to the Hamming case, using convex envelope formation. The details, together with other results, will be contained in a future publication [9].

II. THE GENERAL SHARP BOUND

The immediate properties of $g_T$ and $f_T$ are summarized in the following lemma. Define
$$S_T = \{(h_n(p), h_m(Tp)) \mid p \in \Delta_n\}.$$
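As a rough numerical illustration of the definitions in the Introduction, the sketch below evaluates $g_T$ by (1) and the envelope $f_T$ by brute force on a grid for a binary symmetric channel. It assumes $\phi(\lambda) = \min_p [h_m(Tp) - \lambda h_n(p)]$ and $f_T(x) = \sup_{\lambda \ge 0} [\lambda x + \phi(\lambda)]$, which is how the proof of Lemma 1 reads (3) and (4). The crossover value 0.1, the grid sizes, and all function names are illustrative choices, not part of the paper.

```python
import math


def h2(p):
    """Binary entropy in nats, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def bsc_output(p, delta):
    """Output probability q for input probability p on a BSC with crossover delta."""
    return (1.0 - delta) * p + delta * (1.0 - p)


def g_and_f_on_grid(delta, n_p=401, n_lam=401, lam_max=1.0):
    """Grid approximations of g_T (definition (1)) and f_T (assumed (3)-(4))."""
    ps = [i / (n_p - 1) for i in range(n_p)]
    xs = [h2(p) for p in ps]                      # input entropies h_n(p)
    ys = [h2(bsc_output(p, delta)) for p in ps]   # output entropies h_m(Tp)
    lams = [lam_max * k / (n_lam - 1) for k in range(n_lam)]
    # Assumed (3): phi(lam) = min_p [h_m(Tp) - lam * h_n(p)]
    phi = [min(y - lam * x for x, y in zip(xs, ys)) for lam in lams]
    g, f = [], []
    for xj in xs:
        # (1): g(x) = min { h_m(Tp) : h_n(p) >= x }
        g.append(min(y for x, y in zip(xs, ys) if x >= xj))
        # Assumed (4): f(x) = sup over lam >= 0 of [lam * x + phi(lam)]
        f.append(max(lam * xj + ph for lam, ph in zip(lams, phi)))
    return xs, g, f


xs, g, f = g_and_f_on_grid(delta=0.1)
# Largest gap between the single-use bound and its envelope on the grid.
max_gap = max(gv - fv for gv, fv in zip(g, f))
```

For this channel the gap stays at the level of the grid resolution, in line with the statement that $g_T$ is convex for binary channels. The slope range `lam_max = 1.0` is enough here only because the slopes of $g_T$ for this particular channel stay below 1; other channels may need a wider range.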
Denote the convex hull of $S_T$ by $\mathrm{cv}\, S_T$. As continuous image of $\Delta_n$, $S_T$ is compact which implies compactness of $\mathrm{cv}\, S_T$.

If $g$ is a real function on a convex domain, the convex envelope $f = \mathrm{env}\, g$ of $g$ is defined to be the supremum of all affine (linear plus a constant) functions which do not exceed $g$ at any point of the domain (see, e.g., [3]). Then $f$ is also the supremum of all (lower semicontinuous) convex functions not exceeding $g$.

Lemma 1:
a) $g_T$ is nondecreasing
b) $g_T$ is continuous
c) $g_T(x) = \min \{y \mid (x,y) \in S_T\}$
d) $f_T = \mathrm{env}\, g_T$
e) $f_T(x) = \min \{y \mid (x,y) \in \mathrm{cv}\, S_T\}$
f) $f_T \le g_T$ pointwise
g) $f_T$ is convex
h) $f_T = g_T$ if and only if $g_T$ is convex
i) $f_T$ is continuous
j) $f_T(0) = g_T(0)$ and $f_T(\log n) = g_T(\log n)$
k) for $0 \le x < \log n$ the supremum in (4) is a maximum
l) $f_T$ is nondecreasing.

Proof: For $x_1 \le x_2$ one has $\{p \in \Delta_n \mid h_n(p) \ge x_1\} \supseteq \{p \in \Delta_n \mid h_n(p) \ge x_2\}$ so that, by (1), $g_T(x_1) \le g_T(x_2)$, establishing a). As $x$ varies in $[0, \log n]$ the set $\{p \in \Delta_n \mid h_n(p) \ge x\}$ varies continuously in Hausdorff set distance. Since $h_m$ is uniformly continuous b) follows. Assertion c) is equivalent to (2). By (3), $\phi(\lambda)$ is the intercept at the origin of the line of slope $\lambda$ supporting $S_T$ from below. That is, $x \to \lambda x + \phi(\lambda)$ is the highest affine function of slope $\lambda$ not exceeding $g_T$ on $[0, \log n]$. By (4), $f_T$ is the supremum of all affine functions not exceeding $g_T$; negative slopes need not be considered because $g_T$ is nondecreasing. This establishes d). Then e), f), g), and h) are general properties of convex envelopes, while i) and j) follow from the trivial one-dimensional specialization of the theorem [1] which states that the envelope of a continuous function on a strictly convex compact set is continuous and agrees with the given function on the boundary. For $x = 0$ the maximum in (4) is attained for $\lambda = 0$. For $0 < x < \log n$ a support line to $\mathrm{cv}\, S_T$ at $(x, f_T(x))$ cannot be vertical, hence it has finite slope $\lambda$ which furnishes the maximum in (4). However, at $x = \log n$ it is possible that the only support line be vertical. This establishes k), while l) follows from (4) which represents $f_T$ as a supremum of nondecreasing functions.

The direct part of the following theorem and its corollary are generalizations of the arguments used by Wyner and Ziv in [4].

Theorem 1: For a discrete stationary memoryless channel of $m$ by $n$ transition matrix $T$ and for $0 \le x \le \log n$, the infimum over all $K > 0$ and all joint distributions of input sequences $(X_1, \ldots, X_K)$ with $(1/K) H(X_1, \ldots, X_K) \ge x$ of $(1/K) H(Y_1, \ldots, Y_K)$, where $(Y_1, \ldots, Y_K)$ denotes the corresponding output, is $f_T(x)$.

Proof: Denote $(X_1, \ldots, X_i)$ by $X_1^i$ and $(Y_1, \ldots, Y_i)$ by $Y_1^i$. Since the channel is memoryless one has
$$P(Y_k = y, X_1^{k-1} = \xi) = \sum_{x=1}^{n} P(Y_k = y \mid X_k = x) P(X_k = x, X_1^{k-1} = \xi) = \sum_{x=1}^{n} T_{yx} P(X_k = x, X_1^{k-1} = \xi);$$
dividing by $P(X_1^{k-1} = \xi)$ when nonzero, one has for almost all $\xi$,
$$P(Y_k = y \mid X_1^{k-1} = \xi) = \sum_{x=1}^{n} T_{yx} P(X_k = x \mid X_1^{k-1} = \xi)$$
and by the inequality $h_m(Tp) \ge g_T(h_n(p)) \ge f_T(h_n(p))$,
$$H(Y_k \mid X_1^{k-1} = \xi) \ge f_T(H(X_k \mid X_1^{k-1} = \xi))$$
and since $f_T$ is convex, taking the expectation over $\xi$ gives
$$\begin{aligned} H(Y_k \mid X_1^{k-1}) &= E_\xi \{H(Y_k \mid X_1^{k-1} = \xi)\} \\ &\ge E_\xi \{f_T(H(X_k \mid X_1^{k-1} = \xi))\} \\ &\ge f_T(E_\xi \{H(X_k \mid X_1^{k-1} = \xi)\}) \\ &= f_T(H(X_k \mid X_1^{k-1})). \end{aligned} \tag{5}$$
By the chain rule for entropy, one has
$$\begin{aligned} \frac{1}{K} H(Y_1^K) &= \frac{1}{K} \sum_{k=1}^{K} H(Y_k \mid Y_1^{k-1}) \\ &\ge \frac{1}{K} \sum_{k=1}^{K} H(Y_k \mid Y_1^{k-1}, X_1^{k-1}) && \text{(conditioning reduces entropy)} \\ &= \frac{1}{K} \sum_{k=1}^{K} H(Y_k \mid X_1^{k-1}) && \text{($Y_k$, $Y_1^{k-1}$ are conditionally independent given $X_1^{k-1}$)} \\ &\ge \frac{1}{K} \sum_{k=1}^{K} f_T(H(X_k \mid X_1^{k-1})) && \text{(by (5))} \\ &\ge f_T\Big(\frac{1}{K} \sum_{k=1}^{K} H(X_k \mid X_1^{k-1})\Big) && \text{(convexity of $f_T$)} \\ &= f_T\Big(\frac{1}{K} H(X_1^K)\Big) \end{aligned} \tag{6}$$
which, since $f_T$ is nondecreasing, shows that the infimum cannot be less than $f_T(x)$.

Conversely, consider a fixed $x$. If $f_T(x) = g_T(x)$ then equality is achievable in (6) already for $K = 1$, by definition of $g_T$. If $f_T(x) < g_T(x)$, then the point $(x, f_T(x))$ is a convex combination of two points $(x_1, g_T(x_1))$, $(x_2, g_T(x_2))$ of the graph of $g_T$, corresponding to probability vectors $p_1$ and $p_2$. One has $x = \theta x_1 + (1 - \theta) x_2$. If $\theta$ is rational, say $\theta = r/s$, then an input sequence $(X_1, \ldots, X_s)$ with independent $X_i$ of which $r$ have distribution $p_1$ and $s - r$ distribution $p_2$
yields the point $(x, f_T(x))$, i.e., achieves equality in (6). If $\theta$ is irrational, then for small $\varepsilon > 0$,
$$x = \theta_\varepsilon x_1 + (1 - \theta_\varepsilon)(x_2 - \varepsilon)$$
with
$$\theta_\varepsilon = \frac{\theta (x_2 - x_1) - \varepsilon}{(x_2 - x_1) - \varepsilon}$$
and
$$y_\varepsilon = \theta_\varepsilon g_T(x_1) + (1 - \theta_\varepsilon) g_T(x_2 - \varepsilon)$$
is achievable as indicated when $\theta_\varepsilon$ is rational. As $\varepsilon \to 0$, $y_\varepsilon \to f_T(x)$ and since $\theta_\varepsilon \to \theta$ through an interval, in which rationals are dense, the infimum cannot be greater than $f_T(x)$, completing the proof.

Corollary: If $W$ is conditionally independent of $Y_1^K$ given $X_1^K$ then
$$\frac{1}{K} H(Y_1^K \mid W) \ge f_T\Big(\frac{1}{K} H(X_1^K \mid W)\Big).$$

Proof: Conditional on $W = w$, Theorem 1 gives
$$\frac{1}{K} H(Y_1^K \mid W = w) \ge f_T\Big(\frac{1}{K} H(X_1^K \mid W = w)\Big)$$
since the conditional distribution of $Y_1^K$ given $W = w$ results from that of $X_1^K$ given $W = w$ through the action of the channel. Taking the expectation over $w$ and using the convexity of $f_T$ yields the required inequality.

III. CHANNELS FOR WHICH g IS CONVEX

If and only if $g_T$ is convex, the bound for a single channel use coincides with the bound for multiple uses. In addition, $g$ can be determined by a single extremization for each $x$, while the determination of $f$ involves a sequence of two parametric extremizations. It is, therefore, of interest to determine whether $g_T$ is necessarily convex for certain classes of channels.

Theorem 2: For all binary channels, $g_T$ is convex.

Proof: For a binary channel one has
$$T = \begin{pmatrix} a & b \\ 1-a & 1-b \end{pmatrix}$$
with $a, b \in [0,1]$. Swapping the outputs if necessary, one may assume
$$a + b \le 1. \tag{7}$$
Swapping the inputs if necessary, one may assume $a \ge b$. For $a = b$ one has $g_T(x) = h(a)$ while for $a - b = 1$ one has $a = 1$, $b = 0$, $T = I$, hence $g_T(x) = x$. Assume
$$0 < a - b < 1. \tag{8}$$
The relation
$$\begin{pmatrix} q \\ 1-q \end{pmatrix} = T \begin{pmatrix} p \\ 1-p \end{pmatrix}$$
reduces to $q = ap + b(1 - p)$. The set $\{p \mid h(p) \ge x\} \subset [0,1]$ is an interval symmetric about $1/2$; thus its image under $T$ is an interval symmetric about $(a + b)/2$, i.e., is of the form $[((a + b)/2) - \delta, ((a + b)/2) + \delta]$, with $\delta \ge 0$. By concavity of $h$, the minimum of $h(q)$ is attained at an endpoint of this interval. By (7), the lower endpoint is furthest away from $1/2$, hence furnishes the minimum. By (8) this point corresponds to the lower endpoint of $\{p \mid h(p) \ge x\}$.

Thus the function $g(x) = \min \{h(q) \mid h(p) \ge x\}$ is described parametrically by
$$0 \le p \le \tfrac{1}{2}, \qquad q = ap + b(1 - p), \qquad x = h(p), \qquad y = h(q).$$
Since $h(p)$ is increasing on $[0, 1/2]$, the sign of the second derivative of $g$ is the sign of the determinant
$$\begin{vmatrix} \dfrac{dx}{dp} & \dfrac{dy}{dp} \\[2mm] \dfrac{d^2x}{dp^2} & \dfrac{d^2y}{dp^2} \end{vmatrix}.$$
One has
$$\frac{dx}{dp} = \log \frac{1-p}{p}, \qquad \frac{d^2x}{dp^2} = -\frac{1}{p(1-p)},$$
$$\frac{dy}{dp} = (a - b) \log \frac{1-q}{q}, \qquad \frac{d^2y}{dp^2} = -\frac{(a-b)^2}{q(1-q)}.$$
Expansion shows that the determinant has the sign of
$$q(1 - q) \log \frac{1-q}{q} - (a - b)\, p(1 - p) \log \frac{1-p}{p}. \tag{9}$$
The function $\psi(p) = p(1 - p) \log ((1 - p)/p)$ has second derivative $\psi''(p) = -2 \log ((1 - p)/p) - (1 - 2p)/[p(1 - p)]$. Thus for $p \in [0, 1/2]$, $\psi(p) \ge 0$ and $\psi''(p) \le 0$, i.e., $\psi$ is nonnegative and concave. Now
$$q = ap + b(1 - p) = (a - b) p + (1 - (a - b)) \frac{b}{1 - (a - b)}.$$
By (8), $a - b \in (0,1)$ and by (7), $b/[1 - (a - b)] \in [0, 1/2]$. By concavity of $\psi$,
$$\psi(q) \ge (a - b) \psi(p) + (1 - (a - b))\, \psi\Big(\frac{b}{1 - (a - b)}\Big) \ge (a - b) \psi(p)$$
which shows that the expression (9) is nonnegative. Hence $g$ is convex as claimed.

Remark: This provides an independent derivation, avoiding series, of "Mrs. Gerber's Lemma" [4]. On the other hand, Theorem 2 can be derived from that lemma by interpreting an arbitrary binary channel as a cascade of a symmetric channel and a Z channel [8].
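The key step in the proof of Theorem 2 is that expression (9), i.e. $\psi(q) - (a - b)\psi(p)$ with $q = ap + b(1 - p)$, is nonnegative on $0 \le p \le 1/2$. A minimal numerical check of that claim is sketched below; the two sample channels (a binary symmetric channel with crossover 0.1 and a Z-type channel), the grid size, and the function names are illustrative assumptions.

```python
import math


def psi(p):
    """psi(p) = p (1 - p) log((1 - p) / p), extended by psi(0) = psi(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * (1.0 - p) * math.log((1.0 - p) / p)


def expression_9(a, b, p):
    """Expression (9): psi(q) - (a - b) psi(p), where q = a p + b (1 - p)."""
    q = a * p + b * (1.0 - p)
    return psi(q) - (a - b) * psi(p)


def min_expression_9(a, b, n_grid=2000):
    """Smallest value of (9) over a uniform grid of p in [0, 1/2]."""
    # Normalizations (7) and (8) from the proof (small float tolerance on (7)).
    assert a + b <= 1.0 + 1e-12 and 0.0 < a - b < 1.0
    return min(expression_9(a, b, 0.5 * i / n_grid) for i in range(n_grid + 1))


# Binary symmetric channel with crossover 0.1, written as a = 0.9, b = 0.1.
bsc_min = min_expression_9(0.9, 0.1)
# A Z-type channel: a = 0.7, b = 0.
z_min = min_expression_9(0.7, 0.0)
```

Both minima come out nonnegative up to rounding, which is what the concavity argument for $\psi$ predicts. For the symmetric channel the minimum is reached at $p = 1/2$, where (9) vanishes.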
Theorem 3: For all noiseless channels, $g_T$ is convex.

Proof: Suppose there are $m$ outputs after outputs of zero probability have been discarded without loss of generality. Then the $n$ inputs can be partitioned into $m$ sets each consisting of the $k_i$ ($\ge 1$) inputs mapping into output $i$. A choice of input probability vector $p$ is equivalent to the independent choice of, first, the total probability $q_i$ of the $i$th group, with $q \in \Delta_m$ and, second, of $m$ conditional probability vectors $\pi_i \in \Delta_{k_i}$, one for each group. The corresponding output probability vector is $q$. One has
$$n = \sum_{i=1}^{m} k_i$$
and the set $S_T$ is described by
$$x = h_n(p) = h_m(q) + \sum_{i=1}^{m} q_i h_{k_i}(\pi_i), \qquad y = h_m(q).$$
As the $\pi_i$ vary in $\Delta_{k_i}$, each $h_{k_i}(\pi_i)$ varies over $[0, \log k_i]$. The points of $S_T$ corresponding to each $q$ form the set $\{(x,y) \mid y = h_m(q), \ y \le x \le y + \sum_{i=1}^{m} q_i \log k_i\}$. The intersection of $S_T$ with the line $y = y_0 \in [0, \log m]$ is, therefore, an interval of $x$ defined by
$$y_0 \le x \le \max_{h_m(q) = y_0} \Big[ h_m(q) + \sum_{i=1}^{m} q_i \log k_i \Big] = y_0 + \psi(y_0)$$
where
$$\psi(y) \equiv \max_{h_m(q) = y} \sum_{i=1}^{m} q_i \log k_i.$$
One has
$$\psi(y) = \max_{h_m(q) \ge y} \sum_{i=1}^{m} q_i \log k_i$$
because the maximum of a linear function on the compact convex set $\{q \mid h_m(q) \ge y\}$ is always attained at some extreme point, where $h_m(q) = y$. Hence $\psi$ is nonincreasing. Suppose
$$\psi(y_1) = \sum_i q_i^{(1)} \log k_i, \quad h_m(q^{(1)}) = y_1, \qquad \psi(y_2) = \sum_i q_i^{(2)} \log k_i, \quad h_m(q^{(2)}) = y_2.$$
For $\theta \in [0,1]$, by concavity of $h_m$,
$$h_m(\theta q^{(1)} + (1 - \theta) q^{(2)}) \ge \theta y_1 + (1 - \theta) y_2$$
hence
$$\psi(\theta y_1 + (1 - \theta) y_2) = \max_{h_m(q) \ge \theta y_1 + (1 - \theta) y_2} \sum_i q_i \log k_i \ge \sum_i (\theta q_i^{(1)} + (1 - \theta) q_i^{(2)}) \log k_i = \theta \psi(y_1) + (1 - \theta) \psi(y_2)$$
that is, $\psi$ is concave. This implies that the set
$$S_T = \{(x,y) \mid 0 \le y \le \log m, \ y \le x \le y + \psi(y)\}$$
is convex. Therefore the function $g_T$, whose graph is the lower boundary of $S_T$, is convex as claimed.

IV. DEPARTURES FROM CONVEXITY

There are, however, many channels for which $g_T$ is not convex.

Lemma 2: The function $g_T$ is not convex for the $n$-input 2-output channel
$$T = \begin{pmatrix} 1 & \varepsilon & \varepsilon & \cdots & \varepsilon \\ 0 & 1-\varepsilon & 1-\varepsilon & \cdots & 1-\varepsilon \end{pmatrix}$$
with $\varepsilon$ = [value illegible in source] and $n > 2$.

Proof: The input probability vector $p$ may be written
$$p = (1 - \theta, \theta \pi_1, \ldots, \theta \pi_{n-1}), \qquad \theta \in [0,1], \quad \pi \in \Delta_{n-1}.$$
It yields the output probabilities
$$q = (1 - \theta + \varepsilon \theta, \ (1 - \varepsilon) \theta)$$
and the entropies
$$h_n(p) = h(\theta) + \theta h_{n-1}(\pi), \qquad h_2(q) = h((1 - \varepsilon) \theta).$$
For each fixed $\theta$, the range of $h_n(p)$ as $\pi$ varies is $[h(\theta), h(\theta) + \theta \log (n - 1)]$. Thus
$$S_T = \bigcup_{0 \le \theta \le 1} \{(x, h((1 - \varepsilon) \theta)) \mid h(\theta) \le x \le h(\theta) + \theta \log (n - 1)\}.$$
For $0 \le y < h(\varepsilon)$, the equation $h(\alpha) = y$ has two solutions $\alpha_1 \in [0, \varepsilon)$ and $\alpha_2 \in (1 - \varepsilon, 1]$. Hence the equation
$$h((1 - \varepsilon) \theta) = y$$
has only one solution in $[0,1]$, namely, $\theta = \alpha_1/(1 - \varepsilon) \in [0, \varepsilon/(1 - \varepsilon))$, since $\alpha_2$ would require $\theta > 1$. Therefore, the points $(x,y)$ of $S_T$ with $0 \le y < h(\varepsilon)$ satisfy $x \le h(\theta) + \theta \log (n - 1)$ for some $\theta \in [0, \varepsilon/(1 - \varepsilon)]$ and a fortiori
$$x < h\Big(\frac{\varepsilon}{1 - \varepsilon}\Big) + \frac{\varepsilon}{1 - \varepsilon} \log (n - 1) = x_0.$$
This implies that
$$g_T(x) \ge h(\varepsilon), \qquad \text{for } x_0 \le x \le \log n.$$
The choice $\theta = 1$, $\pi \in \Delta_{n-1}$ generates points of $S_T$ with $y = h(\varepsilon)$ and all $x$ in $[0, \log (n - 1)]$. Therefore,
$$g_T(x) \le h(\varepsilon), \qquad \text{for } 0 \le x \le \log (n - 1).$$
For the stated value of $\varepsilon$ one has $x_0 < \log (n - 1)$, for all $n > 2$. Hence $g_T(x) = h(\varepsilon)$ for $x_0 \le x \le \log (n - 1)$. On the other hand, for $\theta = 0$ the point $(0,0)$ is in $S_T$, that is $g_T(0) = 0$. This shows that the function $g_T$ is not convex.

Lemma 3: The function $g_T$ is not convex for the 2-input 3-output channel
$$T = \begin{pmatrix} 1-a & 0 \\ a/2 & 1/2 \\ a/2 & 1/2 \end{pmatrix}$$
with $a$ = [value illegible in source].
Proof: For input probabilities $p = (\theta, 1 - \theta)$ one has
$$q = Tp = \big(\theta (1 - a), \ \tfrac{1}{2} a \theta + \tfrac{1}{2}(1 - \theta), \ \tfrac{1}{2} a \theta + \tfrac{1}{2}(1 - \theta)\big)$$
$$h_2(p) = h(\theta)$$
$$h_3(q) = k(\theta) = h((1 - a) \theta) + (1 - (1 - a) \theta) \log 2.$$
Let $h^{-1} : [0, \log 2] \to [0, 1/2]$ be the inverse of the restriction of $h$ to the domain $[0, 1/2]$. Then $h_2(p) = x$ implies $\theta \in \{h^{-1}(x), 1 - h^{-1}(x)\}$. Define $\psi$ on $[0, 1/2]$ by
$$\psi(\alpha) = \min (k(\alpha), k(1 - \alpha)).$$
Then
$$g_T(x) = \min_{h_2(p) = x} h_3(q) = \psi(h^{-1}(x)).$$
For the stated value of $a$, there is a point $\alpha^* \in (0, 1/2)$ where $k(\alpha^*) = k(1 - \alpha^*)$ while the derivatives of these functions are different at $\alpha^*$. Then $\psi$ has at $\alpha^*$ a derivative to the left strictly greater than the derivative to the right. As $h^{-1}$ is smooth and has positive derivative, the composite function $g_T$ has the same property at $x^* = h(\alpha^*)$. Hence $g_T$ is not convex.

Let $C_{m,n}$ be the set of all $m \times n$ stochastic matrices, i.e., the set of channels with $n$ inputs and $m$ outputs. $C_{m,n}$ is the product of $n$ copies of $\Delta_m$, corresponding to the independent choice of the columns. Thus $C_{m,n}$ is a convex polytope in $R^{mn}$, of dimension $(m - 1)n$. Its extreme points are the noiseless channels. $C_{m,n}$ has some positive $(m - 1)n$-dimensional Lebesgue measure, i.e., volume.

Lemma 4: The set of channels in $C_{m,n}$ for which $g_T$ is not convex is relatively open in $C_{m,n}$. Hence either 1) this set is empty, or 2) the set has positive volume.

Proof: It suffices to show that the set of channels for which $g_T$ is convex is a closed set. For any fixed $x$ the set $\{p \in \Delta_n \mid h_n(p) \ge x\}$ is compact. The function $(T,p) \to h_m(Tp)$ is jointly continuous in $T$ and $p$. Hence the minimum $g_T(x)$ of $h_m(Tp)$ is continuous in $T$ for each fixed $x$. If $T_\nu \to T$ elementwise, then $g_{T_\nu} \to g_T$ pointwise. The pointwise limit of convex functions is convex, establishing the lemma.

Theorem 4: For $m > 1$, $n > 1$, $m + n > 4$ the channels for which $g_T$ is not convex represent a positive fraction of the volume of $C_{m,n}$.

Proof: If rows of zeros are added to a stochastic matrix $T$, the set $S_T$ and a fortiori the function $g_T$ remain unchanged, since the additional outputs have identically zero probability. By Lemma 4 it is sufficient to exhibit a single channel in each $C_{m,n}$ to establish the claim. If $n = 2$, $m > 2$ such a channel is obtained by adding $m - 3$ rows of zeros to the example of Lemma 3. If $n > 2$, $m \ge 2$ such a channel is obtained by adding $m - 2$ rows of zeros to the $2 \times n$ matrix of Lemma 2.

V. DOUBLY STOCHASTIC CHANNELS

An $n \times n$ transition matrix $T$ (or the channel it defines) is doubly stochastic if $Te_n = e_n$, where $e_n$ is the $n$ vector of ones; that is, if it has unit row and column sums.

The following lemma characterizes the doubly stochastic channels among all $n \times n$ channels in terms of an entropy inequality.

Lemma 5: For a channel with $n \times n$ transition matrix $T$ the following are equivalent:
a) $T$ is doubly stochastic
b) $h_n(Tp) \ge h_n(p)$ for $p \in \Delta_n$
c) $g_T(x) \ge x$, for $x \in [0, \log n]$
d) $f_T(x) \ge x$, for $x \in [0, \log n]$
e) $H(Y_1, \ldots, Y_k) \ge H(X_1, \ldots, X_k)$, for all $k > 0$, all joint distributions of input sequence $(X_1, \ldots, X_k)$, with $(Y_1, \ldots, Y_k)$ the corresponding output.

Proof: a) $\Rightarrow$ b): if $T$ is doubly stochastic then by Birkhoff's theorem [2], it is a convex combination of permutation matrices, i.e.,
$$T = \sum_i \theta_i P_i, \qquad \theta_i \ge 0, \qquad \sum_i \theta_i = 1,$$
$P_i$ the permutation matrices. Since $h_n$ is a concave and symmetric function of its arguments one has
$$h_n(Tp) = h_n\Big(\sum_i \theta_i P_i p\Big) \ge \sum_i \theta_i h_n(P_i p) = \sum_i \theta_i h_n(p) = h_n(p).$$
This property of the function $h_n$ is known as "concavity in the sense of I. Schur" [7].
b) $\Rightarrow$ c): by (1).
c) $\Rightarrow$ d): the function $x$ is an affine function nowhere exceeding $g_T$, hence it cannot exceed the envelope $f_T$ of $g_T$, which is the supremum of all such functions.
d) $\Rightarrow$ e): by Theorem 1.
e) $\Rightarrow$ a): apply e) with $k = 1$, $X_1$ uniformly distributed, to obtain $h_n((1/n) Te_n) \ge h_n((1/n) e_n) = \log n$, hence
$$h_n\Big(\frac{1}{n} Te_n\Big) = \log n$$
requiring $(1/n) Te_n = (1/n) e_n$ or $Te_n = e_n$.

That doubly stochastic channels are characterized by the fact that they never decrease entropy is valid without the assumption that the channel is stationary.

Theorem 5: For a memoryless channel with $n_t \times n_t$ transition matrix $T_t$ at time $t = 0, 1, 2, \ldots$ the following are equivalent: a) for $k \ge 1$, $t_1 < t_2 < \cdots < t_k$, inputs $X = (X_{t_1}, \ldots, X_{t_k})$ of arbitrary joint distribution and $Y = (Y_{t_1}, \ldots, Y_{t_k})$ the corresponding outputs, one has $H(Y) \ge H(X)$. b) $T_t$ is doubly stochastic for all $t$.

Proof: a) $\Rightarrow$ b): apply Lemma 5 with $k = 1$, $t_1 = t$ to conclude that $T_t$ is doubly stochastic.
b) $\Rightarrow$ a): the distribution of $Y$ follows from that of $X$ through the transition matrix $T_{t_1} \times T_{t_2} \times \cdots \times T_{t_k}$ (Kronecker product) which is doubly stochastic since all the factors are. Then a) follows by applying Lemma 5 to a single use of this product channel.
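The implication a) $\Rightarrow$ b) of Lemma 5 can be spot-checked numerically. The sketch below builds a $3 \times 3$ doubly stochastic matrix as a random convex combination of permutation matrices (the Birkhoff form used in the proof) and compares $h_n(Tp)$ with $h_n(p)$ on random inputs. It also evaluates a stochastic but not doubly stochastic matrix at the uniform input, the case used in e) $\Rightarrow$ a). The seed, sample count, and all names are illustrative assumptions.

```python
import itertools
import math
import random


def entropy(p):
    """Entropy in nats of a probability vector, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def apply_channel(T, p):
    """Output distribution q = T p, where T[i][j] = P(output i | input j)."""
    return [sum(T[i][j] * p[j] for j in range(len(p))) for i in range(len(T))]


def random_doubly_stochastic(n, rng):
    """Random convex combination of all n! permutation matrices."""
    perms = list(itertools.permutations(range(n)))
    weights = [rng.random() for _ in perms]
    total = sum(weights)
    T = [[0.0] * n for _ in range(n)]
    for w, perm in zip(weights, perms):
        for j, i in enumerate(perm):   # this permutation sends input j to output i
            T[i][j] += w / total
    return T


def random_simplex_point(n, rng):
    """A random probability vector (not uniformly distributed on the simplex)."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


rng = random.Random(0)
T = random_doubly_stochastic(3, rng)
# Smallest observed value of h_n(Tp) - h_n(p); Lemma 5 b) says it is never negative.
worst = min(
    entropy(apply_channel(T, p)) - entropy(p)
    for p in (random_simplex_point(3, rng) for _ in range(1000))
)

# Column-stochastic but not doubly stochastic: both inputs go to output 0.
T_not_ds = [[1.0, 1.0], [0.0, 0.0]]
drop = entropy(apply_channel(T_not_ds, [0.5, 0.5])) - entropy([0.5, 0.5])
```

Here `worst` stays nonnegative up to rounding, while `drop` equals $-\log 2$: the uniform input already exposes a channel that is not doubly stochastic.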
Finally, one can characterize, among the doubly stochastic channels, those for which the inequalities c) or d) of Lemma 5 are sharp.

Lemma 6: For an $n$-input $n$-output channel $T$, the following are equivalent:
a) $T$ is doubly stochastic and contains a unit entry.
b) $g_T(x) = x$, for $x \in [0, \log n]$
c) $f_T(x) = x$, for $x \in [0, \log n]$.

Proof: a) $\Rightarrow$ b): if $T$ is doubly stochastic and $t_{ij} = 1$, then all other entries in the $i$th row and $j$th column are zeros and removing these leaves an $(n - 1) \times (n - 1)$ doubly stochastic matrix. Hence for the input probability vector $p$ with $p_j = 1 - \theta$ and all other components $\theta/(n - 1)$ one obtains the output probability vector $q$ with $q_i = 1 - \theta$ and all other components $\theta/(n - 1)$. Varying $\theta$ from 0 to $1 - 1/n$, all points with $0 \le x = y \le \log n$ are seen to belong to $S_T$. By (2), $g_T(x) \le x$ while by Lemma 5, $g_T(x) \ge x$.
b) $\Rightarrow$ c): since $g_T$ is convex it agrees with $f_T$.
c) $\Rightarrow$ a): by Lemma 5, $T$ is doubly stochastic. For $x = 0$ one has $f_T(0) = 0$ and by Lemma 1, $g_T(0) = 0$ or
$$\min_{1 \le i \le n} h_n(\mathrm{col}_i\, T) = 0.$$
Therefore, at least one column of $T$ is a basis vector, establishing a).

VI. HAMMING CHANNELS

A natural generalization of the class of binary symmetric channels is the class of Hamming channels, defined by the $n \times n$ transition matrices
$$T = T_\alpha = \alpha I_n + (1 - \alpha) \frac{1}{n} e_n e_n'$$
with $0 \le \alpha \le 1$.

For $-1/(n - 1) \le \alpha < 0$, $T_\alpha$ is still a stochastic matrix but as the diagonal elements are then smaller than the off-diagonal ones (which can be undone by permutation for $n = 2$ but not for $n > 2$) its properties are somewhat different and the present treatment will assume $\alpha \ge 0$. Note also that $T_\alpha T_\beta = T_{\alpha\beta}$.

For a Hamming channel the relation $q = Tp$ becomes $q = \alpha p + (1 - \alpha)(1/n) e_n$, that is $q_i = \alpha p_i + (1 - \alpha)/n$. Now consider the minimization defining $g_T$ for such channels.

Lemma 7: For an $n \times n$ Hamming channel $T$ and $0 < x \le \log n$, the minimum of $h_n(Tp)$ subject to $h_n(p) = x$ is attained for $p = \beta b + (1 - \beta)(1/n) e_n$, where $b$ is a basis vector and $0 \le \beta \le 1$.

Proof: The assertion is clear for $x = 0$ and $x = \log n$. First consider $n = 3$. As $h_n(p)$ and $h_n(Tp)$ are both invariant under permutations, one may assume $p_1 \ge p_2 \ge p_3$. These relations together with $h_n(p) = x$ ($0 < x < \log n$) confine $p$ to an arc of curve in $\Delta_3$. If $x < \log 2$, the endpoints are characterized by $p_1 > p_2$, $p_3 = 0$, and $p_1 > p_2 = p_3$. If $x \ge \log 2$ the endpoints are characterized by $p_1 = p_2 > p_3$ and $p_1 > p_2 = p_3$. In either case $p_1$ varies continuously and monotonically along the arc. To show that the last named endpoint yields the minimum of $h_n(Tp)$ it suffices to show that at any point of the arc between the endpoints $dh_n(Tp)/dp_1$ is negative. At such a point $p_1 > p_2 > p_3 > 0$. The relations
$$\sum_{i=1}^{3} p_i = 1$$
and
$$\sum_{i=1}^{3} p_i \log p_i = -x$$
imply
$$dp_2 = -(dp_1 + dp_3)$$
$$dp_3 = \frac{\log p_1 - \log p_2}{\log p_2 - \log p_3}\, dp_1$$
from which
$$\frac{dh_n(Tp)}{dp_1} = -\alpha (\log p_1 - \log p_2) \left[ \frac{\log q_1 - \log q_2}{\log p_1 - \log p_2} - \frac{\log q_2 - \log q_3}{\log p_2 - \log p_3} \right] \tag{10}$$
where $\log q_i = \phi(\log p_i)$ with $\phi(t) = \log [\alpha e^t + (1 - \alpha)/n]$.

Since $\phi''(t) = (\alpha/n)(1 - \alpha) e^t [\alpha e^t + (1 - \alpha)/n]^{-2}$ is strictly positive for $0 < \alpha < 1$, the function $\phi$ is strictly convex. Then, since $p_1 > p_2 > p_3 > 0$, the bracket on the right side of (10) is strictly positive. This completes the proof for $n = 3$. For $n > 3$, the same argument applies to any 3 positive components of $p$ (they can be renormalized to conditional probabilities in $\Delta_3$): unless $p_i \ge p_j = p_k$ for some permutation of $i, j, k$, a variation of $p_i, p_j, p_k$ leaving fixed i) their sum, ii) all other components, and iii) their contribution to $h_n(p)$, will decrease $h_n(Tp)$, contradicting minimality. The same argument still applies to triples consisting of 2 positive and 1 zero component. These conditions rule out all vectors $p$ but those of the form $\beta b + (1 - \beta)(1/n) e_n$. For $0 < \alpha < 1$, the minimum is attained only for such vectors (with $\beta$ determined uniquely by $h_n(\beta b + (1 - \beta)(1/n) e_n) = x$) while for $\alpha = 0$ or 1, $h_n(Tp)$ is the same for all vectors $p \in \Delta_n$ satisfying $h_n(p) = x$.

How truly remarkable Mrs. Gerber's lemma is can only be fully appreciated in the light of the next theorem.

Theorem 6: For a Hamming channel $T_\alpha$ with $n > 2$, $0 < \alpha < 1$, the function $g_T$ is not convex.

Proof: By the previous lemma, the function $y = g_T(x)$ is described parametrically by
$$p = \beta b + (1 - \beta) e_n/n$$
$$x = h_n(p)$$
$$q = \alpha p + (1 - \alpha) e_n/n = \alpha\beta b + (1 - \alpha\beta) e_n/n$$
$$y = h_n(q)$$
with $0 \le \beta \le 1$.
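The parametric description of $g_T$ for a Hamming channel can be probed numerically before any series expansion. The sketch below takes three points of the curve near $x = \log n$ and compares the middle one with the chord through the outer two; a positive value means the curve lies above its chord there, which a convex function cannot do. It relies on Lemma 7 for the claim that these points lie on the graph of $g_T$. The choices $n = 3$, $\alpha = 0.5$, the three $\beta$ values, and the function names are illustrative assumptions.

```python
import math


def entropy(p):
    """Entropy in nats of a probability vector, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def hamming_point(n, alpha, beta):
    """Return (x, y) = (h_n(p), h_n(T_alpha p)) for p = beta*b + (1 - beta)*e_n/n."""
    p = [beta + (1.0 - beta) / n] + [(1.0 - beta) / n] * (n - 1)
    q = [alpha * pi + (1.0 - alpha) / n for pi in p]   # q = T_alpha p
    return entropy(p), entropy(q)


def chord_violation(n, alpha, betas):
    """y at the middle point minus the chord through the two outer points."""
    (x1, y1), (x2, y2), (x3, y3) = (hamming_point(n, alpha, b) for b in betas)
    chord = y1 + (y3 - y1) * (x2 - x1) / (x3 - x1)
    return y2 - chord


# x increases as beta decreases, so these betas give x1 < x2 < x3 < log n.
violation = chord_violation(3, 0.5, (0.15, 0.10, 0.05))
binary_case = chord_violation(2, 0.5, (0.15, 0.10, 0.05))
```

With these values `violation` is positive for $n = 3$, while the same test for $n = 2$ (a binary symmetric channel) gives a negative value, consistent with Theorem 2.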
We have $x = \phi_n(\beta)$, $y = \phi_n(\alpha\beta)$, where
$$\begin{aligned} \phi_n(\beta) &= -\Big(\beta + \frac{1-\beta}{n}\Big) \log \Big(\beta + \frac{1-\beta}{n}\Big) - (n - 1) \frac{1-\beta}{n} \log \frac{1-\beta}{n} \\ &= \log n - \frac{1}{n} \big[ (1 + (n - 1)\beta) \log (1 + (n - 1)\beta) + (n - 1)(1 - \beta) \log (1 - \beta) \big] \\ &= \log n - \tfrac{1}{2}(n - 1)\beta^2 + \tfrac{1}{6}(n - 1)(n - 2)\beta^3 + O(\beta^4). \end{aligned}$$
As $dx/d\beta = \phi_n'(\beta)$ is negative, the second derivative of $g_T$ has the sign of
$$\frac{dy}{d\beta} \frac{d^2x}{d\beta^2} - \frac{dx}{d\beta} \frac{d^2y}{d\beta^2}.$$
For small $\beta$, this expression reduces to
$$-\beta^2 (1 - \alpha) \alpha^2 (n - 1)^2 (n - 2)/2 + O(\beta^3),$$
that is, $g_T$ is strictly concave in a neighborhood of $x = \log n$, so that the convex envelope $f_T$ will be affine in that region.

By the same argument as in Theorem 4 one obtains the following corollary.

Corollary 6.1: For $n > 2$, the channels for which $g_T$ is not convex cover a positive fraction of the volume of the polytope of $n \times n$ doubly stochastic channels.

For a single use of channel $T$, with $h_n(p) \ge x$, let $g_T^*(x)$ be the minimum input-output mutual information. If each column of $T$ has the same entropy $c$, then, for input variable $X$, output $Y$ one has $I(X;Y) = H(Y) - H(Y \mid X) = H(Y) - c$. Thus for such channels $g_T^*(x) = g_T(x) - c = g_T(x) - g_T(0)$. As this applies in particular to the Hamming channels one has the following corollary.

Corollary 6.2: There are channels for which $g_T^*$ is not convex.

REFERENCES
[1] V. Klee and M. Martin, "Semicontinuity of the face-function of a convex set (dedicated to Hugo Hadwiger on his sixtieth birthday)," Comment. Math. Helv., vol. 46, pp. 1-13, 1971.
[2] L. Mirsky, "Results and problems in the theory of doubly-stochastic matrices," Z. Wahrscheinlichkeitstheorie, vol. 1, pp. 319-334, 1963.
[3] H. S. Witsenhausen, "A minimax control problem for sampled linear systems," IEEE Trans. Automat. Contr., (Appendices), vol. AC-13, pp. 5-21, Jan. 1968.
[4] A. D. Wyner and J. Ziv, "A theorem on the entropy of certain binary sequences and applications, Parts I and II," IEEE Trans. Inform. Theory, vol. IT-19, pp. 769-777, Nov. 1973.
[5] P. P. Bergmans, "Random coding theorem for broadcast channels with degraded components," IEEE Trans. Inform. Theory, vol. IT-19, pp. 197-207, Mar. 1973.
[6] T. M. Cover, "Broadcast channels," IEEE Trans. Inform. Theory, vol. IT-18, pp. 2-14, Jan. 1972.
[7] A. Ostrowski, "Sur quelques applications des fonctions convexes et concaves au sens de I. Schur," J. Math. Pures Appl., Ser. IX, vol. 31, pp. 253-293, 1952.
[8] A. D. Wyner, private commun.
[9] H. S. Witsenhausen and A. D. Wyner, in preparation.