The Annals of Probability
1986, Vol. 14, No. 1, 336-342

ENTROPY AND THE CENTRAL LIMIT THEOREM

BY ANDREW R. BARRON

Stanford University

A strengthened central limit theorem for densities is established, showing monotone convergence in the sense of relative entropy.
LEMMA 1. Entropy is an integral of Fisher informations. Let X be any random variable with finite variance; then

(2.1)    D(X) = \int_0^1 J(\sqrt{t}\,X + \sqrt{1-t}\,Z)\,\frac{dt}{2t},

where Z is an independent normal random variable with the same mean and variance as X.
Equation (2.1) is derived in Section 4 from the corresponding differential equation (d/dt)\,D(Y_t) = J(Y_t)/(2t), where Y_t = \sqrt{t}\,X + \sqrt{1-t}\,Z and 0 < t < 1.
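As a plausibility check on (2.1), one can compare both sides numerically for a concrete non-normal X. The sketch below is illustrative and not from the paper: it assumes the standardized Fisher information is J(Y) = \sigma^2 I(Y) - 1 (consistent with the remark after Lemma 2 that J(S_{n,t}) \to 0 is equivalent to I(S_{n,t}) \to 1/\sigma^2), takes X uniform on [-\sqrt{3}, \sqrt{3}] so that \sigma^2 = 1, and uses D(X) = H(Z) - H(X).

```python
# Numerical sanity check of identity (2.1) -- a sketch, not the paper's method.
# Assumptions: J(Y) = I(Y) - 1 for unit variance, X ~ Uniform[-sqrt(3), sqrt(3)].
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

a = np.sqrt(3.0)  # Uniform[-a, a] has mean 0 and variance 1

def fisher_info(t):
    # I(Y_t) for Y_t = sqrt(t) X + sqrt(1-t) Z, using the closed-form
    # density of a uniform convolved with a normal.
    b, s = a * np.sqrt(t), np.sqrt(1.0 - t)
    y = np.linspace(-12, 12, 20001)
    f = (norm.cdf((y + b) / s) - norm.cdf((y - b) / s)) / (2 * b)
    fp = (norm.pdf((y + b) / s) - norm.pdf((y - b) / s)) / (2 * b * s)
    mask = f > 1e-12  # ignore the far tails, where the cdf difference is noise
    return np.trapz(np.where(mask, fp ** 2 / np.where(mask, f, 1.0), 0.0), y)

# Left side of (2.1): D(X) = H(Z) - H(X) for zero-mean, unit-variance X.
D_X = 0.5 * np.log(2 * np.pi * np.e) - np.log(2 * a)

# Right side of (2.1), truncated slightly inside (0, 1): I(Y_t) blows up
# (integrably) as t -> 1 because the uniform has infinite Fisher information.
rhs, _ = quad(lambda t: (fisher_info(t) - 1.0) / (2 * t), 1e-6, 1 - 1e-5, limit=200)

print(f"D(X)                    = {D_X:.3f}")
print(f"integral of J(Y_t)/(2t) = {rhs:.3f}")  # agreement to about two decimals
```

Both numbers come out near 0.18, as Lemma 1 predicts for the uniform.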
Two basic convolution inequalities are needed in the proofs. Let X and Y be random variables as above and suppose that the density for Y has a bounded derivative. Let V be any random variable independent of X and Y. Then convolution increases entropy and decreases Fisher information:

(2.2)    H(X + V) \ge H(X) \quad and \quad I(Y + V) \le I(Y).

Equality holds if and only if V is almost surely constant. These inequalities follow from the convexity of the functions x \log x and x^2 and are well known.
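For intuition, both parts of (2.2) are easy to watch numerically. In the sketch below the choices are illustrative only: Y is logistic (its density has a bounded derivative) and V is an independent normal; entropy and Fisher information are evaluated on a grid.

```python
# Grid check of (2.2): adding independent noise V raises entropy and
# lowers Fisher information.  Y logistic and V normal are illustrative.
import numpy as np
from scipy.stats import logistic, norm

dx = 0.01
x = np.arange(-30, 30 + dx, dx)
f = logistic.pdf(x)                          # density of Y
v = norm.pdf(x, scale=0.8)                   # density of V
g = np.convolve(f, v, mode="same") * dx      # density of Y + V

def entropy(p):                              # H = -int p log p
    p = np.clip(p, 1e-300, None)
    return -np.sum(p * np.log(p)) * dx

def fisher(p):                               # I = int (p')^2 / p
    return np.sum(np.gradient(p, dx) ** 2 / np.clip(p, 1e-300, None)) * dx

print(f"H(Y) = {entropy(f):.4f}  <=  H(Y+V) = {entropy(g):.4f}")
print(f"I(Y) = {fisher(f):.4f}   >=  I(Y+V) = {fisher(g):.4f}")
```

For the standard logistic, H(Y) = 2 and I(Y) = 1/3 exactly, which the grid values reproduce; the convolved density then shows strictly larger entropy and strictly smaller Fisher information, as (2.2) requires for nonconstant V.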
More specialized convolution inequalities will also be needed. Let Y_1 and Y_2 be independent random variables having densities with bounded derivatives, and let a_1, a_2 \ge 0 with a_1 + a_2 = 1; then I(\sqrt{a_1}\,Y_1 + \sqrt{a_2}\,Y_2) \le a_1 I(Y_1) + a_2 I(Y_2) [see Stam (1959) and Blachman (1965)]. Hence if Y_1 and Y_2 have the same variance,

(2.3)    J(\sqrt{a_1}\,Y_1 + \sqrt{a_2}\,Y_2) \le a_1 J(Y_1) + a_2 J(Y_2).

Equality holds if and only if Y_1 and Y_2 are normal. An immediate consequence of Lemma 1 is the inequality D(\sqrt{a_1}\,X_1 + \sqrt{a_2}\,X_2) \le a_1 D(X_1) + a_2 D(X_2) for any independent random variables X_1, X_2 having the same finite variance. This inequality for D may also be deduced from Shannon's entropy power inequality (Shannon, 1948; Stam, 1959; Blachman, 1965).
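Spelled out, the deduction of the inequality for D from Lemma 1 and (2.3) runs as follows. This is a sketch assuming mean-zero variables, with Z_1, Z_2 independent normals distributed as Z and Y_{i,t} = \sqrt{t}\,X_i + \sqrt{1-t}\,Z_i:

```latex
\begin{align*}
D\bigl(\sqrt{a_1}\,X_1 + \sqrt{a_2}\,X_2\bigr)
  &= \int_0^1 J\bigl(\sqrt{t}\,(\sqrt{a_1}\,X_1 + \sqrt{a_2}\,X_2)
       + \sqrt{1-t}\,Z\bigr)\,\frac{dt}{2t}
     && \text{by (2.1)} \\
  &= \int_0^1 J\bigl(\sqrt{a_1}\,Y_{1,t} + \sqrt{a_2}\,Y_{2,t}\bigr)\,\frac{dt}{2t}
     && \text{since } \sqrt{1-t}\,Z \overset{d}{=} \sqrt{a_1(1-t)}\,Z_1 + \sqrt{a_2(1-t)}\,Z_2 \\
  &\le \int_0^1 \bigl[a_1 J(Y_{1,t}) + a_2 J(Y_{2,t})\bigr]\,\frac{dt}{2t}
     && \text{by (2.3)} \\
  &= a_1 D(X_1) + a_2 D(X_2)
     && \text{by (2.1) again.}
\end{align*}
```

Here (2.3) applies because Y_{1,t} and Y_{2,t} share the common variance of X_1 and X_2 and, having normal components, possess smooth densities.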
LEMMA 2. Let S_n be the standardized sum and let S_{n,t} = \sqrt{t}\,S_n + \sqrt{1-t}\,Z for fixed 0 < t < 1. Then nJ(S_{n,t}) is a subadditive sequence, that is, (p+q)\,J(S_{p+q,t}) \le p\,J(S_{p,t}) + q\,J(S_{q,t}). In particular, J(S_{2n,t}) \le J(S_{n,t}). Furthermore, the standardized Fisher information converges to zero:

(3.1)    \lim_{n\to\infty} J(S_{n,t}) = 0.

Equivalently, the sequence of Fisher informations I(S_{n,t}) converges to the Cramér-Rao bound 1/\sigma^2.
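The decay (3.1) can be watched numerically along the doubling subsequence. The sketch below is illustrative only: it takes unit-variance uniform summands, again assumes J(Y) = I(Y) - 1 when \sigma^2 = 1, and recovers the density of S_{n,t} by Fourier inversion of its characteristic function.

```python
# Sketch: J(S_{n,t}) decreases toward 0 along n = 1, 2, 4, ..., as in (3.1).
# Unit-variance uniform summands assumed; J(Y) = I(Y) - 1 here (sigma = 1).
import numpy as np

t = 0.5
u = np.linspace(-40, 40, 2001)    # frequency grid for Fourier inversion
y = np.linspace(-6, 6, 1201)      # spatial grid

def J(n):
    s = np.sqrt(3.0 * t / n) * u  # cf of Uniform[-sqrt3, sqrt3] is sin(s)/s
    s_safe = np.where(s == 0, 1.0, s)
    cf1 = np.where(s == 0, 1.0, np.sin(s_safe) / s_safe)
    cf = cf1 ** n * np.exp(-(1 - t) * u ** 2 / 2)   # cf of S_{n,t}
    E = np.exp(-1j * np.outer(y, u))
    f = (np.trapz(E * cf, u, axis=1) / (2 * np.pi)).real            # density
    fp = (np.trapz(E * (-1j * u) * cf, u, axis=1) / (2 * np.pi)).real   # f'
    mask = f > 1e-10              # suppress inversion noise in the far tails
    I = np.trapz(np.where(mask, fp ** 2 / np.where(mask, f, 1.0), 0.0), y)
    return I - 1.0

for n in (1, 2, 4, 8, 16):
    print(f"n = {n:2d}   J(S_n,t) = {J(n):.5f}")    # monotone toward zero
```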
COROLLARY. Suppose the entropy H(S_n) is finite for some n. Then S_n has a density function f_n which converges to \phi in the L^1 sense, \lim_{n\to\infty} \int |f_n - \phi| = 0.
Now D(Y_a) \le \log(1/\sqrt{1-a}) from inequality (2.2); thus \lim_{a\to 0} D(Y_a) = 0. Also D(Y_b) \le D(X) + \log(1/\sqrt{b}), so \limsup_{b\to 1} D(Y_b) \le D(X). Note that \lim_{b\to 1} Y_b = X in probability and hence in distribution, so \liminf_{b\to 1} D(Y_b) \ge D(X) by the lower semicontinuity of the relative entropy. Thus \lim_{b\to 1} D(Y_b) = D(X). If the integral on (0,1) is finite, then letting a \to 0 and b \to 1 in (4.1) we obtain the desired result D(X) = \int_0^1 J(Y_t)\,dt/(2t). If the integral is infinite, then by Fatou's lemma D is also infinite. This completes the proof of Lemma 1. □
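The two bounds invoked at the start of this argument are consequences of (2.2), the scaling rule H(cW) = H(W) + \log c, and the identity D(Y) = H(Z) - H(Y) for Y with the same mean and variance as Z (a sketch, assuming mean zero):

```latex
\begin{align*}
H(Y_a) \ge H\bigl(\sqrt{1-a}\,Z\bigr) = H(Z) + \tfrac{1}{2}\log(1-a)
  \quad &\Longrightarrow \quad D(Y_a) \le \log\frac{1}{\sqrt{1-a}}, \\
H(Y_b) \ge H\bigl(\sqrt{b}\,X\bigr) = H(X) + \tfrac{1}{2}\log b
  \quad &\Longrightarrow \quad D(Y_b) \le D(X) + \log\frac{1}{\sqrt{b}},
\end{align*}
```

where each entropy inequality is (2.2) applied to Y_t = \sqrt{t}\,X + \sqrt{1-t}\,Z at t = a and t = b, and each Y_t has the common variance \sigma^2, so that D(Y_t) = H(Z) - H(Y_t).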
To prove Lemma 2, we extend the arguments of Brown (1982). First we state his contribution.
BROWN'S RESULT. Fix \tau > 0 and set Y_k = S_{2^k} + Z_\tau. Let \rho_k = g_k'/g_k be the score function for Y_k. Note that Y_{k+1} = (Y_k + Y_k')/\sqrt{2}, where Y_k' is an independent copy of Y_k. The score for Y_{k+1} is the conditional expectation \rho_{k+1}(Y_{k+1}) = E[(\rho_k(Y_k) + \rho_k(Y_k'))/\sqrt{2} \mid Y_{k+1}]. Hence the score of the sum \rho_{k+1}(Y_{k+1}) provides the best estimate of the sum of the scores in the sense of minimum mean-squared error. The corresponding Pythagorean relation yields I(Y_k) - I(Y_{k+1}) = E[(\rho_k(Y_k) + \rho_k(Y_k'))/\sqrt{2} - \rho_{k+1}(Y_{k+1})]^2. This difference sequence must converge to zero, since the sequence I(Y_k) is decreasing and bounded. Therefore E[(\rho_k(Z_{\tau/2}) + \rho_k(Z_{\tau/2}'))/\sqrt{2} - \rho_{k+1}((Z_{\tau/2} + Z_{\tau/2}')/\sqrt{2})]^2 also converges to zero, since the density for Y_k = S_{2^k} + Z_\tau is bounded below by a multiple of the normal (0, \tau/2) density. But for an arbitrary function v with finite Ev^2(Z), the mean-squared error of estimation of v(Z) + v(Z') using any function of the sum Z + Z' is shown to exceed a multiple of the mean-squared error using the best linear estimate. Consequently, Brown established that

(4.2)    \lim_{k\to\infty} E(\rho_k(Z_{\tau/2}) - \rho_\phi(Z_{\tau/2}))^2 = 0

and furthermore, since the score is the derivative of the logarithm of the density, the sequence of densities g_k for Y_k = S_{2^k} + Z_\tau satisfies

(4.3)    g_k(y) \to \phi(y)

uniformly on compact subsets. Here \phi is the normal (0, \sigma^2 + \tau) density. Brown used (4.3) and let \tau \to 0 to prove convergence in distribution. We use (4.2) and (4.3) plus a uniform integrability argument to identify the limit of the Fisher informations.
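The projection identity at the heart of Brown's argument can be checked by simulation. In the sketch below (an illustration, not the paper's construction), Y is a two-component Gaussian mixture, so that the scores of Y and of (Y + Y')/\sqrt{2} are both available in closed form, and the Monte Carlo estimates of I(Y) - I((Y+Y')/\sqrt{2}) and of the projection mean-squared error agree.

```python
# Monte Carlo check: I(Y) - I((Y+Y')/sqrt2) equals the MSE of estimating
# (rho(Y) + rho(Y'))/sqrt2 by the score of the sum.  Illustrative Y only.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, n = 1.5, 2_000_000

def score(x, means, weights):
    # score f'/f of a Gaussian mixture with unit-variance components
    p = sum(w * norm.pdf(x, m) for w, m in zip(weights, means))
    dp = sum(w * norm.pdf(x, m) * (m - x) for w, m in zip(weights, means))
    return dp / p

mY, wY = [-mu, mu], [0.5, 0.5]                                    # law of Y
mW, wW = [-np.sqrt(2) * mu, 0.0, np.sqrt(2) * mu], [0.25, 0.5, 0.25]  # law of W

signs = rng.choice([-1.0, 1.0], size=(2, n))
Y, Yp = signs * mu + rng.standard_normal((2, n))   # i.i.d. copies of Y
W = (Y + Yp) / np.sqrt(2)                          # mixture with weights wW

I_Y = np.mean(score(Y, mY, wY) ** 2)               # I(Y) = E rho^2(Y)
I_W = np.mean(score(W, mW, wW) ** 2)
mse = np.mean(((score(Y, mY, wY) + score(Yp, mY, wY)) / np.sqrt(2)
               - score(W, mW, wW)) ** 2)
print(f"I(Y) - I(W)     = {I_Y - I_W:.4f}")
print(f"projection MSE  = {mse:.4f}")              # same up to Monte Carlo error
```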
PROOF OF LEMMA 2. We show that \lim I(S_{n,t}) = 1/\sigma^2. Let g_k = g_{k,\tau} be the density for Y_k = S_{2^k} + Z_\tau. Note that the Fisher information satisfies I(Y_k) = E\rho_k^2(Y_k) = \int \rho_k^2\,(g_k/\phi)\,d\Phi, where \Phi is the normal (0, \sigma^2 + \tau) distribution. From (4.2) the score \rho_k converges to \rho_\phi in normal (0, \tau/2) probability and hence in \Phi probability. From (4.3) and Scheffé's lemma \int |g_k - \phi| \to 0 and hence g_k/\phi \to 1 in \Phi probability. The product of sequences convergent in probability is also convergent in probability. Thus \rho_k^2\,g_k/\phi converges to \rho_\phi^2 in \Phi probability. Consequently, by the uniform integrability argument indicated above, \lim_{k\to\infty} I(Y_k) = \int \rho_\phi^2\,d\Phi = 1/(\sigma^2 + \tau).
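For orientation, here is a sketch of how the limit for Y_k transfers back to S_{n,t}; it assumes the scaling rule I(cW) = I(W)/c^2 and the choice \tau = \sigma^2(1-t)/t, under which \sqrt{1-t}\,Z \overset{d}{=} \sqrt{t}\,Z_\tau and hence S_{n,t} \overset{d}{=} \sqrt{t}\,(S_n + Z_\tau):

```latex
\begin{align*}
I(S_{n,t}) \;=\; \frac{1}{t}\, I(S_n + Z_\tau)
  \;\longrightarrow\; \frac{1}{t}\cdot\frac{1}{\sigma^2 + \tau}
  \;=\; \frac{1}{t\sigma^2 + \sigma^2(1-t)}
  \;=\; \frac{1}{\sigma^2},
\end{align*}
```

along the subsequence n = 2^k; the subadditivity of nJ(S_{n,t}) asserted in Lemma 2 then extends the convergence to all n.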
REFERENCES

BARRON, A. R. (1984). Monotonic central limit theorem for densities. Technical Report 50, Dept. Statistics, Stanford Univ.
BILLINGSLEY, P. (1979). Probability and Measure. Wiley, New York.
BLACHMAN, N. M. (1965). The convolution inequality for entropy powers. IEEE Trans. Inform. Theory IT-11 267-271.
BROWN, L. D. (1982). A proof of the central limit theorem motivated by the Cramér-Rao inequality. In Statistics and Probability: Essays in Honor of C. R. Rao (G. Kallianpur, P. R. Krishnaiah, and J. K. Ghosh, eds.). North-Holland, Amsterdam.
CHERNOFF, H. (1956). Large sample theory-parametric case. Ann. Math. Statist. 27 1-22.
CSISZÁR, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 2 299-318.
CSISZÁR, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3 146-158.
GALLAGER, R. G. (1968). Information Theory and Reliable Communication. Wiley, New York.
KOLMOGOROV, A. N. and GNEDENKO, B. V. (1954). Limit Distributions for Sums of Independent Random Variables. Translated by K. L. Chung. Addison-Wesley, Reading, Mass.
KULLBACK, S. (1967). A lower bound for discrimination in terms of variation. IEEE Trans. Inform. Theory IT-13 126-127.
LINNIK, YU. V. (1959). An information-theoretic proof of the central limit theorem with the Lindeberg condition. Theory Probab. Appl. 4 288-299.
PINSKER, M. S. (1964). Information and Information Stability of Random Variables. Translated by A. Feinstein. Holden-Day, San Francisco.
PROHOROV, YU. V. (1952). On a local limit theorem for densities. Dokl. Akad. Nauk SSSR 83 797-800.
RÉNYI, A. (1970). Probability Theory. North-Holland, Amsterdam.
SHANNON, C. (1948). A mathematical theory of communication. Bell System Tech. J. 27 379-423, 623-656.
STAM, A. J. (1959). Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inform. and Control 2 101-112.
DEPARTMENT OF STATISTICS
THE UNIVERSITY OF ILLINOIS
URBANA, ILLINOIS 61801