# Nonparametric entropy estimation: an overview

J. Beirlant

Dep. of Mathematics, Kath. Univ. Leuven, Celestijnenlaan 200B, B-3001 Heverlee, Belgium∗

E. J. Dudewicz

Dep. of Mathematics, Syracuse Univ., Syracuse, NY 13244-1150, USA

L. Győrfi

Dep. of Mathematics, Tech. Univ. of Budapest, Stoczek u. 2, H-1521 Budapest, Hungary

E. C. van der Meulen

Dep. of Mathematics, Kath. Univ. Leuven, Celestijnenlaan 200B, B-3001 Heverlee, Belgium
July 17, 2001

Dedicated to Professor L. L. Campbell, in celebration of his longstanding career in mathematics and statistics and in tribute to his many scholarly contributions to information theory.

Abstract. An overview is given of the several methods in use for the nonparametric estimation of the differential entropy of a continuous random variable. The properties of the various methods are compared. Several applications are given, such as tests for goodness-of-fit, parameter estimation, quantization theory and spectral estimation.

## I Introduction

Let X be a random vector taking values in R^d with probability density function (pdf) f(x); then its differential entropy is defined by

H(f) = − ∫ f(x) ln f(x) dx.   (1)

We assume that H(f) is well-defined and finite. The concept of differential entropy was introduced in Shannon's original paper ([55]). Since then, entropy has been of great theoretical and applied interest. The basic properties of differential entropy are described in Chapter 9 of Cover and Thomas [10].

∗ This research was supported by the Scientific Exchange Program between the Belgian Academy of Sciences and the Hungarian Academy of Sciences in the field of Mathematical Information Theory, and NATO Research Grant No. CRG 931030.


The differential entropy has some important extremal properties:

(I) If the density f is concentrated on the unit interval [0, 1], then the differential entropy is maximal iff f is uniform on [0, 1], and then H(f) = 0.

(II) If the density is concentrated on the positive half line and has a fixed expectation, then the differential entropy takes its maximum for the exponential distribution.

(III) If the density has fixed variance, then the differential entropy is maximized by the Gaussian density.

Many distributions in statistics can be characterized as having maximum entropy; for a general characterization theorem see [38]. Verdugo Lazo and Rathie [64] provide a useful list containing the explicit expression of H(f) for many common univariate pdf's. Ahmed and Gokhale [1] calculated H(f) for various multivariate pdf's.
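A concrete illustration of extremal property (III) is the Gaussian density, whose differential entropy is one of the standard closed-form entries tabulated in lists such as [64]. The short derivation below is a textbook calculation included for orientation, not a quotation from that list.

```latex
% Differential entropy of the normal density
% f(x) = (2\pi\sigma^2)^{-1/2} \exp\{-(x-\mu)^2/(2\sigma^2)\}:
H(f) = -\int f(x)\,\ln f(x)\,dx
     = \tfrac{1}{2}\ln(2\pi\sigma^2) + \frac{\mathrm{E}(X-\mu)^2}{2\sigma^2}
     = \tfrac{1}{2}\ln\!\left(2\pi e\,\sigma^2\right).
```

By property (III), every density with variance σ² therefore has differential entropy at most (1/2) ln(2πeσ²).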
## II Estimates

### II.1 Criteria and conditions

If, for the i.i.d. sample X_1, ..., X_n, H_n is an estimate of H(f), then we consider the following types of consistencies.

Weak consistency:

lim_{n→∞} H_n = H(f)   in probability.   (2)

Mean square consistency:

lim_{n→∞} E{(H_n − H(f))²} = 0.   (3)

Strong consistency:

lim_{n→∞} H_n = H(f)   a.s.   (4)

Root-n consistency results are either in the form of asymptotic normality,

lim_{n→∞} n^{1/2}(H_n − H(f)) = N(0, σ²)   in distribution,   (5)

or in the form of an L2 rate of convergence,

lim_{n→∞} n E{(H_n − H(f))²} = σ².   (6)

Next we list the usual conditions on the underlying density.

Smoothness conditions:
S1: f is continuous.
S2: f is k times differentiable.

Tail conditions:
T1: H([X]) < ∞, where [X] is the integer part of X.
T2: inf_{x: f(x)>0} f(x) > 0.

Peak conditions:
P1: ∫ f (ln f)² < ∞. (This is also a mild tail condition.)
P2: f is bounded.

### II.2 Plug-in estimates

The plug-in estimates of entropy are based on a consistent density estimate f_n of f, where f_n depends on X_1, ..., X_n.

(i) We first consider the integral estimate of entropy, which is of the form

H_n = − ∫_{A_n} f_n(x) ln f_n(x) dx,   (7)

where, with the set A_n, one typically excludes the small or tail values of f_n. The first such estimator was introduced by Dmitriev and Tarasenko [17], who proposed to estimate H(f) by (7) for d = 1, where A_n = [−b_n, b_n] and f_n is the kernel density estimator. They showed the strong consistency of H_n defined by (7). See also Prakasa Rao [49]. Mokkadem [44] calculated the expected L_r error of this estimate. Carbonez et al. [7] extended this approach to the estimation of H(f) when the observations are censored. The evaluation of the integral in (7), however, requires numerical integration and is not easy if f_n is a kernel density estimator. The integral estimate can be calculated easily if, for example, f_n is a histogram; this approach is taken by Győrfi and van der Meulen [29]. If {x ∈ A_n} = {f_n(x) ≥ a_n} with 0 < a_n → 0, then the strong consistency has been proved under the only condition T1. Joe [37] considers estimating H(f) by (7) when f is a multivariate pdf and f_n is a kernel density estimate, but he points out that the calculation of (7) gets more difficult for d ≥ 2. He therefore excludes the integral estimate from further study.

(ii) The resubstitution estimate is of the form

H_n = − (1/n) Σ_{i=1}^{n} ln f_n(X_i).   (8)

Ahmad and Lin [2] proposed estimating H(f) by (8), where f_n is a kernel density estimate, and showed the mean square consistency of (8).
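The resubstitution idea in (8) is straightforward to implement once a density estimate is chosen. The following minimal sketch evaluates a Gaussian kernel density estimate at the sample points and averages the negative log-density; the kernel and the normal-reference bandwidth are illustrative assumptions, not the choices analyzed in [2] or [37].

```python
import numpy as np

def resubstitution_entropy(x, bandwidth=None):
    """Resubstitution entropy estimate (8) with a Gaussian kernel density estimate.

    Minimal illustrative sketch; kernel and bandwidth rule are assumptions.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption made for illustration).
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    # f_n(X_i) = (1/n) * sum_j K_h(X_i - X_j), including the j = i term.
    diffs = (x[:, None] - x[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2 * np.pi)
    fn_at_data = kernel.mean(axis=1) / bandwidth
    return -np.mean(np.log(fn_at_data))

# Example: for N(0,1) data the target value is 0.5*ln(2*pi*e) = 1.4189...
rng = np.random.default_rng(0)
print(resubstitution_entropy(rng.standard_normal(2000)))
```

Keeping the j = i term gives the plain resubstitution form; omitting it leads to the leave-one-out variant discussed under (iv) below.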
Joe [37] considered the estimation of H(f) for multivariate pdf's by an entropy estimate of the resubstitution type (8), also based on a kernel density estimate. He obtained asymptotic bias and variance terms, and showed that non-unimodal kernels satisfying certain conditions can reduce the mean square error. His results rely heavily on conditions T2 and P2. His analysis and simulations suggest that the sample size needed for good estimates increases rapidly with the dimension d of the multivariate density.

Hall and Morton [35] investigated the properties of an estimator of the type (8) both when f_n is a histogram density estimator and when it is a kernel estimator. For the histogram they show (5) under certain tail and smoothness conditions with

σ² = Var(ln f(X)).   (9)

They point out that the histogram-based estimator can only be root-n consistent when d = 1 or 2; moreover, when d = 2 any root-n consistent histogram-based estimator of entropy will have significant bias. They suggest an empirical rule for the binwidth, using a penalty term, and conclude that the histogram-based estimator is particularly attractive for d = 1. They also suggest an application to projection pursuit. They study the effects of tail behaviour, distribution smoothness and dimensionality on convergence properties, and argue that root-n consistency of entropy estimation requires appropriate assumptions about each of these three features.

(iii) The next plug-in estimate is the splitting-data estimate. Here one decomposes the sample into two sub-samples X_1, ..., X_l and X*_1, ..., X*_m, with n = l + m. Based on X_1, ..., X_l one constructs a density estimate f_l, and then, using this density estimate and the second sample, estimates H(f) by

H_n = − (1/m) Σ_{i=1}^{m} I_{[X*_i ∈ A_l]} ln f_l(X*_i).   (10)

Győrfi and van der Meulen used this approach in [29] for f_l being the histogram density estimate, in [30] for f_l being the kernel density estimate, and in [31] for f_l being any L1-consistent density estimate such that [X*_i ∈ A_l] = [f_l(X*_i) ≥ a_l], 0 < a_l → 0. Under some mild tail and smoothness conditions on f the strong consistency is shown for general dimension d. Their results are valid for a wide class of densities f having unbounded support.

(iv) The final plug-in estimate is based on a cross-validation or leave-one-out density estimate. If f_{n,i} denotes a density estimate based on X_1, ..., X_n leaving X_i out, then the corresponding entropy estimate is of the form

H_n = − (1/n) Σ_{i=1}^{n} I_{[X_i ∈ A_n]} ln f_{n,i}(X_i).   (11)

Ivanov and Rozhkova [36] proposed such an entropy estimate when f_{n,i} is a kernel estimate. They showed strong consistency, and also made an assertion regarding the rate of convergence of the moments E|H_n − H(f)|^r, r ≥ 1. Hall and Morton [35] also studied entropy estimates of the type (11) based on a kernel estimator; properties of H_n were earlier studied in the context of Kullback-Leibler loss in [34]. Under some conditions the analysis in [35] yields a root-n consistent estimate of the entropy when 1 ≤ d ≤ 3. The authors point out that the case d = 2 is of practical interest in projection pursuit.

### II.3 Estimates of entropy based on sample-spacings

Since sample-spacings are defined only for d = 1, we assume that X_1, ..., X_n is a sample of i.i.d. real valued random variables. Let X_{n,1} ≤ X_{n,2} ≤ ... ≤ X_{n,n} be the corresponding order statistics. Then X_{n,i+m} − X_{n,i} is called a spacing of order m, or m-spacing (1 ≤ i < i + m ≤ n). Based on spacings it is possible to construct a density estimate:

f_n(x) = m / (n (X_{n,im} − X_{n,(i−1)m}))   if x ∈ [X_{n,(i−1)m}, X_{n,im}).   (12)

This density estimate is consistent if, as n → ∞, m_n → ∞ and m_n/n → 0. The estimate of entropy based on sample-spacings can be derived as a plug-in integral estimate using a spacing density estimate. Surprisingly, however, one can get a consistent spacings-based entropy estimate from a non-consistent spacings density estimate.

(i) We consider first the m-spacing estimate for fixed m:

H_{m,n} = (1/n) Σ_{i=1}^{n−m} ln( (n/m)(X_{n,i+m} − X_{n,i}) ) − ψ(m) + ln m,   (13)

where ψ(x) = (ln Γ(x))′ is the digamma function. For fixed m the corresponding density estimate is not consistent; this is why (13) contains an additional term correcting the asymptotic bias. For uniform f the consistency of (13) has been proved by Tarasenko [57] and Beirlant and van Zuijlen [4]. The asymptotic normality of H_{m,n} in the form of (5) was studied by Cressie [11], Dudewicz and van der Meulen [19], Hall [32] and Beirlant [3], who all proved (5) under T2 and P2 with

σ² = (2m² − 2m + 1) ψ′(m) − 2m + 1 + Var(ln f(X)).   (14)

For m = 1, (14) yields

σ² = π²/6 − 1 + Var(ln f(X)).   (15)

(ii) In order to decrease the asymptotic variance, consider next the m_n-spacing estimate with m_n → ∞:

H_n = (1/n) Σ_{i=1}^{n−m_n} ln( (n/m_n)(X_{n,i+m_n} − X_{n,i}) ).   (16)

This case is considered in the papers of Vasicek [63], Dudewicz and van der Meulen [19], Beirlant and van Zuijlen [4], Beirlant [3], Hall [33] and van Es [62]. In these papers the weak and strong consistencies are proved under the conditions stated after (12), namely m_n → ∞ and m_n/n → 0; consistency for densities with unbounded support is proved only in [57] and [4]. Hall [33] and van Es [62] proved (5) with (9) if f is not uniform but satisfies T2 and P2. This asymptotic variance is the smallest one for an entropy estimator if f is not uniform (cf. Levit [40]). Hall [33] showed this result also for the non-consistent choice m_n/n → ρ if ρ is irrational. If f is uniform on [0, 1], then Dudewicz and van der Meulen [19] and van Es [62] showed, respectively for m_n = o(n^{1/3−δ}), δ > 0, and for m_n = o(n^{1/3}), that

lim_{n→∞} (m_n n)^{1/2} (Ĥ_n − H(f)) = N(0, 1/3)   (17)

for slight modifications Ĥ_n of the m_n-spacing estimate H_n. Tsybakov and van der Meulen [61] showed a root-n rate of convergence for a truncated version of H_n when d = 1 for a class of densities with unbounded support and exponentially decreasing tails, such as the Gaussian density.

Dudewicz and van der Meulen [21] proposed to estimate H(f) by H(f̂), where f̂ is the empiric pdf derived from a smoothed version of the empirical distribution function. This led to the introduction of the notion of empiric entropy of order m.
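The m_n-spacing estimate (16) requires nothing more than sorting the sample. The sketch below is a direct transcription of (16); the default window m ≈ √n is an assumption made for illustration (it satisfies m_n → ∞ and m_n/n → 0), not a recommendation taken from the papers cited above.

```python
import numpy as np

def mn_spacing_entropy(x, m=None):
    """m_n-spacing entropy estimate, a direct transcription of (16).

    Minimal sketch; the default window size is an illustrative assumption.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if m is None:
        m = max(1, int(round(np.sqrt(n))))   # m -> infinity, m/n -> 0
    spacings = x[m:] - x[:-m]                # X_{n,i+m} - X_{n,i}, i = 1, ..., n-m
    return np.sum(np.log((n / m) * spacings)) / n

# Example: standard normal data; the target value is 0.5*ln(2*pi*e) = 1.4189...
rng = np.random.default_rng(1)
print(mn_spacing_entropy(rng.standard_normal(5000)))
```

As discussed above, the root-n results such as (17) refer to bias- and boundary-corrected modifications of this raw statistic.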
### II.4 Estimates of entropy based on nearest neighbor distances

The nearest neighbor estimate is defined for general d. Let ρ_{n,i} be the nearest neighbor distance of X_i to the other X_j:

ρ_{n,i} = min_{j≠i, j≤n} ||X_i − X_j||.

Then the nearest neighbor estimate is

H_n = (1/n) Σ_{i=1}^{n} ln(n ρ_{n,i}) + ln 2 + C_E,   (18)

where C_E is the Euler constant, C_E = − ∫_0^∞ e^{−t} ln t dt. Under some mild conditions like P1, Kozachenko and Leonenko [39] proved the mean square consistency for general d. Bickel and Breiman [6] considered estimating a general functional of a density; unfortunately, their study excludes the entropy.
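A nearest-neighbor estimate in the spirit of (18) can be computed with a brute-force distance matrix. In the sketch below, the multivariate constant ln V_d (the volume of the d-dimensional unit ball), which reduces to ln 2 for d = 1, is a commonly used generalization and an assumption of this sketch rather than a formula quoted from [39].

```python
import numpy as np
from math import gamma, log, pi

def nn_entropy(x):
    """Nearest-neighbor entropy estimate in the spirit of (18).

    x : array of shape (n, d). Illustrative sketch; the ln(V_d) term below
    equals ln 2 when d = 1, recovering (18).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                  # treat a plain vector as d = 1 data
    n, d = x.shape
    # Pairwise Euclidean distances; exclude the zero self-distances.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    rho = dists.min(axis=1)             # nearest neighbor distance of each X_i
    euler = 0.5772156649015329          # C_E
    log_vd = log(pi ** (d / 2) / gamma(d / 2 + 1))   # ln V_d (= ln 2 for d = 1)
    return d * np.mean(np.log(rho)) + np.log(n) + log_vd + euler

# Example: 2-dimensional standard normal data; H(f) = ln(2*pi*e) = 2.8378...
rng = np.random.default_rng(2)
print(nn_entropy(rng.standard_normal((2000, 2))))
```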
## III Applications

### III.1 Entropy-based tests for goodness-of-fit

Moran [45] was the first to use a test statistic based on 1-spacings for testing the goodness-of-fit hypothesis of uniformity. Moran's statistic is defined by

M_n = − Σ_{i=0}^{n} ln((n + 1)(X_{n,i+1} − X_{n,i})),   (19)

with X_{n,0} = 0 and X_{n,n+1} = 1. Darling [16] showed that under H_0: F(x) = x, 0 ≤ x ≤ 1, one gets

lim_{n→∞} n^{−1/2}(M_n − n C_E) = N(0, π²/6 − 1).   (20)

(This limit law is (5) with (15) for f uniform.) Cressie [11] generalized the notion of 1-spacings to m_n-spacings and considered the testing of uniformity. He showed the asymptotic normality of his test statistic

L_n = Σ_{i=1}^{n−m_n+1} ln(X_{n,i+m_n} − X_{n,i})   (21)

under the hypothesis of uniformity (see Sec. II.3 for more details) and, as a generalization, under the assumption that f is a bounded positive step function on [0, 1] which satisfies conditions T2 and P2 and has a finite number of discontinuities. From this it follows that the test based on rejecting the hypothesis of uniformity for large negative values of L_n will be consistent, in terms of Pitman asymptotic relative efficiency, for a sequence of neighboring alternatives which converge to the hypothesis of uniformity as n → ∞. Cressie [11] compared his test procedure with Greenwood's ([27]), for which the test statistic is based on the sum of squares of 1-spacings. Cressie continued his investigations in [12], [13].

From a different point of view, and independently, Vasicek [63] introduced his test statistic

Ĥ_n = (1/n) Σ_{i=1}^{n} ln( (n/(2m_n))(X_{n,i+m_n} − X_{n,i−m_n}) ),   (22)

where X_{n,j} = X_{n,n} for j > n and X_{n,j} = X_{n,1} for j < 1. His test statistic is very close to the 2m_n-spacing entropy estimate (16), and thus to Cressie's statistic (21). Vasicek [63] proved that Ĥ_n is a weakly consistent estimate of H(f), and stated this fact to hold if f is concentrated on [0, 1]. Making use of the fact that for fixed variance the entropy is maximized by the normal distribution, he proposed an entropy-based test for the composite hypothesis of normality, which rejects the null hypothesis if e^{Ĥ_n}/S < C, where S denotes the sample standard deviation and C is a critical value. Vasicek [63] provided critical values of his test statistic for various values of m and n based on Monte Carlo simulation and also simulated its power against various alternatives. He found that, compared with other tests for normality, the entropy-based test performs well against various alternatives. Prescott ([50]), however, raised some questions regarding the protection it provides against heavy-tailed alternatives. Mudholkar and Lin [46] extended Vasicek's power simulations for testing normality and concluded that Vasicek's test provides strong protection against light-tailed alternatives.

Using the fact that the uniform distribution maximizes the entropy on [0, 1], Dudewicz and van der Meulen [19] extended Vasicek's reasoning and proposed an entropy-based test for uniformity which rejects the null hypothesis if Ĥ_n ≤ H*_{α,m,n}, where H*_{α,m,n} is set so that the test has level α for given m and n. Dudewicz and van der Meulen [19] give Monte Carlo estimates of H*_{α,m,n} for specific α, m and n and carry out a Monte Carlo study of the power of their entropy-based test under seven alternatives. Their simulation studies show that the entropy-based test of uniformity performs particularly well, as compared with other tests of uniformity, for alternatives which are peaked up near 0.5. Their simulation studies were continued in [22], [23]. Since the convergence of such spacing statistics to normality is very slow, Mudholkar and Smethurst [47] developed transformation-based methods to accelerate this convergence. Chaubey, Mudholkar and Smethurst [8] proposed a method for avoiding problems in approximating the null distributions of entropy-based test statistics by constructing a jackknife statistic which is asymptotically normal and may be approximated in moderate sized samples by a scaled t-distribution.

Mudholkar and Lin [46] applied the Vasicek logic to develop an entropy-based test for exponentiality, using the fact that among all distributions with given mean and support (0, ∞), the exponential distribution with specified mean has maximum entropy. Their test for exponentiality rejects for small values of e^{Ĥ_n}/X̄, where X̄ is the sample mean. They showed consistency of their test and evaluated the test procedure empirically by a Monte Carlo study. Their studies conclude that this entropy-based test of exponentiality has reasonably good power properties. Following the outline in [19], Gokhale [26] formalized the entropy-based test construction for a broad class of distributions.

Parzen [48] considers entropy-based statistics such as (19) and (22) to test the goodness-of-fit of a parametric model {F(x, θ)}. Robinson [53] proposed a test for independence based on an estimate of the Kullback-Leibler divergence, which is closely related to differential entropy. His estimate is of the integral (and resubstitution) type. He shows consistency and focuses on testing serial independence for time series.
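As an illustration of the normality test described above, the sketch below computes the statistic e^{Ĥ_n}/S directly from (22). The window size and the informal comparison in the comments are assumptions of this sketch; in practice critical values should be taken from Monte Carlo tables such as those provided in [63].

```python
import numpy as np

def vasicek_statistic(x, m):
    """Vasicek's normality test statistic exp(H_n)/S based on (22).

    Sketch only: no critical values are hard-coded; reject normality when
    the statistic is too small relative to simulated critical values.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    idx = np.arange(n)
    # Boundary convention of (22): repeat the extreme order statistics.
    upper = x[np.minimum(idx + m, n - 1)]
    lower = x[np.maximum(idx - m, 0)]
    h_n = np.mean(np.log((n / (2 * m)) * (upper - lower)))
    return np.exp(h_n) / x.std(ddof=1)

# Under normality the statistic tends to sqrt(2*pi*e) = 4.13... as n grows.
rng = np.random.default_rng(3)
print(vasicek_statistic(rng.standard_normal(200), m=5))   # near 4, modulo finite-sample bias
print(vasicek_statistic(rng.exponential(size=200), m=5))  # substantially smaller
```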
### III.2 Entropy-based parameter estimation

Cheng and Amin [9], Ranneby [51] and Shao and Hahn [56] proposed a consistent parameter estimate which can be used, for example, where the maximum likelihood estimate is not consistent. Consider a parametric family of distribution functions of a real variable, {F(x, θ)}, and assume that the distribution function of X is F(x, θ_0), where θ_0 is the true parameter. The transformed variable X* = F(X, θ) is [0, 1]-valued, and its distribution is uniform on [0, 1] iff θ = θ_0; i.e., its differential entropy is maximal iff θ = θ_0. Thus it is reasonable to maximize an estimate of the differential entropy. If, for example, the 1-spacings estimate is used, then the estimate of the parameter is

θ_n = argmax_θ Σ_{i=1}^{n−1} ln(F(X_{n,i+1}, θ) − F(X_{n,i}, θ)).

This estimator is referred to as the maximum spacing estimator.
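For a one-parameter family the maximum spacing criterion above can be maximized by a standard one-dimensional numerical search. The sketch below uses a normal location family with known unit scale purely as an illustrative assumption; the method itself is family-agnostic.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def msp_location_estimate(x):
    """Maximum spacing estimate of a location parameter theta.

    Illustrative sketch for F(x, theta) = Phi(x - theta); the choice of
    family is an assumption, not part of the general method.
    """
    x = np.sort(np.asarray(x, dtype=float))

    def neg_criterion(theta):
        u = norm.cdf(x - theta)                       # F(X_{n,i}, theta)
        spacings = np.clip(np.diff(u), 1e-300, None)  # guard against log(0)
        return -np.sum(np.log(spacings))

    res = minimize_scalar(neg_criterion, bounds=(x.min(), x.max()),
                          method="bounded")
    return res.x

rng = np.random.default_rng(4)
print(msp_location_estimate(rng.normal(loc=2.0, size=500)))  # close to 2.0
```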
### III.3 Differential entropy and quantization

The notion of differential entropy is intimately connected with variable-length lossy source coding. Here the actual coding rate is the average codelength of a binary prefix code for the source codewords, but, for the sake of simplicity, it is usually approximated by the (Shannon) entropy of the source coder output. It turns out that for large rates, the entropy of the encoder output can be expressed through the differential entropy of the source.

Let X be a real random variable with density f and finite differential entropy H(f), and assume that the discrete random variable [X] (the integer part of X) has finite entropy. With these very general conditions Rényi [52] proved that the entropy of the sequence [nX] behaves as

lim_{n→∞} (H([nX]) − log n) = H(f) = − ∫ f(x) log f(x) dx,   (23)

where log stands for the logarithm of base 2. This shows that for large n the entropy of the uniform quantizer with step-size 1/n is well approximated by H(f) + log n.

Using their own (slightly less general) version of the above statement, Gish and Pierce [25] showed that for the squared distortion D(Q_∆) and the entropy H(Q_∆) of the ∆ step-size uniform scalar quantizer, the following asymptotics holds:

lim_{∆→0} [ H(Q_∆) + (1/2) log(12 D(Q_∆)) ] = H(f).   (24)

Here f is assumed to satisfy some smoothness and tail conditions; however, (24) can be proved without any additional smoothness and tail conditions (Győrfi, Linder, van der Meulen [28]). Again, the differential entropy provides the rule of thumb D(Q_∆) ≈ (1/12) 2^{−2[H(Q_∆)−H(f)]} for small ∆.

A heuristic generalization of (24) was given by Gersho [24]: for k-dimensional vector quantizers and large rates R, he derived the formula

D(Q_R) ≈ c_k(P) 2^{−(2/k)[R−H(f)]},   (25)

where Q_R is a vector quantizer of entropy R having quantization cells congruent to a polytope P, and c_k(P) is the normalized second moment of P. Linder and Zeger [42] gave a rigorous proof of (25) for sources with a finite differential entropy.

Differential entropy is also the only factor through which the Shannon lower bound to the rate-distortion function depends on the source density. For squared distortion the Shannon lower bound is

R_k(D) ≥ (1/k) H(f_k) − (1/2) log(2πeD),

where R_k(D) is the kth order rate-distortion function of the source having density f_k. For stationary sources, when the differential entropy rate H = lim_{k→∞} (1/k) H(f_k) exists, the above lower bound also holds for all D > 0 when (1/k)H(f_k) is replaced by H and R_k(D) by R(D) = lim_{k→∞} R_k(D), the rate-distortion function of the source. Both bounds are asymptotically tight as D → 0, as was shown by Linkov [43] and by Linder and Zamir [41]; thus in this regime the rate and distortion tradeoff depends on the source density only through the differential entropy, which makes these bounds extremely useful for relating asymptotic quantizer performance to rate-distortion limits.

Csiszár [14], [15] investigated the entropy of countable partitions in an arbitrary measure space. He obtained the following strengthening of Rényi's result (23): Let X = (X_1, ..., X_k) be an R^k-valued random vector with differential entropy H(f), and assume that [X] = ([X_1], ..., [X_k]) has finite entropy. Then, considering partitions A of R^k into Borel measurable sets of equal Lebesgue measure λ(A) and maximum diameter δ(A), we have

lim_{δ(A)→0} (H_A(X) + log λ(A)) = H(f),

where H_A(X) is the entropy of the partition A with respect to the probability measure induced by X.
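The uniform-quantizer relation (24), equivalently H(Q_∆) ≈ H(f) − log ∆ together with D(Q_∆) ≈ ∆²/12 for small ∆, is easy to check by simulation. The sketch below does this for a standard normal source; the step size, sample size and the use of an empirical histogram entropy are arbitrary assumptions of the sketch.

```python
import numpy as np

# Empirical check of (24) for a uniform scalar quantizer applied to N(0,1) data.
rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)
delta = 0.05                                  # assumed step size

q = delta * np.round(x / delta)               # uniform quantizer output
distortion = np.mean((x - q) ** 2)            # D(Q_delta), approximately delta^2/12

_, counts = np.unique(q, return_counts=True)
p = counts / counts.sum()
h_q = -np.sum(p * np.log2(p))                 # H(Q_delta) in bits

h_f = 0.5 * np.log2(2 * np.pi * np.e)         # H(f) for N(0,1) in bits, about 2.05

# (24): H(Q_delta) + 0.5*log2(12*D(Q_delta)) should be close to H(f).
print(h_q + 0.5 * np.log2(12 * distortion), "vs", h_f)
```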
### III.4 Applications towards econometrics, spectroscopy and statistical simulations

Theil [58] evaluated the entropy (1) of a density estimate which is fitted to the data by a maximum entropy principle. Although he did not note it, the entropy of the fitted pdf can be regarded as an estimate of the entropy H(f) of the underlying pdf f. Theil's estimate turns out to be equal to Vasicek's estimate (22) for m = 1, apart from a constant. Theil and Laitinen [60] applied the method of maximum entropy estimation of pdf's in econometrics. Their work is at the beginning of quite a few papers on this topic in this field. For an overview of this maximum entropy approach for the estimation of pdf's and its application to problems in econometrics, see Theil and Fiebig [59]. Rodrigez and Van Ryzin [54] studied the large sample properties of a class of histogram estimators whose derivation is based on the maximum entropy principle.

Entropy principles also play a key role in spectroscopy and image analysis. For an overview of the available maximum entropy methods in spectroscopy see [18].

In [20] a method is developed to rank and compare random number generators based on their entropy values. Here the entropy of the random number generator is estimated using (22); in practice entropy is not known, but must be estimated from data. A detailed numerical study, ranking 13 random number generators on the basis of their entropy estimates, is carried out in [5].
## References
[1] Ahmed. 1983. [5] Bernhofen. M. [4] Beirlant. C. 300-317. 1985... H Hayakawa. N. in press. In practice entropy is not known.. The empirical distribution function and strong laws for functions of order statistics of uniform spacings. A. T. For an overview of this maximum entropy approach for the estimation of pdf’s and the application of it to problems in econometrics see Theil and Fiebig [59].. Ranking of the best random number generators via entropy-uiformity theory. American Sciences Press Inc. In MSI-2000: Multivariate Statistical Analysis in Honor of Professor Minoru Siotani on his 70th Birthday.. and van der Meulen. 20. Shimizu. but must be estimated from data. Ohio. 1989. Annals of Statist. 11. and van der Meulen. 372-375. J. IEEE Trans. J. moment bounds. is carried out in [5].

On the estimation functions of the probability density and its derivatives. [28] Gy¨rﬁ. and Smethurst. Yu. 1983. IEEE Trans. L.. I.C. E. Ann. L. Wiley. Power results for tests based on higher order gaps. F. 1995. H. The Netherlands. Random Processes (Academia. 1976.. M. and Tarasenko. Gersho. Linder. [16] Darling. Entropy-based tests of uniformity. S. Math. N. 18. Mudholkar. Asymptotically eﬃcient quantizing. 214-218.C. 15. Noordwijkerhout. Soc. Mommaerts. September 1968. Inform. J. 1946. S. J. Pierce. Prague). Columbus. Gish and J. 676–683. G. Theory Probab. [22] Dudewicz. N. K.J. 425-436. pp. o Data Anal. and van der Meulen. vol. ¨ u A. B. [19] Dudewicz. Sinha. [9] Cheng. A.J. Generalized entropy and quantization problems. D. Wertz.1991. 1. and Amin.
12
. in Probability and Statistics. N. Narosa Publishing House. Biometrika. Comput. E. V. [29] Gy¨rﬁ. 628-633. pp. Physica-Verlag.. R.1).. Entropy-based statistical inference. E. Density-free convergence properties of various estimators of entropy. In New Perspectives in Theoretical and Applied Statistics Eds. Wiley. Asymptotically optimal block quantization. and van der Meulen.. Inform. [25] H. and van der Meulen. 4. N. Ed. Statist. 1973. Theory.J. Austral. Elements of Information Theory. Royal Statist. [12] Cressie. [24] A. 157-165. Transactions of the Sixth Praque Conference on Information a Theory.. Oztr¨ k and E. 1977. 1991. On entropy-based goodness-of-ﬁt test. J. 1983. A. 45. 85-110.P. [18] Dudewicz. Basu and B. eds. On a class of problems relating to the random division of an interval. 1969. 76. I.C.J. and van der Meulen. Amer. N.C. 1973. E. October 25-26. with applications to random number generator.[8] Chaubey. 24. Eds. P. [14] Csisz´r. E. [17] Dmitriev. 1987. Statist. a [15] Csisz´r. van der Meulen. R. Hungar. The empiric entropy. 373–380. Biometrika. 1987. E. Soc. 1. On generalized entropy. E. Computational Statistics and Data Analysis. 1953. and Thomas J. 109. and van der Meulen. W. A. Ohio. 1978. A. New Delhi. L. In The Frontiers of Statistical Scirntiﬁc Theeory and Industrial Applications. Math. 394-403. and van der Meulen. On the asymptotic optimality of quantizers. E. J. Y. 401-419. On the logarithms of higher order spacings. E. Mathematical and Management Sciences. 239-253. 343-355. [23] Dudewicz. July 1979..J. II: Selection-of-the-best/complete ranking for continuous distributions on (0. E.C. Stat. IEEE Trans. T. Assoc. E. Maximum entropy methods in modern spectroscopy: a review and an empiric eentropy approach. [10] Cover. C. E. 1987. 63. Estimating parameters in continuous univariate distributions with a shifted origin. 1983. T. Statist. K.C. 19. Statistics and Decisions. 1990.. W. M. C. Puri. 131-145. On the power of entropy-based tests against bump-type alternatives. E. 5. In Contributions to Stochastics. and van der Meulen. IT-14. On entropy-based goodness-of-ﬁt tests: a practical strategy. American Sciences Press. A. [27] Greeenwood M.G. J. K. Statist. a new approach to nonparametric entropy estimation.J. 115-153. The statistical study of infectious diseases. 967-974. 29-35.. vol.C. IT-25. P. [13] Cressie. Studia Sci. The minimum of higher order gaps. 115-160. Vilaplana and W. [26] Gokhale. and van der Meulen. 207-227.C. E. Entropy-based random number evaluation. [11] Cressie. Theory. Inc. [20] Dudewicz. American J. pp. Statistical Decision Functions. 84-88. 65. 1981. Proceedings of the Eleventh o Symposium on Information Theory in the Benelux. 132-143. 1993. D. Appl. 116-120. [21] Dudewicz. Sendler.

A.C. Properties of the statistical estimate of the entropy of a random vector with a probability density.. Statist. Statist. S. On the estimation of entropy and other fuunctionals of a multivariate density. 395-414. 1976. 96. Cs´ki. Proc. Inform. Nov. March 1994. M. C. R. Sample estimate of entropy of a random vector. Linnik. B. 1986. 9. 1951. 2026–2031.C. Ann. Linder and K. J. Berkes. Evaluation of epsilon entropy of random variables for small epsilon. 19. Math. On two applications of characterization theorems to goodness-of-ﬁt. P. [40] Levit B. 32. 1491-1519. Colloquia Mathematica Soc. On powerful distributional tests based on sample spacings. I. Problems of Information Transmission. V. Soc. [32] Hall. C. Part I. and Rao. Inst. B. IEEE Trans. N. V. P. 1989. [39] Kozachenko. Roy. Theory. 40. J. Math. Inform. Linkov. P. A. 1987. Academic Press. 1989. 517-532. 92-98. Phil. On the estimation of the entropy. 1990. [42] T. North Holland. [37] Joe H. [51] Ranneby. 1984. 1994. J´nos Bolyai. Combinatorics. Limit theorems for sums of general functions of m-spacings.. N. 65-72. Ann. 14. 11. R´v´sz. 254-256. 93-112. 69-88. 1984. 1991. Ser. On a test for normality based on sample entropy. Multivariate Statist. [33] Hall. pp. Zeger. [50] Prescott. [44] Mokkadem. 171-178. Information Theory. a e e [31] Gy¨rﬁ. Statist. and Lin. [34] Hall. Statist. Statist. 45. 1. [46] Mudholkar. 1987. 1981. Zamir. Soc. An estimation method related to the maximum likelyhood method. [38] Kagan. G. Goodness of ﬁt tests and entropy. L. Linder and R. IEEE Trans. 40. E. and Smethurst. 575–579. vol. S. On Kullback-Leibler loss and density estimation. Camb. 1973. Austral. E. J. 12–18. B. 1965. [41] T. R. Theory. A. 81-95. N. S. P. 229-240. 1990. 129-136. [35] Hall. Eds. B.
13
. in Limit Theorems in o Probability and Statistics. G. P. [48] Parzen. Estimation of the entropy and information for absolutely continuous random variables. E. Empirically accelerated convergeence of the spacings statistics to normality. [36] Ivanov A.[30] Gy¨rﬁ. Kluwer Academic Publisher. in Nonparametric Functional o Estimation and Related Topics Ed. pp. Soc. and van der Meulen. 38. 95-101. The maximum spacing method. 1989. On the asymptotic tightness of the Shannon lower bound. Annals of Statistics. vol. J. P. 201-225. 1984. Ya. Roussas. [49] Prakasa Rao. The random division of an interval. E. and van der Meulen. 35. Information and System Sciences. C. Asymptotic entropy constrained performance of tessellating and universal randomized lattice quantization. vol. 17. Nonparametric Functional Estimation. L. On nonparametric estimation of entropy functionals. Problems of Information Transmission. 41. pp. G. Asymptotically optimal estimation of nonlinear functionals Problems of Information Transmission. 1983. Characterization Problems in Mathematical Statistics. and Leonenko. IEEE Trans. 16. [45] Moran. T. L. 193-196. A. 1978. Wiley. L. J. J. Statist. 683-697. P. Problems of Information Transmission. and Rozhkova . and Morton S. Y. Inst. P.. P. Math. a [47] Mudholkar. 15. 129-138. Scand. F. 23. An entropy estimate based on a kernel density estimation. 1993. [43] Y.

and Laitinen. G. 24. IEEE 56. [54] Rodrigez. H. C. 5. Acad. Ed. and Rathie. Scand.. 121-132. and van der Meulen. P. 1976. R. Inform. and Van Ryzin. J. [56] Shao. M. R´nyi. North-Holland. P. Letters. G. 27. Krishnaiah. [59] Theil. 23. P. [57] Tarasenko. Hungar. 32. the direct estimation of the entropy from independent observations of a continuous random variable. 1980. Statistics and Probability Letters. J.
14
. A mathematical theory of communication. The entropy of the maximum entropy distribution. [55] Shannon. and Hahn. [58] Theil. 193–215. 1978. 1994. Roy. J. 379-423. Proc. 1995. Theory. [63] Vasicek O. 1980. Consistent nonparametric entropy-based testing. Singular moment matrices in applied econometrics. 1992. and Fiebig. J.. pp. 120-122. [53] Robinson. Scand. Information Theory. A. E. M. On the entropy of continuous probability distributions. 1968. 751-759. A. 1984.[52] A. K. G. Statist. and the distribution-free entropy test of goodness-of-ﬁt. 58. Root-n consistent estimators of entropy for densities with unbounded support. V. [61] Tsybakov. 19. C. Ser. Bell System Tech. C. Review of Economic Studies. H. 1991. J.P. 24. [62] Van Es B. F. in Multivariate Analysis. Harper and Row. 10. H. Statist. Econom. Acta Math. B 38. E. IEEE Trans. A test for normality based on sample entropy. On the evaluation of an unknown probability density function. Exploiting Continuity: Maximum Entropy Estimation of Continuous Distributions. B. Large sample properties of maximum entropy histograms. e 1959. 629-629. Soc. Statist. [64] Verdugo Lazo. C. vol. Limit theorems for the logarithm of sample spacings. IEEE Trans. D. Y. 61-72. N. C. 54-59. 437-453. [60] Theil. Estimating functionals related to a density by a class of statistics based on spacings. 2052-2053. Sci. 1986. 75-83. 145-148.. On the dimension and entropy of probability distributions.