Generalized Fisher-Darmois-Koopman-Pitman
Theorem and Rao-Blackwell Type Estimators
for Power-Law Distributions
Atin Gayen and M. Ashok Kumar, Member, IEEE
Abstract—This paper generalizes the notion of sufficiency for estimation problems beyond maximum likelihood. In particular, we consider estimation problems based on Jones et al. and Basu et al. likelihood functions that are popular among distance-based robust inference methods. We first characterize the probability distributions that always have a fixed number of sufficient statistics (independent of sample size) with respect to these likelihood functions. These distributions are power-law extensions of the usual exponential family and contain Student distributions as a special case. We then extend the notion of minimal sufficient statistics and compute it for these power-law families. Finally, we establish a Rao-Blackwell-type theorem for finding the best estimators for a power-law family. This helps us establish Cramér-Rao-type lower bounds for power-law families.

Index Terms—Cramér-Rao bounds, Fisher-Darmois-Koopman-Pitman theorem, Rao-Blackwell theorem, Student distribution, sufficient statistics.

I. INTRODUCTION

The notion of sufficiency goes back to Fisher [19, Sec. 7]. He designated a statistic T to be sufficient for estimating θ if the conditional distribution of the sample, given T, is independent of θ. Formally, if T := T(X_1^n) is a statistic (T could very well be a vector-valued function), then T is sufficient (for estimating θ) if the conditional distribution

p_{θ,X_1^n|T}(x_1^n | t) is independent of θ   (1)

for every t ∈ T := {T(y_1^n) : y_1^n ∈ S^n}. This definition, however, is helpful only if one has a suitable guess for T. Observe also that the distribution of T is needed to find the conditional distribution in (1). Later, Koopman proposed the following equivalent definition, where the distribution of T is not needed [31]. He called a statistic T sufficient for θ if the following holds: if x_1^n and y_1^n are two sets of i.i.d. samples such that they are the same under T, that is, T(x_1^n) = T(y_1^n), then the difference of their respective log-likelihoods, [L(x_1^n; θ) − L(y_1^n; θ)], is independent of θ, where L(x_1^n; θ) := Σ_{i=1}^n log p_θ(x_i) denotes the log-likelihood function of the
Authorized licensed use limited to: Indian Institute of Technology Palakkad. Downloaded on April 07,2024 at 14:55:30 UTC from IEEE Xplore. Restrictions apply.
7566 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 69, NO. 12, DECEMBER 2023
sample. (Observe that Koopman's characterization is stated in terms of the likelihood function used for the estimation of θ.) This seems to suggest that the classical notion of sufficiency is based on ML estimation.

Observe that the 'entire sample' is trivially a sufficient statistic with respect to any estimation. However, the notion of sufficient statistics is helpful only when it exists in a fixed number, independent of the sample size. The Fisher-Darmois-Koopman-Pitman theorem addresses this problem [1], [16], [20], [31], [41]. It states that, under certain regularity conditions, the existence of such a set of sufficient statistics is possible if and only if the underlying family is exponential. A set of probability distributions E is said to be an exponential family if there exist functions w = [w_1, ..., w_s]^⊤ and f = [f_1, ..., f_s]^⊤ with w_i : Θ → R, f_i : S → R, and h : R → R such that each p_θ ∈ E is given by

p_θ(x) = Z(θ) exp[h(x) + w(θ)^⊤ f(x)]  if x ∈ S, and p_θ(x) = 0 otherwise,   (3)

where Z(θ) is the normalizing factor [7]. Bernoulli, binomial, Poisson, and normal distributions are some examples of exponential families. However, a statistic that is sufficient for ML estimation might not be sufficient for some other estimations. For example, consider the family of binomial distributions with parameters, say (m, θ), where m is a positive integer denoting the number of trials (assumed known) and θ ∈ (0, 1) is the success probability, which is unknown. It is well known that the sum of the sample is always a sufficient statistic for θ (with respect to the usual likelihood function) (see, for example, [10]).

Let us now suppose that we want to estimate θ by maximizing the following likelihood function, called the Cauchy-Schwarz likelihood function (c.f. [9, Eq. (2.90)]):

L_cs(x_1^n; θ) := log[ Σ_{x=0}^m p_n(x) p_θ(x) / ( Σ_{x=0}^m p_θ(x)² )^{1/2} ],   (4)

where p_θ(x) = \binom{m}{x} θ^x (1 − θ)^{m−x} for x = 0, ..., m and p_n is the empirical distribution of the observed sample. Maximizing the Cauchy-Schwarz likelihood function is the same as minimizing the well-known Cauchy-Schwarz divergence. (This divergence is popular in machine learning applications such as blind source separation, independent component analysis, manifold learning, image segmentation, and so on [27], [28], [30], [39], [40], [45].) It is easy to check that the solutions of the estimating equation, in this case, are the roots of a polynomial in θ whose coefficients involve some of p_n(x), x = 0, ..., m. For example, when m = 2, it is a polynomial of degree five, and the following are, respectively, the coefficients in descending order of exponents of θ: 8p_n(1), 8p_n(0) − 20p_n(1) − 6p_n(2), −20p_n(0) + 16p_n(1) + 12p_n(2), 18p_n(0) − 4p_n(1) − 6p_n(2), −7p_n(0) − 2p_n(1) + p_n(2), and p_n(0) + p_n(1). This shows that at least two among p_n(0), p_n(1), and p_n(2) are needed to find the estimator. Hence the sum of the sample is no longer sufficient for estimating θ.

Observe that the Cauchy-Schwarz likelihood function is a special case (α = 2) of a more general class of likelihood functions, called the Jones et al. likelihood function [29]:

L_J^{(α)}(x_1^n; θ) := (1/(α−1)) log[ (1/n) Σ_{i=1}^n p_θ(x_i)^{α−1} ] − (1/α) log ∫ p_θ(x)^α dx
 = (1/(α−1)) log[ (1/n) Σ_{i=1}^n p_θ(x_i)^{α−1} ] − log ‖p_θ‖,   (5)

where α > 0, α ≠ 1, and ‖p_θ‖ := ( ∫ p_θ(x)^α dx )^{1/α}.

Let us now consider the following family of Student distributions given by

p_θ(x) = (2/(√3 π σ)) [ 1 + (x − µ)²/(3σ²) ]^{−2}  for x ∈ R,   (6)

where θ = [µ, σ²]^⊤, µ ∈ (−∞, ∞) and σ ∈ (0, ∞). Observe that (6) does not fall in the exponential family as defined in (3). Hence, by the Fisher-Darmois-Koopman-Pitman theorem, there does not exist any non-trivial sufficient statistic for µ and σ. Indeed, it can be shown that the knowledge of the entire sample is needed for estimating µ and σ in this case (see [24]). However, Eguchi et al. considered the estimation based on the Jones et al. likelihood function (5) with α = 1/2 and found a closed-form solution for the estimators (see [17]). The estimators are functions of the statistics Σ_{i=1}^n X_i and Σ_{i=1}^n X_i² [17, Th. 3] (c.f. [25, Th. 11]). This motivates us to generalize the concept of sufficiency based on an estimation method. Let us consider the difference of the Jones et al. likelihood function L_J^{(1/2)} evaluated at samples x_1^n and y_1^n:

L_J^{(1/2)}(x_1^n; θ) − L_J^{(1/2)}(y_1^n; θ) = 2 [ log Σ_{i=1}^n p_θ(y_i)^{−1/2} − log Σ_{i=1}^n p_θ(x_i)^{−1/2} ].

For the family of Student distributions (6), we have

L_J^{(1/2)}(x_1^n; θ) − L_J^{(1/2)}(y_1^n; θ)
 = 2 log [ ( 1 + (1/(µ² + 3σ²)) · (1/n) Σ_{i=1}^n y_i² − (2µ/(µ² + 3σ²)) · (1/n) Σ_{i=1}^n y_i ) / ( 1 + (1/(µ² + 3σ²)) · (1/n) Σ_{i=1}^n x_i² − (2µ/(µ² + 3σ²)) · (1/n) Σ_{i=1}^n x_i ) ].   (7)

(7) is independent of θ if

Σ_{i=1}^n x_i² = Σ_{i=1}^n y_i²  and  Σ_{i=1}^n x_i = Σ_{i=1}^n y_i.

Notice that the Jones et al. estimators for µ and σ depend on the sample only through the statistics Σ_{i=1}^n X_i and Σ_{i=1}^n X_i². This implies that, when the estimation is based on the Jones et al. likelihood function L_J^{(1/2)}, it is reasonable to call the pair (Σ X_i, Σ X_i²) sufficient for µ and σ.

The Jones et al. likelihood function is popular in the context of robust estimation [29]. Observe that (5) coincides
with the usual log-likelihood function L(θ) as α → 1. Hence the Jones et al. likelihood function is considered a robust alternative to ML estimation. Motivated by the works of Windham [48] and Field and Smith [18], Jones et al. proposed this method in search of an estimating equation, an alternative to that of the MLE, that uses a suitable power of the model density to down-weight the effect of outliers (possibly) present in the data [17], [21], [25], [29]. It is well known that ML estimation is closely related to the minimization of the well-known Kullback-Leibler divergence or relative entropy [15, Lem. 3.1]. Similarly, Jones et al. estimation is related to the minimization of the so-called Jones et al. divergence [22], [25], [35], [46], [47] (also known as γ-divergence [11], [22], projective power divergence [17], logarithmic density power divergence [36], relative α-entropy [32], [33], and Rényi pseudo-distance [5], [6]). While the Jones et al. divergence for α > 1 corresponds to robust inference, the same for α < 1 has connections to problems in information theory, such as mismatched guessing [46], source coding [33], encoding of tasks [8], and so on. Though Jones et al. estimation for α < 1 does not have a meaning in terms of robustness, it can be used to approximate the MLE of a Student distribution (c.f. [25, Rem. 7]).

Another likelihood function that is popular in the context of robust inference is the so-called Basu et al. likelihood function given by

L_B^{(α)}(x_1^n; θ) = (1/n) Σ_{i=1}^n [ (α · p_θ(x_i)^{α−1} − 1)/(α−1) ] − ∫ p_θ(x)^α dx,   (8)

where α > 0, α ≠ 1 [3, Eq. (2.2)]. Maximization of L_B^{(α)} results in a non-normalized version of the Jones et al. estimating equation (see [25, Sec. 1]). Basu et al. estimation is closely related to the minimization of the so-called Basu et al. divergence, which belongs to the Bregman class [13]. The Basu et al. divergence for α = 2 corresponds to the squared Euclidean distance between the empirical measure and the model density. A comparison of Basu et al. and Jones et al. estimations was studied in [29]. Unlike other likelihood functions such as the Hellinger likelihood function, the Basu et al. and Jones et al. likelihood functions are special in robust inference, in the sense that smoothing of the empirical probability measure is not needed (see, for example, [25, Sec. 1] for details).

In this paper, we propose a generalized notion of sufficiency based on the likelihood function associated with an estimation problem. Specifically, we will focus on the Jones et al. and Basu et al. likelihood functions. We will characterize the family of probability distributions that has a fixed number of sufficient statistics independent of the sample size with respect to these likelihood functions. It turns out that these are the power-law families studied in [2], [14], [25], [33], and [37]. Student distributions fall in this family.

Sufficient statistics not only help in data reduction but also help in finding an estimator better than the given one. Indeed, if one has an unbiased estimator along with sufficient statistics, it is always possible to find another unbiased estimator (usually a function of sufficient statistics) whose variance is not more than that of the given one. This result is known as the Rao-Blackwell theorem [4], [42]. That is, the conditional expectation of an unbiased estimator given the sufficient statistics results in a uniform improvement (in variance) of the estimator. This suggests that one consider only such conditional expectations while searching for best unbiased estimators.

In this paper, we find Rao-Blackwell type estimators for power-law families of the form (6) with respect to the Jones et al. and Basu et al. likelihood functions. We also find a lower bound for the variance of these estimators. This generalizes the usual Cramér-Rao lower bound (CRLB) and results in a sharper bound for the power-law families. We also apply these results to the Student distributions.

Our main contributions in this paper are the following.
(i) Propose a generalized notion of the principle of sufficiency that is appropriate when the underlying estimation is not necessarily ML estimation.
(ii) Characterize the families of probability distributions that have a fixed number of sufficient statistics independent of sample size with respect to the Basu et al. and Jones et al. likelihood functions.
(iii) Generalize the notion of minimal sufficiency with respect to a likelihood function and find generalized minimal sufficient statistics for the power-law family.
(iv) Extend the Rao-Blackwell theorem to both Basu et al. and Jones et al. estimations and find best estimators for the power-law family.
(v) Derive a Cramér-Rao lower bound for the variance of an unbiased estimator (of the expected value of the generalized sufficient statistic) of a power-law family in terms of the asymptotic variance of the Jones et al. estimator.

The rest of the paper is organized as follows. In Section II, we introduce the generalized notion of sufficiency. In Section III, we establish the Fisher-Darmois-Koopman-Pitman theorem for the Jones et al. and Basu et al. likelihood functions. In this section, we also find minimal sufficient statistics for the mean and variance parameters of Student distributions. In Section IV, we find generalized Rao-Blackwell type estimators for the power-law family. In Section V, we establish the generalized CRLB for Basu et al. and Jones et al. estimators of the power-law family. In this section, we also find the best estimators for the mean and variance parameters of Student distributions. We end the paper with a summary and concluding remarks in Section VI. Proofs of most of the results are presented in the Appendix.

II. A GENERALIZED NOTION OF THE PRINCIPLE OF SUFFICIENCY

In this section, we first propose a generalized notion of sufficiency analogous to Koopman's definition of sufficiency. We then derive a factorization theorem associated with this generalized notion. We also show that this is equivalent to a generalized Fisher's definition of sufficiency.

Definition 1: Let Π = {p_θ : θ ∈ Θ} be a family of probability distributions. Suppose the underlying problem is to estimate θ by maximizing some generalized likelihood
function L_G. A statistic T := T(x_1^n) is said to be sufficient for θ with respect to L_G if, for any two i.i.d. samples x_1^n and y_1^n with T(x_1^n) = T(y_1^n), the difference [L_G(x_1^n; θ) − L_G(y_1^n; θ)] is independent of θ.

Proposition 2: T is sufficient for θ with respect to L_G if and only if there exist functions u and v, with v independent of θ, such that

L_G(x_1^n; θ) = u(θ, T(x_1^n)) + v(x_1^n)   (9)

for all sample points x_1^n and for all θ ∈ Θ.

As α → 1, the generalized conditional distribution p*_{θ,X_1^n|T} coincides with the usual conditional distribution of x_1^n given T = t.

Proposition 3: T is sufficient for θ with respect to a likelihood function L_G if and only if p*_{θ,X_1^n|T}(x_1^n | t) is independent of θ.
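As a small numerical illustration of this criterion (our sketch, not from the paper; the function names, sample values, and integration grid are our own choices), consider the Student family (6) under the Jones et al. likelihood (5) with α = 1/2: the difference L_J^{(1/2)}(x_1^n; θ) − L_J^{(1/2)}(y_1^n; θ) should not depend on θ whenever the two samples share Σ x_i and Σ x_i², as in (7).

```python
import numpy as np

def student_pdf(x, mu, sigma):
    # Student density (6) with nu = 3; theta = (mu, sigma^2).
    return (2.0 / (np.sqrt(3.0) * np.pi * sigma)) \
        * (1.0 + (x - mu) ** 2 / (3.0 * sigma ** 2)) ** (-2.0)

def jones_likelihood(sample, mu, sigma, alpha=0.5):
    # Jones et al. likelihood (5); ||p_theta|| is approximated by a
    # Riemann sum. The norm term cancels when likelihoods at the same
    # theta are subtracted, so its truncation error does not affect
    # the differences computed below.
    grid = np.linspace(-200.0, 200.0, 200001)
    dx = grid[1] - grid[0]
    norm = (np.sum(student_pdf(grid, mu, sigma) ** alpha) * dx) ** (1.0 / alpha)
    data_term = np.log(np.mean(student_pdf(sample, mu, sigma) ** (alpha - 1.0))) / (alpha - 1.0)
    return data_term - np.log(norm)

x = np.array([0.0, 1.0, 2.0])
# y shares sum(y) = 3 and sum(y^2) = 5 with x, but is a different sample.
y = np.array([0.5, 0.3486121811340027, 2.1513878188659973])
y_perturbed = y + 0.1          # breaks both equalities

thetas = [(0.0, 1.0), (1.0, 1.0), (-0.7, 0.5), (2.0, 3.0)]
d_equal = [jones_likelihood(x, m, s) - jones_likelihood(y, m, s) for m, s in thetas]
d_broken = [jones_likelihood(x, m, s) - jones_likelihood(y_perturbed, m, s) for m, s in thetas]
# d_equal is the same (in fact zero) for every theta; d_broken varies with theta.
```

For the equal-sums pair the difference vanishes identically, because Σ_i p_θ(x_i)^{−1/2} for the family (6) depends on the sample only through Σ x_i and Σ x_i²; perturbing one sample makes the difference vary with θ, matching the claim that (Σ X_i, Σ X_i²) is sufficient here.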
p*_{θ,X_1^n|T}(y_1^n | t) q*_θ(t). This implies that the generated sample z_1^n has the same distribution as the initial sample x_1^n (with respect to p*_θ).

While sufficient statistics help in data compression, minimal sufficient statistics help achieve the maximum such compression. We now define the notion of minimal sufficiency in a general setup. Let us first recall the definition of minimal sufficiency in the classical sense.

Definition 4 [10, Def. 6.2.11]: A statistic T is said to be a minimal sufficient statistic if T is a function of any other sufficient statistic. In other words, T is minimal if the following holds: for any sufficient statistic T̃ and for any two i.i.d. samples x_1^n and y_1^n, T̃(x_1^n) = T̃(y_1^n) implies T(x_1^n) = T(y_1^n).

There is an easy criterion to find minimal sufficient statistics for MLE due to Lehmann and Scheffé [10, Th. 6.2.13]. Here we first generalize this criterion to a generalized likelihood function. Later we will use it to find minimal sufficient statistics for power-law families.

Proposition 5: Let Π = {p_θ : θ ∈ Θ} be a family of probability distributions. Suppose the underlying problem is to estimate θ by maximizing some likelihood function L_G. Assume that the following condition holds for a statistic T: for any two i.i.d. samples x_1^n and y_1^n, T(x_1^n) = T(y_1^n) if and only if [L_G(x_1^n; θ) − L_G(y_1^n; θ)] is independent of θ for θ ∈ Θ. Then T is a minimal sufficient statistic for θ with respect to L_G.

Proof: See Sec-C of Appendix A.

III. GENERALIZED FISHER-DARMOIS-KOOPMAN-PITMAN THEOREM

In this section, we first characterize the probability distributions that have a fixed number of sufficient statistics independent of sample size with respect to the Jones et al. and Basu et al. likelihood functions. We observe that each of them is a power-law extension of the usual exponential family. We then find sufficient statistics for the power-law families. We start by defining the family associated with the Jones et al. likelihood function.

Definition 6: A k-parameter α-power-law family characterized by h, w, f, Θ and S, where Θ is an open subset of R^k, w = [w_1, ..., w_s]^⊤, f = [f_1, ..., f_s]^⊤, with w_i : Θ → R differentiable for i = 1, ..., s, f_i : R → R for i = 1, ..., s, h : R → R, and S ⊆ R, is defined by M^{(α)} = {p_θ : θ ∈ Θ}, where

p_θ(x) = Z(θ) [ h(x) + w(θ)^⊤ f(x) ]^{1/(α−1)}  for x ∈ S, and p_θ(x) = 0 otherwise,   (12)

with Z(θ)^{−1} = ∫_S [ h(x) + w(θ)^⊤ f(x) ]^{1/(α−1)} dx.

Remark 2:
(i) If h(x) = q(x)^{α−1} for some function q(x) > 0 and f(x) = (1 − α)f̃(x), then the M^{(α)}-family can be shown to coincide with an exponential family when α = 1 (see [32, Def. 8]).
(ii) In [33], this family was derived as projections of the Jones et al. divergence on a set of probability distributions determined by linear constraints. More about the M^{(α)}-family can be found in [2], [23], [25], and [33].
(iii) Student distributions fall in the M^{(α)}-family [25, Ex. 3].

We now state the main result of this section.

Theorem 7: Let Π = {p_θ : θ ∈ Θ ⊂ R^k} be a family of probability distributions with common support S, an open subset of R. Let x_1^n be an i.i.d. sample from some p_θ ∈ Π. For some s (k ≤ s < n), let T_j(x_1^n) for j = 1, ..., s be differentiable functions on S^n. If [T_1, ..., T_s]^⊤ is a sufficient statistic for θ with respect to the Jones et al. likelihood function L_J^{(α)}, then there exist functions h : S → R, f_i : S → R and w_i : Θ → R, i = 1, ..., s, such that p_θ(x) can be expressed as (12) for x ∈ S.

Proof: The proof is divided into two parts. The first part deals with the case k = s and the second with k < s.

(a) Let k = s. We only give the proof when k = s = 1. Other cases can be proved similarly. Let us first consider the Jones et al. likelihood function (5). Taking the derivative with respect to θ on both sides of (5), we get

∂/∂θ L_J^{(α)}(x_1^n; θ) = (1/(α−1)) [ Σ_{i=1}^n ∂/∂θ p_θ(x_i)^{α−1} ] / [ Σ_{i=1}^n p_θ(x_i)^{α−1} ] − ∂/∂θ log ‖p_θ‖.   (13)

If T is a sufficient statistic for θ with respect to L_J^{(α)}, then, from Proposition 2, we have

L_J^{(α)}(x_1^n; θ) = u(θ, T) + v(x_1^n)   (14)

for some functions u and v, where v is independent of θ. This implies that

∂/∂θ L_J^{(α)}(x_1^n; θ) = u_1(θ, T),   (15)

where u_1(θ, T) := ∂/∂θ u(θ, T). Comparing (13) and (15), we have

u_1(θ, T) = (1/(α−1)) [ Σ_{i=1}^n ∂/∂θ p_θ(x_i)^{α−1} ] / [ Σ_{i=1}^n p_θ(x_i)^{α−1} ] − ∂/∂θ log ‖p_θ‖.   (16)

Now, for a particular value of θ ∈ Θ, say θ_0, (16) can be re-written as

u_1(θ_0, T) = (1/(α−1)) [ Σ_{i=1}^n [∂/∂θ p_θ(x_i)^{α−1}]_{θ_0} ] / [ Σ_{i=1}^n p_{θ_0}(x_i)^{α−1} ] − [ ∂/∂θ log ‖p_θ‖ ]_{θ_0}
 = (1/(α−1)) [ Σ_{i=1}^n g(x_i) ] / [ Σ_{i=1}^n h(x_i) ] − [ ∂/∂θ log ‖p_θ‖ ]_{θ_0},   (17)

where g(x_i) = [∂/∂θ p_θ(x_i)^{α−1}]_{θ_0} and h(x_i) = p_{θ_0}(x_i)^{α−1}.

Let us denote T̃_1 := Σ_{i=1}^n g(x_i) and T̃_2 := Σ_{i=1}^n h(x_i). Then, from (17), there exists a function Φ such that

T = Φ(T̃_1, T̃_2).   (18)

Using this, (15) can be re-written as

∂/∂θ L_J^{(α)}(x_1^n; θ) = u_2(θ, T̃_1, T̃_2)   (19)
for some function u_2. Observe also that (5) can be re-written as

L_J^{(α)}(x_1^n; θ) = (1/(α−1)) log[ (1/n) Σ_{i=1}^n p_θ(x_i)^{α−1} ] − (1/(α−1)) log ‖p_θ‖^{α−1}
 = (1/(α−1)) log[ (1/n) Σ_{i=1}^n p_θ(x_i)^{α−1} / ‖p_θ‖^{α−1} ].

This implies that

exp[ (α−1) L_J^{(α)}(x_1^n; θ) ] = (1/n) Σ_{i=1}^n p_θ(x_i)^{α−1} / ‖p_θ‖^{α−1}.

Further, (14) implies that

exp[ (α−1) L_J^{(α)}(x_1^n; θ) ] = exp[ (α−1) u(T, θ) ] · exp[ (α−1) v(x_1^n) ].

Thus we have

(1/n) Σ_{i=1}^n p_θ(x_i)^{α−1} / ‖p_θ‖^{α−1} = exp[ (α−1) u(T, θ) ] · exp[ (α−1) v(x_1^n) ].   (20)

From (23), it is clear that

∂/∂x_i [ ∂/∂T̃_1 u_3(θ, T̃_1, T̃_2) ] = 0 if and only if ∂/∂x_i [ ∂/∂T̃_2 u_3(θ, T̃_1, T̃_2) ] = 0,   (24)

since neither ∂/∂x_j [g(x_j)] nor ∂/∂x_j [h(x_j)] can be zero. Equivalently,

∂/∂x_i [ ∂/∂T̃_1 u_3(θ, T̃_1, T̃_2) ] ≠ 0 if and only if ∂/∂x_i [ ∂/∂T̃_2 u_3(θ, T̃_1, T̃_2) ] ≠ 0.

Suppose that ∂/∂x_i [ ∂/∂T̃_1 u_3(θ, T̃_1, T̃_2) ] ≠ 0. Then (23) can be re-written as

∂/∂x_i [ ∂/∂T̃_1 u_3(θ, T̃_1, T̃_2) ] / ∂/∂x_i [ ∂/∂T̃_2 u_3(θ, T̃_1, T̃_2) ] = − ∂/∂x_j [h(x_j)] / ∂/∂x_j [g(x_j)] =: ξ(x_j).   (25)

Since the right-hand side of (25) is a function of x_j alone, both ∂/∂T̃_1 [u_3(θ, T̃_1, T̃_2)] and ∂/∂T̃_2 [u_3(θ, T̃_1, T̃_2)] must be of the following form
∂/∂x_i [ ∂/∂T̃_2 u_3(θ, T̃_1, T̃_2) ] = 0. This implies that both ∂/∂T̃_1 u_3(θ, T̃_1, T̃_2) and ∂/∂T̃_2 u_3(θ, T̃_1, T̃_2) are independent of the x_i's for i ≠ j. Thus, in this case also, (27) holds.

Since u_3(θ, T̃_1, T̃_2) is a function of T̃_1, T̃_2 and θ only, both ∂/∂T̃_1 u_3(θ, T̃_1, T̃_2) and ∂/∂T̃_2 u_3(θ, T̃_1, T̃_2) are also functions of T̃_1, T̃_2 and θ only. But both T̃_1 and T̃_2 are symmetric functions of the x_i's, i = 1, ..., n. This implies that both quantities in (27) must be functions of θ alone. Let

∂/∂T̃_1 u_3(θ, T̃_1, T̃_2) =: w_1(θ) and ∂/∂T̃_2 u_3(θ, T̃_1, T̃_2) =: w_2(θ)

for some w_1 and w_2 which are functions of θ only. Then we have

u_3(θ, T̃_1, T̃_2) = w_1(θ) T̃_1 + w_2(θ) T̃_2 + w_3(θ).

Using (21), we get

p_θ(x_i)^{α−1} = w_1(θ) g(x_i) + w_2(θ) h(x_i) + w_3(θ)

for i = 1, ..., n.

(b) Let k < s. The proof for this case is quite similar to that of case (a). For the sake of simplicity, we give a sketch of the proof for k = 1 and s = 2. Let (T_1, T_2) be a pair of sufficient statistics for θ with respect to the Jones et al. likelihood function L_J^{(α)}. Then, proceeding as in case (a), one would get an equation similar to (18) with four symmetric functions of the x_i's (instead of two), say, T̃_1 = Σ_{i=1}^n g_1(x_i), T̃_2 = Σ_{i=1}^n g_2(x_i), T̃_3 = Σ_{i=1}^n h_1(x_i) and T̃_4 = Σ_{i=1}^n h_2(x_i). Hence (21) holds with T̃_1, T̃_2, T̃_3, and T̃_4 instead of only T̃_1 and T̃_2. Then a similar calculation yields that all ∂/∂T̃_i u_3(θ, T̃_1, T̃_2, T̃_3, T̃_4) for i = 1, 2, 3, 4 are functions of θ only. This further implies

p_θ(x_i)^{α−1} = Σ_{j=1}^2 w_j^{(1)}(θ) g_j(x_i) + Σ_{j=1}^2 w_j^{(2)}(θ) h_j(x_i) + w^{(3)}(θ)

for i = 1, ..., n. This completes the proof.

The converse of the above theorem is also true. Indeed, in the following, we show that an M^{(α)}-family always has a fixed number of sufficient statistics with respect to the Jones et al. likelihood function L_J^{(α)}.

Theorem 8: Consider a parametric family of probability distributions Π = {p_θ : θ ∈ Θ ⊂ R^k} having a common support S. Let T = [T_1, ..., T_s]^⊤ be s statistics defined on S^n, where s ≥ k. T is a sufficient statistic for θ with respect to the Jones et al. likelihood function L_J^{(α)} if the following hold:
(a) Π is a k-parameter M^{(α)}-family as in Definition 6,
(b) there exist real-valued functions ψ_1, ..., ψ_s defined on T such that for any i.i.d. sample x_1^n,

ψ_i(T(x_1^n)) = f̄_i / h̄   (28)

for i = 1, ..., s, where f̄_i = (1/n) Σ_{j=1}^n f_i(x_j) and h̄ := (1/n) Σ_{j=1}^n h(x_j).

Proof: See Sec-A of Appendix B.

Remark 3: It is easy to see from the above theorem that f̄/h̄ := [ f̄_1/h̄, ..., f̄_s/h̄ ]^⊤ is a sufficient statistic for the k-parameter M^{(α)}-family with respect to the Jones et al. likelihood function.

Example 1: Consider the Student distributions with location parameter µ, scale parameter σ and degrees-of-freedom parameter ν ∈ (2, ∞):

p_{µ,σ}(x) = N_{σ,ν} [ 1 + (x − µ)²/(νσ²) ]^{−(ν+1)/2},   (29)

for x ∈ R, where N_{σ,ν} := Γ((ν+1)/2) / ( √(νπσ²) Γ(ν/2) ) is the normalizing factor. Observe that (29) includes (6) for ν = 3.

Let ν be known, and µ and σ² be unknown. Letting α = (ν−1)/(ν+1) and θ = [µ, σ²]^⊤, (29) can be re-written as

p_θ(x) = N_{θ,α} ( 1 + µ²b_α/σ² )^{1/(α−1)} · [ 1 + b_α x²/(σ² + b_α µ²) − 2µ b_α x/(σ² + b_α µ²) ]^{1/(α−1)},   (30)

where b_α = 1/ν = (1−α)/(1+α) and N_{θ,α} = N_{σ,ν}. (30) implies that Student distributions form an M^{(α)}-family with

h(x) ≡ 1, Z(θ) = N_{θ,α} ( 1 + µ²b_α/σ² )^{1/(α−1)},
w(θ) = [w_1(θ), w_2(θ)]^⊤, f(x) = [f_1(x), f_2(x)]^⊤,
w_1(θ) = b_α/(σ² + b_α µ²), f_1(x) = x²,
w_2(θ) = −2µ b_α/(σ² + b_α µ²), and f_2(x) = x.

By Theorem 8, f̄_1 = (1/n) Σ_{i=1}^n x_i² and f̄_2 = (1/n) Σ_{i=1}^n x_i are sufficient statistics for µ and σ with respect to the Jones et al. likelihood function.

A. Generalized Fisher-Darmois-Koopman-Pitman Theorem for Basu et al. Likelihood Function

In [24], we characterized the probability distributions that have a fixed number of sufficient statistics with respect to the Basu et al. likelihood function (only for the case Θ ⊂ R though). We termed the associated family of probability distributions the B^{(α)}-family. In the following, we first define the B^{(α)}-family and then state the results concerning sufficient statistics for the same.

Definition 9: Let h, w, f, Θ and S be as in Definition 6. A k-parameter B^{(α)}-family characterized by h, w, f, Θ and S is defined by B^{(α)} = {p_θ : θ ∈ Θ}, where

p_θ(x) = [ h(x) + F(θ) + w(θ)^⊤ f(x) ]^{1/(α−1)}  for x ∈ S, and p_θ(x) = 0 otherwise,   (31)

for some F : Θ → R, the normalizing factor.

Remark 4:
(i) Student distributions fall in the B^{(α)}-family [25, Ex. 1].
(ii) The B^{(α)}-family is also a generalization of the usual exponential family [24], [25]. A more general form of this family can be found in [14] in connection with the projection theorem of Bregman divergences (c.f. [25]). This family was also studied in [2] and [37]. A comparison between the B^{(α)} and M^{(α)} families can be found in [25, Rem. 2].

The following is the generalized Fisher-Darmois-Koopman-Pitman theorem with respect to the Basu et al. likelihood function L_B^{(α)}.

Theorem 10: Let k ≤ s < n. If [T_1, ..., T_s]^⊤ is a sufficient statistic for θ with respect to the Basu et al. likelihood function L_B^{(α)}, then for x ∈ S, there exist functions h : S → R, f_i : S → R and w_i : Θ → R, i = 1, ..., s, such that p_θ(x) can be expressed as (31).

Proof: The proof for the case s = k can be found in [24, Th. 2]. On the other hand, let k < s. We give a sketch of the proof for k = 1 and s = 2 as in Theorem 7. The other cases can be proved similarly. Following the same steps as in Theorem 7, one gets (19) using L_B^{(α)}(x_1^n; θ) in place of L_J^{(α)}(x_1^n; θ). This, upon taking the partial derivative with respect to x_j first, and then with respect to some x_i, i ≠ j, implies (23). Then, proceeding as in Theorem 7, one can prove the result.

The converse of the above theorem is also true.

Theorem 11: Consider a k-parameter B^{(α)}-family as in Definition 9. Let T = [T_1, ..., T_s]^⊤ be s statistics defined on S^n, where s ≥ k. Then T is a sufficient statistic for θ with respect to L_B^{(α)} if there exist real-valued functions ψ_1, ..., ψ_s defined on T such that ψ_i(T(x_1^n)) = f̄_i for any i.i.d. sample x_1^n and i = 1, ..., s. In particular, f̄ = [ f̄_1, ..., f̄_s ]^⊤ is a sufficient statistic for θ with respect to L_B^{(α)}.

Proof: See [24, Th. 4].

Example 2: Consider the Student distributions as in Example 1. In [25], we showed that these also form a 2-parameter B^{(α)}-family. By Theorem 11, the same statistics as in Example 1 form sufficient statistics for their parameters with respect to the Basu et al. likelihood function.

B. Minimal Sufficiency With Respect to Basu et al. and Jones et al. Likelihood Functions

In the following, we show that f̄/h̄ and f̄ are also minimal, respectively, for regular M^{(α)} and B^{(α)} families. An M^{(α)}-family as in Definition 6 is said to be regular if the following are satisfied (c.f. [25, Def. 3]):
(i) the support S does not depend on the parameter θ;
(ii) the number of unknown parameters is the same as the number of w_i's, that is, s = k;
(iii) 1, w_1, ..., w_k are linearly independent;
(iv) f_1, ..., f_k are linearly independent.
Similarly, a B^{(α)}-family as in Definition 9 is said to be regular if 1, f_1, ..., f_k are linearly independent along with conditions (i)-(iii) above (c.f. [25, Def. 1]).

Theorem 12: The statistics f̄/h̄ = [ f̄_1/h̄, ..., f̄_k/h̄ ]^⊤ and f̄ = [ f̄_1, ..., f̄_k ]^⊤ are minimal sufficient statistics, respectively, for a k-parameter regular M^{(α)}-family and a k-parameter regular B^{(α)}-family with respect to the Jones et al. and Basu et al. likelihood functions.

Proof: See Sec-B of Appendix B.

Remark 5: In view of Theorem 12, the sufficient statistics derived in Example 1 for Student distributions are minimal too (with respect to both the Basu et al. and Jones et al. likelihood functions).

IV. RAO-BLACKWELL THEOREM FOR JONES ET AL. ESTIMATION

Let X_1^n = x_1^n be an i.i.d. sample drawn according to some p_θ ∈ Π. Here and afterward, we assume θ ∈ Θ ⊂ R unless otherwise stated. Let θ̂ := θ̂(x_1^n) be an unbiased estimator of θ. Let T := T(x_1^n) be a sufficient statistic for θ with respect to the usual log-likelihood function. Let us consider the conditional expectation E_θ[θ̂ | T]. It is easy to show from Fisher's definition of sufficiency that E_θ[θ̂ | T] is independent of θ [4]. Hence E_θ[θ̂ | T] is a function of T alone. Let ϕ(T) := E_θ[θ̂ | T]. Then, by the Rao-Blackwell theorem, ϕ(T) is a uniformly better unbiased estimator of θ than θ̂ (see [4], [42]). In other words,
1. E_θ[ϕ(T)] = E_θ[θ̂] = θ, and
2. Var_θ[ϕ(T)] ≤ Var_θ[θ̂]
for all θ ∈ Θ, and equality holds if and only if θ̂ is a function of the sufficient statistic T. This implies that the 'best' unbiased estimator of θ (the one with the least variance) is a function of the sufficient statistic T only, that is, one from the family {ψ(T) : E_θ[ψ(T)] = θ}.

Let us consider an exponential family E as in (3). f̄ = (1/n) Σ_{i=1}^n f(x_i) is a sufficient statistic for θ with respect to the usual log-likelihood function [34] and E_θ[f̄] = E_θ[f(X)] =: τ(θ), called the mean value parameter. Let θ̂ be any unbiased estimator of τ(θ). Then, by the Rao-Blackwell theorem, ϕ(f̄) := E_θ[θ̂ | f̄] is also an unbiased estimator of τ(θ) and has variance not more than that of θ̂. That is,

E_θ[ϕ(f̄)] = τ(θ) and Var_θ[ϕ(f̄)] ≤ Var_θ[θ̂].   (32)

Now consider ψ(f̄) = ϕ(f̄) − f̄. Then

E_θ[ψ(f̄)] = E_θ[ϕ(f̄)] − E_θ[f̄] = τ(θ) − τ(θ) = 0,

and

Var_θ[ϕ(f̄)] = Var_θ[ψ(f̄) + f̄] = Var_θ[ψ(f̄)] + Var_θ[f̄] + 2 Cov_θ[ψ(f̄), f̄].   (33)

It can be shown, under some mild regularity conditions, that Cov_θ[ψ(f̄), f̄] = 0. See Sec-A of Appendix C for the detailed derivation. Hence (33) reduces to

Var_θ[ϕ(f̄)] = Var_θ[ψ(f̄)] + Var_θ[f̄] ≥ Var_θ[f̄].

Thus from (32), for any θ̂ with E_θ[θ̂] = E_θ[f̄], we have

Var_θ[θ̂] ≥ Var_θ[ϕ(f̄)] ≥ Var_θ[f̄].

This shows that the variance of f̄ is not more than that of any unbiased estimator of τ(θ). Thus we have the following (c.f. [10], [44]).

Proposition 13: Consider a regular exponential family as in (3). Let E_θ[f(X)] = τ(θ). Then there does not exist any
unbiased estimator of τ(θ) that has lesser variance than that of f̄. That is, f̄ is the best estimator of its expected value. Further, it can be easily shown that (c.f. [43], [44])

Varθ[f̄] = (1/n) Var[f(X)] = [τ′(θ)]² / (n I(θ)),

where I(θ) is the Fisher information, that is, I(θ) = Eθ[(∂/∂θ log pθ(X))²]. This and Proposition 13 imply that

Varθ[θ̂] ≥ [τ′(θ)]² / (n I(θ)). (34)

Equality holds in (34) if and only if θ̂ = f̄ almost surely. Notice that (34) is the well-known Cramér-Rao inequality [12], [42].
Observe that, in the Rao-Blackwell theorem, instead of a sufficient statistic T, if one conditions on some other statistic, then also the resulting estimator has variance not more than that of the given estimator. However, conditioning on any other statistic does not guarantee that the conditional expectation is independent of θ. Also, by the Fisher-Darmois-Koopman-Pitman theorem, a sufficient statistic with respect to the usual log-likelihood function exists if and only if the underlying model is exponential [16], [31], [41]. Hence the Rao-Blackwell theorem is not applicable outside the exponential family. However, in Section III, we showed that there are models that have a fixed number of sufficient statistics with respect to likelihood functions other than the log-likelihood. In particular, we showed that the M(α)-family is one such model, which has a fixed number of sufficient statistics with respect to the Jones et al. likelihood function (see Theorem 7 and Theorem 8). In the following, we generalize the Rao-Blackwell theorem for the Jones et al. likelihood function. Later we will apply this to the M(α)-family and find a Rao-Blackwell type "best" estimator. This also helps us find a lower bound for the variance of the estimators of the M(α)-family as an alternative to the usual CRLB.
The Jones et al. likelihood function L_J^(α) as in (5) can be re-written as

L_J^(α)(x₁ⁿ; θ) = log [ (1/n) Σᵢ₌₁ⁿ (pθ(xᵢ)/∥pθ∥)^(α−1) ]^(1/(α−1)).

Then the deformed probability distribution associated with L_J^(α) is given by

p*θ(x₁ⁿ) := exp[L_J^(α)(x₁ⁿ; θ)] / ∫ exp[L_J^(α)(y₁ⁿ; θ)] dy₁ⁿ
= [ (1/n) Σᵢ₌₁ⁿ pθ(xᵢ)^(α−1) ]^(1/(α−1)) / ∫ [ (1/n) Σᵢ₌₁ⁿ pθ(yᵢ)^(α−1) ]^(1/(α−1)) dy₁ⁿ. (35)

Observe that when α → 1, L_J^(α)(x₁ⁿ; θ) coincides with the log-likelihood function L(x₁ⁿ; θ). Hence, in this case, p*θ(x₁ⁿ) = pθ(x₁ⁿ).
Let T be a statistic of the sample x₁ⁿ. Then, from (11), we have

p*θ,X₁ⁿ|T(x₁ⁿ|t) = p*θ(x₁ⁿ)/q*θ(t) if x₁ⁿ ∈ Aₜ, and 0 otherwise.

Observe that, as α → 1, the above coincides with the usual conditional distribution of x₁ⁿ given T = t.
Let θ̂ := θ̂(x₁ⁿ) be an unbiased estimator of τ*(θ) := E*θ[θ̂], where E*θ[···] denotes the expectation with respect to p*θ. Let T := T(x₁ⁿ) be a sufficient statistic for θ with respect to the Jones et al. likelihood function L_J^(α). Let us consider the conditional expectation

E*θ[θ̂|T = T(x₁ⁿ)] = ∫ θ̂(y₁ⁿ) p*θ(y₁ⁿ|T(x₁ⁿ)) dy₁ⁿ. (36)

In the following, we first establish some properties of (36).
Proposition 14: The conditional expectation E*θ[θ̂|T] is independent of θ.
Proof: See Sec-B of Appendix C.
Let

E*θ[θ̂|T = T(x₁ⁿ)] =: ϕ*(T(x₁ⁿ)).

Let us consider the expectation of ϕ*(T) with respect to p*θ. That is,

E*θ[ϕ*(T(X₁ⁿ))] = ∫ p*θ(x₁ⁿ) ϕ*(T(x₁ⁿ)) dx₁ⁿ.

Then we have the following.
Proposition 15: ϕ*(T) is unbiased for τ*(θ), that is,

E*θ[ϕ*(T(X₁ⁿ))] = E*θ[θ̂(X₁ⁿ)] = τ*(θ).

Proof: See Sec-C of Appendix C.
Thus, if we have an unbiased estimator of τ*(θ) and a sufficient statistic with respect to the Jones et al. likelihood function, we can always get another unbiased estimator for τ*(θ) that is based only on the sufficient statistic. Let us now consider the following two classes of estimators based on the sufficient statistic T:

A = {E*θ[θ̂|T] : θ̂ is an unbiased estimator of τ*(θ), that is, E*θ[θ̂] = τ*(θ)},
B = {ψ(T) : E*θ[ψ(T)] = τ*(θ)}. (37)

Proposition 16: The above two classes of estimators are the same, that is, A = B.
Proof: See Sec-D of Appendix C.
The above proposition suggests that any unbiased estimator of τ*(θ) that is a function of the sufficient statistic T is of the form E*θ[θ̂|T], where θ̂ is an unbiased estimator of τ*(θ). In the following, we compare the variance of an estimator of the form (37) and any other unbiased estimator of τ*(θ).
Lemma 17: Let θ̂ be any unbiased estimator of τ*(θ) and ϕ*(T) = E*θ[θ̂|T]. Let ψ*(T) ∈ B. Then

Var*θ[θ̂] = E*θ[(θ̂ − ψ*(T))²] + Var*θ[ψ*(T)] + 2 E*θ[ψ*(T){ϕ*(T) − ψ*(T)}]. (38)

Proof: See Sec-E of Appendix C.
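As α → 1 the construction above reduces to the classical Rao-Blackwell recipe. The following sketch (our own illustration, not from the paper; the Bernoulli model, sample size, and parameter value are assumptions) checks that classical analogue by exact enumeration: conditioning the crude unbiased estimator X₁ on the sufficient statistic T = Σᵢ Xᵢ yields the sample mean, which remains unbiased and has strictly smaller variance.

```python
from itertools import product

def rao_blackwell_bernoulli(n=4, theta=0.3):
    """Exact (enumeration-based) comparison of the crude unbiased
    estimator X1 with its Rao-Blackwellization phi(T) = T/n under an
    i.i.d. Bernoulli(theta) model; returns (mean, variance) of each."""
    outcomes = list(product([0, 1], repeat=n))

    def prob(x):
        s = sum(x)
        return theta**s * (1 - theta)**(n - s)

    def moments(g):
        # mean and variance of the estimator g over the full sample space
        m1 = sum(prob(x) * g(x) for x in outcomes)
        m2 = sum(prob(x) * g(x)**2 for x in outcomes)
        return m1, m2 - m1**2

    mean1, var1 = moments(lambda x: x[0])        # crude estimator X1
    mean2, var2 = moments(lambda x: sum(x) / n)  # phi(T) = T/n
    return mean1, var1, mean2, var2
```

Both estimators are unbiased for θ, while the variance drops from θ(1−θ) to θ(1−θ)/n, matching Proposition 13's claim that the sample mean is best among unbiased estimators of its expectation.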
7574 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 69, NO. 12, DECEMBER 2023
= (1/n) Σᵢ₌₁ⁿ xᵢ.

Thus ϕ*(T(x₁ⁿ)) = (1/n) Σᵢ₌₁ⁿ xᵢ is a uniformly better estimator for τ*(θ) than x₁. It can also be shown that

Var*θ[ϕ*(T)] = [∂/∂θ τ*(θ)]² / I*α,n(θ),

where I*α,n(θ) is the inverse of the asymptotic variance of the Jones et al. estimator for p*θ(X₁ⁿ) [29]. This shows that the Rao-Blackwell estimator derived from a sufficient statistic attains the asymptotic variance of the Jones et al. estimator. In the following section, we elucidate this for a general M(α)-family.

V. EFFICIENCY OF RAO-BLACKWELL ESTIMATORS OF POWER-LAW FAMILY

In this section, we shall show the efficiency of Rao-Blackwell estimators for a power-law family. For simplicity, we shall consider a regular M(α)-family with h ≡ 1. That is, for x ∈ S,

pθ(x) = Z(θ)[1 + w(θ)f(x)]^(1/(α−1)). (39)

Then

[ (1/n) Σᵢ₌₁ⁿ pθ(xᵢ)^(α−1) ]^(1/(α−1)) = [ Z(θ)^(α−1) (1/n) Σᵢ₌₁ⁿ (1 + w(θ)f(xᵢ)) ]^(1/(α−1)) = Z(θ)[1 + w(θ)f̄]^(1/(α−1)).

Thus, using (35), we get

p*θ(x₁ⁿ) = [1 + w(θ)f̄]^(1/(α−1)) / ∫ [1 + w(θ)f̄]^(1/(α−1)) dx₁ⁿ = N(θ)[1 + w(θ)f̄]^(1/(α−1)), (40)

where f̄ := (1/n) Σᵢ₌₁ⁿ f(xᵢ) and N(θ)⁻¹ := ∫ [1 + w(θ)f̄]^(1/(α−1)) dx₁ⁿ is the normalizing constant. Comparing this with (12), we see that (40) also forms an M(α)-family with h ≡ 1 and f(x₁ⁿ) = f̄. Recall that f̄ is a sufficient statistic for M(α) with respect to the Jones et al. likelihood function (Theorem 8). Let τ*(θ) := E*θ[f̄]. In the following, we show that f̄ is the best estimator of τ*(θ). That is, the variance of any unbiased estimator of τ*(θ) is bounded below by the variance of f̄ with respect to p*θ.
Let θ̂ be an unbiased estimator of τ*(θ) with respect to p*θ. Then by Theorem 18, ϕ*(f̄) = E*θ[θ̂|f̄] is also an unbiased estimator of τ*(θ) and

Var*θ[θ̂] ≥ Var*θ[ϕ*(f̄)]. (41)

Let us consider ψ*(f̄) := ϕ*(f̄) − f̄. Then E*θ[ψ*(f̄)] = τ*(θ) − τ*(θ) = 0. Also

Var*θ[ϕ*(f̄)] = Var*θ[ψ*(f̄) + f̄] = Var*θ[ψ*(f̄)] + Var*θ[f̄] + 2 Cov*θ[ψ*(f̄), f̄]. (42)

In Sec-A of Appendix D, we have shown that, if w(θ) > 0 for θ ∈ Θ, then Cov*θ[ψ*(f̄), f̄] ≥ 0. Using this in (42), we get

Var*θ[ϕ*(f̄)] = Var*θ[ψ*(f̄)] + Var*θ[f̄] + 2 Cov*θ[ψ*(f̄), f̄] ≥ Var*θ[f̄]. (43)

Using (41) and (43), we get

Var*θ[θ̂] ≥ Var*θ[f̄], (44)

where equality holds if and only if θ̂ = f̄ almost surely. Thus we have the following.
Theorem 20: Consider an M(α)-family as in (39) with w(θ) > 0 for θ ∈ Θ ⊂ R. Let τ*(θ) = E*θ[f̄]. Then the variance of any unbiased estimator of τ*(θ) with respect to p*θ is not lower than that of f̄. That is, f̄ is the best estimator of its expected value with respect to p*θ.
We now show that Var*θ[f̄] indeed equals the asymptotic variance of the Jones et al. estimator for the model p*θ. The Jones et al. asymptotic variance for p*θ is given by (c.f. [29])

1/I*α,n(θ) := Var*θ[p*θ(X₁ⁿ)^(α−1) s*(X₁ⁿ, θ)] / Cov*θ[s*(X₁ⁿ, θ), p*θ(X₁ⁿ)^(α−1) s*(X₁ⁿ, θ)]², (45)

where

s*(x₁ⁿ, θ) := ∂/∂θ log p*θ(x₁ⁿ). (46)

It can be easily seen that I*α,n(θ) coincides with the usual Fisher information I*n(θ) for α = 1, where I*n(θ) = Var*θ[∂/∂θ log p*θ(X₁ⁿ)]. Observe that (40) can be re-written as

p*θ(x₁ⁿ) = N(θ)[1 + w(θ)f̄]^(1/(α−1)) = [N(θ)^(α−1) + N(θ)^(α−1) w(θ)f̄]^(1/(α−1)).

For convenience, let F(θ) := N(θ)^(α−1) and w̃(θ) := N(θ)^(α−1) w(θ). Then

p*θ(x₁ⁿ) = [F(θ) + w̃(θ)f̄]^(1/(α−1))

and hence

∂/∂θ p*θ(x₁ⁿ) = (1/(α−1)) p*θ(x₁ⁿ)^(2−α) [F′(θ) + w̃′(θ)f̄].

Then

Var*θ[p*θ(X₁ⁿ)^(α−1) s*(X₁ⁿ, θ)] = (1/(α−1))² w̃′(θ)² Var*θ[f̄], (47)

and

Cov*θ[s*(X₁ⁿ, θ), p*θ(X₁ⁿ)^(α−1) s*(X₁ⁿ, θ)] = (1/(α−1)) w̃′(θ) ∂/∂θ τ*(θ). (48)
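Equation (40) says the deformed density p*θ depends on the sample only through f̄. For a toy finite-alphabet model (our own construction, not from the paper: f(x) = x, a scalar weight w(θ) = θ, and α = 0.5 are assumptions), this can be checked by direct enumeration of (35): any two samples with the same mean receive the same deformed probability.

```python
from itertools import product

def deformed_density(alpha, theta, xs, n):
    """Deformed joint distribution p*_theta of eq. (35), computed by
    enumeration for the toy power-law model
        p_theta(x) ∝ (1 + theta * x)^(1/(alpha - 1)),  x in xs."""
    p = {x: (1 + theta * x) ** (1.0 / (alpha - 1)) for x in xs}
    z = sum(p.values())
    p = {x: v / z for x, v in p.items()}

    def stat(sample):
        # [ (1/n) * sum_i p(x_i)^(alpha-1) ]^(1/(alpha-1)), the numerator of (35)
        s = sum(p[x] ** (alpha - 1) for x in sample) / n
        return s ** (1.0 / (alpha - 1))

    norm = sum(stat(s) for s in product(xs, repeat=n))
    return {s: stat(s) / norm for s in product(xs, repeat=n)}
```

Since p(x)^(α−1) is affine in f(x) here, the statistic inside (35) reduces to a function of f̄ alone, so e.g. the samples (0, 2), (1, 1), and (2, 0) all get the same p*θ value.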
unlike the one in Corollary 21, where the bound is given by the asymptotic variance of the Jones et al. estimators.
We now apply the above result to find the best estimator for the mean parameter of a Student distribution.
Example 4: Consider the Student distributions with degrees of freedom parameter ν ∈ (2, ∞), location parameter µ and scale parameter σ² = 1:

pµ(x) = Nν [1 + (1/ν)(x − µ)²]^(−(ν+1)/2), (54)

where Nν is the normalizing constant. Letting α = 1 − 2/(ν+1), bα = 1/ν and Nα = Nν, (54) forms a regular B(α)-family as in (31) with h(x) = Nα^(α−1) bα x², F(µ) = (1 + bα µ²)Nα^(α−1), w(µ) = −2µ Nα^(α−1) bα and f(x) = x. This implies that x is a sufficient statistic for µ with respect to the Basu et al. likelihood function [24, Th. 4] and x is the best estimator for E*µ[X] (Theorem 22).
It can also be shown, in this case, that

p*µ(x₁ⁿ) = exp[ (α/(α−1)) Nα^(α−1) bα x̄² + Fα(µ) − (2α/(α−1)) Nα^(α−1) bα µ x̄ ],

where

Fα(µ) := (α/(α−1)) Nα^(α−1) bα µ² − (n/2) log[ −(πα/(n(α−1))) Nα^(α−1) bα ],

E*µ[X] = −((α−1)/α) · 2α Nα^(α−1) bα µ / w′(µ) = µ, and

Var*µ[X] = [τ′(µ)]² / I*n(µ) = (1−α) / (2α bα Nα^(α−1)).

VI. SUMMARY AND CONCLUDING REMARKS

Fisher introduced the concept of sufficiency in connection with parameter estimation problems in an attempt to estimate the unknown parameters of a population without knowing the entire sample [19]. Fisher's notion of sufficiency is based on the usual log-likelihood function and the underlying estimation is ML estimation. However, there are situations (for example, robust inference) where we need to look beyond ML estimation. Hence the classical notion of sufficiency is inadequate in such situations. In this paper, we presented a notion of sufficiency based on a general likelihood function. In particular, we considered two families of likelihood functions, namely the Basu et al. and Jones et al. α-likelihood functions. These (for α > 1) arise in distance-based robust inference methods and are generalizations of the usual log-likelihood function. Also, these likelihood functions are special as they do not require smoothing of the empirical measure (of the observed sample) [29]. We characterized the family of probability distributions that have a fixed number of sufficient statistics (independent of sample size) with respect to these likelihood functions. This family generalizes the usual exponential family and has a power law. Student distributions fall in this family. We also extended the notion of minimal sufficiency and found minimal sufficient statistics for the power-law families. This could potentially help when the data is modeled using one of these power-law distributions. However, we would like to emphasize that the established results have the limitation (when it comes to practical applications) that the tuning parameter α in the Basu et al. or Jones et al. likelihood function should be related to the degrees of freedom parameter ν of the Student distribution by α = (ν − 1)/(ν + 1). Basu et al. and Jones et al. estimations for exponential families, in particular for normal distributions, have been studied, for example, in [3], [6], and [29]. However, unlike these works, the present work focused on models that have closed-form solutions for these estimations.
Another important application of sufficient statistics is that, given an estimator, one can find another estimator that is uniformly better than the given one. This result is known as the Rao-Blackwell theorem. We generalized the Rao-Blackwell theorem when one has sufficient statistics with respect to the Basu et al. or Jones et al. likelihood function. This helps us find the "best" estimator for the parameters of a power-law family and hence for Student distributions. Consequently, we derived a lower bound for the variance of estimators of a deformed form of the power-law family, p*θ. In the case of the Jones et al. likelihood function, the bound turns out to be its asymptotic variance (with respect to p*θ). The derivation of the asymptotic variance of the Jones et al. estimator can be found in [29]. This bound is sharper than the usual CRLB (see Corollary 21). Recall that the usual CRLB equals the asymptotic variance of the MLE.
We now make some general remarks on the deformed probability distribution p*θ. Recall that it is the factorization theorem (Halmos and Savage [26] in the classical case and Proposition 2 in the general case) that makes it clear that the notion of sufficiency is with respect to a likelihood function. However, the generalized notion of sufficiency is defined using the deformed probability distribution p*θ. The equivalence of the analogous factorization theorem and Fisher's definition of sufficiency shows the uniqueness of the usage of p*θ in the generalized notion of sufficiency. This fact was further evidenced in the case of the Jones et al. likelihood function, where the lower bound on the variance of estimators for power-law families equals the asymptotic variance of the Jones et al. estimator for p*θ. However, for the Basu et al. likelihood function, the bound, though computed with respect to p*θ, is given by the usual CRLB and does not equal the asymptotic variance of the Basu et al. estimator (Theorem 22). This is surprising, as one would expect the bound to equal the Basu et al. asymptotic variance given by [3, Th. 2] (as in the case of the Jones et al. likelihood function). This will be the focus of one of our forthcoming works. In any case, such an anomalous behavior of the Basu et al. likelihood function (as compared with the Jones et al. likelihood function) was also noticed in the study of the geometry of the associated divergence functions in connection with projection theorems for families of probability distributions determined by linear constraints. While the geometry of the Basu et al. divergence is similar to that of the Kullback-Leibler divergence, the same for the Jones et al. divergence was different [33].

APPENDIX A
PROOF OF RESULTS OF SECTION II

A. Proof of Proposition 2
Let T be a sufficient statistic for θ with respect to LG. Let x₁ⁿ and y₁ⁿ be two samples such that T(x₁ⁿ) = T(y₁ⁿ). Then
from the generalized Koopman definition of sufficiency, we have that [LG(x₁ⁿ; θ) − LG(y₁ⁿ; θ)] is independent of θ. Let us define a relation '∼' on the set of all sample points of length n by x₁ⁿ ∼ y₁ⁿ if and only if T(x₁ⁿ) = T(y₁ⁿ). Then '∼' is an equivalence relation. Let us denote the equivalence classes by Sₜ for each t ∈ T = {r : r = T(x₁ⁿ) for some x₁ⁿ ∈ Sⁿ}. For each equivalence class Sₜ, designate an element x₁,ₜⁿ ∈ Sₜ.
Let x₁ⁿ be a sample point such that T(x₁ⁿ) = t* ∈ T. Then x₁ⁿ ∈ Sₜ* and T(x₁ⁿ) = T(x₁,ₜ*ⁿ). This implies that [LG(x₁ⁿ; θ) − LG(x₁,ₜ*ⁿ; θ)] is independent of θ by hypothesis. Let v(x₁ⁿ) := [LG(x₁ⁿ; θ) − LG(x₁,ₜ*ⁿ; θ)]. Then

LG(x₁ⁿ; θ) = LG(x₁,ₜ*ⁿ; θ) + v(x₁ⁿ) = u(θ, t*) + v(x₁ⁿ),

where u(θ, t) := LG(x₁,ₜⁿ; θ) for each t ∈ T.
Conversely, let us assume that there exist two functions u and v such that

LG(x₁ⁿ; θ) = u(θ, T(x₁ⁿ)) + v(x₁ⁿ)

for all sample points x₁ⁿ and for all θ ∈ Θ. Thus for any two sample points x₁ⁿ and y₁ⁿ with T(x₁ⁿ) = T(y₁ⁿ), we have

LG(x₁ⁿ; θ) − LG(y₁ⁿ; θ) = u(θ, T(x₁ⁿ)) + v(x₁ⁿ) − u(θ, T(y₁ⁿ)) − v(y₁ⁿ) = v(x₁ⁿ) − v(y₁ⁿ).

Conversely, let p*θ,X₁ⁿ|T(x₁ⁿ|t) be independent of θ. Let p*θ,X₁ⁿ|T(x₁ⁿ|t) := v₁(x₁ⁿ). Then

exp[LG(x₁ⁿ; θ)] / ∫_{A_T(x₁ⁿ)} exp[LG(y₁ⁿ; θ)] dy₁ⁿ = v₁(x₁ⁿ).

This implies

LG(x₁ⁿ; θ) = u(T(x₁ⁿ), θ) + v(x₁ⁿ), (56)

where u(T(x₁ⁿ), θ) = log ∫_{A_T(x₁ⁿ)} exp[LG(y₁ⁿ; θ)] dy₁ⁿ and v(x₁ⁿ) = log v₁(x₁ⁿ). Since the above is true for any x₁ⁿ ∈ Sⁿ, we have the result. ■

C. Proof of Proposition 5
The condition that if T(x₁ⁿ) = T(y₁ⁿ) then [LG(x₁ⁿ; θ) − LG(y₁ⁿ; θ)] is independent of θ for all θ ∈ Θ implies that T is a sufficient statistic for θ with respect to LG, by Definition 1. Thus the only thing we need to prove is that T is further minimal. Let T̃ be a sufficient statistic such that T̃(x₁ⁿ) = T̃(y₁ⁿ). Then by Definition 1, [LG(x₁ⁿ; θ) − LG(y₁ⁿ; θ)] is independent of θ. Hence by hypothesis, we have T(x₁ⁿ) = T(y₁ⁿ). Therefore T is minimal, by definition. ■
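The Koopman-style criterion in this proof — T(x₁ⁿ) = T(y₁ⁿ) forces LG(x₁ⁿ; θ) − LG(y₁ⁿ; θ) to be θ-free — is easy to probe numerically in the classical case where LG is the usual log-likelihood. The snippet below is our own illustration (the Bernoulli model and the θ-grid are assumptions, not from the paper):

```python
import math

def loglik(sample, theta):
    """Usual i.i.d. Bernoulli(theta) log-likelihood; a toy stand-in
    for the generalized likelihood function L_G."""
    t, n = sum(sample), len(sample)
    return t * math.log(theta) + (n - t) * math.log(1 - theta)

def diff_is_theta_free(x, y, thetas=(0.2, 0.5, 0.8)):
    """Check whether loglik(x; th) - loglik(y; th) takes the same
    value on a grid of th, i.e., is independent of theta."""
    diffs = [loglik(x, th) - loglik(y, th) for th in thetas]
    return max(diffs) - min(diffs) < 1e-12
```

Samples with equal T = Σᵢ xᵢ give a θ-free (here, zero) difference, while samples with different T give a difference that varies with θ — consistent with T being sufficient and, in fact, minimal.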
L_J^(α)(x₁ⁿ; θ) − L_J^(α)(y₁ⁿ; θ) = (1/(α−1)) log[ h(x₁ⁿ)/h(y₁ⁿ) ] + (1/(α−1)) log[ (1 + w(θ)ᵀ f(x₁ⁿ)/h(x₁ⁿ)) / (1 + w(θ)ᵀ f(y₁ⁿ)/h(y₁ⁿ)) ].

Since the above is independent of θ, we must have

log[ (1 + w(θ)ᵀ f(x₁ⁿ)/h(x₁ⁿ)) / (1 + w(θ)ᵀ f(y₁ⁿ)/h(y₁ⁿ)) ] = C,

where C is a constant independent of θ (it may depend on x₁ⁿ and y₁ⁿ). This implies

(1 − exp(C)) + w(θ)ᵀ [ f(x₁ⁿ)/h(x₁ⁿ) − exp(C) f(y₁ⁿ)/h(y₁ⁿ) ] = 0.

Since the given family is regular, the functions 1, w₁, . . . , wₖ are linearly independent. Hence we have exp(C) = 1 and therefore

f(x₁ⁿ)/h(x₁ⁿ) − f(y₁ⁿ)/h(y₁ⁿ) = 0.

:= ∫_{A_T(y₁ⁿ)} p*θ(x₁ⁿ) ϕ*(T(x₁ⁿ)) dx₁ⁿ
= ∫_{A_T(y₁ⁿ)} p*θ(x₁ⁿ) ϕ*(T(y₁ⁿ)) dx₁ⁿ, (58)

since x₁ⁿ ∈ A_T(y₁ⁿ) implies T(x₁ⁿ) = T(y₁ⁿ).
Observe that

ϕ*(T(x₁ⁿ)) = ∫ θ̂(y₁ⁿ) p*θ(y₁ⁿ|T(x₁ⁿ)) dy₁ⁿ
= ∫_{A_T(x₁ⁿ)} θ̂(y₁ⁿ) p*θ(y₁ⁿ|T(x₁ⁿ)) dy₁ⁿ + ∫_{Aᶜ_T(x₁ⁿ)} θ̂(y₁ⁿ) p*θ(y₁ⁿ|T(x₁ⁿ)) dy₁ⁿ
Observe that (59) holds for every partition set A_T(y₁ⁿ) of Sⁿ. This implies that both the estimators θ̂ and ϕ*(T) have the same expected value with respect to p*θ. That is,

E*θ[ϕ*(T(X₁ⁿ))] = E*θ[θ̂(X₁ⁿ)] = τ*(θ). ■

D. Proof of Proposition 16
A ⊆ B is immediate from Propositions 14 and 15. Conversely, let ψ(T) ∈ B. Since E*θ[ψ(T)] = τ*(θ), we have E*θ[ψ(T)|T] ∈ A. But E*θ[ψ(T)|T] = ψ(T). This implies that ψ(T) ∈ A. Hence B ⊆ A. This completes the proof. ■

E. Proof of Lemma 17
We have

Var*θ[θ̂] = E*θ[(θ̂ − τ*(θ))²]
= E*θ[(θ̂ − ψ*(T) + ψ*(T) − τ*(θ))²]
= E*θ[(θ̂ − ψ*(T))²] + E*θ[(ψ*(T) − τ*(θ))²] + 2 E*θ[(θ̂ − ψ*(T))(ψ*(T) − τ*(θ))]
= E*θ[(θ̂ − ψ*(T))²] + Var*θ[ψ*(T)] + 2 E*θ[(θ̂ − ψ*(T))(ψ*(T) − τ*(θ))].

G. Proof of Proposition 19
Let ϕ₁(T) and ϕ₂(T) be two unbiased estimators for τ*(θ) with respect to p*θ. Let σ*² be the minimum variance that any unbiased estimator for τ*(θ) can achieve. Suppose Var*θ[ϕ₁(T)] = Var*θ[ϕ₂(T)] = σ*². We show that ϕ₁(T) = ϕ₂(T) almost surely.
Consider ξ(T) := (ϕ₁(T) + ϕ₂(T))/2. Then E*θ[ξ(T)] = τ*(θ) and

Var*θ[ξ(T)] ≤ σ*²,

since Cov*θ[ϕ₁(T), ϕ₂(T)] ≤ σ*² by the Cauchy-Schwarz inequality. As σ*² is the minimum variance that any unbiased estimator of τ*(θ) can achieve, we have Var*θ[ξ(T)] ≥ σ*². Therefore Var*θ[ξ(T)] = σ*². This implies

Cov*θ[ϕ₁(T), ϕ₂(T)] = σ*².

That means ϕ₁(T) and ϕ₂(T) are perfectly positively correlated. Thus there exist constants, say a > 0 and b, such that

ϕ₁(T) = a ϕ₂(T) + b almost surely.

Taking the expectation and the variance with respect to p*θ on both sides, respectively, we get

τ*(θ) = a τ*(θ) + b and σ*² = a² σ*².
∫ ψ*(f̄) ( f̄ / (1 + w(θ)f̄) ) p*θ(x₁ⁿ) dx₁ⁿ = (α−1) (N(θ)/w′(θ)) ∂/∂θ [k(θ)/N(θ)].

Hence

∫ ψ*(f̄) f̄ ( (1 + w(θ)f̄ − 1) / (1 + w(θ)f̄) ) p*θ(x₁ⁿ) dx₁ⁿ = (α−1) (w(θ)N(θ)/w′(θ)) ∂/∂θ [k(θ)/N(θ)].

This implies

E*θ[ψ*(f̄)f̄] − E*θ[ ψ*(f̄)f̄ / (1 + w(θ)f̄) ] = (α−1) (w(θ)N(θ)/w′(θ)) ∂/∂θ [k(θ)/N(θ)].

Using (60), we get

k(θ) = (α−1) (w(θ)N(θ)/w′(θ)) ∂/∂θ [k(θ)/N(θ)].

Since k(θ) ≠ 0, we have

(α−1) ( ∂/∂θ [k(θ)/N(θ)] ) / ( k(θ)/N(θ) ) = w′(θ)/w(θ).

That is,

(α−1) ∂/∂θ log [k(θ)/N(θ)] = ∂/∂θ log w(θ).

This reduces to

∂/∂θ log [k(θ)/N(θ)]^(α−1) = ∂/∂θ log w(θ).

Integrating both sides with respect to θ, we get

log [k(θ)/N(θ)]^(α−1) = log w(θ) + log C,

where log C is the constant of integration. This implies

k(θ)^(α−1) = C · w(θ) N(θ)^(α−1).

Observe that C > 0 and N(θ) > 0 for all θ ∈ Θ. Hence, if w(θ) > 0 for all θ ∈ Θ, we get k(θ) > 0. This, together with the case k(θ) = 0, implies k(θ) ≥ 0. That means, in any case, E*θ[ψ*(f̄)f̄] ≥ 0. Therefore

Cov*θ[ψ*(f̄), f̄] = E*θ[ψ*(f̄)f̄] − E*θ[ψ*(f̄)] E*θ[f̄] = E*θ[ψ*(f̄)f̄] − 0 = E*θ[ψ*(f̄)f̄] ≥ 0. ■

and thus

∂/∂θ p*θ(x₁ⁿ) = (1/(α−1)) p*θ(x₁ⁿ)^(2−α) [F′(θ) + w̃′(θ)f̄]. (62)

Hence, from the definition of s*,

Cov*θ[s*(X₁ⁿ, θ), p*θ(X₁ⁿ)^(α−1) s*(X₁ⁿ, θ)]
= ∫ s*(x₁ⁿ, θ)² p*θ(x₁ⁿ)^α dx₁ⁿ
= ∫ s*(x₁ⁿ, θ) [ (1/p*θ(x₁ⁿ)) ∂/∂θ p*θ(x₁ⁿ) ] p*θ(x₁ⁿ)^α dx₁ⁿ
= (1/(α−1)) ∫ p*θ(x₁ⁿ) s*(x₁ⁿ, θ) [F′(θ) + w̃′(θ)f̄] dx₁ⁿ
= (1/(α−1)) ∫ p*θ(x₁ⁿ) s*(x₁ⁿ, θ) F′(θ) dx₁ⁿ + (1/(α−1)) ∫ p*θ(x₁ⁿ) s*(x₁ⁿ, θ) w̃′(θ) f̄ dx₁ⁿ

(the first term vanishes since ∫ p*θ(x₁ⁿ) s*(x₁ⁿ, θ) dx₁ⁿ = ∂/∂θ ∫ p*θ(x₁ⁿ) dx₁ⁿ = 0)

= (1/(α−1)) w̃′(θ) ∫ p*θ(x₁ⁿ) [ (1/p*θ(x₁ⁿ)) ∂/∂θ p*θ(x₁ⁿ) ] f̄ dx₁ⁿ
= (1/(α−1)) w̃′(θ) ∂/∂θ ∫ p*θ(x₁ⁿ) f̄ dx₁ⁿ
= (1/(α−1)) w̃′(θ) ∂/∂θ τ*(θ).

Using (62), we get

∫ s*(x₁ⁿ, θ) p*θ(x₁ⁿ)^α dx₁ⁿ = ∫ ( ∂/∂θ log p*θ(x₁ⁿ) ) p*θ(x₁ⁿ)^α dx₁ⁿ = (1/(α−1)) ∫ p*θ(x₁ⁿ) [F′(θ) + w̃′(θ)f̄] dx₁ⁿ

and

∫ s*(x₁ⁿ, θ)² p*θ(x₁ⁿ)^(2α−1) dx₁ⁿ = ∫ ( ∂/∂θ log p*θ(x₁ⁿ) )² p*θ(x₁ⁿ)^(2α−1) dx₁ⁿ = (1/(α−1))² ∫ p*θ(x₁ⁿ) [F′(θ) + w̃′(θ)f̄]² dx₁ⁿ.
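Identity (62) is an exact consequence of the chain rule applied to p*θ = [F(θ) + w̃(θ)f̄]^(1/(α−1)). A quick finite-difference check of ours (the toy choices F(θ) = θ² and w̃(θ) = θ are assumptions, not the paper's functions) confirms it numerically:

```python
def pstar(theta, fbar, F, wt, alpha):
    """p*_theta as a function of the sufficient statistic:
    p* = [F(theta) + wtilde(theta) * fbar]^(1/(alpha-1))."""
    return (F(theta) + wt(theta) * fbar) ** (1.0 / (alpha - 1))

def dpstar_identity_gap(theta, fbar, alpha, h=1e-6):
    """Gap between a central finite difference of p* in theta and the
    closed form (62): (1/(alpha-1)) * p*^(2-alpha) * [F' + wtilde' * fbar]."""
    F, dF = (lambda t: t * t), (lambda t: 2 * t)
    wt, dwt = (lambda t: t), (lambda t: 1.0)
    num = (pstar(theta + h, fbar, F, wt, alpha)
           - pstar(theta - h, fbar, F, wt, alpha)) / (2 * h)
    closed = (pstar(theta, fbar, F, wt, alpha) ** (2 - alpha)
              * (dF(theta) + dwt(theta) * fbar) / (alpha - 1))
    return abs(num - closed)
```

The gap is at the level of finite-difference error for any θ, f̄ keeping F(θ) + w̃(θ)f̄ > 0, which is the positivity condition the family already imposes.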
[12] H. Cramér, Mathematical Methods of Statistics. Princeton, NJ, USA: Princeton Univ. Press, 1946.
[13] I. Csiszár, "Generalized cutoff rates and Renyi's information measures," IEEE Trans. Inf. Theory, vol. 41, no. 1, pp. 26–34, Feb. 1995.
[14] I. Csiszár and F. Matúš, "Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities," Kybernetika, vol. 48, no. 4, pp. 637–689, 2012.
[15] I. Csiszár and P. C. Shields, "Information theory and statistics: A tutorial," Found. Trends Commun. Inf. Theory, vol. 1, no. 4, pp. 417–528, 2004.
[16] G. Darmois, "Sur les lois de probabilités à estimation exhaustive," Comptes Rendus de l'Académie des Sci., vol. 200, no. 1, pp. 1265–1266, 1935.
[17] S. Eguchi, O. Komori, and S. Kato, "Projective power entropy and maximum Tsallis entropy distributions," Entropy, vol. 13, no. 10, pp. 1746–1764, Sep. 2011.
[18] C. Field and B. Smith, "Robust estimation: A weighted maximum likelihood approach," Int. Stat. Rev., vol. 62, no. 3, pp. 405–424, 1994.
[19] R. A. Fisher, "On the mathematical foundations of theoretical statistics," Philos. Trans. Roy. Soc. London A, Math. Phys. Sci., vol. 222, nos. 594–604, pp. 309–368, 1922.
[20] R. A. Fisher, "Two new properties of mathematical likelihood," Proc. Roy. Soc. London A, vol. 144, no. 8, pp. 285–307, 1972.
[21] H. Fujisawa, "Normalized estimating equation for robust parameter estimation," Electron. J. Statist., vol. 7, pp. 1587–1606, Jan. 2013.
[22] H. Fujisawa and S. Eguchi, "Robust parameter estimation with a small bias against heavy contamination," J. Multivariate Anal., vol. 99, no. 9, pp. 2053–2081, Oct. 2008.
[23] A. Gayen and M. A. Kumar, "Generalized estimating equation for the student-t distributions," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jun. 2018, pp. 571–575.
[24] A. Gayen and M. A. Kumar, "A generalized notion of sufficiency for power-law distributions," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2021, pp. 2185–2190.
[25] A. Gayen and M. A. Kumar, "Projection theorems and estimating equations for power-law models," J. Multivariate Anal., vol. 184, Jul. 2021, Art. no. 104734.
[26] P. R. Halmos and L. J. Savage, "Application of the Radon–Nikodym theorem to the theory of sufficient statistics," Ann. Math. Statist., vol. 20, no. 2, pp. 225–241, Jun. 1949.
[27] H. G. Hoang, B.-N. Vo, B.-T. Vo, and R. Mahler, "The Cauchy–Schwarz divergence for Poisson point processes," IEEE Trans. Inf. Theory, vol. 61, no. 8, pp. 4475–4485, Aug. 2015.
[28] R. Jenssen, J. C. Principe, D. Erdogmus, and T. Eltoft, "The Cauchy–Schwarz divergence and Parzen windowing: Connections to graph theory and Mercer kernels," J. Franklin Inst., vol. 343, no. 6, pp. 614–629, Sep. 2006.
[29] M. C. Jones, "A comparison of related density-based minimum divergence estimators," Biometrika, vol. 88, no. 3, pp. 865–873, Oct. 2001.
[30] K. Kampa, E. Hasanbelliu, and J. C. Principe, "Closed-form Cauchy–Schwarz pdf divergence for mixture of Gaussians," in Proc. Int. Joint Conf. Neural Netw., San Jose, CA, USA, 2011, pp. 2578–2585.
[31] B. O. Koopman, "On distributions admitting a sufficient statistic," Trans. Amer. Math. Soc., vol. 39, no. 3, pp. 399–409, 1936.
[32] M. A. Kumar and R. Sundaresan, "Minimization problems based on relative α-entropy I: Forward projection," IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 5063–5080, Sep. 2015.
[33] M. A. Kumar and R. Sundaresan, "Minimization problems based on relative α-entropy II: Reverse projection," IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 5081–5095, Sep. 2015.
[34] E. L. Lehmann and G. Casella, Theory of Point Estimation, 2nd ed. Cham, Switzerland: Springer, 1998.
[35] E. Lutwak, D. Yang, and G. Zhang, "Cramér–Rao and moment-entropy inequalities for Rényi entropy and generalized Fisher information," IEEE Trans. Inf. Theory, vol. 51, no. 2, pp. 473–478, Feb. 2005.
[36] A. Maji, A. Ghosh, and A. Basu, "The logarithmic super divergence and asymptotic inference properties," Adv. Stat. Anal., vol. 100, no. 1, pp. 99–131, Jan. 2016.
[37] J. Naudts, "Estimators, escort probabilities, and ϕ-exponential families in statistical physics," J. Inequalities Pure Appl. Math., vol. 5, no. 4, p. 102, 2004.
[38] J. Neyman, "Su un teorema concernente le cosiddette statistiche sufficienti," Giorn. Ist. Ital. Attuari, vol. 6, pp. 320–334, 1935.
[39] F. Nielsen, K. Sun, and S. Marchand-Maillet, "k-means clustering with Hölder divergences," in Proc. Int. Conf. Geometric Sci. Inf., 2017, pp. 856–863.
[40] F. Nielsen, K. Sun, and S. Marchand-Maillet, "On Hölder projective divergences," Entropy, vol. 19, no. 3, pp. 1–28, 2017.
[41] E. J. G. Pitman, "Sufficient statistics and intrinsic accuracy," Math. Proc. Cambridge Phil. Soc., vol. 32, no. 4, pp. 567–579, Dec. 1936.
[42] C. R. Rao, "Information and accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., vol. 37, no. 3, pp. 81–91, 1945.
[43] C. R. Rao, "Minimum variance and the estimation of several parameters," Math. Proc. Cambridge Phil. Soc., vol. 43, no. 2, pp. 280–283, Apr. 1947.
[44] C. R. Rao, "Sufficient statistics and minimum variance estimates," Math. Proc. Cambridge Phil. Soc., vol. 45, no. 2, pp. 213–218, Apr. 1949.
[45] Z. Shanqing, Z. Kunlong, and X. Weibin, "Application of Cauchy–Schwarz divergence in image segmentation," Comput. Eng. Appl., vol. 49, no. 12, pp. 129–131, 2013.
[46] R. Sundaresan, "A measure of discrimination and its geometric properties," in Proc. IEEE Int. Symp. Inf. Theory, May 2002, p. 264.
[47] R. Sundaresan, "Guessing under source uncertainty," IEEE Trans. Inf. Theory, vol. 53, no. 1, pp. 269–287, Jan. 2007.
[48] M. P. Windham, "Robustifying model fitting," J. Roy. Stat. Soc., B, Methodol., vol. 57, no. 3, pp. 599–609, Sep. 1995.

Atin Gayen received the bachelor's degree in mathematics from the University of Calcutta, Kolkata, in 2012, and the master's and M.Phil. degrees in mathematics from Utkal University, Bhubaneswar, in 2014 and 2015, respectively. His doctoral journey led him to the Indian Institute of Technology Indore and later to the Indian Institute of Technology Palakkad, where he received the Ph.D. degree from the Department of Mathematics in 2022. He is currently a Post-Doctoral Fellow with the Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Uttarakhand. His research interests include distance-based statistical inference, robust inference, information geometry, divergence-based learning, and quantum information theory.

M. Ashok Kumar (Member, IEEE) received the bachelor's and master's degrees in mathematics from Manonmaniam Sundaranar University, Tirunelveli, India, in 1999 and 2001, respectively, and the Ph.D. degree from the Department of Electrical Communication Engineering, Indian Institute of Science, Bengaluru, India, in May 2015. From 2001 to 2007, he was a lecturer in mathematics in various educational institutions in India. He was a Visiting Scientist with the Stat-Math Unit, Indian Statistical Institute, Bengaluru, from December 2014 to May 2015, and a Post-Doctoral Fellow with the Department of Electrical Engineering, Technion-Israel Institute of Technology, from May 2015 to December 2015. He was a Faculty Member in mathematics with IIT Indore from January 2016 to May 2018. He joined IIT Palakkad in June 2018, where he is currently an Associate Professor with the Department of Mathematics. His research interests include information theory, mathematical statistics, and probability.