Stephen G. Walker
arXiv:2106.06597v1 [math.ST] 11 Jun 2021
Department of Mathematics
University of Texas at Austin, USA
e-mail: s.g.walker@math.utexas.edu
Abstract
One important aspect of statistical inference is quantifying the uncertainty in statistics; for example, the sampling distribution of the maximum likelihood estimator arising from a model and data. If approximations are required, it is the asymptotic normal distribution which is often, if not always, used. In this paper we show that if the log–likelihood is strictly concave in the parameter for all data sets, then an improved asymptotic distribution is available. The density estimate has similar properties to a second order Edgeworth expansion, which uses up to three derivatives of the log–likelihood (see [8] and [5]); whereas we obtain this using only one derivative. It is the concavity of the log–likelihood which facilitates this. It is also clear how the asymptotic normal distribution is recovered from this new asymptotic distribution.
Key words and phrases: Asymptotic distribution; Central limit theorem; Exponential family; Weighted likelihood bootstrap.

1 Introduction
Consider the family of density functions $f(x;\theta)$, with respect to some dominating measure, which will be either the counting measure or the Lebesgue measure. Here $x \in X$ and $\theta \in \Theta \subset \mathbb{R}$. We write
$$l(x;\theta) = -\log f(x;\theta),$$
so that $l'(x;\theta)$ is the negative of the score function, and assume that $l'(x;\theta) = \partial l(x;\theta)/\partial\theta$ exists for all $\theta$ and $x$, and that $l(x;\theta)$ is strictly convex in $\theta$ for all $x$; i.e. for all $\theta \neq \theta_0$ it holds that
$$l(x;\theta) > l(x;\theta_0) + (\theta - \theta_0)\, l'(x;\theta_0).$$
An example of such is the exponential family; see, for example, [6]. So, for some functions $c(x)$ and $t(x)$, and with $b(\theta)$ the cumulant function,
$$f(x;\theta) = c(x)\, \exp\{\theta\, t(x) - b(\theta)\}.$$
(c) For all $x \in A$ the density $f(x;\theta)$ is three times differentiable with respect to $\theta$, and the third derivative is continuous in $\theta$.

(f) For $\theta^* \in \Theta$ there exist $c > 0$ and a function $M(x)$ such that $|l'''(x;\theta)| \leq M(x)$ for all $x \in A$ and $|\theta - \theta^*| < c$, and $E_{\theta^*}[M(x)] < \infty$.
$$(1.1)\qquad \widehat F^{(N)}_{\hat\theta}(z) = \Phi\left(\sqrt{n}\,(z - \hat\theta)\sqrt{I(\hat\theta)}\right)$$
as the estimator of the distribution function for the mle. Here the superscript $N$ refers to the normal approximation.
In Section 2 we provide the new asymptotic distributions for the mle under the concavity condition. Section 3 then presents three illustrations, and Section 4 uses the same technique to find the exact distribution for the weighted likelihood bootstrap sampler. Finally, Section 5 concludes with some ideas for future work and considers the multivariate case.
Define
$$T_n(z) = \frac{1}{n}\sum_{i=1}^n l'(x_i;z), \qquad D(z,\theta^*) = E_{\theta^*}[l'(x;z)], \qquad V(z,\theta^*) = E_{\theta^*}[l'(x;z)^2],$$
and note that $T_n(z) = -S_n(z)/n$, where $S_n(z)$ is the usual score function. The asymptotic normality of $T_n(z)$, for each $z \in \Theta$, implies
$$A_n(z) = \sqrt{n}\,\frac{T_n(z) - D(z,\theta^*)}{\sqrt{V(z,\theta^*) - D^2(z,\theta^*)}} \;\to_d\; \mathrm{N}(0,1).$$
Theorem 2.1. Under Assumptions (a), (b), (d) and (e), combined with $l(x;\theta)$ being strictly convex in $\theta$ for all $x$, it holds that
$$(2.1)\qquad \widehat F_{\hat\theta}(z) = \Phi\left(\sqrt{n}\,\frac{D(z,\theta^*)}{\sqrt{V(z,\theta^*) - D^2(z,\theta^*)}}\right).$$
Proof. Since $l(x;\theta)$ is strictly convex in $\theta$ for all data sets, $l'(x;\cdot)$ is increasing, and so $\hat\theta \leq z$ precisely when the derivative of the objective function at $z$ is non-negative; that is,
$$P\big(\hat\theta \leq z\big) = P\big(T_n(z) \geq 0\big).$$
Hence,
$$P\big(\hat\theta \leq z\big) = P\left(A_n(z) \geq -\sqrt{n}\,\frac{D(z,\theta^*)}{\sqrt{V(z,\theta^*) - D^2(z,\theta^*)}}\right).$$
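As a numerical check of this identity, the following minimal sketch (assuming the exponential model $f(x;\theta) = \theta e^{-x\theta}$ of Section 3.1, with illustrative choices of $n$, $\theta^*$, $z$ and seed) confirms that the events $\{\hat\theta \leq z\}$ and $\{T_n(z) \geq 0\}$ always coincide:

```python
import numpy as np

# Check that {theta_hat <= z} = {T_n(z) >= 0} for the exponential model
# f(x; theta) = theta * exp(-x * theta), where l(x; theta) = x*theta - log(theta)
# and l'(x; theta) = x - 1/theta.
rng = np.random.default_rng(0)
n, theta_star, z = 10, 2.0, 2.5          # assumed values for illustration

reps, agree = 10_000, 0
for _ in range(reps):
    x = rng.exponential(1.0 / theta_star, size=n)
    mle = 1.0 / x.mean()                 # maximum likelihood estimator
    Tn = np.mean(x) - 1.0 / z            # T_n(z) = n^{-1} sum_i l'(x_i; z)
    agree += (mle <= z) == (Tn >= 0)

print(agree / reps)                      # prints 1.0: the events coincide
```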
We can see clearly how to get (1.1) from (2.1); it requires the approximations
$$D(z,\theta^*) \approx D(\theta^*,\theta^*) + (z-\theta^*)\,\frac{\partial D}{\partial z}(\theta^*,\theta^*) = (z-\theta^*)\, I(\theta^*),$$
and $V(z,\theta^*) - D^2(z,\theta^*) \approx V(\theta^*,\theta^*)$, noting that $D(\theta^*,\theta^*) = 0$ and $\partial D(\theta^*,\theta^*)/\partial z = V(\theta^*,\theta^*) = I(\theta^*)$. This involves some rather loose approximations and suggests the normal approximation should not necessarily work well for $z$ away from $\theta^*$. Indeed we see this phenomenon in an illustration which follows.
Before this we see how (2.1) is comparable to an Edgeworth expansion. The following standard expansion is to be found in Chapter 16 of [3]:
$$P\left(\sqrt{n}\,(\hat\theta - \theta^*)\sqrt{I(\theta^*)} \leq x\right) = \Phi(x) + \phi(x)\,(a + b x^2)/\sqrt{n},$$
where $a$ and $b$ use up to the third derivatives of $l(x;\theta)$, and are based on expectations with respect to $f(x;\theta^*)$.
where
$$c = \frac{\partial^2 D}{\partial z^2}(\theta^*,\theta^*)\, V(\theta^*,\theta^*)^{-3/2} - \frac{\partial V}{\partial z}(\theta^*,\theta^*)\, V(\theta^*,\theta^*)^{-1}.$$
Proof. The proof of this uses
$$D\left(\theta^* + \frac{d}{\sqrt{n}},\, \theta^*\right) = \frac{d}{\sqrt{n}}\,\frac{\partial D}{\partial z}(\theta^*,\theta^*) + \frac{1}{2}\,\frac{d^2}{n}\,\frac{\partial^2 D}{\partial z^2}(\theta^*,\theta^*) + O(n^{-3/2})$$
and
$$V\left(\theta^* + \frac{d}{\sqrt{n}},\, \theta^*\right) = V(\theta^*,\theta^*) + \frac{d}{\sqrt{n}}\,\frac{\partial V}{\partial z}(\theta^*,\theta^*) + O(n^{-1}).$$
3 Illustrations
3.1 Exponential family
Consider the exponential family with functions $t(x)$ and $b(\theta)$, so that
$$D(\theta,\theta^*) = b'(\theta) - b'(\theta^*) \quad \text{and} \quad V(\theta,\theta^*) = b''(\theta^*) + \big(b'(\theta) - b'(\theta^*)\big)^2.$$
Therefore,
$$\widehat F_{\hat\theta}(z) = \Phi\left(\sqrt{n}\,\frac{b'(z) - b'(\theta^*)}{\sqrt{b''(\theta^*)}}\right).$$
On the other hand, the asymptotic normal distribution is given by
$$\widehat F^{(N)}_{\hat\theta}(z) \approx \Phi\left(\sqrt{n}\,(z - \theta^*)\sqrt{b''(\theta^*)}\right).$$
In particular, suppose $f(x;\theta) = \theta\, e^{-x\theta}$, with $x > 0$ and $\theta > 0$. Then $b(\theta) = -\log\theta$, so $b'(\theta) = -1/\theta$ and $b''(\theta) = 1/\theta^2$.
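The three distribution functions compared in Figure 1 can be computed directly for this model; the following is a minimal sketch (the values $n = 10$ and $\theta^* = 2$ are illustrative assumptions), using the facts that $\hat\theta = 1/\bar{x}$ and that $\sum_i x_i$ has a gamma distribution with shape $n$ and scale $1/\theta^*$:

```python
import numpy as np
from scipy.stats import norm, gamma

n, ts = 10, 2.0                          # assumed sample size and theta*
z = np.linspace(0.5, 5.0, 200)

# New asymptotic distribution (2.1): D = b'(z) - b'(ts), V - D^2 = b''(ts)
F_new = norm.cdf(np.sqrt(n) * (-1.0/z + 1.0/ts) / np.sqrt(1.0/ts**2))

# Usual normal approximation: Phi(sqrt(n) * (z - ts) * sqrt(b''(ts)))
F_normal = norm.cdf(np.sqrt(n) * (z - ts) * np.sqrt(1.0/ts**2))

# Exact: theta_hat = 1/xbar and sum_i x_i ~ Gamma(n, scale=1/ts),
# so P(theta_hat <= z) = P(sum_i x_i >= n/z)
F_exact = gamma.sf(n / z, a=n, scale=1.0/ts)
```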
Figure 1: (i) Dotted line: $F^*_{\hat\theta}(z)$; (ii) solid line: $\widehat F_{\hat\theta}(z)$; (iii) dashed line: $\widehat F^{(N)}_{\hat\theta}(z)$.
For this model the normal approximation becomes
$$\widehat F^{(N)}_{\hat\theta}(z) \approx \Phi\left(\sqrt{n}\,\frac{z - \theta^*}{\theta^*}\right),$$
which is the usual asymptotic normal distribution. The true distribution for $\hat\theta$ is available in this case, since $\hat\theta = 1/\bar{x}$ and $\sum_{i=1}^n x_i \sim \mathrm{Gamma}(n,\theta^*)$.

Consider now the model with density
$$f(x;\theta) = \frac{\theta\, x^{\theta-1}}{(1 + x^\theta)^2}, \qquad x > 0,$$
for which
$$l'(x;\theta) = \frac{2\, x^\theta \log x}{1 + x^\theta} - \log x - \frac{1}{\theta}.$$
Figure 2: (i) Solid line: $F^*_{\hat\theta}(z)$; (ii) dashed line: $\widehat F_{\hat\theta}(z)$.
The aim here is to compare the true distribution of $\hat\theta$, i.e. $F^*_{\hat\theta}(z)$, based on a sample of size $n = 10$, with the estimate given by (2.1). We obtain $F^*_{\hat\theta}(z)$ by simulating samples of size 10 with a true $\theta^* = 2$. Repeating this multiple times and maximizing the likelihood each time yields a sample of mle's from which we construct the empirical distribution. On the other hand, we compute $\widehat F_{\hat\theta}(z)$ by estimating $D(z,\theta^*)$ and $V(z,\theta^*)$ arbitrarily accurately using Monte Carlo methods. The two distributions are plotted in Fig. 2; the solid line is the true distribution while the dashed line is (2.1). As can be seen, they are remarkably close for a sample of size $n = 10$.
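A minimal sketch of this Monte Carlo computation follows (inverse-CDF sampling is used, since the model distribution function is $x^\theta/(1+x^\theta)$; the Monte Carlo size and seed are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, ts = 10, 2.0                          # as in the text: n = 10, theta* = 2

def lprime(x, th):
    # l'(x; theta) for this model
    return 2.0 * x**th * np.log(x) / (1.0 + x**th) - np.log(x) - 1.0/th

# Draws from f(x; theta*): the distribution function is x^theta/(1 + x^theta),
# so inverse-CDF sampling gives x = (u/(1-u))^(1/theta)
u = rng.uniform(size=1_000_000)
x = (u / (1.0 - u))**(1.0 / ts)

def F_hat(z):
    # Monte Carlo estimates of D(z, theta*) and V(z, theta*), plugged into (2.1)
    g = lprime(x, z)
    D, V = g.mean(), np.mean(g**2)
    return norm.cdf(np.sqrt(n) * D / np.sqrt(V - D**2))
```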
$$T = \sqrt{n}\,\frac{D(\hat\theta, 0)}{\sqrt{V(\hat\theta, 0) - D^2(\hat\theta, 0)}}.$$
Figure 3: Density histograms of $T$ (left panel) and $T_N$ (right panel).
4 Weighted likelihood bootstrap

The weighted likelihood bootstrap generates a posterior sample $\tilde\theta$ by minimizing
$$l_w(\theta) = \sum_{i=1}^n w_i\, l(x_i;\theta),$$
where $l(x;\theta) = -\log f(x;\theta)$, and the $w = (w_i)_{i=1:n}$ are from a Dirichlet distribution with all parameters set to 1; i.e.
$$p(w) \propto \mathbf{1}\left(w_i \geq 0,\; \sum_{i=1}^n w_i = 1\right).$$
Equivalently, we can write $w_i = v_i / \sum_{j=1}^n v_j$, where the $(v_i)$ are independent and identically distributed as standard exponential, and minimize
$$l_v(\theta) = \sum_{i=1}^n v_i\, l(x_i;\theta).$$
Note that the randomness is now generated by the weights rather than the data.
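For illustration, a minimal sketch of this sampler for the exponential model $f(x;\theta) = \theta e^{-x\theta}$, where the minimizer of $l_v(\theta) = \sum_i v_i (x_i\theta - \log\theta)$ is available in closed form as $\sum_i v_i / \sum_i v_i x_i$ (the data and seed below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(0.5, size=10)        # assumed data from f(x; theta*), theta* = 2

# Weighted likelihood bootstrap: l_v(theta) = sum_i v_i (x_i*theta - log theta)
# is minimized at theta = sum_i v_i / sum_i v_i x_i
samples = np.empty(5_000)
for b in range(samples.size):
    v = rng.exponential(1.0, size=x.size)    # iid standard exponential weights
    samples[b] = v.sum() / (v @ x)

# `samples` approximates the weighted likelihood bootstrap posterior for theta
```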
The aim is to find $F(z) = P(\tilde\theta \leq z)$. This result relies partly on knowing the distribution of sums of independent exponentials; e.g.
$$S = \sum_{i=1}^n v_i\, \psi_i$$
for fixed constants $(\psi_i)$.
As with the mle, the convexity of $l(x;\theta)$ implies that $\tilde\theta \leq z$ if and only if the derivative of $l_v$ at $z$ is non-negative. Hence, we are interested in the distribution of
$$S(z) = \sum_{i=1}^n w_i\, \gamma_i(z),$$
and in particular
$$F(z) = P(S(z) \geq 0).$$
Since we are only interested in the probability of $S(z)$ being positive, we can represent the $(w_i)$ without their normalizing constant $\sum_{i=1:n} v_i$, and so we can take them as independent and identically distributed standard exponential random variables, $(v_i)$.
Now let us arrange $S(z) = S_1(z) - S_2(z)$, where
$$S_1(z) = \sum_{\gamma_i(z) > 0} v_i\, \gamma_i(z) \quad \text{and} \quad S_2(z) = \sum_{\gamma_i(z) < 0} v_i\, |\gamma_i(z)|,$$
and
$$\gamma_i(z) = l'(x_i; z),$$
where $l'$ denotes differentiation with respect to $\theta$. If we now arrange the labels so that $\gamma_i(z) > 0$ for $i = 1,\ldots,m$, and $\gamma_i(z) < 0$ for $i = m+1,\ldots,n$, for some $m \in \{0,\ldots,n\}$, then define $\lambda_i(z) = 1/|\gamma_i(z)|$. We can assume all the $\lambda_i$ are mutually distinct, since the $(x_i)$ are continuous random variables.
The density function for $S_1(z)$ is
$$f_1(t;\, \gamma_1(z),\ldots,\gamma_m(z)) = \left[\prod_{i=1}^m \lambda_i(z)\right] \sum_{i=1}^m q_{1i}(z)\, e^{-\lambda_i(z)\, t},$$
where, for $i = 1,\ldots,m$,
$$q_{1i}(z) = \prod_{k=1:m,\; k \neq i} \frac{1}{\lambda_k(z) - \lambda_i(z)}.$$
Similarly, the density function for $S_2(z)$ is of the same form with the rates $\lambda_{m+1}(z),\ldots,\lambda_n(z)$, where, for $i = m+1,\ldots,n$,
$$q_{2i}(z) = \prod_{k=m+1:n,\; k \neq i} \frac{1}{\lambda_k(z) - \lambda_i(z)}.$$
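Combining these, $F(z) = P(S_1(z) \geq S_2(z))$ follows by integrating the survival function of $S_1(z)$, which is also a mixture of exponential terms, against the density of $S_2(z)$. A minimal sketch of this computation (the function name is ours, and no attempt is made to control the numerical instability of the $q$ products for larger $n$):

```python
import numpy as np

def wlb_cdf_exact(gamma):
    # Exact F(z) = P(sum_i v_i * gamma_i >= 0) for iid standard exponential v_i,
    # with gamma_i = l'(x_i; z) and the |gamma_i| assumed mutually distinct.
    lam = 1.0 / gamma[gamma > 0]             # rates lambda_i of the terms in S1(z)
    mu = 1.0 / (-gamma[gamma < 0])           # rates of the terms in S2(z)
    if lam.size == 0:
        return 0.0                           # all gamma_i < 0: S(z) < 0 surely
    if mu.size == 0:
        return 1.0                           # all gamma_i > 0: S(z) > 0 surely
    q1 = [np.prod(1.0 / (np.delete(lam, i) - lam[i])) for i in range(lam.size)]
    q2 = [np.prod(1.0 / (np.delete(mu, j) - mu[j])) for j in range(mu.size)]
    # F(z) = int_0^inf P(S1 >= t) f2(t) dt, with both factors exponential mixtures
    c = np.prod(lam) * np.prod(mu)
    return c * sum(q1[i] * q2[j] / (lam[i] * (lam[i] + mu[j]))
                   for i in range(lam.size) for j in range(mu.size))
```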
Figure 4: Exact (bold line) and estimated (dashed red line) weighted likelihood bootstrap posterior distribution for the beta model.
for small $|\hat\theta - \theta|$, where $\hat\theta$ is the maximum likelihood estimator. Given that
$$\frac{1}{n}\sum_{i=1}^n l''(x_i;\theta) \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^n l'(x_i;\theta)^2$$
both estimate $I(\theta)$, we obtain
$$(4.3)\qquad F(z) \approx \Phi(T_n(z)),$$
where
$$T_n(z) = \sqrt{n}\,(z - \hat\theta)\sqrt{I(z)}.$$
It is possible to see (4.3) as a Bayesian probability matching type procedure. See, for example, [10]. In particular, in Section 3 of [12], the authors consider
$$T_n(\theta) = \sqrt{n}\,(\theta - \hat\theta)\sqrt{I(\theta)}.$$
5 Discussion
If $l(x;\theta)$ is strictly convex in $\theta$ for all $x$ then we can obtain an accurate estimate of the distribution of the maximum likelihood estimator using only $l'(x;\theta)$. For the multivariate case, i.e. $\Theta \subset \mathbb{R}^d$, it is not easy in general to find
$$(5.1)\qquad F_{\hat\theta}(z_1,\ldots,z_d) = P\left(\sum_{i=1}^n \frac{\partial}{\partial\theta_1} l(x_i; z_1) \geq 0,\; \ldots,\; \sum_{i=1}^n \frac{\partial}{\partial\theta_d} l(x_i; z_d) \geq 0\right).$$
Using a multivariate normal approximation, we would have $F_{\hat\theta}(z) \approx P(Y(z) \geq 0)$ with $Y(z) \sim \mathrm{MVN}_d(\mu(z), \Sigma(z))$. Here
$$\mu_j(z) = \int \frac{\partial}{\partial\theta_j} l(x; z_j)\, f(x;\theta^*)\, dx, \qquad \Sigma_{jk} = \int \frac{\partial}{\partial\theta_j} l(x; z_j)\, \frac{\partial}{\partial\theta_k} l(x; z_k)\, f(x;\theta^*)\, dx.$$
Approximating $P(Y(z) \geq 0)$ in multiple dimensions has been considered, for example, by [2], who could only find adequate approximations up to 3 dimensions.
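For small $d$, one option is direct numerical evaluation of this orthant probability; a minimal sketch using scipy (the function name is ours, with $\mu(z)$ and $\Sigma(z)$ assumed to be supplied):

```python
import numpy as np
from scipy.stats import multivariate_normal

def orthant_prob(mu, Sigma):
    # P(Y >= 0) for Y ~ MVN_d(mu, Sigma), computed as P(-Y <= 0);
    # scipy evaluates the MVN CDF numerically, practical only for small d
    mu = np.asarray(mu)
    return multivariate_normal(mean=-mu, cov=Sigma).cdf(np.zeros(mu.size))
```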
An approximate strategy for sampling from (5.1) would involve the parametric bootstrap; see, for example, [4]. To sample $z$ from (5.1) approximately, take a sample $\tilde x = (\tilde x_1,\ldots,\tilde x_n)$ from $f(\cdot;\hat\theta)$ and take $z$ as the mle with data $\tilde x$; i.e. take
$$z = \arg\min_\theta \sum_{1 \leq i \leq n} l(\tilde x_i;\theta),$$
where $z = (z_1,\ldots,z_d) = (z_j, z_{-j})$. Hence, we can easily find the conditional density equivalent of (4.2); i.e. $F(z_j \mid z_{-j})$ for each $j \in \{1,\ldots,d\}$.
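A minimal sketch of this parametric bootstrap (the function and argument names are ours; the negative log-likelihood and model sampler are assumed to be supplied by the user):

```python
import numpy as np
from scipy.optimize import minimize

def parametric_bootstrap(n, theta_hat, neg_loglik, sample_model, B=1000, rng=None):
    # Approximate draws z from (5.1): simulate data from f(.; theta_hat)
    # and re-minimize sum_i l(x_i; theta) each time.
    # neg_loglik(theta, data) and sample_model(theta, n, rng) are assumed,
    # user-supplied functions for the model at hand.
    rng = rng or np.random.default_rng()
    draws = []
    for _ in range(B):
        x_tilde = sample_model(theta_hat, n, rng)
        res = minimize(neg_loglik, x0=np.atleast_1d(theta_hat), args=(x_tilde,))
        draws.append(res.x)
    return np.array(draws)               # rows are bootstrap mle's z = (z_1, ..., z_d)
```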
References
[1] A. Azzalini, A class of distributions which includes the normal ones. Scandinavian Journal of Statistics (1985) 12, 171–178.

[4] B. Efron, Bayesian inference and the parametric bootstrap. Annals of Applied Statistics (2012) 6, 1971–1997.

[9] E. Belitser, On coverage and local radial rates of credible sets. Annals of Statistics (2017) 45, 1124–1151.