
Proposed noninformative and informative priors

for exponential-logarithmic distribution


Fernando A. MOALA,∗
Lı́via Matos GARCIA†
October 26, 2016

Abstract
The exponential-logarithmic is a new lifetime distribution with decreasing fail-
ure rate and interesting applications in the biological and engineering sciences.
Assuming different noninformative prior distributions for the parameters of the
distribution, we introduce a Bayesian analysis using MCMC (Markov Chain
Monte Carlo) methods. In this paper distinct noninformative priors, such as
Jeffreys, reference, MDIP (maximal data information prior) and independent
gammas, are derived and compared for the Bayesian inference of the
two-parameter exponential-logarithmic distribution.
then consider noninformative and weakly informative priors in this family.
We use an example to illustrate serious problems with the inverse-gamma family
of “noninformative” prior distributions. We suggest instead to use a uniform
prior on the hierarchical standard deviation, using the half-t family when the
number of groups is small and in other settings where a weakly informative prior
is desired. We also illustrate the use of the half-t family for hierarchical modeling
of multiple variance parameters such as arise in the analysis of variance. Various
noninformative prior distributions have been suggested for scale parameters in
hierarchical models.
A maximum cross-entropy method for determining the reference prior distri-
bution, which represents partial information or complete ignorance, is proposed
and studied in this paper, several properties and interpretations are gi
Much criticism of Bayesian analysis concerns the fact that the result of the
analysis depends on the choice of prior, and that the assignment of this prior
seems rather subjective. Is there some objective way of assigning a prior in the
case that we know little about its possible distribution?
The fundamental limitation of Bayesian statistics lies in the selection of a
suitable expression for the prior probability; in our example the prior probability
of various maps of the radio sky before are given any dat
∗ Department of Statistics, State University of Sao Paulo, Brazil, Email:
femoala@fct.com.br
† Department of Statistics, State University of Sao Paulo, Brazil, Email: ???

The problem of determining appropriate prior distributions corresponding
to such ignorance has led to considerable controversy since the time of Laplace
and there is, as yet, no entirely satisfactory way of assigning priors, even in
the simplest situations. Therefore, at this stage, we suggest that it is most
useful to understand the properties of the prior distributions that arise from
the various entropy expressions and to collect arguments that lead to different
priors, without attempting to decide which, if any, are correct
A simulation study is also carried out to compare the performance of the
prior distributions presented in this paper.
Bayesian phylogenetic methods require the selection of prior probability dis-
tributions for all parameters of the model of evolution. These distributions
allow one to incorporate prior information into a Bayesian analysis, but even
in the absence of meaningful prior information, a prior distribution must be
chosen. In such situations, researchers typically seek to choose a prior that
will have little effect on the posterior estimates produced by an analysis, al-
lowing the data to dominate. Sometimes a prior that is uniform (assigning
equal prior probability density to all points within some range) is chosen for
this purpose. In reality, the appropriate prior depends on the parameterization
chosen for the model of evolution, a choice that is largely arbitrary. There is
an extensive Bayesian literature on appropriate prior choice, and it has long
been appreciated that there are parameterizations for which uniform priors can
have a strong influence on posterior estimates. We here discuss the relationship
between model parameterization and prior specification, using the general time-
reversible model of nucleotide evolution as an example. We present Bayesian
analyses of 10 simulated data sets obtained using a variety of prior distributions
and parameterizations of the general time-reversible model. Uniform priors can
produce biased parameter estimates under realistic conditions, and a variety of
alternative priors avoid this bias.
For a Bayesian analysis of the EL(p, β) distribution, we assume different
prior distributions for p and β. In a situation in which we have no prior
information, we want a prior with minimal influence on the inference. The
Bayesian literature offers several ways of formulating noninformative priors,
for example, the Jeffreys prior (1967), the reference prior (Bernardo (1979),
Berger and Bernardo (1991)), the MDIP prior (Zellner, 1977, 1984, 1990) and others.
In this work, we first compute the Bayes estimates and construct the credible
intervals of p and β with respect to different priors, and compare these with
the classical maximum likelihood estimators (MLEs) and with the confidence
intervals based on the asymptotic distributions of the MLEs.
The problem of assigning probability distributions which reflect the prior in-
formation available about experiments is one of the major stumbling blocks in
the use of Bayesian methods of data analysis. In this paper the method of max-
imum (relative) entropy (ME) is used to translate the information contained in
the known form of the likelihood into a prior distribution for Bayesian inference.

Keywords: Exponential-logarithmic distribution, noninformative prior, reference, Jeffreys, MDIP, orthogonal, MCMC.

1 Introduction

The exponential-logarithmic is a new lifetime distribution with decreasing failure rate introduced by Tahmasbi and Rezaei (2008).
This distribution can be used to study the lifetimes of organisms, devices, materials, etc., in the biological and engineering sciences. [MORE INFORMATION HERE]
The exponential-logarithmic distribution could be a good alternative among lifetime distributions with decreasing failure rate, such as the gamma, because its survival and hazard functions have simple forms.
Tahmasbi and Rezaei (2008) also point out that the initial and long-term
hazards for this distribution are both finite in contrast to those of Weibull
distribution.
Let T represent the lifetime of a component with an exponential-logarithmic distribution, denoted by EL(p, β), whose density is given by
\[ f(t \mid p, \beta) = \frac{1}{-\ln p}\,\frac{\beta(1-p)e^{-\beta t}}{1-(1-p)e^{-\beta t}}, \quad \text{for all } t > 0, \]
depending on the parameters p ∈ (0, 1) and β > 0.
Observe that if p → 1, the EL(p, β) reduces to the exponential distribution
with parameter β. The EL(p, β) probability density function is displayed in
Figure 1 for selected parameter values.
[Figure 1: (a) pdf f(x) and (b) hazard function h(x). Legend: p=0.3, β=1; p=0.5, β=2; p=0.9, β=3.]

The cumulative distribution function is given by
\[ F(t; p, \beta) = 1 - \frac{\ln\left[1-(1-p)e^{-\beta t}\right]}{\ln p} \tag{1} \]
and hence the survival and hazard functions associated with (1) are given, respectively, by
\[ S(t; p, \beta) = P\{T > t\} = \frac{\ln\left[1-(1-p)e^{-\beta t}\right]}{\ln p} \tag{2} \]
and
\[ h(t; p, \beta) = \frac{-\beta(1-p)e^{-\beta t}}{\left[1-(1-p)e^{-\beta t}\right]\ln\left[1-(1-p)e^{-\beta t}\right]}. \tag{3} \]
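The density and the functions (1) to (3) are easy to evaluate numerically. The following is a minimal Python sketch; the function names are ours and are introduced only for illustration.

```python
import math

def el_pdf(t, p, beta):
    """Density of EL(p, beta) at t > 0."""
    e = math.exp(-beta * t)
    return beta * (1.0 - p) * e / (-math.log(p) * (1.0 - (1.0 - p) * e))

def el_cdf(t, p, beta):
    """Cumulative distribution function (1)."""
    return 1.0 - math.log(1.0 - (1.0 - p) * math.exp(-beta * t)) / math.log(p)

def el_survival(t, p, beta):
    """Survival function (2)."""
    return math.log(1.0 - (1.0 - p) * math.exp(-beta * t)) / math.log(p)

def el_hazard(t, p, beta):
    """Hazard function (3), equal to el_pdf / el_survival."""
    e = math.exp(-beta * t)
    w = 1.0 - (1.0 - p) * e
    return -beta * (1.0 - p) * e / (w * math.log(w))
```

For instance, el_hazard(0.5, 0.5, 2.0) is larger than el_hazard(1.5, 0.5, 2.0), in line with the decreasing failure rate of the distribution.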

In Figure 2, we have the plots of the survival function (2) and hazard function
(3) assuming different values of p and β.
[Figure 2: (a) survival function S(x), legend: p=0.3, β=1; p=0.5, β=2; p=0.8, β=3. (b) hazard function h(x), legend: p=0.2, β=1; p=0.5, β=1; p=0.8, β=1.]

A very simple method for generating values from the EL(p, β) distribution is based on inverse transform sampling. Given a random variable U drawn from the uniform distribution on the interval (0, 1), the variable
\[ X = \frac{1}{\beta}\ln\left(\frac{1-p}{1-p^{U}}\right) \tag{4} \]
has the EL(p, β) distribution with parameters p and β.
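The inverse transform in (4) can be sketched in a few lines of Python. The function name el_sample and the use of expm1 for numerical stability are our choices, not part of the original text.

```python
import math
import random

def el_sample(n, p, beta, rng=random):
    """Draw n values from EL(p, beta) by the inverse transform (4)."""
    log_p = math.log(p)
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()                 # uniform on (0, 1]
        one_minus_pu = -math.expm1(u * log_p)  # 1 - p**u, stable for small u
        out.append(math.log((1.0 - p) / one_minus_pu) / beta)
    return out
```

A call such as el_sample(1000, 0.5, 2.0, random.Random(1)) returns a reproducible sample whose empirical survival curve can be compared with (2).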
Several other properties of the EL(p, β) distribution, such as the moment generating function, moments, mode, mean and variance, are derived in Tahmasbi and Rezaei (2008).
The estimation of the parameters by the EM algorithm and their asymptotic variances and covariances are also discussed in Tahmasbi and Rezaei (2008). [COPIED TEXT, TO BE REWRITTEN]
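Now, suppose we have a complete random sample T1, . . ., Tn from EL(p, β). The likelihood function in the parameters p and β, based on t = (t1, . . ., tn), is then:
\[ L(p, \beta \mid t) \propto \left(-\frac{1}{\ln p}\right)^{n} \frac{\beta^{n}(1-p)^{n} e^{-\beta\sum_{i=1}^{n} t_i}}{\prod_{i=1}^{n}\left[1-(1-p)e^{-\beta t_i}\right]}. \tag{5} \]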
The log-likelihood function based on the observed sample is given by
\[ \log L(p, \beta \mid t) = -n\log(-\log p) + n\log\beta + n\log(1-p) - \beta\sum_{i=1}^{n} t_i - \sum_{i=1}^{n}\log\left[1-(1-p)e^{-\beta t_i}\right]. \tag{6} \]
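The maximum likelihood estimators (MLE) for the parameters p and β of EL(p, β) are obtained by setting to zero the first derivatives of log L(p, β | t), given by
\[ \frac{\partial}{\partial p}\ln L = -\frac{n}{p\ln p} - \frac{n}{1-p} - \sum_{i=1}^{n}\frac{e^{-\beta t_i}}{1-(1-p)e^{-\beta t_i}} \tag{7} \]
and
\[ \frac{\partial}{\partial \beta}\ln L = \frac{n}{\beta} - \sum_{i=1}^{n} t_i - \sum_{i=1}^{n}\frac{(1-p)\,t_i\,e^{-\beta t_i}}{1-(1-p)e^{-\beta t_i}}. \tag{8} \]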

Because it is not easy to solve the likelihood equations ∂ log L/∂p = 0 and ∂ log L/∂β = 0 directly, we need to use an iterative approach to find the MLE for p and β.
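Any iterative scheme needs the log-likelihood (6) and the score (7) and (8) as building blocks. The sketch below codes both in Python; the function names are ours, and the choice of optimizer to which they are handed is left open.

```python
import math

def loglik(p, beta, t):
    """Log-likelihood (6) of EL(p, beta) for the observed sample t."""
    n = len(t)
    s = sum(math.log(1.0 - (1.0 - p) * math.exp(-beta * ti)) for ti in t)
    return (-n * math.log(-math.log(p)) + n * math.log(beta)
            + n * math.log(1.0 - p) - beta * sum(t) - s)

def score(p, beta, t):
    """Partial derivatives (7) and (8) of the log-likelihood."""
    n = len(t)
    d_p = -n / (p * math.log(p)) - n / (1.0 - p)
    d_beta = n / beta - sum(t)
    for ti in t:
        e = math.exp(-beta * ti)
        w = 1.0 - (1.0 - p) * e
        d_p -= e / w
        d_beta -= (1.0 - p) * ti * e / w
    return d_p, d_beta
```

A central finite difference of loglik is a quick way to confirm that score is coded consistently with (6) before it is passed to a Newton-type or quasi-Newton routine.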
To obtain estimates via the MLE approach, Tahmasbi and Rezaei (2008) derive the conditions for the existence and uniqueness of the MLE of p and of β when the other parameter is given or known.
Tahmasbi and Rezaei (2008) also show in detail that for the EL(p, β) distribution the Fisher information matrix is given by

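\[ I(p, \beta) = n \begin{bmatrix} -\dfrac{\ln p+1}{(p\ln p)^{2}} + \dfrac{1}{(1-p)^{2}} + \dfrac{(1-4p)/p^{2} - 2\ln p + 3}{2(1-p)^{2}\ln p} & \dfrac{1-p+p\ln p}{2\beta p(1-p)\ln p} \\[2ex] \dfrac{1-p+p\ln p}{2\beta p(1-p)\ln p} & -\dfrac{\operatorname{dilog}(p)}{\beta^{2}\ln p} \end{bmatrix}, \tag{9} \]
where dilog(·) is the dilogarithm function defined as \( \operatorname{dilog}(x) = \int_{1}^{x} \frac{\ln t}{1-t}\,dt \).
Thus, the maximum likelihood estimators p̂ and β̂ have an asymptotic joint normal distribution given by
\[ (\hat p, \hat\beta) \sim N_2\left[(p, \beta),\, I^{-1}(p, \beta)\right] \quad \text{for } n \to \infty. \tag{10} \]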

2 Bayesian inference and Jeffreys prior

For a Bayesian analysis of the EL(p, β) distribution, we assume different prior distributions for p and β. In a situation in which we have no prior information, we want a prior with minimal influence on the inference. The Bayesian literature offers several ways of formulating noninformative priors, for example, the Jeffreys prior (1967), the reference prior (Bernardo (1979), Berger and Bernardo (1991)), the MDIP prior (Zellner, 1977, 1984, 1990) and others.
In this paper, we develop a Bayesian analysis for the exponential-logarithmic distribution using MCMC (Markov Chain Monte Carlo) methods (see, for example, Gelfand and Smith, 1990; or Chib and Greenberg, 1995) to obtain the posterior summaries of interest. For this Bayesian analysis noninformative priors are used and their performance for the two-parameter EL(p, β) distribution is investigated. We derive the Jeffreys prior (1967), the reference prior (Bernardo (1979), Berger and Bernardo (1991)) and the MDIP prior (Zellner, 1977, 1984, 1990) for the parameters, and compare these priors with independent gamma priors. Each prior is presented together with its properties.
A numerical illustration comparing the performance of the prior distributions, with simulated data from the EL(p, β) distribution, is also examined in this paper.
The paper is organized as follows: in Section 2, we derive the Jeffreys prior and carry out a Bayesian analysis; independent gamma priors for the parameters are also considered. Sections 3 and 4 are dedicated to the MDIP and reference prior distributions, respectively, which are derived for the EL(p, β) distribution. Section 5??**** COPULAS***. Section 6 illustrates and discusses the results of the simulation study.
Moala et al. (2009) derive the expected information matrix for the parameters (R, W), as [TO BE ADAPTED]
The Jeffreys prior is the most common prior used in the literature and is given by
\[ \pi_J(p, \beta) \propto \sqrt{\det I(p, \beta)}, \tag{11} \]
where I(p, β) is the Fisher information matrix given in (9).
Box and Tiao (1973) give an explanation of the derivation of the Jeffreys prior in terms of “data translated” likelihood.
Thus, for the parameters p and β of the EL(p, β) distribution the Jeffreys prior is given by:
\[ \pi(p, \beta) \propto \frac{1}{\beta}\sqrt{h(p)}, \tag{12} \]
where
\[ h(p) = \left[\frac{\ln p+1}{(p\ln p)^{2}} - \frac{1}{(1-p)^{2}} - \frac{(1-4p)/p^{2} - 2\ln p + 3}{2(1-p)^{2}\ln p}\right]\frac{\operatorname{dilog}(p)}{\ln p} - \left[\frac{1-p+p\ln p}{2p(1-p)\ln p}\right]^{2}. \tag{13} \]
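It is interesting to note that this prior factorizes into a function of p times a function of β, that is, it treats p and β as independent a priori, even though the Fisher information matrix is not diagonal.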
Another common specification of noninformative priors considered in the literature is given by the product of independent gamma distributions,
\[ \pi_p(p) \sim \mathrm{Gamma}(a_p, b_p), \qquad \pi_\beta(\beta) \sim \mathrm{Gamma}(a_\beta, b_\beta), \tag{14} \]
with hyperparameters a_p, b_p, a_β and b_β assuming values 0.01 or 0.001 to provide no prior information.
Thus the joint posterior distribution for the parameters p and β is proportional to the product of the likelihood function (5) and the prior π(p, β), given by (12) or (14), resulting in:
\[ p(p, \beta \mid t) \propto \left(-\frac{1}{\ln p}\right)^{n} \frac{\beta^{n}(1-p)^{n} e^{-\beta\sum_{i=1}^{n} t_i}}{\prod_{i=1}^{n}\left[1-(1-p)e^{-\beta t_i}\right]}\,\pi(p, \beta). \tag{15} \]
As we are not able to find an analytic expression for the marginal posterior distributions, and hence cannot obtain summaries such as Bayes estimators and credible intervals in closed form, we appeal to MCMC algorithms to obtain a sample of values of p and β from the joint posterior.
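One possible way to sample from (15) is a random-walk Metropolis algorithm on the transformed scale (logit p, log β). The sketch below is only an illustration under our own assumptions: it uses the independent gamma priors with hyperparameters 0.01, and the step size, starting point and function names are ours, not the sampler used in the paper.

```python
import math
import random

def log_posterior(p, beta, t, a=0.01, b=0.01):
    """Log of (15), up to a constant, with independent Gamma(a, b) priors."""
    n = len(t)
    loglik = (-n * math.log(-math.log(p)) + n * math.log(beta)
              + n * math.log(1.0 - p) - beta * sum(t)
              - sum(math.log(1.0 - (1.0 - p) * math.exp(-beta * ti)) for ti in t))
    logprior = ((a - 1.0) * math.log(p) - b * p
                + (a - 1.0) * math.log(beta) - b * beta)
    return loglik + logprior

def metropolis(t, n_iter=5000, step=0.3, seed=0):
    """Random-walk Metropolis on x = logit(p), y = log(beta)."""
    rng = random.Random(seed)

    def target(x, y):
        try:
            p, beta = 1.0 / (1.0 + math.exp(-x)), math.exp(y)
            # log-Jacobian of (p, beta) -> (x, y): log p + log(1 - p) + log beta
            return log_posterior(p, beta, t) + math.log(p) + math.log(1.0 - p) + y
        except (ValueError, OverflowError, ZeroDivisionError):
            return -math.inf  # numerically outside the parameter space

    x, y = 0.0, 0.0           # start at p = 0.5, beta = 1
    cur = target(x, y)
    draws = []
    for _ in range(n_iter):
        xn, yn = x + rng.gauss(0.0, step), y + rng.gauss(0.0, step)
        new = target(xn, yn)
        if math.log(1.0 - rng.random()) < new - cur:   # accept the proposal
            x, y, cur = xn, yn, new
        draws.append((1.0 / (1.0 + math.exp(-x)), math.exp(y)))
    return draws
```

Posterior means and credible intervals can then be approximated from the returned pairs (p, β) after discarding an initial burn-in portion; swapping in another prior only requires changing logprior.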

3 Maximal data information prior (MDIP)

The mathematical statement of our background knowledge is defined in terms of Shannon’s entropy (or sense of diffusion or uncertainty).
We employ maximum entropy to choose among a priori probability distributions subject to our knowledge of moments (of functions) of the distribution (e.g., the mean). When new evidence is collected, we are typically interested in how these data affect our posterior beliefs regarding the parameter space. Of course, Bayes’ theorem provides guidance. For some problems, we simply combine our maximum entropy prior distribution with the likelihood function to determine the posterior distribution. Equivalently, we can find (again, by Lagrangian methods) the maximum relative entropy posterior distribution conditional first on the moment conditions and then on the data, so that the data eventually outweigh the moment conditions.
Now, imagine beginning with complete ignorance (uninformed priors); then diversity of posterior beliefs derives entirely from differences in likelihood functions, say due to differences in interpretation of the veracity of the evidence and/or asymmetry of information. As defined by Jaynes, maximum entropy is a way to construct a prior distribution that (a) satisfies the constraints imposed by E and (b) has the maximum entropy, relative to a reference measure in the continuous case.
It is desirable that the data give more information about the parameter than the prior density does; otherwise, there would be no justification for carrying out the experiment. Thus, we seek a prior distribution π(φ) for which the gain in information supplied by the data is as large as possible relative to the prior information on the parameter, that is, a prior that maximizes the information in the data. With this idea, Zellner (1977, 1984, 1991) and Zellner and Min (1992) derived a prior that maximizes the average information in the data density relative to that in the prior.
Following Shannon, let
\[ H(\phi) = \int_{a}^{b} f(x \mid \phi)\ln f(x \mid \phi)\,dx \tag{16} \]
be the negative entropy of f(x | φ), the measure of the information in f(x | φ). Thus, the following functional criterion is employed in the MDIP approach:
\[ G[\pi(\phi)] = \int_{a}^{b} H(\phi)\pi(\phi)\,d\phi - \int_{a}^{b} \pi(\phi)\ln\pi(\phi)\,d\phi, \tag{17} \]
which is the prior average information in the data density minus the information in the prior density. G[π(φ)] is maximized by selection of π(φ) subject to \( \int_{a}^{b}\pi(\phi)\,d\phi = 1 \).
The solution is then a proper prior given by
\[ \pi(\phi) = k\exp\{H(\phi)\}, \quad a \le \phi \le b, \tag{18} \]
where \( k^{-1} = \int_{a}^{b}\exp\{H(\phi)\}\,d\phi \) is the normalizing constant.
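As a quick check of how the solution π(φ) ∝ exp{H(φ)} works in a case where everything is available in closed form, consider the exponential density with rate θ. This density is used here purely as an illustration and is not part of the original derivation.

```latex
% Worked MDIP example for f(x | \theta) = \theta e^{-\theta x}, x > 0
\begin{align*}
H(\theta) &= \int_0^\infty \theta e^{-\theta x}\,\ln\!\left(\theta e^{-\theta x}\right) dx
           = \ln\theta - \theta\,E(X \mid \theta)
           = \ln\theta - 1, \\
\pi(\theta) &\propto \exp\{H(\theta)\} \propto \theta, \qquad a \le \theta \le b .
\end{align*}
```

The prior is proper on a bounded range a ≤ θ ≤ b, with normalizing constant k = 2/(b² − a²).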
Therefore, the MDIP is a prior that leads to an emphasis on the information in the data density or likelihood function, that is, its information is weak in comparison with the data information. Moala (1993) shows the construction of this prior in detail.
Zellner (1991, 1992) shows several interesting properties of the MDIP and additional conditions that can be imposed on the approach to reflect given initial information. However, the MDIP has restrictive invariance properties.
We suppose that we do not have much prior information available about p and β. Under this condition, the MDIP for the parameters (p, β) of the exponential-logarithmic density is also appropriate for our inference problem.
Optimizing over p as functions requires calculus of variations, but the solutions will be in the exponential family and thus suffer the same drawbacks as the discrete case.
Bayesian prior distribution assignment using the principle of maximum en-
tropy.

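Theorem 1: The MDIP for the parameters p and β of the exponential-logarithmic distribution (1) is given by:
\[ \pi_Z(p, \beta) \propto \left(-\frac{1}{\ln p}\right)\frac{\beta(1-p)}{\sqrt{p}}\exp\left\{\frac{\mathrm{polylog}(2, 1-p)}{\ln p}\right\}, \tag{19} \]
where \( \mathrm{polylog}(z, \alpha) = \sum_{k=1}^{\infty}\alpha^{k}/k^{z} \).
Proof: Firstly, we have to evaluate the information measure,
\[ H(p, \beta) = \int_{0}^{\infty}\left(-\frac{1}{\ln p}\right)\frac{\beta(1-p)e^{-\beta t}}{1-(1-p)e^{-\beta t}}\,\ln\left[\left(-\frac{1}{\ln p}\right)\frac{\beta(1-p)e^{-\beta t}}{1-(1-p)e^{-\beta t}}\right]dt \tag{20} \]
and, after some algebra,
\[ H(p, \beta) = \ln\left(-\frac{1}{\ln p}\right) + \ln\beta + \ln(1-p) - \beta E(T) - E\left\{\ln\left[1-(1-p)e^{-\beta T}\right]\right\}. \tag{21} \]
The mean of the EL(p, β) distribution is \( E(T) = -\mathrm{polylog}(2, 1-p)/(\beta\ln p) \), which is positive because ln p < 0. Moreover, since S(T) = ln[1 − (1 − p)e^{−βT}]/ln p is uniformly distributed on (0, 1), we have \( E\{\ln[1-(1-p)e^{-\beta T}]\} = \tfrac{1}{2}\ln p \). Hence,
\[ H(p, \beta) = \ln\left(-\frac{1}{\ln p}\right) + \ln\beta + \ln(1-p) + \frac{\mathrm{polylog}(2, 1-p)}{\ln p} - \frac{1}{2}\ln p. \tag{22} \]
Now, using H(p, β) above in equation (18), the resulting MDIP for the parameters p and β is given by:
\[ \pi_Z(p, \beta) \propto \exp\{H(p, \beta)\} \propto \left(-\frac{1}{\ln p}\right)\frac{\beta(1-p)}{\sqrt{p}}\exp\left\{\frac{\mathrm{polylog}(2, 1-p)}{\ln p}\right\}. \tag{23} \]
∎

Corollary: On combining the likelihood (5) and the prior (19), the joint posterior distribution for the parameters p and β is
\[ p(p, \beta \mid t) \propto \left(-\frac{1}{\ln p}\right)^{n+1}\frac{\beta^{n+1}(1-p)^{n+1}}{\sqrt{p}\,\prod_{i=1}^{n}\left[1-(1-p)e^{-\beta t_i}\right]}\exp\left\{\frac{\mathrm{polylog}(2, 1-p)}{\ln p} - \beta\sum_{i=1}^{n} t_i\right\}. \tag{24} \]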
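The two expectations used in the proof can be checked by simulation. The sketch below compares Monte Carlo averages with the closed forms −polylog(2, 1 − p)/(β ln p) for E(T) and (1/2) ln p for E{ln[1 − (1 − p)e^{−βT}]}. The helper names, the truncated power series for polylog(2, ·) and the default parameter values are our assumptions.

```python
import math
import random

def li2(z, terms=500):
    """polylog(2, z) = sum_{k>=1} z**k / k**2, truncated; intended for 0 <= z < 1."""
    return sum(z ** k / k ** 2 for k in range(1, terms + 1))

def el_sample(n, p, beta, rng):
    """Inverse-transform draws from EL(p, beta)."""
    log_p = math.log(p)
    return [math.log((1.0 - p) / -math.expm1((1.0 - rng.random()) * log_p)) / beta
            for _ in range(n)]

def check_expectations(p=0.5, beta=2.0, n=50000, seed=42):
    """Return (MC mean, exact mean, MC E[log W], exact E[log W])."""
    rng = random.Random(seed)
    ts = el_sample(n, p, beta, rng)
    mean_mc = sum(ts) / n
    mean_exact = -li2(1.0 - p) / (beta * math.log(p))
    elog_mc = sum(math.log(1.0 - (1.0 - p) * math.exp(-beta * t)) for t in ts) / n
    elog_exact = 0.5 * math.log(p)
    return mean_mc, mean_exact, elog_mc, elog_exact
```

The truncated series converges slowly when p is close to 0, so more terms (or a dedicated dilogarithm routine) would be needed there.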

4 Reference prior

Another well-known class of noninformative priors is the reference prior, first described by Bernardo (1979) and further developed by Berger and Bernardo (1992). The reference prior requires knowing which parameter is of interest and which is a nuisance parameter.
The idea is to derive a prior π(φ) that maximizes the expected posterior
information about the parameters provided by independent replications of an
experiment relative to the information in the prior. A natural measure of the
expected information about φ provided by data x is given by

\[ I(\phi) = E_x\left[K(p(\phi \mid x), \pi(\phi))\right], \tag{25} \]
where
\[ K(p(\phi \mid x), \pi(\phi)) = \int_{\Phi} p(\phi \mid x)\log\frac{p(\phi \mid x)}{\pi(\phi)}\,d\phi \tag{26} \]
is the Kullback-Leibler distance. So, the reference prior is defined as the prior
π(φ) that maximizes the expected Kullback-Leibler distance between the pos-
terior distribution and the prior distribution π(φ), taken over the experimental
data.
The prior density π(φ) which maximizes the functional (25) is found through calculus of variations, and the solution is not explicit. However, when the posterior p(φ | x) is asymptotically normal, this approach leads to the Jeffreys prior in the single-parameter case. If, on the other hand, we are interested in one of the parameters, the remaining parameters being nuisances, the situation is quite different, and the appropriate reference prior is not a multivariate Jeffreys prior. Bernardo argues that when nuisance parameters are present the reference prior should depend on which parameter(s) are considered to be of primary interest. The reference prior in this case is derived as follows. We present here the two-parameter case in detail. For the multiparameter case, see Berger and Bernardo (1992).
Let θ = (θ1, θ2) be the whole parameter, θ1 being the parameter of interest and θ2 the nuisance parameter. The algorithm is as follows:
Step 1: Determine π2(θ2 | θ1), the conditional reference prior for θ2 assuming that θ1 is given,
\[ \pi_2(\theta_2 \mid \theta_1) = \sqrt{I_{22}(\theta_1, \theta_2)}, \tag{27} \]
where I22(θ1, θ2) is the (2,2)-entry of the Fisher information matrix.
Step 2: Normalize π2(θ2 | θ1). If π2(θ2 | θ1) is improper, choose a sequence of subsets Ω1 ⊆ Ω2 ⊆ . . . ⊆ Ω on which π2(θ2 | θ1) is proper. Define
\[ c_m(\theta_1) = \frac{1}{\int_{\Omega_m}\pi_2(\theta_2 \mid \theta_1)\,d\theta_2}, \tag{28} \]
\[ p_m(\theta_2 \mid \theta_1) = c_m(\theta_1)\,\pi_2(\theta_2 \mid \theta_1)\,1_{\Omega_m}(\theta_2). \tag{29} \]
Step 3: Find the marginal reference prior for θ1, i.e., the reference prior for the experiment formed by marginalizing out with respect to p_m(θ2 | θ1). We obtain
\[ \pi_m(\theta_1) \propto \exp\left\{\frac{1}{2}\int_{\Omega_m} p_m(\theta_2 \mid \theta_1)\log\frac{\det I(\theta_1, \theta_2)}{I_{22}(\theta_1, \theta_2)}\,d\theta_2\right\}. \tag{30} \]
Step 4: Compute the reference prior for (θ1, θ2) when θ2 is a nuisance parameter:
\[ \pi(\theta_1, \theta_2) = \lim_{m\to\infty}\left[\frac{c_m(\theta_1)\,\pi_m(\theta_1)}{c_m(\theta_1^{*})\,\pi_m(\theta_1^{*})}\right]\pi(\theta_2 \mid \theta_1), \tag{31} \]
where θ1* is any fixed point with positive density for all π_m.
[ADD A LINKING PASSAGE HERE]
Let f(x | θ1, θ2), (θ1, θ2) ∈ Φ × Λ ⊆ ℝ², be a probability model with two real-valued parameters θ1 and θ2, where θ1 is the quantity of interest, and suppose that the joint posterior distribution of (θ1, θ2) is asymptotically normal with covariance matrix S(θ1, θ2) = I⁻¹(θ1, θ2), where S11(θ1, θ2) = I22(θ1, θ2)/det I(θ1, θ2).
If the nuisance parameter space Λ(θ1) = Λ is independent of θ1, and the functions S11^{−1/2}(θ1, θ2) and I22^{1/2}(θ1, θ2) factorize in the form
\[ S_{11}^{-1/2}(\theta_1, \theta_2) = f_1(\theta_1)\,g_1(\theta_2), \qquad I_{22}^{1/2}(\theta_1, \theta_2) = f_2(\theta_1)\,g_2(\theta_2), \tag{32} \]
then
\[ \pi(\theta_1) \propto f_1(\theta_1), \qquad \pi(\theta_2 \mid \theta_1) \propto g_2(\theta_2), \tag{33} \]
and the reference prior relative to the ordered parametrization (θ1, θ2) is given by
\[ \pi_{\theta_1}(\theta_1, \theta_2) = f_1(\theta_1)\,g_2(\theta_2). \tag{34} \]
In this case, there is no need for compact approximation, even if the conditional reference prior is not proper.
Typically, a change in the parameter of interest can give a different result,
i.e., the reference prior depends on the actual parameter of interest.
The method can also be generalized to multidimensional parameter spaces.
Now, we can derive the reference prior for the parameters of the EL(p, β)
model given in (1).
Theorem 2: a) If p is considered the parameter of interest and β the nuisance parameter, then the reference prior for (p, β) is given by
\[ \pi_p(p, \beta) = \frac{1}{\beta}\sqrt{-\frac{h(p)\ln p}{\operatorname{dilog}(p)}}, \tag{35} \]
where h(p) is given in (13).
b) If β is the parameter of interest and p the nuisance parameter, then the reference prior for (p, β) is given by
\[ \pi_\beta(p, \beta) = \frac{1}{\beta}\sqrt{I_{pp}}, \tag{36} \]
where I_pp is given from (9) by
\[ I_{pp} = -\frac{\ln p+1}{(p\ln p)^{2}} + \frac{1}{(1-p)^{2}} + \frac{(1-4p)/p^{2} - 2\ln p + 3}{2(1-p)^{2}\ln p}. \tag{37} \]
Proof: a) The inverse of the Fisher matrix given in (9) is, up to the constant factor 1/n,
\[ S(p, \beta) = \begin{bmatrix} S_{pp} & S_{p\beta} \\ S_{p\beta} & S_{\beta\beta} \end{bmatrix} = \begin{bmatrix} -\dfrac{\operatorname{dilog}(p)}{h(p)\ln p} & S_{p\beta} \\[2ex] S_{p\beta} & \dfrac{\beta^{2} I_{pp}}{h(p)} \end{bmatrix}. \tag{38} \]
As S_pp^{−1}(p, β) and I_ββ(p, β) = −dilog(p)/(β² ln p) factorize, we find the reference prior for the nuisance parameter conditionally on the parameter of interest given by
\[ \pi(\beta \mid p) \propto \frac{1}{\beta}, \tag{39} \]
and the marginal reference prior of the parameter of interest p as
\[ \pi(p) \propto \frac{1}{\sqrt{S_{pp}}}. \tag{40} \]
The joint reference prior for (p, β), with p the parameter of interest, is then given by
\[ \pi_p(p, \beta) = \pi(\beta \mid p)\,\pi(p) = \frac{1}{\beta\sqrt{S_{pp}}}. \tag{41} \]
b) In a similar way, we obtain the reference prior considering β as the parameter of interest and p the nuisance parameter. ∎
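To compare the shapes of the Jeffreys and reference priors, the functions h(p) and I_pp can be evaluated numerically. The sketch below is an illustration only: the function names are ours, dilog(p) is computed from the power series of polylog(2, 1 − p), and I_pp is taken per observation with the constant +3 in the numerator of its last term.

```python
import math

def dilog(p, terms=2000):
    """dilog(p) = int_1^p ln(t)/(1-t) dt = polylog(2, 1-p), via a truncated series."""
    z = 1.0 - p
    return sum(z ** k / k ** 2 for k in range(1, terms + 1))

def i_pp(p):
    """(p, p) entry of the Fisher information, per observation."""
    lp = math.log(p)
    return (-(lp + 1.0) / (p * lp) ** 2 + 1.0 / (1.0 - p) ** 2
            + ((1.0 - 4.0 * p) / p ** 2 - 2.0 * lp + 3.0)
            / (2.0 * (1.0 - p) ** 2 * lp))

def h(p):
    """h(p): beta**2 times the determinant of the per-observation Fisher matrix."""
    lp = math.log(p)
    off = (1.0 - p + p * lp) / (2.0 * p * (1.0 - p) * lp)
    return -i_pp(p) * dilog(p) / lp - off ** 2

def jeffreys_prior(p, beta):
    """Unnormalized Jeffreys prior: sqrt(h(p)) / beta."""
    return math.sqrt(h(p)) / beta

def reference_prior_p(p, beta):
    """Unnormalized reference prior when p is the parameter of interest."""
    return math.sqrt(-h(p) * math.log(p) / dilog(p)) / beta

def reference_prior_beta(p, beta):
    """Unnormalized reference prior when beta is the parameter of interest."""
    return math.sqrt(i_pp(p)) / beta
```

For p close to 1 the terms of I_pp nearly cancel, and for p close to 0 the series for dilog converges slowly, so this direct evaluation should be trusted only away from the boundaries.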

5 Elicited prior distribution

Prior elicitation is an important tool in Bayesian analysis, yet it has received little research attention. In studies involving real data, Bayesian analysis is the most suitable approach, especially when expert knowledge is available. Moreover, it is very important to consider and correctly obtain the experts' opinion in order to complement the information provided by the sample. Elicitation, in turn, is the process of extracting this knowledge. With the information obtained through elicitation, prior distributions can be constructed. In any statistical analysis there will always be some form of deeper knowledge about the subject available beyond the data obtained from the sample. For example, suppose the mean lifetime of a component is being investigated. Tests are expected to be carried out on samples of these components in order to estimate their mean lifetimes; however, besides the studies on the samples, it is possible to obtain from the designer his or her personal expectations about the life of the component. If the designer's knowledge of the characteristics of the components can be represented, then this additional (prior) knowledge can be used within the Bayesian system of inference.
However, prior knowledge is often not considered in Bayesian analyses, either because of the difficulty of representing it, or on the grounds that with a large amount of data the prior knowledge tends to have little effect on the final inferences, or because of the various techniques available for using noninformative priors. Yet when new or rare phenomena are investigated, the few data available may not be sufficient to reach a statistical decision. Examples include models of damage caused by earthquakes, the study of rare diseases, etc. There will not always be enough data to ignore prior knowledge, and consequently the expert's opinion may be one of the few sources of information. In addition, there are cases in which it is difficult to draw conclusions about certain parameters of statistical models, even with a reasonable amount of data. When working with real data, besides small samples, one may also encounter censored data, which further justifies the use of Bayesian inference. This context is found mainly in industry, where it is not possible to obtain many samples because of the cost of the research or the long time until a failure occurs. When this happens, it is possible to accelerate the process, which may lead to data that do not correspond to reality. In this case, all sources of knowledge, such as expert opinion, must be taken into account.
Several authors have studied prior elicitation (), which formalizes the expert's knowledge. In practice, in decision making, the choice of a prior should take into account how easy it is to handle (explicit features and easy sampling), which is of great importance for sensitivity studies. Statistical definitions such as the inverse Fisher matrix are, most of the time, too technical and inaccessible. The main idea of elicitation comes from a simple view of an expert's opinion. One may consider that the opinion of a perfect expert should be, roughly speaking, similar to a survey of real data, and should provide independent and identically distributed (i.i.d.) samples of the data. However, it is impractical to expect an expert to be able to state values of the prior density. It is known that it is not easy to answer questions such as: “what is the probability that the variable of interest takes a given value?”. It is more reasonable to ask for values linked to probabilities, such as P(X > x); that is, it is more convenient to ask “what value could the variable take for a given probability?”.
Moreover, studies indicate that second-order moments are not recommended for elicitation. The recommendation arises from the difficulty that experts with knowledge of statistical concepts (or even statisticians) have in providing precise measures of variability (such as variance and standard deviation). In general, the variance is underestimated. The most common choice is to elicit quantiles, such as the median or the quartiles, because they are easier to explain to experts. Likewise, credible intervals are generally well understood and assessed quite accurately, provided the expert is not required to supply intervals with high coverage probability (for example, 95%). The elicitation of quantiles allows an adequate assessment of the variance of the expert's prior distribution, instead of eliciting it directly. O'Hagan (1998) gives an example of the use of elicited percentiles to estimate the variance. Therefore, it is best to set probabilities that are easy for non-statisticians to understand and ask them to propose values corresponding to these percentages. In this way the information provided by the experts will be obtained more faithfully.

5.1 Laplace approximation


It is common to encounter situations in which integrals cannot be solved analytically, that is, no closed-form expression for the integral is available. One way around this problem is to use numerical approximation methods. In Bayesian inference, however, an expression for such integrals is needed, not the numerical value of the integral. One of the most popular approximation methods for integrals is the Laplace method (Tierney and Kadane (1986)). The Laplace method for approximating integrals is widely used in Bayesian inference to compute marginal posteriors, posterior moments and predictive densities. It is based on a second-order Taylor expansion applied to the exponential term of the integrand. This section gives a brief and informal description of the method and its properties. For more details, see Kass et al. (1990).
Let h : ℝ^m → ℝ be a differentiable function of φ, with −h having a unique maximum point φ̂. The Laplace method approximates an integral of the form
\[ I = \int_{0}^{\infty} f(\phi)\exp\{-nh(\phi)\}\,d\phi \tag{42} \]
by expanding h and f about φ̂.
The function f is chosen so that φ̂ can be obtained explicitly. When φ is a one-dimensional parameter, one obtains
\[ \int_{0}^{\infty} f(\phi)\exp\{-nh(\phi)\}\,d\phi \cong \left(\frac{2\pi\sigma^{2}}{n}\right)^{1/2} f(\hat\phi)\exp\{-nh(\hat\phi)\}, \tag{43} \]
where \( \sigma = [h''(\hat\phi)]^{-1/2} \).
[?] argue that
\[ I = \hat I\cdot\left(1 + O(n^{-1})\right), \tag{44} \]
where \( \hat I = \left(\frac{2\pi\sigma^{2}}{n}\right)^{1/2} f(\hat\phi)\exp\{-nh(\hat\phi)\} \).
For the m-dimensional case,
\[ \int_{0}^{\infty} f(\phi)\exp\{-nh(\phi)\}\,d\phi \cong \left(\frac{2\pi}{n}\right)^{m/2}\det(\Sigma)^{1/2} f(\hat\phi)\exp\{-nh(\hat\phi)\}, \tag{45} \]
where Σ⁻¹ = D²h(φ̂) (the Hessian of h at φ̂).
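As a small numerical illustration of (43) and (44), the sketch below applies the one-dimensional approximation to the integral of exp{−n(φ − ln φ)} over (0, ∞), whose exact value is Γ(n + 1)/n^{n+1}. The test integrand and the function name are our choices and are not taken from the original text.

```python
import math

def laplace_1d(f, h, h2, phi_hat, n):
    """One-dimensional Laplace approximation (43).

    f, h: the functions in the integrand f(phi) * exp(-n * h(phi));
    h2: second derivative of h; phi_hat: minimizer of h.
    """
    sigma2 = 1.0 / h2(phi_hat)
    return (math.sqrt(2.0 * math.pi * sigma2 / n)
            * f(phi_hat) * math.exp(-n * h(phi_hat)))

n = 20
approx = laplace_1d(lambda x: 1.0,               # f(phi) = 1
                    lambda x: x - math.log(x),   # h(phi), minimized at phi = 1
                    lambda x: 1.0 / x ** 2,      # h''(phi)
                    1.0, n)
exact = math.gamma(n + 1) / n ** (n + 1)
ratio = exact / approx   # close to 1, with a relative error of order 1/n
```

With n = 20 the ratio differs from 1 by less than one percent, in agreement with the O(n⁻¹) error term in (44).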

6 Elicitation of the reliability
In the elicitation of the reliability R(t), the expert states a value conditional on a specified probability. That is, given a probability p, the expert is asked for the value that he or she believes p% of the products exceed.
Therefore, k values t1, t2, ..., tk are chosen and the expert provides the probabilities R_{t1}, R_{t2}, ..., R_{tk}, with
\[ R_{t_j} = P(T \ge t_j), \quad j = 1, \ldots, k, \tag{46} \]
which is simply the reliability R given the parameters p and β.
Choosing which probabilities to ask for is a very important task. It is important to bear in mind that the probabilities must be easy to understand for those with no knowledge of statistics. In this case, the 10th, 25th, 50th and 75th percentiles were used.
Considering the priors Beta(a, b) for the parameter p and Gamma(c, d) for β, and assuming independence between the parameters, the expression in (??) can be written as:
\[ \begin{aligned} P(T \ge t \mid a, b, c, d) &= \int_{0}^{\infty}\int_{0}^{1} P(T \ge t \mid p, \beta)\,\pi(p, \beta)\,dp\,d\beta \\ &\cong -\frac{2\pi\,\Gamma(a+b)\,d^{c}\,\hat\beta^{c}\,(1-e^{-\hat u})^{b}}{\Gamma(a)\Gamma(b)\Gamma(c)\sqrt{(b-1)(c-1)}}\exp\left\{-\left[\left(a-\tfrac{1}{2}\right)\hat u + d\hat\beta\right]\right\}\frac{1}{\hat u}\ln\left[1-(1-e^{-\hat u})e^{-\hat\beta t}\right], \end{aligned} \tag{47} \]
where \( \hat u = \ln\left[\frac{a+b-1}{a}\right] \) and \( \hat\beta = \frac{c-1}{d} \), so that a + b > 1 and c > 1.
Therefore, using equation (??) it is possible to find values for (a, b, c, d) from the four values (t1, t2, t3, t4), corresponding to the four percentiles 10%, 25%, 50% and 75%, stated by the expert. The four equations form a nonlinear system that will be solved in the software R using the package “nleqslv”.
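The right-hand side of (47) is straightforward to code, which is the function a nonlinear solver would call repeatedly. The following Python sketch is illustrative only: the function name is ours, it assumes b > 1 and c > 1 so that the square root and û > 0 are well defined, and it does not reproduce the R code with nleqslv.

```python
import math

def laplace_reliability(t, a, b, c, d):
    """Laplace approximation (47) to P(T >= t | a, b, c, d).

    Assumes p ~ Beta(a, b) and beta ~ Gamma(c, d) (d is a rate), b > 1, c > 1.
    """
    u_hat = math.log((a + b - 1.0) / a)
    beta_hat = (c - 1.0) / d
    log_const = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 - math.lgamma(c) + c * math.log(d))
    q = 1.0 - math.exp(-u_hat)                       # 1 - exp(-u_hat)
    front = (2.0 * math.pi * math.exp(log_const) * beta_hat ** c * q ** b
             / math.sqrt((b - 1.0) * (c - 1.0)))
    expo = math.exp(-((a - 0.5) * u_hat + d * beta_hat))
    return -front * expo * math.log(1.0 - q * math.exp(-beta_hat * t)) / u_hat
```

For example, laplace_reliability(1.0, 2.0, 3.0, 3.0, 2.0) is approximately 0.2645. Because no data enter the integrand (n = 1), the approximation can be rough, so it may be worth comparing it with a Monte Carlo average of the survival function over draws of (p, β) from the priors.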
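Proof. Let S(t | p, β) be the survival function given in (??). In addition, let p ∼ Beta(a, b) and β ∼ Gamma(c, d), that is,

$$
\pi(p \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,p^{a-1}(1-p)^{b-1}
\quad\text{and}\quad
\pi(\beta \mid c, d) = \frac{d}{\Gamma(c)}\,(d\beta)^{c-1}e^{-d\beta}.
$$

It is known that

$$
S(t) = \int_{0}^{\infty}\int_{0}^{1} P(T > t \mid p, \beta)\,\pi(p, \beta)\,dp\,d\beta. \tag{48}
$$

Assuming that p and β are independent, π(p, β) = π(p)π(β), that is,

$$
\pi(p, \beta) = \frac{\Gamma(a+b)\,d^{c}}{\Gamma(a)\Gamma(b)\Gamma(c)}\,p^{a-1}(1-p)^{b-1}\beta^{c-1}e^{-d\beta}. \tag{49}
$$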
Therefore, from (??),

$$
S(t) = \frac{\Gamma(a+b)\,d^{c}}{\Gamma(a)\Gamma(b)\Gamma(c)}
\int_{0}^{\infty}\int_{0}^{1} \frac{\ln[1-(1-p)e^{-\beta t}]}{\ln p}\,p^{a-1}(1-p)^{b-1}\beta^{c-1}e^{-d\beta}\,dp\,d\beta. \tag{50}
$$

It is easy to see that the integral in (??) has no analytical solution. In this case the Laplace approximation is very useful, since it is an effective method for approximating complicated integrals.
To apply the Laplace method to (??), p must be transformed so that the limits of integration become 0 and ∞. Let u = −ln p, so that p → 0 implies u → ∞, p → 1 implies u → 0, $p = e^{-u}$ and $dp = -e^{-u}\,du$.
Thus,

$$
\int_{0}^{1} \frac{\ln[1-(1-p)e^{-\beta t}]}{\ln p}\,p^{a-1}(1-p)^{b-1}\,dp
= -\int_{0}^{\infty} \frac{\ln[1-(1-e^{-u})e^{-\beta t}]}{u}\,e^{-(a-1)u}(1-e^{-u})^{b-1}e^{-u}\,du.
$$

Therefore,

$$
\int_{0}^{\infty}\int_{0}^{1} \frac{\ln[1-(1-p)e^{-\beta t}]}{\ln p}\,p^{a-1}(1-p)^{b-1}\beta^{c-1}e^{-d\beta}\,dp\,d\beta
= -\int_{0}^{\infty}\int_{0}^{\infty} \frac{\ln[1-(1-e^{-u})e^{-\beta t}]}{u}\,(1-e^{-u})^{b-1}e^{-au}\beta^{c-1}e^{-d\beta}\,du\,d\beta.
$$

Let $f(u, \beta) = -\dfrac{\ln[1-(1-e^{-u})e^{-\beta t}]}{u}$ and n = 1, since there are no data in X. Then

$$
\exp\{-h(u, \beta)\} = \exp\{-au + (b-1)\ln(1-e^{-u}) + (c-1)\ln\beta - d\beta\},
$$

which implies $h(u, \beta) = au - (b-1)\ln(1-e^{-u}) - (c-1)\ln\beta + d\beta$.
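Computing the partial derivatives gives

$$
\frac{\partial h(u, \beta)}{\partial u} = a - (b-1)\frac{e^{-u}}{1-e^{-u}}
\quad\text{and}\quad
\frac{\partial h(u, \beta)}{\partial \beta} = d - \frac{c-1}{\beta}.
$$

Setting $\partial h/\partial u = 0$ and $\partial h/\partial \beta = 0$ yields

$$
\hat{u} = \ln\left[\frac{a+b-1}{a}\right]
\quad\text{and}\quad
\hat{\beta} = \frac{c-1}{d}.
$$

Note that $\hat{\beta}$ is the mode of the Gamma distribution.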
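The second derivatives $\partial^2 h/\partial u^2$, $\partial^2 h/\partial \beta^2$ and $\partial^2 h/\partial u\,\partial\beta$ are now needed to obtain the matrix $\Sigma^{-1}$, which is the Hessian matrix

$$
\Sigma^{-1} =
\begin{bmatrix}
\dfrac{\partial^2 h}{\partial u^2} & \dfrac{\partial^2 h}{\partial u\,\partial\beta} \\[2ex]
\dfrac{\partial^2 h}{\partial u\,\partial\beta} & \dfrac{\partial^2 h}{\partial \beta^2}
\end{bmatrix}. \tag{51}
$$

Note that $\det(\Sigma) = 1/\det(\Sigma^{-1})$.

Computing the second derivatives gives

$$
\frac{\partial^2 h}{\partial u^2} = \frac{(b-1)e^{-u}}{(1-e^{-u})^2}, \quad
\frac{\partial^2 h}{\partial u\,\partial\beta} = 0
\quad\text{and}\quad
\frac{\partial^2 h}{\partial \beta^2} = \frac{c-1}{\beta^2}.
$$

Therefore,

$$
\Sigma^{-1} =
\begin{bmatrix}
\dfrac{(b-1)e^{-u}}{(1-e^{-u})^2} & 0 \\[2ex]
0 & \dfrac{c-1}{\beta^2}
\end{bmatrix}. \tag{52}
$$

Moreover,

$$
\det(\Sigma) = \frac{1}{\det(\Sigma^{-1})}
= \left[\frac{(b-1)(c-1)e^{-u}}{\beta^2(1-e^{-u})^2}\right]^{-1}
= \frac{\beta^2(1-e^{-u})^2}{(b-1)(c-1)e^{-u}}.
$$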
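The stationary point and the determinant derived above can be checked numerically with finite differences. The sketch below is only an illustration: the helper names, the step sizes and the hyperparameter values (a, b, c, d) = (2, 3, 5, 100) are assumptions made here, not values used in the paper.

```python
import math

def h(u, beta, a, b, c, d):
    """h(u, beta) from the Laplace set-up; exp(-h) is the kernel of the integrand."""
    return (a * u - (b - 1) * math.log(1 - math.exp(-u))
            - (c - 1) * math.log(beta) + d * beta)

def stationary_point(a, b, c, d):
    """Closed-form solution of dh/du = 0 and dh/dbeta = 0."""
    return math.log((a + b - 1) / a), (c - 1) / d

def det_sigma(u, beta, b, c):
    """Closed-form det(Sigma) = 1 / det(Hessian of h)."""
    w = 1 - math.exp(-u)
    return beta ** 2 * w ** 2 / ((b - 1) * (c - 1) * math.exp(-u))

def first_diff(fun, x, eps):
    """Central finite-difference estimate of the first derivative."""
    return (fun(x + eps) - fun(x - eps)) / (2 * eps)

def second_diff(fun, x, eps):
    """Central finite-difference estimate of the second derivative."""
    return (fun(x + eps) - 2 * fun(x) + fun(x - eps)) / eps ** 2

a, b, c, d = 2.0, 3.0, 5.0, 100.0          # hypothetical hyperparameters
u_hat, beta_hat = stationary_point(a, b, c, d)
eps_u, eps_b = 1e-4, 1e-4 * beta_hat       # step sizes scaled to each argument

grad_u = first_diff(lambda u: h(u, beta_hat, a, b, c, d), u_hat, eps_u)
grad_b = first_diff(lambda x: h(u_hat, x, a, b, c, d), beta_hat, eps_b)
h_uu = second_diff(lambda u: h(u, beta_hat, a, b, c, d), u_hat, eps_u)
h_bb = second_diff(lambda x: h(u_hat, x, a, b, c, d), beta_hat, eps_b)

# h is a sum of a function of u and a function of beta, so the mixed
# derivative is zero and det(Sigma) = 1 / (h_uu * h_bb).
det_numeric = 1.0 / (h_uu * h_bb)
det_closed = det_sigma(u_hat, beta_hat, b, c)
print(grad_u, grad_b, det_numeric, det_closed)
```

Both gradient components should be close to zero at the stationary point, and the two determinant values should agree to several digits.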

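Considering θ = (a, b, c, d) and using the Laplace approximation, we obtain

$$
S(t \mid \theta) \cong -\frac{\Gamma(a+b)\,d^{c}}{\Gamma(a)\Gamma(b)\Gamma(c)}\,2\pi\,
\hat{\beta}^{c-1}e^{-d\hat{\beta}}\,e^{-a\hat{u}}(1-e^{-\hat{u}})^{b-1}\,
\frac{\hat{\beta}(1-e^{-\hat{u}})}{\sqrt{(b-1)(c-1)}}\,e^{\hat{u}/2}\,
\frac{1}{\hat{u}}\ln\left[1-(1-e^{-\hat{u}})e^{-\hat{\beta}t}\right], \tag{53}
$$

that is,

$$
S(t \mid \theta) \cong -\frac{2\pi\,\Gamma(a+b)\,d^{c}\,\hat{\beta}^{c}\,(1-e^{-\hat{u}})^{b}}{\Gamma(a)\Gamma(b)\Gamma(c)\sqrt{(b-1)(c-1)}}\,
\exp\left\{-\left[\left(a-\tfrac{1}{2}\right)\hat{u} + d\hat{\beta}\right]\right\}\,
\frac{1}{\hat{u}}\ln\left[1-(1-e^{-\hat{u}})e^{-\hat{\beta}t}\right], \tag{54}
$$

where $\hat{u} = \ln\left(\frac{a+b-1}{a}\right)$ and $\hat{\beta} = \frac{c-1}{d}$, with $a + b > 1$ and $c > 1$.
Equating the expression above four times to the different percentiles gives four equalities in four unknowns (the hyperparameters a, b, c, d), that is, a nonlinear system.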

Procedure for finding the hyperparameters a, b, c, d from expert knowledge:

• Fix different values of t and elicit S(t).
• If all four hyperparameters a, b, c, d are treated as unknown, four values of S(t) at four values of t are required.
• Alternatively, only a and b may be treated as unknown, with c and d known. In that case only two values of S(t), at t1 and t2, are needed.

Simulation for the Exponential-Logarithmic distribution:

• Fix values for p and β.
• Consider S(t) = 0.90, 0.75, 0.50 and 0.25.
• Compute the values $t_i$, i = 1, 2, 3, 4, from S(t). That is, given $S(t) = S_0$, compute

$$
t = \frac{1}{\beta}\ln\left(\frac{1-p}{1-p^{S_0}}\right).
$$

• This gives a nonlinear system of four equations in the four unknowns a, b, c, d, whose equations are

$$
S(t \mid \theta) \cong -\frac{2\pi\,\Gamma(a+b)\,d^{c}\,\hat{\beta}^{c}\,(1-e^{-\hat{u}})^{b}}{\Gamma(a)\Gamma(b)\Gamma(c)\sqrt{(b-1)(c-1)}}\,
\exp\left\{-\left[\left(a-\tfrac{1}{2}\right)\hat{u} + d\hat{\beta}\right]\right\}\cdot
\frac{1}{\hat{u}}\ln\left[1-(1-e^{-\hat{u}})e^{-\hat{\beta}t}\right],
$$

where $\hat{u} = \ln\left(\frac{a+b-1}{a}\right)$ and $\hat{\beta} = \frac{c-1}{d}$, with $a + b > 1$ and $c > 1$.
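The first three simulation steps above can be sketched as follows. The survival function used is the Exponential-Logarithmic form appearing in the integrand of (50); the values p = 0.3 and β = 0.05 and the function names are hypothetical choices made only for illustration.

```python
import math

def el_survival(t, p, beta):
    """Survival function of EL(p, beta): ln[1 - (1 - p) e^{-beta t}] / ln p."""
    return math.log(1 - (1 - p) * math.exp(-beta * t)) / math.log(p)

def el_time_at_survival(s0, p, beta):
    """Time t at which S(t) = s0, obtained by inverting the survival function."""
    return math.log((1 - p) / (1 - p ** s0)) / beta

p, beta = 0.3, 0.05                       # assumed "true" parameter values
targets = [0.90, 0.75, 0.50, 0.25]        # survival levels from the text
times = [el_time_at_survival(s, p, beta) for s in targets]

# Round trip: plugging each t back into S(t) should recover the target level.
recovered = [el_survival(t, p, beta) for t in times]
for s, t, r in zip(targets, times, recovered):
    print(s, t, r)
```

The resulting pairs (t_i, S(t_i)) are what would be passed to a nonlinear solver, such as the "nleqslv" package mentioned earlier, to recover (a, b, c, d).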

7 Simulation study

?????????????

8 An example with literature data

In this section, let us consider the data set on the lifetime of a type of electrical insulator subject to a constant voltage stress, introduced in Lawless (1982). The data are not censored and represent the time to failure (in minutes): 0.96, 4.15, 0.19, 0.78, 8.01, 31.75, 7.35, 6.50, 8.27, 33.91, 32.52, 3.16, 4.85, 2.78, 4.67, 1.31, 12.06, 36.71, 72.89. Let us denote these data as "Lawless data". Lawless used this data set to illustrate the fit of the Weibull lifetime model; here we assume an exponential-logarithmic distribution with density (1) to analyse the data.

To determine which distribution is more appropriate for these data, the Weibull proposed by Lawless or the Exponential-Logarithmic, several selection criteria were examined. These include the information-based criteria AIC, BIC and DIC, given in Table 1 for each distribution. From this table, we conclude that the Exponential-Logarithmic gives the better fit (smaller AIC and BIC values), whereas under the DIC criterion both the Exponential-Logarithmic and Weibull distributions appear appropriate for the data. However, if one of them must be chosen on the basis of DIC, the choice is again the Exponential-Logarithmic (smaller DIC value).

Table 1: Information-based model selection criteria (AIC, BIC and DIC) for the Lawless data

Model      AIC        BIC        DIC
Weibull    231.9720   236.0553   140.830
EL         141.8041   145.8874   139.167
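For reference, the sketch below shows how AIC and BIC are computed from a log-likelihood, using the usual definitions AIC = 2k − 2ℓ and BIC = k ln n − 2ℓ with k parameters and n observations. The EL density is obtained here by differentiating the survival function that appears in (50), which is an assumption about density (1); the parameter values are the maximum likelihood estimates reported in the text. The printed numbers come from this sketch and need not reproduce the table entries exactly.

```python
import math

# Lawless data as listed in the text (minutes to failure).
LAWLESS = [0.96, 4.15, 0.19, 0.78, 8.01, 31.75, 7.35, 6.50, 8.27, 33.91,
           32.52, 3.16, 4.85, 2.78, 4.67, 1.31, 12.06, 36.71, 72.89]

def el_loglik(p, beta, data):
    """Log-likelihood of EL(p, beta), assuming the density
    f(t) = beta (1 - p) e^{-beta t} / (-ln p * [1 - (1 - p) e^{-beta t}])."""
    n = len(data)
    total = n * (math.log(beta) + math.log(1 - p) - math.log(-math.log(p)))
    for t in data:
        total += -beta * t - math.log(1 - (1 - p) * math.exp(-beta * t))
    return total

def aic(loglik, k):
    """Akaike information criterion for k free parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k free parameters and n observations."""
    return k * math.log(n) - 2 * loglik

ll = el_loglik(0.0978, 0.0393, LAWLESS)   # evaluated at the reported MLEs
aic_el = aic(ll, 2)
bic_el = bic(ll, 2, len(LAWLESS))
print(ll, aic_el, bic_el)
```

With two parameters and n = 19, BIC exceeds AIC by 2 ln 19 − 4 regardless of the fitted model, which is a quick way to cross-check pairs of table entries.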

The maximum likelihood estimates and their respective standard deviations (in parentheses) for p and β are $\hat{p} = 0.0978$ (0.3127) and $\hat{\beta} = 0.0393$ (0.0069), with 95% confidence intervals (−0.1536, 0.3492) and (0.00197, 0.07662) for p and β, respectively. Note that the lower endpoint of the interval for p is negative, which lies outside the parameter space (0, 1).
For a Bayesian analysis of the data, let us assume the prior distributions (16) and (20), and the pair of priors Uniform/Gamma for p and β.
We ran an MCMC simulation of 25000 iterations and discarded the first 5000 as burn-in. The MCMC plots suggest that convergence was achieved, and the algorithm showed an acceptance rate of around 35% under the Jeffreys prior. *************
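The paper does not spell out the sampler, so the following is only a minimal sketch of one way to run such a simulation: a random-walk Metropolis algorithm on the transformed scale (logit p, log β) for the Uniform/Gamma pair of priors. The Gamma hyperparameters (0.01, 0.01), the proposal step sizes, the starting point and the seed are all assumptions made here, and the EL density is the one implied by the survival function in (50).

```python
import math
import random

LAWLESS = [0.96, 4.15, 0.19, 0.78, 8.01, 31.75, 7.35, 6.50, 8.27, 33.91,
           32.52, 3.16, 4.85, 2.78, 4.67, 1.31, 12.06, 36.71, 72.89]

def log_posterior(p, beta, data, c=0.01, d=0.01):
    """Log posterior up to a constant: Uniform(0,1) on p, Gamma(c, d) on beta (assumed)."""
    n = len(data)
    loglik = n * (math.log(beta) + math.log(1 - p) - math.log(-math.log(p)))
    for t in data:
        loglik += -beta * t - math.log(1 - (1 - p) * math.exp(-beta * t))
    return loglik + (c - 1) * math.log(beta) - d * beta

def log_target(x, y, data):
    """Log posterior density of x = logit(p), y = log(beta), including the Jacobian."""
    if abs(x) > 30 or abs(y) > 30:        # numerical guard, far outside the bulk
        return float("-inf")
    p = 1 / (1 + math.exp(-x))
    beta = math.exp(y)
    return log_posterior(p, beta, data) + math.log(p) + math.log(1 - p) + y

def metropolis(data, n_iter=25000, burn_in=5000, step=(1.0, 0.4), seed=1):
    """Random-walk Metropolis; returns retained (p, beta) draws and acceptance rate."""
    rng = random.Random(seed)
    x, y = 0.0, math.log(0.05)            # assumed starting point
    current = log_target(x, y, data)
    draws, accepted = [], 0
    for i in range(n_iter):
        x_new = x + rng.gauss(0, step[0])
        y_new = y + rng.gauss(0, step[1])
        proposal = log_target(x_new, y_new, data)
        # Accept with probability min(1, exp(proposal - current)).
        if rng.random() < math.exp(min(0.0, proposal - current)):
            x, y, current = x_new, y_new, proposal
            accepted += 1
        if i >= burn_in:
            draws.append((1 / (1 + math.exp(-x)), math.exp(y)))
    return draws, accepted / n_iter

draws, rate = metropolis(LAWLESS)
mean_p = sum(p for p, _ in draws) / len(draws)
mean_beta = sum(b for _, b in draws) / len(draws)
print(rate, mean_p, mean_beta)
```

Sampling on the transformed scale keeps every proposal inside the parameter space, at the cost of the extra Jacobian term; the step sizes would normally be tuned until the acceptance rate is in a reasonable range.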
The posterior summaries of interest under the different prior distributions are given in Tables 2 and 3.
Now we examine the performance of the priors by considering several point estimates for the parameters α and λ. The maximum likelihood estimate (MLE) is also reported in Table 2.
The estimates in Table 2 are reasonable under all the proposed prior distributions. For n = 50 the estimates are so close that choosing among the priors is almost impossible.

Table 2: Posterior summaries for p (Lawless data)

Prior            Mean     Std. dev.   95% interval
Beta/Gamma       0.3198   0.2915      (0.0152, 0.9779)
Uniform/Gamma    0.3467   0.2594      (0.0296, 0.9333)
Jeffreys         0.1633   0.1891      (0.0368, 0.7409)
MDIP/Gamma       0.3580   0.2679      (0.1359, 0.9413)
Refp             0.1291   0.1637      (0.0283, 0.6466)

Table 3: Posterior summaries for β (Lawless data)

Prior            Mean     Std. dev.   95% interval
Beta/Gamma       0.0483   0.0187      (0.0165, 0.0873)
Uniform/Gamma    0.0513   0.0177      (0.0195, 0.0879)
Jeffreys         0.0396   0.0182      (0.0259, 0.0796)
MDIP/Gamma       0.0514   0.0182      (0.0389, 0.0916)
Refp             0.0366   0.0171      (0.0242, 0.0749)

From the results in Table 2, we observe similar inference results for p under the different prior distributions. From Table 3, we observe similar inference results for β. The comparison of the marginal posterior densities is given in Figure 3.
In Figure (??) it is clear that the Jeffreys and reference priors, with p as the parameter of interest, are very similar to each other and diverge considerably from the other priors: Beta & Gamma, Uniform & Gamma and MDIP & Gamma. In contrast, the noninformative Beta & Gamma and Uniform & Gamma priors, together with the MDIP & Gamma prior, are quite flat over the interval (0, 1). Note, however, that the bimodality of the Beta & Gamma prior is preserved, although only subtly.
For the parameter β, there is again a difference between the Jeffreys and reference priors and the remaining priors. The marginal densities under the Jeffreys and reference priors are shifted toward zero, while the others remain similar to one another and quite flat.
To determine the most appropriate distribution for fitting this data set, the AIC, BIC and DIC criteria were computed again. The Weibull and Exponential-Logarithmic distributions were analysed; Gamma(0.01, 0.01) priors were used for each parameter of the former, and for EL(p, β) the Beta & Gamma, Uniform & Gamma, Jeffreys, MDIP & Gamma and reference priors were used.
From Table 4 it can be concluded that the Exponential-Logarithmic distribution provides a better fit to the data, since its AIC and BIC values are small compared with those of the Weibull. According to the DIC criterion, both distributions are appropriate for fitting the data. However, if one must choose between the two distributions using the DIC, the Exponential-Logarithmic is chosen, since it has the smaller value. Among the priors for the Exponential-Logarithmic distribution, the Beta & Gamma prior is the one that yields smaller values overall.

Table 4: Information-based model selection criteria (AIC, BIC and DIC) for the Lawless data

Model / prior    AIC         BIC        DIC
Weibull          231.9720    236.0553   140.830
Uniform/Gamma    141.8041    145.8874   139.167
Beta/Gamma       141.08031   142.9692   139.1063
MDIP/Gamma       141.1381    143.027    139.1575
Refp             140.2788    142.1677   139.8291

Figure 3: Marginal posterior densities of parameters p and β (Lawless data)

9 Conclusions

????????????? In this paper, noninformative priors are derived for the parameters α and λ of the generalized exponential distribution. A fully Bayesian analysis was developed and a simulation study was performed to determine which prior provides less information, mainly for small sample sizes.
We showed that the maximal data information prior leads to an improper posterior distribution for the parameters α and λ. We also showed that the reference prior procedure for the generalized exponential distribution yields a nonunique prior, owing to the choice of the parameter of interest.
The results obtained through Bayesian summaries and credible intervals are similar for all the priors studied here. The performance measures of the estimators and intervals tend to improve as the sample size increases. In addition, a comparison of frequentist coverage probabilities among the priors shows that they are very good, but the reference prior slightly dominates the others when n is small.
Thus, these results are of great interest in applications of this distribution. Tahmasbi and Rezaei (2008) observed that the EL(p, β) distribution can be used quite effectively in analysing several lifetime data sets, particularly in place of the two-parameter gamma or two-parameter Weibull distribution.

10 References

?????????????
Berger, J. O.; Bernardo, J. M. (1992). On the development of reference priors. In Bayesian Statistics 4 (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.). Oxford: Oxford University Press, 35–60 (with discussion).
Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society, v. 41, n. 2, p. 113-147.
Box, G. E. P.; Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley.
Chib, S.; Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, Vol. 49, No. 4.
Gelfand, A. E.; Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, n. 85, p. 398-409.
Jeffreys, H. (1967). Theory of Probability. 3rd rev. ed., London: Oxford University Press.
Sinha, B.; Zellner, A. (1990). A note on the prior distributions of Weibull parameters. SCIMA, 19, 5-13.
Tahmasbi, R.; Rezaei, S. (2008). A two-parameter lifetime distribution with decreasing failure rate. Computational Statistics and Data Analysis, Vol. 52, pp. 3889-3901.
Zellner, A. (1977). Maximal data information prior distributions. In New Methods in the applications of Bayesian Methods, A. Aykac and C. Brumat (eds.). North-Holland, Amsterdam.
Zellner, A. (1984). Maximal data information prior distributions. In Basic Issues in Econometrics, University of Chicago Press.
Zellner, A. (1990). Bayesian methods and entropy in economics and econometrics. In W. T. Grandy, Jr. and L. H. Schick (eds.), Maximum Entropy and Bayesian Methods, Dordrecht, Netherlands: Kluwer Academic Publishers, p. 17-31.
