Stat 498B Industrial Statistics
Fritz Scholz
May 22, 2008
1 The Weibull Distribution
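The 2-parameter Weibull distribution function is defined as
$$F_{\alpha,\beta}(x) = 1 - \exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right] \quad\text{for } x \ge 0 \quad\text{and}\quad F_{\alpha,\beta}(x) = 0 \text{ for } x < 0.$$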
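We also write $X \sim W(\alpha, \beta)$ when $X$ has this distribution function, i.e., $P(X \le x) = F_{\alpha,\beta}(x)$. The parameters $\alpha > 0$ and $\beta > 0$ are referred to as scale and shape parameter, respectively. The Weibull density has the following form
$$f_{\alpha,\beta}(x) = F'_{\alpha,\beta}(x) = \frac{d}{dx}F_{\alpha,\beta}(x) = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right].$$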
For $\beta = 1$ the Weibull distribution coincides with the exponential distribution with mean $\alpha$. In general, $\alpha$ represents the .632-quantile of the Weibull distribution regardless of the value of $\beta$ since $F_{\alpha,\beta}(\alpha) = 1 - \exp(-1) \approx .632$ for all $\beta > 0$. Figure 1 shows a representative collection of Weibull densities. Note that the spread of the Weibull distributions around $\alpha$ gets smaller as $\beta$ increases. The reason for this will become clearer later when we discuss the log-transform of Weibull random variables.
The $m$th moment of the Weibull distribution is
$$E(X^m) = \alpha^m\,\Gamma(1 + m/\beta)$$
and thus the mean and variance are given by
$$\mu = E(X) = \alpha\,\Gamma(1 + 1/\beta) \quad\text{and}\quad \sigma^2 = \alpha^2\left[\Gamma(1 + 2/\beta) - \{\Gamma(1 + 1/\beta)\}^2\right].$$
Its $p$-quantile, defined by $P(X \le x_p) = p$, is
$$x_p = \alpha\,(-\log(1 - p))^{1/\beta}.$$
For $p = 1 - \exp(-1) \approx .632$ (i.e., $-\log(1 - p) = 1$) we have $x_p = \alpha$ regardless of $\beta$, as pointed out previously. For that reason one also calls $\alpha$ the characteristic life of the Weibull distribution. The term life comes from the common use of the Weibull distribution in modeling lifetime data. More on this later.
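The closed-form mean, variance, and quantile expressions can be checked numerically. The following is a minimal sketch in base R; the function name `weibull_summary` is an illustrative choice and not part of the class workspace. Note that R's `qweibull` uses `shape` for β and `scale` for α.

```r
# Closed-form Weibull summaries (alpha = scale, beta = shape).
weibull_summary <- function(alpha, beta, p = 0.5) {
  mu     <- alpha * gamma(1 + 1 / beta)
  sigma2 <- alpha^2 * (gamma(1 + 2 / beta) - gamma(1 + 1 / beta)^2)
  xp     <- alpha * (-log(1 - p))^(1 / beta)
  list(mean = mu, var = sigma2, quantile = xp)
}

# The .632-quantile equals alpha whatever the shape.
s <- weibull_summary(alpha = 10000, beta = 2, p = 1 - exp(-1))

# Cross-check the quantile formula against base R.
q_check <- qweibull(1 - exp(-1), shape = 2, scale = 10000)

# For beta = 1 (exponential case) the mean is alpha and the variance alpha^2.
e <- weibull_summary(alpha = 5, beta = 1)
```

Here `s$quantile` and `q_check` should both agree with α = 10000 up to rounding error.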
[Plot of Weibull densities with shapes β1 = 0.5, β2 = 1, β3 = 1.5, β4 = 2, β5 = 3.6, β6 = 7 and scale α = 10000; the position of α is marked, with the annotations 36.8% and 63.2%]
Figure 1: A Collection of Weibull Densities with α = 10000 and Various Shapes
2 Minimum Closure and Weakest Link Property
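The Weibull distribution has the following minimum closure property: If $X_1, \ldots, X_n$ are independent with $X_i \sim W(\alpha_i, \beta)$, $i = 1, \ldots, n$, then
$$P(\min(X_1, \ldots, X_n) > t) = P(X_1 > t, \ldots, X_n > t) = \prod_{i=1}^{n} P(X_i > t)$$
$$= \prod_{i=1}^{n} \exp\left[-\left(\frac{t}{\alpha_i}\right)^{\beta}\right] = \exp\left[-t^{\beta}\sum_{i=1}^{n}\frac{1}{\alpha_i^{\beta}}\right] = \exp\left[-\left(\frac{t}{\alpha^{\star}}\right)^{\beta}\right] \quad\text{with}\quad \alpha^{\star} = \left(\sum_{i=1}^{n}\frac{1}{\alpha_i^{\beta}}\right)^{-1/\beta},$$
i.e., $\min(X_1, \ldots, X_n) \sim W(\alpha^{\star}, \beta)$. This is reminiscent of the closure property for the normal distribution under summation, i.e., if $X_1, \ldots, X_n$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$ then
$$\sum_{i=1}^{n} X_i \sim N\left(\sum_{i=1}^{n}\mu_i, \sum_{i=1}^{n}\sigma_i^2\right).$$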
This summation closure property plays an essential role in proving the central limit theorem: Sums
of independent random variables (not necessarily normally distributed) have an approximate normal
distribution, subject to some mild conditions concerning the distribution of such random variables.
There is a similar result from Extreme Value Theory that says: The minimum of independent,
identically distributed random variables (not necessarily Weibull distributed) has an approximate
Weibull distribution, subject to some mild conditions concerning the distribution of such random
variables. This is also referred to as the “weakest link” motivation for the Weibull distribution.
The Weibull distribution is appropriate when trying to characterize the random strength of materials
or the random lifetime of some system. This is related to the weakest link property as follows. A
piece of material can be viewed as a concatenation of many smaller material cells, each of which has
its random breaking strength $X_i$ when subjected to stress. Thus the strength of the concatenated total piece is the strength of its weakest link, namely $\min(X_1, \ldots, X_n)$, i.e., approximately Weibull. Similarly, a system can be viewed as a collection of many parts or subsystems, each of which has a random lifetime $X_i$. If the system is defined to be in a failed state whenever any one of its parts or subsystems fails, then the system lifetime is $\min(X_1, \ldots, X_n)$, i.e., approximately Weibull.
Figure 2 gives a sense of usage of the Weibull distribution and Figure 3 shows the “real thing.”
Googling “Weibull distribution” produced 185,000 hits while “normal distribution” had 2,420,000
hits.
Figure 2: Publications on the Weibull Distribution
Figure 3: Waloddi Weibull
The Weibull distribution is very popular among engineers. One reason for this is that the Weibull
cdf has a closed form which is not the case for the normal cdf Φ(x). However, in today’s computing
environment one could argue that point since typically the computation of even exp(x) requires
computing. That this can be accomplished on most calculators is also moot since many calculators
also give you Φ(x). Another reason for the popularity of the Weibull distribution among engineers may be that Weibull’s most famous paper, originally submitted to a statistics journal and rejected, was eventually published in an engineering journal: Waloddi Weibull (1951) “A statistical distribution function of wide applicability.” Journal of Applied Mechanics, 18, 293-297.
“. . . he tried to publish an article in a well-known British journal. At this time, the distribution function proposed by Gauss was dominating and was distinguishingly called the normal distribution. By some statisticians it was even believed to be the only possible one. The article was refused with the comment that it was interesting but of no practical importance. That was just the same article as the highly cited one published in 1951.” (Göran W. Weibull, 1981,
http://www.garfield.library.upenn.edu/classics1981/A1981LD32400001.pdf)
Sam Saunders (1975): ‘Professor Wallodi (sic) Weibull recounted to me that the now famous paper of
his “A Statistical Distribution of Wide Applicability”, in which was ﬁrst advocated the “Weibull”
distribution with its failure rate a power of time, was rejected by the Journal of the American
Statistical Association as being of no interrest. Thus one of the most inﬂuential papers in statistics
of that decade was published in the Journal of Applied Mechanics. See [35]. (Maybe that is the
reason it was so inﬂuential!)’
3 The Hazard Function
The hazard function for any nonnegative random variable with cdf $F(x)$ and density $f(x)$ is defined as $h(x) = f(x)/(1 - F(x))$. It is usually employed for distributions that model random lifetimes and it relates to the probability that a lifetime comes to an end within the next small time increment of length $d$ given that the lifetime has exceeded $x$ so far, namely
$$P(x < X \le x + d \mid X > x) = \frac{P(x < X \le x + d)}{P(X > x)} = \frac{F(x + d) - F(x)}{1 - F(x)} \approx \frac{d \times f(x)}{1 - F(x)} = d \times h(x).$$
In the case of the Weibull distribution we have
$$h(x) = \frac{f_{\alpha,\beta}(x)}{1 - F_{\alpha,\beta}(x)} = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta-1}.$$
Various other terms are used equivalently for the hazard function, such as hazard rate, failure rate (function), or force of mortality. In the case of the Weibull hazard rate function we observe that it is increasing in $x$ when $\beta > 1$, decreasing in $x$ when $\beta < 1$ and constant when $\beta = 1$ (exponential distribution with memoryless property).
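The three regimes of the Weibull hazard can be seen directly by evaluating it on a small grid. This is a minimal sketch in base R; `weibull_hazard` is a hypothetical helper name, and the parameter values are arbitrary choices for illustration.

```r
# Weibull hazard h(x) = (beta/alpha) * (x/alpha)^(beta - 1), for x > 0.
weibull_hazard <- function(x, alpha, beta) {
  (beta / alpha) * (x / alpha)^(beta - 1)
}

x <- c(0.5, 1, 2, 4)
h_increasing <- weibull_hazard(x, alpha = 2, beta = 3)    # beta > 1
h_decreasing <- weibull_hazard(x, alpha = 2, beta = 0.5)  # beta < 1
h_constant   <- weibull_hazard(x, alpha = 2, beta = 1)    # beta = 1

# Same quantity via the definition f(x) / (1 - F(x)), using base R's
# dweibull and pweibull (shape = beta, scale = alpha).
h_by_definition <- dweibull(x, shape = 3, scale = 2) /
  pweibull(x, shape = 3, scale = 2, lower.tail = FALSE)
```

With β = 1 the hazard reduces to the constant 1/α, which is the exponential case.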
When β > 1 the part or system, for which the lifetime is modeled by a Weibull distribution, is
subject to aging in the sense that an older system has a higher chance of failing during the next
small time increment d than a younger system.
For β < 1 (less common) the system has a better chance of surviving the next small time increment
d as it gets older, possibly due to hardening, maturing, or curing. Often one refers to this situation
as one of infant mortality, i.e., after initial early failures the survival gets better with age. However,
one has to keep in mind that we may be modeling parts or systems that consist of a mixture of
defective or weak parts and of parts that practically can live forever. A Weibull distribution with
β < 1 may not do full justice to such a mixture distribution.
For β = 1 there is no aging, i.e., the system is as good as new given that it has survived beyond x,
since for β = 1 we have
$$P(X > x + h \mid X > x) = \frac{P(X > x + h)}{P(X > x)} = \frac{\exp(-(x + h)/\alpha)}{\exp(-x/\alpha)} = \exp(-h/\alpha) = P(X > h),$$
i.e., it is again exponential with same mean α. One also refers to this as a random failure model in
the sense that failures are due to external shocks that follow a Poisson process with rate λ = 1/α.
The random times between shocks are exponentially distributed with mean α. Given that there are
k such shock events in an interval [0, T] one can view the k occurrence times as being uniformly
distributed over the interval [0, T], hence the allusion to random failures.
4 Location-Scale Property of log(X)
Another useful property, of which we will make strong use, is the following location-scale property of the log-transformed Weibull distribution. By that we mean that: $X \sim W(\alpha, \beta) \Longrightarrow \log(X) = Y$ has a location-scale distribution, namely its cumulative distribution function (cdf) is
$$P(Y \le y) = P(\log(X) \le y) = P(X \le \exp(y)) = 1 - \exp\left[-\left(\frac{\exp(y)}{\alpha}\right)^{\beta}\right]$$
$$= 1 - \exp\left[-\exp\{(y - \log(\alpha)) \times \beta\}\right] = 1 - \exp\left[-\exp\left(\frac{y - \log(\alpha)}{1/\beta}\right)\right] = 1 - \exp\left[-\exp\left(\frac{y - u}{b}\right)\right]$$
with location parameter $u = \log(\alpha)$ and scale parameter $b = 1/\beta$. The reason for referring to such parameters this way is the following. If $Z \sim G(z)$ then $Y = \mu + \sigma Z \sim G((y - \mu)/\sigma)$ since
$$H(y) = P(Y \le y) = P(\mu + \sigma Z \le y) = P(Z \le (y - \mu)/\sigma) = G((y - \mu)/\sigma).$$
The form $Y = \mu + \sigma Z$ should make clear the notion of location and scale parameter, since $Z$ has been scaled by the factor $\sigma$ and is then shifted by $\mu$. Two prominent location-scale families are
1. $Y = \mu + \sigma Z \sim N(\mu, \sigma^2)$, where $Z \sim N(0, 1)$ is standard normal with cdf $G(z) = \Phi(z)$ and thus $Y$ has cdf $H(y) = \Phi((y - \mu)/\sigma)$,
2. $Y = u + bZ$ where $Z$ has the standard extreme value distribution with cdf $G(z) = 1 - \exp(-\exp(z))$ for $z \in R$, as in our log-transformed Weibull example above.
In any such location-scale model there is a simple relationship between the $p$-quantiles of $Y$ and $Z$, namely $y_p = \mu + \sigma z_p$ in the normal model and $y_p = u + b w_p$ in the extreme value model (using the location and scale parameters $u$ and $b$ resulting from log-transformed Weibull data). We just illustrate this in the extreme value location-scale model.
$$p = P(Z \le w_p) = P(u + bZ \le u + b w_p) = P(Y \le u + b w_p) \Longrightarrow y_p = u + b w_p$$
with $w_p = \log(-\log(1 - p))$. Thus $y_p$ is a linear function of $w_p = \log(-\log(1 - p))$, the $p$-quantile of $G$. While $w_p$ is known and easily computable from $p$, the same cannot be said about $y_p$, since it involves the typically unknown parameters $u$ and $b$. However, for appropriate $p_i = (i - .5)/n$ one can view the $i$th ordered sample value $Y_{(i)}$ ($Y_{(1)} \le \ldots \le Y_{(n)}$) as a good approximation for $y_{p_i}$. Thus the plot of $Y_{(i)}$ against $w_{p_i}$ should look approximately linear. This is the basis for Weibull probability plotting (and the case of plotting $Y_{(i)}$ against $z_{p_i}$ for normal probability plotting), a very appealing graphical procedure which gives a visual impression of how well the data fit the assumed model (normal or Weibull) and which also allows for a crude estimation of the unknown location and scale parameters, since they relate to the slope and intercept of the line that may be fitted to the perceived linear point pattern. For more in relation to Weibull probability plotting we refer to Scholz (2008).
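The plotting coordinates and the crude line-based estimates can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the procedure of Scholz (2008): the helper name `weibull_plot_fit` is hypothetical, and an ordinary least squares line via `lm` is just one simple way to fit the point pattern.

```r
# Weibull probability plot coordinates: log of the ordered sample against
# w_{p_i} = log(-log(1 - p_i)) with p_i = (i - 0.5)/n.
weibull_plot_fit <- function(x) {
  n <- length(x)
  y <- log(sort(x))                    # Y_(1) <= ... <= Y_(n)
  p <- (seq_len(n) - 0.5) / n
  w <- log(-log(1 - p))                # standard extreme value quantiles
  fit <- lm(y ~ w)                     # line y_p = u + b * w_p
  u <- unname(coef(fit)[1])            # intercept estimates u = log(alpha)
  b <- unname(coef(fit)[2])            # slope estimates b = 1/beta
  list(w = w, y = y, alpha = exp(u), beta = 1 / b)
}

# Idealized "sample": exact Weibull quantiles, so the points lie on a line
# and the fit recovers alpha = 10000 and beta = 2 up to rounding error.
n <- 20
x_ideal <- qweibull((seq_len(n) - 0.5) / n, shape = 2, scale = 10000)
fit <- weibull_plot_fit(x_ideal)
# plot(fit$w, fit$y) would display the point pattern.
```

With real data the points scatter around the line, and the intercept and slope give only rough estimates of $u$ and $b$.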
5 Maximum Likelihood Estimation
There are many ways to estimate the parameters $\theta = (\alpha, \beta)$ based on a random sample $X_1, \ldots, X_n \sim W(\alpha, \beta)$. Maximum likelihood estimation (MLE) is generally the most versatile and popular
method. Although MLE in the Weibull case requires numerical methods and a computer, that is no
longer an issue in today’s computing environment. Previously, estimates that could be computed
by hand had been investigated, but they are usually less eﬃcient than mle’s (estimates derived by
MLE). By eﬃcient estimates we loosely refer to estimates that have the smallest sampling variance.
MLE tends to be eﬃcient, at least in large samples. Furthermore, under regularity conditions MLE
produces estimates that have an approximate normal distribution in large samples.
When $X_1, \ldots, X_n \sim F_\theta(x)$ with density $f_\theta(x)$ then the maximum likelihood estimate of $\theta$ is that value $\theta = \hat\theta = \hat\theta(x_1, \ldots, x_n)$ which maximizes the likelihood
$$L(x_1, \ldots, x_n, \theta) = \prod_{i=1}^{n} f_\theta(x_i)$$
over $\theta$, i.e., which gives highest local probability to the observed sample $(X_1, \ldots, X_n) = (x_1, \ldots, x_n)$
$$L(x_1, \ldots, x_n, \hat\theta) = \sup_\theta\left\{\prod_{i=1}^{n} f_\theta(x_i)\right\}.$$
Often such maximizing values $\hat\theta$ are unique and one can obtain them by solving the equations
$$\frac{\partial}{\partial\theta_j}\prod_{i=1}^{n} f_\theta(x_i) = 0, \quad j = 1, \ldots, k,$$
where $k$ is the number of parameters involved in $\theta = (\theta_1, \ldots, \theta_k)$. These equations reflect the fact that a smooth function has a horizontal tangent plane at its maximum (minimum or saddle point). Thus solving such equations is necessary but not sufficient, since it still needs to be shown that it is the location of a maximum.
Since taking derivatives of a product is tedious (product rule) one usually resorts to maximizing the log of the likelihood, i.e.,
$$\ell(x_1, \ldots, x_n, \theta) = \log(L(x_1, \ldots, x_n, \theta)) = \sum_{i=1}^{n}\log(f_\theta(x_i))$$
since the value of $\theta$ that maximizes $L(x_1, \ldots, x_n, \theta)$ is the same as the value that maximizes $\ell(x_1, \ldots, x_n, \theta)$, i.e.,
$$\ell(x_1, \ldots, x_n, \hat\theta) = \sup_\theta\left\{\sum_{i=1}^{n}\log(f_\theta(x_i))\right\}.$$
It is a lot simpler to deal with the likelihood equations
$$\frac{\partial}{\partial\theta_j}\ell(x_1, \ldots, x_n, \theta) = \frac{\partial}{\partial\theta_j}\sum_{i=1}^{n}\log(f_\theta(x_i)) = \sum_{i=1}^{n}\frac{\partial}{\partial\theta_j}\log(f_\theta(x_i)) = 0, \quad j = 1, \ldots, k$$
when solving for $\theta = \hat\theta = \hat\theta(x_1, \ldots, x_n)$.
In the case of a normal random sample we have $\theta = (\mu, \sigma)$ with $k = 2$ and the unique solution of the likelihood equations results in the explicit expressions
$$\hat\mu = \bar x = \sum_{i=1}^{n} x_i/n \quad\text{and}\quad \hat\sigma = \sqrt{\sum_{i=1}^{n}(x_i - \bar x)^2/n} \quad\text{and thus}\quad \hat\theta = (\hat\mu, \hat\sigma).$$
In the case of a Weibull sample we take the further simplifying step of dealing with the log-transformed sample $(y_1, \ldots, y_n) = (\log(x_1), \ldots, \log(x_n))$. Recall that $Y_i = \log(X_i)$ has cdf $F(y) = 1 - \exp(-\exp((y - u)/b)) = G((y - u)/b)$ with $G(z) = 1 - \exp(-\exp(z))$ and $g(z) = G'(z) = \exp(z - \exp(z))$. Thus
$$f(y) = F'(y) = \frac{d}{dy}F(y) = \frac{1}{b}\,g((y - u)/b)$$
with
$$\log(f(y)) = -\log(b) + \frac{y - u}{b} - \exp\left(\frac{y - u}{b}\right).$$
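As partial derivatives of $\log(f(y))$ with respect to $u$ and $b$ we get
$$\frac{\partial}{\partial u}\log(f(y)) = -\frac{1}{b} + \frac{1}{b}\exp\left(\frac{y - u}{b}\right)$$
$$\frac{\partial}{\partial b}\log(f(y)) = -\frac{1}{b} - \frac{1}{b}\,\frac{y - u}{b} + \frac{1}{b}\,\frac{y - u}{b}\exp\left(\frac{y - u}{b}\right)$$
and thus as likelihood equations
$$0 = -\frac{n}{b} + \frac{1}{b}\sum_{i=1}^{n}\exp\left(\frac{y_i - u}{b}\right) \quad\text{or}\quad \sum_{i=1}^{n}\exp\left(\frac{y_i - u}{b}\right) = n \quad\text{or}\quad \exp(u) = \left[\frac{1}{n}\sum_{i=1}^{n}\exp\left(\frac{y_i}{b}\right)\right]^{b},$$
$$0 = -\frac{n}{b} - \frac{1}{b}\sum_{i=1}^{n}\frac{y_i - u}{b} + \frac{1}{b}\sum_{i=1}^{n}\frac{y_i - u}{b}\exp\left(\frac{y_i - u}{b}\right),$$
i.e., we have a solution $u = \hat u$ once we have a solution $b = \hat b$. Substituting this expression for $\exp(u)$ into the second likelihood equation we get (after some cancelation and manipulation)
$$0 = \frac{\sum_{i=1}^{n} y_i\exp(y_i/b)}{\sum_{i=1}^{n}\exp(y_i/b)} - b - \frac{1}{n}\sum_{i=1}^{n} y_i.$$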
Analyzing the solvability of this equation is more convenient in terms of $\beta = 1/b$ and we thus write
$$0 = \sum_{i=1}^{n} y_i w_i(\beta) - \frac{1}{\beta} - \bar y \quad\text{where}\quad w_i(\beta) = \frac{\exp(y_i\beta)}{\sum_{j=1}^{n}\exp(y_j\beta)} \quad\text{with}\quad \sum_{i=1}^{n} w_i(\beta) = 1.$$
Note that the derivatives of these weights with respect to $\beta$ take the following form
$$w_i'(\beta) = \frac{d}{d\beta}w_i(\beta) = y_i w_i(\beta) - w_i(\beta)\sum_{j=1}^{n} y_j w_j(\beta).$$
Hence
$$\frac{d}{d\beta}\left\{\sum_{i=1}^{n} y_i w_i(\beta) - \frac{1}{\beta} - \bar y\right\} = \sum_{i=1}^{n} y_i w_i'(\beta) + \frac{1}{\beta^2} = \sum_{i=1}^{n} y_i^2 w_i(\beta) - \left(\sum_{j=1}^{n} y_j w_j(\beta)\right)^2 + \frac{1}{\beta^2} > 0$$
since
$$\mathrm{var}_w(y) = \sum_{i=1}^{n} y_i^2 w_i(\beta) - \left(\sum_{j=1}^{n} y_j w_j(\beta)\right)^2 = E_w(y^2) - [E_w(y)]^2 \ge 0$$
can be interpreted as a variance of the $n$ values of $y = (y_1, \ldots, y_n)$ with weights or probabilities given by $w = (w_1(\beta), \ldots, w_n(\beta))$. Thus the reduced second likelihood equation
$$\sum_{i=1}^{n} y_i w_i(\beta) - 1/\beta - \bar y = 0$$
has a unique solution (if it has a solution at all) since the equation’s left side is strictly increasing. Note that $w_i(\beta) \to 1/n$ as $\beta \to 0$. Thus
$$\sum_{i=1}^{n} y_i w_i(\beta) - 1/\beta - \bar y \approx -1/\beta \to -\infty \quad\text{as } \beta \to 0.$$
Furthermore, with $M = \max(y_1, \ldots, y_n)$ and $\beta \to \infty$ we have
$$w_i(\beta) = \exp(\beta(y_i - M))\Big/\sum_{j=1}^{n}\exp(\beta(y_j - M)) \to 0 \text{ when } y_i < M \quad\text{and}\quad w_i(\beta) \to 1/r \text{ when } y_i = M,$$
where $r \ge 1$ is the number of $y_i$ coinciding with $M$. Thus
$$\sum_{i=1}^{n} y_i w_i(\beta) - 1/\beta - \bar y \approx M - 1/\beta - \bar y \to M - \bar y > 0 \quad\text{as } \beta \to \infty$$
where $M - \bar y > 0$ assumes that not all $y_i$ coincide (a degenerate case with probability 0). Since the left side is continuous and strictly increasing from $-\infty$ to a positive limit, a solution exists and is unique. That this unique solution corresponds to a maximum and thus a unique global maximum takes some extra effort and we refer to Scholz (1996) for an even more general treatment that covers Weibull analysis with censored data and covariates.
However, a somewhat loose argument can be given as follows. If we consider the likelihood of the log-transformed Weibull data we have
$$L(y_1, \ldots, y_n, u, b) = \frac{1}{b^n}\prod_{i=1}^{n} g\left(\frac{y_i - u}{b}\right).$$
Contemplate this likelihood for fixed $y = (y_1, \ldots, y_n)$ and for parameters $u$ with $|u| \to \infty$ (the location moves away from all observed data values $y_1, \ldots, y_n$) and $b$ with $b \to 0$ (the spread becomes very concentrated on some point and cannot simultaneously do so at all values $y_1, \ldots, y_n$, unless they are all the same, excluded as a zero probability degeneracy) and $b \to \infty$ (in which case all probability is diffused thinly over the whole half plane $\{(u, b) : u \in R, b > 0\}$), it is then easily seen that this likelihood approaches zero in all cases. Since this likelihood is positive everywhere (but approaching zero near the fringes of the parameter space, the above half plane) it follows that it must have a maximum somewhere with zero partial derivatives. We showed there is only one such point (uniqueness of the solution to the likelihood equations) and thus there can only be one unique (global) maximum, which then is also the unique maximum likelihood estimate $\hat\theta = (\hat u, \hat b)$.
In solving $0 = \sum y_i\exp(y_i/b)/\sum\exp(y_i/b) - b - \bar y$ it is numerically advantageous to solve the equivalent equation $0 = \sum y_i\exp((y_i - M)/b)/\sum\exp((y_i - M)/b) - b - \bar y$ where $M = \max(y_1, \ldots, y_n)$. This avoids overflow or accuracy loss in the exponentials when the $y_i$ tend to be large.
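The reduced equation in $b$ can be solved with a one-dimensional root finder. The following is a minimal sketch using base R's `uniroot`, assuming a complete sample in which not all values coincide. The function name `weibull_mle_root` and the bracketing heuristic are illustrative choices; this is not the survreg-based function used later in these notes.

```r
# MLEs of (alpha, beta) for a complete Weibull sample by solving the reduced
# likelihood equation in b = 1/beta with uniroot (base R only).
weibull_mle_root <- function(x) {
  y    <- log(x)
  M    <- max(y)
  ybar <- mean(y)
  # Left side of the reduced equation in its overflow-safe form.
  g <- function(b) {
    e <- exp((y - M) / b)
    sum(y * e) / sum(e) - b - ybar
  }
  # Bracket the root: g(b) <= M - b - ybar < 0 at b = 2 * (M - ybar),
  # and g(b) -> M - ybar > 0 as b -> 0.
  upper <- 2 * (M - ybar)
  lower <- upper / 1000
  while (g(lower) <= 0) lower <- lower / 10
  b.hat <- uniroot(g, lower = lower, upper = upper, tol = 1e-12)$root
  # u.hat = b * log(mean(exp(y / b))), again shifted by M for stability.
  u.hat <- M + b.hat * log(mean(exp((y - M) / b.hat)))
  c(alpha.hat = exp(u.hat), beta.hat = 1 / b.hat)
}

x <- c(7, 12.1, 22.8, 23.1, 25.7, 26.7, 29.0, 29.9, 39.5, 41.9)
est <- weibull_mle_root(x)
```

For this sample the estimates should be close to the values quoted for the default sample in Section 6 (alpha.hat about 28.914, beta.hat about 2.7998).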
The above derivations go through with very little change when instead of observing a full sample $Y_1, \ldots, Y_n$ we only observe the $r \ge 2$ smallest sample values $Y_{(1)} < \ldots < Y_{(r)}$. Such data is referred to as type II censored data. This situation typically arises in a laboratory setting when several units are put on test (subjected to failure exposure) simultaneously and the test is terminated (or evaluated) when the first $r$ units have failed. In that case we know the first $r$ failure times $X_{(1)} < \ldots < X_{(r)}$ and thus $Y_{(i)} = \log(X_{(i)})$, $i = 1, \ldots, r$, and we know that the lifetimes of the remaining units exceed $X_{(r)}$ or that $Y_{(i)} > Y_{(r)}$ for $i > r$. The advantage of such data collection is that we do not have to wait until all $n$ units have failed. Furthermore, if we put a lot of units on test (high $n$) we increase our chance of seeing our first $r$ failures before a fixed time $y$. This is a simple consequence of the following binomial probability statement:
$$P(Y_{(r)} \le y) = P(\text{at least } r \text{ failures} \le y \text{ in } n \text{ trials}) = \sum_{i=r}^{n}\binom{n}{i}P(Y \le y)^{i}(1 - P(Y \le y))^{n-i}$$
which is strictly increasing in $n$ for any fixed $y$ and $r \ge 1$ (exercise).
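The monotonicity in $n$ can be checked numerically for a particular case. This is a small sketch in base R, not a proof of the exercise; the helper name `prob_r_failures_by` and the values $p = 0.2$ and $r = 5$ are arbitrary illustrative choices.

```r
# P(Y_(r) <= y) = P(at least r of n units fail by y), with p = P(Y <= y).
prob_r_failures_by <- function(r, n, p) {
  pbinom(r - 1, size = n, prob = p, lower.tail = FALSE)
}

p <- 0.2                      # hypothetical per-unit failure probability by y
n_values <- 5:30
probs <- prob_r_failures_by(r = 5, n = n_values, p = p)

# The same probability for n = 10 written as the explicit binomial sum.
explicit_n10 <- sum(dbinom(5:10, size = 10, prob = p))
```

The vector `probs` increases with each added unit on test, in line with the statement above.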
The joint density of $Y_{(1)}, \ldots, Y_{(n)}$ at $(y_1, \ldots, y_n)$ with $y_1 < \ldots < y_n$ is
$$f(y_1, \ldots, y_n) = n!\prod_{i=1}^{n}\frac{1}{b}\,g\left(\frac{y_i - u}{b}\right) = n!\prod_{i=1}^{n} f(y_i)$$
where the multiplier $n!$ just accounts for the fact that all $n!$ permutations of $y_1, \ldots, y_n$ could have been the order in which these values were observed and all of these orders have the same density (probability). Integrating out $y_n > y_{n-1} > \ldots > y_{r+1}$ ($> y_r$) and using $\bar F(y) = 1 - F(y)$ we get after $n - r$ successive integration steps the joint density of the first $r$ failure times $y_1 < \ldots < y_r$ as
$$f(y_1, \ldots, y_{n-1}) = n!\prod_{i=1}^{n-1} f(y_i) \times \int_{y_{n-1}}^{\infty} f(y_n)\,dy_n = n!\prod_{i=1}^{n-1} f(y_i)\,\bar F(y_{n-1})$$
$$f(y_1, \ldots, y_{n-2}) = n!\prod_{i=1}^{n-2} f(y_i) \times \int_{y_{n-2}}^{\infty} f(y_{n-1})\bar F(y_{n-1})\,dy_{n-1} = n!\prod_{i=1}^{n-2} f(y_i) \times \frac{1}{2}\bar F^{2}(y_{n-2})$$
$$f(y_1, \ldots, y_{n-3}) = n!\prod_{i=1}^{n-3} f(y_i) \times \int_{y_{n-3}}^{\infty} f(y_{n-2})\bar F^{2}(y_{n-2})/2\,dy_{n-2} = n!\prod_{i=1}^{n-3} f(y_i) \times \frac{1}{3!}\bar F^{3}(y_{n-3})$$
$$\ldots$$
$$f(y_1, \ldots, y_r) = n!\prod_{i=1}^{r} f(y_i) \times \frac{1}{(n - r)!}\bar F^{n-r}(y_r) = \left[\frac{n!}{(n - r)!}\prod_{i=1}^{r} f(y_i)\right] \times [1 - F(y_r)]^{n-r}$$
$$= r!\prod_{i=1}^{r}\frac{1}{b}\,g\left(\frac{y_i - u}{b}\right) \times \binom{n}{r}\left[1 - G\left(\frac{y_r - u}{b}\right)\right]^{n-r}$$
with log-likelihood
$$\ell(y_1, \ldots, y_r, u, b) = \log\left[\frac{n!}{(n - r)!}\right] - r\log(b) + \sum_{i=1}^{r}\frac{y_i - u}{b} - {\sum_{i=1}^{r}}^{\star}\exp\left(\frac{y_i - u}{b}\right)$$
where we use the notation
$${\sum_{i=1}^{r}}^{\star} x_i = \sum_{i=1}^{r} x_i + (n - r)x_r.$$
The likelihood equations are
$$0 = \frac{\partial}{\partial u}\ell(y_1, \ldots, y_r, u, b) = -\frac{r}{b} + \frac{1}{b}{\sum_{i=1}^{r}}^{\star}\exp\left(\frac{y_i - u}{b}\right) \quad\text{or}\quad \exp(u) = \left[\frac{1}{r}{\sum_{i=1}^{r}}^{\star}\exp\left(\frac{y_i}{b}\right)\right]^{b}$$
$$0 = \frac{\partial}{\partial b}\ell(y_1, \ldots, y_r, u, b) = -\frac{r}{b} - \frac{1}{b}\sum_{i=1}^{r}\frac{y_i - u}{b} + \frac{1}{b}{\sum_{i=1}^{r}}^{\star}\frac{y_i - u}{b}\exp\left(\frac{y_i - u}{b}\right)$$
where again the transformed first equation gives us a solution $\hat u$ once we have a solution $\hat b$ for $b$. Using this in the second equation it transforms to a single equation in $b$ alone, namely
$${\sum_{i=1}^{r}}^{\star} y_i\exp(y_i/b)\Big/{\sum_{i=1}^{r}}^{\star}\exp(y_i/b) - b - \frac{1}{r}\sum_{i=1}^{r} y_i = 0.$$
Again it is advisable to use the equivalent but computationally more stable form
$${\sum_{i=1}^{r}}^{\star} y_i\exp((y_i - y_r)/b)\Big/{\sum_{i=1}^{r}}^{\star}\exp((y_i - y_r)/b) - b - \frac{1}{r}\sum_{i=1}^{r} y_i = 0.$$
As in the complete sample case one sees that this equation has a unique solution $\hat b$ and that $(\hat u, \hat b)$ gives the location of the (unique) global maximum of the likelihood function, i.e., $(\hat u, \hat b)$ are the mle’s.
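The type II censored case can be handled by the same root-finding idea, with the modified sum that counts the $r$th value $(n - r)$ additional times. The following is a minimal sketch in base R under the assumption that the $r$ observed values are not all equal. The name `weibull_mle_censored` and the weight-vector implementation are illustrative choices, not the course's own code.

```r
# MLEs of (alpha, beta) from the r smallest values of a type II censored
# Weibull sample of size n, via the reduced likelihood equation in b.
weibull_mle_censored <- function(x, n = length(x)) {
  y <- log(sort(x))
  r <- length(y)
  stopifnot(r >= 2, r <= n)
  # Weights implementing the modified sum: y_r counts (n - r) extra times.
  wts  <- c(rep(1, r - 1), n - r + 1)
  yr   <- y[r]
  ybar <- mean(y)                       # (1/r) * sum of the r observed y_i
  g <- function(b) {
    e <- wts * exp((y - yr) / b)
    sum(y * e) / sum(e) - b - ybar
  }
  upper <- 2 * (yr - ybar)              # g(upper) < 0
  lower <- upper / 1000
  while (g(lower) <= 0) lower <- lower / 10
  b.hat <- uniroot(g, lower = lower, upper = upper, tol = 1e-12)$root
  # exp(u) = [ (1/r) * modified sum of exp(y_i / b) ]^b, shifted by y_r.
  u.hat <- yr + b.hat * log(sum(wts * exp((y - yr) / b.hat)) / r)
  c(alpha.hat = exp(u.hat), beta.hat = 1 / b.hat)
}

est_cens <- weibull_mle_censored(c(7, 12.1, 22.8, 23.1, 25.7), n = 10)
```

For these five smallest values out of $n = 10$ the result should be close to the type II censored example quoted in Section 6 (alpha.hat about 30.726, beta.hat about 2.4326).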
6 Computation of Maximum Likelihood Estimates in R
The computation of the mle’s of the Weibull parameters α and β is facilitated by the function
survreg which is part of the R package survival. Here survreg is used in its most basic form in
the context of Weibull data (full sample or type II censored Weibull data). survreg does a whole
lot more than compute the mle’s but we will not deal with these aspects here, at least for now.
The following is an R function, called Weibull.mle, that uses survreg to compute these estimates.
Note that it tests for the existence of survreg before calling it. This function is part of the R work
space that is posted on the class web site.
Weibull.mle <- function (x=NULL,n=NULL){
  # This function computes the maximum likelihood estimates of alpha and beta
  # for complete or type II censored samples assumed to come from a 2-parameter
  # Weibull distribution. Here x is the sample, either the full sample or the first
  # r observations of a type II censored sample. In the latter case one must specify
  # the full sample size n, otherwise x is treated as a full sample.
  # If x is not given then a default full sample of size n=10, namely
  # c(7,12.1,22.8,23.1,25.7,26.7,29.0,29.9,39.5,41.9) is analyzed and the returned
  # results should be
  # $mles
  # alpha.hat beta.hat
  # 28.914017 2.799793
  #
  # In the type II censored usage
  # Weibull.mle(c(7,12.1,22.8,23.1,25.7),10)
  # $mles
  # alpha.hat beta.hat
  # 30.725992 2.432647
  if(is.null(x))x <- c(7,12.1,22.8,23.1,25.7,26.7,29.0,29.9,39.5,41.9)
  r <- length(x)
  if(is.null(n)){n <- r}else{if(r>n||r<2){
    return("x must have length r with: 2 <= r <= n")}}
  xs <- sort(x)
  if(!exists("survreg"))library(survival)
  #tests whether survival package is loaded, if not, then it loads survival
  if(r<n){
    # censored case: the n-r unobserved lifetimes are censored at xs[r]
    statusx <- c(rep(1,r),rep(0,n-r))
    dat.weibull <- data.frame(c(xs,rep(xs[r],n-r)),statusx)
  }else{statusx <- rep(1,n)
    dat.weibull <- data.frame(xs,statusx)}
  names(dat.weibull) <- c("time","status")
  out.weibull <- survreg(Surv(time,status)~1,dist="weibull",data=dat.weibull)
  alpha.hat <- exp(out.weibull$coef)
  beta.hat <- 1/out.weibull$scale
  parms <- c(alpha.hat,beta.hat)
  names(parms) <- c("alpha.hat","beta.hat")
  list(mles=parms)}
Note that survreg analyzes objects of class Surv. Here such an object is created by the function
Surv and it basically adjoins the failure times with a status vector of same length. The status
is 1 when a time corresponds to an actual failure time. It is 0 when the corresponding time is
a censoring time, i.e., we only know that the unobserved actual failure time exceeds the reported
censoring time. In the case of type II censored data these censoring times equal the $r$th smallest failure time $X_{(r)}$, i.e., the largest failure time actually observed.
To get a sense of the calculation speed of this function we ran Weibull.mle 1000 times, which tells us that the time to compute the mle’s in a sample of size n = 10 is roughly 5.91/1000 = .00591 seconds. This fact plays a significant role later on in the various inference procedures which we will discuss.
system.time(for(i in 1:1000){Weibull.mle(rweibull(10,1))})
user system elapsed
5.79 0.00 5.91
For n = 100, 500, 1000 the elapsed times came to 8.07, 15.91 and 25.87, respectively. The relationship
of computing time to n appears to be quite linear, but with slow growth, as Figure 4 shows.
7 Location and Scale Equivariance of Maximum Likelihood Estimates
The maximum likelihood estimates $\hat u$ and $\hat b$ of the location and scale parameters $u$ and $b$ have the following equivariance properties which will play a strong role in the later pivot construction and resulting confidence intervals.
Based on data $z = (z_1, \ldots, z_n)$ we denote the estimates of $u$ and $b$ more explicitly by $\hat u(z_1, \ldots, z_n) = \hat u(z)$ and $\hat b(z_1, \ldots, z_n) = \hat b(z)$. If we transform $z$ to $r = (r_1, \ldots, r_n)$ with $r_i = A + Bz_i$, where $A \in R$ and $B > 0$ are arbitrary constants, then
$$\hat u(r_1, \ldots, r_n) = A + B\hat u(z_1, \ldots, z_n) \quad\text{or}\quad \hat u(r) = \hat u(A + Bz) = A + B\hat u(z)$$
Figure 4: Weibull Parameter MLE Computation Time in Relation to Sample Size n (x-axis: sample size n, 0 to 1000; y-axis: time to compute Weibull mle's (sec), 0.000 to 0.030; fitted line: intercept = 0.005886, slope = 2.001e−05)
and
$$\hat b(r_1, \ldots, r_n) = B\,\hat b(z_1, \ldots, z_n) \quad\text{or}\quad \hat b(r) = \hat b(A + Bz) = B\,\hat b(z).$$
These properties are naturally desirable for any location and scale estimates and for mle’s they are indeed true.
Proof: Observe the following defining properties of the mle’s in terms of $z = (z_1, \ldots, z_n)$ and $r = (r_1, \ldots, r_n)$
$$\sup_{u,b}\left\{\frac{1}{b^n}\prod_{i=1}^{n} g((z_i - u)/b)\right\} = \frac{1}{\hat b^n(z)}\prod_{i=1}^{n} g((z_i - \hat u(z))/\hat b(z))$$
$$\sup_{u,b}\left\{\frac{1}{b^n}\prod_{i=1}^{n} g((r_i - u)/b)\right\} = \frac{1}{\hat b^n(r)}\prod_{i=1}^{n} g((r_i - \hat u(r))/\hat b(r)) = \frac{1}{B^n}\,\frac{1}{(\hat b(r)/B)^n}\prod_{i=1}^{n} g((z_i - (\hat u(r) - A)/B)/(\hat b(r)/B))$$
but also
$$\sup_{u,b}\left\{\frac{1}{b^n}\prod_{i=1}^{n} g((r_i - u)/b)\right\} = \sup_{u,b}\left\{\frac{1}{b^n}\prod_{i=1}^{n} g((A + Bz_i - u)/b)\right\} = \sup_{u,b}\left\{\frac{1}{B^n}\,\frac{1}{(b/B)^n}\prod_{i=1}^{n} g((z_i - (u - A)/B)/(b/B))\right\}$$
and, substituting $\tilde u = (u - A)/B$ and $\tilde b = b/B$,
$$= \sup_{\tilde u,\tilde b}\left\{\frac{1}{B^n}\,\frac{1}{\tilde b^n}\prod_{i=1}^{n} g((z_i - \tilde u)/\tilde b)\right\} = \frac{1}{B^n}\,\frac{1}{\hat b^n(z)}\prod_{i=1}^{n} g((z_i - \hat u(z))/\hat b(z)).$$
Thus by the uniqueness of the mle’s we have
$$\hat u(z) = (\hat u(r) - A)/B \quad\text{and}\quad \hat b(z) = \hat b(r)/B$$
or
$$\hat u(r) = \hat u(A + Bz) = A + B\hat u(z) \quad\text{and}\quad \hat b(r) = \hat b(A + Bz) = B\,\hat b(z) \quad\text{q.e.d.}$$
The same equivariance properties hold for the mle’s in the context of type II censored samples, as is easily verified.
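The equivariance can also be observed numerically for complete samples. The sketch below, in base R, computes $(\hat u, \hat b)$ by solving the reduced likelihood equation of Section 5 with `uniroot` and then compares the estimates for $z$ and for $A + Bz$. The helper name `ev_mle` and the constants $A = 3$, $B = 2.5$ are arbitrary illustrative choices.

```r
# Extreme value location-scale MLEs (u.hat, b.hat) from log-scale data z,
# obtained from the reduced likelihood equation in b.
ev_mle <- function(z) {
  M <- max(z)
  zbar <- mean(z)
  g <- function(b) {
    e <- exp((z - M) / b)
    sum(z * e) / sum(e) - b - zbar
  }
  upper <- 2 * (M - zbar)               # g(upper) < 0
  lower <- upper / 1000
  while (g(lower) <= 0) lower <- lower / 10
  b.hat <- uniroot(g, lower = lower, upper = upper, tol = 1e-12)$root
  u.hat <- M + b.hat * log(mean(exp((z - M) / b.hat)))
  c(u.hat = u.hat, b.hat = b.hat)
}

z <- log(c(7, 12.1, 22.8, 23.1, 25.7, 26.7, 29.0, 29.9, 39.5, 41.9))
A <- 3
B <- 2.5                                # arbitrary illustrative constants
est_z <- ev_mle(z)
est_r <- ev_mle(A + B * z)
```

Up to the tolerance of the root finder, `est_r["u.hat"]` should equal `A + B * est_z["u.hat"]` and `est_r["b.hat"]` should equal `B * est_z["b.hat"]`.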
8 Tests of Fit Based on the Empirical Distribution Function
Relying on subjective assessment of linearity in Weibull probability plots in order to judge whether a sample comes from a 2-parameter Weibull population takes a fair amount of experience. It is simpler and more objective to employ a formal test of fit which compares the empirical distribution function $\hat F_n(x)$ of a sample with the fitted Weibull distribution function $\hat F(x) = F_{\hat\alpha,\hat\beta}(x)$ using one of several common discrepancy metrics.
The empirical distribution function (EDF) of a sample $X_1, \ldots, X_n$ is defined as
$$\hat F_n(x) = \frac{\#\text{ of observations} \le x}{n} = \frac{1}{n}\sum_{i=1}^{n} I_{\{X_i \le x\}}$$
where $I_A = 1$ when $A$ is true, and $I_A = 0$ when $A$ is false. The fitted Weibull distribution function (using mle’s $\hat\alpha$ and $\hat\beta$) is
$$\hat F(x) = F_{\hat\alpha,\hat\beta}(x) = 1 - \exp\left[-\left(\frac{x}{\hat\alpha}\right)^{\hat\beta}\right].$$
From the law of large numbers (LLN) we see that for any $x$ we have that $\hat F_n(x) \to F_{\alpha,\beta}(x)$ as $n \to \infty$. Just view $\hat F_n(x)$ as a binomial proportion or as an average of Bernoulli random variables.
From MLE theory we also know that $\hat F(x) = F_{\hat\alpha,\hat\beta}(x) \to F_{\alpha,\beta}(x)$ as $n \to \infty$ (also derived from the LLN).
Since the limiting cdf $F_{\alpha,\beta}(x)$ is continuous in $x$ one can argue that these convergence statements can be made uniformly in $x$, i.e.,
$$\sup_x\left|\hat F_n(x) - F_{\alpha,\beta}(x)\right| \to 0 \quad\text{and}\quad \sup_x\left|F_{\hat\alpha,\hat\beta}(x) - F_{\alpha,\beta}(x)\right| \to 0 \quad\text{as } n \to \infty$$
and thus $\sup_x|\hat F_n(x) - F_{\hat\alpha,\hat\beta}(x)| \to 0$ as $n \to \infty$ for all $\alpha > 0$ and $\beta > 0$.
The distance $D_{KS}(F, G) = \sup_x|F(x) - G(x)|$ is known as the Kolmogorov-Smirnov distance between two cdf’s $F$ and $G$.
Figures 5 and 6 give illustrations of this Kolmogorov-Smirnov distance between EDF and fitted Weibull distribution and show the relationship between sampled true Weibull distribution, fitted Weibull distribution, and empirical distribution function.
Some comments:
1. It can be noted that the closeness between $\hat F_n(x)$ and $F_{\hat\alpha,\hat\beta}(x)$ is usually more pronounced than their respective closeness to $F_{\alpha,\beta}(x)$, in spite of the sequence of the above convergence statements.
2. This can be understood from the fact that both $\hat F_n(x)$ and $F_{\hat\alpha,\hat\beta}(x)$ fit the data, i.e., try to give a good representation of the data. The fit of the true distribution, although being the origin of the data, is not always good due to sampling variation.
3. The closeness between all three distributions improves as $n$ gets larger.
Several other distances between cdf’s $F$ and $G$ have been proposed and investigated in the literature. We will only discuss two of them, the Cramér-von Mises distance $D_{CvM}$ and the Anderson-Darling distance $D_{AD}$. They are defined respectively as follows
$$D_{CvM}(F, G) = \int_{-\infty}^{\infty}(F(x) - G(x))^2\,dG(x) = \int_{-\infty}^{\infty}(F(x) - G(x))^2\,g(x)\,dx$$
and
$$D_{AD}(F, G) = \int_{-\infty}^{\infty}\frac{(F(x) - G(x))^2}{G(x)(1 - G(x))}\,dG(x) = \int_{-\infty}^{\infty}\frac{(F(x) - G(x))^2}{G(x)(1 - G(x))}\,g(x)\,dx.$$
Rather than focussing on the very local phenomenon of a maximum discrepancy at some point $x$ as in $D_{KS}$, these alternate distances or discrepancy metrics integrate these distances in squared form over all $x$, weighted by $g(x)$ in the case of $D_{CvM}(F, G)$ and by $g(x)/[G(x)(1 - G(x))]$ in the case of $D_{AD}(F, G)$. In the latter case, the denominator increases the weight in the tails of the $G$ distribution, i.e., compensates to some extent for the tapering off in the density $g(x)$. Thus $D_{AD}(F, G)$ is favored in situations where judging tail behavior is important, e.g., in risk situations. Because of the integration nature of these last two metrics they have more global character. There is no easy graphical representation of these metrics, except to suggest that when viewing the previous figures illustrating $D_{KS}$ one should look at all vertical distances (large and small) between $\hat F_n(x)$ and $\hat F(x)$, square them and accumulate these squares in the appropriately weighted fashion. For example, when one cdf is shifted relative to the other by a small amount (no large vertical discrepancy), these small vertical discrepancies (squared) will add up and indicate a moderately large difference between the two compared cdf’s.
We point out the asymmetric nature of these last two metrics, i.e., we typically have
$$D_{CvM}(F, G) \ne D_{CvM}(G, F) \quad\text{and}\quad D_{AD}(F, G) \ne D_{AD}(G, F).$$
When using these metrics for tests of fit one usually takes the cdf with a density (the model distribution to be tested) as the one with respect to which the integration takes place, while the other cdf is taken to be the EDF.
As complicated as these metrics may look at first glance, their computation is quite simple. We will give the following computational expressions (without proof):
$$D_{KS}(\hat F_n(x), \hat F(x)) = D = \max\left[\max_i\left\{i/n - V_{(i)}\right\}, \max_i\left\{V_{(i)} - (i - 1)/n\right\}\right]$$
where $V_{(1)} \le \ldots \le V_{(n)}$ are the ordered values of $V_i = \hat F(X_i)$, $i = 1, \ldots, n$.
For the other two test of fit criteria we have
$$D_{CvM}(\hat F_n(x), \hat F(x)) = W^2 = \sum_{i=1}^{n}\left\{V_{(i)} - \frac{2i - 1}{2n}\right\}^2 + \frac{1}{12n}$$
and
$$D_{AD}(\hat F_n(x), \hat F(x)) = A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[\log(V_{(i)}) + \log(1 - V_{(n-i+1)})\right].$$
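The three computational expressions translate almost line by line into R. The sketch below uses base R only; the function name `edf_statistics` is hypothetical, and the shape and scale values used to form the $V_i$ are rounded illustrative values rather than exact mle's.

```r
# EDF test statistics from V_i = F.hat(X_i), using the computational
# expressions for D (Kolmogorov-Smirnov), W2 (Cramer-von Mises)
# and A2 (Anderson-Darling).
edf_statistics <- function(V) {
  V <- sort(V)                          # V_(1) <= ... <= V_(n)
  n <- length(V)
  i <- seq_len(n)
  D  <- max(max(i / n - V), max(V - (i - 1) / n))
  W2 <- sum((V - (2 * i - 1) / (2 * n))^2) + 1 / (12 * n)
  # rev(V)[i] is V_(n-i+1); mean() supplies the factor 1/n.
  A2 <- -n - mean((2 * i - 1) * (log(V) + log(1 - rev(V))))
  c(D = D, W2 = W2, A2 = A2)
}

# Hypothetical fitted parameters, used only to form the V_i for illustration.
x <- c(7, 12.1, 22.8, 23.1, 25.7, 26.7, 29.0, 29.9, 39.5, 41.9)
V <- pweibull(x, shape = 2.8, scale = 28.9)
stats <- edf_statistics(V)

# Cross-check of D against base R's ks.test for the same fixed cdf.
D_check <- unname(ks.test(x, "pweibull", shape = 2.8, scale = 28.9)$statistic)
```

Only the statistic from `ks.test` is used here. Its p-value assumes a fully specified cdf and does not account for estimated parameters.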
In order to carry out these tests of fit we need to know the null distributions of $D$, $W^2$ and $A^2$. Quite naturally we would reject the hypothesis of a sampled Weibull distribution whenever $D$ or $W^2$ or $A^2$ are too large. The null distribution of $D$, $W^2$ and $A^2$ does not depend on the unknown parameters $\alpha$ and $\beta$, being estimated by $\hat\alpha$ and $\hat\beta$ in $V_i = \hat F(X_i) = F_{\hat\alpha,\hat\beta}(X_i)$. The reason for this is that the $V_i$ have a distribution that is independent of the unknown parameters $\alpha$ and $\beta$. This is seen as follows. Using our prior notation we write $\log(X_i) = Y_i = u + bZ_i$ and since
19
0 5000 10000 15000 20000 25000
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
c
u
m
u
l
a
t
i
v
e
d
i
s
t
r
i
b
u
t
i
o
n
f
u
n
c
t
i
o
n
q
q
q
q
q
q
q
q
q
q
q q q q q q q q q q
q
q
EDF == F
^
n
True Sampled CDF == F
αα,, ββ
((x))
Weibull Fitted CDF == F
αα
^
,, ββ
^
((x))
KS−Distance
KS−Distance == sup
x
F
^
n
((x)) −− F
αα
^
,, ββ
^
((x))
0 5000 10000 15000 20000 25000
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
c
u
m
u
l
a
t
i
v
e
d
i
s
t
r
i
b
u
t
i
o
n
f
u
n
c
t
i
o
n
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q q q q qq q q q q q q q q q q q q q
q
q
EDF == F
^
n
True Sampled CDF == F
αα,, ββ
((x))
Weibull Fitted CDF == F
αα
^
,, ββ
^
((x))
KS−Distance
KS−Distance == sup
x
F
^
n
((x)) −− F
αα
^
,, ββ
^
((x))
Figure 5: Illustration of KolmogorovSmirnov Distance for n = 10 and n = 20
20
0 5000 10000 15000 20000 25000
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
c
u
m
u
l
a
t
i
v
e
d
i
s
t
r
i
b
u
t
i
o
n
f
u
n
c
t
i
o
n
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q q q qq q q q q qq qqq q q q q q q q qqqqqqq q qq qqqq q qq q qq q q q q q q q q
q
q
EDF == F
^
n
True Sampled CDF == F
αα,, ββ
((x))
Weibull Fitted CDF == F
αα
^
,, ββ
^
((x))
KS−Distance
KS−Distance == sup
x
F
^
n
((x)) −− F
αα
^
,, ββ
^
((x))
0 5000 10000 15000 20000 25000
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
c
u
m
u
l
a
t
i
v
e
d
i
s
t
r
i
b
u
t
i
o
n
f
u
n
c
t
i
o
n
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq q q q qq qq qq q q q qqqq qq qqqq qq qqqq qq qq q qqqqq q qqqq q qqq qq q qqqqq qqq q qqq qqq q q q qq qqq qqqqq qqq q qq q qq q q q q q q q q q q q
q
q
EDF == F
^
n
True Sampled CDF == F
αα,, ββ
((x))
Weibull Fitted CDF == F
αα
^
,, ββ
^
((x))
KS−Distance
KS−Distance == sup
x
F
^
n
((x)) −− F
αα
^
,, ββ
^
((x))
Figure 6: Illustration of KolmogorovSmirnov Distance for n = 50 and n = 100
21
F(x) = P(X ≤ x) = P(log(X) ≤ log(x)) = P(Y ≤ y) = 1 −exp(−exp((y −u)/b))
and thus
V
i
=
ˆ
F(X
i
) = 1 −exp(−exp((Y
i
− ˆ u(Y))/
ˆ
b(Y)))
= 1 −exp(−exp((u +bZ
i
− ˆ u(u +bZ))/
ˆ
b(u +bZ)))
= 1 −exp(−exp((u +bZ
i
−u −bˆ u(Z))/[b
ˆ
b(Z])))
= 1 −exp(−exp((Z
i
− ˆ u(Z))/
ˆ
b(Z)))
and all dependence on the unknown parameters u = log(α) and b = 1/β has canceled out.
This opens up the possibility of using simulation to find good approximations to these null
distributions for any $n$, especially in view of the previously reported timing results for computing the
mle's $\hat\alpha$ and $\hat\beta$ of $\alpha$ and $\beta$. Just generate samples
$\mathbf{X}^\star = (X^\star_1, \ldots, X^\star_n)$ from $W(\alpha = 1, \beta = 1)$
(standard exponential distribution), compute the corresponding $\hat\alpha^\star = \hat\alpha(\mathbf{X}^\star)$
and $\hat\beta^\star = \hat\beta(\mathbf{X}^\star)$, then
$V^\star_i = \hat F(X^\star_i) = F_{\hat\alpha^\star,\hat\beta^\star}(X^\star_i)$ (where $F_{\alpha,\beta}(x)$ is the cdf of
$W(\alpha, \beta)$) and from that the values $D^\star = D(\mathbf{X}^\star)$, $W^{2\star} = W^2(\mathbf{X}^\star)$ and
$A^{2\star} = A^2(\mathbf{X}^\star)$. Calculating all three test of fit criteria makes
sense since the main calculation effort is in getting the mle's $\hat\alpha^\star$ and $\hat\beta^\star$. Repeating this a large
number of times, say $N_{sim} = 10000$, should give us a reasonably good approximation to the desired
null distribution and from it one can determine appropriate p-values for any sample $X_1, \ldots, X_n$
for which one wishes to assess whether the Weibull distribution hypothesis is tenable or not. If
$C(\mathbf{X})$ denotes the used test of fit criterion then the estimated p-value of this sample is simply the
proportion of $C(\mathbf{X}^\star)$ that are $\ge C(\mathbf{X})$.
Prior to the ease of current computing Stephens (1986) provided tables for the $(1 - \alpha)$-quantiles
$q_{1-\alpha}$ of these null distributions (here $\alpha$ denotes a significance level, not the Weibull scale
parameter). For the n-adjusted versions $A^2(1 + .2/\sqrt{n})$ and $W^2(1 + .2/\sqrt{n})$
these null distributions appear to be independent of $n$ and $(1 - \alpha)$-quantiles were given for
$\alpha = .25, .10, .05, .025, .01$. Plotting $\log(\alpha/(1 - \alpha))$ against $q_{1-\alpha}$ shows a mildly quadratic pattern which
can be used to interpolate or extrapolate the appropriate p-value (observed significance level $\alpha$) for
any observed n-adjusted value $A^2(1 + .2/\sqrt{n})$ and $W^2(1 + .2/\sqrt{n})$, as is illustrated in Figure 7.
For $\sqrt{n}D$ the null distribution still depends on $n$ (in spite of the normalizing factor $\sqrt{n}$) and
$(1 - \alpha)$-quantiles for $\alpha = .10, .05, .025, .01$ were tabulated for $n = 10, 20, 50, \infty$ by Stephens (1986). Here
a double inter- and extrapolation scheme is needed, first by plotting these quantiles against $1/\sqrt{n}$,
fitting quadratics in $1/\sqrt{n}$ and reading off the four interpolated quantile values for the needed $n_0$
(the sample size at issue) and as a second step performing the interpolation or extrapolation scheme
as it was done previously, but using a cubic this time. This is illustrated in Figure 8.
Functions for computing these p-values (via interpolation from Stephens' tabled values) are given
in the Weibull R work space provided at the class web site. They are GOF.KS.test, GOF.CvM.test,
and GOF.AD.test for computing p-values for the n-adjusted test criteria $\sqrt{n}D$, $W^2(1 + .2/\sqrt{n})$, and
$A^2(1 + .2/\sqrt{n})$, respectively. These functions have an optional argument graphic where
graphic = T causes the interpolation graphs shown in Figures 7 and 8 to be produced, otherwise only the
p-values are given. The function Weibull.GOF.test does a Weibull goodness of fit test on any given
sample, returning p-values for all three test criteria. You also find there the function Weibull.mle
that was listed earlier, and several other functions not yet documented here.
One could easily reproduce and extend the tables given by Stephens (1986) so that extrapolation
becomes less of an issue. For $n = 100$ it should take less than 1.5 minutes to simulate the null
distributions based on $N_{sim} = 10000$ and the previously given timing of 8.07 sec for $N_{sim} = 1000$.
9 Pivots
Based on the previous equivariance properties of $\hat u(\mathbf{Y})$ and $\hat b(\mathbf{Y})$ we have the following pivots,
namely functions $W = \psi(\hat u(\mathbf{Y}), \hat b(\mathbf{Y}), \vartheta)$ of the estimates and an unknown parameter $\vartheta$ of interest
such that $W$ has a fixed and known distribution and the function $\psi$ is strictly monotone in the
unknown parameter $\vartheta$, so that it is invertible with respect to $\vartheta$.
Recall that for a Weibull random sample $\mathbf{X} = (X_1, \ldots, X_n)$ we have $Y_i = \log(X_i) \sim G((y - u)/b)$
with $b = 1/\beta$ and $u = \log(\alpha)$. Then $Z_i = (Y_i - u)/b \sim G(z) = 1 - \exp(-\exp(z))$, which is a known
distribution (does not depend on unknown parameters). This is seen as follows:
$$P(Z_i \le z) = P((Y_i - u)/b \le z) = P(Y_i \le u + bz) = G(([u + bz] - u)/b) = G(z) .$$
It is this known distribution of $\mathbf{Z} = (Z_1, \ldots, Z_n)$ that is instrumental in knowing the distribution of
the four pivots that we discuss below. There we utilize the representation $Y_i = u + bZ_i$ or $\mathbf{Y} = u + b\mathbf{Z}$
in vector form.
9.1 Pivot for the Scale Parameter b
As a natural pivot for the scale parameter $\vartheta = b$ we take
$$W_1 = \frac{\hat b(\mathbf{Y})}{b} = \frac{\hat b(u + b\mathbf{Z})}{b} = \frac{b\,\hat b(\mathbf{Z})}{b} = \hat b(\mathbf{Z}) .$$
The right side, being a function of $\mathbf{Z}$ alone, has a distribution that does not involve unknown
parameters and $W_1 = \hat b(\mathbf{Y})/b$ is strictly monotone in $b$.
[Figure 7: Interpolation & Extrapolation for $A^2(1 + .2/\sqrt{n})$ and $W^2(1 + .2/\sqrt{n})$. Two panels, one per
criterion, plotting the tail probability p on a log(p/(1 - p)) scale against the n-adjusted criterion value;
legend: tabled values, interpolated/extrapolated values.]

[Figure 8: Interpolation & Extrapolation for $\sqrt{n} \times D$. Top panel: quadratic interpolation & linear
extrapolation in $1/\sqrt{n}$, plotting the $\sqrt{n} \times D$-quantiles $D_{0.9}$, $D_{0.95}$, $D_{0.975}$, $D_{0.99}$ against n (on a
$1/\sqrt{n}$ scale). Bottom panel: cubic interpolation & linear extrapolation in D, plotting the tail
probability p on a log(p/(1 - p)) scale against $\sqrt{n} \times D$; legend: tabled values, interpolated quantiles,
interpolated/extrapolated values.]
How do we obtain the distribution of $\hat b(\mathbf{Z})$? An analytical approach does not seem possible. The
approach followed here is that presented in Bain (1978), Bain and Engelhardt (1991) and originally
in Thoman et al. (1969, 1970), who provided tables for this distribution (and for those of the other
pivots discussed here) based on $N_{sim}$ simulated values of $\hat b(\mathbf{Z})$ (and $\hat u(\mathbf{Z})$), where $N_{sim} = 20000$ for
$n = 5$, $N_{sim} = 10000$ for $n = 6, 8, 10, 15, 20, 30, 40, 50, 75$, and $N_{sim} = 6000$ for $n = 100$.
In these simulations one simply generates samples $\mathbf{Z}^\star = (Z^\star_1, \ldots, Z^\star_n) \sim G(z)$ and finds
$\hat b(\mathbf{Z}^\star)$ (and $\hat u(\mathbf{Z}^\star)$ for the other pivots discussed later) for each such sample $\mathbf{Z}^\star$. By simulating this process
$N_{sim} = 10000$ times we obtain $\hat b(\mathbf{Z}^\star_1), \ldots, \hat b(\mathbf{Z}^\star_{N_{sim}})$. The empirical distribution function of these
simulated estimates $\hat b(\mathbf{Z}^\star_i)$, denoted by $\hat H_1(w)$, provides a fairly reasonable estimate of the sampling
distribution $H_1(w)$ of $\hat b(\mathbf{Z})$ and thus also of the pivot distribution of $W_1 = \hat b(\mathbf{Y})/b$. From this
simulated distribution we can estimate any $\gamma$-quantile of $H_1(w)$ to any practical accuracy, provided
$N_{sim}$ is sufficiently large. Values of $\gamma$ closer to 0 or 1 require higher $N_{sim}$. For $.005 \le \gamma \le .995$ a
simulation level of $N_{sim} = 10000$ should be quite adequate.
If we denote the $\gamma$-quantile of $H_1(w)$ by $\eta_1(\gamma)$, i.e.,
$$\gamma = H_1(\eta_1(\gamma)) = P(\hat b(\mathbf{Y})/b \le \eta_1(\gamma)) = P(\hat b(\mathbf{Y})/\eta_1(\gamma) \le b)$$
we see that $\hat b(\mathbf{Y})/\eta_1(\gamma)$ can be viewed as a $100\gamma\%$ lower bound to the unknown parameter $b$. We do
not know $\eta_1(\gamma)$ but we can estimate it by the corresponding quantile $\hat\eta_1(\gamma)$ of the simulated
distribution $\hat H_1(w)$ which serves as proxy for $H_1(w)$. We then use $\hat b(\mathbf{Y})/\hat\eta_1(\gamma)$ as an approximate $100\gamma\%$
lower bound to the unknown parameter $b$. For large $N_{sim}$, say $N_{sim} = 10000$, this approximation is
practically quite adequate.
We note here that a 100γ% lower bound can be viewed as a 100(1 − γ)% upper bound, because
1 − γ is the chance of the lower bound falling on the wrong side of its target, namely above. To
get 100γ% upper bounds one simply constructs 100(1 − γ)% lower bounds by the above method.
Similar comments apply to the pivots obtained below, where we only give one-sided bounds (lower
or upper) in each case.
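Based on the relationship $b = 1/\beta$ the respective $100\gamma\%$ approximate lower and upper confidence
bounds for the Weibull shape parameter would be
$$\frac{\hat\eta_1(1 - \gamma)}{\hat b(\mathbf{Y})} = \hat\eta_1(1 - \gamma) \times \hat\beta(\mathbf{X}) \quad\text{and}\quad \frac{\hat\eta_1(\gamma)}{\hat b(\mathbf{Y})} = \hat\eta_1(\gamma) \times \hat\beta(\mathbf{X})$$
and an approximate $100\gamma\%$ confidence interval for $\beta$ would be
$$\left[ \hat\eta_1((1 - \gamma)/2) \times \hat\beta(\mathbf{X}),\ \hat\eta_1((1 + \gamma)/2) \times \hat\beta(\mathbf{X}) \right]$$
since $(1 + \gamma)/2 = 1 - (1 - \gamma)/2$. Here $\mathbf{X} = (X_1, \ldots, X_n)$ is the untransformed Weibull sample.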
9.2 Pivot for the Location Parameter u
For the location parameter $\vartheta = u$ we have the following pivot
$$W_2 = \frac{\hat u(\mathbf{Y}) - u}{\hat b(\mathbf{Y})} = \frac{\hat u(u + b\mathbf{Z}) - u}{\hat b(u + b\mathbf{Z})} = \frac{u + b\hat u(\mathbf{Z}) - u}{b\,\hat b(\mathbf{Z})} = \frac{\hat u(\mathbf{Z})}{\hat b(\mathbf{Z})} .$$
It has a distribution that does not depend on any unknown parameter, since it only depends on the
known distribution of $\mathbf{Z}$. Furthermore $W_2$ is strictly decreasing in $u$. Thus $W_2$ is a pivot with respect
to $u$. Denote this pivot distribution of $W_2$ by $H_2(w)$ and its $\gamma$-quantile by $\eta_2(\gamma)$. As before this
pivot distribution and its quantiles can be approximated sufficiently well by simulating
$\hat u(\mathbf{Z}^\star)/\hat b(\mathbf{Z}^\star)$ a sufficient number $N_{sim}$ times and using the empirical cdf $\hat H_2(w)$ of the
$\hat u(\mathbf{Z}^\star_i)/\hat b(\mathbf{Z}^\star_i)$ as proxy for $H_2(w)$.
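As in the previous pivot case we can exploit this pivot distribution as follows
$$\gamma = H_2(\eta_2(\gamma)) = P\left( \frac{\hat u(\mathbf{Y}) - u}{\hat b(\mathbf{Y})} \le \eta_2(\gamma) \right) = P(\hat u(\mathbf{Y}) - \hat b(\mathbf{Y})\eta_2(\gamma) \le u)$$
and thus we can view $\hat u(\mathbf{Y}) - \hat b(\mathbf{Y})\eta_2(\gamma)$ as a $100\gamma\%$ lower bound for the unknown parameter $u$.
Using the $\gamma$-quantile $\hat\eta_2(\gamma)$ obtained from the empirical cdf $\hat H_2(w)$ we then treat
$\hat u(\mathbf{Y}) - \hat b(\mathbf{Y})\hat\eta_2(\gamma)$ as an approximate $100\gamma\%$ lower bound for the unknown parameter $u$.
Based on the relation $u = \log(\alpha)$ this translates into an approximate $100\gamma\%$ lower bound
$$\exp(\hat u(\mathbf{Y}) - \hat b(\mathbf{Y})\hat\eta_2(\gamma)) = \exp(\log(\hat\alpha(\mathbf{X})) - \hat\eta_2(\gamma)/\hat\beta(\mathbf{X})) = \hat\alpha(\mathbf{X}) \exp(-\hat\eta_2(\gamma)/\hat\beta(\mathbf{X}))$$
for $\alpha$.
Upper bounds and intervals are handled as in the previous situation for $b$ or $\beta$.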
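The pivot recipes for the scale and location parameters can be sketched as follows. This Python code is
an illustration under stated assumptions and not the R function WeibullPivots: the names ev_mle,
simulate_pivot_pairs, quantile and weibull_bounds are hypothetical, the mle's are computed by bisection
on the likelihood equation for $b$, only complete (uncensored) samples are handled, and the data are made up.

```python
import math
import random

def ev_mle(y):
    """MLEs (u_hat, b_hat) for log-Weibull data y_i = log(x_i), i.e. u = log(alpha), b = 1/beta."""
    n = len(y)
    ybar, ymax = sum(y) / n, max(y)

    def score(b):
        # Likelihood equation for b; shifting by ymax keeps the exponentials from overflowing.
        w = [math.exp((yi - ymax) / b) for yi in y]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w) - b - ybar

    hi = ymax - ybar              # score(hi) < 0 unless all y_i coincide
    lo = 1e-9 * hi                # score(lo) > 0
    for _ in range(60):           # bisection; score is decreasing in b
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    u = ymax + b * math.log(sum(math.exp((yi - ymax) / b) for yi in y) / n)
    return u, b

def simulate_pivot_pairs(n, n_sim=1000, seed=1):
    """Simulate (u_hat(Z*), b_hat(Z*)) for samples Z* of size n from G(z) = 1 - exp(-exp(z))."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_sim):
        # The log of a standard exponential variate has cdf G; max() guards against log(0).
        z = [math.log(max(rng.expovariate(1.0), 1e-300)) for _ in range(n)]
        pairs.append(ev_mle(z))
    return pairs

def quantile(values, gamma):
    """Empirical gamma-quantile: the ceil(gamma * N)-th smallest of the N values."""
    s = sorted(values)
    k = max(1, math.ceil(gamma * len(s)))
    return s[min(k, len(s)) - 1]

def weibull_bounds(x, gamma=0.95, n_sim=1000, seed=1):
    """Approximate 100*gamma% one-sided lower and upper bounds for alpha and beta."""
    u_hat, b_hat = ev_mle([math.log(xi) for xi in x])
    alpha_hat, beta_hat = math.exp(u_hat), 1.0 / b_hat
    pairs = simulate_pivot_pairs(len(x), n_sim, seed)
    w1 = [b for _, b in pairs]              # simulated pivot b_hat(Y)/b
    w2 = [u / b for u, b in pairs]          # simulated pivot (u_hat(Y) - u)/b_hat(Y)
    return {
        "alpha_hat": alpha_hat,
        "beta_hat": beta_hat,
        "beta_lower": quantile(w1, 1 - gamma) * beta_hat,
        "beta_upper": quantile(w1, gamma) * beta_hat,
        "alpha_lower": alpha_hat * math.exp(-quantile(w2, gamma) / beta_hat),
        "alpha_upper": alpha_hat * math.exp(-quantile(w2, 1 - gamma) / beta_hat),
    }

# Hypothetical complete sample of n = 10 lifetimes (made up for illustration).
sample = [4100.0, 5800.0, 7300.0, 8900.0, 9600.0, 10500.0, 11900.0, 12800.0, 14500.0, 17100.0]
print(weibull_bounds(sample, gamma=0.95, n_sim=500))
```

The same simulated pairs serve both pivots, so bounds for $\alpha$ and $\beta$ at several confidence levels cost
no additional simulation. A pair of one-sided bounds at level $(1 + \gamma)/2$ gives the $100\gamma\%$ interval.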
9.3 Pivot for the p-quantile $y_p$
With respect to the p-quantile $\vartheta = y_p = u + b\log(-\log(1 - p)) = u + bw_p$ of the $Y$ distribution the
natural pivot is
$$\begin{aligned}
W_p = \frac{\hat y_p(\mathbf{Y}) - y_p}{\hat b(\mathbf{Y})}
&= \frac{\hat u(\mathbf{Y}) + \hat b(\mathbf{Y})w_p - (u + bw_p)}{\hat b(\mathbf{Y})}
= \frac{\hat u(u + b\mathbf{Z}) + \hat b(u + b\mathbf{Z})w_p - (u + bw_p)}{\hat b(u + b\mathbf{Z})} \\
&= \frac{u + b\hat u(\mathbf{Z}) + b\,\hat b(\mathbf{Z})w_p - (u + bw_p)}{b\,\hat b(\mathbf{Z})}
= \frac{\hat u(\mathbf{Z}) + (\hat b(\mathbf{Z}) - 1)w_p}{\hat b(\mathbf{Z})} .
\end{aligned}$$
Again its distribution only depends on the known distribution of $\mathbf{Z}$ and not on the unknown
parameters $u$ and $b$ and the pivot $W_p$ is a strictly decreasing function of $y_p$. Denote this pivot distribution
function by $H_p(w)$ and its $\gamma$-quantile by $\eta_p(\gamma)$. As before, this pivot distribution and its quantiles
can be approximated sufficiently well by simulating
$\{\hat u(\mathbf{Z}) + (\hat b(\mathbf{Z}) - 1)w_p\}/\hat b(\mathbf{Z})$ a sufficient number $N_{sim}$ times. Denote the empirical cdf of such
simulated values by $\hat H_p(w)$ and the corresponding $\gamma$-quantiles by $\hat\eta_p(\gamma)$.
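As before we proceed with
$$\gamma = H_p(\eta_p(\gamma)) = P\left( \frac{\hat y_p(\mathbf{Y}) - y_p}{\hat b(\mathbf{Y})} \le \eta_p(\gamma) \right) = P\left( \hat y_p(\mathbf{Y}) - \eta_p(\gamma)\hat b(\mathbf{Y}) \le y_p \right)$$
and thus we can treat $\hat y_p(\mathbf{Y}) - \eta_p(\gamma)\hat b(\mathbf{Y})$ as a $100\gamma\%$ lower bound for $y_p$. Again we can treat
$\hat y_p(\mathbf{Y}) - \hat\eta_p(\gamma)\hat b(\mathbf{Y})$ as an approximate $100\gamma\%$ lower bound for $y_p$.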
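Since
$$\hat y_p(\mathbf{Y}) - \eta_p(\gamma)\hat b(\mathbf{Y}) = \hat u(\mathbf{Y}) + w_p\hat b(\mathbf{Y}) - \eta_p(\gamma)\hat b(\mathbf{Y}) = \hat u(\mathbf{Y}) - k_p(\gamma)\hat b(\mathbf{Y})$$
with $k_p(\gamma) = \eta_p(\gamma) - w_p$, we could have obtained the same lower bound by the following argument
that does not use a direct pivot, namely
$$\begin{aligned}
\gamma &= P(\hat u(\mathbf{Y}) - k_p(\gamma)\hat b(\mathbf{Y}) \le y_p) = P(\hat u(\mathbf{Y}) - k_p(\gamma)\hat b(\mathbf{Y}) \le u + bw_p) \\
&= P(\hat u(\mathbf{Y}) - u - k_p(\gamma)\hat b(\mathbf{Y}) \le bw_p) \\
&= P\left( \frac{\hat u(\mathbf{Y}) - u}{b} - k_p(\gamma)\frac{\hat b(\mathbf{Y})}{b} \le w_p \right) \\
&= P(\hat u(\mathbf{Z}) - k_p(\gamma)\hat b(\mathbf{Z}) \le w_p) = P\left( \frac{\hat u(\mathbf{Z}) - w_p}{\hat b(\mathbf{Z})} \le k_p(\gamma) \right)
\end{aligned}$$
and we see that $k_p(\gamma)$ can be taken as the $\gamma$-quantile of the distribution of $(\hat u(\mathbf{Z}) - w_p)/\hat b(\mathbf{Z})$.
This distribution can be estimated by the empirical cdf of $N_{sim}$ simulated values
$(\hat u(\mathbf{Z}^\star_i) - w_p)/\hat b(\mathbf{Z}^\star_i)$, $i = 1, \ldots, N_{sim}$, and its $\gamma$-quantile $\hat k_p(\gamma)$ serves as a good
approximation to $k_p(\gamma)$.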
It is easily seen that this produces the same quantile lower bound as before. However, in this
approach one sees one further detail, namely that $h(p) = -k_p(\gamma)$ is strictly increasing in $p$ (see
footnote 1), since $w_p$ is strictly increasing in $p$.

Footnote 1: Suppose $p_1 < p_2$ and $h(p_1) \ge h(p_2)$ with $\gamma = P(\hat u(\mathbf{Z}) + h(p_1)\hat b(\mathbf{Z}) \le w_{p_1})$ and
$$\begin{aligned}
\gamma &= P(\hat u(\mathbf{Z}) + h(p_2)\hat b(\mathbf{Z}) \le w_{p_2}) \\
&= P(\hat u(\mathbf{Z}) + h(p_1)\hat b(\mathbf{Z}) \le w_{p_1} + (w_{p_2} - w_{p_1}) + (h(p_1) - h(p_2))\hat b(\mathbf{Z})) \\
&\ge P(\hat u(\mathbf{Z}) + h(p_1)\hat b(\mathbf{Z}) \le w_{p_1} + (w_{p_2} - w_{p_1})) > \gamma
\end{aligned}$$
(i.e., $\gamma > \gamma$, a contradiction) since
$P(w_{p_1} < \hat u(\mathbf{Z}) + h(p_1)\hat b(\mathbf{Z}) \le w_{p_1} + (w_{p_2} - w_{p_1})) > 0$. A thorough argument would
show that $\hat b(\mathbf{z})$ and thus $\hat u(\mathbf{z})$ are continuous functions of $\mathbf{z} = (z_1, \ldots, z_n)$ and since there is positive
probability in any neighborhood of any $\mathbf{z} \in \mathbb{R}^n$ there is positive probability in any neighborhood of
$(\hat u(\mathbf{z}), \hat b(\mathbf{z}))$.
Of course it makes intuitive sense that quantile lower bounds should be increasing in $p$ since their
target p-quantiles are increasing in $p$. This strictly increasing property allows us to immediately
construct upper confidence bounds for left tail probabilities as is shown in the next section.
Since $x_p = \exp(y_p)$ is the corresponding p-quantile of the Weibull distribution we can view
$$\exp\left( \hat y_p(\mathbf{Y}) - \hat\eta_p(\gamma)\hat b(\mathbf{Y}) \right) = \hat\alpha(\mathbf{X}) \exp\left( (w_p - \hat\eta_p(\gamma))/\hat\beta(\mathbf{X}) \right) = \hat\alpha(\mathbf{X}) \exp\left( -\hat k_p(\gamma)/\hat\beta(\mathbf{X}) \right)$$
as an approximate $100\gamma\%$ lower bound for $x_p = \exp(u + bw_p) = \alpha(-\log(1 - p))^{1/\beta}$.
Since $\alpha$ is the $(1 - \exp(-1))$-quantile of the Weibull distribution, lower bounds for it can be seen
as a special case of quantile lower bounds. Indeed, this particular quantile lower bound coincides
with the one given previously.
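The direct route via $k_p(\gamma)$ is short to code. The following Python sketch is again an illustration
and not the R function WeibullPivots: the function names are hypothetical, the mle's come from a
bisection on the likelihood equation for $b$, only complete samples are handled, and the data are made up.

```python
import math
import random

def ev_mle(y):
    """MLEs (u_hat, b_hat) for log-Weibull data y_i = log(x_i), i.e. u = log(alpha), b = 1/beta."""
    n = len(y)
    ybar, ymax = sum(y) / n, max(y)

    def score(b):
        # Likelihood equation for b; shifting by ymax keeps the exponentials from overflowing.
        w = [math.exp((yi - ymax) / b) for yi in y]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w) - b - ybar

    hi = ymax - ybar              # score(hi) < 0 unless all y_i coincide
    lo = 1e-9 * hi                # score(lo) > 0
    for _ in range(60):           # bisection; score is decreasing in b
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    u = ymax + b * math.log(sum(math.exp((yi - ymax) / b) for yi in y) / n)
    return u, b

def simulate_pivot_pairs(n, n_sim=1000, seed=1):
    """Simulate (u_hat(Z*), b_hat(Z*)) for samples Z* of size n from G(z) = 1 - exp(-exp(z))."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_sim):
        # The log of a standard exponential variate has cdf G; max() guards against log(0).
        z = [math.log(max(rng.expovariate(1.0), 1e-300)) for _ in range(n)]
        pairs.append(ev_mle(z))
    return pairs

def quantile(values, gamma):
    """Empirical gamma-quantile: the ceil(gamma * N)-th smallest of the N values."""
    s = sorted(values)
    k = max(1, math.ceil(gamma * len(s)))
    return s[min(k, len(s)) - 1]

def quantile_lower_bound(x, p, gamma=0.95, n_sim=1000, seed=1):
    """Approximate 100*gamma% lower bound for the Weibull p-quantile x_p."""
    u_hat, b_hat = ev_mle([math.log(xi) for xi in x])
    w_p = math.log(-math.log(1.0 - p))
    pairs = simulate_pivot_pairs(len(x), n_sim, seed)
    k_hat = quantile([(u - w_p) / b for u, b in pairs], gamma)   # estimate of k_p(gamma)
    return math.exp(u_hat - k_hat * b_hat)      # = alpha_hat * exp(-k_hat / beta_hat)

# Hypothetical complete sample of n = 10 lifetimes (made up for illustration).
sample = [4100.0, 5800.0, 7300.0, 8900.0, 9600.0, 10500.0, 11900.0, 12800.0, 14500.0, 17100.0]
for p in (0.01, 0.1, 0.5, 0.9):
    print(p, quantile_lower_bound(sample, p, gamma=0.95, n_sim=500))
```

When the same seed (hence the same simulated pairs) is used for all $p$, the computed bounds are
increasing in $p$, mirroring the monotonicity of $h(p) = -k_p(\gamma)$ noted above.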
9.4 Upper Conﬁdence Bounds for the Tail Probability p(y) = P(Y ≤ y)
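As far as an appropriate pivot for $p(y) = P(Y \le y)$ is concerned, the situation here is not as
straightforward as in the previous three cases. Clearly
$$\hat p(y) = G\left( \frac{y - \hat u(\mathbf{Y})}{\hat b(\mathbf{Y})} \right) \quad\text{is the natural estimate (mle) of}\quad p(y) = P(Y \le y) = G\left( \frac{y - u}{b} \right)$$
and one easily sees that the distribution function $H$ of this estimate depends on $u$ and $b$ only
through $p(y)$, namely
$$\hat p(y) = G\left( \frac{y - \hat u(\mathbf{Y})}{\hat b(\mathbf{Y})} \right) = G\left( \frac{(y - u)/b - (\hat u(\mathbf{Y}) - u)/b}{\hat b(\mathbf{Y})/b} \right) = G\left( \frac{G^{-1}(p(y)) - \hat u(\mathbf{Z})}{\hat b(\mathbf{Z})} \right) \sim H_{p(y)} .$$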
Thus by the probability integral transform it follows that
$$W_{p(y)} = H_{p(y)}(\hat p(y)) \sim U(0, 1)$$
i.e., $W_{p(y)}$ is a true pivot, contrary to what is stated in Bain (1978) and Bain and Engelhardt (1991).
Rather than using this pivot we will go a more direct route as was indicated by the strictly increasing
property of $h(p) = h_\gamma(p)$ in the previous section. Denote by $h^{-1}(\cdot)$ the inverse function to $h(\cdot)$. We
then have
$$\gamma = P(\hat u(\mathbf{Y}) + h(p)\hat b(\mathbf{Y}) \le y_p) = P(h(p) \le (y_p - \hat u(\mathbf{Y}))/\hat b(\mathbf{Y})) = P\left( p \le h^{-1}\left( (y_p - \hat u(\mathbf{Y}))/\hat b(\mathbf{Y}) \right) \right) ,$$
for any $p \in (0, 1)$. If we parameterize such $p$ via $p(y) = P(Y \le y) = G((y - u)/b)$ we have $y_{p(y)} = y$
and thus also
$$\gamma = P\left( p(y) \le h^{-1}\left( (y - \hat u(\mathbf{Y}))/\hat b(\mathbf{Y}) \right) \right)$$
for any $y \in \mathbb{R}$ and $u \in \mathbb{R}$ and $b > 0$. Hence
$\hat p_U(y) = h^{-1}\left( (y - \hat u(\mathbf{Y}))/\hat b(\mathbf{Y}) \right)$ can be viewed as a
$100\gamma\%$ upper confidence bound for $p(y)$ for any given threshold $y$.
The only remaining issue is the computation of such bounds. Does it require the inversion of $h$
and the concomitant calculations of many $h(p) = -k_p(\gamma)$ for the iterative convergence of such an
inversion? It turns out that there is a direct path just as we had it in the previous three confidence
bound situations.
Note that $h^{-1}(x)$ solves $-k_p(\gamma) = x$ for $p$. We claim that $h^{-1}(x)$ is the $\gamma$-quantile of the
$G(\hat u(\mathbf{Z}) + x\hat b(\mathbf{Z}))$ distribution which we can simulate by calculating as before $\hat u(\mathbf{Z})$ and
$\hat b(\mathbf{Z})$ a large number $N_{sim}$ times.
The above claim concerning $h^{-1}(x)$ is seen as follows. For any $x = h(p)$ we have
$$\begin{aligned}
P(G(\hat u(\mathbf{Z}) + x\hat b(\mathbf{Z})) \le h^{-1}(x)) &= P(G(\hat u(\mathbf{Z}) + h(p)\hat b(\mathbf{Z})) \le p) \\
&= P(\hat u(\mathbf{Z}) + h(p)\hat b(\mathbf{Z}) \le w_p) \\
&= P(\hat u(\mathbf{Z}) - k_p(\gamma)\hat b(\mathbf{Z}) \le w_p) = \gamma ,
\end{aligned}$$
as seen in the previous section. Thus $h^{-1}(x)$ is the $\gamma$-quantile of the $G(\hat u(\mathbf{Z}) + x\hat b(\mathbf{Z}))$ distribution.
If we observe $\mathbf{Y} = \mathbf{y}$ and obtain $\hat u(\mathbf{y})$ and $\hat b(\mathbf{y})$ as our maximum likelihood estimates for $u$ and
$b$ we get our $100\gamma\%$ upper bound for $p(y) = G((y - u)/b)$ as follows: For the fixed value of
$x = (y - \hat u(\mathbf{y}))/\hat b(\mathbf{y}) = G^{-1}(\hat p(y))$ simulate the $G(\hat u(\mathbf{Z}) + x\hat b(\mathbf{Z}))$ distribution (with sufficiently high
$N_{sim}$) and calculate the $\gamma$-quantile of this distribution as the desired approximate $100\gamma\%$ upper
bound for $p(y) = P(Y \le y) = G((y - u)/b)$.
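The direct path to the upper bound for $p(y)$ can be sketched as follows. This Python code is an
illustration and not the R function WeibullPivots: the function names are hypothetical, the threshold is
given on the original lifetime scale (so $y$ is its logarithm), the mle's come from a bisection on the
likelihood equation for $b$, only complete samples are handled, and the data are made up.

```python
import math
import random

def ev_mle(y):
    """MLEs (u_hat, b_hat) for log-Weibull data y_i = log(x_i), i.e. u = log(alpha), b = 1/beta."""
    n = len(y)
    ybar, ymax = sum(y) / n, max(y)

    def score(b):
        # Likelihood equation for b; shifting by ymax keeps the exponentials from overflowing.
        w = [math.exp((yi - ymax) / b) for yi in y]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w) - b - ybar

    hi = ymax - ybar              # score(hi) < 0 unless all y_i coincide
    lo = 1e-9 * hi                # score(lo) > 0
    for _ in range(60):           # bisection; score is decreasing in b
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    u = ymax + b * math.log(sum(math.exp((yi - ymax) / b) for yi in y) / n)
    return u, b

def simulate_pivot_pairs(n, n_sim=1000, seed=1):
    """Simulate (u_hat(Z*), b_hat(Z*)) for samples Z* of size n from G(z) = 1 - exp(-exp(z))."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_sim):
        # The log of a standard exponential variate has cdf G; max() guards against log(0).
        z = [math.log(max(rng.expovariate(1.0), 1e-300)) for _ in range(n)]
        pairs.append(ev_mle(z))
    return pairs

def quantile(values, gamma):
    """Empirical gamma-quantile: the ceil(gamma * N)-th smallest of the N values."""
    s = sorted(values)
    k = max(1, math.ceil(gamma * len(s)))
    return s[min(k, len(s)) - 1]

def G(z):
    """Standard extreme value cdf G(z) = 1 - exp(-exp(z)), computed stably."""
    return -math.expm1(-math.exp(z))

def tail_probability_upper_bound(x, threshold, gamma=0.95, n_sim=1000, seed=1):
    """Return (p_hat, approximate 100*gamma% upper bound) for p = P(X <= threshold)."""
    u_hat, b_hat = ev_mle([math.log(xi) for xi in x])
    x_fixed = (math.log(threshold) - u_hat) / b_hat      # = G^{-1}(p_hat)
    pairs = simulate_pivot_pairs(len(x), n_sim, seed)
    sims = [G(u + x_fixed * b) for u, b in pairs]        # draws of G(u_hat(Z) + x b_hat(Z))
    return G(x_fixed), quantile(sims, gamma)

# Hypothetical complete sample of n = 10 lifetimes (made up for illustration).
sample = [4100.0, 5800.0, 7300.0, 8900.0, 9600.0, 10500.0, 11900.0, 12800.0, 14500.0, 17100.0]
print(tail_probability_upper_bound(sample, 8000.0, gamma=0.95, n_sim=500))
```

No inversion of $h$ is needed: one pass over the simulated pairs yields the bound for a given threshold,
and further thresholds or confidence levels reuse the same pairs.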
10 Tabulation of Conﬁdence Quantiles η(γ)
For the pivots for $b$, $u$ and $y_p$ it is possible to carry out simulations once and for all for a desired set
of confidence levels $\gamma$, sample sizes $n$ and choices of $p$, and tabulate the required confidence quantiles
$\hat\eta_1(\gamma)$, $\hat\eta_2(\gamma)$, and $\hat\eta_p(\gamma)$. This has essentially been done (with $\sqrt{n}$ scaling modifications) and such
tables are given in Bain (1978), Bain and Engelhardt (1991) and Thoman et al. (1969, 1970). Similar
tables for bounds on $p(y)$ are not quite possible since the appropriate bounds depend on the observed
value of $\hat p(y)$, which varies from sample to sample. Instead Bain (1978), Bain and Engelhardt (1991)
and Thoman et al. (1970) tabulate confidence bounds for $p(y)$ for a reasonably fine grid of values
for $\hat p(y)$, which can then serve for interpolation purposes with the actually observed value of $\hat p(y)$.
It should be quite clear that all this requires extensive tabulation. The use of these tables is not
easy. Table 4 in Bain (1978) does not have a consistent format and using these tables would
require delving deeply into the text for each new use, unless one does this kind of calculation all
the time. In fact, in the second edition, Bain and Engelhardt (1991), Table 4 has been greatly
reduced to just cover the confidence factors dealing with the location parameter $u$, and it now
leaves out the confidence factors for general p-quantiles. For the p-quantiles one is referred to the
same interpolation scheme that is needed when getting confidence bounds for $p(y)$, using Table 7
in Bain and Engelhardt (1991). The example that they present (page 248) would have benefitted
from showing some intermediate steps in the interpolation process. They point out that the resulting
confidence bound for $x_p$ is slightly different (14.03) from that obtained using the confidence quantiles
of the original Table 4, namely 13.92. They attribute the difference to round-off errors or other
discrepancies. Among the latter one may consider that possibly different simulations were involved.
Further, note that some entries in the tables given in Bain (1978) seem to have typos. Presumably
they were transcribed by hand from computer output, just as the book (and its second edition) itself
is typed and not typeset. We give just a few examples. In Bain (1978) Table 4A, p. 235, bottom
row, the second entry from the right should be 3.625 instead of 3.262. This discrepancy shows up
clearly when plotting the row values against log(p/(1 - p)), see a similar plot for a later example.
In Table 3A, p. 222, row 3 column 5 shows a double minus sign (still present in the 1991 second
edition). In comparing the values of these tables with our own simulation of pivot distribution
quantiles, just to validate our simulation for n = 40, we encountered an apparent error in Table
4A, p. 235 with last column entry of 4.826. Plotting log(p/(1 - p)) against the corresponding row
values ($\gamma$-quantiles) one clearly sees a change in pattern, see the top plot in Figure 9. We suspect
that the whole last column was calculated for p = .96 instead of the indicated p = .98. The bottom
plot shows our simulated values for these quantiles as solid dots with the previous points (circles)
superimposed.
The agreement is good for the first 8 points. Our simulated $\gamma$-quantile was 5.725 (corresponding to
the 4.826 above) and it fits quite smoothly into the pattern of the previous 8 points. Given that
this was the only case chosen for comparison it leaves some concern in fully trusting these tables.
However, this example also shows that the great majority of tabled values are valid.
11 The R Function WeibullPivots
Rather than using these tables we will resort to direct simulations ourselves since computing speed
has advanced suﬃciently over what was common prior to 1978. Furthermore, computing availability
has changed dramatically since then. It may be possible to further increase computing speed by
putting the loop over $N_{sim}$ calculations of mle's into compiled form rather than looping within R for
each simulation iteration. For example, using qbeta in vectorized form reduced the computing time
to almost 1/3 of the time compared to looping within R itself over the elements in the argument
vector of qbeta.
However, such an increase in speed would require writing C code (or Fortran code) and linking that
in compiled form to R. Such extensions of R are possible, see chapter 5 System and foreign language
interfaces in the Writing R Extensions manual available under the toolbar Help in R.
[Figure 9: Abnormal Behavior of Tabulated Confidence Quantiles. Top panel: Bain's tabled quantiles for
n = 40, $\gamma = 0.9$, with the last two points labeled 0.96 and 0.98. Bottom panel: our simulated quantiles
for n = 40, $\gamma = 0.9$. Vertical axis in both panels: log(p/(1 - p)).]
For the R function WeibullPivots (available within the R work space for Weibull Distribution
Applications on the class web site) the call
system.time(WeibullPivots(Nsim = 10000, n = 10, r = 10, graphics = F))
gave an elapsed time of 59.76 seconds. Here the default sample size n = 10 was used and r = 10
(also default) indicates that the 10 lowest sample values are given and used, i.e., in this case the
full sample. Also, an internally generated Weibull data set was used, since the default in the call to
WeibullPivots is weib.sample=NULL. For sample sizes n = 100 with r = 100 and n = 1000 with
r = 1000 the corresponding calls resulted in elapsed times of 78.22 and 269.32 seconds, respectively.
These three computing times suggest strong linear behavior in n as is illustrated in Figure 10. The
intercept 57.35 and slope of .2119 given here are fairly consistent with the intercept .005886 and
slope of $2.001 \times 10^{-5}$ given in Figure 4. The latter give the calculation time of a single set of mle's
while in the former case we calculate $N_{sim} = 10000$ such mle's, i.e., the previous slope and intercept
for a single mle calculation need to be scaled up by the factor 10000.
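The consistency claim is simple arithmetic and can be checked directly. The short Python sketch below
uses only the numbers quoted above (the variable names are made up): scaling the single-mle intercept
and slope by 10000 gives 58.86 and .2001, to be compared with the fitted 57.35 and .2119.

```python
n_sim = 10000
single_intercept, single_slope = 0.005886, 2.001e-5   # Figure 4: one set of mle's
fit_intercept, fit_slope = 57.35, 0.2119              # Figure 10: WeibullPivots with Nsim = 10000
observed = {10: 59.76, 100: 78.22, 1000: 269.32}      # elapsed seconds reported in the text

# Scale the single-mle timing line up by the number of simulated mle computations.
scaled_intercept = n_sim * single_intercept
scaled_slope = n_sim * single_slope

for n, seconds in observed.items():
    fitted = fit_intercept + fit_slope * n            # line fitted to the three timings
    scaled = scaled_intercept + scaled_slope * n      # prediction from the single-mle timings
    print(n, seconds, round(fitted, 2), round(scaled, 2))
```

The fitted line reproduces the three observed timings to within about a third of a second, while the
scaled-up single-mle line is close but not identical, in line with "fairly consistent".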
For all the previously discussed confidence bounds, be they upper or lower bounds for their respective
targets, all that is needed is the set of $(\hat u(\mathbf{z}_i), \hat b(\mathbf{z}_i))$ for $i = 1, \ldots, N_{sim}$. Thus we can construct
confidence bounds and intervals for $u$ and $b$, for $y_p$ for any collection of $p$ values, and for $p(y)$ and
$1 - p(y)$ for any collection of threshold values $y$ and we can do this for any set of confidence levels
that make sense for the simulated distributions, i.e., we don't have to run the simulations over and
over for each target parameter, confidence level, $p$ or $y$, unless one wants independent simulations
for some reason.
Proper use of this function only requires understanding the calling arguments, purpose, and output
of this function, and the time to run the simulations. The time for running the simulation should
easily beat the time spent in dealing with tabulated confidence quantiles in order to get desired
confidence bounds, especially since WeibullPivots does such calculations all at once for a broad
spectrum of $y_p$ and $p(y)$ and several confidence levels without greatly impacting the computing time.
Furthermore, WeibullPivots does all this not only for full samples but also for type II censored
samples, for which appropriate confidence factors are available only sparsely in tables.
We will now explain the calling sequence of WeibullPivots and its output. The calling sequence
with all arguments given with their default values is as follows:
WeibullPivots(weib.sample=NULL,alpha=10000,beta=1.5,n=10,r=10,
Nsim=1000,threshold=NULL,graphics=T)
Here Nsim = $N_{sim}$ has default value 1000 which is appropriate when trying to get a feel for the
function for any particular data set. The sample size $n$ is input as n = $n$ and r = $r$ indicates the
number of smallest sample values available for analysis. When $r < n$ we are dealing with a type II
censored data set where observation stops as soon as the smallest $r$ lifetimes have been observed.
We need $r > 1$ and at least two distinct observations among $X_{(1)}, \ldots, X_{(r)}$ in order to estimate
any spread in the data. The available sample values $X_1, \ldots, X_r$ (not necessarily ordered) are given
as vector input to weib.sample. When weib.sample=NULL (the default), an internal data set is
generated as input sample from $W(\alpha, \beta)$ with $\alpha$ = alpha = 10000 (default) and $\beta$ = beta = 1.5
(default), either by using the full sample $X_1, \ldots, X_n$ or a type II censored sample $X_1, \ldots, X_r$ when
$r < n$ is specified. The input threshold (= NULL by default) is a vector of thresholds $y$ for which
we desire upper confidence bounds for $p(y)$. The input graphics (default T) indicates whether
graphical output is desired.

[Figure 10: Timings for WeibullPivots for Various n. Elapsed time for WeibullPivots(Nsim = 10000)
plotted against sample size n (0 to 1000), with fitted line: intercept = 57.35, slope = 0.2119.]
Confidence levels $\gamma$ are set internally as .005, .01, .025, .05, .10, .2, .8, .9, .95, .975, .99, .995 and these
levels indicate the coverage probability for the individual one-sided bounds. A .025 lower bound
is reported as a .975 upper bound, and a pair of .975 lower and upper bounds constitute a 95%
confidence interval. The values of $p$ for which confidence bounds or intervals for $x_p$ are provided
are also set internally as .001, .005, .01, .025, .05, .1, .2, ..., .9 (in steps of .1), .95, .975, .99, .995, .999.
The output from WeibullPivots is a list with components:
$alpha.hat, $beta.hat, $alpha.beta.bounds, $p.quantile.estimates, $p.quantile.bounds,
$Tail.Probability.Estimates, and $Tail.Probability.Bounds. The structure and meaning of
these components will become clear from the example output given below.
$alpha.hat
(Intercept)
8976.2
$beta.hat
[1] 1.95
$alpha.beta.bounds
alpha.L alpha.U beta.L beta.U
99.5% 5094.6 16705 0.777 3.22
99% 5453.9 15228 0.855 3.05
97.5% 5948.6 13676 0.956 2.82
95% 6443.9 12608 1.070 2.64
90% 7024.6 11600 1.210 2.42
80% 7711.2 10606 1.390 2.18
$p.quantile.estimates
0.001quantile 0.005quantile 0.01quantile 0.025quantile 0.05quantile
259.9 593.8 848.3 1362.5 1957.0
0.1quantile 0.2quantile 0.3quantile 0.4quantile 0.5quantile
2830.8 4159.4 5290.4 6360.5 7438.1
0.6quantile 0.7quantile 0.8quantile 0.9quantile 0.95quantile
8582.7 9872.7 11457.2 13767.2 15756.3
0.975quantile 0.99quantile 0.995quantile 0.999quantile
17531.0 19643.5 21107.9 24183.6
$p.quantile.bounds
99.5% 99% 97.5% 95% 90% 80%
0.001quantile.L 1.1 2.6 6.0 12.9 28.2 60.1
0.001quantile.U 1245.7 1094.9 886.7 729.4 561.4 403.1
0.005quantile.L 8.6 16.9 31.9 57.4 106.7 190.8
0.005quantile.U 2066.9 1854.9 1575.1 1359.2 1100.6 845.5
0.01quantile.L 20.1 36.7 65.4 110.1 186.9 315.3
0.01quantile.U 2579.8 2361.5 2021.5 1773.9 1478.4 1165.8
0.025quantile.L 62.8 103.5 169.7 259.3 398.1 611.0
0.025quantile.U 3498.8 3206.6 2827.2 2532.5 2176.9 1783.5
0.05quantile.L 159.2 229.2 352.6 497.5 700.0 1011.9
0.05quantile.U 4415.7 4081.3 3673.7 3329.5 2930.0 2477.2
0.1quantile.L 398.3 506.3 717.4 962.2 1249.5 1679.1
0.1quantile.U 5584.6 5261.9 4811.6 4435.7 3990.8 3474.1
0.2quantile.L 1012.6 1160.2 1518.8 1882.9 2287.1 2833.2
0.2quantile.U 7417.1 6978.2 6492.8 6031.2 5543.2 4946.9
0.3quantile.L 1725.4 1945.2 2383.9 2820.0 3305.0 3929.0
0.3quantile.U 8919.8 8460.0 7939.8 7384.1 6865.0 6211.4
0.4quantile.L 2548.0 2848.2 3345.2 3806.6 4353.9 5008.2
0.4quantile.U 10616.3 10130.4 9380.3 8778.2 8139.3 7421.2
0.5quantile.L 3502.4 3881.1 4415.1 4873.3 5443.0 6107.3
0.5quantile.U 12809.0 11919.1 10992.9 10226.8 9485.1 8703.4
0.6quantile.L 4694.0 5022.6 5573.8 6052.8 6624.4 7300.4
0.6quantile.U 15626.1 14350.6 12941.3 11974.8 11041.1 10106.2
0.7quantile.L 6017.1 6399.0 6876.6 7345.8 7938.2 8628.0
0.7quantile.U 19271.6 17679.9 15545.8 14181.1 12958.0 11784.2
0.8quantile.L 7601.3 7971.0 8465.4 8933.5 9504.0 10244.2
0.8quantile.U 24765.2 22445.0 19286.0 17236.0 15605.6 13952.2
0.9quantile.L 9674.7 10033.7 10538.6 11031.1 11653.0 12460.3
0.9quantile.U 35233.4 31065.3 26037.4 22670.5 19835.3 17417.5
0.95quantile.L 11203.6 11584.6 12145.2 12660.2 13365.5 14311.2
0.95quantile.U 46832.9 40053.3 32863.1 27904.7 23903.9 20703.0
0.975quantile.L 12434.7 12833.5 13449.7 14030.5 14781.8 15909.1
0.975quantile.U 59783.1 49209.9 39397.8 33118.7 27938.4 23773.7
0.99quantile.L 13732.6 14207.7 14876.0 15530.0 16431.7 17783.1
0.99quantile.U 76425.0 61385.4 48625.4 40067.3 33233.8 27729.8
0.995quantile.L 14580.4 15115.4 15810.0 16530.6 17551.8 19081.0
0.995quantile.U 89690.9 71480.4 55033.4 45187.1 36918.7 30505.0
0.999quantile.L 16377.7 16885.9 17642.5 18557.1 19792.4 21744.7
0.999quantile.U 121177.7 95515.7 71256.5 56445.5 45328.1 36739.2
$Tail.Probability.Estimates
p(6000) p(7000) p(8000) p(9000) p(10000) p(11000) p(12000) p(13000)
0.36612 0.45977 0.55018 0.63402 0.70900 0.77385 0.82821 0.87242
p(14000) p(15000)
0.90737 0.93424
$Tail.Probability.Bounds
99.5% 99% 97.5% 95% 90% 80%
p(6000).L 0.12173 0.13911 0.16954 0.19782 0.23300 0.28311
p(6000).U 0.69856 0.67056 0.63572 0.59592 0.54776 0.49023
p(7000).L 0.17411 0.20130 0.23647 0.26985 0.31017 0.36523
p(7000).U 0.76280 0.73981 0.70837 0.67426 0.62988 0.57670
p(8000).L 0.23898 0.26838 0.30397 0.34488 0.38942 0.44487
p(8000).U 0.82187 0.80141 0.77310 0.74260 0.70414 0.65435
p(9000).L 0.30561 0.33149 0.37276 0.41748 0.46448 0.52203
p(9000).U 0.87042 0.85462 0.82993 0.80361 0.77045 0.72545
p(10000).L 0.36871 0.39257 0.44219 0.48549 0.53589 0.59276
p(10000).U 0.91227 0.89889 0.87805 0.85624 0.82667 0.78641
p(11000).L 0.41612 0.45097 0.50030 0.54631 0.59749 0.65671
p(11000).U 0.94491 0.93318 0.91728 0.89891 0.87425 0.83973
p(12000).L 0.46351 0.50388 0.55531 0.60133 0.65374 0.71215
p(12000).U 0.96669 0.95936 0.94650 0.93210 0.91231 0.88377
p(13000).L 0.50876 0.54776 0.60262 0.65055 0.70218 0.76148
p(13000).U 0.98278 0.97742 0.96794 0.95756 0.94149 0.91745
p(14000).L 0.54668 0.58696 0.64619 0.69359 0.74451 0.80178
p(14000).U 0.99201 0.98837 0.98205 0.97459 0.96267 0.94321
p(15000).L 0.58089 0.62534 0.68389 0.73194 0.78068 0.83590
p(15000).U 0.99653 0.99449 0.99092 0.98596 0.97764 0.96268
The above output was produced with
WeibullPivots(threshold = seq(6000, 15000, 1000), Nsim = 10000, graphics = T).
Since we entered graphics=T as argument we also got two pieces of graphical output. The first
gives the two intrinsic pivot distributions of $\hat{u}/\hat{b}$ and $\hat{b}$ in Figure 11. The second gives a Weibull
plot of the generated sample with a variety of information and with several types of confidence
bounds, see Figure 12. The legend in the upper left gives the mle's of $\alpha$, $\beta$ (agreeing with the
output above), and the mean $\mu = \alpha\Gamma(1 + 1/\beta)$ together with 95% confidence intervals, based on
respective normal approximation theory for the mle's. The legend in the lower right explains the
red fitted line (representing the mle fit) and the various pointwise confidence bound curves, giving
95% confidence intervals (blue dashed curves) for p-quantiles $x_p$ for any p on the ordinate and 95%
confidence intervals (green dot-dashed line) for p(y) for any y on the abscissa. Both of these interval
types use normal approximations from large sample mle theory. Unfortunately these two types of
bounds are not dual to each other, i.e., they do not coincide or, to say it differently, one is not
the inverse of the other.
Figure 11: Pivot Distributions of $\hat{u}/\hat{b}$ and $\hat{b}$
(Two histograms of the simulated pivots; ordinate: Frequency.)
Figure 12: Weibull Plot Corresponding to Previous Output
(Abscissa: cycles x 1000; ordinate: probability. Legend: m.l.e.; 95% q-confidence bounds;
95% p-confidence bounds; 95% monotone qp-confidence bounds. Annotations:
$\hat{\alpha}$ = 8976, 95% conf. interval (6394, 12600);
$\hat{\beta}$ = 1.947, 95% conf. interval (1.267, 2.991);
MTTF $\hat{\mu}$ = 7960, 95% conf. interval (5690, 11140);
n = 10, r = 10 failed cases.)
A third type of bound is presented in the orange curve which simultaneously provides 95% confidence
intervals for $x_p$ and $p(x)$, depending on the direction in which the curves are used. We either read
sideways from p and down from the curve (at that p level) to get upper and lower bounds for $x_p$, or
we read vertically up from an abscissa value x to read off upper and lower bounds for p(x) on the
ordinate axis as we go from the respective curves at that x value to the left. These latter bounds
are also based on normal mle approximation theory and the approximation will naturally suffer for
small sample sizes. However, the principle behind these bounds is a unifying one in that the same
curve is used for quantile and tail probability bounds. If instead of using the approximating normal
distribution one uses the parametric bootstrap approach (simulating samples from an estimated
Weibull distribution) the unifying principle reduces to the pivot simulation approach, i.e., is basically
exact except for the simulation aspect $N_{\rm sim} < \infty$.
The curves representing the latter (pivots with simulated distributions) are the solid black lines
connecting the solid black dots which represent the $x_p$ 95% confidence intervals (using the 97.5%
lower and upper bounds to $x_p$ given in our output example above). Also seen on these curves are
solid red dots that correspond to the abscissa values x = 6000, (1000), 15000 and viewed vertically
they represent 95% confidence intervals for p(x). This illustrates that the same curves are used.
Figure 13 represents an extreme case where we have a sample of size n = 2 and here another aspect
becomes apparent. Both of the first two types of bounds (blue and green) are no longer monotone
in p or x respectively. This is in the nature of an imperfect normal approximation for these two
approaches. Thus neither of them could (at least not generally) have served both purposes, i.e.,
provided bounds for $x_p$ and $p(x)$ simultaneously. However, the orange curve
is still monotone and still serves that dual purpose, although its coverage probability properties are
bound to be affected badly by the small sample size n = 2. The pivot based curves are also strictly
monotone and they have exact coverage probability, subject to the $N_{\rm sim} < \infty$ limitation.
Figure 13: Weibull Plot for Weibull Sample of Size n = 2
(Abscissa: cycles x 1000; ordinate: probability. Legend: m.l.e.; 95% q-confidence bounds;
95% p-confidence bounds; 95% monotone qp-confidence bounds. Annotations:
$\hat{\alpha}$ = 8125, 95% conf. interval (6954, 9493);
$\hat{\beta}$ = 9.404, 95% conf. interval (2.962, 29.85);
MTTF $\hat{\mu}$ = 7709, 95% conf. interval (6448, 9217);
n = 2, r = 2 failed cases.)
12 Regression
Here we extend the results of the previous location/scale model for log-transformed Weibull samples
to the more general regression model where the location parameter u for $Y_i = \log(X_i)$ can vary with
i, specifically it varies as a linear combination of known covariates $c_{i,1}, \ldots, c_{i,k}$ for the $i^{\rm th}$ observation
as follows:
$$u_i = \zeta_1 c_{i,1} + \ldots + \zeta_k c_{i,k} , \quad i = 1, \ldots, n ,$$
while the scale parameter b stays constant. Thus we have the following model for $Y_1, \ldots, Y_n$
$$Y_i = u_i + bZ_i = \zeta_1 c_{i,1} + \ldots + \zeta_k c_{i,k} + bZ_i , \quad i = 1, \ldots, n ,$$
with independent $Z_i \sim G(z) = 1 - \exp(-\exp(z))$, $i = 1, \ldots, n$, and unknown parameters $b > 0$ and
$\zeta_1, \ldots, \zeta_k \in R$.
Two concrete examples of this general linear model will be discussed in detail later on. The first is
the simple linear regression model and the other is the k-sample model, which exemplifies ANOVA
situations.
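The regression model above can be simulated directly by inverting G. The following is a minimal sketch in base R; the function name sim_logweibull_reg and the particular covariate and parameter values are illustrative assumptions, not part of the course workspace.

```r
# Simulate log-Weibull regression data Y_i = c_i zeta + b Z_i with Z_i ~ G.
# sim_logweibull_reg, C, zeta and b are hypothetical names chosen for this sketch.
sim_logweibull_reg <- function(C, zeta, b) {
  stopifnot(ncol(C) == length(zeta), b > 0)
  n <- nrow(C)
  # Inverse cdf of G(z) = 1 - exp(-exp(z)) applied to uniform random numbers
  Z <- log(-log(1 - runif(n)))
  drop(C %*% zeta) + b * Z
}

set.seed(498)
n <- 50
x <- 1:n - (n + 1) / 2          # centered covariate (assumed for illustration)
C <- cbind(1, x)                # first column of ones, as is customary
zeta <- c(log(10000), 0.05)     # intercept log(alpha) and slope
Y <- sim_logweibull_reg(C, zeta, b = 1 / 1.5)
X <- exp(Y)                     # Weibull failure times with shape beta = 1.5
```

Here X has characteristic life exp(c_i zeta) for the i-th observation and common shape 1/b, which is the Weibull form of the model.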
It can be shown (Scholz, 1996) that the mle's $\hat{\zeta} = (\hat{\zeta}_1, \ldots, \hat{\zeta}_k)$ and $\hat{b}$ of $\zeta$ and b exist and are unique
provided the covariate matrix C, consisting of the rows $c_i = (c_{i,1}, \ldots, c_{i,k})$, $i = 1, \ldots, n$, has full
rank k and n > k. It is customary that the first column of C is a vector of n 1's. Alternatively,
one can also only specify the remaining k − 1 columns and implicitly invoke the default option in
survreg that augments those columns with such a 1-vector. These two usages are illustrated in the
function WeibullReg which is given below.
It is very instructive to run this function as part of the following call:
system.time(for(i in 1:1000)WeibullReg()) ,
i.e., we execute the function WeibullReg a thousand times in close succession.
The rapidly varying plots give a good visual image of the sampling uncertainty and the resulting
sampling variation of the fitted lines. The fixed line represents the true line with respect to which
the Weibull data are generated by simulation. Of course, the log-Weibull data are plotted because
of their more transparent relationship to the true line. It is instructive to see the variability of the
data clouds around the true line, but also the basic stability of the overall cloud pattern as a whole.
On my laptop the elapsed time for this call is about 15 seconds, and this includes the plotting time.
When the plotting commands are commented out the elapsed time reduces to about 9 seconds.
This promises reasonable behavior with respect to the computing times that can be anticipated for
the confidence bounds to be discussed below.
library(survival) # provides Surv() and survreg()
WeibullReg <- function (n=50,x=NULL,alpha=10000,beta=1.5,slope=.05)
{
# We can either input our own covariate vector x of length n
# or such a vector is generated for us (default).
#
if(is.null(x)) x <- (1:n-(n+1)/2)
uvec <- log(alpha)+slope*x
b <- 1/beta
# Create the Weibull data
time <- exp(uvec+b*log(-log(1-runif(n))))
# Creating good vertical plotting limits
m <- min(uvec)+b*log(-log(1-1/(3*n+1)))
M <- max(uvec)+b*log(-log(1/(3*n+1)))
plot(x,log(time),ylim=c(m,M))
dat <- data.frame(time,x)
out <- survreg(Surv(time)~x,data=dat,dist="weibull")
# The last two lines would give the same result as the next three lines
# after removing the # signs.
# x0 <- rep(1,n)
# dat <- data.frame(time,x0,x)
# survreg(formula = Surv(time) ~ x0 + x - 1, data = dat, dist = "weibull")
# Here we created the vector x0 of ones explicitly and removed the implicit
# vector of ones by the -1 in ~ x0+x-1.
# Note also, that we did not use a status vector (of ones) in the creation
# of dat, since survreg will use status = 1 for each observation, i.e.,
# treat the given time as a failure time as default.
abline(log(alpha),slope) # true line
# estimated line
abline(out$coef[1],out$coef[2],col="blue",lty=2)
# Here out has several components, of which only
# out$coef and out$scale are of interest to us.
# The estimate out$scale is the mle of b=1/beta
# and out$coef is a vector that gives the mle's
# of intercept and the various regression coefficients.
out
}
12.1 Equivariance Properties
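From the existence and uniqueness of the mle's we can again deduce the following equivariance
properties for the mle's, namely for $r = Ca + \sigma z$ we have
$$\hat{\zeta}(r) = a + \sigma\hat{\zeta}(z) \quad\mbox{and}\quad \hat{b}(r) = \sigma\hat{b}(z) .$$
The proof follows the familiar line used in the location/scale case. With $r_i = c_i a + \sigma z_i$ we have
$$\sup_{b,\zeta}\left\{\prod_{i=1}^n \frac{1}{b}\, g\left(\frac{r_i - c_i\zeta}{b}\right)\right\}
= \frac{1}{\sigma^n}\sup_{b,\zeta}\left\{\prod_{i=1}^n \frac{1}{b/\sigma}\, g\left(\frac{z_i - c_i(\zeta - a)/\sigma}{b/\sigma}\right)\right\}$$
and, using $\tilde{\zeta} = (\zeta - a)/\sigma$ and $\tilde{b} = b/\sigma$,
$$= \frac{1}{\sigma^n}\sup_{\tilde{b},\tilde{\zeta}}\left\{\prod_{i=1}^n \frac{1}{\tilde{b}}\, g\left(\frac{z_i - c_i\tilde{\zeta}}{\tilde{b}}\right)\right\}
= \frac{1}{\sigma^n}\prod_{i=1}^n \frac{1}{\hat{b}(z)}\, g\left(\frac{z_i - c_i\hat{\zeta}(z)}{\hat{b}(z)}\right) .$$
On the other hand
$$\sup_{b,\zeta}\left\{\prod_{i=1}^n \frac{1}{b}\, g\left(\frac{r_i - c_i\zeta}{b}\right)\right\}
= \prod_{i=1}^n \frac{1}{\hat{b}(r)}\, g\left(\frac{r_i - c_i\hat{\zeta}(r)}{\hat{b}(r)}\right)
= \frac{1}{\sigma^n}\prod_{i=1}^n \frac{1}{\hat{b}(r)/\sigma}\, g\left(\frac{z_i - c_i(\hat{\zeta}(r) - a)/\sigma}{\hat{b}(r)/\sigma}\right)$$
and by the uniqueness of the mle's the equivariance claim is an immediate consequence.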
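The equivariance property can also be checked numerically. The sketch below assumes the survival package is installed and uses survreg on exp(z) and exp(r) so that the fits are made on the log scale; the values of a and sigma and the covariate vector are arbitrary choices for illustration. Agreement is only up to the convergence tolerance of survreg.

```r
library(survival)

set.seed(1)
n <- 30
x <- 1:n - (n + 1) / 2
z <- log(-log(1 - runif(n)))          # z_i drawn from G
a <- c(log(10000), 0.05)              # arbitrary shift vector (intercept, slope)
sigma <- 2 / 3                        # arbitrary scale factor
r <- a[1] + a[2] * x + sigma * z      # r = C a + sigma z with C = (1, x)

# survreg with dist = "weibull" fits log(time) = C zeta + b Z
fit_z <- survreg(Surv(exp(z)) ~ x, dist = "weibull")
fit_r <- survreg(Surv(exp(r)) ~ x, dist = "weibull")

zeta_r  <- unname(coef(fit_r))
zeta_eq <- a + sigma * unname(coef(fit_z))   # a + sigma * zeta_hat(z)
b_r     <- fit_r$scale
b_eq    <- sigma * fit_z$scale               # sigma * b_hat(z)

max_diff <- max(abs(c(zeta_r - zeta_eq, b_r - b_eq)))
```

The quantity max_diff should be negligible compared with the size of the estimates, which is what the uniqueness argument predicts.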
12.2 Pivots and Conﬁdence Bounds
From these equivariance properties it follows that $(\hat{\zeta} - \zeta)/\hat{b}$ and $\hat{b}/b$ have distributions that do not
depend on any unknown parameters, i.e., b and $\zeta$. The log-transformed Weibull data have the
following regression structure $Y = C\zeta + bZ$, where $Z = (Z_1, \ldots, Z_n)$ consists of independent and
identically distributed components with known cdf $G(z) = 1 - \exp(-\exp(z))$. From the equivariance
property we have that
$$\hat{\zeta}(Y) = \zeta + b\,\hat{\zeta}(Z) \quad\mbox{and}\quad \hat{b}(Y) = b\,\hat{b}(Z) .$$
Thus
$$\frac{\hat{\zeta}(Y) - \zeta}{\hat{b}(Y)} = \frac{b\,\hat{\zeta}(Z)}{b\,\hat{b}(Z)} = \frac{\hat{\zeta}(Z)}{\hat{b}(Z)}
\quad\mbox{and}\quad
\frac{\hat{b}(Y)}{b} = \frac{b\,\hat{b}(Z)}{b} = \hat{b}(Z) ,$$
which have a distribution free of any unknown parameters. This distribution can be approximated
to any desired degree via simulation, just as in the location/scale case, except that we will need to
incorporate the known covariate matrix C in the call to survreg in order to get the $N_{\rm sim}$ simulated
parameter vectors $(\hat{\zeta}(Z^*_1), \hat{b}(Z^*_1)), \ldots, (\hat{\zeta}(Z^*_{N_{\rm sim}}), \hat{b}(Z^*_{N_{\rm sim}}))$ and thus the empirical distribution of
$(\hat{\zeta}(Z^*_1)/\hat{b}(Z^*_1), \hat{b}(Z^*_1)), \ldots, (\hat{\zeta}(Z^*_{N_{\rm sim}})/\hat{b}(Z^*_{N_{\rm sim}}), \hat{b}(Z^*_{N_{\rm sim}}))$.
For any target covariate vector $c_0 = (c_{0,1}, \ldots, c_{0,k})$ the distribution of $(c_0\hat{\zeta}(Y) - c_0\zeta)/\hat{b}(Y)$ is free
of unknown parameters since
$$\frac{c_0\hat{\zeta}(Y) - c_0\zeta}{\hat{b}(Y)} = \frac{c_0\hat{\zeta}(Z)}{\hat{b}(Z)}$$
and we can use the simulated values $c_0\hat{\zeta}(Z^*_i)/\hat{b}(Z^*_i)$, $i = 1, \ldots, N_{\rm sim}$, to approximate this
parameter-free distribution. If $\hat{\eta}_2(\gamma, c_0)$ denotes the $\gamma$-quantile of this simulated distribution then we
can view $c_0\hat{\zeta}(Y) - \hat{\eta}_2(\gamma, c_0)\,\hat{b}(Y)$ as an approximate $100\gamma\%$ lower bound for $c_0\zeta$, the log of the
characteristic life at the covariate vector $c_0$. This can be demonstrated as in the location/scale case
for the location parameter u.
Similarly, if $\hat{\eta}_1(\gamma)$ is the $\gamma$-quantile of the simulated $\hat{b}(Z^*_i)$, $i = 1, \ldots, N_{\rm sim}$, then we can view
$\hat{b}(Y)/\hat{\eta}_1(\gamma)$ as approximate $100\gamma\%$ lower bound for b. We note here that these quantiles $\hat{\eta}_1(\gamma)$
and $\hat{\eta}_2(\gamma, c_0)$ depend on the original covariate matrix C, i.e., they differ from those used in the
location/scale case. The same comment applies to the other confidence bound procedures following
below.
For a given covariate vector $c_0$ we can target the p-quantile $y_p(c_0) = c_0\zeta + b\,w_p$ of the Y distribution
with covariate dependent location parameter $u(c_0) = c_0\zeta$ and scale parameter b, where
$w_p = \log(-\log(1-p))$ is the p-quantile of G. We can calculate
$c_0\hat{\zeta}(Y) - \hat{k}_p(\gamma)\,\hat{b}(Y)$ as an approximate $100\gamma\%$ lower bound for $y_p(c_0)$, where $\hat{k}_p(\gamma)$ is the $\gamma$-quantile
of the simulated $(c_0\hat{\zeta}(Z^*_i) - w_p)/\hat{b}(Z^*_i)$, $i = 1, \ldots, N_{\rm sim}$.
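The quantile bound just described can be sketched in R as follows. This is only an illustration under stated assumptions: the helper fit_mle, the covariate levels, the true parameters and the small Nsim = 200 (chosen for speed) are all hypothetical, and the survival package is assumed to be available.

```r
library(survival)

# Hypothetical helper: returns c(zeta1_hat, zeta2_hat, b_hat) for log-data y
fit_mle <- function(y, x) {
  fit <- survreg(Surv(exp(y)) ~ x, dist = "weibull")
  c(unname(coef(fit)), fit$scale)
}

set.seed(12)
n <- 20
Nsim <- 200                                   # small, for a quick illustration
x <- rep(c(-2, 0, 2, 4, 6), each = 4)         # assumed covariate levels
y <- log(10000) + 0.1 * x + (2 / 3) * log(-log(1 - runif(n)))
est <- fit_mle(y, x)                          # mle's from the observed sample

# Simulated mle's from standard samples Z*, using the same covariates
sim <- t(replicate(Nsim, fit_mle(log(-log(1 - runif(n))), x)))

p <- 0.10
gamma <- 0.95
w_p <- log(-log(1 - p))                       # p-quantile of G
c0 <- c(1, 3)                                 # target covariate vector (1, c)

# Pivot values (c0 zeta*(Z*) - w_p) / b*(Z*) and their gamma-quantile
piv <- as.vector((sim[, 1:2] %*% c0 - w_p) / sim[, 3])
k_hat <- unname(quantile(piv, gamma))

y_p_hat <- sum(c0 * est[1:2]) + est[3] * w_p      # estimated p-quantile
lower   <- sum(c0 * est[1:2]) - k_hat * est[3]    # approx. 95% lower bound
```

Repeating the last block over a grid of values c in c0 = (1, c) traces out a lower confidence curve, since k_hat changes with c0.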
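For the tail probability $p(y_0) = G((y_0 - c_0\zeta)/b)$ with given threshold $y_0$ and covariate vector $c_0$ we
obtain an approximate $100\gamma\%$ upper bound by using the $\gamma$-quantile of the simulated values
$$G\left(c_0\hat{\zeta}(Z^*_i) - x\,\hat{b}(Z^*_i)\right) , \quad i = 1, \ldots, N_{\rm sim} ,$$
where $x = (c_0\hat{\zeta}(y) - y_0)/\hat{b}(y)$ and y is the originally observed sample vector, obtained under the
covariate conditions specified through C. This sign convention follows from the equivariance
properties, which give $(y_0 - c_0\zeta)/b = c_0\hat{\zeta}(Z) - x\,\hat{b}(Z)$ when $x = (c_0\hat{\zeta}(Y) - y_0)/\hat{b}(Y)$.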
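A sketch of the tail probability bound in R is given below. It takes $x = (c_0\hat{\zeta}(y) - y_0)/\hat{b}(y)$, the sign convention under which $G(c_0\hat{\zeta}(Z^*) - x\,\hat{b}(Z^*))$ agrees with the pivot identity; the helper fit_mle, the threshold, the covariates and Nsim = 200 are assumptions made for illustration only.

```r
library(survival)

G <- function(z) -expm1(-exp(z))              # G(z) = 1 - exp(-exp(z))

# Hypothetical helper: returns c(zeta1_hat, zeta2_hat, b_hat) for log-data y
fit_mle <- function(y, x) {
  fit <- survreg(Surv(exp(y)) ~ x, dist = "weibull")
  c(unname(coef(fit)), fit$scale)
}

set.seed(21)
n <- 20
Nsim <- 200                                   # small, for a quick illustration
x <- rep(c(-2, 0, 2, 4, 6), each = 4)         # assumed covariate levels
y <- log(10000) + 0.1 * x + (2 / 3) * log(-log(1 - runif(n)))
est <- fit_mle(y, x)

sim <- t(replicate(Nsim, fit_mle(log(-log(1 - runif(n))), x)))

y0 <- log(5000)                               # threshold on the log scale
c0 <- c(1, 3)                                 # target covariate vector
x_obs <- (sum(c0 * est[1:2]) - y0) / est[3]   # observed standardized distance

p_hat   <- G(-x_obs)                          # plug-in estimate of p(y0)
vals    <- G(as.vector(sim[, 1:2] %*% c0) - x_obs * sim[, 3])
p_upper <- unname(quantile(vals, 0.95))       # approx. 95% upper bound
```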
We note here that the above confidence bounds for the log(characteristic life) or regression location,
p-quantiles and tail probabilities depend on the covariate vector $c_0$ that is specified. Not only does
this dependence arise through the use of $c_0\hat{\zeta}(Y)$ in each case but also through the simulated
distributions which incorporate $c_0$ in each of these three situations. The only exception is the
confidence bound for b, which makes sense since we assumed a constant scale for all covariate
situations.
12.3 The Simple Linear Regression Model
Here we assume the following simple linear regression model for the $Y_i = \log(X_i)$
$$Y_i = \zeta_1 + \zeta_2 c_i + bZ_i , \quad i = 1, \ldots, n .$$
In matrix notation this becomes
$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}
= \begin{pmatrix} 1 & c_1 \\ \vdots & \vdots \\ 1 & c_n \end{pmatrix}
\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}
+ b \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}
= C\zeta + bZ .$$
Here $\zeta_1$ and $\zeta_2$ represent the intercept and slope parameters in the straight line regression model for
the location parameter and b represents the degree of scatter (scale) around that line. In the context
of the general regression model we have k = 2 here and $c_{i,1} = 1$ and $c_{i,2} = c_i$ for $i = 1, \ldots, n$. The
conditions for existence and uniqueness of the mle's are satisfied when the covariate values $c_1, \ldots, c_n$
are not all the same.
The R function call system.time(WeibullRegSim(n=20,Nsim=10000)) (done twice and recording
an elapsed time of about 76 seconds each) produced each of the plots in Figure 14. Each call
generates its own data set of 20 points using 5 different levels of covariate values. The data are
generated from a true Weibull distribution with a known true regression line relationship for $\log(\alpha)$
in relation to the covariates, as shown in the plots. Also shown in these plots is the 0.10-quantile
line. Estimated lines are indicated by the corresponding color coded dashed lines.
In contrast, the quantile lower confidence bounds based on $N_{\rm sim} = 10000$ simulations are represented
by a curve. This results from the fact that the factor $\hat{k}_p(\gamma)$ used in the construction of the lower
bound, $\hat{\zeta}_1(Y) + \hat{\zeta}_2(Y)c - \hat{k}_p(\gamma)\,\hat{b}(Y)$, is the $\gamma$-quantile of the simulated values $(c_0\hat{\zeta}(Z^*_i) - w_p)/\hat{b}(Z^*_i)$,
$i = 1, \ldots, N_{\rm sim}$, and these values change depending on which $c_0 = (1, c)$ is involved. This curvature
adjusts to some extent to the sampling variation swivel action in the fitted line.
We repeated the above with a sample of size n = 50 (taking about 85 seconds for each plot) and the
corresponding two plots are shown in Figure 15. We point out two features. In this second set of
plots the lower confidence bound curve is generally closer to the fitted quantile line than in the first
set of plots. This illustrates the sample size effect, i.e., we are getting better or less conservative in
our bounds. The second feature shows up in the bottom plot where the confidence curve crosses
the true percentile line, i.e., it gets on the wrong side of it. Such things happen, because we have
only 95% confidence in the bound. Note that these bounds should be interpreted pointwise for
each covariate value and should not be viewed as simultaneous confidence bands.
The function WeibullRegSim is part of the R workspace on the class web site. It can easily be
modified to handle any simple linear regression Weibull data set. Multiple regression relationships
could also be accommodated quite easily. To get a feel for the behavior of the confidence bounds it
is useful to exercise this function repeatedly, but using $N_{\rm sim} = 1000$ for faster response.
Figure 14: Weibull Regression with Quantile Bounds (n = 20)
(Two panels; abscissa: covariate; ordinate: log(time). Legend: true characteristic life;
true 0.1-quantile; estimated characteristic life; estimated 0.1-quantile;
95% lower bound to 0.1-quantile. Total sample size n = 20; confidence bounds based on
10000 simulations.)
Figure 15: Weibull Regression with Quantile Bounds (n = 50)
(Two panels; abscissa: covariate; ordinate: log(time). Legend: true characteristic life;
true 0.1-quantile; estimated characteristic life; estimated 0.1-quantile;
95% lower bound to 0.1-quantile. Total sample size n = 50; confidence bounds based on
10000 simulations.)
12.4 The kSample Problem
A second illustrative example concerns the situation of k = 3 samples with the same scale but possibly
different locations. In terms of the untransformed Weibull data this means that we have possibly
different unknown characteristic life parameters $(\alpha_1, \alpha_2, \alpha_3)$ but the same unknown shape $\beta$ for each
sample. The modifications for $k \neq 3$ should be obvious. In matrix notation this model is
$$Y = \begin{pmatrix}
Y_1 \\ \vdots \\ Y_{n_1} \\ Y_{n_1+1} \\ \vdots \\ Y_{n_1+n_2} \\ Y_{n_1+n_2+1} \\ \vdots \\ Y_{n_1+n_2+n_3}
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\
1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\
1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \zeta_1 \\ \zeta_2 \\ \zeta_3 \end{pmatrix}
+ b \begin{pmatrix}
Z_1 \\ \vdots \\ Z_{n_1} \\ Z_{n_1+1} \\ \vdots \\ Z_{n_1+n_2} \\ Z_{n_1+n_2+1} \\ \vdots \\ Z_{n_1+n_2+n_3}
\end{pmatrix}
= C\zeta + bZ .$$
Here the $Y_i$ have location $u_1 = \zeta_1$ for the first $n_1$ observations, they have location $u_2 = \zeta_1 + \zeta_2$ for
the next $n_2$ observations and they have location $u_3 = \zeta_1 + \zeta_3$ for the last $n_3$ observations. Thus we
can consider $u_1 = \zeta_1$ as the baseline location (represented by the first $n_1$ observations), $\zeta_2$ can be
considered as the incremental change from $u_1$ to $u_2$ and $\zeta_3$ is the incremental change from $u_1$ to $u_3$.
If we were interested in the question whether the three samples come from the same location/scale
model we would consider testing the hypothesis $H_0: \zeta_2 = \zeta_3 = 0$ or equivalently $H_0: u_1 = u_2 = u_3$.
Instead of using the likelihood ratio test, which invokes the $\chi^2_{k-1} = \chi^2_2$ distribution as approximate
null distribution, we will employ the test statistic suggested in Lawless (1982) (p. 302, equation
(6.4.12)) for which the same approximate null distribution is invoked. Our reason for following this
choice is its similarity to the standard test statistic used in the corresponding normal distribution
model, i.e., when $Z_i \sim \Phi(z)$ instead of $Z_i \sim G(z)$ as in the above regression model. Also, the
modification of this test statistic for general $k\ (\neq 3)$ is obvious. The formal definition of the test
statistic proposed by Lawless is as follows:
$$\Lambda_1 = (\hat{\zeta}_2, \hat{\zeta}_3)\, K_{11}^{-1}\, (\hat{\zeta}_2, \hat{\zeta}_3)^t ,$$
where $K_{11}$ is the asymptotic $2 \times 2$ covariance matrix of $(\hat{\zeta}_2, \hat{\zeta}_3)$. Without going into the detailed
derivation one can give the following alternate and more transparent expression for $\Lambda_1$
$$\Lambda_1 = \sum_{i=1}^{3} \frac{n_i\,(\hat{u}_i(Y) - \hat{u}(Y))^2}{\hat{b}(Y)^2} ,$$
where
$$\hat{u}_1(Y) = \hat{\zeta}_1(Y) , \quad \hat{u}_2(Y) = \hat{\zeta}_1(Y) + \hat{\zeta}_2(Y) , \quad \hat{u}_3(Y) = \hat{\zeta}_1(Y) + \hat{\zeta}_3(Y)
\quad\mbox{and}\quad \hat{u}(Y) = \sum_{i=1}^{3} (n_i/N)\,\hat{u}_i(Y) ,$$
with $N = n_1 + n_2 + n_3$. In the normal case $\Lambda_1$ reduces to the traditional F-test statistic (except for a
constant multiplier, namely $(N-k)/((k-1)N) = (N-3)/(2N)$) when writing $\hat{u}_i(Y) = \bar{Y}_{i.}$, $i = 1, 2, 3$
and $\hat{u}(Y) = \bar{Y}_{..} = (n_1/N)\bar{Y}_{1.} + (n_2/N)\bar{Y}_{2.} + (n_3/N)\bar{Y}_{3.}$ and
$$\hat{b}(Y)^2 = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2 ,$$
which are the corresponding mle's in the normal case. However, in the normal case one uses the
$F_{k-1,N-k}$ distribution as the exact null distribution of the properly scaled $\Lambda_1$ and the uncertainty
in $\hat{b}(Y)^2$ is not ignored by simply referring to the $\chi^2_{k-1}$ distribution, using a large sample argument.
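The relation between $\Lambda_1$ and the F statistic in the normal case can be verified numerically. The following base R sketch uses an arbitrary simulated normal data set with group sizes 5, 7 and 9 (an assumption for illustration) and compares the rescaled $\Lambda_1$ with the F value reported by anova.

```r
set.seed(3)
n_i <- c(5, 7, 9)                       # group sizes (assumed for illustration)
N <- sum(n_i)
k <- length(n_i)
grp <- factor(rep(1:k, n_i))
y <- rnorm(N)                           # normal data, equal means and variances

ybar_i <- as.vector(tapply(y, grp, mean))        # group means, u_hat_i
ybar   <- sum(n_i / N * ybar_i)                  # weighted mean, u_hat
b2_hat <- sum((y - ybar_i[as.integer(grp)])^2) / N   # normal-theory mle of b^2

Lambda1 <- sum(n_i * (ybar_i - ybar)^2) / b2_hat
F_from_Lambda <- (N - k) / ((k - 1) * N) * Lambda1   # rescaled Lambda_1
F_anova <- anova(lm(y ~ grp))[["F value"]][1]        # classical one-way F
```

The two values F_from_Lambda and F_anova agree up to rounding error, which is the content of the constant multiplier $(N-k)/((k-1)N)$.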
We don't have to use a large sample approximation either, since the null distribution of $\Lambda_1$ (in the
log-Weibull case) is free of any unknown parameters and can be simulated to any desired degree of
accuracy. This is seen as follows from our equivariance properties. Recall that
$$\frac{\hat{u}_1(Y) - u_1}{\hat{b}(Y)} = \frac{\hat{\zeta}_1(Y) - \zeta_1}{\hat{b}(Y)} , \quad
\frac{\hat{u}_i(Y) - u_i}{\hat{b}(Y)} = \frac{\hat{\zeta}_1(Y) + \hat{\zeta}_i(Y) - (\zeta_1 + \zeta_i)}{\hat{b}(Y)} , \quad i = 2, 3$$
have distributions free of unknown parameters. Under the hypothesis $H_0$ when $u_1 = u_2 = u_3\ (= u)$
we thus have that
$$\frac{\hat{u}_i(Y) - u}{\hat{b}(Y)} , \quad \frac{\hat{u}(Y) - u}{\hat{b}(Y)} , \quad\mbox{and thus}\quad
\frac{\hat{u}_i(Y) - \hat{u}(Y)}{\hat{b}(Y)} = \frac{\hat{u}_i(Y) - u}{\hat{b}(Y)} - \frac{\hat{u}(Y) - u}{\hat{b}(Y)}$$
have distributions free of any unknown parameters which in turn implies the above claim about $\Lambda_1$.
Thus we can estimate the null distribution of $\Lambda_1$ by using the $N_{\rm sim}$ simulated values of $\hat{\zeta}_i(Z^*_j)/\hat{b}(Z^*_j)$
to create
$$\frac{\hat{u}_1(Z^*_j)}{\hat{b}(Z^*_j)} = \frac{\hat{\zeta}_1(Z^*_j)}{\hat{b}(Z^*_j)} , \quad
\frac{\hat{u}_i(Z^*_j)}{\hat{b}(Z^*_j)} = \frac{\hat{\zeta}_1(Z^*_j) + \hat{\zeta}_i(Z^*_j)}{\hat{b}(Z^*_j)} , \quad i = 2, 3
\quad\mbox{and}\quad
\frac{\hat{u}(Z^*_j)}{\hat{b}(Z^*_j)} = \frac{\sum_{i=1}^{3} n_i\,\hat{u}_i(Z^*_j)/N}{\hat{b}(Z^*_j)}$$
and thus
$$\Lambda_1(Z^*_j) = \sum_{i=1}^{3} \frac{n_i\,(\hat{u}_i(Z^*_j) - \hat{u}(Z^*_j))^2}{\hat{b}(Z^*_j)^2} , \quad j = 1, \ldots, N_{\rm sim} .$$
The distribution of these $N_{\rm sim}$ values $\Lambda_1(Z^*_j)$ will give a very good approximation for the true null
distribution of $\Lambda_1$. The accuracy of this approximation is entirely controllable by the choice of $N_{\rm sim}$.
$N_{\rm sim} = 10000$ should be sufficient for most practical purposes.
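The simulation of the $\Lambda_1$ null distribution can be sketched as follows. The function name lambda1 is hypothetical, the survival package is assumed to be installed, and Nsim = 300 is used only to keep the run short; the default treatment contrasts of a factor in survreg reproduce the covariate matrix C of the 3-sample model.

```r
library(survival)

# Hypothetical helper: Lambda_1 for log-data y and a 3-level group factor
lambda1 <- function(y, grp) {
  fit <- survreg(Surv(exp(y)) ~ grp, dist = "weibull")
  z <- unname(coef(fit))                    # zeta1_hat, zeta2_hat, zeta3_hat
  u <- c(z[1], z[1] + z[2], z[1] + z[3])    # u_hat_1, u_hat_2, u_hat_3
  n_i <- as.vector(table(grp))
  N <- sum(n_i)
  ubar <- sum(n_i * u) / N                  # weighted average u_hat
  sum(n_i * (u - ubar)^2) / fit$scale^2
}

set.seed(7)
n_i <- c(5, 7, 9)
grp <- factor(rep(1:3, n_i))
Nsim <- 300                                 # small, for a quick illustration

# Null distribution: all three samples are standard log-Weibull samples Z*
lam <- replicate(Nsim, lambda1(log(-log(1 - runif(sum(n_i)))), grp))

q95_sim <- unname(quantile(lam, 0.95))      # simulated .95-quantile of Lambda_1
q95_chi <- qchisq(0.95, 2)                  # asymptotic chi-square counterpart
p_on_chi_scale <- pchisq(q95_sim, 2)        # where q95_sim falls on that scale
```

A simulated p-value for an observed value of $\Lambda_1$ would then be the proportion of the values in lam that are at least as large as the observed one.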
The following plots examine the $\chi^2_2$ approximation to the $\Lambda_1$ null distribution in the case of 3
samples of respective sizes $n_1 = 5$, $n_2 = 7$ and $n_3 = 9$. This is far from qualifying for a large
sample situation. The histogram in Figure 16 is based on $N_{\rm sim} = 10000$ simulated values of $\Lambda_1(Z^*)$.
Although the superimposed $\chi^2_2$ density is similar in character, there are strong differences. Using
the $\chi^2_2$ distribution would result in much smaller p-values than appropriate when these are on the
low side.
Figure 16: Histogram of $\Lambda_1$ Null Distribution with Asymptotic Approximation
(Abscissa: $\Lambda_1^*$; ordinate: Density; superimposed curve: asymptotic $\chi^2_2$ density.)
Figure 17 shows a QQ-plot of the approximating $\chi^2_2$ quantiles corresponding to the $N_{\rm sim} = 10000$
simulated and ordered $\Lambda_1(Z^*_i)$ values. For a good approximation the points should follow the
shown main diagonal. The discrepancy is quite strong. Each point on this plot corresponds to a
p-quantile where the abscissa value of such a point gives us the approximate p-quantile of the $\Lambda_1$
null distribution and the corresponding ordinate gives us the p-quantile of the $\chi^2_2$ distribution which
is suggested as asymptotic approximation. The vertical probability scale facilitates the reading off
of p for each quantile level on either axis.
Figure 17: QQ-Plots Comparing $\Lambda_1(Z^*)$ with $\chi^2_2$
(Abscissa: simulated & sorted $\Lambda_1^*$; ordinate: $\chi^2_2$-quantiles, with a probability scale
from 0.9 to 0.9999.)
Clearly the p-quantiles of the $\chi^2_2$ distribution are smaller than the corresponding p-quantiles of
the $\Lambda_1$ null distribution. If we take the p corresponding to a $\Lambda_1$ on the abscissa and look up its $p^*$
according to the $\chi^2_2$ scale, we only need to go up to the main diagonal at that abscissa location and
look up the $p^*$ on the $\chi^2_2$ scale to the left. For example, the .95-quantile for the $\Lambda_1$ null distribution
would yield a $p^* \approx .994$ on the $\chi^2_2$ scale. Thus a true $\Lambda_1$ p-value of .05 would translate into a very
overstated observed significance level of .006 according to the $\chi^2_2$ approximation.
Figure 18 shows the comparison of the $\chi^2_2$ quantiles with the corresponding quantiles of the $2 \times F_{2,21-3}$
distribution (the factor 2 adjusts for the fact that the numerator of the F statistic is divided by
k − 1 = 2). This comparison is the counterpart to the previous situation if we were to use the
asymptotic $\chi^2_2$ distribution as approximation to the exact and true $2 \times F$-ratio distribution had
we carried out the corresponding test in a normal data situation, i.e., tested whether three normal
random samples with common $\sigma$ have the same mean. Again, the approximation is not good. The
departure from the main diagonal is not as severe as in Figure 17, but the effect is similar, i.e., the
$\chi^2_2$ distribution would yield much smaller p-values than appropriate when these p-values are on the
small side, i.e., when they become critical.
Figure 18: QQ-Plots Comparing $2 \times F_{2,21-3}$ with $\chi^2_2$
(Abscissa: 2*qf((1:1000−.5)/1000,2,18); ordinate: qchisq((1:1000−.5)/1000,2), with a
probability scale from 0.9 to 0.9995.)
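The comparison between the $\chi^2_2$ quantiles and those of $2 \times F_{2,18}$ can be reproduced at a few selected levels with base R quantile functions. The particular levels in p are chosen here only for illustration.

```r
p <- c(0.90, 0.95, 0.99, 0.999)

q_exact <- 2 * qf(p, 2, 18)     # quantiles of 2 x F(2, 21 - 3), exact in the normal case
q_chi   <- qchisq(p, 2)         # quantiles of the asymptotic chi-square(2)

# p-value that the chi-square approximation reports at the exact critical values
p_reported <- pchisq(q_exact, 2, lower.tail = FALSE)

comparison <- cbind(true_p = 1 - p, q_exact, q_chi, p_reported)
print(comparison)
```

In each row p_reported is smaller than true_p, i.e., the $\chi^2_2$ approximation overstates the significance, in line with the departure from the main diagonal in the QQ-plot.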
Aside from testing the equality of log-Weibull distributions we can also obtain the various types
of confidence bounds for the location, scale, p-quantiles and tail probabilities for each sampled
population, whether the k locations are the same or not. This is different from doing so for each
sample separately since we use all k samples to estimate b, which was assumed to be the same for
all k populations. This will result in tighter confidence bounds than what would result from the
corresponding confidence bound analysis of individual samples. We omit the specific details since
they should be clear from the general situation as applied in the simple linear regression case.
12.5 Goodness of Fit Tests
As in the location/scale case we can exploit the equivariance properties of the mle's in the general
regression model to carry out the previously discussed goodness of fit tests by simulation. Using
the previous computational formulas for $D$, $W^2$ and $A^2$ we only need to define the appropriate
$V_i$, namely
$$V_i = G\left(\frac{Y_i - c_i\hat{\zeta}(Y)}{\hat{b}(Y)}\right) , \quad i = 1, \ldots, n .$$
Pierce and Kopecky (1979) showed that the asymptotic null distributions of $D$, $W^2$ and $A^2$, using
the sorted values $V_{(1)} \leq \ldots \leq V_{(n)}$ of these modified versions of $V_i$, are respectively the same as in
the location/scale case, i.e., they do not depend on the additional covariates that may be present.
This assumes that the covariate matrix C contains a vector of ones. However, for finite sample
sizes the effects of these covariates may still be relevant. The effect of using the small sample tables
given by Stephens (1986) is not clear.
One can, however, easily simulate the null distributions of these statistics since they do not depend
on any unknown parameters. Using the data representation $Y_i = c_i\zeta + bZ_i$ with i.i.d. $Z_i \sim G(z)$,
or $Y = C\zeta + bZ$, this is seen from the equivariance properties as follows
$$\frac{Y_i - c_i\hat{\zeta}(Y)}{\hat{b}(Y)} = \frac{c_i\zeta + bZ_i - c_i(\zeta + b\,\hat{\zeta}(Z))}{b\,\hat{b}(Z)} = \frac{Z_i - c_i\hat{\zeta}(Z)}{\hat{b}(Z)}$$
and thus
$$V_i = G\left(\frac{Y_i - c_i\hat{\zeta}(Y)}{\hat{b}(Y)}\right) = G\left(\frac{Z_i - c_i\hat{\zeta}(Z)}{\hat{b}(Z)}\right) .$$
For any covariate matrix C and sample size n the null distributions of $D$, $W^2$ and $A^2$ can be
approximated to any desired degree. All we need to do is generate vectors $Z^* = (Z^*_1, \ldots, Z^*_n)$ i.i.d.
$\sim G(z)$, compute the mle's $\hat{\zeta}(Z^*)$, $\hat{b}(Z^*)$, and from that $V^*$, followed by $D^* = D(V^*)$, $W^{2*} = W^2(V^*)$
and $A^{2*} = A^2(V^*)$. Repeating this a large number of times, say $N_{\rm sim} = 10000$, would yield values
$D^*_i, W^{2*}_i, A^{2*}_i$, $i = 1, \ldots, N_{\rm sim}$. Their respective empirical distributions would serve as excellent
approximations to the desired null distributions of these test of fit criteria.
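The simulation recipe above can be sketched in R as follows. This is not the WeibullRegGOF function from the class workspace; the helper names edf_stats and gof_stats are hypothetical, the formulas for $D$, $W^2$ and $A^2$ are the standard computational forms (assumed here to match the ones given earlier in these notes), and Nsim = 200 is used only to keep the run short.

```r
library(survival)

G <- function(z) -expm1(-exp(z))              # G(z) = 1 - exp(-exp(z))

# Standard computational formulas for the EDF statistics based on V_1, ..., V_n
edf_stats <- function(V) {
  V <- sort(V)
  n <- length(V)
  i <- 1:n
  D  <- max(i / n - V, V - (i - 1) / n)
  W2 <- sum((V - (2 * i - 1) / (2 * n))^2) + 1 / (12 * n)
  A2 <- -n - mean((2 * i - 1) * (log(V) + log(1 - rev(V))))
  c(D = D, W2 = W2, A2 = A2)
}

# Fit the Weibull regression to log-data y and return D, W2, A2
gof_stats <- function(y, x) {
  fit <- survreg(Surv(exp(y)) ~ x, dist = "weibull")
  res <- (y - as.vector(cbind(1, x) %*% coef(fit))) / fit$scale
  edf_stats(G(res))
}

set.seed(5)
x <- rep(1:5 - 2.5, each = 20)                # 20 observations at each c = i - 2.5
n <- length(x)
y <- log(10000) + 1.3 * x + (2 / 3) * log(-log(1 - runif(n)))
obs <- gof_stats(y, x)                        # observed D, W2, A2

Nsim <- 200                                   # small, for a quick illustration
null <- replicate(Nsim, gof_stats(log(-log(1 - runif(n))), x))  # 3 x Nsim matrix
pvals <- rowMeans(null >= obs)                # simulated p-values
```

Replacing the generated y by log failure times from a real data set, with the matching covariate vector x, gives simulated p-values for that data set.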
Figure 19: Weibull Regression Example
(Abscissa: covariate c; ordinate: log(failure time).)
Figure 19 shows some Weibull regression data which were generated as follows:
$$Y_i = \log(\alpha) + \mbox{slope} \times c_i + bZ_i , \quad i = 1, \ldots, n \quad\mbox{with}\quad Z_1, \ldots, Z_n \mbox{ i.i.d.} \sim G(z) ,$$
where $\alpha = 10000$, $b = 2/3$, slope = 1.3 and there were 20 observations each at $c_i = i - 2.5$ for
$i = 1, \ldots, 5$.
Here $X_i = \exp(Y_i)$ would be viewed as the Weibull failure time data with common shape parameter
$\beta = 1/b = 1.5$ and characteristic life parameters $\alpha_i = \alpha \exp(\mbox{slope} \times c_i)$ which are affected in a
multiplicative manner by the covariate values $c_i$, $i = 1, \ldots, 5$.
The solid sloped line in Figure 19 indicates the true log(characteristic life) for the Weibull regression
data while the dashed line represents its estimate.
Figure 20: Weibull Goodness of Fit for Weibull Regression Example
(Three histograms of the simulated null distributions, n = 100, $N_{\rm sim}$ = 10000; ordinate: Frequency.
Panel $D^*$: D = 0.0501, p-value = 0.7746.
Panel $W^{2*}$: $W^2$ = 0.0325, p-value = 0.8001.
Panel $A^{2*}$: $A^2$ = 0.263, p-value = 0.7135.)
For this generated Weibull regression data set Figure 20 shows the results of the Weibull goodness
of fit tests in relation to the simulated null distributions for the test criteria $D$, $W^2$ and $A^2$. The
hypothesis of a Weibull lifetime distribution cannot be rejected by any of the three test of fit criteria
based on the shown p-values.
This example was produced by the R function WeibullRegGOF available in the R workspace on the
class web site. It took 105 seconds on my laptop. This function performs Weibull goodness of fit
tests for any supplied regression data set. When this data set is missing it generates its own Weibull
regression data set of size n = 100 and uses the indicated covariates and parameters.
Figure 21: Normal Regression Example
(Abscissa: covariate c; ordinate: log(failure time).)
Figure 21 shows some normal regression data which were generated as follows:
$$Y_i = \log(\alpha) + \mbox{slope} \times c_i + bZ_i , \quad i = 1, \ldots, n \quad\mbox{with}\quad Z_1, \ldots, Z_n \mbox{ i.i.d.} \sim \Phi(z) ,$$
where $\alpha = 10000$, $b = 2/3$, slope = 1.3 and there were 20 observations each at $c_i = i - 2.5$ for
$i = 1, \ldots, 5$. Here $X_i = \exp(Y_i)$ would be viewed as the failure time data. Such data would have a
lognormal distribution.
This data set was produced internally within WeibullRegGOF by modifying the line that generated
the original data sample, so that $Z_i \sim \Phi(z)$, i.e., Z <- rnorm(n,0,1) is used. The simulation of the
test of fit null distributions remains essentially unchanged except that a different random number
starting seed was used.
Here the solid sloped line indicates the mean of the normal regression data while the dashed line
represents the estimate according to an assumed Weibull model. Note the much wider discrepancy
here compared to the corresponding situation in Figure 19. The reason for this wider gap is that
the fitted line aims for the .632-quantile and not the median of that data.
[Figure 22 panels (n = 100, N_sim = 10000): histograms (Frequency) of the simulated null distributions of D*, W^2* and A^2*; observed D = 0.0953 with p-value = 0.0213, observed W^2 = 0.265 with p-value = 5e-04, observed A^2 = 1.73 with p-value = 4e-04.]
Figure 22: Weibull Goodness of Fit for Normal Regression Example
Here the p-values clearly indicate that the hypothesis of a Weibull distribution should be rejected, although the evidence in the case of $D$ is not very strong. However, for $W^2$ and $A^2$ there should be no doubt in the (correct) rejection of the hypothesis.
Any slight differences in the null distributions shown here and in the previous example are due to a different random number seed being used in the two cases.
References
Bain, L.J. (1978), Statistical Analysis of Reliability and Life-Testing Models, Dekker, New York.
Bain, L.J. and Engelhardt, M. (1991), Statistical Analysis of Reliability and Life-Testing Models, Theory and Methods, Second Edition, Dekker, New York.
Lawless, J.F. (1982), Statistical Models and Methods for Lifetime Data, John Wiley & Sons, New York.
Pierce and Kopecky, K.J. (1979), “Testing Goodness of Fit for the Distribution of Errors in Regression Models,” Biometrika, Vol. 66, No. 1, 1-5.
Saunders, S.C. (1975), “Birnbaum’s Contributions to Reliability,” Reliability and Fault Tree Analysis, Theoretical and Applied Aspects of System Reliability and Safety Assessment, Barlow, R.E., Fussell, J.B., and Singpurwalla, N.D. (editors), Society for Industrial and Applied Mathematics, 33 South 17 Street, Philadelphia PA 19103.
Stephens, M.A. (1986), “Tests based on EDF statistics,” Goodness-of-Fit Techniques, D’Agostino, R. and Stephens, M.A. (editors), Dekker, New York.
Scholz, F.W. (1996) (revised 2001), “Maximum Likelihood Estimation for Type I Censored Weibull Data Including Covariates,” ISSTECH-96-022, Boeing Information and Support Services.
Scholz, F.W. (2008), “Weibull Probability Paper,” (informal note).
Thoman, D.R., Bain, L.J., and Antle, C.E. (1969), “Inferences on parameters of the Weibull distribution,” Technometrics, Vol. 11, No. 3, 445-460.
Thoman, D.R., Bain, L.J., and Antle, C.E. (1970), “Exact confidence intervals for reliability, and tolerance limits in the Weibull distribution,” Technometrics, Vol. 12, No. 2, 363-371.
[Figure 1: Weibull densities for β1 = 0.5, β2 = 1, β3 = 1.5, β4 = 2, β5 = 3.6, β6 = 7 with α = 10000; 63.2% of the probability lies below α and 36.8% above.]
Figure 1: A Collection of Weibull Densities with α = 10000 and Various Shapes
2 Minimum Closure and Weakest Link Property
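The Weibull distribution has the following minimum closure property: If $X_1, \ldots, X_n$ are independent with $X_i \sim W(\alpha_i, \beta)$, $i = 1, \ldots, n$, then
$$P(\min(X_1, \ldots, X_n) > t) = P(X_1 > t, \ldots, X_n > t) = \prod_{i=1}^n P(X_i > t) = \prod_{i=1}^n \exp\left[-\left(\frac{t}{\alpha_i}\right)^{\beta}\right] = \exp\left[-t^{\beta} \sum_{i=1}^n \frac{1}{\alpha_i^{\beta}}\right] = \exp\left[-\left(\frac{t}{\alpha^\star}\right)^{\beta}\right]$$
with
$$\alpha^\star = \left(\sum_{i=1}^n \frac{1}{\alpha_i^{\beta}}\right)^{-1/\beta} ,$$
i.e., $\min(X_1, \ldots, X_n) \sim W(\alpha^\star, \beta)$. This is reminiscent of the closure property for the normal distribution under summation, i.e., if $X_1, \ldots, X_n$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$ then
$$\sum_{i=1}^n X_i \sim N\left(\sum_{i=1}^n \mu_i , \sum_{i=1}^n \sigma_i^2\right) .$$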
This summation closure property plays an essential role in proving the central limit theorem: Sums of independent random variables (not necessarily normally distributed) have an approximate normal distribution, subject to some mild conditions concerning the distribution of such random variables. There is a similar result from Extreme Value Theory that says: The minimum of independent, identically distributed random variables (not necessarily Weibull distributed) has an approximate Weibull distribution, subject to some mild conditions concerning the distribution of such random variables. This is also referred to as the “weakest link” motivation for the Weibull distribution.
The Weibull distribution is appropriate when trying to characterize the random strength of materials or the random lifetime of some system. This is related to the weakest link property as follows. A piece of material can be viewed as a concatenation of many smaller material cells, each of which has its random breaking strength $X_i$ when subjected to stress. Thus the strength of the concatenated total piece is the strength of its weakest link, namely $\min(X_1, \ldots, X_n)$, i.e., approximately Weibull.
Similarly, a system can be viewed as a collection of many parts or subsystems, each of which has a random lifetime $X_i$. If the system is defined to be in a failed state whenever any one of its parts or subsystems fails, then the system lifetime is $\min(X_1, \ldots, X_n)$, i.e., approximately Weibull.
Figure 2 gives a sense of usage of the Weibull distribution and Figure 3 shows the “real thing.” Googling “Weibull distribution” produced 185,000 hits while “normal distribution” had 2,420,000 hits.
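The minimum closure property stated at the start of this section can be checked by a small simulation. The following R sketch is an illustration only: the three scale parameters, the common shape and the seed are hypothetical choices, and alpha.star is simply the combined scale from the closure formula.

```r
# Sketch: the minimum of independent W(alpha_i, beta) variables is W(alpha.star, beta).
set.seed(2)                               # arbitrary seed
beta   <- 2                               # common shape (hypothetical value)
alphas <- c(8000, 10000, 15000)           # hypothetical scale parameters
nsim   <- 20000
# column i holds nsim draws from W(alphas[i], beta); R's rweibull(shape, scale)
X <- sapply(alphas, function(a) rweibull(nsim, shape = beta, scale = a))
m <- apply(X, 1, min)                     # row-wise minima
alpha.star <- sum(alphas^(-beta))^(-1/beta)
# largest gap between the EDF of the minima and the W(alpha.star, beta) cdf
ms <- sort(m)
ks <- max(abs(ecdf(m)(ms) - pweibull(ms, shape = beta, scale = alpha.star)))
c(alpha.star = alpha.star, ks = ks)
```

The combined scale alpha.star is smaller than each individual scale, as one would expect for a weakest link, and the gap ks should be small for large nsim.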
Figure 2: Publications on the Weibull Distribution
Figure 3: Waloddi Weibull
The Weibull distribution is very popular among engineers. One reason for this is that the Weibull cdf has a closed form which is not the case for the normal cdf $\Phi(x)$. However, in today’s computing environment one could argue that point since typically the computation of even $\exp(x)$ requires computing. That this can be accomplished on most calculators is also moot since many calculators also give you $\Phi(x)$. Another reason for the popularity of the Weibull distribution among engineers may be that Weibull’s most famous paper, originally submitted to a statistics journal and rejected, was eventually published in an engineering journal: Waloddi Weibull (1951) “A statistical distribution function of wide applicability,” Journal of Applied Mechanics, 18, 293-297.
“. . . he tried to publish an article in a well-known British journal. At this time, the distribution function proposed by Gauss was dominating and was distinguishingly called the normal distribution. By some statisticians it was even believed to be the only possible one. The article was refused with the comment that it was interesting but of no practical importance. That was just the same article as the highly cited one published in 1951.” (Göran W. Weibull, 1981, http://www.garfield.library.upenn.edu/classics1981/A1981LD32400001.pdf)
Sam Saunders (1975): ‘Professor Wallodi (sic) Weibull recounted to me that the now famous paper of his “A Statistical Distribution of Wide Applicability”, in which was first advocated the “Weibull” distribution with its failure rate a power of time, was rejected by the Journal of the American Statistical Association as being of no interest. Thus one of the most influential papers in statistics of that decade was published in the Journal of Applied Mechanics. See [35]. (Maybe that is the reason it was so influential!)’

3 The Hazard Function
The hazard function for any nonnegative random variable with cdf $F(x)$ and density $f(x)$ is defined as $h(x) = f(x)/(1 - F(x))$. It is usually employed for distributions that model random lifetimes and it relates to the probability that a lifetime comes to an end within the next small time increment of length $d$ given that the lifetime has exceeded $x$ so far, namely
$$P(x < X \le x + d \mid X > x) = \frac{P(x < X \le x + d)}{P(X > x)} = \frac{F(x + d) - F(x)}{1 - F(x)} \approx \frac{d \times f(x)}{1 - F(x)} = d \times h(x) .$$
In the case of the Weibull distribution we have
$$h(x) = \frac{f_{\alpha,\beta}(x)}{1 - F_{\alpha,\beta}(x)} = \frac{\beta}{\alpha} \left(\frac{x}{\alpha}\right)^{\beta - 1} .$$
Various other terms are used equivalently for the hazard function, such as hazard rate, failure rate (function), or force of mortality. In the case of the Weibull hazard rate function we observe that it
is increasing in $x$ when $\beta > 1$, decreasing in $x$ when $\beta < 1$ and constant when $\beta = 1$ (exponential distribution with memoryless property).
When $\beta > 1$ the part or system, for which the lifetime is modeled by a Weibull distribution, is subject to aging in the sense that an older system has a higher chance of failing during the next small time increment $d$ than a younger system.
For $\beta < 1$ (less common) the system has a better chance of surviving the next small time increment $d$ as it gets older, possibly due to hardening, maturing, or curing. Often one refers to this situation as one of infant mortality, i.e., after initial early failures the survival gets better with age. However, one has to keep in mind that we may be modeling parts or systems that consist of a mixture of defective or weak parts and of parts that practically can live forever. A Weibull distribution with $\beta < 1$ may not do full justice to such a mixture distribution.
For $\beta = 1$ there is no aging, i.e., the system is as good as new given that it has survived beyond $x$, since for $\beta = 1$ we have
$$P(X > x + h \mid X > x) = \frac{P(X > x + h)}{P(X > x)} = \frac{\exp(-(x + h)/\alpha)}{\exp(-x/\alpha)} = \exp(-h/\alpha) = P(X > h) ,$$
i.e., it is again exponential with same mean $\alpha$. One also refers to this as a random failure model in the sense that failures are due to external shocks that follow a Poisson process with rate $\lambda = 1/\alpha$. The random times between shocks are exponentially distributed with mean $\alpha$. Given that there are $k$ such shock events in an interval $[0, T]$ one can view the $k$ occurrence times as being uniformly distributed over the interval $[0, T]$, hence the allusion to random failures.

4 Location-Scale Property of log(X)
Another useful property, of which we will make strong use, is the following location-scale property of the log-transformed Weibull distribution. By that we mean that: $X \sim W(\alpha, \beta) \Longrightarrow \log(X) = Y$ has a location-scale distribution, namely its cumulative distribution function (cdf) is
$$P(Y \le y) = P(\log(X) \le y) = P(X \le \exp(y)) = 1 - \exp\left[-\left(\frac{\exp(y)}{\alpha}\right)^{\beta}\right] = 1 - \exp\left[-\exp\{(y - \log(\alpha)) \times \beta\}\right] = 1 - \exp\left[-\exp\left(\frac{y - \log(\alpha)}{1/\beta}\right)\right] = 1 - \exp\left[-\exp\left(\frac{y - u}{b}\right)\right]$$
with location parameter $u = \log(\alpha)$ and scale parameter $b = 1/\beta$. The reason for referring to such parameters this way is the following. If $Z \sim G(z)$ then $Y = \mu + \sigma Z \sim G((y - \mu)/\sigma)$ since
$$H(y) = P(Y \le y) = P(\mu + \sigma Z \le y) = P(Z \le (y - \mu)/\sigma) = G((y - \mu)/\sigma) .$$
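The location-scale property of log(X) derived above can be checked numerically. The R sketch below is an illustration under assumptions: the parameter values, the seed and the helper name G are choices made here. It compares the Weibull cdf with G((log(x) - u)/b) exactly at a few points, and compares the EDF of simulated log lifetimes with the same formula.

```r
# Sketch: if X ~ W(alpha, beta) then Y = log(X) has cdf G((y - u)/b),
# with u = log(alpha), b = 1/beta and G(z) = 1 - exp(-exp(z)).
set.seed(3)                                  # arbitrary seed
alpha <- 10000; beta <- 1.5                  # hypothetical parameter values
u <- log(alpha); b <- 1/beta
G <- function(z) 1 - exp(-exp(z))            # standard extreme value cdf

# exact identity at a few x values
x <- c(500, 5000, 20000)
exact.ok <- isTRUE(all.equal(pweibull(x, shape = beta, scale = alpha),
                             G((log(x) - u)/b)))

# simulation check at a few sample quantiles of Y
y     <- log(rweibull(50000, shape = beta, scale = alpha))
ygrid <- quantile(y, c(.1, .25, .5, .75, .9))
check <- cbind(empirical = ecdf(y)(ygrid), theoretical = G((ygrid - u)/b))
check
```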
The form $Y = \mu + \sigma Z$ should make clear the notion of location scale parameter, since $Z$ has been scaled by the factor $\sigma$ and is then shifted by $\mu$. Two prominent location-scale families are
1. $Y = \mu + \sigma Z \sim N(\mu, \sigma^2)$, where $Z \sim N(0, 1)$ is standard normal with cdf $G(z) = \Phi(z)$ and thus $Y$ has cdf $H(y) = \Phi((y - \mu)/\sigma)$.
2. $Y = u + bZ$ where $Z$ has the standard extreme value distribution with cdf $G(z) = 1 - \exp(-\exp(z))$ for $z \in R$, as in our log-transformed Weibull example above.
In any such location-scale model there is a simple relationship between the p-quantiles of $Y$ and $Z$, namely $y_p = \mu + \sigma z_p$ in the normal model and $y_p = u + b w_p$ in the extreme value model (using the location and scale parameters $u$ and $b$ resulting from log-transformed Weibull data). We just illustrate this in the extreme value location-scale model.
$$p = P(Z \le w_p) = P(u + bZ \le u + b w_p) = P(Y \le u + b w_p) \Longrightarrow y_p = u + b w_p$$
with $w_p = \log(-\log(1 - p))$. Thus $y_p$ is a linear function of $w_p = \log(-\log(1 - p))$, the p-quantile of $G$. While $w_p$ is known and easily computable from $p$, the same cannot be said about $y_p$, since it involves the typically unknown parameters $u$ and $b$. However, for appropriate $p_i = (i - .5)/n$ one can view the $i$th ordered sample value $Y_{(i)}$ ($Y_{(1)} \le \ldots \le Y_{(n)}$) as a good approximation for $y_{p_i}$. Thus the plot of $Y_{(i)}$ against $w_{p_i}$ should look approximately linear. This is the basis for Weibull probability plotting (and the case of plotting $Y_{(i)}$ against $z_{p_i}$ for normal probability plotting), a very appealing graphical procedure which gives a visual impression of how well the data fit the assumed model (normal or Weibull) and which also allows for a crude estimation of the unknown location and scale parameters, since they relate to the slope and intercept of the line that may be fitted to the perceived linear point pattern. For more in relation to Weibull probability plotting we refer to Scholz (2008).

5 Maximum Likelihood Estimation
There are many ways to estimate the parameters $\theta = (\alpha, \beta)$ based on a random sample $X_1, \ldots, X_n \sim W(\alpha, \beta)$. Maximum likelihood estimation (MLE) is generally the most versatile and popular method. Although MLE in the Weibull case requires numerical methods and a computer, that is no longer an issue in today’s computing environment. Previously, estimates that could be computed by hand had been investigated, but they are usually less efficient than mle’s (estimates derived by MLE). By efficient estimates we loosely refer to estimates that have the smallest sampling variance. MLE tends to be efficient, at least in large samples. Furthermore, under regularity conditions MLE produces estimates that have an approximate normal distribution in large samples.
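Before turning to maximum likelihood, the crude graphical estimation mentioned at the end of Section 4 can be sketched in R. This is a minimal illustration, assuming simulated data with hypothetical parameter values and an ordinary least squares line through the probability plot points; it is not the procedure of Scholz (2008), and least squares is merely one convenient way to draw the line.

```r
# Sketch: Weibull probability plot coordinates and crude estimates of u and b.
set.seed(4)                                   # arbitrary seed
alpha <- 10000; beta <- 2                     # hypothetical true parameters
n <- 200
y <- sort(log(rweibull(n, shape = beta, scale = alpha)))  # ordered Y_(i)
p <- ((1:n) - 0.5) / n                        # plotting positions p_i = (i - .5)/n
w <- log(-log(1 - p))                         # w_p, the p-quantile of G
fit <- lm(y ~ w)                              # line fitted to the point pattern
u.crude <- unname(coef(fit)[1])               # intercept: crude estimate of u = log(alpha)
b.crude <- unname(coef(fit)[2])               # slope: crude estimate of b = 1/beta
c(alpha.crude = exp(u.crude), beta.crude = 1 / b.crude)
# plot(w, y); abline(fit)                     # uncomment to view the probability plot
```

Such estimates are only a starting point; the maximum likelihood estimates discussed in this section are generally preferable.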
. .. . . θ) = ∂θj ∂θj n log(fθ (xi )) = i=1 ∂ log(fθ (xi )) = 0 i=1 ∂θj n j = 1. Thus solving such equations is necessary but not suﬃcient. i.e. . . i. . . i. . . xn . . . σ) with k = 2 and the unique solution of the likelihood equations results in the explicit expressions n n µ=x= ˆ ¯ i=1 xi /n and σ= ˆ i=1 (xi − x)2 /n ¯ 9 and thus ˆ θ = (ˆ. ˆ Often such maximizing values θ are unique and one can obtain them by solving. i.e. θk ). . . . . . . n (x1 . These above equations reﬂect the fact that a smooth function has a horizontal tangent plane at its maximum (minimum or saddle point). . . It is a lot simpler to deal with the likelihood equations ∂ ∂ ˆ (x1 . θ). . . ∂ ∂θj n fθ (xi ) = 0 i=1 j = 1. . . . .When X1 . . . xn ) which maximizes the likelihood n L(x1 . xn ). . . Since taking derivatives of a product is tedious (product rule) one usually resorts to maximizing the log of the likelihood. . . . . . θ) = log (L(x1 . . xn . . . which gives highest local probability to the observed sample (X1 . . .e. . Xn ∼ Fθ (x) with density fθ (x) then the maximum likelihood estimate of θ is that ˆ ˆ value θ = θ = θ(x1 . ˆ (x1 . Xn ) = (x1 . xn ) ˆ L(x1 . .. σ ) . In the case of a normal random sample we have θ = (µ. . . . . . .e. xn . xn . since it still needs to be shown that it is the location of a maximum. k .. . . θ) = sup θ n fθ (xi ) i=1 . θ) is the same as the value that maximizes (x1 . θ) = i=1 fθ (xi ) over θ. k ˆ ˆ when solving for θ = θ = θ(x1 . . . . xn . . . θ)) = i=1 log (fθ (xi )) since the value of θ that maximizes L(x1 . . xn . xn . . . .. . µ ˆ . . xn . where k is the number of parameters involved in θ = (θ1 . . θ) = sup θ n log (fθ (xi )) i=1 . . .
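In the case of a Weibull sample we take the further simplifying step of dealing with the log-transformed sample $(y_1, \ldots, y_n) = (\log(x_1), \ldots, \log(x_n))$. Recall that $Y_i = \log(X_i)$ has cdf $F(y) = 1 - \exp(-\exp((y - u)/b)) = G((y - u)/b)$ with $G(z) = 1 - \exp(-\exp(z))$ and $g(z) = G'(z) = \exp(z - \exp(z))$. Thus
$$f(y) = F'(y) = \frac{d}{dy} F(y) = \frac{1}{b}\, g((y - u)/b)$$
with
$$\log(f(y)) = -\log(b) + \frac{y - u}{b} - \exp\left(\frac{y - u}{b}\right) .$$
As partial derivatives of $\log(f(y))$ with respect to $u$ and $b$ we get
$$\frac{\partial}{\partial u} \log(f(y)) = -\frac{1}{b} + \frac{1}{b} \exp\left(\frac{y - u}{b}\right)$$
$$\frac{\partial}{\partial b} \log(f(y)) = -\frac{1}{b} - \frac{1}{b}\,\frac{y - u}{b} + \frac{1}{b}\,\frac{y - u}{b} \exp\left(\frac{y - u}{b}\right)$$
and thus as likelihood equations
$$0 = -\frac{n}{b} + \frac{1}{b} \sum_{i=1}^n \exp\left(\frac{y_i - u}{b}\right) \quad \text{or} \quad \sum_{i=1}^n \exp\left(\frac{y_i - u}{b}\right) = n \quad \text{or} \quad \exp(u) = \left[\frac{1}{n} \sum_{i=1}^n \exp\left(\frac{y_i}{b}\right)\right]^{b} ,$$
$$0 = -\frac{n}{b} - \frac{1}{b} \sum_{i=1}^n \frac{y_i - u}{b} + \frac{1}{b} \sum_{i=1}^n \frac{y_i - u}{b} \exp\left(\frac{y_i - u}{b}\right) ,$$
i.e., we have a solution $u = \hat u$ once we have a solution $b = \hat b$. Substituting this expression for $\exp(u)$ into the second likelihood equation we get (after some cancelation and manipulation)
$$0 = \frac{\sum_{i=1}^n y_i \exp(y_i/b)}{\sum_{i=1}^n \exp(y_i/b)} - b - \frac{1}{n} \sum_{i=1}^n y_i .$$
Analyzing the solvability of this equation is more convenient in terms of $\beta = 1/b$ and we thus write
$$0 = \sum_{i=1}^n y_i w_i(\beta) - \frac{1}{\beta} - \bar y \quad \text{where} \quad w_i(\beta) = \frac{\exp(y_i \beta)}{\sum_{j=1}^n \exp(y_j \beta)} \quad \text{with} \quad \sum_{i=1}^n w_i(\beta) = 1 .$$
Note that the derivative of these weights with respect to $\beta$ takes the following form
$$w_i'(\beta) = \frac{d}{d\beta} w_i(\beta) = y_i w_i(\beta) - w_i(\beta) \sum_{j=1}^n y_j w_j(\beta) .$$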
Hence
$$\frac{d}{d\beta} \left\{ \sum_{i=1}^n y_i w_i(\beta) - \frac{1}{\beta} - \bar y \right\} = \sum_{i=1}^n y_i w_i'(\beta) + \frac{1}{\beta^2} = \sum_{i=1}^n y_i^2 w_i(\beta) - \left( \sum_{j=1}^n y_j w_j(\beta) \right)^2 + \frac{1}{\beta^2} > 0$$
since
$$\mathrm{var}_w(y) = \sum_{i=1}^n y_i^2 w_i(\beta) - \left( \sum_{j=1}^n y_j w_j(\beta) \right)^2 = E_w(y^2) - [E_w(y)]^2 \ge 0$$
can be interpreted as a variance of the $n$ values of $y = (y_1, \ldots, y_n)$ with weights or probabilities given by $w = (w_1(\beta), \ldots, w_n(\beta))$. Thus the reduced second likelihood equation
$$\sum_{i=1}^n y_i w_i(\beta) - 1/\beta - \bar y = 0$$
has a unique solution (if it has a solution at all) since the equation’s left side is strictly increasing. Note that $w_i(\beta) \to 1/n$ as $\beta \to 0$. Thus $\sum y_i w_i(\beta) - 1/\beta - \bar y \approx -1/\beta \to -\infty$ as $\beta \to 0$. Furthermore, with $M = \max(y_1, \ldots, y_n)$ and $\beta \to \infty$ we have
$$w_i(\beta) = \exp(\beta(y_i - M)) \Big/ \sum_{j=1}^n \exp(\beta(y_j - M)) \to 0 \ \text{ when } y_i < M \quad \text{and} \quad w_i(\beta) \to 1/r \ \text{ when } y_i = M ,$$
where $r \ge 1$ is the number of $y_i$ coinciding with $M$. Thus
$$\sum_{i=1}^n y_i w_i(\beta) - 1/\beta - \bar y \approx M - 1/\beta - \bar y \to M - \bar y > 0 \quad \text{as } \beta \to \infty$$
where $M - \bar y > 0$ assumes that not all $y_i$ coincide (a degenerate case with probability 0). That this unique solution corresponds to a maximum and thus a unique global maximum takes some extra effort and we refer to Scholz (1996) for an even more general treatment that covers Weibull analysis with censored data and covariates.
However, a somewhat loose argument can be given as follows. If we consider the likelihood of the log-transformed Weibull data we have
$$L(y_1, \ldots, y_n, u, b) = \frac{1}{b^n} \prod_{i=1}^n g\left(\frac{y_i - u}{b}\right) .$$
Contemplate this likelihood for fixed $y = (y_1, \ldots, y_n)$ and for parameters $u$ with $|u| \to \infty$ (the location moves away from all observed data values $y_1, \ldots, y_n$) and $b$ with $b \to 0$ (the spread becomes very concentrated on some point and cannot simultaneously do so at all values $y_1, \ldots, y_n$, unless they are all the same, excluded as a zero probability degeneracy) and $b \to \infty$ (in which case all probability is diffused thinly over the whole half plane $\{(u, b) : u \in R, b > 0\}$); it is then easily seen that this likelihood approaches zero in all cases. Since this likelihood is positive everywhere (but approaching zero near the fringes of the parameter space, the above half plane) it follows that it
must have a maximum somewhere with zero partial derivatives. We showed there is only one such point (uniqueness of the solution to the likelihood equations) and thus there can only be one unique (global) maximum, which then is also the unique maximum likelihood estimate $\hat\theta = (\hat u, \hat b)$.
In solving $0 = \sum y_i \exp(y_i/b) / \sum \exp(y_i/b) - b - \bar y$ it is numerically advantageous to solve the equivalent equation $0 = \sum y_i \exp((y_i - M)/b) / \sum \exp((y_i - M)/b) - b - \bar y$ where $M = \max(y_1, \ldots, y_n)$. This avoids overflow or accuracy loss in the exponentials when the $y_i$ tend to be large.
The above derivations go through with very little change when instead of observing a full sample $Y_1, \ldots, Y_n$ we only observe the $r \ge 2$ smallest sample values $Y_{(1)} < \ldots < Y_{(r)}$. Such data is referred to as type II censored data. This situation typically arises in a laboratory setting when several units are put on test (subjected to failure exposure) simultaneously and the test is terminated (or evaluated) when the first $r$ units have failed. In that case we know the first $r$ failure times $X_{(1)} < \ldots < X_{(r)}$ and thus $Y_{(i)} = \log(X_{(i)})$, $i = 1, \ldots, r$, and we know that the lifetimes of the remaining units exceed $X_{(r)}$ or that $Y_{(i)} > Y_{(r)}$ for $i > r$. The advantage of such data collection is that we do not have to wait until all $n$ units have failed. Furthermore, if we put a lot of units on test (high $n$) we increase our chance of seeing our first $r$ failures before a fixed time $y$. This is a simple consequence of the following binomial probability statement:
$$P(Y_{(r)} \le y) = P(\text{at least } r \text{ failures} \le y \text{ in } n \text{ trials}) = \sum_{i=r}^n \binom{n}{i} P(Y \le y)^i (1 - P(Y \le y))^{n-i}$$
which is strictly increasing in $n$ for any fixed $y$ and $r \ge 1$ (exercise).
The joint density of $Y_{(1)}, \ldots, Y_{(n)}$ at $(y_1, \ldots, y_n)$ with $y_1 < \ldots < y_n$ is
$$f(y_1, \ldots, y_n) = n! \prod_{i=1}^n \frac{1}{b}\, g\left(\frac{y_i - u}{b}\right) = n! \prod_{i=1}^n f(y_i)$$
where the multiplier $n!$ just accounts for the fact that all $n!$ permutations of $y_1, \ldots, y_n$ could have been the order in which these values were observed and all of these orders have the same density (probability). Integrating out $y_n > y_{n-1} > \ldots > y_{r+1}$ ($> y_r$) and using $\bar F(y) = 1 - F(y)$ we get after $n - r$ successive integration steps the joint density of the first $r$ failure times $y_1 < \ldots < y_r$ as
$$f(y_1, \ldots, y_{n-1}) = n! \prod_{i=1}^{n-1} f(y_i) \times \int_{y_{n-1}}^{\infty} f(y_n)\, dy_n = n! \prod_{i=1}^{n-1} f(y_i)\, \bar F(y_{n-1})$$
$$f(y_1, \ldots, y_{n-2}) = n! \prod_{i=1}^{n-2} f(y_i) \times \int_{y_{n-2}}^{\infty} f(y_{n-1}) \bar F(y_{n-1})\, dy_{n-1} = n! \prod_{i=1}^{n-2} f(y_i) \times \frac{1}{2} \bar F^2(y_{n-2})$$
$$f(y_1, \ldots, y_{n-3}) = n! \prod_{i=1}^{n-3} f(y_i) \times \int_{y_{n-3}}^{\infty} f(y_{n-2}) \bar F^2(y_{n-2})/2\, dy_{n-2} = n! \prod_{i=1}^{n-3} f(y_i) \times \frac{1}{3!} \bar F^3(y_{n-3})$$
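$$\ldots$$
$$f(y_1, \ldots, y_r) = n! \prod_{i=1}^{r} f(y_i) \times \frac{1}{(n-r)!} \bar F^{n-r}(y_r) = \frac{n!}{(n-r)!} \prod_{i=1}^{r} f(y_i) \times [1 - F(y_r)]^{n-r} = \frac{n!}{(n-r)!} \prod_{i=1}^{r} \frac{1}{b}\, g\left(\frac{y_i - u}{b}\right) \times \left[1 - G\left(\frac{y_r - u}{b}\right)\right]^{n-r}$$
with log-likelihood
$$\ell(y_1, \ldots, y_r, u, b) = \log\left(\frac{n!}{(n-r)!}\right) - r \log(b) + \sum_{i=1}^{r} \frac{y_i - u}{b} - {\sum_{i=1}^{r}}' \exp\left(\frac{y_i - u}{b}\right)$$
where we use the notation
$${\sum_{i=1}^{r}}' x_i = \sum_{i=1}^{r} x_i + (n - r) x_r .$$
The likelihood equations are
$$0 = \frac{\partial}{\partial u} \ell(y_1, \ldots, y_r, u, b) = -\frac{r}{b} + \frac{1}{b} {\sum_{i=1}^{r}}' \exp\left(\frac{y_i - u}{b}\right) \quad \text{or} \quad \exp(u) = \left[\frac{1}{r} {\sum_{i=1}^{r}}' \exp\left(\frac{y_i}{b}\right)\right]^{b}$$
$$0 = \frac{\partial}{\partial b} \ell(y_1, \ldots, y_r, u, b) = -\frac{r}{b} - \frac{1}{b} \sum_{i=1}^{r} \frac{y_i - u}{b} + \frac{1}{b} {\sum_{i=1}^{r}}' \frac{y_i - u}{b} \exp\left(\frac{y_i - u}{b}\right)$$
where again the transformed first equation gives us a solution $\hat u$ once we have a solution $\hat b$ for $b$. Using this in the second equation it transforms to a single equation in $b$ alone, namely
$$\frac{{\sum_{i=1}^{r}}' y_i \exp(y_i/b)}{{\sum_{i=1}^{r}}' \exp(y_i/b)} - b - \frac{1}{r} \sum_{i=1}^{r} y_i = 0 .$$
Again it is advisable to use the equivalent but computationally more stable form
$$\frac{{\sum_{i=1}^{r}}' y_i \exp((y_i - y_r)/b)}{{\sum_{i=1}^{r}}' \exp((y_i - y_r)/b)} - b - \frac{1}{r} \sum_{i=1}^{r} y_i = 0 .$$
As in the complete sample case one sees that this equation has a unique solution $\hat b$ and that $(\hat u, \hat b)$ gives the location of the (unique) global maximum of the likelihood function, i.e., $(\hat u, \hat b)$ are the mle’s.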
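The single equation in b (equivalently in β = 1/b) can be solved directly with a one-dimensional root finder. The following R sketch is one possible implementation under assumptions, not the method used in these notes (which rely on survreg in the next section): the function name weibull.mle.root, the bracketing strategy and the tolerance are choices made here. It uses the stable form of the equation and covers both complete samples (n = r) and type II censored samples.

```r
# Sketch: Weibull mle's from the single likelihood equation, solved in beta = 1/b.
# x: full sample, or the r smallest values of a type II censored sample of size n.
weibull.mle.root <- function(x, n = length(x)) {
  y <- log(sort(x)); r <- length(y)
  stopifnot(r >= 2, r <= n, y[r] > y[1])
  cnt <- c(rep(1, r - 1), n - r + 1)     # y_(r) enters the primed sums n - r + 1 times
  M <- y[r]                              # largest observed log failure time
  h <- function(beta) {                  # left side of the equation, increasing in beta
    e <- cnt * exp((y - M) * beta)       # shifting by M avoids overflow
    sum(y * e) / sum(e) - 1 / beta - mean(y)
  }
  lo <- 1e-3; hi <- 1                    # expand the bracket until h changes sign
  while (h(lo) > 0) lo <- lo / 10
  while (h(hi) < 0) hi <- 2 * hi
  beta.hat <- uniroot(h, c(lo, hi), tol = 1e-10)$root
  u.hat <- M + log(sum(cnt * exp((y - M) * beta.hat)) / r) / beta.hat
  c(alpha.hat = exp(u.hat), beta.hat = beta.hat)
}

# the default sample used by Weibull.mle in the next section
x.full <- c(7, 12.1, 22.8, 23.1, 25.7, 26.7, 29.0, 29.9, 39.5, 41.9)
est.full <- weibull.mle.root(x.full)              # complete sample
est.cens <- weibull.mle.root(x.full[1:5], n = 10) # first r = 5 of n = 10
```

Because the left side of the equation is strictly increasing in β, the sign change bracket always exists for non-degenerate data, which is why a simple bracketing root finder suffices here. The results should agree closely with the values quoted in the comments of Weibull.mle in Section 6.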
9. then it loads survival if(r<n){ statusx <.c(rep(1.mle.7.432647 if(is. Here survreg is used in its most basic form in the context of Weibull data (full sample or type II censored Weibull data).5.1. if not.function (x=NULL. # If x is not given then a default full sample of size n=10.1.799793 # # In the type II censored usage # Weibull.26.mle(c(7.25.data.725992 2. Here x is the sample.39.25.mle <. otherwise x is treated as a full sample. that uses survreg to compute these estimates.0.6 Computation of Maximum Likelihood Estimates in R The computation of the mle’s of the Weibull parameters α and β is facilitated by the function survreg which is part of the R package survival.41.hat # 30.12.7.7.hat beta.rep(0.914017 2.29.8.r).23.39.7.41.29.5.8.9) is analyzed and the returned # results should be # $mles # alpha.8. survreg does a whole lot more than compute the mle’s but we will not deal with these aspects here.22.22.12. This function is part of the R work space that is posted on the class web site.1.10) # $mles # alpha. either the full sample or the first # r observations of a type II censored sample.12.rep(xs[r].22.c(7.9) r <.hat # 28.frame(c(xs.7). namely # c(7.1.29.25. at least for now. Weibull.hat beta.statusx) 14 .9.null(x))x <.null(n)){n<r}else{if(r>nr<2){ return("x must have length r with: 2 <= r <= n")}} xs <. The following is an R function.nr)) dat.1.23.29.nr)).26.sort(x) if(!exists("survreg"))library(survival) #tests whether survival package is loaded. Note that it tests for the existence of survreg before calling it.1.23.weibull <.0.n=NULL){ # This function computes the maximum likelihood estimates of alpha and beta # for complete or type II censored samples assumed to come from a 2parameter # Weibull distribution.length(x) if(is. In the latter case one must specify # the full sample size n. called Weibull.
.hat.data=dat. . where A ∈ R and B > 0 are arbitrary constant.hat") list(mles=parms)} Note that survreg analyzes objects of class Surv.hat <.hat) names(parms)<c("alpha. ."status") out.beta. zn ) we denote the estimates of u and b more explicitly by u(z1 . rn ) = A + B u(z1 .87. .survreg(Surv(time. respectively.weibull)<c("time". which tells us that the time to compute the mle’s in a sample of size n = 10 is roughly 5.dist="weibull".mle(rweibull(10. . 7 Location and Scale Equivariance of Maximum Likelihood Estimates The maximum likelihood estimates u and ˆ of the location and scale parameters u and b have the ˆ b following equivariance properties which will play a strong role in the later pivot construction and resulting conﬁdence intervals.frame(xs. To get a sense of the calculation speed of this function we ran Weibull.mle a 1000 times. .exp(out. .n) dat.weibull$coef) beta. as Figure 4 shows.1/out. but with slow growth.rep(1.91 For n = 100. then u(r1 .weibull <. .79 0. zn ) ˆ ˆ or u(r) = u(A + Bz) = A + B u(z) ˆ ˆ ˆ 15 .status)~1. It is 0 when the corresponding time is a censoring time..weibull <. 1000 the elapsed times came to 8.}else{statusx <.1))}) user system elapsed 5.weibull) alpha.statusx)} names(dat. . zn ) = ˆ ˆ b(z b(z). If we transform z to r = (r1 . . . The relationship of computing time to n appears to be quite linear.e.07. 15. . .data.00 5. In the case of type II censored data these censoring times equal the rth largest failure time. . Here such an object is created by the function Surv and it basically adjoins the failure times with a status vector of same length.00591. . 500.91/1000 = . This fact plays a signiﬁcant role later on in the various inference procedures which we will discuss. ."beta. we only know that the unobserved actual failure time exceeds the reported censoring time.weibull$scale parms <. .hat". The status is 1 when a time corresponds to an actual failure time. rn ) with ri = A + Bzi . . . .time(for(i in 1:1000){Weibull. 
system.c(alpha. zn ) = ˆ u(z) and ˆ 1 . Based on data z = (z1 . . . i.hat <.91 and 25. . .
[Figure 4: time to compute Weibull mle’s (sec) plotted against sample size n (0 to 1000), with fitted line: intercept = 0.005886, slope = 2.001e-05.]
Figure 4: Weibull Parameter MLE Computation Time in Relation to Sample Size n
and
$$\hat b(r_1, \ldots, r_n) = B\, \hat b(z_1, \ldots, z_n) \quad \text{or} \quad \hat b(r) = \hat b(A + Bz) = B\, \hat b(z) .$$
These properties are naturally desirable for any location and scale estimates and for mle’s they are indeed true.
Proof: Observe the following defining properties of the mle’s in terms of $z = (z_1, \ldots, z_n)$ and $r = (r_1, \ldots, r_n)$:
$$\sup_{u,b} \left\{ \frac{1}{b^n} \prod_{i=1}^n g((z_i - u)/b) \right\} = \frac{1}{\hat b^n(z)} \prod_{i=1}^n g((z_i - \hat u(z))/\hat b(z))$$
$$\sup_{u,b} \left\{ \frac{1}{b^n} \prod_{i=1}^n g((r_i - u)/b) \right\} = \frac{1}{\hat b^n(r)} \prod_{i=1}^n g((r_i - \hat u(r))/\hat b(r)) = \frac{1}{B^n} \frac{1}{(\hat b(r)/B)^n} \prod_{i=1}^n g\left((z_i - (\hat u(r) - A)/B)/(\hat b(r)/B)\right)$$
but also
$$\sup_{u,b} \left\{ \frac{1}{b^n} \prod_{i=1}^n g((r_i - u)/b) \right\} = \sup_{u,b} \left\{ \frac{1}{b^n} \prod_{i=1}^n g((A + B z_i - u)/b) \right\} = \sup_{u,b} \left\{ \frac{1}{B^n} \frac{1}{(b/B)^n} \prod_{i=1}^n g\left((z_i - (u - A)/B)/(b/B)\right) \right\}$$
and, with $\tilde u = (u - A)/B$ and $\tilde b = b/B$,
$$= \sup_{\tilde u, \tilde b} \left\{ \frac{1}{B^n} \frac{1}{\tilde b^n} \prod_{i=1}^n g((z_i - \tilde u)/\tilde b) \right\} = \frac{1}{B^n} \frac{1}{\hat b^n(z)} \prod_{i=1}^n g((z_i - \hat u(z))/\hat b(z)) .$$
Thus by the uniqueness of the mle’s we have
$$\hat u(z) = (\hat u(r) - A)/B \quad \text{and} \quad \hat b(z) = \hat b(r)/B$$
or
$$\hat u(r) = \hat u(A + Bz) = A + B\, \hat u(z) \quad \text{and} \quad \hat b(r) = \hat b(A + Bz) = B\, \hat b(z) \qquad \text{q.e.d.}$$
The same equivariance properties hold for the mle’s in the context of type II censored samples, as is easily verified.

8 Tests of Fit Based on the Empirical Distribution Function
Relying on subjective assessment of linearity in Weibull probability plots in order to judge whether a sample comes from a 2-parameter Weibull population takes a fair amount of experience. It is simpler and more objective to employ a formal test of fit which compares the empirical distribution function $\hat F_n(x)$ of a sample with the fitted Weibull distribution function $\hat F(x) = F_{\hat\alpha,\hat\beta}(x)$ using one of several common discrepancy metrics.
The empirical distribution function (EDF) of a sample $X_1, \ldots, X_n$ is defined as
$$\hat F_n(x) = \frac{\#\text{ of observations} \le x}{n} = \frac{1}{n} \sum_{i=1}^n I_{\{X_i \le x\}}$$
where $I_A = 1$ when $A$ is true, and $I_A = 0$ when $A$ is false. The fitted Weibull distribution function (using mle’s $\hat\alpha$ and $\hat\beta$) is
$$\hat F(x) = F_{\hat\alpha,\hat\beta}(x) = 1 - \exp\left[-\left(\frac{x}{\hat\alpha}\right)^{\hat\beta}\right] .$$
From the law of large numbers (LLN) we see that for any $x$ we have that $\hat F_n(x) \to F_{\alpha,\beta}(x)$ as $n \to \infty$. Just view $\hat F_n(x)$ as a binomial proportion or as an average of Bernoulli random variables.
From MLE theory we also know that $\hat F(x) = F_{\hat\alpha,\hat\beta}(x) \to F_{\alpha,\beta}(x)$ as $n \to \infty$ (also derived from the LLN). Since the limiting cdf $F_{\alpha,\beta}(x)$ is continuous in $x$ one can argue that these convergence statements can be made uniformly in $x$, i.e.,
$$\sup_x |\hat F_n(x) - F_{\alpha,\beta}(x)| \to 0 \quad \text{and} \quad \sup_x |F_{\hat\alpha,\hat\beta}(x) - F_{\alpha,\beta}(x)| \to 0 \quad \text{as } n \to \infty$$
and thus
$$\sup_x |\hat F_n(x) - F_{\hat\alpha,\hat\beta}(x)| \to 0 \quad \text{as } n \to \infty \quad \text{for all } \alpha > 0 \text{ and } \beta > 0 .$$
The distance
$$D_{KS}(F, G) = \sup_x |F(x) - G(x)|$$
is known as the Kolmogorov-Smirnov distance between two cdf’s $F$ and $G$. Figures 5 and 6 give illustrations of this Kolmogorov-Smirnov distance between EDF and fitted Weibull distribution and show the relationship between sampled true Weibull distribution, fitted Weibull distribution, and empirical distribution function.
Some comments:
1. It can be noted that the closeness between $\hat F_n(x)$ and $F_{\hat\alpha,\hat\beta}(x)$ is usually more pronounced than their respective closeness to $F_{\alpha,\beta}(x)$, in spite of the sequence of the above convergence statements.
2. This can be understood from the fact that both $\hat F_n(x)$ and $F_{\hat\alpha,\hat\beta}(x)$ fit the data, i.e., try to give a good representation of the data. The fit of the true distribution, although being the origin of the data, is not always good due to sampling variation.
3. The closeness between all three distributions improves as $n$ gets larger.
Several other distances between cdf’s $F$ and $G$ have been proposed and investigated in the literature. We will only discuss two of them, the Cramér-von Mises distance $D_{CvM}$ and the Anderson-Darling distance $D_{AD}$. They are defined respectively as follows
$$D_{CvM}(F, G) = \int_{-\infty}^{\infty} (F(x) - G(x))^2 \, dG(x) = \int_{-\infty}^{\infty} (F(x) - G(x))^2 g(x) \, dx$$
and
$$D_{AD}(F, G) = \int_{-\infty}^{\infty} \frac{(F(x) - G(x))^2}{G(x)(1 - G(x))} \, dG(x) = \int_{-\infty}^{\infty} \frac{(F(x) - G(x))^2}{G(x)(1 - G(x))} g(x) \, dx .$$
Rather than focussing on the very local phenomenon of a maximum discrepancy at some point $x$ as in $D_{KS}$, these alternate distances or discrepancy metrics integrate these distances in squared form over all $x$, weighted by $g(x)$ in the case of $D_{CvM}(F, G)$ and by $g(x)/[G(x)(1 - G(x))]$ in the case
of $D_{AD}(F, G)$. In the latter case, the denominator increases the weight in the tails of the $G$ distribution, i.e., compensates to some extent for the tapering off in the density $g(x)$. Thus $D_{AD}(F, G)$ is favored in situations where judging tail behavior is important, e.g., in risk situations. Because of the integration nature of these last two metrics they have more global character. There is no easy graphical representation of these metrics, except to suggest that when viewing the previous figures illustrating $D_{KS}$ one should look at all vertical distances (large and small) between $\hat F_n(x)$ and $\hat F(x)$, square them and accumulate these squares in the appropriately weighted fashion. For example, when one cdf is shifted relative to the other by a small amount (no large vertical discrepancy), these small vertical discrepancies (squared) will add up and indicate a moderately large difference between the two compared cdf’s.
We point out the asymmetric nature of these last two metrics, i.e., we typically have
$$D_{CvM}(F, G) \ne D_{CvM}(G, F) \quad \text{and} \quad D_{AD}(F, G) \ne D_{AD}(G, F) .$$
When using these metrics for tests of fit one usually takes the cdf with a density (the model distribution to be tested) as the one with respect to which the integration takes place, while the other cdf is taken to be the EDF.
As complicated as these metrics may look at first glance, their computation is quite simple. We will give the following computational expressions (without proof):
$$D_{KS}(\hat F_n(x), \hat F(x)) = D = \max\left[ \max_i \left\{ i/n - V_{(i)} \right\}, \ \max_i \left\{ V_{(i)} - (i - 1)/n \right\} \right]$$
where $V_{(1)} \le \ldots \le V_{(n)}$ are the ordered values of $V_i = \hat F(X_i)$, $i = 1, \ldots, n$. For the other two test of fit criteria we have
$$D_{CvM}(\hat F_n(x), \hat F(x)) = W^2 = \sum_{i=1}^n \left\{ V_{(i)} - \frac{2i - 1}{2n} \right\}^2 + \frac{1}{12n}$$
and
$$D_{AD}(\hat F_n(x), \hat F(x)) = A^2 = -n - \frac{1}{n} \sum_{i=1}^n (2i - 1) \left[ \log(V_{(i)}) + \log(1 - V_{(n-i+1)}) \right] .$$
In order to carry out these tests of fit we need to know the null distributions of $D$, $W^2$ and $A^2$. Quite naturally we would reject the hypothesis of a sampled Weibull distribution whenever $D$ or $W^2$ or $A^2$ are too large. The null distribution of $D$, $W^2$ and $A^2$ does not depend on the unknown parameters $\alpha$ and $\beta$, being estimated by $\hat\alpha$ and $\hat\beta$ in $V_i = \hat F(X_i) = F_{\hat\alpha,\hat\beta}(X_i)$. The reason for this is that the $V_i$ have a distribution that is independent of the unknown parameters $\alpha$ and $\beta$. This is seen as follows. Using our prior notation we write $\log(X_i) = Y_i = u + b Z_i$ and since
0 ^ ^ ^ KS−Distance = sup Fn(x) − Fα.4 q q q 0.6 q q q q 0.1.0 ^ ^ ^ KS−Distance = sup Fn(x) − Fα. β(x) x q q q qq q q q q q q q q q q q q q q q 0 5000 10000 x 15000 20000 25000 Figure 5: Illustration of KolmogorovSmirnov Distance for n = 10 and n = 20 20 . β(x) q q q q q cumulative distribution function KS−Distance q q q q q 0. β(x) x q q q q q q q q q q 0 5000 10000 x 15000 20000 25000 1.8 ^ EDF = Fn True Sampled CDF = Fα.2 q q q 0. β(x) ^ ^ Weibull Fitted CDF = Fα. β(x) KS−Distance q q q q cumulative distribution function 0. β(x) ^ ^ Weibull Fitted CDF = Fα.4 q q q q 0.2 q q q q 0.0 0.8 ^ EDF = Fn True Sampled CDF = Fα.6 q q 0.0 0.
2 q q q q q q q q q q q 0.0 0.4 0. β(x) q q q q q q q q q q q q q q q q q q cumulative distribution function KS−Distance q q q q q qq q q q q q q 0.0 q q q q q qq q q q q qq q q q q q q q q q qqq q q q q q qq q q q q q q q qq q q q q q ^ ^ ^ KS−Distance = sup Fn(x) − Fα.1.2 0.0 0.8 ^ EDF = Fn True Sampled CDF = Fα. β(x) ^ ^ Weibull Fitted CDF = Fα. β(x) x q q q 0 5000 10000 x 15000 20000 25000 1.0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q ^ ^ ^ KS−Distance = sup Fn(x) − Fα. β(x) KS−Distance q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q cumulative distribution function 0.4 0.8 ^ EDF = Fn True Sampled CDF = Fα.6 q q q q q q q q q 0.6 0. β(x) x q q qq q q q qq qq qq q q qq q q q q qq q qq qq q q q qq q q qqq q q q q q q q q qq q q q q q q q qqq q q q q q q qq q q q q q q q q q q q q q q q q q qq q q q 0 5000 10000 x 15000 20000 25000 Figure 6: Illustration of KolmogorovSmirnov Distance for n = 50 and n = 100 21 . β(x) ^ ^ Weibull Fitted CDF = Fα.
. Xn ) from W(α = 1.025. say Nsim = 10000. . .25. . β)) and from that the values ˆ ˆ D = D(X ).β (x) is the cdf of W(α.F (x) = P (X ≤ x) = P (log(X) ≤ log(x)) = P (Y ≤ y) = 1 − exp(− exp((y − u)/b)) and thus ˆ Vi = F (Xi ) = 1 − exp(− exp((Yi − u(Y))/ˆ ˆ b(Y))) = 1 − exp(− exp((u + bZi − u(u + bZ))/ˆ + bZ))) ˆ b(u = 1 − exp(− exp((u + bZi − u − bˆ(Z))/[b ˆ u b(Z]))) = 1 − exp(− exp((Zi − u(Z))/ˆ ˆ b(Z))) and all dependence on the unknown parameters u = log(α) and b = 1/β has canceled out. ﬁrst by plotting these quantiles against 1/ n.β (Xi ) (where Fα.2/ n) and W 2 (1 + . .05.10.and extrapolation scheme is needed. . . √ ﬁtting quadratics in 1/ n and reading oﬀ the four interpolated quantile values for the needed n0 (the sample size at issue) and as a second step perform the interpolation or extrapolation scheme as it was done previously. 50. Repeating this a large ˆ number of times.10. W 2 = W 2 (X ) and A2 = A2 (X ). . ∞ by Stephens (1986). This opens up the possibility of using simulation to ﬁnd good approximations to these null distributions for any n. .2/ n) and W 2 (1 + . Just generate samples X = (X1 . √ √ For nD the null distribution still depends on n (in spite of the normalizing factor n) and (1−α)quantiles for α = . If C(X) denotes the used test of ﬁt criterion then the estimated pvalue of this sample is simply the proportion of C(X ) that are ≥ C(X). .05. but using a cubic this time. Here √ a double inter. should give us a reasonably good approximation to the desired null distribution and from it one can determine appropriate pvalues for any sample X1 .025. 22 . . Xn for which one wishes to assess whether the Weibull distribution hypothesis is tenable or not. Calculating all three test of ﬁt criteria makes ˆ sense since the main calculation eﬀort is in getting the mle’s α and β . .2/ n). 
Plotting log(α/(1 − α)) against q1−α shows a mildly quadratic pattern which can be used to interpolate or extrapolate the appropriate pvalue (observed signiﬁcance level α) for √ √ any observed nadjusted value A2 (1 + . . . This is illustrated in Figure 8. 20. compute the corresponding α = α(X ) and β = β(X ). Prior to the ease of current computing Stephens (1986) provided tables √ the (1 − α)quantiles for √ 2 q1−α of these null distributions. .01. ˆ ˆ ˆ then Vi = F (Xi ) = Fα . especially in view of the previously reported timing results for computing the ˆ mle’s α and β of α and β. β = 1) ˆ ˆ ˆ (standard exponential distribution). For the nadjusted versions A (1 + .01 were tabulated for n = 10.2/ n) these null distributions appear to be independent of n and (1 − α)quantiles were given for α = . as is illustrated in Figure 7.
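The simulation route to p-values described above can be sketched as follows. This is a minimal illustration in Python (standard library only), not the R functions from the class work space; the names `gumbel_mle`, `gof_stats` and `simulated_p_values` are hypothetical, and the mle's are found by plain bisection on the profile likelihood equation, which is an assumption about the numerical method rather than the method used in the notes.

```python
import math
import random


def gumbel_mle(y):
    """MLEs (u_hat, b_hat) of u = log(alpha), b = 1/beta from y_i = log(x_i)."""
    n, ybar, ymax = len(y), sum(y) / len(y), max(y)
    if min(y) == ymax:
        raise ValueError("need at least two distinct observations")

    def score(beta):
        # Profile likelihood equation for beta; it is increasing in beta.
        w = [math.exp(beta * (yi - ymax)) for yi in y]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w) - 1.0 / beta - ybar

    lo, hi = 1e-8, 1.0
    while score(hi) < 0.0:            # expand the bracket until it holds the root
        lo, hi = hi, 2.0 * hi
    for _ in range(60):               # bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    u = ymax + math.log(sum(math.exp(beta * (yi - ymax)) for yi in y) / n) / beta
    return u, 1.0 / beta


def gof_stats(y):
    """Return (D, W2, A2) for log-lifetimes y, using the mle-fitted Weibull cdf."""
    n = len(y)
    u, b = gumbel_mle(y)
    z = sorted((yi - u) / b for yi in y)
    e = [math.exp(zi) for zi in z]        # e_(i) = -log(1 - V_(i))
    v = [-math.expm1(-ei) for ei in e]    # V_(i) = 1 - exp(-exp(z_(i)))
    d = max(max((i + 1) / n - v[i], v[i] - i / n) for i in range(n))
    w2 = sum((v[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1 / (12 * n)
    a2 = -n - sum((2 * i + 1) * (math.log(v[i]) - e[n - 1 - i]) for i in range(n)) / n
    return d, w2, a2


def std_z(rng):
    """One draw of Z = log(E) with E standard exponential."""
    e = 0.0
    while e <= 0.0:
        e = rng.expovariate(1.0)
    return math.log(e)


def simulated_p_values(y, nsim=1000, seed=0):
    """Proportion of simulated criteria that are >= the observed ones."""
    rng = random.Random(seed)
    obs = gof_stats(y)
    counts = [0, 0, 0]
    for _ in range(nsim):
        sim = gof_stats([std_z(rng) for _ in range(len(y))])
        for k in range(3):
            counts[k] += sim[k] >= obs[k]
    return tuple(c / nsim for c in counts)


if __name__ == "__main__":
    rng = random.Random(11)
    logs = [math.log(10000.0) + std_z(rng) / 1.5 for _ in range(10)]
    print(gof_stats(logs), simulated_p_values(logs, nsim=500))
```

Because the three criteria depend on the data only through $(Y_i - \hat u)/\hat b$, shifting and rescaling the log-lifetimes leaves them unchanged, which is why simulating from the standard exponential is enough.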
Functions for computing these p-values (via interpolation from Stephens' tabled values) are given in the Weibull R work space provided at the class web site. They are GOF.KS.test, GOF.CvM.test, and GOF.AD.test for computing p-values for the $n$-adjusted test criteria $\sqrt{n}D$, $W^2(1 + .2/\sqrt{n})$, and $A^2(1 + .2/\sqrt{n})$, respectively. These functions have an optional argument graphic, where graphic = T causes the interpolation graphs shown in Figures 7 and 8 to be produced; otherwise only the p-values are given. The function Weibull.GOF.test does a Weibull goodness of fit test on any given sample, returning p-values for all three test criteria. You also find there the function Weibull.mle that was listed earlier, and several other functions not yet documented here.

One could easily reproduce and extend the tables given by Stephens (1986) so that extrapolation becomes less of an issue. For $n = 100$ it should take less than 1.5 minutes to simulate the null distributions based on $N_{sim} = 10000$, given the previously reported timing of 8.07 sec for $N_{sim} = 1000$.

9 Pivots

Based on the previous equivariance properties of $\hat u(\mathbf{Y})$ and $\hat b(\mathbf{Y})$ we have the following pivots, namely functions $W = \psi(\hat u(\mathbf{Y}), \hat b(\mathbf{Y}), \vartheta)$ of the estimates and an unknown parameter $\vartheta$ of interest such that $W$ has a fixed and known distribution and the function $\psi$ is strictly monotone in the unknown parameter $\vartheta$, so that it is invertible with respect to $\vartheta$.

Recall that for a Weibull random sample $\mathbf{X} = (X_1, \ldots, X_n)$ we have $Y_i = \log(X_i) \sim G((y-u)/b)$ with $b = 1/\beta$ and $u = \log(\alpha)$. Then $Z_i = (Y_i - u)/b \sim G(z) = 1 - \exp(-\exp(z))$, which is a known distribution (it does not depend on unknown parameters). This is seen as follows:
\[ P(Z_i \le z) = P((Y_i - u)/b \le z) = P(Y_i \le u + bz) = G(([u + bz] - u)/b) = G(z). \]
It is this known distribution of $\mathbf{Z} = (Z_1, \ldots, Z_n)$ that is instrumental in knowing the distribution of the four pivots that we discuss below. There we utilize the representation $Y_i = u + bZ_i$, or $\mathbf{Y} = u + b\mathbf{Z}$ in vector form.

9.1 Pivot for the Scale Parameter b

As natural pivot for the scale parameter $\vartheta = b$ we take
\[ W_1 = \frac{\hat b(\mathbf{Y})}{b} = \frac{\hat b(u + b\mathbf{Z})}{b} = \frac{b\,\hat b(\mathbf{Z})}{b} = \hat b(\mathbf{Z}). \]
The right side, being a function of $\mathbf{Z}$ alone, has a distribution that does not involve unknown parameters, and $W_1 = \hat b(\mathbf{Y})/b$ is strictly monotone in $b$.
Figure 7: Interpolation & Extrapolation for $A^2(1 + .2/\sqrt{n})$ and $W^2(1 + .2/\sqrt{n})$
(Two panels, one for $A^2 \times (1 + 0.2/\sqrt{n})$ and one for $W^2 \times (1 + 0.2/\sqrt{n})$, each plotting the tail probability p on the $\log(p/(1-p))$ scale against the criterion value. Legend: tabled values; interpolated/extrapolated values.)

Figure 8: Interpolation & Extrapolation for $\sqrt{n} \times D$
(Two panels. The first shows the $\sqrt{n} \times D$ quantiles $D_{0.9}$, $D_{0.95}$, $D_{0.975}$, $D_{0.99}$ against n on the $1/\sqrt{n}$ scale, with quadratic interpolation and linear extrapolation in $1/\sqrt{n}$. The second shows the tail probability p on the $\log(p/(1-p))$ scale against $\sqrt{n} \times D$, with cubic interpolation and linear extrapolation in D. Legend: tabled values; interpolated quantiles; interpolated/extrapolated values.)
How do we obtain the distribution of $\hat b(\mathbf{Z})$? An analytical approach does not seem possible. The approach followed here is that presented in Bain (1978), Bain and Engelhardt (1991) and originally in Thoman et al. (1969, 1970), who provided tables for this distribution (and for those of the other pivots discussed here) based on $N_{sim}$ simulated values of $\hat b(\mathbf{Z})$ (and $\hat u(\mathbf{Z})$), where $N_{sim} = 20000$ for $n = 5$, $N_{sim} = 10000$ for $n = 6, 8, 10, 15, 20, 30, 40, 50, 75$, and $N_{sim} = 6000$ for $n = 100$.

In these simulations one simply generates samples $\mathbf{Z}^\star = (Z_1^\star, \ldots, Z_n^\star) \sim G(z)$ and finds $\hat b(\mathbf{Z}^\star)$ (and $\hat u(\mathbf{Z}^\star)$ for the other pivots discussed later) for each such sample $\mathbf{Z}^\star$. By simulating this process $N_{sim} = 10000$ times we obtain $\hat b(\mathbf{Z}_1^\star), \ldots, \hat b(\mathbf{Z}_{N_{sim}}^\star)$. The empirical distribution function of these simulated estimates $\hat b(\mathbf{Z}_i^\star)$, denoted by $\hat H_1(w)$, provides a fairly reasonable estimate of the sampling distribution $H_1(w)$ of $\hat b(\mathbf{Z})$ and thus also of the pivot distribution of $W_1 = \hat b(\mathbf{Y})/b$. From this simulated distribution we can estimate any $\gamma$-quantile of $H_1(w)$ to any practical accuracy, provided $N_{sim}$ is sufficiently large. Values of $\gamma$ closer to 0 or 1 require higher $N_{sim}$. For $.005 \le \gamma \le .995$ a simulation level of $N_{sim} = 10000$ should be quite adequate.

If we denote the $\gamma$-quantile of $H_1(w)$ by $\eta_1(\gamma)$, i.e.,
\[ \gamma = H_1(\eta_1(\gamma)) = P(\hat b(\mathbf{Y})/b \le \eta_1(\gamma)) = P(\hat b(\mathbf{Y})/\eta_1(\gamma) \le b), \]
we see that $\hat b(\mathbf{Y})/\eta_1(\gamma)$ can be viewed as a $100\gamma\%$ lower bound to the unknown parameter $b$. We do not know $\eta_1(\gamma)$ but we can estimate it by the corresponding quantile $\hat\eta_1(\gamma)$ of the simulated distribution $\hat H_1(w)$, which serves as proxy for $H_1(w)$. We then use $\hat b(\mathbf{Y})/\hat\eta_1(\gamma)$ as an approximate $100\gamma\%$ lower bound to the unknown parameter $b$. For large $N_{sim}$, say $N_{sim} = 10000$, this approximation is practically quite adequate.

We note here that a $100\gamma\%$ lower bound can be viewed as a $100(1-\gamma)\%$ upper bound, because $1-\gamma$ is the chance of the lower bound falling on the wrong side of its target, namely above. To get $100\gamma\%$ upper bounds one simply constructs $100(1-\gamma)\%$ lower bounds by the above method. Similar comments apply to the pivots obtained below, where we only give one-sided bounds (lower or upper) in each case.

Based on the relationship $b = 1/\beta$ the respective $100\gamma\%$ approximate lower and upper confidence bounds for the Weibull shape parameter would be
\[ \frac{\hat\eta_1(1-\gamma)}{\hat b(\mathbf{Y})} = \hat\eta_1(1-\gamma) \times \hat\beta(\mathbf{X}) \quad\text{and}\quad \frac{\hat\eta_1(\gamma)}{\hat b(\mathbf{Y})} = \hat\eta_1(\gamma) \times \hat\beta(\mathbf{X}), \]
and an approximate $100\gamma\%$ confidence interval for $\beta$ would be
\[ \left[ \hat\eta_1((1-\gamma)/2) \times \hat\beta(\mathbf{X}),\ \hat\eta_1((1+\gamma)/2) \times \hat\beta(\mathbf{X}) \right], \]
since $(1+\gamma)/2 = 1 - (1-\gamma)/2$. Here $\mathbf{X} = (X_1, \ldots, X_n)$ is the untransformed Weibull sample.
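As an illustration of the interval for $\beta$ just described, here is a small Python sketch (standard library only, not the R code from the class work space). The function names are hypothetical, the empirical quantile is taken as a simple order statistic, and the bisection solver for the mle's is an assumed stand-in for whatever numerical method one prefers.

```python
import math
import random


def gumbel_mle(y):
    """MLEs (u_hat, b_hat) of u = log(alpha), b = 1/beta from y_i = log(x_i)."""
    n, ybar, ymax = len(y), sum(y) / len(y), max(y)
    if min(y) == ymax:
        raise ValueError("need at least two distinct observations")

    def score(beta):
        # Profile likelihood equation for beta; it is increasing in beta.
        w = [math.exp(beta * (yi - ymax)) for yi in y]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w) - 1.0 / beta - ybar

    lo, hi = 1e-8, 1.0
    while score(hi) < 0.0:            # expand the bracket until it holds the root
        lo, hi = hi, 2.0 * hi
    for _ in range(60):               # bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    u = ymax + math.log(sum(math.exp(beta * (yi - ymax)) for yi in y) / n) / beta
    return u, 1.0 / beta


def std_z(rng):
    """One draw Z = log(E), E ~ Exp(1), so that P(Z <= z) = 1 - exp(-exp(z))."""
    e = 0.0
    while e <= 0.0:
        e = rng.expovariate(1.0)
    return math.log(e)


def emp_quantile(sorted_vals, gamma):
    """gamma-quantile of the empirical cdf of the (already sorted) values."""
    k = math.ceil(gamma * len(sorted_vals)) - 1
    return sorted_vals[min(max(k, 0), len(sorted_vals) - 1)]


def simulate_b_hat(n, nsim, seed=0):
    """Sorted simulated values b_hat(Z*), the proxy for the pivot distribution H_1."""
    rng = random.Random(seed)
    return sorted(gumbel_mle([std_z(rng) for _ in range(n)])[1] for _ in range(nsim))


def beta_interval(x, gamma=0.95, nsim=10000, seed=0):
    """Return (beta_hat, lower, upper) for a complete Weibull sample x."""
    _, b_hat = gumbel_mle([math.log(xi) for xi in x])
    beta_hat = 1.0 / b_hat
    b_star = simulate_b_hat(len(x), nsim, seed)
    lower = emp_quantile(b_star, (1.0 - gamma) / 2.0) * beta_hat
    upper = emp_quantile(b_star, (1.0 + gamma) / 2.0) * beta_hat
    return beta_hat, lower, upper


if __name__ == "__main__":
    rng = random.Random(5)
    sample = [10000.0 * math.exp(std_z(rng) / 1.5) for _ in range(10)]
    print(beta_interval(sample, gamma=0.95, nsim=2000))
```

The simulated values depend only on $n$, so one run of `simulate_b_hat` can be reused for every confidence level of interest.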
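9.2 Pivot for the Location Parameter u

For the location parameter $\vartheta = u$ we have the following pivot
\[ W_2 = \frac{\hat u(\mathbf{Y}) - u}{\hat b(\mathbf{Y})} = \frac{\hat u(u + b\mathbf{Z}) - u}{\hat b(u + b\mathbf{Z})} = \frac{u + b\hat u(\mathbf{Z}) - u}{b\,\hat b(\mathbf{Z})} = \frac{\hat u(\mathbf{Z})}{\hat b(\mathbf{Z})}. \]
It has a distribution that does not depend on any unknown parameter, since it only depends on the known distribution of $\mathbf{Z}$. Furthermore $W_2$ is strictly decreasing in $u$. Thus $W_2$ is a pivot with respect to $u$. Denote this pivot distribution of $W_2$ by $H_2(w)$ and its $\gamma$-quantile by $\eta_2(\gamma)$. As before this pivot distribution and its quantiles can be approximated sufficiently well by simulating $\hat u(\mathbf{Z}^\star)/\hat b(\mathbf{Z}^\star)$ a sufficient number $N_{sim}$ of times and using the empirical cdf $\hat H_2(w)$ of the $\hat u(\mathbf{Z}_i^\star)/\hat b(\mathbf{Z}_i^\star)$ as proxy for $H_2(w)$.

As in the previous pivot case we can exploit this pivot distribution as follows:
\[ \gamma = H_2(\eta_2(\gamma)) = P\left( \frac{\hat u(\mathbf{Y}) - u}{\hat b(\mathbf{Y})} \le \eta_2(\gamma) \right) = P\left( \hat u(\mathbf{Y}) - \hat b(\mathbf{Y})\eta_2(\gamma) \le u \right), \]
and thus we can view $\hat u(\mathbf{Y}) - \hat b(\mathbf{Y})\eta_2(\gamma)$ as a $100\gamma\%$ lower bound for the unknown parameter $u$. Using the $\gamma$-quantile $\hat\eta_2(\gamma)$ obtained from the empirical cdf $\hat H_2(w)$ we then treat $\hat u(\mathbf{Y}) - \hat b(\mathbf{Y})\hat\eta_2(\gamma)$ as an approximate $100\gamma\%$ lower bound for the unknown parameter $u$.

Based on the relation $u = \log(\alpha)$ this translates into an approximate $100\gamma\%$ lower bound
\[ \exp\left( \hat u(\mathbf{Y}) - \hat b(\mathbf{Y})\hat\eta_2(\gamma) \right) = \exp\left( \log(\hat\alpha(\mathbf{X})) - \hat\eta_2(\gamma)/\hat\beta(\mathbf{X}) \right) = \hat\alpha(\mathbf{X}) \exp\left( -\hat\eta_2(\gamma)/\hat\beta(\mathbf{X}) \right) \]
for $\alpha$. Upper bounds and intervals are handled as in the previous situation for $b$ or $\beta$.

9.3 Pivot for the p-quantile $y_p$

With respect to the p-quantile $\vartheta = y_p = u + b\log(-\log(1-p)) = u + bw_p$ of the $Y$ distribution the natural pivot is
\[ \begin{aligned}
W_p = \frac{\hat y_p(\mathbf{Y}) - y_p}{\hat b(\mathbf{Y})} &= \frac{\hat u(\mathbf{Y}) + \hat b(\mathbf{Y})w_p - (u + bw_p)}{\hat b(\mathbf{Y})} = \frac{\hat u(u + b\mathbf{Z}) + \hat b(u + b\mathbf{Z})w_p - (u + bw_p)}{\hat b(u + b\mathbf{Z})} \\
&= \frac{u + b\hat u(\mathbf{Z}) + b\,\hat b(\mathbf{Z})w_p - (u + bw_p)}{b\,\hat b(\mathbf{Z})} = \frac{\hat u(\mathbf{Z}) + (\hat b(\mathbf{Z}) - 1)w_p}{\hat b(\mathbf{Z})}.
\end{aligned} \]
Again its distribution only depends on the known distribution of $\mathbf{Z}$ and not on the unknown parameters $u$ and $b$, and the pivot $W_p$ is a strictly decreasing function of $y_p$. Denote this pivot distribution function by $H_p(w)$ and its $\gamma$-quantile by $\eta_p(\gamma)$. As before, this pivot distribution and its quantiles can be approximated sufficiently well by simulating $\{\hat u(\mathbf{Z}) + (\hat b(\mathbf{Z}) - 1)w_p\}/\hat b(\mathbf{Z})$ a sufficient number $N_{sim}$ of times. Denote the empirical cdf of such simulated values by $\hat H_p(w)$ and the corresponding $\gamma$-quantiles by $\hat\eta_p(\gamma)$.

As before we proceed with
\[ \gamma = H_p(\eta_p(\gamma)) = P\left( \frac{\hat y_p(\mathbf{Y}) - y_p}{\hat b(\mathbf{Y})} \le \eta_p(\gamma) \right) = P\left( \hat y_p(\mathbf{Y}) - \eta_p(\gamma)\hat b(\mathbf{Y}) \le y_p \right), \]
and thus we can treat $\hat y_p(\mathbf{Y}) - \eta_p(\gamma)\hat b(\mathbf{Y})$ as a $100\gamma\%$ lower bound for $y_p$. Again we can treat $\hat y_p(\mathbf{Y}) - \hat\eta_p(\gamma)\hat b(\mathbf{Y})$ as an approximate $100\gamma\%$ lower bound for $y_p$.

Since
\[ \hat y_p(\mathbf{Y}) - \eta_p(\gamma)\hat b(\mathbf{Y}) = \hat u(\mathbf{Y}) + w_p\hat b(\mathbf{Y}) - \eta_p(\gamma)\hat b(\mathbf{Y}) = \hat u(\mathbf{Y}) - k_p(\gamma)\hat b(\mathbf{Y}) \quad\text{with}\quad k_p(\gamma) = \eta_p(\gamma) - w_p, \]
we could have obtained the same lower bound by the following argument that does not use a direct pivot, namely
\[ \begin{aligned}
\gamma &= P\left( \hat u(\mathbf{Y}) - k_p(\gamma)\hat b(\mathbf{Y}) \le y_p \right) = P\left( \hat u(\mathbf{Y}) - k_p(\gamma)\hat b(\mathbf{Y}) \le u + bw_p \right) \\
&= P\left( \hat u(\mathbf{Y}) - u - k_p(\gamma)\hat b(\mathbf{Y}) \le bw_p \right) = P\left( \frac{\hat u(\mathbf{Y}) - u}{b} - k_p(\gamma)\frac{\hat b(\mathbf{Y})}{b} \le w_p \right) \\
&= P\left( \hat u(\mathbf{Z}) - k_p(\gamma)\hat b(\mathbf{Z}) \le w_p \right) = P\left( \frac{\hat u(\mathbf{Z}) - w_p}{\hat b(\mathbf{Z})} \le k_p(\gamma) \right),
\end{aligned} \]
and we see that $k_p(\gamma)$ can be taken as the $\gamma$-quantile of the distribution of $(\hat u(\mathbf{Z}) - w_p)/\hat b(\mathbf{Z})$. This distribution can be estimated by the empirical cdf of $N_{sim}$ simulated values $(\hat u(\mathbf{Z}_i^\star) - w_p)/\hat b(\mathbf{Z}_i^\star)$, $i = 1, \ldots, N_{sim}$, and its $\gamma$-quantile $\hat k_p(\gamma)$ serves as a good approximation to $k_p(\gamma)$.

It is easily seen that this produces the same quantile lower bound as before. However, in this approach one sees one further detail, namely that $h(p) = -k_p(\gamma)$ is strictly increasing in $p$ (see footnote 1), since $w_p$ is strictly increasing in $p$.

Footnote 1: Suppose $p_1 < p_2$ and $h(p_1) \ge h(p_2)$ with $\gamma = P(\hat u(\mathbf{Z}) + h(p_1)\hat b(\mathbf{Z}) \le w_{p_1})$ and
\[ \begin{aligned}
\gamma = P(\hat u(\mathbf{Z}) + h(p_2)\hat b(\mathbf{Z}) \le w_{p_2}) &= P\left( \hat u(\mathbf{Z}) + h(p_1)\hat b(\mathbf{Z}) \le w_{p_1} + (w_{p_2} - w_{p_1}) + (h(p_1) - h(p_2))\hat b(\mathbf{Z}) \right) \\
&\ge P\left( \hat u(\mathbf{Z}) + h(p_1)\hat b(\mathbf{Z}) \le w_{p_1} + (w_{p_2} - w_{p_1}) \right) > \gamma
\end{aligned} \]
(i.e., $\gamma > \gamma$, a contradiction), since $P(w_{p_1} < \hat u(\mathbf{Z}) + h(p_1)\hat b(\mathbf{Z}) \le w_{p_1} + (w_{p_2} - w_{p_1})) > 0$. A thorough argument would show that $\hat b(\mathbf{z})$ and thus $\hat u(\mathbf{z})$ are continuous functions of $\mathbf{z} = (z_1, \ldots, z_n)$, and since there is positive probability in any neighborhood of any $\mathbf{z} \in R^n$ there is positive probability in any neighborhood of $(\hat u(\mathbf{z}), \hat b(\mathbf{z}))$.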
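The lower bounds for $u$ and $y_p$ derived above, moved to the Weibull scale via $\alpha = \exp(u)$ and $x_p = \exp(y_p)$, can be sketched in Python (standard library only) as follows. The names `simulate_pairs` and `lower_bounds` are hypothetical, and the bisection solver is an assumed way of computing the mle's, not the method of the R work space.

```python
import math
import random


def gumbel_mle(y):
    """MLEs (u_hat, b_hat) of u = log(alpha), b = 1/beta from y_i = log(x_i)."""
    n, ybar, ymax = len(y), sum(y) / len(y), max(y)
    if min(y) == ymax:
        raise ValueError("need at least two distinct observations")

    def score(beta):
        # Profile likelihood equation for beta; it is increasing in beta.
        w = [math.exp(beta * (yi - ymax)) for yi in y]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w) - 1.0 / beta - ybar

    lo, hi = 1e-8, 1.0
    while score(hi) < 0.0:            # expand the bracket until it holds the root
        lo, hi = hi, 2.0 * hi
    for _ in range(60):               # bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    u = ymax + math.log(sum(math.exp(beta * (yi - ymax)) for yi in y) / n) / beta
    return u, 1.0 / beta


def std_z(rng):
    """One draw Z = log(E), E ~ Exp(1), so that P(Z <= z) = 1 - exp(-exp(z))."""
    e = 0.0
    while e <= 0.0:
        e = rng.expovariate(1.0)
    return math.log(e)


def emp_quantile(sorted_vals, gamma):
    """gamma-quantile of the empirical cdf of the (already sorted) values."""
    k = math.ceil(gamma * len(sorted_vals)) - 1
    return sorted_vals[min(max(k, 0), len(sorted_vals) - 1)]


def simulate_pairs(n, nsim, seed=0):
    """Simulated (u_hat(Z*), b_hat(Z*)) pairs for sample size n."""
    rng = random.Random(seed)
    return [gumbel_mle([std_z(rng) for _ in range(n)]) for _ in range(nsim)]


def lower_bounds(x, p_values, gamma=0.95, nsim=10000, seed=0):
    """Return (alpha_hat, alpha_lower, {p: x_p lower bound}) at level gamma."""
    u_hat, b_hat = gumbel_mle([math.log(xi) for xi in x])
    pairs = simulate_pairs(len(x), nsim, seed)
    # eta_2(gamma): gamma-quantile of u_hat(Z*) / b_hat(Z*)
    eta2 = emp_quantile(sorted(us / bs for us, bs in pairs), gamma)
    alpha_lower = math.exp(u_hat - b_hat * eta2)
    xp_lower = {}
    for p in p_values:
        wp = math.log(-math.log(1.0 - p))
        # k_p(gamma): gamma-quantile of (u_hat(Z*) - w_p) / b_hat(Z*)
        kp = emp_quantile(sorted((us - wp) / bs for us, bs in pairs), gamma)
        xp_lower[p] = math.exp(u_hat - kp * b_hat)
    return math.exp(u_hat), alpha_lower, xp_lower


if __name__ == "__main__":
    rng = random.Random(8)
    sample = [10000.0 * math.exp(std_z(rng) / 1.5) for _ in range(10)]
    print(lower_bounds(sample, [0.01, 0.1, 0.5], gamma=0.95, nsim=2000))
```

Reusing one set of simulated pairs for all values of p makes the resulting quantile bounds increasing in p, in line with the monotonicity of $h(p)$ noted above.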
Of course it makes intuitive sense that quantile lower bounds should be increasing in $p$, since their target p-quantiles are increasing in $p$. This strictly increasing property allows us to immediately construct upper confidence bounds for left tail probabilities, as is shown in the next section.

Since $x_p = \exp(y_p)$ is the corresponding p-quantile of the Weibull distribution we can view
\[ \exp\left( \hat y_p(\mathbf{Y}) - \hat\eta_p(\gamma)\hat b(\mathbf{Y}) \right) = \hat\alpha(\mathbf{X}) \exp\left( (w_p - \hat\eta_p(\gamma))/\hat\beta(\mathbf{X}) \right) = \hat\alpha(\mathbf{X}) \exp\left( -\hat k_p(\gamma)/\hat\beta(\mathbf{X}) \right) \]
as an approximate $100\gamma\%$ lower bound for $x_p = \exp(u + bw_p) = \alpha(-\log(1-p))^{1/\beta}$.

Since $\alpha$ is the $(1 - \exp(-1))$-quantile of the Weibull distribution, lower bounds for it can be seen as a special case of quantile lower bounds. Indeed, this particular quantile lower bound coincides with the one given previously.

9.4 Upper Confidence Bounds for the Tail Probability p(y) = P(Y ≤ y)

As far as an appropriate pivot for $p(y) = P(Y \le y)$ is concerned, the situation here is not as straightforward as in the previous three cases. Clearly
\[ \hat p(y) = G\left( \frac{y - \hat u(\mathbf{Y})}{\hat b(\mathbf{Y})} \right) \]
is the natural estimate (mle) of
\[ p(y) = P(Y \le y) = G\left( \frac{y-u}{b} \right), \]
and one easily sees that the distribution function $H$ of this estimate depends on $u$ and $b$ only through $p(y)$, namely
\[ \hat p(y) = G\left( \frac{y - \hat u(\mathbf{Y})}{\hat b(\mathbf{Y})} \right) = G\left( \frac{(y-u)/b - (\hat u(\mathbf{Y}) - u)/b}{\hat b(\mathbf{Y})/b} \right) = G\left( \frac{G^{-1}(p(y)) - \hat u(\mathbf{Z})}{\hat b(\mathbf{Z})} \right) \sim H_{p(y)}. \]
Thus by the probability integral transform it follows that
\[ W_{p(y)} = H_{p(y)}(\hat p(y)) \sim U(0,1), \]
i.e., $W_{p(y)}$ is a true pivot, contrary to what is stated in Bain (1978) and Bain and Engelhardt (1991).

Rather than using this pivot we will go a more direct route, as was indicated by the strictly increasing property of $h(p) = h_\gamma(p)$ in the previous section. Denote by $h^{-1}(\cdot)$ the inverse function to $h(\cdot)$. We then have
\[ \gamma = P\left( \hat u(\mathbf{Y}) + h(p)\hat b(\mathbf{Y}) \le y_p \right) = P\left( h(p) \le (y_p - \hat u(\mathbf{Y}))/\hat b(\mathbf{Y}) \right) = P\left( p \le h^{-1}\left( (y_p - \hat u(\mathbf{Y}))/\hat b(\mathbf{Y}) \right) \right) \]
for any $p \in (0,1)$. If we parameterize such $p$ via $p(y) = P(Y \le y) = G((y-u)/b)$ we have $y_{p(y)} = y$ and thus also
\[ \gamma = P\left( p(y) \le h^{-1}\left( (y - \hat u(\mathbf{Y}))/\hat b(\mathbf{Y}) \right) \right) \]
for any $y \in R$ and $u \in R$ and $b > 0$. Hence $\hat p_U(y) = h^{-1}\left( (y - \hat u(\mathbf{Y}))/\hat b(\mathbf{Y}) \right)$ can be viewed as a $100\gamma\%$ upper confidence bound for $p(y)$ for any given threshold $y$.

The only remaining issue is the computation of such bounds. Does it require the inversion of $h$ and the concomitant calculation of many $h(p) = -k_p(\gamma)$ for the iterative convergence of such an inversion? It turns out that there is a direct path, just as we had it in the previous three confidence bound situations. Note that $h^{-1}(x)$ solves $-k_p(\gamma) = x$ for $p$. We claim that $h^{-1}(x)$ is the $\gamma$-quantile of the $G(\hat u(\mathbf{Z}) + x\hat b(\mathbf{Z}))$ distribution, which we can simulate by calculating as before $\hat u(\mathbf{Z})$ and $\hat b(\mathbf{Z})$ a large number $N_{sim}$ of times.

The above claim concerning $h^{-1}(x)$ is seen as follows. For any $x = h(p)$ we have
\[ \begin{aligned}
P\left( G(\hat u(\mathbf{Z}) + x\hat b(\mathbf{Z})) \le h^{-1}(x) \right) &= P\left( G(\hat u(\mathbf{Z}) + h(p)\hat b(\mathbf{Z})) \le p \right) = P\left( \hat u(\mathbf{Z}) + h(p)\hat b(\mathbf{Z}) \le w_p \right) \\
&= P\left( \hat u(\mathbf{Z}) - k_p(\gamma)\hat b(\mathbf{Z}) \le w_p \right) = \gamma,
\end{aligned} \]
as seen in the previous section. Thus $h^{-1}(x)$ is the $\gamma$-quantile of the $G(\hat u(\mathbf{Z}) + x\hat b(\mathbf{Z}))$ distribution.

If we observe $\mathbf{Y} = \mathbf{y}$ and obtain $\hat u(\mathbf{y})$ and $\hat b(\mathbf{y})$ as our maximum likelihood estimates for $u$ and $b$, we get our $100\gamma\%$ upper bound for $p(y) = G((y-u)/b)$ as follows: for the fixed value of $x = (y - \hat u(\mathbf{y}))/\hat b(\mathbf{y}) = G^{-1}(\hat p(y))$ simulate the $G(\hat u(\mathbf{Z}) + x\hat b(\mathbf{Z}))$ distribution (with sufficiently high $N_{sim}$) and calculate the $\gamma$-quantile of this distribution as the desired approximate $100\gamma\%$ upper bound for $p(y) = P(Y \le y) = G((y-u)/b)$.

10 Tabulation of Confidence Quantiles η(γ)

For the pivots for $b$, $u$ and $y_p$ it is possible to carry out simulations once and for all for a desired set of confidence levels $\gamma$, sample sizes $n$ and choices of $p$, and tabulate the required confidence quantiles $\hat\eta_1(\gamma)$, $\hat\eta_2(\gamma)$, and $\hat\eta_p(\gamma)$. This has essentially been done (with $\sqrt{n}$ scaling modifications) and such tables are given in Bain (1978), Bain and Engelhardt (1991) and Thoman et al. (1969, 1970). Similar tables for bounds on $p(y)$ are not quite possible, since the appropriate bounds depend on the observed value of $\hat p(y)$, which varies from sample to sample. Instead Bain (1978), Bain and Engelhardt (1991) and Thoman et al. (1970) tabulate confidence bounds for $p(y)$ for a reasonably fine grid of values for $\hat p(y)$, which can then serve for interpolation purposes with the actually observed value of $\hat p(y)$.

It should be quite clear that all this requires extensive tabulation. The use of these tables is not easy. Table 4 in Bain (1978) does not have a consistent format and using these tables would require delving deeply into the text for each new use, unless one does this kind of calculation all the time. In fact, in the second edition, Bain and Engelhardt (1991), Table 4 has been greatly reduced to just cover the confidence factors dealing with the location parameter $u$, and it now leaves out the confidence factors for general p-quantiles. For the p-quantiles one is referred to the same interpolation scheme that is needed when getting confidence bounds for $p(y)$, using Table 7 in Bain and Engelhardt (1991). The example that they present (page 248) would have benefitted from showing some intermediate steps in the interpolation process. They point out that the resulting confidence bound for $x_p$ is slightly different (14.03) from that obtained using the confidence quantiles of the original Table 4, namely 13.92. They attribute the difference to round-off errors or other discrepancies. Among the latter one may consider that possibly different simulations were involved.

Further, note that some entries in the tables given in Bain (1978) seem to have typos. Presumably they were transcribed by hand from computer output, just as the book (and its second edition) itself is typed and not typeset. We give just a few examples. In Bain (1978) Table 4A, p. 235, bottom row, the second entry from the right should be 3.625 instead of 3.262. This discrepancy shows up clearly when plotting the row values against $\log(p/(1-p))$; see a similar plot for a later example. In Table 3A, p. 222, row 3, column 5 shows a double minus sign (still present in the 1991 second edition). In comparing the values of these tables with our own simulation of pivot distribution quantiles, just to validate our simulation for $n = 40$, we encountered an apparent error in Table 4A, p. 235, with last column entry of 4.826. Plotting $\log(p/(1-p))$ against the corresponding row value ($\gamma$-quantiles) one clearly sees a change in pattern; see the top plot in Figure 9. We suspect that the whole last column was calculated for $p = .96$ instead of the indicated $p = .98$. The bottom plot shows our simulated values for these quantiles as solid dots with the previous points (circles) superimposed. The agreement is good for the first 8 points. Our simulated $\gamma$-quantile was 5.725 (corresponding to the 4.826 above) and it fits quite smoothly into the pattern of the previous 8 points. Given that this was the only case chosen for comparison, it leaves some concern in fully trusting these tables. However, this example also shows that the great majority of tabled values are valid.

11 The R Function WeibullPivots

Rather than using these tables we will resort to direct simulations ourselves, since computing speed has advanced sufficiently over what was common prior to 1978. Furthermore, computing availability has changed dramatically since then. It may be possible to further increase computing speed by putting the loop over $N_{sim}$ calculations of mle's into compiled form rather than looping within R for each simulation iteration. For example, using qbeta in vectorized form reduced the computing time to almost 1/3 of the time compared to looping within R itself over the elements in the argument vector of qbeta. However, such an increase in speed would require writing C code (or Fortran code) and linking that in compiled form to R. Such extensions of R are possible; see chapter 5, System and foreign language interfaces, in the Writing R Extensions manual available under the toolbar Help in R.
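To make the direct simulation route concrete, the upper confidence bound for $p(y)$ from Section 9.4 can be sketched in Python (standard library only). This is an illustrative sketch and not the WeibullPivots implementation; the function names are hypothetical, thresholds are given on the original lifetime scale (so $y = \log(\text{threshold})$), and the bisection solver for the mle's is an assumption.

```python
import math
import random


def gumbel_mle(y):
    """MLEs (u_hat, b_hat) of u = log(alpha), b = 1/beta from y_i = log(x_i)."""
    n, ybar, ymax = len(y), sum(y) / len(y), max(y)
    if min(y) == ymax:
        raise ValueError("need at least two distinct observations")

    def score(beta):
        # Profile likelihood equation for beta; it is increasing in beta.
        w = [math.exp(beta * (yi - ymax)) for yi in y]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w) - 1.0 / beta - ybar

    lo, hi = 1e-8, 1.0
    while score(hi) < 0.0:            # expand the bracket until it holds the root
        lo, hi = hi, 2.0 * hi
    for _ in range(60):               # bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    u = ymax + math.log(sum(math.exp(beta * (yi - ymax)) for yi in y) / n) / beta
    return u, 1.0 / beta


def std_z(rng):
    """One draw Z = log(E), E ~ Exp(1), so that P(Z <= z) = 1 - exp(-exp(z))."""
    e = 0.0
    while e <= 0.0:
        e = rng.expovariate(1.0)
    return math.log(e)


def G(z):
    """Standard cdf of the log-Weibull distribution, G(z) = 1 - exp(-exp(z))."""
    return -math.expm1(-math.exp(z))


def emp_quantile(sorted_vals, gamma):
    """gamma-quantile of the empirical cdf of the (already sorted) values."""
    k = math.ceil(gamma * len(sorted_vals)) - 1
    return sorted_vals[min(max(k, 0), len(sorted_vals) - 1)]


def tail_prob_upper_bounds(x, thresholds, gamma=0.95, nsim=10000, seed=0):
    """List of (threshold, p_hat, upper bound) for P(X <= threshold)."""
    u_hat, b_hat = gumbel_mle([math.log(xi) for xi in x])
    rng = random.Random(seed)
    pairs = [gumbel_mle([std_z(rng) for _ in range(len(x))]) for _ in range(nsim)]
    out = []
    for t in thresholds:
        xq = (math.log(t) - u_hat) / b_hat        # x = G^{-1}(p_hat(y))
        sims = sorted(G(us + xq * bs) for us, bs in pairs)
        out.append((t, G(xq), emp_quantile(sims, gamma)))
    return out


if __name__ == "__main__":
    rng = random.Random(21)
    sample = [10000.0 * math.exp(std_z(rng) / 1.5) for _ in range(10)]
    for row in tail_prob_upper_bounds(sample, [6000.0, 10000.0, 15000.0], nsim=2000):
        print(row)
```

No inversion of $h$ is needed: the bound is read off directly as the $\gamma$-quantile of the simulated $G(\hat u(\mathbf{Z}^\star) + x\hat b(\mathbf{Z}^\star))$ values, and the same simulated pairs serve every threshold.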
Figure 9: Abnormal Behavior of Tabulated Confidence Quantiles
(Two panels with $\log(p/(1-p))$ on the vertical axis. Top: Bain's tabled quantiles for n = 40, γ = 0.9. Bottom: our simulated quantiles for n = 40, γ = 0.9.)
For the R function WeibullPivots (available within the R work space for Weibull Distribution Applications on the class web site) the call system.time(WeibullPivots(Nsim = 10000, n = 10, r = 10, graphics = F)) gave an elapsed time of 59.76 seconds. Here the default sample size n = 10 was used and r = 10 (also default) indicates that the 10 lowest sample values are given and used, in this case the full sample. Also, an internally generated Weibull data set was used, since the default in the call to WeibullPivots is weib.sample=NULL. For sample sizes n = 100 with r = 100 and n = 1000 with r = 1000 the corresponding calls resulted in elapsed times of 78.22 and 269.32 seconds, respectively. These three computing times suggest strong linear behavior in n, as is illustrated in Figure 10. The intercept 57.35 and slope of .2119 given here are fairly consistent with the intercept .005886 and slope of $2.001 \times 10^{-5}$ given in Figure 4. The latter give the calculation time of a single set of mle's while in the former case we calculate $N_{sim} = 10000$ such mle's, i.e., the previous slope and intercept for a single mle calculation need to be scaled up by the factor 10000.

Figure 10: Timings for WeibullPivots for Various n
(Elapsed time for WeibullPivots(Nsim=10000) against sample size n from 0 to 1000, with fitted line: intercept = 57.35, slope = 0.2119.)

For all the previously discussed confidence bounds, be they upper or lower bounds for their respective targets, all that is needed is the set of $(\hat u(\mathbf{z}_i^\star), \hat b(\mathbf{z}_i^\star))$ for $i = 1, \ldots, N_{sim}$. Thus we can construct confidence bounds and intervals for $u$ and $b$, for $y_p$ for any collection of p values, and for $p(y)$ and $1 - p(y)$ for any collection of threshold values $y$, and we can do this for any set of confidence levels that make sense for the simulated distributions, i.e., we don't have to run the simulations over and over for each target parameter, confidence level, p or y, unless one wants independent simulations for some reason.

Proper use of this function only requires understanding the calling arguments, purpose, and output of this function, and the time to run the simulations. The time for running the simulation should easily beat the time spent in dealing with tabulated confidence quantiles in order to get desired confidence bounds, especially since WeibullPivots does such calculations all at once for a broad spectrum of $y_p$ and $p(y)$ and several confidence levels without greatly impacting the computing time. Furthermore, WeibullPivots does all this not only for full samples but also for type II censored samples, for which appropriate confidence factors are available only sparsely in tables.

We will now explain the calling sequence of WeibullPivots and its output. The calling sequence with all arguments given with their default values is as follows:

WeibullPivots(weib.sample=NULL,alpha=10000,beta=1.5,n=10,r=10,Nsim=1000,threshold=NULL,graphics=T)

Here Nsim = $N_{sim}$ has default value 1000, which is appropriate when trying to get a feel for the function for any particular data set. The sample size is input as n = $n$, and r = $r$ indicates the number of smallest sample values available for analysis. When $r < n$ we are dealing with a type II censored data set where observation stops as soon as the smallest $r$ lifetimes have been observed. We need $r > 1$ and at least two distinct observations among $X_{(1)}, \ldots, X_{(r)}$ in order to estimate any spread in the data. The available sample values $X_1, \ldots, X_r$ (not necessarily ordered) are given as vector input to weib.sample. When weib.sample=NULL (the default), an internal data set is generated as input sample from $W(\alpha, \beta)$ with $\alpha$ = alpha = 10000 (default) and $\beta$ = beta = 1.5 (default), either by using the full sample $X_1, \ldots, X_n$ or a type II censored sample $X_1, \ldots, X_r$ when $r < n$ is specified. The input threshold (= NULL by default) is a vector of thresholds $y$ for which we desire upper confidence bounds for $p(y)$. The input graphics (default T) indicates whether graphical output is desired.

Confidence levels $\gamma$ are set internally as .005, .01, .025, .05, .10, .2, .8, .9, .95, .975, .99, .995, and these levels indicate the coverage probability for the individual one-sided bounds. A .025 lower bound is reported as a .975 upper bound, and a pair of .975 lower and upper bounds constitute a 95% confidence interval. The values of p for which confidence bounds or intervals for $x_p$ are provided are also set internally as .001, .005, .01, .025, .05, .1, (.1), .9, .95, .975, .99, .995, .999.

The output from WeibullPivots is a list with components: $alpha.hat, $beta.hat, $alpha.beta.bounds, $p.quantile.estimates, $p.quantile.bounds, $Tail.Probability.Estimates, and $Tail.Probability.Bounds. The structure and meaning of these components will become clear from the example output given below.
$alpha.hat
(Intercept)
     8976.2

$beta.hat
[1] 1.95

$alpha.beta.bounds
      alpha.L alpha.U beta.L beta.U
99.5%  5094.6   16705  0.777   3.22
99%    5453.9   15228  0.855   3.05
97.5%  5948.6   13676  0.956   2.82
95%    6443.9   12608  1.070   2.64
90%    7024.6   11600  1.210   2.42
80%    7711.2   10606  1.390   2.18

$p.quantile.estimates
[Point estimates of $x_p$ for p = .001, .005, .01, .025, .05, .1, (.1), .9, .95, .975, .99, .995, .999, labeled 0.001quantile through 0.999quantile. The numeric entries are scrambled in the extracted text and are not reproduced here.]

$p.quantile.bounds
[Lower (.L) and upper (.U) confidence bounds for each of the above p-quantiles, labeled 0.001quantile.L through 0.999quantile.U, at the levels 99.5%, 99%, 97.5%, 95%, 90%, 80%. The numeric entries are scrambled in the extracted text and are not reproduced here.]

$Tail.Probability.Estimates
 p(6000)  p(7000)  p(8000)  p(9000) p(10000) p(11000) p(12000) p(13000)
 0.36612  0.45977  0.55018  0.63402  0.70900  0.77385  0.82821  0.87242
p(14000) p(15000)
 0.90737  0.93424

$Tail.Probability.Bounds
[Lower and upper confidence bounds p(6000).L, ..., p(15000).L and p(6000).U, ..., p(15000).U at the levels 99.5%, 99%, 97.5%, 95%, 90%, 80%. The numeric entries are scrambled in the extracted text and are not reproduced here.]

The above output was produced with

WeibullPivots(threshold = seq(6000, 15000, 1000), Nsim = 10000, graphics = T)

Since we entered graphics=T as argument we also got two pieces of graphical output. The first gives the two intrinsic pivot distributions of $\hat u/\hat b$ and $\hat b$ in Figure 11. The second gives a Weibull plot of the generated sample with a variety of information and with several types of confidence bounds; see Figure 12. The legend in the upper left gives the mle's of $\alpha$, $\beta$ (agreeing with the output above), and the mean $\mu = \alpha\Gamma(1 + 1/\beta)$ together with 95% confidence intervals, based on respective normal approximation theory for the mle's. The legend in the lower right explains the red fitted line (representing the mle fit) and the various pointwise confidence bound curves, giving 95% confidence intervals (blue dashed curves) for p-quantiles $x_p$ for any p on the ordinate and 95% confidence intervals (green dot-dashed line) for $p(y)$ for any y on the abscissa. Both of these interval types use normal approximations from large sample mle theory. Unfortunately these two types of bounds are not dual to each other, i.e., they don't coincide, or to say it differently, one is not the inverse of the other.
Figure 11: Pivot Distributions of $\hat{u}/\hat{b}$ and $\hat{b}$
(Two histograms, ordinate: Frequency.)
Figure 12: Weibull Plot Corresponding to Previous Output
(Weibull probability plot; ordinate: probability, abscissa: cycles x 1000. Upper left legend: $\hat{\alpha}$ = 8976, 95% conf. interval (6394, 12600); $\hat{\beta}$ = 1.947, 95% conf. interval (1.267, 2.991); MTTF $\hat{\mu}$ = 7960, 95% conf. interval (5690, 11140); n = 10, r = 10 failed cases. Lower right legend: m.l.e., 95% q-confidence bounds, 95% p-confidence bounds, 95% monotone qp-confidence bounds.)
Thus we could not (at least not generally) have taken either to take the role of serving both purposes, i.e., as providing bounds for $x_p$ and p(x) simultaneously. This is in the nature of an imperfect normal approximation for these two approaches.

A third type of bound is presented in the orange curve which simultaneously provides 95% confidence intervals for $x_p$ and p(x), depending on the direction in which the curves are used. We either read sideways from p and down from the curve (at that p level) to get upper and lower bounds for $x_p$, or we read vertically up from an abscissa value x to read off upper and lower bounds for p(x) on the ordinate axis as we go from the respective curves at that x value to the left. These latter bounds are also based on normal mle approximation theory and the approximation will naturally suffer for small sample sizes. However, the principle behind these bounds is a unifying one in that the same curve is used for quantile and tail probability bounds. If instead of using the approximating normal distribution one uses the parametric bootstrap approach (simulating samples from an estimated Weibull distribution) the unifying principle reduces to the pivot simulation approach, i.e., is basically exact except for the simulation aspect Nsim < ∞.

The curves representing the latter (pivots with simulated distributions) are the solid black lines connecting the solid black dots which represent the $x_p$ 95% confidence intervals (using the 97.5% lower and upper bounds to $x_p$ given in our output example above). Also seen on these curves are solid red dots that correspond to the abscissa values x = 6000, (1000), 15000 and viewed vertically they represent 95% confidence intervals for p(x). This illustrates that the same curves are used. The pivot based curves are also strictly monotone and they have exact coverage probability, subject to the Nsim < ∞ limitation.

Figure 13 represents an extreme case where we have a sample of size n = 2 and here another aspect becomes apparent. Both of the first two types of bounds (blue and green) are no longer monotone in p or x respectively. However, the orange curve is still monotone and still serves that dual purpose, although its coverage probability properties are bound to be affected badly by the small sample size n = 2.
Figure 13: Weibull Plot for Weibull Sample of Size n = 2
(Weibull probability plot; ordinate: probability, abscissa: cycles x 1000. Upper left legend: $\hat{\alpha}$ = 8125, 95% conf. interval (6954, 9493); $\hat{\beta}$ = 9.404, 95% conf. interval (2.962, 29.85); MTTF $\hat{\mu}$ = 7709, 95% conf. interval (6448, 9217); n = 2, r = 2 failed cases. Lower right legend: m.l.e., 95% q-confidence bounds, 95% p-confidence bounds, 95% monotone qp-confidence bounds.)
12 Regression

Here we extend the results of the previous location/scale model for log-transformed Weibull samples to the more general regression model where the location parameter u for $Y_i = \log(X_i)$ can vary with i. Specifically, it varies as a linear combination of known covariates $c_{i,1}, \ldots, c_{i,k}$ for the i-th observation as follows:

\[ u_i = \zeta_1 c_{i,1} + \ldots + \zeta_k c_{i,k}, \qquad i = 1, \ldots, n, \]

while the scale parameter b stays constant. Thus we have the following model for $Y_1, \ldots, Y_n$

\[ Y_i = u_i + b Z_i = \zeta_1 c_{i,1} + \ldots + \zeta_k c_{i,k} + b Z_i, \qquad i = 1, \ldots, n, \]

with independent $Z_i \sim G(z) = 1 - \exp(-\exp(z))$, $i = 1, \ldots, n$, and unknown parameters $b > 0$ and $\zeta_1, \ldots, \zeta_k \in R$. Two concrete examples of this general linear model will be discussed in detail later on. The first is the simple linear regression model and the other is the k-sample model, which exemplifies ANOVA situations.

It can be shown (Scholz, 1996) that the mle's $\hat{\zeta} = (\hat{\zeta}_1, \ldots, \hat{\zeta}_k)$ and $\hat{b}$ of ζ and b exist and are unique provided the covariate matrix C, consisting of the rows $c_i = (c_{i,1}, \ldots, c_{i,k})$, $i = 1, \ldots, n$, has full rank k and n > k. It is customary that the first column of C is a vector of n 1's. Alternatively, one can also only specify the remaining k − 1 columns and implicitly invoke the default option in survreg that augments those columns with such a 1-vector. These two usages are illustrated in the function WeibullReg which is given on the next page.

It is very instructive to run this function as part of the following call:

system.time(for(i in 1:1000) WeibullReg())

i.e., we execute the function WeibullReg a thousand times in close succession. The rapidly varying plots give a good visual image of the sampling uncertainty and the resulting sampling variation of the fitted lines. The fixed line represents the true line with respect to which the Weibull data are generated by simulation. Of course, the log-Weibull data are plotted because of their more transparent relationship to the true line. It is instructive to see the variability of the data clouds around the true line, but also the basic stability of the overall cloud pattern as a whole. On my laptop the elapsed time for this call is about 15 seconds, and this includes the plotting time. When the plotting commands are commented out the elapsed time reduces to about 9 seconds. This promises reasonable behavior with respect to the computing times that can be anticipated for the confidence bounds to be discussed below.
library(survival)  # provides Surv() and survreg()

WeibullReg <- function (n=50, x=NULL, alpha=10000, beta=1.5, slope=.05)
{
  # We can either input our own covariate vector x of length n
  # or such a vector is generated for us (default).
  #
  if(is.null(x)) x <- (1:n-(n+1)/2)
  uvec <- log(alpha)+slope*x
  b <- 1/beta
  # Create the Weibull data
  time <- exp(uvec+b*log(-log(1-runif(n))))
  # Creating good vertical plotting limits
  m <- min(uvec)+b*log(-log(1-1/(3*n+1)))
  M <- max(uvec)+b*log(-log(1/(3*n+1)))
  plot(x,log(time),ylim=c(m,M))
  dat <- data.frame(time,x)
  out <- survreg(Surv(time)~x,data=dat,dist="weibull")
  # The last two lines would give the same result as the next three lines
  # after removing the # signs.
  # x0 <- rep(1,n)
  # dat <- data.frame(time,x0,x)
  # survreg(formula = Surv(time) ~ x0 + x - 1, data = dat, dist = "weibull")
  # Here we created the vector x0 of ones explicitly and removed the implicit
  # vector of ones by the -1 in ~ x0+x-1.
  # Note also, that we did not use a status vector (of ones) in the creation
  # of dat, since survreg will use status = 1 for each observation, i.e.,
  # treat the given time as a failure time as default.
  abline(log(alpha),slope) # true line
  # estimated line
  abline(out$coef[1],out$coef[2],col="blue",lty=2)
  # Here out has several components, of which only
  # out$coef and out$scale are of interest to us.
  # The estimate out$scale is the mle of b=1/beta
  # and out$coef is a vector that gives the mle's
  # of intercept and the various regression coefficients.
  out
}
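12.1 Equivariance Properties

From the existence and uniqueness of the mle's we can again deduce the following equivariance properties for the mle's, namely for $r = Ca + \sigma z$ we have

\[ \hat{\zeta}(r) = a + \sigma \hat{\zeta}(z) \qquad\text{and}\qquad \hat{b}(r) = \sigma \hat{b}(z). \]

The proof follows the familiar line used in the location/scale case. With $r_i = c_i a + \sigma z_i$ we have

\[ \sup_{b,\zeta} \prod_{i=1}^{n} \frac{1}{b}\, g\!\left(\frac{r_i - c_i \zeta}{b}\right) = \frac{1}{\sigma^n} \sup_{b,\zeta} \prod_{i=1}^{n} \frac{1}{b/\sigma}\, g\!\left(\frac{z_i - c_i(\zeta - a)/\sigma}{b/\sigma}\right) \]
\[ = \frac{1}{\sigma^n} \sup_{\tilde{b},\tilde{\zeta}} \prod_{i=1}^{n} \frac{1}{\tilde{b}}\, g\!\left(\frac{z_i - c_i \tilde{\zeta}}{\tilde{b}}\right) \qquad\text{using } \tilde{\zeta} = (\zeta - a)/\sigma \text{ and } \tilde{b} = b/\sigma \]
\[ = \frac{1}{\sigma^n} \prod_{i=1}^{n} \frac{1}{\hat{b}(z)}\, g\!\left(\frac{z_i - c_i \hat{\zeta}(z)}{\hat{b}(z)}\right). \]

On the other hand

\[ \sup_{b,\zeta} \prod_{i=1}^{n} \frac{1}{b}\, g\!\left(\frac{r_i - c_i \zeta}{b}\right) = \prod_{i=1}^{n} \frac{1}{\hat{b}(r)}\, g\!\left(\frac{r_i - c_i \hat{\zeta}(r)}{\hat{b}(r)}\right) = \frac{1}{\sigma^n} \prod_{i=1}^{n} \frac{1}{\hat{b}(r)/\sigma}\, g\!\left(\frac{z_i - c_i(\hat{\zeta}(r) - a)/\sigma}{\hat{b}(r)/\sigma}\right) \]

and by the uniqueness of the mle's the equivariance claim is an immediate consequence.

12.2 Pivots and Confidence Bounds

From these equivariance properties it follows that $(\hat{\zeta} - \zeta)/\hat{b}$ and $\hat{b}/b$ have distributions that do not depend on any unknown parameters, i.e., b and ζ. The log-transformed Weibull data have the following regression structure $Y = C\zeta + bZ$, where $Z = (Z_1, \ldots, Z_n)$ consists of independent and identically distributed components with known cdf $G(z) = 1 - \exp(-\exp(z))$. From the equivariance property we have that

\[ \hat{\zeta}(Y) = \zeta + b\, \hat{\zeta}(Z) \qquad\text{and}\qquad \hat{b}(Y) = b\, \hat{b}(Z). \]

Thus

\[ \frac{\hat{\zeta}(Y) - \zeta}{\hat{b}(Y)} = \frac{b\, \hat{\zeta}(Z)}{b\, \hat{b}(Z)} = \frac{\hat{\zeta}(Z)}{\hat{b}(Z)} \qquad\text{and}\qquad \frac{\hat{b}(Y)}{b} = \frac{b\, \hat{b}(Z)}{b} = \hat{b}(Z), \]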
which have a distribution free of any unknown parameters. This distribution can be approximated to any desired degree via simulation, just as in the location/scale case, except that we will need to incorporate the known covariate matrix C in the call to survreg in order to get the Nsim simulated parameter vectors $(\hat{\zeta}(Z_1), \hat{b}(Z_1)), \ldots, (\hat{\zeta}(Z_{N_{sim}}), \hat{b}(Z_{N_{sim}}))$ and thus the empirical distribution of $(\hat{\zeta}(Z_1)/\hat{b}(Z_1), \hat{b}(Z_1)), \ldots, (\hat{\zeta}(Z_{N_{sim}})/\hat{b}(Z_{N_{sim}}), \hat{b}(Z_{N_{sim}}))$.

For any target covariate vector $c_0 = (c_{0,1}, \ldots, c_{0,k})$ the distribution of $(c_0 \hat{\zeta}(Y) - c_0 \zeta)/\hat{b}(Y)$ is free of unknown parameters since

\[ \frac{c_0 \hat{\zeta}(Y) - c_0 \zeta}{\hat{b}(Y)} = \frac{c_0 \hat{\zeta}(Z)}{\hat{b}(Z)} \]

and we can use the simulated values $c_0 \hat{\zeta}(Z_i)/\hat{b}(Z_i)$, $i = 1, \ldots, N_{sim}$, to approximate this parameter free distribution. If $\hat{\eta}_2(\gamma, c_0)$ denotes the γ-quantile of this simulated distribution then we can view $c_0 \hat{\zeta}(Y) - \hat{\eta}_2(\gamma, c_0)\, \hat{b}(Y)$ as an approximate 100γ% lower bound for $c_0 \zeta$, the log of the characteristic life at the covariate vector $c_0$.

Similarly, if $\hat{\eta}_1(\gamma)$ is the γ-quantile of the simulated $\hat{b}(Z_i)$, $i = 1, \ldots, N_{sim}$, then we can view $\hat{b}(Y)/\hat{\eta}_1(\gamma)$ as approximate 100γ% lower bound for b. We note here that these quantiles $\hat{\eta}_1(\gamma)$ and $\hat{\eta}_2(\gamma, c_0)$ depend on the original covariate matrix C, i.e., they differ from those used in the location/scale case. The same comment applies to the other confidence bound procedures following below.

For a given covariate vector $c_0$ we can target the p-quantile $y_p(c_0) = c_0 \zeta + b w_p$ of the Y distribution with covariate dependent location parameter $u(c_0) = c_0 \zeta$ and scale parameter b. We can calculate $c_0 \hat{\zeta}(Y) - \hat{k}_p(\gamma)\, \hat{b}(Y)$ as an approximate 100γ% lower bound for $y_p(c_0)$, where $\hat{k}_p(\gamma)$ is the γ-quantile of the simulated $(c_0 \hat{\zeta}(Z_i) - w_p)/\hat{b}(Z_i)$, $i = 1, \ldots, N_{sim}$.

For the tail probability $p(y_0) = G((y_0 - c_0 \zeta)/b)$ with given threshold $y_0$ and covariate vector $c_0$ we obtain an approximate 100γ% upper bound by using the γ-quantile of the simulated values

\[ G\!\left(c_0 \hat{\zeta}(Z_i) + x\, \hat{b}(Z_i)\right), \qquad i = 1, \ldots, N_{sim}, \]

where $x = (y_0 - c_0 \hat{\zeta}(y))/\hat{b}(y)$ and y is the originally observed sample vector, obtained under the covariate conditions specified through C. The plus sign follows from the equivariance identity $(y_0 - c_0 \zeta)/b = c_0 \hat{\zeta}(Z) + \hat{b}(Z)\,(y_0 - c_0 \hat{\zeta}(Y))/\hat{b}(Y)$.

We note here that the above confidence bounds for the log(characteristic life) or regression location, p-quantiles and tail probabilities depend on the covariate vector $c_0$ that is specified. Not only does this dependence arise through the use of $c_0 \hat{\zeta}(Y)$ in each case but also through the simulated distributions which incorporate $c_0$ in each of these three situations. The only exception is the confidence bound for b, which makes sense since we assumed a constant scale for all covariate situations.
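The simulation recipe for the bounds on b, on the log(characteristic life) and on a p-quantile can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the code behind WeibullPivots or WeibullRegSim: the helper fit_mle, the covariate levels, the "observed" sample and the small Nsim = 500 are hypothetical choices made here. The first few lines also check the equivariance property of Section 12.1 numerically.

```r
library(survival)  # Surv() and survreg()

set.seed(2)
n <- 20; Nsim <- 500; gam <- 0.95; p <- 0.10
x  <- rep(c(-2, 0, 2, 4, 6), each = n / 5)  # hypothetical covariate levels
c0 <- c(1, 3)                               # target covariate vector (1, c)
wp <- log(-log(1 - p))                      # p-quantile of G

rG <- function(n) log(-log(1 - runif(n)))   # Z ~ G(z) = 1 - exp(-exp(z))

# mle's (zeta.hat, b.hat) for log-Weibull responses y with covariate x
fit_mle <- function(y, x) {
  fit <- survreg(Surv(exp(y)) ~ x, dist = "weibull")
  list(zeta = unname(coef(fit)), b = fit$scale)
}

# numerical check of equivariance: r = C a + sigma z
z1 <- rG(n); a <- c(2, 0.5); sigma <- 1.7
m_z <- fit_mle(z1, x)
m_r <- fit_mle(a[1] + a[2] * x + sigma * z1, x)
equiv_err <- max(abs(m_r$zeta - (a + sigma * m_z$zeta)),
                 abs(m_r$b - sigma * m_z$b))   # zero up to optimizer tolerance

# an "observed" sample, generated here only for illustration
y   <- 9 + 0.2 * x + 0.5 * rG(n)
obs <- fit_mle(y, x)

# simulate (c0 zeta.hat(Z_i), b.hat(Z_i)) with the same covariates
piv <- t(replicate(Nsim, {
  m <- fit_mle(rG(n), x)
  c(loc = sum(c0 * m$zeta), b = m$b)
}))

eta1 <- unname(quantile(piv[, "b"], gam))
eta2 <- unname(quantile(piv[, "loc"] / piv[, "b"], gam))
kp   <- unname(quantile((piv[, "loc"] - wp) / piv[, "b"], gam))

loc_hat   <- sum(c0 * obs$zeta)
lower_b   <- obs$b / eta1           # lower bound for b
lower_loc <- loc_hat - eta2 * obs$b # lower bound for c0 zeta
lower_yp  <- loc_hat - kp * obs$b   # lower bound for y_p(c0)
print(c(b = lower_b, loc = lower_loc, yp = lower_yp))
```

Since every simulated sample is fitted with the same covariate vector x, the simulated quantiles automatically reflect the covariate matrix C, as noted in the text.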
12.3 The Simple Linear Regression Model

Here we assume the following simple linear regression model for the $Y_i = \log(X_i)$

\[ Y_i = \zeta_1 + \zeta_2 c_i + b Z_i, \qquad i = 1, \ldots, n. \]

In matrix notation this becomes

\[ Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & c_1 \\ \vdots & \vdots \\ 1 & c_n \end{pmatrix} \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} + b \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix} = C\zeta + bZ. \]

Here $\zeta_1$ and $\zeta_2$ represent the intercept and slope parameters in the straight line regression model for the location parameter and b represents the degree of scatter (scale) around that line. In the context of the general regression model we have k = 2 here and $c_{i,1} = 1$ and $c_{i,2} = c_i$ for $i = 1, \ldots, n$. The conditions for existence and uniqueness of the mle's are satisfied when the covariate values $c_1, \ldots, c_n$ are not all the same.

The R function call system.time(WeibullRegSim(n=20,Nsim=10000)) (done twice and recording an elapsed time of about 76 seconds each) produced each of the plots in Figure 14. Each call generates its own data set of 20 points using 5 different levels of covariate values. The data are generated from a true Weibull distribution with a known true regression line relationship for log(α) in relation to the covariates, as shown in the plots. Also shown in these plots is the .10-quantile line. Estimated lines are indicated by the corresponding color coded dashed lines.

In contrast, the quantile lower confidence bounds based on Nsim = 10000 simulations are represented by a curve. This results from the fact that the factor $\hat{k}_p(\gamma)$ used in the construction of the lower bound, $\hat{\zeta}_1(Y) + \hat{\zeta}_2(Y) c - \hat{k}_p(\gamma)\, \hat{b}(Y)$, is the γ-quantile of the simulated values $(c_0 \hat{\zeta}(Z_i) - w_p)/\hat{b}(Z_i)$, $i = 1, \ldots, N_{sim}$, and these values change depending on which $c_0 = (1, c)$ is involved. This curvature adjusts to some extent to the sampling variation swivel action in the fitted line.

We repeated the above with a sample of size n = 50 (taking about 85 seconds for each plot) and the corresponding two plots are shown in Figure 15. We point out two features. In this second set of plots the lower confidence bound curve is generally closer to the fitted quantile line than in the first set of plots. This illustrates the sample size effect, i.e., we are getting better or less conservative in our bounds. The second feature shows up in the bottom plot where the confidence curve crosses the true percentile line, i.e., it gets on the wrong side of it. Such things happen, because we have only 95% confidence in the bound. Note that these bounds should be interpreted pointwise for each covariate value and should not be viewed as simultaneous confidence bands.

The function WeibullRegSim is part of the R workspace on the class web site. It can easily be modified to handle any simple linear regression Weibull data set. Multiple regression relationships could also be accommodated quite easily. To get a feel for the behavior of the confidence bounds it is useful to exercise this function repeatedly, but using Nsim = 1000 for faster response.
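To see why the quantile lower bound is a curve rather than a line, one can tabulate the factor $\hat{k}_p(\gamma)$ over a grid of covariate values c. The sketch below is an illustration only and is not WeibullRegSim; the covariate levels, the grid cgrid, the helper kp_at and the small Nsim = 300 are assumptions made here.

```r
library(survival)  # Surv() and survreg()

set.seed(3)
n <- 20; Nsim <- 300; gam <- 0.95
x  <- rep(c(-2, 0, 2, 4, 6), each = n / 5)  # 5 hypothetical covariate levels
wp <- log(-log(1 - 0.10))                   # w_p for the .10-quantile

# Nsim x 3 matrix of simulated (zeta1.hat, zeta2.hat, b.hat) for Z ~ G
sims <- t(replicate(Nsim, {
  z   <- log(-log(1 - runif(n)))
  fit <- survreg(Surv(exp(z)) ~ x, dist = "weibull")
  c(unname(coef(fit)), fit$scale)
}))

# gamma-quantile of (c0 zeta.hat(Z_i) - w_p) / b.hat(Z_i) for c0 = (1, cval)
kp_at <- function(cval) {
  unname(quantile((sims[, 1] + sims[, 2] * cval - wp) / sims[, 3], gam))
}

cgrid <- c(-2, 0, 2, 4, 6, 8)
kvals <- sapply(cgrid, kp_at)
print(round(cbind(c = cgrid, kp = kvals), 3))
```

The lower bound at covariate value c is then the fitted line value minus kp times the scale estimate, so a factor that changes with c bends the bound away from the fitted line.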
Figure 14: Weibull Regression with Quantile Bounds (n = 20)
(Two scatter plots; ordinate: log(time), abscissa: covariate. Legend: true characteristic life, true 0.1-quantile, estimated characteristic life, estimated 0.1-quantile, 95% lower bound to 0.1-quantile. Total sample size n = 20, confidence bounds based on 10000 simulations.)
Figure 15: Weibull Regression with Quantile Bounds (n = 50)
(Two scatter plots; ordinate: log(time), abscissa: covariate. Legend: true characteristic life, true 0.1-quantile, estimated characteristic life, estimated 0.1-quantile, 95% lower bound to 0.1-quantile. Total sample size n = 50, confidence bounds based on 10000 simulations.)
12.4 The k-Sample Problem

A second illustration example concerns the situation of k = 3 samples with same scale but possibly different locations. The modifications for k ≠ 3 should be obvious. In terms of the untransformed Weibull data this means that we have possibly different unknown characteristic life parameters $(\alpha_1, \alpha_2, \alpha_3)$ but same unknown shape β for each sample. In matrix notation this model is

\[ Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_{n_1} \\ Y_{n_1+1} \\ \vdots \\ Y_{n_1+n_2} \\ Y_{n_1+n_2+1} \\ \vdots \\ Y_{n_1+n_2+n_3} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \zeta_1 \\ \zeta_2 \\ \zeta_3 \end{pmatrix} + b \begin{pmatrix} Z_1 \\ \vdots \\ Z_{n_1} \\ Z_{n_1+1} \\ \vdots \\ Z_{n_1+n_2} \\ Z_{n_1+n_2+1} \\ \vdots \\ Z_{n_1+n_2+n_3} \end{pmatrix} = C\zeta + bZ. \]

Here the $Y_i$ have location $u_1 = \zeta_1$ for the first $n_1$ observations, they have location $u_2 = \zeta_1 + \zeta_2$ for the next $n_2$ observations and they have location $u_3 = \zeta_1 + \zeta_3$ for the last $n_3$ observations. Thus we can consider $u_1 = \zeta_1$ as the baseline location (represented by the first $n_1$ observations), $\zeta_2$ can be considered as the incremental change from $u_1$ to $u_2$ and $\zeta_3$ is the incremental change from $u_1$ to $u_3$.

If we were interested in the question whether the three samples come from the same location/scale model we would consider testing the hypothesis $H_0: \zeta_2 = \zeta_3 = 0$ or equivalently $H_0: u_1 = u_2 = u_3$. Instead of using the likelihood ratio test, which invokes the $\chi^2_{k-1} = \chi^2_2$ distribution as approximate null distribution, we will employ the test statistic suggested in Lawless (1982) (p. 302, equation (6.4.12)) for which the same approximate null distribution is invoked. Our reason for following this choice is its similarity to the standard test statistic used in the corresponding normal distribution model, i.e., when $Z_i \sim \Phi(z)$ instead of $Z_i \sim G(z)$ as in the above regression model. Also, the modification of this test statistic for general k (≠ 3) is obvious. The formal definition of the test statistic proposed by Lawless is as follows:

\[ \Lambda_1 = (\hat{\zeta}_2, \hat{\zeta}_3)\, K_{11}^{-1}\, (\hat{\zeta}_2, \hat{\zeta}_3)^t, \]

where $K_{11}$ is the asymptotic 2 × 2 covariance matrix of $(\hat{\zeta}_2, \hat{\zeta}_3)$. Without going into the detailed derivation one can give the following alternate and more transparent expression for $\Lambda_1$

\[ \Lambda_1 = \sum_{i=1}^{3} \frac{n_i\, (\hat{u}_i(Y) - \hat{u}(Y))^2}{\hat{b}(Y)^2}, \]
where $\hat{u}_1(Y) = \hat{\zeta}_1(Y)$, $\hat{u}_2(Y) = \hat{\zeta}_1(Y) + \hat{\zeta}_2(Y)$, $\hat{u}_3(Y) = \hat{\zeta}_1(Y) + \hat{\zeta}_3(Y)$ and

\[ \hat{u}(Y) = \sum_{i=1}^{3} (n_i/N)\, \hat{u}_i(Y), \qquad\text{with } N = n_1 + n_2 + n_3. \]

In the normal case $\Lambda_1$ reduces to the traditional F-test statistic (except for a constant multiplier, namely $(N - k)/((k - 1)N) = (N - 3)/(2N)$) when writing $\hat{u}_i(Y) = \bar{Y}_{i\cdot}$, $i = 1, 2, 3$, and $\hat{u}(Y) = \bar{Y}_{\cdot\cdot} = (n_1/N)\bar{Y}_{1\cdot} + (n_2/N)\bar{Y}_{2\cdot} + (n_3/N)\bar{Y}_{3\cdot}$ and

\[ \hat{b}(Y)^2 = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2, \]

which are the corresponding mle's in the normal case. However, in the normal case one uses the $F_{k-1,N-k}$ distribution as the exact null distribution of the properly scaled $\Lambda_1$ and the uncertainty in $\hat{b}(Y)^2$ is not ignored by simply referring to the $\chi^2_{k-1}$ distribution, using a large sample argument.

We don't have to use a large sample approximation either, since the null distribution of $\Lambda_1$ (in the log-Weibull case) is free of any unknown parameters and can be simulated to any desired degree of accuracy. This is seen as follows from our equivariance properties. Recall that

\[ \frac{\hat{u}_1(Y) - u_1}{\hat{b}(Y)} = \frac{\hat{\zeta}_1(Y) - \zeta_1}{\hat{b}(Y)}, \qquad \frac{\hat{u}_i(Y) - u_i}{\hat{b}(Y)} = \frac{\hat{\zeta}_1(Y) + \hat{\zeta}_i(Y) - (\zeta_1 + \zeta_i)}{\hat{b}(Y)}, \quad i = 2, 3, \]

have distributions free of unknown parameters. Under the hypothesis $H_0$ when $u_1 = u_2 = u_3$ (= u) we thus have that

\[ \frac{\hat{u}_i(Y) - u}{\hat{b}(Y)}, \qquad \frac{\hat{u}(Y) - u}{\hat{b}(Y)}, \qquad\text{and thus}\qquad \frac{\hat{u}_i(Y) - \hat{u}(Y)}{\hat{b}(Y)} = \frac{\hat{u}_i(Y) - u}{\hat{b}(Y)} - \frac{\hat{u}(Y) - u}{\hat{b}(Y)} \]

have distributions free of any unknown parameters which in turn implies the above claim about $\Lambda_1$.

Thus we can estimate the null distribution of $\Lambda_1$ by using the Nsim simulated values of $\hat{\zeta}_i(Z_j)/\hat{b}(Z_j)$ to create

\[ \frac{\hat{u}_1(Z_j)}{\hat{b}(Z_j)} = \frac{\hat{\zeta}_1(Z_j)}{\hat{b}(Z_j)}, \qquad \frac{\hat{u}_i(Z_j)}{\hat{b}(Z_j)} = \frac{\hat{\zeta}_1(Z_j) + \hat{\zeta}_i(Z_j)}{\hat{b}(Z_j)}, \quad i = 2, 3, \qquad\text{and}\qquad \frac{\hat{u}(Z_j)}{\hat{b}(Z_j)} = \frac{\sum_{i=1}^{3} n_i \hat{u}_i(Z_j)/N}{\hat{b}(Z_j)} \]

and thus

\[ \Lambda_1(Z_j) = \sum_{i=1}^{3} \frac{n_i\, (\hat{u}_i(Z_j) - \hat{u}(Z_j))^2}{\hat{b}(Z_j)^2}, \qquad j = 1, \ldots, N_{sim}. \]

The distribution of these Nsim values $\Lambda_1(Z_j)$ will give a very good approximation for the true null distribution of $\Lambda_1$. The accuracy of this approximation is entirely controllable by the choice of Nsim. Nsim = 10000 should be sufficient for most practical purposes.
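The simulation of the $\Lambda_1$ null distribution can be sketched in R as follows. This is an illustrative sketch rather than the code used for the plots in these notes; it assumes R's default treatment contrasts (so that the factor grp yields the columns of C described above), sample sizes 5, 7 and 9, and a small Nsim = 500 to keep the run time short.

```r
library(survival)  # Surv() and survreg()

set.seed(4)
ni <- c(5, 7, 9); N <- sum(ni); Nsim <- 500
grp <- factor(rep(1:3, ni))  # default contrasts give columns (1, I2, I3) of C

lambda1 <- replicate(Nsim, {
  z   <- log(-log(1 - runif(N)))                 # Z_j ~ G under H0
  fit <- survreg(Surv(exp(z)) ~ grp, dist = "weibull")
  zh  <- unname(coef(fit))
  u   <- c(zh[1], zh[1] + zh[2], zh[1] + zh[3])  # u.hat_1, u.hat_2, u.hat_3
  ubar <- sum(ni * u) / N                        # u.hat
  sum(ni * (u - ubar)^2) / fit$scale^2           # Lambda_1(Z_j)
})

q_sim <- unname(quantile(lambda1, 0.95))  # simulated .95-quantile of Lambda_1
q_chi <- qchisq(0.95, df = 2)             # asymptotic approximation
print(c(simulated = q_sim, chisq = q_chi))
```

An observed $\Lambda_1$ would then be referred to the simulated values, e.g. via mean(lambda1 >= observed) as a p-value, instead of to the $\chi^2_2$ distribution.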
The following plots examine the $\chi^2_2$ approximation to the $\Lambda_1$ null distribution in the case of 3 samples of respective sizes $n_1 = 5$, $n_2 = 7$ and $n_3 = 9$. This is far from qualifying for a large sample situation. The histogram in Figure 16 is based on Nsim = 10000 simulated values of $\Lambda_1(Z^\star)$. Although the superimposed $\chi^2_2$ density is similar in character, there are strong differences. Using the $\chi^2_2$ distribution would result in much smaller p-values than appropriate when these are on the low side.

Figure 16: Histogram of $\Lambda_1$ Null Distribution with Asymptotic Approximation
(Histogram; ordinate: Density, abscissa: $\Lambda_1^\star$, with the asymptotic $\chi^2_2$ density superimposed.)

Figure 17 shows a QQ-plot of the approximating $\chi^2_2$ quantiles corresponding to the Nsim = 10000 simulated and ordered $\Lambda_1(Z_i)$ values. Each point on this plot corresponds to a p-quantile where the abscissa value of such a point gives us the approximate p-quantile of the $\Lambda_1$ null distribution and the corresponding ordinate gives us the p-quantile of the $\chi^2_2$ distribution which is suggested as asymptotic approximation. The vertical probability scale facilitates the reading off of p for each quantile level on either axis. For a good approximation the points should follow the shown main diagonal. The discrepancy is quite strong.
Figure 17: QQ-Plot Comparing $\Lambda_1(Z^\star)$ with $\chi^2_2$
(Ordinate: $\chi^2_2$-quantiles with probability scale, abscissa: simulated & sorted $\Lambda_1^\star$.)

Clearly the p-quantiles of the $\chi^2_2$ distribution are smaller than the corresponding p-quantiles of the $\Lambda_1$ null distribution. If we take the p corresponding to a $\Lambda_1$ on the abscissa and look up its p according to the $\chi^2_2$ scale, we only need to go up to the main diagonal at that abscissa location and look up the p on the $\chi^2_2$ scale to the left. For example, the .95-quantile for the $\Lambda_1$ null distribution would yield a p ≈ .994 on the $\chi^2_2$ scale. Thus a true $\Lambda_1$ p-value of .05 would translate into a very overstated observed significance level of .006 according to the $\chi^2_2$ approximation.

Figure 18 shows the comparison of the $\chi^2_2$ quantiles with the corresponding quantiles of the $2 \times F_{2,21-3}$ distribution (the factor 2 adjusts for the fact that the numerator of the F statistic is divided by k − 1 = 2). This comparison is the counterpart to the previous situation if we were to use the asymptotic $\chi^2_2$ distribution as approximation to the exact and true 2 × F ratio distribution had we carried out the corresponding test in a normal data situation, i.e., tested whether three normal random samples with common σ have the same mean. The departure from the main diagonal is not as severe as in Figure 17, but the effect is similar, i.e., the approximation is not good. Again, the $\chi^2_2$ distribution would yield much smaller p-values than appropriate when these p-values are on the small side, i.e., when they become critical.

Figure 18: QQ-Plot Comparing $2 \times F_{2,21-3}$ with $\chi^2_2$
(Ordinate: qchisq((1:1000−.5)/1000, 2) with probability scale, abscissa: 2*qf((1:1000−.5)/1000, 2, 18).)

Aside from testing the equality of log-Weibull distributions we can also obtain the various types of confidence bounds for the location, scale, p-quantiles and tail probabilities for each sampled population. This is different from doing so for each sample separately since we use all k samples to estimate b, which was assumed to be the same for all k populations, whether the k locations are the same or not. This will result in tighter confidence bounds than what would result from the corresponding confidence bound analysis of individual samples. We omit the specific details since they should be clear from the general situation as applied in the simple linear regression case.
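The normal-theory comparison of Figure 18 needs no simulation and can be reproduced with base R quantile functions. The sketch below uses the same quantile grid as the axis labels of that figure; the variable names are chosen here for illustration.

```r
p  <- (1:1000 - 0.5) / 1000
qc <- qchisq(p, df = 2)             # chi-square(2) quantiles (ordinate)
qF <- 2 * qf(p, df1 = 2, df2 = 18)  # quantiles of 2 * F(2, 18) (abscissa)

# chi-square probability level reached by the exact .95-quantile of 2 * F
p_chi <- pchisq(2 * qf(0.95, 2, 18), df = 2)

# p-value the chi-square approximation reports when the exact p-value is .05
nominal <- 1 - p_chi   # about 0.029
print(nominal)

# plot(qF, qc); abline(0, 1)  # uncomment to draw the QQ-plot with its diagonal
```

So in this normal-data analogue an exact p-value of .05 would be reported as roughly .029 by the $\chi^2_2$ approximation, a milder overstatement than the .006 seen in the log-Weibull case.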
12.5 Goodness of Fit Tests

As in the location/scale case we can exploit the equivariance properties of the mle's in the general regression model to carry out the previously discussed goodness of fit tests by simulation. Using the previous computational formulas for D, $W^2$ and $A^2$ we only need to define the appropriate $V_i$, namely

\[ V_i = G\!\left(\frac{Y_i - c_i \hat{\zeta}(Y)}{\hat{b}(Y)}\right), \qquad i = 1, \ldots, n. \]

Pierce and Kopecky (1979) showed that the asymptotic null distributions of D, $W^2$ and $A^2$, using the sorted values $V_{(1)} \le \ldots \le V_{(n)}$ of these modified versions of $V_i$, are respectively the same as in the location/scale case, i.e., they do not depend on the additional covariates that may be present. This assumes that the covariate matrix C contains a vector of ones. However, for finite sample sizes the effects of these covariates may still be relevant. The effect of using the small sample tables given by Stephens (1986) is not clear.

However, one can easily simulate the null distributions of these statistics since they do not depend on any unknown parameters. Using the data representation $Y_i = c_i \zeta + b Z_i$ with i.i.d. $Z_i \sim G(z)$, $i = 1, \ldots, n$, or $Y = C\zeta + bZ$, this is seen from the equivariance properties as follows

\[ \frac{Y_i - c_i \hat{\zeta}(Y)}{\hat{b}(Y)} = \frac{c_i \zeta + b Z_i - c_i(\zeta + b\, \hat{\zeta}(Z))}{b\, \hat{b}(Z)} = \frac{Z_i - c_i \hat{\zeta}(Z)}{\hat{b}(Z)} \]

and thus

\[ V_i = G\!\left(\frac{Y_i - c_i \hat{\zeta}(Y)}{\hat{b}(Y)}\right) = G\!\left(\frac{Z_i - c_i \hat{\zeta}(Z)}{\hat{b}(Z)}\right). \]

For any covariate matrix C and sample size n the null distributions of D, $W^2$ and $A^2$ can be approximated to any desired degree. All we need to do is generate vectors $Z^\star = (Z_1, \ldots, Z_n)$ with components i.i.d. $\sim G(z)$, compute the mle's $\hat{\zeta}(Z^\star)$, $\hat{b}(Z^\star)$, and from that $V^\star$, followed by $D^\star = D(V^\star)$, $W^{2\star} = W^2(V^\star)$ and $A^{2\star} = A^2(V^\star)$. Repeating this a large number of times, say Nsim = 10000, would yield values $D^\star_i$, $W^{2\star}_i$, $A^{2\star}_i$, $i = 1, \ldots, N_{sim}$. Their respective empirical distributions would serve as excellent approximations to the desired null distributions of these test of fit criteria.
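A minimal R sketch of this simulation scheme follows. It is not the WeibullRegGOF function referred to in these notes; the helper names edf_stats and gof_stats, the sample size n = 50 and the small Nsim = 200 are assumptions made here, and the three statistics are computed from the standard computational formulas for the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling criteria.

```r
library(survival)  # Surv() and survreg()

G <- function(z) 1 - exp(-exp(z))

# D, W^2 and A^2 computed from the values V_1, ..., V_n
edf_stats <- function(V) {
  n <- length(V); V <- sort(V); i <- 1:n
  D  <- max(max(i / n - V), max(V - (i - 1) / n))
  W2 <- sum((V - (2 * i - 1) / (2 * n))^2) + 1 / (12 * n)
  A2 <- -n - mean((2 * i - 1) * (log(V) + log(1 - rev(V))))
  c(D = D, W2 = W2, A2 = A2)
}

# V_i = G((Y_i - c_i zeta.hat) / b.hat) from a simple linear Weibull regression
gof_stats <- function(y, x) {
  fit <- survreg(Surv(exp(y)) ~ x, dist = "weibull")
  zh  <- unname(coef(fit))
  edf_stats(G((y - zh[1] - zh[2] * x) / fit$scale))
}

set.seed(5)
n <- 50; Nsim <- 200
x <- rep(1:5 - 2.5, each = n / 5)
y <- log(10000) + 1.3 * x + (2 / 3) * log(-log(1 - runif(n)))  # Weibull data

obs  <- gof_stats(y, x)
null <- replicate(Nsim, gof_stats(log(-log(1 - runif(n))), x))  # 3 x Nsim
pval <- rowMeans(null >= obs)   # simulated p-values for D, W2, A2
print(rbind(statistic = obs, p.value = pval))
```

Because the null samples are fitted with the same covariate vector x as the data, any finite-sample effect of the covariates is built into the simulated null distributions.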
Figure 19: Weibull Regression Example
(Scatter plot; ordinate: log(failure time), abscissa: covariate c.)

Figure 19 shows some Weibull regression data which were generated as follows:

\[ Y_i = \log(\alpha) + \text{slope} \times c_i + b Z_i, \quad i = 1, \ldots, n, \qquad\text{with } Z_1, \ldots, Z_n \text{ i.i.d. } \sim G(z), \]

where α = 10000, b = 2/3, slope = 1.3 and there were 20 observations each at $c_i = i - 2.5$ for $i = 1, \ldots, 5$. Here $X_i = \exp(Y_i)$ would be viewed as the Weibull failure time data with common shape parameter β = 1/b = 1.5 and characteristic life parameters $\alpha_i = \alpha \exp(\text{slope} \times c_i)$ which are affected in a multiplicative manner by the covariate values $c_i$, $i = 1, \ldots, 5$.

The solid sloped line in Figure 19 indicates the true log(characteristic life) for the Weibull regression data while the dashed line represents its estimate.
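The data generating mechanism just described can be written out in a few lines of base R. This is a sketch of the stated model only, not the internal code of WeibullRegGOF; the seed and the variable names are arbitrary choices made here.

```r
set.seed(6)
alpha <- 10000; slope <- 1.3; b <- 2 / 3
ci <- rep(1:5 - 2.5, each = 20)         # 20 observations at each c_i = i - 2.5
Z  <- log(-log(1 - runif(length(ci))))  # Z_i ~ G(z) = 1 - exp(-exp(z))
Y  <- log(alpha) + slope * ci + b * Z   # log failure times
X  <- exp(Y)                            # Weibull failure times

beta    <- 1 / b                            # common shape parameter
alpha_i <- alpha * exp(slope * unique(ci))  # characteristic life per level

# fraction of failure times below the characteristic life at each level;
# each should scatter around 0.632
print(tapply(X <= alpha * exp(slope * ci), ci, mean))
```

The multiplicative effect of the covariate is visible in alpha_i: each unit increase in c multiplies the characteristic life by exp(slope).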
Figure 20: Weibull Goodness of Fit for Weibull Regression Example
(Three histograms of the simulated null distributions, ordinate: Frequency; n = 100, Nsim = 10000. Panel $D^\star$: D = 0.0501, p-value = 0.7746. Panel $W^{2\star}$: $W^2$ = 0.0325, p-value = 0.8001. Panel $A^{2\star}$: $A^2$ = 0.263, p-value = 0.7135.)

For this generated Weibull regression data set Figure 20 shows the results of the Weibull goodness of fit tests in relation to the simulated null distributions for the test criteria D, $W^2$ and $A^2$. The hypothesis of a Weibull lifetime distribution cannot be rejected by any of the three test of fit criteria based on the shown p-values. This example was produced by the R function WeibullRegGOF available in the R workspace on the class web site. It took 105 seconds on my laptop. This function performs Weibull goodness of fit tests for any supplied regression data set. When this data set is missing it generates its own Weibull regression data set of size n = 100 and uses the indicated covariates and parameters.
Figure 21: Normal Regression Example
(Scatter plot; ordinate: log(failure time), abscissa: covariate c.)

Figure 21 shows some normal regression data which were generated as follows:

\[ Y_i = \log(\alpha) + \text{slope} \times c_i + b Z_i, \quad i = 1, \ldots, n, \qquad\text{with } Z_1, \ldots, Z_n \text{ i.i.d. } \sim \Phi(z), \]

where α = 10000, b = 2/3, slope = 1.3 and there were 20 observations each at $c_i = i - 2.5$ for $i = 1, \ldots, 5$. Here $X_i = \exp(Y_i)$ would be viewed as the failure time data. Such data would have a lognormal distribution. This data set was produced internally within WeibullRegGOF by modifying the line that generated the original data sample, so that $Z_i \sim \Phi(z)$, i.e., Z <- rnorm(n,0,1) is used. The simulation of the test of fit null distributions remains essentially unchanged except that a different random number starting seed was used.

Here the solid sloped line indicates the mean of the normal regression data while the dashed line represents the estimate according to an assumed Weibull model. Note the much wider discrepancy here compared to the corresponding situation in Figure 19. The reason for this wider gap is that the fitted line aims for the .632-quantile and not the median of that data.
Figure 22: Weibull Goodness of Fit for Normal Regression Example
(Three histograms of the simulated null distributions, ordinate: Frequency; n = 100, Nsim = 10000. Panel $D^\star$: D = 0.0953, p-value = 0.0213. Panel $W^{2\star}$: $W^2$ = 0.265, p-value = 5e−04. Panel $A^{2\star}$: $A^2$ = 1.73, p-value = 4e−04.)

Here the p-values clearly indicate that the hypothesis of a Weibull distribution should be rejected, although the evidence in the case of D is not very strong. However, for $W^2$ and $A^2$ there should be no doubt in the (correct) rejection of the hypothesis. Any slight differences in the null distributions shown here and in the previous example are due to a different random number seed being used in the two cases.
References

Bain, L.J. (1978). Statistical Analysis of Reliability and Life-Testing Models, Dekker, New York.

Bain, L.J. and Engelhardt, M. (1991). Statistical Analysis of Reliability and Life-Testing Models, Theory and Methods, Second Edition, Dekker, New York.

Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data, John Wiley & Sons, New York.

Pierce, D.A. and Kopecky, K.J. (1979). "Testing Goodness of Fit for the Distribution of Errors in Regression Models," Biometrika, Vol. 66, No. 1, 1-5.

Saunders, S.C. (1975). "Birnbaum's Contributions to Reliability," Reliability and Fault Tree Analysis, Theoretical and Applied Aspects of System Reliability and Safety Assessment, Barlow, R.E., Fussell, J.B., and Singpurwalla, N.D. (editors), Society for Industrial and Applied Mathematics, 33 South 17 Street, Philadelphia PA 19103.

Scholz, F.W. (1996) (revised 2001). "Maximum Likelihood Estimation for Type I Censored Weibull Data Including Covariates," ISSTECH-96-022, Boeing Information and Support Services.

Scholz, F.W. (2008). "Weibull Probability Paper" (informal note).

Stephens, M.A. (1986). "Tests based on EDF statistics," Goodness-of-Fit Techniques, D'Agostino, R.B. and Stephens, M.A. (editors), Dekker, New York.

Thoman, D.R., Bain, L.J., and Antle, C.E. (1969). "Inferences on parameters of the Weibull distribution," Technometrics, Vol. 11, No. 3, 445-460.

Thoman, D.R., Bain, L.J., and Antle, C.E. (1970). "Exact confidence intervals for reliability, and tolerance limits in the Weibull distribution," Technometrics, Vol. 12, No. 2, 363-371.