This action might not be possible to undo. Are you sure you want to continue?
Stat 498B Industrial Statistics
Fritz Scholz
May 22, 2008
1 The Weibull Distribution
The 2parameter Weibull distribution function is deﬁned as
F
α,β
(x) = 1 −exp
_
−
_
x
α
_
β
_
for x ≥ 0 and F
α,β
(x) = 0 for t < 0.
We also write X ∼ W(α, β) when X has this distribution function, i.e., P(X ≤ x) = F
α,β
(x). The
parameters α > 0 and β > 0 are referred to as scale and shape parameter, respectively. The Weibull
density has the following form
f
α,β
(x) = F
α,β
(x) =
d
dx
F
α,β
(x) =
β
α
_
x
α
_
β−1
exp
_
−
_
x
α
_
β
_
.
For β = 1 the Weibull distribution coincides with the exponential distribution with mean α. In
general, α represents the .632quantile of the Weibull distribution regardless of the value of β since
F
α,β
(α) = 1 − exp(−1) ≈ .632 for all β > 0. Figure 1 shows a representative collection of Weibull
densities. Note that the spread of the Weibull distributions around α gets smaller as β increases.
The reason for this will become clearer later when we discuss the logtransform of Weibull random
variables.
The m
th
moment of the Weibull distribution is
E(X
m
) = α
m
Γ(1 +m/β)
and thus the mean and variance are given by
µ = E(X) = αΓ(1 + 1/β) and σ
2
= α
2
_
Γ(1 + 2/β) −{Γ(1 + 1/β)}
2
_
.
Its pquantile, deﬁned by P(X ≤ x
p
) = p, is
x
p
= α(−log(1 −p))
1/β
.
For p = 1 −exp(−1) ≈ .632 (i.e., −log(1 −p) = 1) we have x
p
= α regardless of β, as pointed out
previously. For that reason one also calls α the characteristic life of the Weibull distribution. The
term life comes from the common use of the Weibull distribution in modeling lifetime data. More
on this later.
1
Weibull densities
ββ
1
== 0.5
ββ
2
== 1
ββ
3
== 1.5
ββ
4
== 2
ββ
5
== 3.6
ββ
6
== 7
αα == 10000
αα
36.8% 63.2%
Figure 1: A Collection of Weibull Densities with α = 10000 and Various Shapes
2
2 Minimum Closure and Weakest Link Property
The Weibull distribution has the following minimum closure property: If X
1
, . . . , X
n
are independent
with X
i
∼ W(α
i
, β), i = 1, . . . , n, then
P(min(X
1
, . . . , X
n
) > t) = P(X
1
> t, . . . , X
n
> t) =
n
i=1
P(X
i
> t)
=
n
i=1
exp
_
−
_
t
α
i
_
β
_
= exp
_
−t
β
n
i=1
1
α
β
i
_
= exp
_
−
_
t
α
_
β
_
with α
=
_
n
i=1
1
α
β
i
_
−1/β
,
i.e., min(X
1
, . . . , X
n
) ∼ W(α
, β). This is reminiscent of the closure property for the normal
distribution under summation, i.e., if X
1
, . . . , X
n
are independent with X
i
∼ N(µ
i
, σ
2
i
) then
n
i=1
X
i
∼ N
_
n
i=1
µ
i
,
n
i=1
σ
2
i
_
.
This summation closure property plays an essential role in proving the central limit theorem: Sums
of independent random variables (not necessarily normally distributed) have an approximate normal
distribution, subject to some mild conditions concerning the distribution of such random variables.
There is a similar result from Extreme Value Theory that says: The minimum of independent,
identically distributed random variables (not necessarily Weibull distributed) has an approximate
Weibull distribution, subject to some mild conditions concerning the distribution of such random
variables. This is also referred to as the “weakest link” motivation for the Weibull distribution.
The Weibull distribution is appropriate when trying to characterize the random strength of materials
or the random lifetime of some system. This is related to the weakest link property as follows. A
piece of material can be viewed as a concatenation of many smaller material cells, each of which has
its random breaking strength X
i
when subjected to stress. Thus the strength of the concatenated
total piece is the strength of its weakest link, namely min(X
1
, . . . , X
n
), i.e., approximately Weibull.
Similarly, a system can be viewed as a collection of many parts or subsystems, each of which has a
random lifetime X
i
. If the system is deﬁned to be in a failed state whenever any one of its parts or
subsystems fails, then the system lifetime is min(X
1
, . . . , X
n
), i.e., approximately Weibull.
Figure 2 gives a sense of usage of the Weibull distribution and Figure 3 shows the “real thing.”
Googling “Weibull distribution” produced 185,000 hits while ”normal distribution” had 2,420,000
hits.
3
Figure 2: Publications on the Weibull Distribution
4
Figure 3: Waloddi Weibull
5
The Weibull distribution is very popular among engineers. One reason for this is that the Weibull
cdf has a closed form which is not the case for the normal cdf Φ(x). However, in today’s computing
environment one could argue that point since typically the computation of even exp(x) requires
computing. That this can be accomplished on most calculators is also moot since many calculators
also give you Φ(x). Another reason for the popularity of the Weibull distribution among engi
neers may be that Weibull’s most famous paper, originally submitted to a statistics journal and
rejected, was eventually published in an engineering journal: Waloddi Weibull (1951) “A statistical
distribution function of wide applicability.” Journal of Applied Mechanics, 18, 293297.
“. . . he tried to publish an article in a wellknown British journal. At this time, the distri
bution function proposed by Gauss was dominating and was distinguishingly called the normal
distribution. By some statisticians it was even believed to be the only possible one. The arti
cle was refused with the comment that it was interesting but of no practical importance. That
was just the same article as the highly cited one published in 1951.” (G¨oran W. Weibull, 1981,
http://www.garfield.library.upenn.edu/classics1981/A1981LD32400001.pdf)
Sam Saunders (1975): ‘Professor Wallodi (sic) Weibull recounted to me that the now famous paper of
his “A Statistical Distribution of Wide Applicability”, in which was ﬁrst advocated the “Weibull”
distribution with its failure rate a power of time, was rejected by the Journal of the American
Statistical Association as being of no interrest. Thus one of the most inﬂuential papers in statistics
of that decade was published in the Journal of Applied Mechanics. See [35]. (Maybe that is the
reason it was so inﬂuential!)’
3 The Hazard Function
The hazard function for any nonnegative random variable with cdf F(x) and density f(x) is deﬁned
as h(x) = f(x)/(1−F(x)). It is usually employed for distributions that model random lifetimes and
it relates to the probability that a lifetime comes to an end within the next small time increment
of length d given that the lifetime has exceeded x so far, namely
P(x < X ≤ x +dX > x) =
P(x < X ≤ x +d)
P(X > x)
=
F(x +d) −F(x)
1 −F(x)
≈
d ×f(x)
1 −F(x)
= d ×h(x) .
In the case of the Weibull distribution we have
h(x) =
f
α,β
(x)
1 −F
α,β
(x)
=
β
α
_
x
α
_
β−1
.
Various other terms are used equivalently for the hazard function, such as hazard rate, failure rate
(function), or force of mortality. In the case of the Weibull hazard rate function we observe that it
6
is increasing in x when β > 1, decreasing in x when β < 1 and constant when β = 1 (exponential
distribution with memoryless property).
When β > 1 the part or system, for which the lifetime is modeled by a Weibull distribution, is
subject to aging in the sense that an older system has a higher chance of failing during the next
small time increment d than a younger system.
For β < 1 (less common) the system has a better chance of surviving the next small time increment
d as it gets older, possibly due to hardening, maturing, or curing. Often one refers to this situation
as one of infant mortality, i.e., after initial early failures the survival gets better with age. However,
one has to keep in mind that we may be modeling parts or systems that consist of a mixture of
defective or weak parts and of parts that practically can live forever. A Weibull distribution with
β < 1 may not do full justice to such a mixture distribution.
For β = 1 there is no aging, i.e., the system is as good as new given that it has survived beyond x,
since for β = 1 we have
P(X > x +hX > x) =
P(X > x +h)
P(X > x)
=
exp(−(x +h)/α)
exp(−x/α)
= exp(−h/α) = P(X > h) ,
i.e., it is again exponential with same mean α. One also refers to this as a random failure model in
the sense that failures are due to external shocks that follow a Poisson process with rate λ = 1/α.
The random times between shocks are exponentially distributed with mean α. Given that there are
k such shock events in an interval [0, T] one can view the k occurrence times as being uniformly
distributed over the interval [0, T], hence the allusion to random failures.
4 LocationScale Property of log(X)
Another useful property, of which we will make strong use, is the following locationscale property
of the logtransformed Weibull distribution. By that we mean that: X ∼ W(α, β) =⇒ log(X) = Y
has a locationscale distribution, namely its cumulative distribution function (cdf) is
P(Y ≤ y) = P(log(X) ≤ y) = P(X ≤ exp(y)) = 1 −exp
_
_
−
_
exp(y)
α
_
β
_
_
= 1 −exp [−exp {(y −log(α)) ×β}] = 1 −exp
_
−exp
_
y −log(α)
1/β
__
= 1 −exp
_
−exp
_
y −u
b
__
with location parameter u = log(α) and scale parameter b = 1/β. The reason for referring to such
parameters this way is the following. If Z ∼ G(z) then Y = µ +σZ ∼ G((y −µ)/σ) since
H(y) = P(Y ≤ y) = P(µ +σZ ≤ y) = P(Z ≤ (y −µ)/σ) = G((y −µ)/σ) .
7
The form Y = µ +σX should make clear the notion of location scale parameter, since Z has been
scaled by the factor σ and is then shifted by µ. Two prominent locationscale families are
1. Y = µ +σZ ∼ N(µ, σ
2
), where Z ∼ N(0, 1) is standard normal with cdf G(z) = Φ(z)
and thus Y has cdf H(y) = Φ((y −µ)/σ),
2. Y = u +bZ where Z has the standard extreme value distribution with cdf
G(z) = 1 −exp(−exp(z)) for z ∈ R, as in our logtransformed Weibull example above.
In any such a locationscale model there is a simple relationship between the pquantiles of Y and
Z, namely y
p
= µ + σz
p
in the normal model and y
p
= u + bw
p
in the extreme value model (using
the location and scale parameters u and b resulting from logtransformed Weibull data). We just
illustrate this in the extreme value locationscale model.
p = P(Z ≤ w
p
) = P(u +bZ ≤ u +bw
p
) = P(Y ≤ u +bw
p
) =⇒ y
p
= u +bw
p
with w
p
= log(−log(1 −p)). Thus y
p
is a linear function of w
p
= log(−log(1 −p)), the pquantile
of G. While w
p
is known and easily computable from p, the same cannot be said about y
p
, since
it involves the typically unknown parameters u and b. However, for appropriate p
i
= (i − .5)/n
one can view the i
th
ordered sample value Y
(i)
(Y
(1)
≤ . . . ≤ Y
(n)
) as a good approximation for y
p
i
.
Thus the plot of Y
(i)
against w
p
i
should look approximately linear. This is the basis for Weibull
probability plotting (and the case of plotting Y
(i)
against z
p
i
for normal probability plotting), a very
appealing graphical procedure which gives a visual impression of how well the data ﬁt the assumed
model (normal or Weibull) and which also allows for a crude estimation of the unknown location
and scale parameters, since they relate to the slope and intercept of the line that may be ﬁtted to
the perceived linear point pattern. For more in relation to Weibull probability plotting we refer to
Scholz (2008).
5 Maximum Likelihood Estimation
There are many ways to estimate the parameters θ = (α, β) based on a random sample X
1
, . . . , X
n
∼
W(α, β). Maximum likelihood estimation (MLE) is generally the most versatile and popular
method. Although MLE in the Weibull case requires numerical methods and a computer, that is no
longer an issue in today’s computing environment. Previously, estimates that could be computed
by hand had been investigated, but they are usually less eﬃcient than mle’s (estimates derived by
MLE). By eﬃcient estimates we loosely refer to estimates that have the smallest sampling variance.
MLE tends to be eﬃcient, at least in large samples. Furthermore, under regularity conditions MLE
produces estimates that have an approximate normal distribution in large samples.
8
When X
1
, . . . , X
n
∼ F
θ
(x) with density f
θ
(x) then the maximum likelihood estimate of θ is that
value θ =
ˆ
θ =
ˆ
θ(x
1
, . . . , x
n
) which maximizes the likelihood
L(x
1
, . . . , x
n
, θ) =
n
i=1
f
θ
(x
i
)
over θ, i.e., which gives highest local probability to the observed sample (X
1
, . . . , X
n
) = (x
1
, . . . , x
n
)
L(x
1
, . . . , x
n
,
ˆ
θ) = sup
θ
_
n
i=1
f
θ
(x
i
)
_
.
Often such maximizing values
ˆ
θ are unique and one can obtain them by solving, i.e.,
∂
∂θ
j
n
i=1
f
θ
(x
i
) = 0 j = 1, . . . , k ,
where k is the number of parameters involved in θ = (θ
1
, . . . , θ
k
). These above equations reﬂect the
fact that a smooth function has a horizontal tangent plane at its maximum (minimum or saddle
point). Thus solving such equations is necessary but not suﬃcient, since it still needs to be shown
that it is the location of a maximum.
Since taking derivatives of a product is tedious (product rule) one usually resorts to maximizing
the log of the likelihood, i.e.,
(x
1
, . . . , x
n
, θ) = log (L(x
1
, . . . , x
n
, θ)) =
n
i=1
log (f
θ
(x
i
))
since the value of θ that maximizes L(x
1
, . . . , x
n
, θ) is the same as the value that maximizes
(x
1
, . . . , x
n
, θ), i.e.,
(x
1
, . . . , x
n
,
ˆ
θ) = sup
θ
_
n
i=1
log (f
θ
(x
i
))
_
.
It is a lot simpler to deal with the likelihood equations
∂
∂θ
j
(x
1
, . . . , x
n
,
ˆ
θ) =
∂
∂θ
j
n
i=1
log(f
θ
(x
i
)) =
n
i=1
∂
∂θ
j
log(f
θ
(x
i
)) = 0 j = 1, . . . , k
when solving for θ =
ˆ
θ =
ˆ
θ(x
1
, . . . , x
n
).
In the case of a normal random sample we have θ = (µ, σ) with k = 2 and the unique solution of
the likelihood equations results in the explicit expressions
ˆ µ = ¯ x =
n
i=1
x
i
/n and ˆ σ =
¸
¸
¸
_
n
i=1
(x
i
− ¯ x)
2
/n and thus
ˆ
θ = (ˆ µ, ˆ σ) .
9
In the case of a Weibull sample we take the further simplifying step of dealing with the log
transformed sample (y
1
, . . . , y
n
) = (log(x
1
), . . . , log(x
n
)). Recall that Y
i
= log(X
i
) has cdf F(y) =
1 − exp(−exp((x − u)/b)) = G((y − u)/b) with G(z) = 1 − exp(−exp(z)) with g(z) = G
(z) =
exp(z −exp(z)). Thus
f(y) = F
(y) =
d
dy
F(y) =
1
b
g((y −u)/b))
with
log(f(y)) = −log(b) +
y −u
b
−exp
_
y −u
b
_
.
As partial derivatives of log(f(y)) with respect to u and b we get
∂
∂u
log(f(y)) = −
1
b
+
1
b
exp
_
y −u
b
_
∂
∂b
log(f(y)) = −
1
b
−
1
b
y −u
b
+
1
b
y −u
b
exp
_
y −u
b
_
and thus as likelihood equations
0 = −
n
b
+
1
b
n
i=1
exp
_
y
i
−u
b
_
or
n
i=1
exp
_
y
i
−u
b
_
= n or exp(u) =
_
1
n
n
i=1
exp
_
y
i
b
_
_
b
,
0 = −
n
b
−
1
b
n
i=1
y
i
−u
b
+
1
b
n
i=1
y
i
−u
b
exp
_
y
i
−u
b
_
.
i.e., we have a solution u = ˆ u once we have a solution b =
ˆ
b. Substituting this expression for exp(u)
into the second likelihood equation we get (after some cancelation and manipulation)
0 =
n
i=1
y
i
exp(y
i
/b)
n
i=1
exp(y
i
/b)
−b −
1
n
n
i=1
y
i
.
Analyzing the solvability of this equation is more convenient in terms of β = 1/b and we thus write
0 =
n
i=1
y
i
w
i
(β) −
1
β
− ¯ y where w
i
(β) =
exp(y
i
β)
n
j=1
exp(y
j
β)
with
n
i=1
w
i
(β) = 1 .
Note that the derivative of these weights with respect to β take the following form
w
i
(β) =
d
dβ
w
i
(β) = y
i
w
i
(β) −w
i
(β)
n
j=1
y
j
w
j
(β) .
10
Hence
d
dβ
_
n
i=1
y
i
w
i
(β) −
1
β
− ¯ y
_
=
n
i=1
y
i
w
i
(β) +
1
β
2
=
n
i=1
y
2
i
w
i
(β) −
_
_
n
j=1
y
j
w
j
(β)
_
_
2
+
1
β
2
> 0
since
var
w
(y) =
n
i=1
y
2
i
w
i
(β) −
_
_
n
j=1
y
j
w
j
(β)
_
_
2
= E
w
(y
2
) −[E
w
(y)]
2
≥ 0
can be interpreted as a variance of the n values of y = (y
1
, . . . , y
n
) with weights or probabilities given
by w = (w
1
(β), . . . , w
n
(β)). Thus the reduced second likelihood equation
y
i
w
i
(β) −1/β − ¯ y = 0
has a unique solution (if it has a solution at all) since the the equation’s left side is strictly increasing.
Note that w
i
(β) →1/n as β →0. Thus
y
i
w
i
(β) −1/β − ¯ y ≈ −1/β →−∞ as β →0.
Furthermore, with M = max(y
1
, . . . , y
n
) and β →∞ we have
w
i
(β) = exp(β(y
i
−M))/
n
j=1
exp(β(y
j
−M)) →0 when y
i
< M and w
i
(β) →1/r when y
i
= M,
where r ≥ 1 is the number of y
i
coinciding with M. Thus
y
i
w
i
(β) −1/β − ¯ y ≈ M −1/β − ¯ y →M − ¯ y > 0 as β →∞
where M− ¯ y > 0 assumes that not all y
i
coincide (a degenerate case with probability 0). That this
unique solution corresponds to a maximum and thus a unique global maximum takes some extra
eﬀort and we refer to Scholz (1996) for an even more general treatment that covers Weibull analysis
with censored data and covariates.
However, a somewhat loose argument can be given as follows. If we consider the likelihood of the
logtransformed Weibull data we have
L(y
1
, . . . , y
n
, u, b) =
1
b
n
n
i=1
g
_
y
i
−u
b
_
.
Contemplate this likelihood for ﬁxed y = (y
1
, . . . , y
n
) and for parameters u with u → ∞ (the
location moves away from all observed data values y
1
, . . . , y
n
) and b with b →0 (the spread becomes
very concentrated on some point and cannot simultaneously do so at all values y
1
, . . . , y
n
, unless
they are all the same, excluded as a zero probability degeneracy) and b → ∞ (in which case all
probability is diﬀused thinly over the whole half plane {(u, b) : u ∈ R, b > 0}), it is then easily seen
that this likelihood approaches zero in all cases. Since this likelihood is positive everywhere (but
approaching zero near the fringes of the parameter space, the above half plane) it follows that it
11
must have a maximum somewhere with zero partial derivatives. We showed there is only one such
point (uniqueness of the solution to the likelihood equations) and thus there can only be one unique
(global) maximum, which then is also the unique maximum likelihood estimate
ˆ
θ = (ˆ u,
ˆ
b).
In solving 0 =
y
i
exp(y
i
/b)/
exp(y
i
/b) − b − ¯ y it is numerically advantageous to solve the
equivalent equation 0 =
y
i
exp((y
i
−M)/b)/
exp((y
i
−M)/b)−b−¯ y where M = max(y
1
, . . . , y
n
).
This avoids overﬂow or accuracy loss in the exponentials when the y
i
tend to be large.
The above derivations go through with very little change when instead of observing a full sample
Y
1
, . . . , Y
n
we only observe the r ≥ 2 smallest sample values Y
(1)
< . . . < Y
(r)
. Such data is referred
to as type II censored data. This situation typically arises in a laboratory setting when several
units are put on test (subjected to failure exposure) simultaneously and the test is terminated
(or evaluated) when the ﬁrst r units have failed. In that case we know the ﬁrst r failure times
X
(1)
< . . . < X
(r)
and thus Y
(i)
= log(X
(i)
), i = 1, . . . , r, and we know that the lifetimes of the
remaining units exceed X
(r)
or that Y
(i)
> Y
(r)
for i > r. The advantage of such data collection is
that we do not have to wait until all n units have failed. Furthermore, if we put a lot of units on
test (high n) we increase our chance of seeing our ﬁrst r failures before a ﬁxed time y. This is a
simple consequence of the following binomial probability statement:
P(Y (r) ≤ y) = P(at least r failures ≤ y in n trials) =
n
i=r
_
n
i
_
P(Y ≤ y)
i
(1 −P(Y ≤ y))
n−i
which is strictly increasing in n for any ﬁxed y and r ≥ 1 (exercise).
The joint density of Y
(1)
, . . . , Y
(n)
at (y
1
, . . . , y
n
) with y
1
< . . . < y
n
is
f(y
1
, . . . , y
n
) = n!
n
i=1
1
b
g
_
y
i
−u
b
_
= n!
n
i=1
f(y
i
)
where the multiplier n! just accounts for the fact that all n! permutations of y
1
, . . . , y
n
could have
been the order in which these values were observed and all of these orders have the same density
(probability). Integrating out y
n
> y
n−1
> . . . > y
r+1
(> y
r
) and using
¯
F(y) = 1 −F(y) we get after
n −r successive integration steps the joint density of the ﬁrst r failure times y
1
< . . . < y
r
as
f(y
1
, . . . , y
n−1
) = n!
n−1
i=1
f(y
i
) ×
_
∞
y
n−1
f(y
n
)dy
n
= n!
n−1
i=1
f(y
i
)
¯
F(y
n−1
)
f(y
1
, . . . , y
n−2
) = n!
n−2
i=1
f(y
i
) ×
_
∞
y
n−2
f(y
n−1
)
¯
F(y
n−1
)dy
n−1
= n!
n−2
i=1
f(y
i
) ×
1
2
¯
F
2
(y
n−2
)
f(y
1
, . . . , y
n−3
) = n!
n−3
i=1
f(y
i
) ×
_
∞
y
n−3
f(y
n−2
)
¯
F
2
(y
n−2
)/2dy
n−2
= n!
n−3
i=1
f(y
i
) ×
1
3!
¯
F
3
(y
n−3
)
12
. . .
f(y
1
, . . . , y
r
) = n!
r
i=1
f(y
i
) ×
1
(n −r)!
¯
F
n−r
(y
r
) =
_
n!
(n −r)!
r
i=1
f(y
i
)
_
×[1 −F(y
r
)]
n−r
= r!
r
i=1
1
b
g
_
y
i
−u
b
_
×
_
n
r
_
_
1 −G
_
y
r
−u
b
__
n−r
with loglikelihood
(y
1
, . . . , y
r
, u, b) = log
_
n!
(n −r)!
_
−r log(b) +
r
i=1
y
i
−u
b
−
r
i=1
exp
_
y
i
−u
b
_
where we use the notation
r
i=1
x
i
=
r
i=1
x
i
+ (n −r)x
r
.
The likelihood equations are
0 =
∂
∂u
(y
1
, . . . , y
r
, u, b) = −
r
b
+
1
b
r
i=1
exp
_
y
i
−u
b
_
or exp(u) =
_
1
r
r
i=1
exp
_
y
i
b
_
_
b
0 =
∂
∂b
(y
1
, . . . , y
r
, u, b) = −
r
b
−
1
b
r
i=1
y
i
−u
b
+
1
b
r
i=1
y
i
−u
b
exp
_
y
i
−u
b
_
where again the transformed ﬁrst equation gives us a solution ˆ u once we have a solution
ˆ
b for b.
Using this in the second equation it transforms to a single equation in b alone, namely
r
i=1
y
i
exp(y
i
/b)
_
r
i=1
exp(y
i
/b) −b −
1
r
r
i=1
y
i
= 0 .
Again it is advisable to use the equivalent but computationally more stable form
r
i=1
y
i
exp((y
i
−y
r
)/b)
_
r
i=1
exp((y
i
−y
r
)/b) −b −
1
r
r
i=1
y
i
= 0 .
As in the complete sample case one sees that this equation has a unique solution
ˆ
b and that (ˆ u,
ˆ
b)
gives the location of the (unique) global maximimum of the likelihood function, i.e., (ˆ u,
ˆ
b) are the
mle’s.
13
6 Computation of Maximum Likelihood Estimates in R
The computation of the mle’s of the Weibull parameters α and β is facilitated by the function
survreg which is part of the R package survival. Here survreg is used in its most basic form in
the context of Weibull data (full sample or type II censored Weibull data). survreg does a whole
lot more than compute the mle’s but we will not deal with these aspects here, at least for now.
The following is an R function, called Weibull.mle, that uses survreg to compute these estimates.
Note that it tests for the existence of survreg before calling it. This function is part of the R work
space that is posted on the class web site.
Weibull.mle < function (x=NULL,n=NULL){
# This function computes the maximum likelihood estimates of alpha and beta
# for complete or type II censored samples assumed to come from a 2parameter
# Weibull distribution. Here x is the sample, either the full sample or the first
# r observations of a type II censored sample. In the latter case one must specify
# the full sample size n, otherwise x is treated as a full sample.
# If x is not given then a default full sample of size n=10, namely
# c(7,12.1,22.8,23.1,25.7,26.7,29.0,29.9,39.5,41.9) is analyzed and the returned
# results should be
# $mles
# alpha.hat beta.hat
# 28.914017 2.799793
#
# In the type II censored usage
# Weibull.mle(c(7,12.1,22.8,23.1,25.7),10)
# $mles
# alpha.hat beta.hat
# 30.725992 2.432647
if(is.null(x))x < c(7,12.1,22.8,23.1,25.7,26.7,29.0,29.9,39.5,41.9)
r < length(x)
if(is.null(n)){n<r}else{if(r>nr<2){
return("x must have length r with: 2 <= r <= n")}}
xs < sort(x)
if(!exists("survreg"))library(survival)
#tests whether survival package is loaded, if not, then it loads survival
if(r<n){
statusx < c(rep(1,r),rep(0,nr))
dat.weibull < data.frame(c(xs,rep(xs[r],nr)),statusx)
14
}else{statusx < rep(1,n)
dat.weibull < data.frame(xs,statusx)}
names(dat.weibull)<c("time","status")
out.weibull < survreg(Surv(time,status)~1,dist="weibull",data=dat.weibull)
alpha.hat < exp(out.weibull$coef)
beta.hat < 1/out.weibull$scale
parms < c(alpha.hat,beta.hat)
names(parms)<c("alpha.hat","beta.hat")
list(mles=parms)}
Note that survreg analyzes objects of class Surv. Here such an object is created by the function
Surv and it basically adjoins the failure times with a status vector of same length. The status
is 1 when a time corresponds to an actual failure time. It is 0 when the corresponding time is
a censoring time, i.e., we only know that the unobserved actual failure time exceeds the reported
censoring time. In the case of type II censored data these censoring times equal the r
th
largest
failure time.
To get a sense of the calculation speed of this function we ran Weibull.mle a 1000 times, which
tells us that the time to compute the mle’s in a sample of size n = 10 is roughly 5.91/1000 = .00591.
This fact plays a signiﬁcant role later on in the various inference procedures which we will discuss.
system.time(for(i in 1:1000){Weibull.mle(rweibull(10,1))})
user system elapsed
5.79 0.00 5.91
For n = 100, 500, 1000 the elapsed times came to 8.07, 15.91 and 25.87, respectively. The relationship
of computing time to n appears to be quite linear, but with slow growth, as Figure 4 shows.
7 Location and Scale Equivariance of Maximum Likelihood Estimates
The maximum likelihood estimates ˆ u and
ˆ
b of the location and scale parameters u and b have the
following equivariance properties which will play a strong role in the later pivot construction and
resulting conﬁdence intervals.
Based on data z = (z
1
, . . . , z
n
) we denote the estimates of u and b more explicitly by ˆ u(z
1
, . . . , z
n
) =
ˆ u(z) and
ˆ
b(z
1
, . . . , z
n
) =
ˆ
b(z). If we transform z to r = (r
1
, . . . , r
n
) with r
i
= A+Bz
i
, where A ∈ R
and B > 0 are arbitrary constant, then
ˆ u(r
1
, . . . , r
n
) = A +Bˆ u(z
1
, . . . , z
n
) or ˆ u(r) = ˆ u(A +Bz) = A +Bˆ u(z)
15
q
q
q
q
0 200 400 600 800 1000
0
.
0
0
0
0
.
0
0
5
0
.
0
1
0
0
.
0
1
5
0
.
0
2
0
0
.
0
2
5
0
.
0
3
0
sample size n
t
i
m
e
t
o
c
o
m
p
u
t
e
W
e
i
b
u
l
l
m
l
e
'
s
(
s
e
c
)
intercept = 0.005886 , slope = 2.001e−05
Figure 4: Weibull Parameter MLE Computation Time in Relation to Sample Size n
and
ˆ
b(r
1
, . . . , r
n
) = B
ˆ
b(z
1
, . . . , z
n
) or
ˆ
b(r) =
ˆ
b(A +Bz) = B
ˆ
b(z) .
These properties are naturally desirable for any location and scale estimates and for mle’s they are
indeed true.
Proof: Observe the following deﬁning properties of the mle’s in terms of z = (z
1
, . . . , z
n
) and
r = (r
1
, . . . , r
n
)
sup
u,b
_
1
b
n
n
i=1
g((z
i
−u)/b)
_
=
1
ˆ
b
n
(z)
n
i=1
g((z
i
− ˆ u(z))/
ˆ
b(z))
sup
u,b
_
1
b
n
n
i=1
g((r
i
−u)/b)
_
=
1
ˆ
b
n
(r)
n
i=1
g((r
i
− ˆ u(r))/
ˆ
b(r))
=
1
B
n
1
(
ˆ
b(r)/B)
n
n
i=1
g((z
i
−(ˆ u(r) −A)/B)/(
ˆ
b(r)/B))
16
but also
sup
u,b
_
1
b
n
n
i=1
g((r
i
−u)/b)
_
= sup
u,b
_
1
b
n
n
i=1
g((A +Bz
i
−u)/b)
_
= sup
u,b
_
1
B
n
1
(b/B)
n
n
i=1
g((z
i
−(u −A)/B)/(b/B))
_
˜ u = (u −A)/B
˜
b = b/B
⇒ = sup
˜ u,
˜
b
_
1
B
n
1
˜
b
n
n
i=1
g((z
i
− ˜ u)/
˜
b)
_
=
1
B
n
1
ˆ
b
n
(z)
n
i=1
g((z
i
− ˆ u(z))/
ˆ
b(z))
Thus by the uniqueness of the mle’s we have
ˆ u(z) = (ˆ u(r) −A)/B and
ˆ
b(z) =
ˆ
b(r)/B
or
ˆ u(r) = ˆ u(A +Bz) = A +Bˆ u(z) and
ˆ
b(r) =
ˆ
b(A +Bz) = B
ˆ
b(z) q.e.d.
The same equivariance properties hold for the mle’s in the context of type II censored samples, as
is easily veriﬁed.
8 Tests of Fit Based on the Empirical Distribution Function
Relying on subjective assessment of linearity in Weibull probability plots in order to judge whether
a sample comes from a 2parameter Weibull population takes a fair amount of experience. It is
simpler and more objective to employ a formal test of ﬁt which compares the empirical distribution
function
ˆ
F
n
(x) of a sample with the ﬁtted Weibull distribution function
ˆ
F(x) = F
ˆ α,
ˆ
β
(x) using one
of several common discrepancy metrics.
The empirical distribution function (EDF) of a sample X
1
, . . . , X
n
is deﬁned as
ˆ
F
n
(x) =
# of observations ≤ x
n
=
1
n
n
i=1
I
{X
i
≤x}
where I
A
= 1 when A is true, and I
A
= 0 when A is false. The ﬁtted Weibull distribution function
(using mle’s ˆ α and
ˆ
β) is
ˆ
F(x) = F
ˆ α,
ˆ
β
(x) = 1 −exp
_
−
_
x
ˆ α
_ˆ
β
_
.
From the law of large numbers (LLN) we see that for any x we have that
ˆ
F
n
(x) −→ F
α,β
(x) as
n −→∞. Just view
ˆ
F
n
(x) as a binomial proportion or as an average of Bernoulli random variables.
17
From MLE theory we also know that
ˆ
F(x) = F
ˆ α,
ˆ
β
(x) −→ F
α,β
(x) as n −→ ∞ (also derived from
the LLN).
Since the limiting cdf F
α,β
(x) is continuous in x one can argue that these convergence statements
can be made uniformly in x, i.e.,
sup
x

ˆ
F
n
(x) −F
α,β
(x) −→0 and sup
x
F
ˆ α,
ˆ
β
(x) −F
α,β
(x) −→0 as n −→∞
and thus sup
x

ˆ
F
n
(x)−F
ˆ α,
ˆ
β
(x) −→0 as n −→∞ for all α > 0 and β > 0.
The distance D
KS
(F, G) = sup
x
F(x)−G(x)
is known as the KolmogorovSmirnov distance between two cdf’s F and G.
Figures 5 and 6 give illustrations of this KolmogorovSmirnov distance between EDF and ﬁtted
Weibull distribution and show the relationship between sampled true Weibull distribution, ﬁtted
Weibull distribution, and empirical distribution function.
Some comments:
1. It can be noted that the closeness between
ˆ
F
n
(x) and F
ˆ α,
ˆ
β
(x) is usually more pronounced
than their respective closeness to F
α,β
(x), in spite of the sequence of the above convergence
statements.
2. This can be understood from the fact that both
ˆ
F
n
(x) and F
ˆ α,
ˆ
β
(x) ﬁt the data, i.e., try to
give a good representation of the data. The ﬁt of the true distribution, although being the
origin of the data, is not always good due to sampling variation.
3. The closeness between all three distributions improves as n gets larger.
Several other distances between cdf’s F and G have been proposed and investigated in the literature.
We will only discuss two of them, the Cram´ervon Mises distance D
CvM
and the AndersonDarling
distance D
AD
. They are deﬁned respectively as follows
D
CvM
(F, G) =
_
∞
−∞
(F(x) −G(x))
2
dG(x) =
_
∞
−∞
(F(x) −G(x))
2
g(x) dx
and
D
AD
(F, G) =
_
∞
−∞
(F(x) −G(x))
2
G(x)(1 −G(x))
dG(x) =
_
∞
−∞
(F(x) −G(x))
2
G(x)(1 −G(x))
g(x) dx .
Rather than focussing on the very local phenomenon of a maximum discrepancy at some point x as
in D
KS
, these alternate distances or discrepancy metrics integrate these distances in squared form
over all x, weighted by g(x) in the case of D
CvM
(F, G) and by g(x)/[G(x)(1 − G(x))] in the case
18
D
AD
(F, G). In the latter case, the denominator increases the weight in the tails of the G distribution,
i.e., compensates to some extent for the tapering oﬀ in the density g(x). Thus D
AD
(F, G) is favored
in situations where judging tail behavior is important, e.g., in risk situations. Because of the
integration nature of these last two metrics they have more global character. There is no easy
graphical representation of these metrics, except to suggest that when viewing the previous ﬁgures
illustrating D
KS
one should look at all vertical distances (large and small) between
ˆ
F
n
(x) and
ˆ
F(x),
square them and accumulate these squares in the appropriately weighted fashion. For example,
when one cdf is shifted relative to the other by a small amount (no large vertical discrepancy),
these small vertical discrepancies (squared) will add up and indicate a moderately large diﬀerence
between the two compared cdf’s.
We point out the asymmetric nature of these last two metrics, i.e., we typically have
D
CvM
(F, G) = D
CvM
(G, F) and D
AD
(F, G) = D
AD
(G, F) .
When using these metrics for tests of ﬁt one usually takes the cdf with a density (the model
distribution to be tested) as the one with respect to which the integration takes place, while the
other cdf is taken to be the EDF.
As complicated as these metrics may look at ﬁrst glance, their computation is quite simple. We
will give the following computational expressions (without proof):
D
KS
(
ˆ
F
n
(x),
ˆ
F(x)) = D = max
_
max
_
i/n −V
(i)
_
, max
_
V
(i)
−(i −1)/n
__
where V
(1)
≤ . . . ≤ V
(n)
are the ordered values of V
i
=
ˆ
F(X
i
), i = 1, . . . , n.
For the other two test of ﬁt criteria we have
D
CvM
(
ˆ
F
n
(x),
ˆ
F(x)) = W
2
=
n
i=1
_
V
(i)
−
2i −1
2n
_
2
+
1
12n
and
D
AD
(
ˆ
F
n
(x),
ˆ
F(x)) = A
2
= −n −
1
n
n
i=1
(2i −1)
_
log(V
(i)
) + log(1 −V
(n−i+1)
)
_
.
In order to carry out these tests of ﬁt we need to know the null distributions of D, W
2
and A
2
.
Quite naturally we would reject the hypothesis of a sampled Weibull distribution whenever D or
W
2
or A
2
are too large. The null distribution of D, W
2
and A
2
does not depend on the unknown
parameters α and β, being estimated by ˆ α and
ˆ
β in V
i
=
ˆ
F(X
i
) = F
ˆ α,
ˆ
β
(X
i
). The reason for this
is that the V
i
have a distribution that is independent of the unknown parameters α and β. This is
seen as follows. Using our prior notation we write log(X
i
) = Y
i
= u +bZ
i
and since
19
0 5000 10000 15000 20000 25000
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
c
u
m
u
l
a
t
i
v
e
d
i
s
t
r
i
b
u
t
i
o
n
f
u
n
c
t
i
o
n
q
q
q
q
q
q
q
q
q
q
q q q q q q q q q q
q
q
EDF == F
^
n
True Sampled CDF == F
αα,, ββ
((x))
Weibull Fitted CDF == F
αα
^
,, ββ
^
((x))
KS−Distance
KS−Distance == sup
x
F
^
n
((x)) −− F
αα
^
,, ββ
^
((x))
0 5000 10000 15000 20000 25000
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
c
u
m
u
l
a
t
i
v
e
d
i
s
t
r
i
b
u
t
i
o
n
f
u
n
c
t
i
o
n
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q q q q qq q q q q q q q q q q q q q
q
q
EDF == F
^
n
True Sampled CDF == F
αα,, ββ
((x))
Weibull Fitted CDF == F
αα
^
,, ββ
^
((x))
KS−Distance
KS−Distance == sup
x
F
^
n
((x)) −− F
αα
^
,, ββ
^
((x))
Figure 5: Illustration of KolmogorovSmirnov Distance for n = 10 and n = 20
20
0 5000 10000 15000 20000 25000
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
c
u
m
u
l
a
t
i
v
e
d
i
s
t
r
i
b
u
t
i
o
n
f
u
n
c
t
i
o
n
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q q q qq q q q q qq qqq q q q q q q q qqqqqqq q qq qqqq q qq q qq q q q q q q q q
q
q
EDF == F
^
n
True Sampled CDF == F
αα,, ββ
((x))
Weibull Fitted CDF == F
αα
^
,, ββ
^
((x))
KS−Distance
KS−Distance == sup
x
F
^
n
((x)) −− F
αα
^
,, ββ
^
((x))
0 5000 10000 15000 20000 25000
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
c
u
m
u
l
a
t
i
v
e
d
i
s
t
r
i
b
u
t
i
o
n
f
u
n
c
t
i
o
n
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq q q q qq qq qq q q q qqqq qq qqqq qq qqqq qq qq q qqqqq q qqqq q qqq qq q qqqqq qqq q qqq qqq q q q qq qqq qqqqq qqq q qq q qq q q q q q q q q q q q
q
q
EDF == F
^
n
True Sampled CDF == F
αα,, ββ
((x))
Weibull Fitted CDF == F
αα
^
,, ββ
^
((x))
KS−Distance
KS−Distance == sup
x
F
^
n
((x)) −− F
αα
^
,, ββ
^
((x))
Figure 6: Illustration of KolmogorovSmirnov Distance for n = 50 and n = 100
21
F(x) = P(X ≤ x) = P(log(X) ≤ log(x)) = P(Y ≤ y) = 1 −exp(−exp((y −u)/b))
and thus
V
i
=
ˆ
F(X
i
) = 1 −exp(−exp((Y
i
− ˆ u(Y))/
ˆ
b(Y)))
= 1 −exp(−exp((u +bZ
i
− ˆ u(u +bZ))/
ˆ
b(u +bZ)))
= 1 −exp(−exp((u +bZ
i
−u −bˆ u(Z))/[b
ˆ
b(Z])))
= 1 −exp(−exp((Z
i
− ˆ u(Z))/
ˆ
b(Z)))
and all dependence on the unknown parameters u = log(α) and b = 1/β has canceled out.
This opens up the possibility of using simulation to ﬁnd good approximations to these null distri
butions for any n, especially in view of the previously reported timing results for computing the
mle’s ˆ α and
ˆ
β of α and β. Just generate samples X
= (X
1
, . . . , X
n
) from W(α = 1, β = 1)
(standard exponential distribution), compute the corresponding ˆ α
= ˆ α(X
) and
ˆ
β
=
ˆ
β(X
),
then V
i
=
ˆ
F(X
i
) = F
ˆ α
,
ˆ
β
(X
i
) (where F
α,β
(x) is the cdf of W(α, β)) and from that the values
D
= D(X
), W
2
= W
2
(X
) and A
2
= A
2
(X
). Calculating all three test of ﬁt criteria makes
sense since the main calculation eﬀort is in getting the mle’s ˆ α
and
ˆ
β
. Repeating this a large
number of times, say N
sim
= 10000, should give us a reasonably good approximation to the desired
null distribution and from it one can determine appropriate pvalues for any sample X
1
, . . . , X
n
for which one wishes to assess whether the Weibull distribution hypothesis is tenable or not. If
C(X) denotes the used test of ﬁt criterion then the estimated pvalue of this sample is simply the
proportion of C(X
) that are ≥ C(X).
Prior to the ease of current computing Stephens (1986) provided tables for the (1 − α)quantiles
q
1−α
of these null distributions. For the nadjusted versions A
2
(1 + .2/
√
n) and W
2
(1 + .2/
√
n)
these null distributions appear to be independent of n and (1 − α)quantiles were given for α =
.25, .10, .05, .025, .01. Plotting log(α/(1 −α)) against q
1−α
shows a mildly quadratic pattern which
can be used to interpolate or extrapolate the appropriate pvalue (observed signiﬁcance level α) for
any observed nadjusted value A
2
(1 +.2/
√
n) and W
2
(1 +.2/
√
n), as is illustrated in Figure 7.
For
√
nD the null distribution still depends on n (in spite of the normalizing factor
√
n) and (1−α)
quantiles for α = .10, .05, .025, .01 were tabulated for n = 10, 20, 50, ∞ by Stephens (1986). Here
a double inter and extrapolation scheme is needed, ﬁrst by plotting these quantiles against 1/
√
n,
ﬁtting quadratics in 1/
√
n and reading oﬀ the four interpolated quantile values for the needed n
0
(the sample size at issue) and as a second step perform the interpolation or extrapolation scheme
as it was done previously, but using a cubic this time. This is illustrated in Figure 8.
22
Functions for computing these pvalues (via interpolation from Stephens’ tabled values) are given
in the Weibull R work space provided at the class web site. They are GOF.KS.test, GOF.CvM.test,
and GOF.AD.test for computing pvalues for nadjusted test criteria
√
nD, W
2
(1 + .2/
√
n) , and
A
2
(1 + .2/
√
n), respectively. These functions have an optional argument graphic where graphic
= T causes the interpolation graphs shown in Figures 7 and 8 to be produced, otherwise only the
pvalues are given. The function Weibull.GOF.test does a Weibull goodness of ﬁt test on any given
sample, returning pvalues for all three test criteria. You also ﬁnd there the function Weibull.mle
that was listed earlier, and several other functions not yet documented here.
One could easily reproduce and extend the tables given by Stephens (1986) so that extrapolations
becomes less of an issue. For n = 100 it should take less than 1.5 minutes to simulate the null
distributions based on N
sim
= 10, 000 and the previously given timing of 8.07 sec for N
sim
= 1000.
9 Pivots
Based on the previous equivariance properties of ˆ u(Y) and
ˆ
b(Y) we have the following pivots,
namely functions W = ψ(ˆ u(Y),
ˆ
b(Y), ϑ) of the estimates and an unknown parameter ϑ of interest
such that W has a ﬁxed and known distribution and the function ψ is strictly monotone in the
unknown parameter ϑ, so that it is invertible with respect to ϑ.
Recall that for a Weibull random sample X = (X
1
, . . . , X
n
) we have Y
i
= log(X
i
) ∼ G((y − u)/b)
with b = 1/β and u = log(α). Then Z
i
= (Y
i
−u)/b ∼ G(z) = 1 −exp(−exp(z)), which is a known
distribution (does not depend on unknown parameters). This is seen as follows:
P(Z
i
≤ z) = P((Y
i
−u)/b ≤ z) = P(Y
i
≤ u +bz) = G(([u +bz] −u)/b) = G(z) .
It is this known distribution of Z = (Z
1
, . . . , Z
n
) that is instrumental in knowing the distribution of
the four pivots that we discuss below. There we utilize the representation Y
i
= u+bZ
i
or Y = u+bZ
in vector form.
9.1 Pivot for the Scale Parameter b
As natural pivot for the scale parameter ϑ = b we take
W
1
=
ˆ
b(Y)
b
=
ˆ
b(u +bZ)
b
=
b
ˆ
b(Z)
b
=
ˆ
b(Z) .
The right side, being a function of Z alone, has a distribution that does not involve unknown
parameters and W
1
=
ˆ
b(Y)/b is strictly monotone in b.
23
q
q
q
q
q
A
2
×× ((1 ++ 0.2 n))
t
a
i
l
p
r
o
b
a
b
i
l
i
t
y
p
o
n
l
o
g
((
p
((
1
−−
p
))
))
s
c
a
l
e
0.0 0.5 1.0 1.5
0
.
0
0
1
0
.
0
1
0
.
0
5
0
.
2
5
0
.
5
0
.
7
5
0
.
9
q
q
q
q
q
q
q
tabled values
interpolated/extrapolated values
q
q
q
q
q
W
2
×× ((1 ++ 0.2 n))
t
a
i
l
p
r
o
b
a
b
i
l
i
t
y
p
o
n
l
o
g
((
p
((
1
−−
p
))
))
s
c
a
l
e
0.00 0.05 0.10 0.15 0.20 0.25 0.30
0
.
0
0
1
0
.
0
1
0
.
0
5
0
.
2
5
0
.
5
0
.
7
5
0
.
9
q
q
q
q
q
q
q
tabled values
interpolated/extrapolated values
Figure 7: Interpolation & Extrapolation for A
2
(1 +.2/
√
n) and W
2
(1 +.2/
√
n)
24
q
q
q
q
quadratic interpolation & linear extrapolation in 1 n
n (on 1 n scale)
∞∞
500 200 100 50 40 30 25 20 15 10 9 8 7 6 5 4
0
.
8
0
3
0
.
8
7
4
0
.
9
3
9
1
.
0
0
7
n
××
D
−
q
u
a
n
t
i
l
e
D
0
.
9
D
0
.
9
5
D
0
.
9
7
5
D
0
.
9
9
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
tabled values
interpolated/extrapolated values
q
q
q
q
cubic interpolation & linear extrapolation in D
n ×× D
0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05
t
a
i
l
p
r
o
b
a
b
i
l
i
t
y
p
o
n
l
o
g
((
p
((
1
−−
p
))
))
s
c
a
l
e
0
.
0
0
1
0
.
0
0
5
0
.
0
2
5
0
.
1
0
.
2
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
tabled values
interpolated quantiles
interpolated/extrapolated values
q
q
q
q
q
q
Figure 8: Interpolation & Extrapolation for
√
n ×D
25
How do we obtain the distribution of
ˆ
b(Z)? An analytical approach does not seem possible. The
approach followed here is that presented in Bain (1978), Bain and Engelhardt (1991) and originally
in Thoman et al. (1969, 1970), who provided tables for this distribution (and for those of the other
pivots discussed here) based on N
sim
simulated values of
ˆ
b(Z) (and ˆ u(Z)), where N
sim
= 20000 for
n = 5, N
sim
= 10000 for n = 6, 8, 10, 15, 20, 30, 40, 50, 75, and N
sim
= 6000 for n = 100.
In these simulations one simply generates samples Z
= (Z
1
, . . . , Z
n
) ∼ G(z) and ﬁnds
ˆ
b(Z
) (and
ˆ u(Z
) for the other pivots discussed later) for each such sample Z
. By simulating this process
N
sim
= 10000 times we obtain
ˆ
b(Z
1
), . . . ,
ˆ
b(Z
N
sim
). The empirical distribution function of these
simulated estimates
ˆ
b(Z
i
), denoted by
ˆ
H
1
(w), provides a fairly reasonable estimate of the sampling
distribution H
1
(w) of
ˆ
b(Z) and thus also of the pivot distribution of W
1
=
ˆ
b(Y)/b. From this
simulated distribution we can estimate any γquantile of H
1
(w) to any practical accuracy, provided
N
sim
is suﬃciently large. Values of γ closer to 0 or 1 require higher N
sim
. For .005 ≤ γ ≤ .995 a
simulation level of N
sim
= 10000 should be quite adequate.
If we denote the γquantile of H
1
(w) by η
1
(γ), i.e.,
γ = H
1
(η
1
(γ)) = P(
ˆ
b(Y)/b ≤ η
1
(γ)) = P(
ˆ
b(Y)/η
1
(γ) ≤ b)
we see that
ˆ
b(Y)/η
1
(γ) can be viewed as a 100γ% lower bound to the unknown parameter b. We do
not know η
1
(γ) but we can estimate it by the corresponding quantile ˆ η
1
(γ) of the simulated distri
bution
ˆ
H
1
(w) which serves as proxy for H
1
(w). We then use
ˆ
b(Y)/ˆ η
1
(γ) as an approximate 100γ%
lower bound to the unknown parameter b. For large N
sim
, say N
sim
= 10000, this approximation is
practically quite adequate.
We note here that a 100γ% lower bound can be viewed as a 100(1 − γ)% upper bound, because
1 − γ is the chance of the lower bound falling on the wrong side of its target, namely above. To
get 100γ% upper bounds one simply constructs 100(1 − γ)% lower bounds by the above method.
Similar comments apply to the pivots obtained below, where we only give onesided bounds (lower
or upper) in each case.
Based on the relationship b = 1/β the respective 100γ% approximate lower and upper conﬁdence
bounds for the Weibull shape parameter would be
ˆ η
1
(1 −γ)
ˆ
b(Y)
= ˆ η
1
(1 −γ) ×
ˆ
β(X) and
ˆ η
1
(γ)
ˆ
b(Y)
= ˆ η
1
(γ) ×
ˆ
β(X)
and an approximate 100γ% conﬁdence interval for β would be
_
ˆ η
1
((1 −γ)/2) ×
ˆ
β(X), ˆ η
1
((1 +γ)/2) ×
ˆ
β(X)
_
since (1 +γ)/2 = 1 −(1 −γ)/2. Here X = (X
1
, . . . , X
n
) is the untransformed Weibull sample.
26
9.2 Pivot for the Location Parameter u
For the location parameter ϑ = u we have the following pivot
W
2
=
ˆ u(Y) −u
ˆ
b(Y)
=
ˆ u(u +bZ) −u
ˆ
b(u +bZ)
=
u +bˆ u(Z) −u
b
ˆ
b(Z)
=
ˆ u(Z)
ˆ
b(Z)
.
It has a distribution that does not depend on any unknown parameter, since it only depends on the
known distribution of Z. Furthermore W
2
is strictly decreasing in u. Thus W
2
is a pivot with respect
to u. Denote this pivot distribution of W
2
by H
2
(w) and its γquantile by η
2
(γ). As before this
pivot distribution and its quantiles can be approximated suﬃciently well by simulating ˆ u(Z
)/
ˆ
b(Z
)
a suﬃcient number N
sim
times and using the empirical cdf
ˆ
H
2
(w) of the ˆ u(Z
i
)/
ˆ
b(Z
i
) as proxy for
H
2
(w).
As in the previous pivot case we can exploit this pivot distribution as follows
γ = H
2
(η
2
(γ)) = P
_
ˆ u(Y) −u
ˆ
b(Y)
≤ η
2
(γ)
_
= P(ˆ u(Y) −
ˆ
b(Y)η
2
(γ) ≤ u)
and thus we can view ˆ u(Y) −
ˆ
b(Y)η
2
(γ) as a 100γ% lower bound for the unknown parameter u.
Using the γquantile ˆ η
2
(γ) obtained from the empirical cdf
ˆ
H
2
(w) we then treat ˆ u(Y) −
ˆ
b(Y)ˆ η
2
(γ)
as an approximate 100γ% lower bound for the unknown parameter u.
Based on the relation u = log(α) this translates into an approximate 100γ% lower bound
exp(ˆ u(Y) −
ˆ
b(Y)ˆ η
2
(γ)) = exp(log(ˆ α(X)) − ˆ η
2
(γ)/
ˆ
β(X)) = ˆ α(X) exp(−ˆ η
2
(γ)/
ˆ
β(X)) for α.
Upper bounds and intervals are handled as in the previous situation for b or β.
9.3 Pivot for the pquantile y
p
With respect to the pquantile ϑ = y
p
= u +b log(−log(1 −p)) = u +bw
p
of the Y distribution the
natural pivot is
W
p
=
ˆ y
p
(Y) −y
p
ˆ
b(Y)
=
ˆ u(Y) +
ˆ
b(Y)w
p
−(u +bw
p
)
ˆ
b(Y)
=
ˆ u(u +bZ) +
ˆ
b(u +bZ)w
p
−(u +bw
p
)
ˆ
b(u +bZ)
=
u +bˆ u(Z) +b
ˆ
b(Z)w
p
−(u +bw
p
)
b
ˆ
b(Z)
=
ˆ u(Z) + (
ˆ
b(Z) −1)w
p
ˆ
b(Z)
.
Again its distribution only depends on the known distribution of Z and not on the unknown param
eters u and b and the pivot W
p
is a strictly decreasing function of y
p
. Denote this pivot distribution
27
function by H
p
(w) and its γquantile by η
p
(γ). As before, this pivot distribution and its quantiles
can be approximated suﬃciently well by simulating
_
ˆ u(Z) + (
ˆ
b(Z) −1)w
p
_
/
ˆ
b(Z) a suﬃcient num
ber N
sim
times. Denote the empirical cdf of such simulated values by
ˆ
H
p
(w) and the corresponding
γquantiles by ˆ η
p
(γ).
As before we proceed with
γ = H
p
(η
p
(γ)) = P
_
ˆ y
p
(Y) −y
p
ˆ
b(Y)
≤ η
p
(γ)
_
= P
_
ˆ y
p
(Y) −η
p
(γ)
ˆ
b(Y) ≤ y
p
_
and thus we can treat ˆ y
p
(Y) − η
p
(γ)
ˆ
b(Y) as a 100γ% lower bound for y
p
. Again we can treat
ˆ y
p
(Y) − ˆ η
p
(γ)
ˆ
b(Y) as an approximate 100γ% lower bound for y
p
.
Since
ˆ y
p
(Y) −η
p
(γ)
ˆ
b(Y) = ˆ u(Y) +w
p
ˆ
b(Y) −η
p
(γ)
ˆ
b(Y) = ˆ u(Y) −k
p
(γ)
ˆ
b(Y)
with k
p
(γ) = η
p
(γ) −w
p
, we could have obtained the same lower bound by the following argument
that does not use a direct pivot, namely
γ = P(ˆ u(Y) −k
p
(γ)
ˆ
b(Y) ≤ y
p
) = P(ˆ u(Y) −k
p
(γ)
ˆ
b(Y) ≤ u +bw
p
)
= P(ˆ u(Y) −u −k
p
(γ)
ˆ
b(Y) ≤ bw
p
)
= P
_
ˆ u(Y) −u
b
−k
p
(γ)
ˆ
b(Y)
b
≤ w
p
_
= P(ˆ u(Z) −k
p
(γ)
ˆ
b(Z) ≤ w
p
) = P
_
ˆ u(Z) −w
p
ˆ
b(Z)
≤ k
p
(γ)
_
and we see that k
p
(γ) can be taken as the γquantile of the distribution of (ˆ u(Z) −w
p
)/
ˆ
b(Z).
This distribution can be estimated by the empirical cdf of N
sim
simulated values (ˆ u(Z
i
)−w
p
)/
ˆ
b(Z
i
),
i = 1, . . . , N
sim
and its γquantile
ˆ
k
p
(γ) serves as a good approximation to k
p
(γ).
It is easily seen that this produces the same quantile lower bound as before. However, in this
approach one sees one further detail, namely that h(p) = −k
p
(γ) is strictly increasing in p
1
, since
w
p
is strictly increasing in p.
1
Suppose p
1
< p
2
and h(p
1
) ≥ h(p
2
) with γ = P(ˆ u(Z) +h(p
1
)
ˆ
b(Z) ≤ w
p1
) and γ = P(ˆ u(Z) +h(p
2
)
ˆ
b(Z) ≤ w
p2
) =
P(ˆ u(Z) + h(p
1
)
ˆ
b(Z) ≤ w
p1
+ (w
p2
− w
p1
) + (h(p
1
) − h(p
2
))
ˆ
b(Z)) ≥ P(ˆ u(Z) + h(p
1
)
ˆ
b(Z) ≤ w
p1
+ (w
p2
− w
p1
)) > γ
(i.e., γ > γ, a contradiction) since P(w
p1
< ˆ u(Z) +h(p
1
)
ˆ
b(Z) ≤ w
p1
+(w
p2
−w
p1
)) > 0. A thorough argument would
show that
ˆ
b(z) and thus ˆ u(z) are continuous functions of z = (z
1
, . . . , z
n
) and since there is positive probability in
any neighborhood of any z ∈ R there is positive probability in any neighborhood of (ˆ u(z),
ˆ
b(z)).
28
Of course it makes intuitive sense that quantile lower bounds should be increasing in p since its
target pquantiles are increasing in p. This strictly increasing property allows us to immediately
construct upper conﬁdence bounds for left tail probabilities as is shown in the next section.
Since x
p
= exp(y
p
) is the corresponding pquantile of the Weibull distribution we can view
exp
_
ˆ y
p
(Y) − ˆ η
p
(γ)
ˆ
b(Y)
_
= ˆ α(X) exp
_
(w
p
− ˆ η
p
(γ))/
ˆ
β(X)
_
= ˆ α(X) exp
_
−
ˆ
k
p
(γ)/
ˆ
β(X)
_
as an approximate 100γ% lower bound for x
p
= exp(u +bw
p
) = α(−log(1 −p))
1/β
.
Since α is the (1 − exp(−1))quantile of the Weibull distribution, lower bounds for it can be seen
as a special case of quantile lower bounds. Indeed, this particular quantile lower bound coincides
with the one given previously.
9.4 Upper Conﬁdence Bounds for the Tail Probability p(y) = P(Y ≤ y)
As far as an appropriate pivot for p(y) = P(Y ≤ y) is concerned, the situation here is not as
straightforward as in the previous three cases. Clearly
ˆ p(y) = G
_
y − ˆ u(Y)
ˆ
b(Y)
_
is the natural estimate (mle) of p(y) = P(Y ≤ y) = G
_
y −u
b
_
and one easily sees that the distribution function H of this estimate depends on u and b only
through p(y), namely
ˆ p(y) = G
_
y − ˆ u(Y)
ˆ
b(Y)
_
= G
_
(y −u)/b −(ˆ u(Y) −u)/b
ˆ
b(Y)/b
_
= G
_
G
−1
(p(y)) − ˆ u(Z)
ˆ
b(Z)
_
∼ H
p(y)
.
Thus by the probability integral transform it follows that
W
p(y)
= H
p(y)
(ˆ p(y)) ∼ U(0, 1)
i.e., W
p(y)
is a true pivot, contrary to what is stated in Bain (1978) and Bain and Engelhardt (1991).
Rather than using this pivot we will go a more direct route as was indicated by the strictly increasing
property of h(p) = h
γ
(p) in the previous section. Denote by h
−1
(·) the inverse function to h(·). We
then have
γ = P(ˆ u(Y) +h(p)
ˆ
b(Y) ≤ y
p
) = P(h(p) ≤ (y
p
−ˆ u(Y))/
ˆ
b(Y)) = P
_
p ≤ h
−1
_
(y
p
− ˆ u(Y))/
ˆ
b(Y)
__
,
for any p ∈ (0, 1). If we parameterize such p via p(y) = P(Y ≤ y) = G((y −u)/b) we have y
p(y)
= y
and thus also
γ = P
_
p(y) ≤ h
−1
_
(y − ˆ u(Y))/
ˆ
b(Y)
__
29
for any y ∈ R and u ∈ R and b > 0. Hence ˆ p
U
(y) = h
−1
_
(y − ˆ u(Y))/
ˆ
b(Y)
_
can be viewed as
100γ% upper conﬁdence bound for p(y) for any given threshold y.
The only remaining issue is the computation of such bounds. Does it require the inversion of h
and the concomitant calculations of many h(p) = −k(p) for the iterative convergence of such an
inversion? It turns out that there is a direct path just as we had it in the previous three conﬁdence
bound situations.
Note that h
−1
(x) solves −k
p
= x for p. We claim that h
−1
(x) is the γquantile of the G(ˆ u(Z)+x
ˆ
b(Z))
distribution which we can simulate by calculating as before ˆ u(Z) and
ˆ
b(Z) a large number N
sim
times.
The above claim concerning h
−1
(x) is seen as follows. If for any x = h(p) we have
P(G(ˆ u(Z) +x
ˆ
b(Z)) ≤ h
−1
(x)) = P(G(ˆ u(Z) +h(p)
ˆ
b(Z)) ≤ p)
= P(ˆ u(Z) +h(p)
ˆ
b(Z) ≤ w
p
)
= P(ˆ u(Z) −k
γ
(p)
ˆ
b(Z) ≤ w
p
) = γ ,
as seen in the previous section. Thus h
−1
(x) is the γquantile of the G(ˆ u(Z) +x
ˆ
b(Z)) distribution.
If we observe Y = y and obtain ˆ u(y) and
ˆ
b(y) as our maximum likelihood estimates for u and
b we get our 100γ% upper bound for p(y) = G((y − u)/b) as follows: For the ﬁxed value of
x = (y − ˆ u(y))/
ˆ
b(y) = G
−1
(ˆ p(y)) simulate the G(ˆ u(Z) +x
ˆ
b(Z)) distribution (with suﬃciently high
N
sim
) and calculate the γquantile of this distribution as the desired approximate 100γ% upper
bound for p(y) = P(Y ≤ y) = G((y −u)/b).
10 Tabulation of Conﬁdence Quantiles η(γ)
For the pivots for b, u and y
p
it is possible to carry out simulations once and for all for a desired set
of conﬁdence levels γ, sample sizes n and choices of p, and tabulate the required conﬁdence quantiles
ˆ η
1
(γ), ˆ η
2
(γ), and ˆ η
p
(γ). This has essentially been done (with
√
n scaling modiﬁcations) and such
tables are given in Bain (1978), Bain and Engelhardt (1991) and Thoman et al. (1969,1970). Similar
tables for bounds on p(y) are not quite possible since the appropriate bounds depend on the observed
value of ˆ p(y), which varies from sample to sample. Instead Bain (1978), Bain and Engelhardt (1991)
and Thoman et al. (1970) tabulate conﬁdence bounds for p(y) for a reasonably ﬁne grid of values
for ˆ p(y), which can then serve for interpolation purposes with the actually observed value of ˆ p(y).
It should be quite clear that all this requires extensive tabulation. The use of these tables is not
easy. Table 4 in Bain (1978) does not have a consistent format and using these tables would
require delving deeply into the text for each new use, unless one does this kind of calculation all
the time. In fact, in the second edition, Bain and Engelhardt (1991), Table 4 has been greatly
reduced to just cover the conﬁdence factors dealing with the location parameter u, and it now
30
leaves out the conﬁdence factors for general pquantiles. For the pquantiles one is referred to the
same interpolation scheme that is needed when getting conﬁdence bounds for p(y), using Table 7
in Bain and Engelhardt (1991). The example that they present (page 248) would have beneﬁtted
by showing some intermediate steps in the interpolation process. They point out that the resulting
conﬁdence bound for x
p
is slightly diﬀerent (14.03) from that obtained using the conﬁdence quantiles
of the original Table 4, namely 13.92. They attribute the diﬀerence to roundoﬀ errors or other
discrepancies. Among the latter one may consider that possibly diﬀerent simulations were involved.
Further, note that some entries in the tables given in Bain (1978) seem to have typos. Presumably
they were transcribed by hand from computer output, just as the book (and its second edition) itself
is typed and not typeset. We give just give a few examples. In Bain (1978) Table 4A, p.235, bottom
row, the second entry from the right should be 3.625 instead of 3.262. This discrepancy shows up
clearly when plotting the row values against log(p/(1 − p)), see a similar plot for a later example.
In Table 3A, p.222, row 3 column 5 shows a double minus sign (still present in the 1991 second
edition). In comparing the values of these tables with our own simulation of pivot distribution
quantiles, just to validate our simulation for n = 40, we encountered an apparent error in Table
4A, p. 235 with last column entry of 4.826. Plotting log(p/(1 −p)) against the corresponding row
value (γquantiles) one clearly sees a change in pattern, see the top plot in Figure 9. We suspect
that the whole last column was calculated for p = .96 instead of the indicated p = .98. The bottom
plot shows our simulated values for these quantiles as solid dots with the previous points (circles)
superimposed.
The agreement is good for the ﬁrst 8 points. Our simulated γquantile was 5.725 (corresponding to
the 4.826 above) and it ﬁts quite smoothly into the pattern of the previous 8 points. Given that
this was the only case chosen for comparison it leaves some concern in fully trusting these tables.
However, this example also shows that the great majority of tabled values are valid.
11 The R Function WeibullPivots
Rather than using these tables we will resort to direct simulations ourselves since computing speed
has advanced suﬃciently over what was common prior to 1978. Furthermore, computing availability
has changed dramatically since then. It may be possible to further increase computing speed by
putting the loop over N
sim
calculations of mle’s into compiled form rather than looping within R for
each simulation iteration. For example, using qbeta in vectorized form reduced the computing time
to almost 1/3 of the time compared to looping within R itself over the elements in the argument
vector of qbeta.
However, such an increase in speed would require writing Ccode (or Fortran code) and linking that
in compiled form to R. Such extensions of R are possible, see chapter 5 System and foreign language
interfaces in the Writing R Extensions manual available under the toolbar Help in R.
31
2 3 4 5
0
1
2
3
4
Bain's tabled quantiles for n=40, γγ == 0.9
l
o
g
(
p
/
(
1
−
p
)
)
q0.96
0.98
q
q
q
q
q
q
q
q
q
2 3 4 5
0
1
2
3
4
our simulated quantiles for n=40, γγ == 0.9
l
o
g
(
p
/
(
1
−
p
)
)
Figure 9: Abnormal Behavior of Tabulated Conﬁdence Quantiles
32
For the R function WeibullPivots (available within the R work space for Weibull Distribution
Applications on the class web site) the call
system.time(WeibullPivots(Nsim = 10000, n = 10, r = 10, graphics = F))
gave an elapsed time of 59.76 seconds. Here the default sample size n = 10 was used and r = 10
(also default) indicates that the 10 lowest sample values are given and used, i.e., in this case the
full sample. Also, an internally generated Weibull data set was used, since the default in the call to
WeibullPivots is weib.sample=NULL. For sample sizes n = 100 with r = 100 and n = 1000 with
r = 1000 the corresponding calls resulted in elapsed times of 78.22 and 269.32 seconds, respectively.
These three computing times suggest strong linear behavior in n as is illustrated in Figure 10. The
intercept 57.35 and slope of .2119 given here are fairly consistent with the intercept .005886 and
slope of 2.001 ×10
−5
given in Figure 4. The latter give the calculation time of a single set of mle’s
while in the former case we calculate N
sim
= 10000 such mle’s, i.e., the previous slope and intercept
for a single mle calculation need to be scaled up by the factor 10000.
For all the previously discussed conﬁdence bounds, be they upper or lower bounds for their respective
targets, all that is needed is the set of (ˆ u(z
i
),
ˆ
b(z
i
)) for i = 1, . . . , N
sim
. Thus we can construct
conﬁdence bounds and intervals for u and b, for y
p
for any collection of p values, and for p(y) and
1 −p(y) for any collection of threshold values y and we can do this for any set of conﬁdence levels
that make sense for the simulated distributions, i.e., we don’t have to run the simulations over and
over for each target parameter, conﬁdence level, p or y, unless one wants independent simulations
for some reason.
Proper use of this function only requires understanding the calling arguments, purpose, and output
of this function, and the time to run the simulations. The time for running the simulation should
easily beat the time spent in dealing with tabulated conﬁdence quantiles in order to get desired
conﬁdence bounds, especially since WeibullPivots does such calculations all at once for a broad
spectrum of y
p
and p(y) and several conﬁdence levels without greatly impacting the computing time.
Furthermore, WeibullPivots does all this not only for full samples but also for type II censored
samples, for which appropriate conﬁdence factors are available only sparsely in tables.
We will now explain the calling sequence of WeibullPivots and its output. The calling sequence
with all arguments given with their default values is as follows:
WeibullPivots(weib.sample=NULL,alpha=10000,beta=1.5,n=10,r=10,
Nsim=1000,threshold=NULL,graphics=T)
Here Nsim = N
sim
has default value 1000 which is appropriate when trying to get a feel for the
function for any particular data set. The sample size is input as n = n and r = r indicates the
number of smallest sample values available for analysis. When r < n we are dealing with a type II
censored data set where observation stops as soon as the smallest r lifetimes have been observed.
We need r > 1 and at least two distinct observations among X
(1)
, . . . , X
(r)
in order to estimate
33
q
q
q
0 200 400 600 800 1000
0
5
0
1
0
0
1
5
0
2
0
0
2
5
0
3
0
0
sample size n
e
l
a
p
s
e
d
t
i
m
e
f
o
r
W
e
i
b
u
l
l
P
i
v
o
t
s
(
N
s
i
m
=
1
0
0
0
0
)
=
=
intercept
slope
57.35
0.2119
Figure 10: Timings for WeibullPivots for Various n
any spread in the data. The available sample values X
1
, . . . , X
r
(not necessarily ordered) are given
as vector input to weib.sample. When weib.sample=NULL (the default), an internal data set is
generated as input sample from fromW(α, β) with α = alpha = 10000 (default) and β = beta = 1.5
(default), either by using the full sample X
1
, . . . , X
n
or a type II censored sample X
1
, . . . , X
r
when
r < n is speciﬁed. The input thresh (= NULL by default) is a vector of thresholds y for which
we desire upper conﬁdence bounds for p(y). The input graphics (default T) indicates whether
graphical output is desired.
Conﬁdence levels γ are set internally as .005, .01, .025, .05, .10, .02, .8, .9, .95, .975, .99, .995 and these
levels indicate the coverage probability for the individual onesided bounds. A .025 lower bound
is reported as a .975 upper bound, and a pair of .975 lower and upper bounds constitute a 95%
conﬁdence interval. The values of p for which conﬁdence bounds or intervals for x
p
are provided
are also set internally as .001, .005, .01, .025, .05, .1, (.1), .9, .95, .975, .99, .995, .999.
The output from WeibullPivots is a list with components:
$alpha.hat, $alpha.hat, $alpha.beta.bounds,$p.quantile.estimates, $p.quantile.bounds,
$Tail.Probability.Estimates, and $Tail.Probability.Bounds. The structure and meaning of
these components will become clear from the example output given below.
34
$alpha.hat
(Intercept)
8976.2
$beta.hat
[1] 1.95
$alpha.beta.bounds
alpha.L alpha.U beta.L beta.U
99.5% 5094.6 16705 0.777 3.22
99% 5453.9 15228 0.855 3.05
97.5% 5948.6 13676 0.956 2.82
95% 6443.9 12608 1.070 2.64
90% 7024.6 11600 1.210 2.42
80% 7711.2 10606 1.390 2.18
$p.quantile.estimates
0.001quantile 0.005quantile 0.01quantile 0.025quantile 0.05quantile
259.9 593.8 848.3 1362.5 1957.0
0.1quantile 0.2quantile 0.3quantile 0.4quantile 0.5quantile
2830.8 4159.4 5290.4 6360.5 7438.1
0.6quantile 0.7quantile 0.8quantile 0.9quantile 0.95quantile
8582.7 9872.7 11457.2 13767.2 15756.3
0.975quantile 0.99quantile 0.995quantile 0.999quantile
17531.0 19643.5 21107.9 24183.6
$p.quantile.bounds
99.5% 99% 97.5% 95% 90% 80%
0.001quantile.L 1.1 2.6 6.0 12.9 28.2 60.1
0.001quantile.U 1245.7 1094.9 886.7 729.4 561.4 403.1
0.005quantile.L 8.6 16.9 31.9 57.4 106.7 190.8
0.005quantile.U 2066.9 1854.9 1575.1 1359.2 1100.6 845.5
0.01quantile.L 20.1 36.7 65.4 110.1 186.9 315.3
0.01quantile.U 2579.8 2361.5 2021.5 1773.9 1478.4 1165.8
0.025quantile.L 62.8 103.5 169.7 259.3 398.1 611.0
0.025quantile.U 3498.8 3206.6 2827.2 2532.5 2176.9 1783.5
35
0.05quantile.L 159.2 229.2 352.6 497.5 700.0 1011.9
0.05quantile.U 4415.7 4081.3 3673.7 3329.5 2930.0 2477.2
0.1quantile.L 398.3 506.3 717.4 962.2 1249.5 1679.1
0.1quantile.U 5584.6 5261.9 4811.6 4435.7 3990.8 3474.1
0.2quantile.L 1012.6 1160.2 1518.8 1882.9 2287.1 2833.2
0.2quantile.U 7417.1 6978.2 6492.8 6031.2 5543.2 4946.9
0.3quantile.L 1725.4 1945.2 2383.9 2820.0 3305.0 3929.0
0.3quantile.U 8919.8 8460.0 7939.8 7384.1 6865.0 6211.4
0.4quantile.L 2548.0 2848.2 3345.2 3806.6 4353.9 5008.2
0.4quantile.U 10616.3 10130.4 9380.3 8778.2 8139.3 7421.2
0.5quantile.L 3502.4 3881.1 4415.1 4873.3 5443.0 6107.3
0.5quantile.U 12809.0 11919.1 10992.9 10226.8 9485.1 8703.4
0.6quantile.L 4694.0 5022.6 5573.8 6052.8 6624.4 7300.4
0.6quantile.U 15626.1 14350.6 12941.3 11974.8 11041.1 10106.2
0.7quantile.L 6017.1 6399.0 6876.6 7345.8 7938.2 8628.0
0.7quantile.U 19271.6 17679.9 15545.8 14181.1 12958.0 11784.2
0.8quantile.L 7601.3 7971.0 8465.4 8933.5 9504.0 10244.2
0.8quantile.U 24765.2 22445.0 19286.0 17236.0 15605.6 13952.2
0.9quantile.L 9674.7 10033.7 10538.6 11031.1 11653.0 12460.3
0.9quantile.U 35233.4 31065.3 26037.4 22670.5 19835.3 17417.5
0.95quantile.L 11203.6 11584.6 12145.2 12660.2 13365.5 14311.2
0.95quantile.U 46832.9 40053.3 32863.1 27904.7 23903.9 20703.0
0.975quantile.L 12434.7 12833.5 13449.7 14030.5 14781.8 15909.1
0.975quantile.U 59783.1 49209.9 39397.8 33118.7 27938.4 23773.7
0.99quantile.L 13732.6 14207.7 14876.0 15530.0 16431.7 17783.1
0.99quantile.U 76425.0 61385.4 48625.4 40067.3 33233.8 27729.8
0.995quantile.L 14580.4 15115.4 15810.0 16530.6 17551.8 19081.0
0.995quantile.U 89690.9 71480.4 55033.4 45187.1 36918.7 30505.0
0.999quantile.L 16377.7 16885.9 17642.5 18557.1 19792.4 21744.7
0.999quantile.U 121177.7 95515.7 71256.5 56445.5 45328.1 36739.2
$Tail.Probability.Estimates
p(6000) p(7000) p(8000) p(9000) p(10000) p(11000) p(12000) p(13000)
0.36612 0.45977 0.55018 0.63402 0.70900 0.77385 0.82821 0.87242
p(14000) p(15000)
0.90737 0.93424
36
$Tail.Probability.Bounds
99.5% 99% 97.5% 95% 90% 80%
p(6000).L 0.12173 0.13911 0.16954 0.19782 0.23300 0.28311
p(6000).U 0.69856 0.67056 0.63572 0.59592 0.54776 0.49023
p(7000).L 0.17411 0.20130 0.23647 0.26985 0.31017 0.36523
p(7000).U 0.76280 0.73981 0.70837 0.67426 0.62988 0.57670
p(8000).L 0.23898 0.26838 0.30397 0.34488 0.38942 0.44487
p(8000).U 0.82187 0.80141 0.77310 0.74260 0.70414 0.65435
p(9000).L 0.30561 0.33149 0.37276 0.41748 0.46448 0.52203
p(9000).U 0.87042 0.85462 0.82993 0.80361 0.77045 0.72545
p(10000).L 0.36871 0.39257 0.44219 0.48549 0.53589 0.59276
p(10000).U 0.91227 0.89889 0.87805 0.85624 0.82667 0.78641
p(11000).L 0.41612 0.45097 0.50030 0.54631 0.59749 0.65671
p(11000).U 0.94491 0.93318 0.91728 0.89891 0.87425 0.83973
p(12000).L 0.46351 0.50388 0.55531 0.60133 0.65374 0.71215
p(12000).U 0.96669 0.95936 0.94650 0.93210 0.91231 0.88377
p(13000).L 0.50876 0.54776 0.60262 0.65055 0.70218 0.76148
p(13000).U 0.98278 0.97742 0.96794 0.95756 0.94149 0.91745
p(14000).L 0.54668 0.58696 0.64619 0.69359 0.74451 0.80178
p(14000).U 0.99201 0.98837 0.98205 0.97459 0.96267 0.94321
p(15000).L 0.58089 0.62534 0.68389 0.73194 0.78068 0.83590
p(15000).U 0.99653 0.99449 0.99092 0.98596 0.97764 0.96268
The above output was produced with
WeibullPivots(threshold = seq(6000, 15000, 1000), Nsim = 10000, graphics = T).
Since we entered graphics=T as argument we also got two pieces of graphical output. The ﬁrst
gives the two intrinsic pivot distributions of ˆ u/
ˆ
b and
ˆ
b in Figure 11. The second gives a Weibull
plot of the generated sample with a variety of information and with several types of conﬁdence
bounds, see Figure 12. The legend in the upper left gives the mle’s of α, β (agreeing with the
output above), and the mean µ = αΓ(1 + 1/β) together with 95% conﬁdence intervals, based on
respective normal approximation theory for the mle’s. The legend in the lower right explains the
red ﬁtted line (representing the mle ﬁt) and the various pointwise conﬁdence bound curves, giving
95% conﬁdence intervals (blue dashed curves) for pquantiles x
p
for any p on the ordinate and 95%
conﬁdence intervals (green dotdashed line) for p(y) for any y on the abscissa. Both of these interval
types use normal approximations from large sample mle theory. Unfortunately these two types of
bounds are not dual to each other, i.e., don’t coincide or to say it diﬀerently, one is not the inverse
to the other.
37
u
^
b
^
F
r
e
q
u
e
n
c
y
−2 −1 0 1
0
2
0
0
6
0
0
1
0
0
0
b
^
F
r
e
q
u
e
n
c
y
0.5 1.0 1.5 2.0
0
2
0
0
4
0
0
6
0
0
8
0
0
Figure 11: Pivot Distributions of ˆ u/
ˆ
b and
ˆ
b
38
cycles x 1000
p
r
o
b
a
b
i
l
i
t
y
q
q
q
q
q
q
q
q
q
q
0.001 0.01 0.1 1 10 100
.
0
0
1
.
0
1
0
.
1
0
0
.
2
0
0
.
5
0
0
.
9
0
0
.
9
9
9
.
6
3
2
m.l.e.
95 % q−confidence bounds
95 % p−confidence bounds
95 % monotone qp−confidence bounds
Weibull Plot
αα
^
= 8976 , 95 % conf. interval ( 6394 , 12600 )
ββ
^
= 1.947 , 95 % conf. interval ( 1.267 , 2.991 )
MTTF
µµ
^
= 7960 , 95 % conf. interval ( 5690 , 11140 )
n = 10 , r = 10 failed cases
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
Figure 12: Weibull Plot Corresponding to Previous Output
39
A third type of bound is presented in the orange curve which simultaneously provides 95% conﬁdence
intervals for x
p
and p(x), depending on the direction in which the curves are used. We either read
sideways from p and down from the curve (at that p level) to get upper and lower bounds for x
p
, or
we read vertically up from an abscissa value x to read oﬀ upper and lower bounds for p(x) on the
ordinate axis as we go from the respective curves at that x value to the left. These latter bounds
are also based on normal mle approximation theory and the approximation will naturally suﬀer for
small sample sizes. However, the principle behind these bounds is a unifying one in that the same
curve is used for quantile and tail probability bounds. If instead of using the approximating normal
distribution one uses the parametric bootstrap approach (simulating samples from an estimated
Weibull distribution) the unifying principle reduces to the pivot simulation approach, i.e., is basically
exact except for the simulation aspect N
sim
< ∞.
The curves representing the latter (pivots with simulated distributions) are the solid black lines
connecting the solid black dots which represent the x
p
95% conﬁdence intervals (using the 97.5%
lower and upper bounds to x
p
given in our output example above. Also seen on these curves are
solid red dots that correspond to the abscissa values x = 6000, (1000), 15000 and viewed vertically
they represent 95% conﬁdence intervals for p(x). This illustrates that the same curves are used.
Figure 13 represents an extreme case where we have a sample of size n = 2 and here another aspect
becomes apparent. Both of the ﬁrst two types of bounds (blue and green) are no longer monotone
in p or x respectively. This is in the nature of an imperfect normal approximation for these two
approaches. Thus we could not (at least not generally) have taken either to take the role of serving
both purposes, i.e., as providing bounds for x
p
and p(x) simultaneously. However, the orange curve
is still monotone and still serves that dual purpose, although its coverage probability properties are
bound to be aﬀected badly by the small sample size n = 2. The pivot based curves are also strictly
monotone and they have exact coverage probability, subject to the N
sim
< ∞ limitation.
40
cycles x 1000
p
r
o
b
a
b
i
l
i
t
y
q
q
0.1 0.2 0.5 1 2 5 10 20 50 100
.
0
0
1
.
0
1
0
.
1
0
0
.
2
0
0
.
5
0
0
.
9
0
0
.
9
9
9
.
6
3
2
m.l.e.
95 % q−confidence bounds
95 % p−confidence bounds
95 % monotone qp−confidence bounds
Weibull Plot
αα
^
= 8125 , 95 % conf. interval ( 6954 , 9493 )
ββ
^
= 9.404 , 95 % conf. interval ( 2.962 , 29.85 )
MTTF
µµ
^
= 7709 , 95 % conf. interval ( 6448 , 9217 )
n = 2 , r = 2 failed cases
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
Figure 13: Weibull Plot for Weibull Sample of Size n = 2
41
12 Regression
Here we extend the results of the previous location/scale model for logtransformed Weibull samples
to the more general regression model where the location parameter u for Y
i
= log(X
i
) can vary with
i, speciﬁcally it varies as a linear combination of known covariates c
i,1
, . . . , c
i,k
for the i
th
observation
as follows:
u
i
= ζ
1
c
i,1
+. . . +ζ
k
c
i,k
, i = 1, . . . , n ,
while the scale parameter b stays constant. Thus we have the following model for Y
1
, . . . , Y
n
Y
i
= u
i
+bE
i
= ζ
1
c
i,1
+. . . +ζ
k
c
i,k
+bZ
i
, i = 1, . . . , n ,
with independent Z
i
∼ G(z) = 1−exp(−exp(z)), i = 1, . . . , n, and unknown parameters b > 0 and
ζ
1
, . . . , ζ
k
∈ R.
Two concrete examples of this general linear model will be discussed in detail later on. The ﬁrst is
the simple linear regression model and the other is the ksample model, which exempliﬁes ANOVA
situations.
It can be shown (Scholz, 1996) that the mle’s
ˆ
ζ
= (
ˆ
ζ
1
, . . . ,
ˆ
ζ
k
) and
ˆ
b of ζ and b exist and are unique
provided the covariate matrix C, consisting of the rows c
i
= (c
i,1
, . . . , c
i,k
), i = 1, . . . , n, has full
rank k and n > k. It is customary that the ﬁrst column of C is a vector of n 1’s. Alternatively,
one can also only specify the remaining k − 1 columns and implicitly invoke the default option in
survreg that augments those columns with such a 1vector. These two usages are illustrated in the
function WeibullReg which is given on the next page.
It is very instructive to run this function as part of the following call:
{\tt system.time(for(i in 1:1000)WeibullReg())} ,
i.e., we execute the function WeibullReg a thousand times in close succession.
The rapidly varying plots give a good visual image of the sampling uncertainty and the resulting
sampling variation of the ﬁtted lines. The ﬁxed line represents the true line with respect to which
the Weibull data are generated by simulation. Of course, the logWeibull data are plotted because
of its more transparent relationship to the true line. It is instructive to see the variability of the
data clouds around the true line, but also the basic stability of the overall cloud pattern as a whole.
On my laptop the elapsed time for this call is about 15 seconds, and this includes the plotting time.
When the plotting commands are commented out the elapsed time reduces to about 9 seconds.
This promises reasonable behavior with respect to the computing times that can be anticipated for
the conﬁdence bounds to be discussed below.
42
WeibullReg < function (n=50,x=NULL,alpha=10000,beta=1.5,slope=.05)
{
# We can either input our own covariate vector x of length n
# or such a vector is generated for us (default).
#
if(is.null(x)) x < (1:n(n+1)/2)
uvec < log(alpha)+slope*x
b < 1/beta
# Create the Weibull data
time < exp(uvec+b*log(log(1runif(n))))
# Creating good vertical plotting limits
m < min(uvec)+b*log(log(11/(3*n+1)))
M < max(uvec)+b*log(log(1/(3*n+1)))
plot(x,log(time),ylim=c(m,M))
dat < data.frame(time,x)
out < survreg(Surv(time)~x,data=dat,dist="weibull")
# The last two lines would give the same result as the next three lines
# after removing the # signs.
# x0 < rep(1,n)
# dat < data.frame(time,x0,x)
# survreg(formula = Surv(time) ~ x0 + x  1, data = dat, dist = "weibull")
# Here we created the vector x0 of ones explicitly and removed the implicit
# vector of ones by the 1 in ~ x0+x1.
# Note also, that we did not use a status vector (of ones) in the creation
# of dat, since survreg will use status = 1 for each observation, i.e,
# treat the given time as a failure time as default.
abline(log(alpha),slope) #true line
# estimated line
abline(out$coef[1],out$coef[2],col="blue",lty=2)
# Here out has several components, of which only
# out$coef and out$scale of of interest to us.
# The estimate out$scale is the mle of b=1/beta
# and out$coef is a vector that gives the mle’s
# of intercept and the various regression coefficients.
out
}
43
12.1 Equivariance Properties
From the existence and uniqueness of the mle’s we can again deduce the following equivariance
properties for the mle’s, namely for r = Ca +σz we have
ˆ
ζ(r) = a +σ
ˆ
ζ(z) and
ˆ
b(r) = σ
ˆ
b(z) .
The proof follows the familiar line used in the location/scale case. With r
i
= c
i
a +σz
i
we have
sup
b,ζ
_
n
i=1
1
b
g
_
r
i
−c
i
ζ
b
__
=
1
σ
n
sup
b,ζ
_
n
i=1
1
b/σ
g
_
z
i
−c
i
(ζ −a)/σ
b/σ
__
using
˜
ζ = (ζ −a)/σ and
˜
b = b/σ =
1
σ
n
sup
˜
b,
˜
ζ
_
n
i=1
1
˜
b
g
_
z
i
−c
i
˜
ζ
˜
b
__
=
1
σ
n
n
i=1
1
ˆ
b(z)
g
_
z
i
−c
i
ˆ
ζ(z)
ˆ
b(z)
_
On the other hand
sup
b,ζ
_
n
i=1
1
b
g
_
r
i
−c
i
ζ
b
__
=
n
i=1
1
ˆ
b(r)
g
_
r
i
−c
i
ˆ
ζ(r)
ˆ
b(r)
_
=
1
σ
n
n
i=1
1
ˆ
b(r)/σ
g
_
z
i
−c
i
(
ˆ
ζ(r) −a)/σ
ˆ
b(r)/σ
_
and by the uniqueness of the mle’s the equivariance claim is an immediate consequence.
12.2 Pivots and Conﬁdence Bounds
From these equivariance properties it follows that (
ˆ
ζ −ζ)/
ˆ
b and
ˆ
b/b have distributions that do not
depend on any unknown parameters, i.e., b and ζ. The logtransformed Weibull data have the
following regression structure Y = Cζ + bZ, where Z = (Z
1
, . . . , Z
n
)
consists of independent and
identically distributed components with known cdf G(z) = 1−exp(−exp(z)). From the equivariance
property we have that
ˆ
ζ(Y ) = ζ +b
ˆ
ζ(Z) and
ˆ
b(Y ) = b
ˆ
b(Z) .
Thus
ˆ
ζ(Y ) −ζ
ˆ
b(Y )
=
b
ˆ
ζ(Z)
b
ˆ
b(Z)
=
ˆ
ζ(Z)
ˆ
b(Z)
and
ˆ
b(Y )
b
=
b
ˆ
b(Z)
b
=
ˆ
b(Z) ,
44
which have a distribution free of any unknown parameters. This distribution can be approximated
to any desired degree via simulation, just as in the location scale case, except that we will need to
incorporate the known covariate matrix C in the call to survreg in order to get the N
sim
simulated
parameter vectors (
ˆ
ζ(Z
1
),
ˆ
b
1
(Z
1
)), . . . , (
ˆ
ζ(Z
N
sim
),
ˆ
b(Z
N
sim
)) and thus the empirical distribution of
(
ˆ
ζ(Z
1
)/
ˆ
b
1
(Z
1
),
ˆ
b
1
(Z
1
)), . . . , (
ˆ
ζ(Z
N
sim
)/
ˆ
b(Z
N
sim
),
ˆ
b(Z
N
sim
)).
For any target covariate vector c
0
= (c
0,1
, . . . , c
0,k
) the distribution of (c
0
ˆ
ζ(Y ) −c
0
ζ)/
ˆ
b(Y ) is free
of unknown parameters since
c
0
ˆ
ζ(Y ) −c
0
ζ
ˆ
b(Y )
=
c
0
ˆ
ζ(Z)
ˆ
b(Z)
and we can use the simulated values (c
0
ˆ
ζ(Z
i
))/
ˆ
b(Z
i
), i = 1, . . . , N
sim
, to approximate this pa
rameter free distribution. If ˆ η
2
(γ, c
0
) denotes the γquantile of this simulated distribution then we
can view c
0
ˆ
ζ(Y ) − ˆ η
2
(γ, c
0
)
ˆ
b(Y ) as an approximate 100γ% lower bound for c
0
ζ, the log of the
characteristic life at the covariate vector c
0
. This can be demonstrated as in the location/scale case
for the location parameter u.
Similarly, if ˆ η
1
(γ) is the γquantile of the simulated
ˆ
b(Z
i
), i = 1, . . . , N
sim
, then we can view
ˆ
b(Y )/ˆ η
1
(γ) as approximate 100γ% lower bound for b. We note here that these quantiles ˆ η
1
(γ)
and ˆ η
2
(γ, c
0
) depend on the original covariance matrix C, i.e., they diﬀer from those used in the
location/scale case. The same comment applies to the other conﬁdence bound procedures following
below.
For a given covariate vector c
0
we can target the pquantile y
p
(c
0
) = c
0
ζ +bw
p
of the Y distribution
with covariate dependent location parameter u(c
0
) = c
0
ζ and scale parameter b. We can calculate
c
0
ˆ
ζ(Y ) −
ˆ
k
p
(γ)
ˆ
b(Y ) as an approximate 100γ% lower bound for y
p
(c
0
), where
ˆ
k
p
(γ) is the γquantile
of the simulated (c
0
ˆ
ζ(Z
i
) −w
p
)/
ˆ
b(Z
i
), i = 1, . . . , N
sim
.
For the tail probability p(y
0
) = G((y
0
−c
0
ζ)/b) with given threshold y
0
and covariate vector c
0
we
obtain an approximate 100γ% upper bound by using the γquantile of the simulated values
G(c
0
ˆ
ζ(Z
i
) −x
ˆ
b(Z
i
)), i = 1, . . . , N
sim
,
where x = (y
0
− c
0
ˆ
ζ(y)/
ˆ
b(y) and y is the originally observed sample vector, obtained under the
covariate conditions speciﬁed through V .
We note here that the above conﬁdence bounds for the log(characteristic life) or regression location,
pquantiles and tail probabilities depend on the covariate vector c
0
that is speciﬁed. Not only does
this dependence arise through the use of c
0
ˆ
ζ(Y ) in each case but also through the simulated
distributions which incorporate c
0
in each of these three situations. The only exception is the
conﬁdence bound for b, which makes sense since we assumed a constant scale for all covariate
situations.
45
12.3 The Simple Linear Regression Model
Here we assume the following simple linear regression model for the Y
i
= log(X
i
)
Y
i
= ζ
1
+ζ
2
c
i
, i = 1, . . . , n .
In matrix notation this becomes
Y =
_
_
_
_
Y
1
.
.
.
Y
n
_
_
_
_
=
_
_
_
_
1 c
1
.
.
.
.
.
.
1 c
n
_
_
_
_
_
ζ
1
ζ
2
_
+b
_
_
_
_
Z
1
.
.
.
Z
n
_
_
_
_
= Cζ +b Z .
Here ζ
1
and ζ
2
represent the intercept and slope parameters in the straight line regression model for
the location parameter and b represents the degree of scatter (scale) around that line. In the context
of the general regression model we have k = 2 here and c
1,i
= 1 and c
2,i
= c
i
for i = 1, . . . , n. The
conditions for existence and uniqueness of the mle’s are satisﬁed when the covariate values c
1
, . . . , c
n
are not all the same.
The R function call system.time(WeibullRegSim(n=20,Nsim=10000)) (done twice and recording
an elapsed time of about 76 seconds each) produced each of the plots in Figure 14. Each call
generates its own data set of 20 points using 5 diﬀerent levels of covariate values. The data are
generated from a true Weibull distribution with a known true regression line relationship for log(α)
in relation to the covariates, as shown in the plots. Also shown in these plots is the .10quantile
line. Estimated lines are indicated by the corresponding color coded dashed lines.
In contrast, the quantile lower conﬁdence bounds based on N
sim
= 10000 simulations are represented
by a curve. This results from the fact that the factor
ˆ
k
p
(γ) used in the construction of the lower
bound,
ˆ
ζ
1
(Y ) +
ˆ
ζ
2
(Y )c−
ˆ
k
p
(γ)
ˆ
b(Y ), is the γquantile of the simulated values (c
0
ˆ
ζ(Z
i
) −w
p
)/
ˆ
b(Z
i
),
i = 1, . . . , N
sim
, and these values change depending on which c
0
= (1, c) is involved. This curvature
adjusts to some extent to the sampling variation swivel action in the ﬁtted line.
We repeated the above with a sample of size n = 50 (taking about 85 seconds for each plot) and the
corresponding two plots are shown in Figure 15. We point out two features. In this second set of
plots the lower conﬁdence bound curve is generally closer to the ﬁtted quantile line than in the ﬁrst
set of plots. This illustrates the sample size eﬀect, i.e., we are getting better or less conservative in
our bounds. The second feature shows up in the bottom plot where the conﬁdence curve crosses
the true percentile line, i.e., it gets on the wrong side of it. Such things happen, because we have
only 95% conﬁdence in the bound. Note that these bounds should be interpreted pointwise for
each covariate value and should not be viewed as simultaneous conﬁdence bands.
The function WeibullRegSim is part of the R workspace on the class web site. It can easily be
modiﬁed to handle any simple linear regression Weibull data set. Multiple regression relationships
could also be accommodated quite easily. To get a feel for the behavior of the conﬁdence bounds it
is useful to exercise this function repeatedly, but using N
sim
= 1000 for faster response.
46
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
−2 0 2 4 6 8
7
8
9
1
0
1
1
covariate
l
o
g
(
t
i
m
e
)
true characteristic life
true 0.1−quantile
estimated characteristic life
estimated 0.1−quantile
95% lower bound to 0.1−quantile
total sample size n = 20
confidence bounds based on 10000 simulations
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
−2 0 2 4 6 8
7
8
9
1
0
1
1
covariate
l
o
g
(
t
i
m
e
)
true characteristic life
true 0.1−quantile
estimated characteristic life
estimated 0.1−quantile
95% lower bound to 0.1−quantile
total sample size n = 20
confidence bounds based on 10000 simulations
Figure 14: Weibull Regression with Quantile Bounds (n = 20)
47
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
−2 0 2 4 6 8
6
7
8
9
1
0
1
1
covariate
l
o
g
(
t
i
m
e
)
true characteristic life
true 0.1−quantile
estimated characteristic life
estimated 0.1−quantile
95% lower bound to 0.1−quantile
total sample size n = 50
confidence bounds based on 10000 simulations
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
−2 0 2 4 6 8
6
7
8
9
1
0
1
1
covariate
l
o
g
(
t
i
m
e
)
true characteristic life
true 0.1−quantile
estimated characteristic life
estimated 0.1−quantile
95% lower bound to 0.1−quantile
total sample size n = 50
confidence bounds based on 10000 simulations
Figure 15: Weibull Regression with Quantile Bounds (n = 50)
48
12.4 The kSample Problem
A second illustration example concerns the situation of k = 3 samples with same scale but possibly
diﬀerent locations. In terms of the untransformed Weibull data this means that we have possibly
diﬀerent unknown characteristic life parameters (α
1
, α
2
, α
3
) but same unknown shape β for each
sample. The modiﬁcations for k = 3 should be obvious. In matrix notation this model is
Y =
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
Y
1
.
.
.
Y
n
1
.
.
.
Y
n
1
+1
.
.
.
Y
n
1
+n
2
Y
n
1
+n
2
+1
.
.
.
Y
n
1
+n
2
+n
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
=
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 0 0
.
.
.
.
.
.
.
.
.
1 0 0
1 1 0
.
.
.
.
.
.
.
.
.
1 1 0
1 0 1
.
.
.
.
.
.
.
.
.
1 0 1
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
ζ
1
ζ
2
ζ
3
_
_
_ +b
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
Z
1
.
.
.
Z
n
1
.
.
.
Z
n
1
+1
.
.
.
Z
n
1
+n
2
Z
n
1
+n
2
+1
.
.
.
Z
n
1
+n
2
+n
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
= Cζ +b Z .
Here the Y
i
have location u
1
= ζ
1
for the ﬁrst n
1
observations, they have location u
2
= ζ
1
+ ζ
2
for
the next n
2
observations and they have location u
3
= ζ
1
+ζ
3
for the last n
3
observations. Thus we
can consider u
1
= ζ
1
as the baseline location (represented by the ﬁrst n
1
observations), ζ
2
can be
considered as the incremental change from u
1
to u
2
and ζ
3
is the incremental change from u
1
to u
3
.
If we were interested in the question whether the three samples come from the same location/scale
model we would consider testing the hypothesis H
0
: ζ
2
= ζ
3
= 0 or equivalently H
0
: u
1
= u
2
= u
3
.
Instead of using the likelihood ratio test, which invokes the χ
2
k−1
= χ
2
2
distribution as approximate
null distribution, we will employ the test statistic suggested in Lawless (1982) (p. 302, equation
(6.4.12)) for which the same approximate null distribution is invoked. Our reason for following this
choice is its similarity to the standard test statistic used in the corresponding normal distribution
model, i.e., when Z
i
∼ Φ(z) instead of Z
i
∼ G(z) as in the above regression model. Also, the
modiﬁcation of this test statistic for general k(= 3) is obvious. The formal deﬁnition of the test
statistic proposed by Lawless is as follows:
Λ
1
= (
ˆ
ζ
2
,
ˆ
ζ
3
)K
−1
11
(
ˆ
ζ
2
,
ˆ
ζ
3
)
t
,
where K
11
is the asymptotic 2 × 2 covariance matrix of (
ˆ
ζ
2
,
ˆ
ζ
3
). Without going into the detailed
derivation one can give the following alternate and more transparent expression for Λ
1
Λ
1
=
3
i=1
n
i
(ˆ u
i
(Y ) − ˆ u(Y ))
2
ˆ
b(Y )
2
,
49
where
ˆ u
1
(Y ) =
ˆ
ζ
1
(Y ) , ˆ u
2
(Y ) =
ˆ
ζ
1
(Y )+
ˆ
ζ
2
(Y ) , ˆ u
3
(Y ) =
ˆ
ζ
1
(Y )+
ˆ
ζ
3
(Y ) and ˆ u(Y ) =
3
i=1
(n
i
/N)ˆ u
i
(Y ) ,
with N = n
1
+n
1
+n
3
. In the normal case Λ
1
reduces to the traditional Ftest statistic (except for a
constant multiplier, namely (n−k)/((k−1)n) = (n−3)/(2n)) when writing ˆ u
i
(Y ) =
¯
Y
i.
, i = 1, 2, 3
and ˆ u(Y ) =
¯
Y
..
= (n
1
/N)
¯
Y
1.
+ (n
2
/N)
¯
Y
2.
+ (n
3
/N)
¯
Y
3.
and
ˆ
b(Y )
2
=
1
n
k
i=1
n
i
j=1
(Y
ij
−
¯
Y
i.
)
2
,
which are the corresponding mle’s in the normal case. However, in the normal case one uses the
F
k−1,N−k
distribution as the exact null distribution of the properly scaled Λ
1
and the uncertainty
in
ˆ
b(Y )
2
is not ignored by simply referring to the χ
2
k−1
distribution, using a large sample argument.
We don’t have to use a large sample approximation either, since the null distribution of Λ
1
(in the
logWeibull case) is free of any unknown parameters and can be simulated to any desired degree of
accuracy. This is seen as follows from our equivariance properties. Recall that
ˆ u
1
(Y ) −u
1
ˆ
b(Y )
=
ζ
1
(Y ) −ζ
1
ˆ
b(Y )
,
ˆ u
i
(Y ) −u
i
ˆ
b(Y )
=
ζ
1
(Y ) +ζ
i
(Y ) −(ζ
1
+ζ
i
)
ˆ
b(Y )
, i = 2, 3
have distributions free of unknown parameters. Under the hypothesis H
0
when u
1
= u
2
= u
3
(= u)
we thus have that
ˆ u
i
(Y ) −u
ˆ
b(Y )
,
ˆ u(Y ) −u
ˆ
b(Y )
, and thus
ˆ u
i
(Y ) − ˆ u(Y )
ˆ
b(Y )
=
ˆ u
i
(Y ) −u
ˆ
b(Y )
−
ˆ u(Y ) −u
ˆ
b(Y )
have distributions free of any unknown parameters which in turn implies the above claim about Λ
1
.
Thus we can estimate the null distribution of Λ
1
by using the N
sim
simulated values of
ˆ
ζ
i
(Z
j
)/
ˆ
b(Z
j
)
to create
ˆ u
1
(Z
j
)
ˆ
b(Z
j
)
=
ˆ
ζ
1
(Z
j
)
ˆ
b(Z
j
)
,
ˆ u
i
(Z
j
)
ˆ
b(Z
j
)
=
ˆ
ζ
1
(Z
j
) +
ˆ
ζ
i
(Z
j
)
ˆ
b(Z
j
)
, i = 2, 3 and
ˆ u(Z
j
)
ˆ
b(Z
j
)
=
3
i=1
n
i
ˆ u
i
(Z
j
)/N
ˆ
b(Z
j
)
and thus
Λ
1
(Z
j
) =
3
i=1
n
i
(ˆ u
i
(Z
j
) − ˆ u(Z
j
))
2
ˆ
b(Z
j
)
2
j = 1, . . . , N
sim
.
The distribution of these N
sim
values Λ
1
(Z
j
) will give a very good approximation for the true null
distribution of Λ
1
. The accuracy of this approximation is entirely controllable by the choice of N
sim
.
N
sim
= 10000 should be suﬃcient for most practical purposes.
50
The following plots examine the χ
2
2
approximation to the Λ
1
null distribution in the case of 3
samples of respective sizes n
1
= 5, n
2
= 7 and n
3
= 9. This is far from qualifying for a large
sample situation. The histogram in Figure 16 is based on N
sim
= 10000 simulated values of Λ
1
(Z
).
Although the superimposed χ
2
2
density is similar in character, there are strong diﬀerences. Using
the χ
2
2
distribution would result in much smaller pvalues than appropriate when these are on the
low side.
ΛΛ
1
*
D
e
n
s
i
t
y
0 2 4 6 8 10 12
0
.
0
0
.
1
0
.
2
0
.
3
0
.
4
0
.
5
0
.
6
asymptotic χχ
2
2
density
Figure 16: Histogram of Λ
1
Null Distribution with Asymptotic Approximation
Figure 17 shows a QQplot of the approximating χ
2
2
quantiles corresponding to the N
sim
= 10000
simulated and ordered Λ
1
(Z
i
) values. For a good approximation the points should follow the
shown main diagonal. The discrepancy is quite strong. Each point on this plot corresponds to a
pquantile where the abscissa value of such a point gives us the approximate pquantile of the Λ
1
null distribution and the corresponding ordinate gives us the pquantile of the χ
2
2
distribution which
is suggested as asymptotic approximation. The vertical probability scale facilitates the reading oﬀ
of p for each quantile level on either axis.
51
0 20 40 60
0
5
1
0
1
5
2
0
simulated & sorted ΛΛ
1
*
0.9999
0.9995
0.999
0.998
0.997
0.995
0.993
0.99
0.98
0.97
0.96
0.95
0.94
0.93
0.9
χχ
2 2
−
q
u
a
n
t
i
l
e
s
Figure 17: QQPlots Comparing Λ
1
(Z
) with χ
2
2
Clearly the pquantiles of the χ
2
2
distribution are smaller than the corresponding pquantiles of the
the Λ
1
null distribution. If we take the p corresponding to a Λ
1
on the abscissa and look up its p
according to the χ
2
2
scale, we only need to go up to the main diagonal at that abscissa location and
look up the p
on the χ
2
2
scale to the left. For example, the .95quantile for the Λ
1
null distribution
would yield a p
≈ .994 on the χ
2
2
scale. Thus a true Λ
1
pvalue of .05 would translate into a very
overstated observed signiﬁcance level of .006 according to the χ
2
2
approximation.
Figure 18 shows the comparison of the χ
2
2
quantiles with the corresponding quantiles of the 2×F
2,21−3
distribution (the factor 2 adjusts for the fact that the numerator of the F statistic is divided by
k − 1 = 2). This comparison is the counter part to the previous situation if we were to use the
asymptotic χ
2
2
distribution as approximation to the exact and true 2 × Fratio distribution had
we carried out the corresponding test in a normal data situation, i.e., tested whether three normal
random samples with common σ have the same mean. Again, the approximation is not good. The
departure from the main diagonal is not as severe as in Figure 17 , but the eﬀect is similar, i.e., the
52
χ
2
2
distribution would yield much smaller pvalues than appropriate when these pvalues are on the
small side, i.e., when they become critical.
0 5 10 15 20 25
0
5
1
0
1
5
2*qf((1:1000−.5)/1000,2,18)
q
c
h
i
s
q
(
1
:
1
0
0
0
−
.
5
)
/
1
0
0
0
,
2
)
0.9995
0.999
0.998
0.997
0.995
0.993
0.99
0.98
0.97
0.96
0.95
0.94
0.93
0.9
Figure 18: QQPlots Comparing 2 ×F
2,21−3
with χ
2
2
Aside from testing the equality of logWeibull distributions we can also obtain the various types
of conﬁdence bounds for the location, scale, pquantiles and tail probabilities for each sampled
population, whether the k locations are the same or not. This is diﬀerent from doing so for each
sample separately since we use all k samples to estimate b, which was assumed to be the same for
all k populations. This will result in tighter conﬁdence bounds than what would result from the
corresponding conﬁdence bound analysis of individual samples. We omit the speciﬁc details since
they should be clear from the general situation as applied in the simple linear regression case.
53
12.5 Goodness of Fit Tests
As in the location/scale case we can exploit the equivariance properties of the mle’s in the general
regression model to carry out the previously discussed goodness of ﬁt tests by simulation. Using
the previous computational formulas for the D, W
2
and A
2
we only need to deﬁne the appropriate
V
i
, namely
V
i
= G
_
Y
i
−v
i
ˆ
ζ(Y )
ˆ
b(Y )
_
, i = 1, . . . , n .
Pierce and Kopecky (1979) showed that the asymptotic null distributions of D, W
2
and A
2
, using
the sorted values V
(1)
≤ . . . ≤ V
(n)
of these modiﬁed versions of V
i
, are respectively the same as in
the location/scale case, i.e., they do not depend on the additional covariates that may be present.
This assumes that the covariate matrix C contains a vector of ones. However, for ﬁnite sample
sizes the eﬀects of these covariates may still be relevant. The eﬀect of using the small sample tables
given by Stephens (1986) is not clear.
However, one can easily simulate the null distributions of these statistics since they do not depend
on any unknown parameters. Using the data representation Y
i
= c
i
ζ + bZ
i
with i.i.d. Z
i
∼ G(z),
or Y = Cζ +bZ this is seen from the equivariance properties as follows
Y
i
−c
i
ˆ
ζ(Y )
ˆ
b(Y )
=
c
i
ζ +bZ
i
−c
i
(ζ +b
ˆ
ζ(Z))
b
ˆ
b(Z)
=
Z
i
−c
i
ˆ
ζ(Z)
ˆ
b(Z)
and thus
V
i
= G
_
Y
i
−c
i
ˆ
ζ(Y )
ˆ
b(Y )
_
= G
_
Z
i
−c
i
ˆ
ζ(Z)
ˆ
b(Z)
_
.
For any covariate matrix C and sample size n the null distributions of D, W
2
and A
2
can be
approximated to any desired degree. All we need to do is generate vectors Z
= (Z
1
, . . . , Z
n
)
i.i.d.
∼ G(z), compute the mle’s
ˆ
ζ(Z),
ˆ
b(Z), and from that V
, followed by D
= D(V
), W
2
= W
2
(V
)
and A
2
= A
2
(V
). Repeating this a large number of times, say N
sim
= 10000, would yield values
D
i
, W
2
i
, A
2
i
, i = 1, . . . , N
sim
. Their respective empirical distributions would serve as excellent
approximations to the desired null distributions of these test of ﬁt criteria.
54
−1 0 1 2
2
4
6
8
1
0
1
2
covariate c
l
o
g
(
f
a
i
l
u
r
e
t
i
m
e
)
Figure 19: Weibull Regression Example
Figure 19 shows some Weibull regression data which were generated as follows:
Y
i
= log(α) + slope ×c
i
+bZ
i
, i = 1, . . . , n with Z
1
, . . . , Z
n
i.i.d. ∼ G(z) ,
where α = 10000, b = 2/3, slope = 1.3 and there were 20 observations each at c
i
= i − 2.5 for
i = 1, . . . , 5.
Here X
i
= exp(Y
i
) would be viewed as the Weibull failure time data with common shape parameter
β = 1/b = 1.5 and characteristic life parameters α
i
= αexp(scale × c
i
) which are aﬀected in a
multiplicative manner by the covariate values c
i
, i = 1, . . . , 5.
The solid sloped line in Figure 19 indicates the true log(characteristic life) for the Weibull regression
data while the dashed line represents its estimate.
55
n == 100 , N
sim
== 10000
D
*
F
r
e
q
u
e
n
c
y
0.00 0.02 0.04 0.06 0.08 0.10 0.12
0
4
0
0
8
0
0
1
2
0
0 p−value = 0.7746
D == 0.0501
W
2*
F
r
e
q
u
e
n
c
y
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35
0
5
0
0
1
0
0
0
2
0
0
0
p−value = 0.8001
W
2
== 0.0325
A
2*
F
r
e
q
u
e
n
c
y
0.0 0.5 1.0 1.5
0
5
0
0
1
0
0
0
1
5
0
0
p−value = 0.7135
A
2
== 0.263
Figure 20: Weibull Goodness of Fit for Weibull Regression Example
For this generated Weibull regression data set Figure 20 shows the results of the Weibull goodness
of ﬁt tests in relation to the simulated null distributions for the test criteria D, W
2
and A
2
. The
hypothesis of a Weibull lifetime distribution cannot be rejected by any of the three test of ﬁt criteria
based on the shown pvalues.
This example was produced by the R function WeibullRegGOF available in the R workspace on the
class web site. It took 105 seconds on my laptop. This function performs Weibull goodness of ﬁt
tests for any supplied regression data set. When this data set is missing it generates its own Weibull
regression data set of size n = 100 and uses the indicated covariates and parameters.
56
−1 0 1 2
6
8
1
0
1
2
1
4
covariate c
l
o
g
(
f
a
i
l
u
r
e
t
i
m
e
)
Figure 21: Normal Regression Example
Figure 21 shows some normal regression data which were generated as follows:
Y
i
= log(α) + slope ×c
i
+bZ
i
, i = 1, . . . , n with Z
1
, . . . , Z
n
i.i.d. ∼ Φ(z) ,
where α = 10000, b = 2/3, slope = 1.3 and there were 20 observations each at c
i
= i − 2.5 for
i = 1, . . . , 5. Here X
i
= exp(Y
i
) would be viewed as the failure time data. Such data would have a
lognormal distribution.
This data set was produced internally within WeibullRegGOF by modifying the line that generated
the original data sample, so that Z
i
∼ Φ(z), i.e., Z < rnorm(n,0,1) is used. The simulation of the
test of ﬁt null distributions remains essentially unchanged except that a diﬀerent random number
starting seed was used.
Here the solid sloped line indicates the mean of the normal regression data while the dashed line
represents the estimate according to an assumed Weibull model. Note the much wider discrepancy
here compared to the corresponding situation in Figure 19. The reason for this wider gap is that
the ﬁtted line aims for the .632quantile and not the median of that data.
57
n == 100 , N
sim
== 10000
D
*
F
r
e
q
u
e
n
c
y
0.00 0.02 0.04 0.06 0.08 0.10 0.12
0
4
0
0
8
0
0
1
2
0
0 p−value = 0.0213
D == 0.0953
W
2*
F
r
e
q
u
e
n
c
y
0.0 0.1 0.2 0.3 0.4
0
5
0
0
1
5
0
0
p−value = 5e−04
W
2
== 0.265
A
2*
F
r
e
q
u
e
n
c
y
0.0 0.5 1.0 1.5 2.0 2.5
0
5
0
0
1
5
0
0
2
5
0
0
p−value = 4e−04
A
2
== 1.73
Figure 22: Weibull Goodness of Fit for Normal Regression Example
Here the pvalues clearly indicate that the hypothesis of a Weibull distribution should be rejected,
although the evidence in the case of D is not very strong. However, for W
2
and A
2
there should be
no doubt in the (correct) rejection of the hypothesis.
Any slight diﬀerences in the null distributions shown here and in the previous example are due to
a diﬀerent random number seed being used in the two cases.
58
References
Bain, L.J. (1978) Statistical Analysis of Reliability and LifeTesting Models, New York, Dekker.
Bain, L.J. and Engelhardt M.(1991) Statistical Analysis of Reliability and LifeTesting Models,
Theory and Methods, Second Edition, New York, Dekker.
Lawless, J.F. (1982), Statistical Models and Methods for Lifetime Data, John Wiley & Sons, New
York.
Pierce and Kopecky, K.J. (1979), “Testing Goodness of Fit for the Distribution of Errors in Regres
sion Models,” Biometrika, Vol. 66, No. 1, 15
Saunders, S.C. (1975), “Birnbaum’s Contributions to Reliability,” Reliability and Fault Tree Anal
ysis, Theoretical and Applied Aspects of System Reliability and Safety Assessment,Barlow, R.E,
Fussell, J.B., and Singpurwalla, N.D. editors, Society for Industrial and Applied Mathematics, 33
South 17 Street, Philadelphia PA 19103.
Stephens, M.A. (1986), “Tests based on EDF statistics,” GoodnessofFit Techniques, D’Agostino,
R. and Stephens, M.A. (editors), Dekker, New York.
Scholz, F.W. (1996) (revised 2001), “Maximum Likelihood Estimation for Type I Censored Weibull
Data Including Covariates,” ISSTECH96022, Boeing Information and Support Services.
Scholz, F.W. (2008) “Weibull Probability Paper,”(informal note).
Thoman, D.R., Bain, L.J., and Antle, C.E. (1969), “Inferences on parameters of the Weibull distri
bution,” Technometrics, Vol 11, No. 3, 445460.
Thoman, D.R., Bain, L.J., and Antle, C.E. (1970), “Exact conﬁdence intervals for reliability, and
tolerance limits in the Weibull distribution,” Technometrics, Vol 12, No. 2, 363371.
59
Weibull densities
β1 = 0.5 β2 = 1 β3 = 1.5 β4 = 2 β5 = 3.6 β6 = 7 α = 10000
63.2% 36.8%
α
Figure 1: A Collection of Weibull Densities with α = 10000 and Various Shapes
2
2
Minimum Closure and Weakest Link Property
The Weibull distribution has the following minimum closure property: If X1 , . . . , Xn are independent with Xi ∼ W(αi , β), i = 1, . . . , n, then
n
P (min(X1 , . . . , Xn ) > t) = P (X1 > t, . . . , Xn > t) =
i=1 n
P (Xi > t) 1 β i=1 αi
n −1/β n
=
i=1
exp −
t αi
β
β
= exp −tβ
t = exp − α
1 with α = β i=1 αi
,
i.e., min(X1 , . . . , Xn ) ∼ W(α , β). This is reminiscent of the closure property for the normal 2 distribution under summation, i.e., if X1 , . . . , Xn are independent with Xi ∼ N (µi , σi ) then
n n n
Xi ∼ N
i=1 i=1
µi ,
i=1
2 σi
.
This summation closure property plays an essential role in proving the central limit theorem: Sums of independent random variables (not necessarily normally distributed) have an approximate normal distribution, subject to some mild conditions concerning the distribution of such random variables. There is a similar result from Extreme Value Theory that says: The minimum of independent, identically distributed random variables (not necessarily Weibull distributed) has an approximate Weibull distribution, subject to some mild conditions concerning the distribution of such random variables. This is also referred to as the “weakest link” motivation for the Weibull distribution. The Weibull distribution is appropriate when trying to characterize the random strength of materials or the random lifetime of some system. This is related to the weakest link property as follows. A piece of material can be viewed as a concatenation of many smaller material cells, each of which has its random breaking strength Xi when subjected to stress. Thus the strength of the concatenated total piece is the strength of its weakest link, namely min(X1 , . . . , Xn ), i.e., approximately Weibull. Similarly, a system can be viewed as a collection of many parts or subsystems, each of which has a random lifetime Xi . If the system is deﬁned to be in a failed state whenever any one of its parts or subsystems fails, then the system lifetime is min(X1 , . . . , Xn ), i.e., approximately Weibull. Figure 2 gives a sense of usage of the Weibull distribution and Figure 3 shows the “real thing.” Googling “Weibull distribution” produced 185,000 hits while ”normal distribution” had 2,420,000 hits. 3
Figure 2: Publications on the Weibull Distribution 4
Figure 3: Waloddi Weibull
5
By some statisticians it was even believed to be the only possible one. (Maybe that is the reason it was so inﬂuential!)’ 3 The Hazard Function The hazard function for any nonnegative random variable with cdf F (x) and density f (x) is deﬁned as h(x) = f (x)/(1−F (x)).garfield. . originally submitted to a statistics journal and rejected. That this can be accomplished on most calculators is also moot since many calculators also give you Φ(x). namely P (x < X ≤ x + dX > x) = F (x + d) − F (x) d × f (x) P (x < X ≤ x + d) = ≈ = d × h(x) . or force of mortality.The Weibull distribution is very popular among engineers. the distribution function proposed by Gauss was dominating and was distinguishingly called the normal distribution. “.” (G¨ran W. in today’s computing environment one could argue that point since typically the computation of even exp(x) requires computing. However.upenn. In the case of the Weibull hazard rate function we observe that it 6 .β (x) α α β−1 . It is usually employed for distributions that model random lifetimes and it relates to the probability that a lifetime comes to an end within the next small time increment of length d given that the lifetime has exceeded x so far.” Journal of Applied Mechanics. P (X > x) 1 − F (x) 1 − F (x) In the case of the Weibull distribution we have h(x) = fα. 1981. failure rate (function). . he tried to publish an article in a wellknown British journal.pdf) Sam Saunders (1975): ‘Professor Wallodi (sic) Weibull recounted to me that the now famous paper of his “A Statistical Distribution of Wide Applicability”. Weibull. o http://www.β (x) β x = 1 − Fα. such as hazard rate. in which was ﬁrst advocated the “Weibull” distribution with its failure rate a power of time. Thus one of the most inﬂuential papers in statistics of that decade was published in the Journal of Applied Mechanics. was eventually published in an engineering journal: Waloddi Weibull (1951) “A statistical distribution function of wide applicability. At this time.library. Another reason for the popularity of the Weibull distribution among engineers may be that Weibull’s most famous paper. 18. See [35]. That was just the same article as the highly cited one published in 1951.edu/classics1981/A1981LD32400001. was rejected by the Journal of the American Statistical Association as being of no interrest. One reason for this is that the Weibull cdf has a closed form which is not the case for the normal cdf Φ(x). 293297. The article was refused with the comment that it was interesting but of no practical importance. Various other terms are used equivalently for the hazard function.
β) =⇒ log(X) = Y has a locationscale distribution. i.e. One also refers to this as a random failure model in the sense that failures are due to external shocks that follow a Poisson process with rate λ = 1/α. i. T ] one can view the k occurrence times as being uniformly distributed over the interval [0. The random times between shocks are exponentially distributed with mean α. possibly due to hardening. namely its cumulative distribution function (cdf) is exp(y) P (Y ≤ y) = P (log(X) ≤ y) = P (X ≤ exp(y)) = 1 − exp − α = 1 − exp [− exp {(y − log(α)) × β}] = 1 − exp − exp = 1 − exp − exp β y − log(α) 1/β y−u b with location parameter u = log(α) and scale parameter b = 1/β. since for β = 1 we have exp(−(x + h)/α) P (X > x + h) = = exp(−h/α) = P (X > h) . For β < 1 (less common) the system has a better chance of surviving the next small time increment d as it gets older. Given that there are k such shock events in an interval [0. after initial early failures the survival gets better with age. of which we will make strong use. is the following locationscale property of the logtransformed Weibull distribution. one has to keep in mind that we may be modeling parts or systems that consist of a mixture of defective or weak parts and of parts that practically can live forever. The reason for referring to such parameters this way is the following. it is again exponential with same mean α.. However. for which the lifetime is modeled by a Weibull distribution. or curing.is increasing in x when β > 1. T ]. 7 . When β > 1 the part or system. hence the allusion to random failures.e. A Weibull distribution with β < 1 may not do full justice to such a mixture distribution. If Z ∼ G(z) then Y = µ + σZ ∼ G((y − µ)/σ) since H(y) = P (Y ≤ y) = P (µ + σZ ≤ y) = P (Z ≤ (y − µ)/σ) = G((y − µ)/σ) .. By that we mean that: X ∼ W(α. P (X > x + hX > x) = P (X > x) exp(−x/α) i. the system is as good as new given that it has survived beyond x. is subject to aging in the sense that an older system has a higher chance of failing during the next small time increment d than a younger system. decreasing in x when β < 1 and constant when β = 1 (exponential distribution with memoryless property). Often one refers to this situation as one of infant mortality. For β = 1 there is no aging. 4 LocationScale Property of log(X) Another useful property. maturing..e.
p = P (Z ≤ wp ) = P (u + bZ ≤ u + bwp ) = P (Y ≤ u + bwp ) =⇒ yp = u + bwp with wp = log(− log(1 − p)). In any such a locationscale model there is a simple relationship between the pquantiles of Y and Z. Although MLE in the Weibull case requires numerical methods and a computer. . . ≤ Y(n) ) as a good approximation for ypi . β) based on a random sample X1 .5)/n one can view the ith ordered sample value Y(i) (Y(1) ≤ . MLE tends to be eﬃcient. For more in relation to Weibull probability plotting we refer to Scholz (2008). Two prominent locationscale families are 1. However. namely yp = µ + σzp in the normal model and yp = u + bwp in the extreme value model (using the location and scale parameters u and b resulting from logtransformed Weibull data). Y = u + bZ where Z has the standard extreme value distribution with cdf G(z) = 1 − exp(− exp(z)) for z ∈ R. This is the basis for Weibull probability plotting (and the case of plotting Y(i) against zpi for normal probability plotting). . that is no longer an issue in today’s computing environment. . 1) is standard normal with cdf G(z) = Φ(z) and thus Y has cdf H(y) = Φ((y − µ)/σ).The form Y = µ + σX should make clear the notion of location scale parameter. the pquantile of G. for appropriate pi = (i − . While wp is known and easily computable from p. Thus yp is a linear function of wp = log(− log(1 − p)). 2. but they are usually less eﬃcient than mle’s (estimates derived by MLE). σ 2 ). Y = µ + σZ ∼ N (µ. since they relate to the slope and intercept of the line that may be ﬁtted to the perceived linear point pattern. Previously. under regularity conditions MLE produces estimates that have an approximate normal distribution in large samples. the same cannot be said about yp . where Z ∼ N (0. since Z has been scaled by the factor σ and is then shifted by µ. 8 . Furthermore. a very appealing graphical procedure which gives a visual impression of how well the data ﬁt the assumed model (normal or Weibull) and which also allows for a crude estimation of the unknown location and scale parameters. since it involves the typically unknown parameters u and b. 5 Maximum Likelihood Estimation There are many ways to estimate the parameters θ = (α. at least in large samples. . By eﬃcient estimates we loosely refer to estimates that have the smallest sampling variance. as in our logtransformed Weibull example above. We just illustrate this in the extreme value locationscale model. Maximum likelihood estimation (MLE) is generally the most versatile and popular method. . Thus the plot of Y(i) against wpi should look approximately linear. estimates that could be computed by hand had been investigated. Xn ∼ W(α. β).
. . . . n (x1 . . . . . . . . . θ) = sup θ n fθ (xi ) i=1 .When X1 . i. θ) = log (L(x1 .. since it still needs to be shown that it is the location of a maximum. xn . . . . . . . . . xn ) which maximizes the likelihood n L(x1 . . . σ) with k = 2 and the unique solution of the likelihood equations results in the explicit expressions n n µ=x= ˆ ¯ i=1 xi /n and σ= ˆ i=1 (xi − x)2 /n ¯ 9 and thus ˆ θ = (ˆ. xn . θ) = ∂θj ∂θj n log(fθ (xi )) = i=1 ∂ log(fθ (xi )) = 0 i=1 ∂θj n j = 1. . . . . . xn ) ˆ L(x1 . k . . In the case of a normal random sample we have θ = (µ. . θ). . µ ˆ . k ˆ ˆ when solving for θ = θ = θ(x1 . . σ ) . Thus solving such equations is necessary but not suﬃcient.e. θ) = sup θ n log (fθ (xi )) i=1 . xn . . . It is a lot simpler to deal with the likelihood equations ∂ ∂ ˆ (x1 . where k is the number of parameters involved in θ = (θ1 . xn . . . Xn ∼ Fθ (x) with density fθ (x) then the maximum likelihood estimate of θ is that ˆ ˆ value θ = θ = θ(x1 . . . ∂ ∂θj n fθ (xi ) = 0 i=1 j = 1. θ) is the same as the value that maximizes (x1 . . .e. . . . Since taking derivatives of a product is tedious (product rule) one usually resorts to maximizing the log of the likelihood. . θ)) = i=1 log (fθ (xi )) since the value of θ that maximizes L(x1 . .e. Xn ) = (x1 . . . .. . . xn . These above equations reﬂect the fact that a smooth function has a horizontal tangent plane at its maximum (minimum or saddle point). . i. which gives highest local probability to the observed sample (X1 .. i. xn . .. . θ) = i=1 fθ (xi ) over θ. . ˆ (x1 . xn ). . . xn . θk ). xn . i. ˆ Often such maximizing values θ are unique and one can obtain them by solving. . . . . . .e. . . .
log(xn )).. Recall that Yi = log(Xi ) has cdf F (y) = 1 − exp(− exp((x − u)/b)) = G((y − u)/b) with G(z) = 1 − exp(− exp(z)) with g(z) = G (z) = exp(z − exp(z)). dβ j=1 10 . . Thus 1 d f (y) = F (y) = F (y) = g((y − u)/b)) dy b with y−u y−u − exp b b As partial derivatives of log(f (y)) with respect to u and b we get log(f (y)) = − log(b) + . ∂ 1 1 y−u log(f (y)) = − + exp ∂u b b b ∂ 1 1y−u 1y−u y−u log(f (y)) = − − + exp ∂b b b b b b b and thus as likelihood equations n 1 n yi − u exp 0 = − + b b i=1 b 0 = − or yi − u exp = n or b i=1 . .e. we have a solution u = u once we have a solution b = ˆ Substituting this expression for exp(u) ˆ b. Note that the derivative of these weights with respect to β take the following form wi (β) = n d wi (β) = yi wi (β) − wi (β) yj wj (β) . into the second likelihood equation we get (after some cancelation and manipulation) 0= n i=1 yi exp(yi /b) n i=1 exp(yi /b) −b− 1 n yi . yn ) = (log(x1 ). n 1 n yi exp exp(u) = n i=1 b b . . . yi − u n 1 n yi − u 1 n yi − u − + exp b b i=1 b b i=1 b b i. . . .In the case of a Weibull sample we take the further simplifying step of dealing with the logtransformed sample (y1 . . n i=1 Analyzing the solvability of this equation is more convenient in terms of β = 1/b and we thus write n 0= i=1 yi wi (β) − 1 −y ¯ β where wi (β) = exp(yi β) n j=1 exp(yj β) n with i=1 wi (β) = 1 .
Thus yi wi (β) − 1/β − y ≈ −1/β → −∞ as β → 0. . yn ) and for parameters u with u → ∞ (the location moves away from all observed data values y1 . where r ≥ 1 is the number of yi coinciding with M . yn ) and b with b → 0 (the spread becomes very concentrated on some point and cannot simultaneously do so at all values y1 . . u. b) = 1 bn n g i=1 yi − u b . . . . yn ) and β → ∞ we have n wi (β) = exp(β(yi −M ))/ j=1 exp(β(yj −M )) → 0 when yi < M and wi (β) → 1/r when yi = M . . unless they are all the same. Contemplate this likelihood for ﬁxed y = (y1 . . ¯ Furthermore. . . . . yn . . . If we consider the likelihood of the logtransformed Weibull data we have L(y1 . . Since this likelihood is positive everywhere (but approaching zero near the fringes of the parameter space. yn . Thus the reduced second likelihood equation yi wi (β) − 1/β − y = 0 ¯ has a unique solution (if it has a solution at all) since the the equation’s left side is strictly increasing. wn (β)). excluded as a zero probability degeneracy) and b → ∞ (in which case all probability is diﬀused thinly over the whole half plane {(u. That this ¯ unique solution corresponds to a maximum and thus a unique global maximum takes some extra eﬀort and we refer to Scholz (1996) for an even more general treatment that covers Weibull analysis with censored data and covariates. . the above half plane) it follows that it 11 . However. . .Hence d dβ since n 1 ¯ yi wi (β) − − y = β i=1 n 1 yi wi (β) + 2 = β i=1 n n n i=1 2 yi wi (β) − n 2 yj wj (β) + j=1 1 >0 β2 2 varw (y) = i=1 2 yi wi (β) − j=1 yj wj (β) = Ew (y 2 ) − [Ew (y)]2 ≥ 0 can be interpreted as a variance of the n values of y = (y1 . . . with M = max(y1 . . . . b) : u ∈ R. it is then easily seen that this likelihood approaches zero in all cases. Thus yi wi (β) − 1/β − y ≈ M − 1/β − y → M − y > 0 as β → ∞ ¯ ¯ ¯ where M − y > 0 assumes that not all yi coincide (a degenerate case with probability 0). . Note that wi (β) → 1/n as β → 0. . a somewhat loose argument can be given as follows. b > 0}). . . . . yn ) with weights or probabilities given by w = (w1 (β).
. The advantage of such data collection is that we do not have to wait until all n units have failed. . Yn we only observe the r ≥ 2 smallest sample values Y(1) < . . . and we know that the lifetimes of the remaining units exceed X(r) or that Y(i) > Y(r) for i > r. Integrating out yn > yn−1 > . . . In that case we know the ﬁrst r failure times X(1) < . The joint density of Y(1) . Furthermore. > yr+1 (> yr ) and using F (y) = 1 − F (y) we get after n − r successive integration steps the joint density of the ﬁrst r failure times y1 < . . ¯ This avoids overﬂow or accuracy loss in the exponentials when the yi tend to be large. yn−2 ) = n! i=1 n−3 f (yi ) × yn−2 ∞ ¯ f (yn−1 )F (yn−1 )dyn−1 = n! 1¯ f (yi ) × F 2 (yn−2 ) 2 i=1 n−3 f (y1 . ˆ u b). < yr as n−1 ∞ n−1 f (y1 . if we put a lot of units on test (high n) we increase our chance of seeing our ﬁrst r failures before a ﬁxed time y. . . . Y(n) at (y1 . . < Y(r) . . . yn ) = n! n 1 yi − u g = n! f (yi ) b i=1 i=1 b n where the multiplier n! just accounts for the fact that all n! permutations of y1 . yn ) with y1 < . . We showed there is only one such point (uniqueness of the solution to the likelihood equations) and thus there can only be one unique ˆ (global) maximum. . Such data is referred to as type II censored data. . . yn could have been the order in which these values were observed and all of these orders have the same density ¯ (probability). . . The above derivations go through with very little change when instead of observing a full sample Y1 . . . . i = 1. r. . . . . . . . This is a simple consequence of the following binomial probability statement: n P (Y (r) ≤ y) = P (at least r failures ≤ y in n trials) = i=r n P (Y ≤ y)i (1 − P (Y ≤ y))n−i i which is strictly increasing in n for any ﬁxed y and r ≥ 1 (exercise). . . . . . . . . . which then is also the unique maximum likelihood estimate θ = (ˆ. < X(r) and thus Y(i) = log(X(i) ). . . . yn−1 ) = n! i=1 n−2 f (yi ) × yn−1 ∞ f (yn )dyn = n! i=1 ¯ f (yi )F (yn−1 ) n−2 f (y1 . .must have a maximum somewhere with zero partial derivatives. yn ). . < yn is f (y1 . This situation typically arises in a laboratory setting when several units are put on test (subjected to failure exposure) simultaneously and the test is terminated (or evaluated) when the ﬁrst r units have failed. . . . . . yn−3 ) = n! i=1 f (yi ) × yn−3 ¯ f (yn−2 )F 2 (yn−2 )/2dyn−2 = n! i=1 f (yi ) × 1 ¯3 F (yn−3 ) 3! 12 . . In solving 0 = yi exp(yi /b)/ exp(yi /b) − b − y it is numerically advantageous to solve the ¯ equivalent equation 0 = yi exp((yi −M )/b)/ exp((yi −M )/b)−b− y where M = max(y1 . .
. i. r i=1 As in the complete sample case one sees that this equation has a unique solution ˆ and that (ˆ. b) u mle’s. . . . . u. yr . r i=1 Again it is advisable to use the equivalent but computationally more stable form r r yi exp((yi − yr )/b) i=1 i=1 exp((yi − yr )/b) − b − 1 r yi = 0 . . . .. u. ˆ b Using this in the second equation it transforms to a single equation in b alone. . yr ) = n! i=1 r f (yi ) × r 1 n! ¯ F n−r (yr ) = f (yi ) × [1 − F (yr )]n−r (n − r)! (n − r)! i=1 = with loglikelihood r! n 1 yi − u g × b r i=1 b 1−G yr − u b n−r (y1 . yr . . b) = − + exp 0= ∂u b b i=1 b 0= ∂ (y1 . namely r r yi exp(yi /b) i=1 i=1 exp(yi /b) − b − 1 r yi = 0 .e. b) = ∂b − or yi 1 r exp exp(u) = r i=1 b b r 1 r yi − u 1 r yi − u yi − u − + exp b b i=1 b b i=1 b b where again the transformed ﬁrst equation gives us a solution u once we have a solution ˆ for b. 13 . . . . (ˆ.. u. b) = log where we use the notation r r n! yi − u yi − u exp − r log(b) + − (n − r)! b b i=1 i=1 r r xi = i=1 i=1 xi + (n − r)xr . The likelihood equations are r 1 r yi − u ∂ (y1 . . yr . . . .. ˆ b u b) ˆ are the gives the location of the (unique) global maximimum of the likelihood function. r f (y1 .
23.weibull <.1.26.function (x=NULL.null(x))x <. either the full sample or the first # r observations of a type II censored sample.9) r <.0.6 Computation of Maximum Likelihood Estimates in R The computation of the mle’s of the Weibull parameters α and β is facilitated by the function survreg which is part of the R package survival.8.22.8.7.hat beta. that uses survreg to compute these estimates.rep(xs[r].29.25.10) # $mles # alpha.7. # If x is not given then a default full sample of size n=10.mle(c(7.0.25.hat # 28. then it loads survival if(r<n){ statusx <.7.12.mle <. This function is part of the R work space that is posted on the class web site.7).mle.statusx) 14 .39. called Weibull.12.7.9. otherwise x is treated as a full sample. if not.hat # 30.25.c(7.22. survreg does a whole lot more than compute the mle’s but we will not deal with these aspects here.1.frame(c(xs. at least for now.null(n)){n<r}else{if(r>nr<2){ return("x must have length r with: 2 <= r <= n")}} xs <.sort(x) if(!exists("survreg"))library(survival) #tests whether survival package is loaded.29.12.799793 # # In the type II censored usage # Weibull. Here x is the sample.n=NULL){ # This function computes the maximum likelihood estimates of alpha and beta # for complete or type II censored samples assumed to come from a 2parameter # Weibull distribution.1.725992 2.9) is analyzed and the returned # results should be # $mles # alpha.29.rep(0.8.39. Note that it tests for the existence of survreg before calling it. The following is an R function.1.22.nr)).29.length(x) if(is.23. Weibull.1.23. Here survreg is used in its most basic form in the context of Weibull data (full sample or type II censored Weibull data).5.c(rep(1.data.41.432647 if(is.9. namely # c(7.5.1.914017 2.nr)) dat.r).hat beta.26.41. In the latter case one must specify # the full sample size n.
exp(out. .c(alpha. .00591. This fact plays a signiﬁcant role later on in the various inference procedures which we will discuss. ."beta.e.07. .mle a 1000 times.hat <. where A ∈ R and B > 0 are arbitrary constant. .beta. system. Here such an object is created by the function Surv and it basically adjoins the failure times with a status vector of same length. .data=dat."status") out.00 5. but with slow growth.rep(1. as Figure 4 shows. . we only know that the unobserved actual failure time exceeds the reported censoring time.data. .weibull)<c("time". rn ) with ri = A + Bzi .dist="weibull". If we transform z to r = (r1 . then u(r1 . zn ) ˆ ˆ or u(r) = u(A + Bz) = A + B u(z) ˆ ˆ ˆ 15 . rn ) = A + B u(z1 . .87.frame(xs. .hat". . zn ) = ˆ ˆ b(z b(z). .weibull <.91 For n = 100. . . . Based on data z = (z1 . In the case of type II censored data these censoring times equal the rth largest failure time. To get a sense of the calculation speed of this function we ran Weibull.91 and 25. The relationship of computing time to n appears to be quite linear.1/out.1))}) user system elapsed 5. . .survreg(Surv(time. . respectively. i.status)~1.hat) names(parms)<c("alpha.weibull$coef) beta. The status is 1 when a time corresponds to an actual failure time.hat.hat <. 1000 the elapsed times came to 8.time(for(i in 1:1000){Weibull. .91/1000 = . 500.hat") list(mles=parms)} Note that survreg analyzes objects of class Surv.mle(rweibull(10. . which tells us that the time to compute the mle’s in a sample of size n = 10 is roughly 5.weibull <. 15. .}else{statusx <.n) dat.weibull$scale parms <.79 0. . .. It is 0 when the corresponding time is a censoring time. zn ) = ˆ u(z) and ˆ 1 . . zn ) we denote the estimates of u and b more explicitly by u(z1 .weibull) alpha. 7 Location and Scale Equivariance of Maximum Likelihood Estimates The maximum likelihood estimates u and ˆ of the location and scale parameters u and b have the ˆ b following equivariance properties which will play a strong role in the later pivot construction and resulting conﬁdence intervals.statusx)} names(dat.
015 0. . zn ) and r = (r1 . r n ) = B ˆ 1 . . . .001e−05 q time to compute Weibull mle's (sec) 0. .025 intercept = 0. Proof: Observe the following deﬁning properties of the mle’s in terms of z = (z1 . . . . . slope = 2.b g((ri − u)/b) i=1 g((zi − u(z))/ˆ ˆ b(z)) ˆn (z) i=1 b 1 n = g((ri − u(r))/ˆ ˆ b(r)) ˆn (r) i=1 b = = 1 1 n ˆ B (b(r)/B)n n i=1 1 n g((zi − (ˆ(r) − A)/B)/(ˆ u b(r)/B)) 16 . . .030 0. . .0. . . These properties are naturally desirable for any location and scale estimates and for mle’s they are indeed true.020 q 0. .b 1 bn n g((zi − u)/b) i=1 n 1 sup n b u. zn ) b(r b(z or ˆ = ˆ + Bz) = Bˆ b(r) b(A b(z) .010 q q 0. rn ) sup u.005 200 400 sample size n 600 800 1000 Figure 4: Weibull Parameter MLE Computation Time in Relation to Sample Size n and ˆ 1 .005886 .000 0 0.
d. and ˆ b(z) = ˆ b(r)/B 8 Tests of Fit Based on the Empirical Distribution Function Relying on subjective assessment of linearity in Weibull probability plots in order to judge whether a sample comes from a 2parameter Weibull population takes a fair amount of experience. . and IA = 0 when A is false. as is easily veriﬁed. 17 .b n g((ri − u)/b) i=1 1 = sup n b u.but also 1 sup n b u. The same equivariance properties hold for the mle’s in the context of type II censored samples. Xn is deﬁned as n ˆn (x) = # of observations ≤ x = 1 I{Xi ≤x} F n n i=1 where IA = 1 when A is true.˜ ˜b 1 1 n g((zi − u)/˜ = n ˜ b) g((zi − u(z))/ˆ ˆ b(z)) B ˆn (z) i=1 b i=1 Thus by the uniqueness of the mle’s we have u(z) = (ˆ(r) − A)/B ˆ u or u(r) = u(A + Bz) = A + B u(z) ˆ ˆ ˆ and ˆ = ˆ + Bz) = Bˆ b(r) b(A b(z) q. α. Just view Fn (x) as a binomial proportion or as an average of Bernoulli random variables. .b n g((A + Bzi − u)/b) i=1 n 1 1 B n (b/B)n 1 1 B n ˜n b n g((zi − (u − A)/B)/(b/B)) i=1 u = (u − A)/B ˜ ⇒ ˜ = b/B b = sup u. .b = sup u. .β ˆ α ˆ ˆ From the law of large numbers (LLN) we see that for any x we have that Fn (x) −→ Fα.e. The ﬁtted Weibull distribution function ˆ (using mle’s α and β) is ˆ ˆ β ˆ (x) = F ˆ(x) = 1 − exp − x F . It is simpler and more objective to employ a formal test of ﬁt which compares the empirical distribution ˆ ˆ function Fn (x) of a sample with the ﬁtted Weibull distribution function F (x) = Fα.β (x) as ˆ n −→ ∞. The empirical distribution function (EDF) of a sample X1 .β (x) using one ˆ ˆ of several common discrepancy metrics.
β (x) is continuous in x one can argue that these convergence statements can be made uniformly in x. 3. G) = sup F (x)−G(x) x is known as the KolmogorovSmirnov distance between two cdf’s F and G..β (x).β (x) − Fα. and empirical distribution function. i..β (x) −→ Fα. Figures 5 and 6 give illustrations of this KolmogorovSmirnov distance between EDF and ﬁtted Weibull distribution and show the relationship between sampled true Weibull distribution. try to ˆ ˆ give a good representation of the data. although being the origin of the data. ﬁtted Weibull distribution.β (x) ﬁt the data. ˆ sup Fn (x) − Fα. ˆ 2. G) = (F (x) − G(x))2 dG(x) = ∞ −∞ (F (x) − G(x))2 g(x) dx −∞ ∞ −∞ (F (x) − G(x))2 dG(x) = G(x)(1 − G(x)) ∞ −∞ (F (x) − G(x))2 g(x) dx . We will only discuss two of them. It can be noted that the closeness between Fn (x) and Fα.β (x) −→ 0 and x sup Fα.ˆ From MLE theory we also know that F (x) = Fα.β (x) is usually more pronounced ˆ ˆ than their respective closeness to Fα.β (x) as n −→ ∞ (also derived from ˆ ˆ the LLN). weighted by g(x) in the case of DCvM (F. i. This can be understood from the fact that both Fn (x) and Fα. the Cram´rvon Mises distance DCvM and the AndersonDarling e distance DAD .e.e. The closeness between all three distributions improves as n gets larger.β (x) −→ 0 as n −→ ∞ ˆ ˆ x and thus The distance ˆ sup Fn (x)−Fα. is not always good due to sampling variation. The ﬁt of the true distribution. DKS (F. in spite of the sequence of the above convergence statements. G(x)(1 − G(x)) Rather than focussing on the very local phenomenon of a maximum discrepancy at some point x as in DKS .β (x) −→ 0 as n −→ ∞ ˆ ˆ x for all α > 0 and β > 0. these alternate distances or discrepancy metrics integrate these distances in squared form over all x. Since the limiting cdf Fα. G) and by g(x)/[G(x)(1 − G(x))] in the case 18 . Several other distances between cdf’s F and G have been proposed and investigated in the literature. They are deﬁned respectively as follows ∞ DCvM (F. Some comments: ˆ 1. G) = and DAD (F.
when one cdf is shifted relative to the other by a small amount (no large vertical discrepancy). these small vertical discrepancies (squared) will add up and indicate a moderately large diﬀerence between the two compared cdf’s. F (x)) = D = max max i/n − V(i) .DAD (F. n. max V(i) − (i − 1)/n ˆ where V(1) ≤ . When using these metrics for tests of ﬁt one usually takes the cdf with a density (the model distribution to be tested) as the one with respect to which the integration takes place. This is seen as follows. Thus DAD (F. We will give the following computational expressions (without proof): ˆ ˆ DKS (Fn (x). W 2 and A2 . F ) .g.e. We point out the asymmetric nature of these last two metrics. .. while the other cdf is taken to be the EDF. As complicated as these metrics may look at ﬁrst glance. except to suggest that when viewing the previous ﬁgures ˆ ˆ illustrating DKS one should look at all vertical distances (large and small) between Fn (x) and F (x). Quite naturally we would reject the hypothesis of a sampled Weibull distribution whenever D or W 2 or A2 are too large. square them and accumulate these squares in the appropriately weighted fashion.β (Xi ). F (x)) = A2 = −n − (2i − 1) log(V(i) ) + log(1 − V(n−i+1) ) . G) = DAD (G. . G) is favored in situations where judging tail behavior is important. Because of the integration nature of these last two metrics they have more global character. For the other two test of ﬁt criteria we have n ˆ ˆ DCvM (Fn (x). being estimated by α and β in Vi = F (Xi ) = Fα. in risk situations. . i. G).. i. . F ) and DAD (F. the denominator increases the weight in the tails of the G distribution.. n i=1 In order to carry out these tests of ﬁt we need to know the null distributions of D. The null distribution of D. F (x)) = W 2 = i=1 V(i) − 2i − 1 2n 2 + 1 12n and 1 n ˆ ˆ DAD (Fn (x). For example. their computation is quite simple. There is no easy graphical representation of these metrics. i = 1. In the latter case. e. Using our prior notation we write log(Xi ) = Yi = u + bZi and since 19 . G) = DCvM (G. W 2 and A2 does not depend on the unknown ˆ ˆ parameters α and β.e. we typically have DCvM (F. The reason for this ˆ ˆ ˆ is that the Vi have a distribution that is independent of the unknown parameters α and β. . . ≤ V(n) are the ordered values of Vi = F (Xi ). compensates to some extent for the tapering oﬀ in the density g(x).
1.6 q q 0.0 0. β(x) x q q q q q q q q q q 0 5000 10000 x 15000 20000 25000 1.4 q q q q 0.4 q q q 0. β(x) x q q q qq q q q q q q q q q q q q q q q 0 5000 10000 x 15000 20000 25000 Figure 5: Illustration of KolmogorovSmirnov Distance for n = 10 and n = 20 20 . β(x) q q q q q cumulative distribution function KS−Distance q q q q q 0.0 ^ ^ ^ KS−Distance = sup Fn(x) − Fα.0 ^ ^ ^ KS−Distance = sup Fn(x) − Fα.8 ^ EDF = Fn True Sampled CDF = Fα. β(x) ^ ^ Weibull Fitted CDF = Fα.0 0.2 q q q q 0.6 q q q q 0.2 q q q 0. β(x) KS−Distance q q q q cumulative distribution function 0. β(x) ^ ^ Weibull Fitted CDF = Fα.8 ^ EDF = Fn True Sampled CDF = Fα.
2 0.4 0. β(x) q q q q q q q q q q q q q q q q q q cumulative distribution function KS−Distance q q q q q qq q q q q q q 0.4 0. β(x) ^ ^ Weibull Fitted CDF = Fα. β(x) KS−Distance q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q cumulative distribution function 0. β(x) x q q q 0 5000 10000 x 15000 20000 25000 1.6 q q q q q q q q q 0. β(x) x q q qq q q q qq qq qq q q qq q q q q qq q qq qq q q q qq q q qqq q q q q q q q q qq q q q q q q q qqq q q q q q q qq q q q q q q q q q q q q q q q q q qq q q q 0 5000 10000 x 15000 20000 25000 Figure 6: Illustration of KolmogorovSmirnov Distance for n = 50 and n = 100 21 .0 0.8 ^ EDF = Fn True Sampled CDF = Fα.6 0.0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q ^ ^ ^ KS−Distance = sup Fn(x) − Fα.1.0 q q q q q qq q q q q qq q q q q q q q q q qqq q q q q q qq q q q q q q q qq q q q q q ^ ^ ^ KS−Distance = sup Fn(x) − Fα.2 q q q q q q q q q q q 0.0 0. β(x) ^ ^ Weibull Fitted CDF = Fα.8 ^ EDF = Fn True Sampled CDF = Fα.
10.025. √ ﬁtting quadratics in 1/ n and reading oﬀ the four interpolated quantile values for the needed n0 (the sample size at issue) and as a second step perform the interpolation or extrapolation scheme as it was done previously.05. .F (x) = P (X ≤ x) = P (log(X) ≤ log(x)) = P (Y ≤ y) = 1 − exp(− exp((y − u)/b)) and thus ˆ Vi = F (Xi ) = 1 − exp(− exp((Yi − u(Y))/ˆ ˆ b(Y))) = 1 − exp(− exp((u + bZi − u(u + bZ))/ˆ + bZ))) ˆ b(u = 1 − exp(− exp((u + bZi − u − bˆ(Z))/[b ˆ u b(Z]))) = 1 − exp(− exp((Zi − u(Z))/ˆ ˆ b(Z))) and all dependence on the unknown parameters u = log(α) and b = 1/β has canceled out. .025. Prior to the ease of current computing Stephens (1986) provided tables √ the (1 − α)quantiles for √ 2 q1−α of these null distributions. . Xn for which one wishes to assess whether the Weibull distribution hypothesis is tenable or not. . This is illustrated in Figure 8. Plotting log(α/(1 − α)) against q1−α shows a mildly quadratic pattern which can be used to interpolate or extrapolate the appropriate pvalue (observed signiﬁcance level α) for √ √ any observed nadjusted value A2 (1 + .25.05. .01. If C(X) denotes the used test of ﬁt criterion then the estimated pvalue of this sample is simply the proportion of C(X ) that are ≥ C(X). . . 50. . ﬁrst by plotting these quantiles against 1/ n.10. .2/ n) and W 2 (1 + .and extrapolation scheme is needed. as is illustrated in Figure 7.2/ n).2/ n) and W 2 (1 + . β = 1) ˆ ˆ ˆ (standard exponential distribution). Repeating this a large ˆ number of times. should give us a reasonably good approximation to the desired null distribution and from it one can determine appropriate pvalues for any sample X1 . . Xn ) from W(α = 1. Just generate samples X = (X1 . . . 20. Here √ a double inter. 22 . but using a cubic this time.β (Xi ) (where Fα. β)) and from that the values ˆ ˆ D = D(X ).01 were tabulated for n = 10. compute the corresponding α = α(X ) and β = β(X ). Calculating all three test of ﬁt criteria makes ˆ sense since the main calculation eﬀort is in getting the mle’s α and β . . For the nadjusted versions A (1 + .2/ n) these null distributions appear to be independent of n and (1 − α)quantiles were given for α = . say Nsim = 10000. ˆ ˆ ˆ then Vi = F (Xi ) = Fα . √ √ For nD the null distribution still depends on n (in spite of the normalizing factor n) and (1−α)quantiles for α = . ∞ by Stephens (1986).β (x) is the cdf of W(α. . This opens up the possibility of using simulation to ﬁnd good approximations to these null distributions for any n. . especially in view of the previously reported timing results for computing the ˆ mle’s α and β of α and β. W 2 = W 2 (X ) and A2 = A2 (X ).
It is this known distribution of Z = (Z1 .CvM. . 000 and the previously given timing of 8. W 2 (1 + . so that it is invertible with respect to ϑ. Xn ) we have Yi = log(Xi ) ∼ G((y − u)/b) with b = 1/β and u = log(α). . These functions have an optional argument graphic where graphic = T causes the interpolation graphs shown in Figures 7 and 8 to be produced. GOF. namely functions W = ψ(ˆ(Y). This is seen as follows: P (Zi ≤ z) = P ((Yi − u)/b ≤ z) = P (Yi ≤ u + bz) = G(([u + bz] − u)/b) = G(z) . The function Weibull.2/ n).test does a Weibull goodness of ﬁt test on any given sample. 23 . ϑ) of the estimates and an unknown parameter ϑ of interest such that W has a ﬁxed and known distribution and the function ψ is strictly monotone in the unknown parameter ϑ. You also ﬁnd there the function Weibull. 9. Zn ) that is instrumental in knowing the distribution of the four pivots that we discuss below. . ˆ u b(Y). and √ A2 (1 + . returning pvalues for all three test criteria. They are GOF.GOF. √ √ and GOF. has a distribution that does not involve unknown parameters and W1 = ˆ b(Y)/b is strictly monotone in b.test. being a function of Z alone. which is a known distribution (does not depend on unknown parameters). .Functions for computing these pvalues (via interpolation from Stephens’ tabled values) are given in the Weibull R work space provided at the class web site.5 minutes to simulate the null distributions based on Nsim = 10. . One could easily reproduce and extend the tables given by Stephens (1986) so that extrapolations becomes less of an issue. . respectively.mle that was listed earlier. There we utilize the representation Yi = u+bZi or Y = u+bZ in vector form.2/ n) .AD.07 sec for Nsim = 1000. b b b The right side.test. . . and several other functions not yet documented here.test for computing pvalues for nadjusted test criteria nD. Then Zi = (Yi − u)/b ∼ G(z) = 1 − exp(− exp(z)).1 Pivot for the Scale Parameter b As natural pivot for the scale parameter ϑ = b we take W1 = ˆ b(u bˆ b(Z) ˆ b(Y) ˆ + bZ) = = = b(Z) . Recall that for a Weibull random sample X = (X1 .KS. otherwise only the pvalues are given. For n = 100 it should take less than 1. 9 Pivots Based on the previous equivariance properties of u(Y) and ˆ ˆ b(Y) we have the following pivots.
01 q 0.15 W × (1 + 0.05 0.75 tabled values interpolated/extrapolated values q 0.5 q q q 0.00 0.30 √ √ Figure 7: Interpolation & Extrapolation for A2 (1 + .2/ n) and W 2 (1 + .25 0.001 0.001 0.05 q q q q 0.0.0 0.25 0.2 2 1.5 0.0 n) 1.2/ n) 24 .9 q q q tail probability p on log(p (1 − p)) scale 0.5 A × (1 + 0.75 tabled values interpolated/extrapolated values qq q q 0.20 0.01 q 0.9 q q q tail probability p on log(p (1 − p)) scale 0.25 0.5 0.05 q q q q 0.2 n) 0.10 2 0.
70 0.quadratic interpolation & linear extrapolation in 1 1.2 q qqq q q 0.939 0.90 0.025 q qq q q q q qq q q q 0.9 q n × D−quantile q ∞ 500 200 100 50 40 30 25 20 n (on 1 15 n scale) 10 9 8 7 6 5 4 cubic interpolation & linear extrapolation in D q q q q q q qq q q tail probability p on log(p (1 − p)) scale tabled values interpolated quantiles interpolated/extrapolated values 0.1 0.85 n ×D 0.00 1.803 D0.80 0.05 Figure 8: Interpolation & Extrapolation for 25 √ n×D .975 D0.874 0.001 0.005 q 0.95 1.99 q q q q q q q q q q q q q q q q q q q q n tabled values interpolated/extrapolated values 0.95 D0.007 D0.75 0.
75. .005 ≤ γ ≤ . 15. 50. We do not know η1 (γ) but we can estimate it by the corresponding quantile η1 (γ) of the simulated distriˆ ˆ 1 (w) which serves as proxy for H1 (w). i. namely above. . γ = H1 (η1 (γ)) = P (ˆ b(Y)/b ≤ η1 (γ)) = P (ˆ b(Y)/η1 (γ) ≤ b) we see that ˆ b(Y)/η1 (γ) can be viewed as a 100γ% lower bound to the unknown parameter b. . For . We note here that a 100γ% lower bound can be viewed as a 100(1 − γ)% upper bound. Xn ) is the untransformed Weibull sample. . Nsim = 10000 for n = 6. say Nsim = 10000. We then use ˆ bution H b(Y)/ˆ1 (γ) as an approximate 100γ% η lower bound to the unknown parameter b. where Nsim = 20000 for ˆ n = 5. To get 100γ% upper bounds one simply constructs 100(1 − γ)% lower bounds by the above method.. In these simulations one simply generates samples Z = (Z1 . η1 ((1 + γ)/2) × β(X) ˆ ˆ since (1 + γ)/2 = 1 − (1 − γ)/2. If we denote the γquantile of H1 (w) by η1 (γ). because 1 − γ is the chance of the lower bound falling on the wrong side of its target. From this simulated distribution we can estimate any γquantile of H1 (w) to any practical accuracy. . 8. Based on the relationship b = 1/β the respective 100γ% approximate lower and upper conﬁdence bounds for the Weibull shape parameter would be η1 (1 − γ) ˆ ˆ = η1 (1 − γ) × β(X) ˆ ˆ b(Y) and η1 (γ) ˆ ˆ = η1 (γ) × β(X) ˆ ˆ b(Y) and an approximate 100γ% conﬁdence interval for β would be ˆ ˆ η1 ((1 − γ)/2) × β(X). Bain and Engelhardt (1991) and originally in Thoman et al. 1970). where we only give onesided bounds (lower or upper) in each case. who provided tables for this distribution (and for those of the other pivots discussed here) based on Nsim simulated values of ˆ b(Z) (and u(Z)). The empirical distribution function of these b(Z ˆ ). . provided Nsim is suﬃciently large. Values of γ closer to 0 or 1 require higher Nsim .e. The approach followed here is that presented in Bain (1978). By simulating this process ˆ b(Z Nsim = 10000 times we obtain ˆ 1 ). 30. . provides a fairly reasonable estimate of the sampling ˆ simulated estimates b(Zi distribution H1 (w) of ˆ b(Z) and thus also of the pivot distribution of W1 = ˆ b(Y)/b. . this approximation is practically quite adequate. .995 a simulation level of Nsim = 10000 should be quite adequate. . and Nsim = 6000 for n = 100. For large Nsim . 40. 26 . 10. 20. denoted by H1 (w). (1969. . Similar comments apply to the pivots obtained below.How do we obtain the distribution of ˆ b(Z)? An analytical approach does not seem possible. Zn ) ∼ G(z) and ﬁnds ˆ ) (and b(Z u(Z ) for the other pivots discussed later) for each such sample Z . ˆ Nsim ). . Here X = (X1 .
Thus W2 is a pivot with respect to u. Based on the relation u = log(α) this translates into an approximate 100γ% lower bound ˆ ˆ exp(ˆ(Y) − ˆ u b(Y)ˆ2 (γ)) = exp(log(ˆ (X)) − η2 (γ)/β(X)) = α(X) exp(−ˆ2 (γ)/β(X)) η α ˆ ˆ η Upper bounds and intervals are handled as in the previous situation for b or β. Denote this pivot distribution of W2 by H2 (w) and its γquantile by η2 (γ). Furthermore W2 is strictly decreasing in u. Denote this pivot distribution 27 .3 Pivot for the pquantile yp for α. As in the previous pivot case we can exploit this pivot distribution as follows γ = H2 (η2 (γ)) = P u(Y) − u ˆ ≤ η2 (γ) = P (ˆ(Y) − ˆ u b(Y)η2 (γ) ≤ u) ˆ b(Y) and thus we can view u(Y) − ˆ ˆ b(Y)η2 (γ) as a 100γ% lower bound for the unknown parameter u. since it only depends on the known distribution of Z. ˆ Using the γquantile η2 (γ) obtained from the empirical cdf H2 (w) we then treat u(Y) − ˆ ˆ ˆ b(Y)ˆ2 (γ) η as an approximate 100γ% lower bound for the unknown parameter u. 9. ˆ ˆ ˆ + bZ) ˆ bb(Z) b(Y) b(u b(Z) It has a distribution that does not depend on any unknown parameter. ˆ bˆ b(Z) b(Z) Again its distribution only depends on the known distribution of Z and not on the unknown parameters u and b and the pivot Wp is a strictly decreasing function of yp .9.2 Pivot for the Location Parameter u For the location parameter ϑ = u we have the following pivot W2 = u(Z) ˆ u(Y) − u ˆ u(u + bZ) − u ˆ u + bˆ(Z) − u u = = = . With respect to the pquantile ϑ = yp = u + b log(− log(1 − p)) = u + bwp of the Y distribution the natural pivot is Wp = u(Y) + ˆ ˆ b(Y)wp − (u + bwp ) u(u + bZ) + ˆ + bZ)wp − (u + bwp ) ˆ b(u yp (Y) − yp ˆ = = ˆ ˆ ˆ + bZ) b(Y) b(Y) b(u = u(Z) + (ˆ ˆ b(Z) − 1)wp u + bˆ(Z) + bˆ u b(Z)wp − (u + bwp ) = . As before this pivot distribution and its quantiles can be approximated suﬃciently well by simulating u(Z )/ˆ ) ˆ b(Z ˆ b(Z a suﬃcient number Nsim times and using the empirical cdf H2 (w) of the u(Zi )/ˆ i ) as proxy for ˆ H2 (w).
i = 1. .e. in this approach one sees one further detail. . A thorough argument would show that ˆ b(z) and thus u(z) are continuous functions of z = (z1 . ˆ ˆ Since yp (Y) − ηp (γ)ˆ ˆ b(Y) = u(Y) + wpˆ ˆ b(Y) − ηp (γ)ˆ b(Y) = u(Y) − kp (γ)ˆ ˆ b(Y) with kp (γ) = ηp (γ) − wp . zn ) and since there is positive probability in ˆ any neighborhood of any z ∈ R there is positive probability in any neighborhood of (ˆ(z). . Again we can treat ˆ yp (Y) − ηp (γ)b(Y) as an approximate 100γ% lower bound for yp . . since wp is strictly increasing in p. However.. a contradiction) since P (wp1 < u(Z) + h(p1 )ˆ ˆ b(Z) ≤ wp1 + (wp2 − wp1 )) > 0. namely that h(p) = −kp (γ) is strictly increasing in p1 . . Suppose p1 < p2 and h(p1 ) ≥ h(p2 ) with γ = P (ˆ(Z) + h(p1 )ˆ u b(Z) ≤ wp1 ) and γ = P (ˆ(Z) + h(p2 )ˆ u b(Z) ≤ wp2 ) = ˆ ˆ ˆ P (ˆ(Z) + h(p1 )b(Z) ≤ wp1 + (wp2 − wp1 ) + (h(p1 ) − h(p2 ))b(Z)) ≥ P (ˆ(Z) + h(p1 )b(Z) ≤ wp1 + (wp2 − wp1 )) > γ u u (i. . . . ˆ As before we proceed with γ = Hp (ηp (γ)) = P yp (Y) − yp ˆ ≤ ηp (γ) = P yp (Y) − ηp (γ)ˆ ˆ b(Y) ≤ yp ˆ b(Y) and thus we can treat yp (Y) − ηp (γ)ˆ ˆ b(Y) as a 100γ% lower bound for yp .function by Hp (w) and its γquantile by ηp (γ). 1 28 . Denote the empirical cdf of such simulated values by Hp (w) and the corresponding γquantiles by ηp (γ). As before. this pivot distribution and its quantiles can be approximated suﬃciently well by simulating u(Z) + (ˆ ˆ b(Z) − 1)wp /ˆ b(Z) a suﬃcient numˆ ber Nsim times. ˆ u b(z)). γ > γ. u b(Z ˆp (γ) serves as a good approximation to kp (γ). namely γ = P (ˆ(Y) − kp (γ)ˆ u b(Y) ≤ yp ) = P (ˆ(Y) − kp (γ)ˆ u b(Y) ≤ u + bwp ) = P (ˆ(Y) − u − kp (γ)ˆ u b(Y) ≤ bwp ) = P ˆ b(Y) u(Y) − u ˆ − kp (γ) ≤ wp b b u(Z) − wp ˆ ≤ kp (γ) ˆ b(Z) = P (ˆ(Z) − kp (γ)ˆ u b(Z) ≤ wp ) = P and we see that kp (γ) can be taken as the γquantile of the distribution of (ˆ(Z) − wp )/ˆ u b(Z). we could have obtained the same lower bound by the following argument that does not use a direct pivot. This distribution can be estimated by the empirical cdf of Nsim simulated values (ˆ(Zi )−wp )/ˆ i ). Nsim and its γquantile k It is easily seen that this produces the same quantile lower bound as before.
e. Indeed. 1). for any p ∈ (0. Thus by the probability integral transform it follows that Wp(y) = Hp(y) (ˆ(y)) ∼ U (0. Clearly p(y) = G ˆ y − u(Y) ˆ ˆ b(Y) is the natural estimate (mle) of p(y) = P (Y ≤ y) = G y−u b and one easily sees that the distribution function H of this estimate depends on u and b only through p(y). lower bounds for it can be seen as a special case of quantile lower bounds. this particular quantile lower bound coincides with the one given previously. the situation here is not as straightforward as in the previous three cases.Of course it makes intuitive sense that quantile lower bounds should be increasing in p since its target pquantiles are increasing in p. Since α is the (1 − exp(−1))quantile of the Weibull distribution. 9.. Wp(y) is a true pivot. Denote by h−1 (·) the inverse function to h(·).4 Upper Conﬁdence Bounds for the Tail Probability p(y) = P (Y ≤ y) As far as an appropriate pivot for p(y) = P (Y ≤ y) is concerned. If we parameterize such p via p(y) = P (Y ≤ y) = G((y − u)/b) we have yp(y) = y and thus also γ = P p(y) ≤ h−1 (y − u(Y))/ˆ ˆ b(Y) 29 . This strictly increasing property allows us to immediately construct upper conﬁdence bounds for left tail probabilities as is shown in the next section. contrary to what is stated in Bain (1978) and Bain and Engelhardt (1991). namely p(y) = G ˆ y − u(Y) ˆ ˆ b(Y) =G (y − u)/b − (ˆ(Y) − u)/b u ˆ b(Y)/b =G G−1 (p(y)) − u(Z) ˆ ˆ b(Z) ∼ Hp(y) . Since xp = exp(yp ) is the corresponding pquantile of the Weibull distribution we can view ˆ ˆ ˆ exp yp (Y) − ηp (γ)ˆ ˆ ˆ b(Y) = α(X) exp (wp − ηp (γ))/β(X) = α(X) exp −kp (γ)/β(X) ˆ ˆ ˆ as an approximate 100γ% lower bound for xp = exp(u + bwp ) = α(− log(1 − p))1/β . We then have γ = P (ˆ(Y) + h(p)ˆ u b(Y) ≤ yp ) = P (h(p) ≤ (yp − u(Y))/ˆ ˆ b(Y)) = P p ≤ h−1 (yp − u(Y))/ˆ ˆ b(Y) . Rather than using this pivot we will go a more direct route as was indicated by the strictly increasing property of h(p) = hγ (p) in the previous section. 1) p i.
which varies from sample to sample. unless one does this kind of calculation all the time. Instead Bain (1978). which can then serve for interpolation purposes with the actually observed value of p(y). u and yp it is possible to carry out simulations once and for all for a desired set of conﬁdence levels γ. This has essentially been done (with n scaling modiﬁcations) and such ˆ ˆ ˆ tables are given in Bain (1978). ˆ The above claim concerning h−1 (x) is seen as follows. sample sizes n and choices of p. The only remaining issue is the computation of such bounds. Table 4 has been greatly reduced to just cover the conﬁdence factors dealing with the location parameter u. Bain and Engelhardt (1991) and Thoman et al. (1969. Does it require the inversion of h and the concomitant calculations of many h(p) = −k(p) for the iterative convergence of such an inversion? It turns out that there is a direct path just as we had it in the previous three conﬁdence bound situations. Similar tables for bounds on p(y) are not quite possible since the appropriate bounds depend on the observed value of p(y). Table 4 in Bain (1978) does not have a consistent format and using these tables would require delving deeply into the text for each new use. and tabulate the required conﬁdence quantiles √ η1 (γ). Note that h−1 (x) solves −kp = x for p. Bain and Engelhardt (1991) ˆ and Thoman et al.for any y ∈ R and u ∈ R and b > 0. Bain and Engelhardt (1991). If for any x = h(p) we have P (G(ˆ(Z) + xˆ u b(Z)) ≤ h−1 (x)) = P (G(ˆ(Z) + h(p)ˆ u b(Z)) ≤ p) = P (ˆ(Z) + h(p)ˆ u b(Z) ≤ wp ) = P (ˆ(Z) − kγ (p)ˆ u b(Z) ≤ wp ) = γ . ˆ ˆ It should be quite clear that all this requires extensive tabulation. in the second edition.1970). η2 (γ). We claim that h−1 (x) is the γquantile of the G(ˆ(Z)+xˆ u b(Z)) ˆ distribution which we can simulate by calculating as before u(Z) and b(Z) a large number Nsim times. Thus h−1 (x) is the γquantile of the G(ˆ(Z) + xˆ u b(Z)) distribution. (1970) tabulate conﬁdence bounds for p(y) for a reasonably ﬁne grid of values for p(y). and ηp (γ). and it now 30 . Hence pU (y) = h−1 (y − u(Y))/ˆ ˆ ˆ b(Y) can be viewed as 100γ% upper conﬁdence bound for p(y) for any given threshold y. In fact. The use of these tables is not easy. ˆ If we observe Y = y and obtain u(y) and b(y) as our maximum likelihood estimates for u and ˆ b we get our 100γ% upper bound for p(y) = G((y − u)/b) as follows: For the ﬁxed value of x = (y − u(y))/ˆ ˆ b(y) = G−1 (ˆ(y)) simulate the G(ˆ(Z) + xˆ p u b(Z)) distribution (with suﬃciently high Nsim ) and calculate the γquantile of this distribution as the desired approximate 100γ% upper bound for p(y) = P (Y ≤ y) = G((y − u)/b). as seen in the previous section. 10 Tabulation of Conﬁdence Quantiles η(γ) For the pivots for b.
Plotting log(p/(1 − p)) against the corresponding row value (γquantiles) one clearly sees a change in pattern. Such extensions of R are possible. p. 11 The R Function WeibullPivots Rather than using these tables we will resort to direct simulations ourselves since computing speed has advanced suﬃciently over what was common prior to 1978.625 instead of 3. computing availability has changed dramatically since then. Among the latter one may consider that possibly diﬀerent simulations were involved. this example also shows that the great majority of tabled values are valid. 31 . see chapter 5 System and foreign language interfaces in the Writing R Extensions manual available under the toolbar Help in R.leaves out the conﬁdence factors for general pquantiles. In comparing the values of these tables with our own simulation of pivot distribution quantiles. using qbeta in vectorized form reduced the computing time to almost 1/3 of the time compared to looping within R itself over the elements in the argument vector of qbeta. However. The example that they present (page 248) would have beneﬁtted by showing some intermediate steps in the interpolation process. We give just give a few examples. However. just to validate our simulation for n = 40.826 above) and it ﬁts quite smoothly into the pattern of the previous 8 points. bottom row. namely 13. 235 with last column entry of 4. such an increase in speed would require writing Ccode (or Fortran code) and linking that in compiled form to R. They attribute the diﬀerence to roundoﬀ errors or other discrepancies.826. It may be possible to further increase computing speed by putting the loop over Nsim calculations of mle’s into compiled form rather than looping within R for each simulation iteration. see the top plot in Figure 9.725 (corresponding to the 4. The bottom plot shows our simulated values for these quantiles as solid dots with the previous points (circles) superimposed. Given that this was the only case chosen for comparison it leaves some concern in fully trusting these tables. In Table 3A. This discrepancy shows up clearly when plotting the row values against log(p/(1 − p)). using Table 7 in Bain and Engelhardt (1991). The agreement is good for the ﬁrst 8 points.03) from that obtained using the conﬁdence quantiles of the original Table 4.96 instead of the indicated p = . p. Further. Furthermore. Our simulated γquantile was 5. We suspect that the whole last column was calculated for p = .98. p. row 3 column 5 shows a double minus sign (still present in the 1991 second edition). For the pquantiles one is referred to the same interpolation scheme that is needed when getting conﬁdence bounds for p(y).92. the second entry from the right should be 3.262. note that some entries in the tables given in Bain (1978) seem to have typos. we encountered an apparent error in Table 4A. For example. see a similar plot for a later example.222. They point out that the resulting conﬁdence bound for xp is slightly diﬀerent (14. just as the book (and its second edition) itself is typed and not typeset.235. Presumably they were transcribed by hand from computer output. In Bain (1978) Table 4A.
γ = 0.4 q 0.9 Figure 9: Abnormal Behavior of Tabulated Conﬁdence Quantiles 32 .98 q q q q q 0.96 log(p/(1 − p)) 2 3 1 q q q q 0 2 3 4 5 Bain's tabled quantiles for n=40. γ = 0.9 4 q q q q q q q q q q log(p/(1 − p)) 2 3 1 q q q q q q q q 0 2 3 4 5 our simulated quantiles for n=40.
For all the previously discussed conﬁdence bounds.threshold=NULL. be they upper or lower bounds for their respective targets.76 seconds. .35 and slope of . We need r > 1 and at least two distinct observations among X(1) . we don’t have to run the simulations over and over for each target parameter.32 seconds. all that is needed is the set of (ˆ(zi ).2119 given here are fairly consistent with the intercept . .001 × 10−5 given in Figure 4.sample=NULL. p or y. the previous slope and intercept for a single mle calculation need to be scaled up by the factor 10000. Thus we can construct u b(z conﬁdence bounds and intervals for u and b. i. Proper use of this function only requires understanding the calling arguments. X(r) in order to estimate 33 .For the R function WeibullPivots (available within the R work space for Weibull Distribution Applications on the class web site) the call system. Also. ˆ i )) for i = 1.e.5.. conﬁdence level. since the default in the call to WeibullPivots is weib. The latter give the calculation time of a single set of mle’s while in the former case we calculate Nsim = 10000 such mle’s. in this case the full sample. Nsim=1000. WeibullPivots does all this not only for full samples but also for type II censored samples.sample=NULL. Furthermore. We will now explain the calling sequence of WeibullPivots and its output. . . The time for running the simulation should easily beat the time spent in dealing with tabulated conﬁdence quantiles in order to get desired conﬁdence bounds.graphics=T) Here Nsim = Nsim has default value 1000 which is appropriate when trying to get a feel for the function for any particular data set. for which appropriate conﬁdence factors are available only sparsely in tables.alpha=10000. i.005886 and slope of 2. unless one wants independent simulations for some reason. and for p(y) and 1 − p(y) for any collection of threshold values y and we can do this for any set of conﬁdence levels that make sense for the simulated distributions. Nsim .. respectively. purpose.time(WeibullPivots(Nsim = 10000. Here the default sample size n = 10 was used and r = 10 (also default) indicates that the 10 lowest sample values are given and used.beta=1. i. r = 10. and the time to run the simulations. . For sample sizes n = 100 with r = 100 and n = 1000 with r = 1000 the corresponding calls resulted in elapsed times of 78..e.e. When r < n we are dealing with a type II censored data set where observation stops as soon as the smallest r lifetimes have been observed. n = 10. especially since WeibullPivots does such calculations all at once for a broad spectrum of yp and p(y) and several conﬁdence levels without greatly impacting the computing time.r=10. and output of this function. . .22 and 269. The calling sequence with all arguments given with their default values is as follows: WeibullPivots(weib. an internally generated Weibull data set was used.n=10. for yp for any collection of p values. graphics = F)) gave an elapsed time of 59. . The sample size is input as n = n and r = r indicates the number of smallest sample values available for analysis. These three computing times suggest strong linear behavior in n as is illustrated in Figure 10. The intercept 57.
999. .hat.$p. The input thresh (= NULL by default) is a vector of thresholds y for which we desire upper conﬁdence bounds for p(y).Bounds. . The values of p for which conﬁdence bounds or intervals for xp are provided are also set internally as . an internal data set is generated as input sample from from W(α. . . Xr when r < n is speciﬁed. .95.sample. . . .995 and these levels indicate the coverage probability for the individual onesided bounds.975 upper bound.02. $Tail.beta. .001.005. . $p. β) with α = alpha = 10000 (default) and β = beta = 1.1.sample=NULL (the default). $alpha.05. and $Tail. 34 . either by using the full sample X1 .9. . . .elapsed time for WeibullPivots(Nsim=10000) 300 intercept = 57. The structure and meaning of these components will become clear from the example output given below. Conﬁdence levels γ are set internally as .estimates. .2119 q 100 150 200 250 q 0 0 50 q 200 400 sample size n 600 800 1000 Figure 10: Timings for WeibullPivots for Various n any spread in the data.Estimates.hat. A . . . .01. . .1).01. The available sample values X1 .99.975. .8. When weib.975. . . . $alpha.025.9.Probability. Xn or a type II censored sample X1 .95.quantile.025 lower bound is reported as a . . .99. . . .975 lower and upper bounds constitute a 95% conﬁdence interval.bounds.025.35 slope = 0. . .quantile.005. The input graphics (default T) indicates whether graphical output is desired.995.05. .5 (default).bounds. Xr (not necessarily ordered) are given as vector input to weib. . .10. (.Probability. The output from WeibullPivots is a list with components: $alpha. . and a pair of .
01quantile 0.5 0.6 97.hat (Intercept) 8976.8 62.1 1773.9 593.U 99.42 80% 7711.210 2.2 110.2quantile 0.quantile.3 2532.777 3.1 1245.005quantile 0.9 24183.9 80% 60.L 0.7 11457.6 0.5quantile 7438.5 0.95 $alpha.U 99.01quantile.6 186.5 3206.1 190.001quantile.025quantile.7 1100.070 2.7 2361.05quantile 1957.1 403.2 10606 1.001quantile.5 90% 28.6 16705 0.82 95% 6443.8 4159.3 1165.956 2.6 13676 0.64 90% 7024.5% 5094.999quantile 17531.1 0.bounds alpha.2 $beta.0 1783.22 99% 5453.025quantile.beta.9 15228 0.6 11600 1.1 65.bounds 0.001quantile 0.2 0.4 398.9 1575.0 0.6 2066.7quantile 0.9 1854.0 886.8 611.3 1362.025quantile 259.2 561.estimates 0.5 315.9 729.005quantile.18 $p.L 0.8 99% 2.9 16.5 169.L 0.5% 5948.4 1359.9 259.4 106.L alpha.6quantile 0.L beta.4quantile 2830.3 $p.9 1478.4 5290.390 2.U beta.1 2579.hat [1] 1.005quantile.855 3.95quantile 15756.975quantile 0.$alpha.2 13767.5% 6.9 12608 1.quantile.2 35 95% 12.9 36.4 6360.U 0.05 97.4 2021.L 0.8 845.U 0.7 9872.7 31.7 2827.5% 1.0 19643.4 57.6 1094.3quantile 0.U 0.5 21107.995quantile 0.8 3498.1 2176.8quantile 0.99quantile 0.5 103.1quantile 0.7 8.8 848.01quantile.9 20.5 .9quantile 8582.
0 2848.5 962.5 23903.5 497.36612 0.9 95515.6 14350.L 14580.4 15810.6 8778.975quantile.1 0.L 4694.7 19792.8 12941.4 16885.6 15545.0 10033.L 7601.4 3881.2 32863.8 11974.99quantile.8 6052.6 1518.0 15909.7 14030.1quantile.9 0.L 12434.1 5022.1 0.87242 p(14000) p(15000) 0.L 159.6quantile.6 0.9 14207.6 3673.0 19835.3 16530.U 19271.3 11584.3 0.0 30505.0 6865.0 11784.8 27938.9 6031.1 8933.2 12460.0 10538.3 5261.8 3345.U 12809.U 59783.4 15115.05quantile.7 1882.2 10244.0 40067.2 4081.5 3990.5 700.6 0.0 21744.2 6107.8 2287.5quantile.3 10226.L 6017.1quantile.5 49209.1 22670.3 506.U 121177.3 8703.U 7417.995quantile.1 10992.0 55033.4 7300.99quantile.70900 0.4 19286.1 27729.8 6492.3 13365.7 31065.0 4353.L 2548.2 1679.2 4946.93424 36 .U 10616.2 0.995quantile.4 71480.6 0.6quantile.3 4415.1 3806.2 4873.Estimates p(6000) p(7000) p(8000) p(9000) p(10000) p(11000) p(12000) p(13000) 0.975quantile.2 9380.U 35233.1 18557.90737 0.L 1012.7 17783.L 9674.8 19081.1 23773.2 8460.4 0.1 3474.L 13732.7 0.999quantile.4 11041.5quantile.9 0.2quantile.7 0.6 0.6 45187.2 27904.0 17679.8 7345.4 0.L 16377.8 2383.0 9485.05quantile.L 11203.9 5573.8quantile.1 6624.9quantile.9 3929.5 33118.0 0.0 7384.82821 0.4 0.4 12145.4 17642.5 17236.0 11031.0 2930.4 5008.U 4415.1 2833.U 5584.8 0.7quantile.4 10106.45977 0.7 352.2 0.3quantile.2 $Tail.4 0.L 3502.0 48625.9 14781.1 5543.2 13952.U 8919.2 6978.0 15605.8 14876.0 1249.U 15626.9 7971.0 6211.4quantile.2 8628.5 14311.2 10130.95quantile.8quantile.7 61385.9 7939.5 12660.95quantile.1 0.9 2477.0 0.3 5443.2 3305.3 0.6 6399.7 229.3quantile.0 0.2 12958.6 11653.9quantile.0.7 33233.2 4435.Probability.0 0.7quantile.L 1725.77385 0.2quantile.3 17417.8 8465.U 89690.L 398.1 11919.U 76425.1 0.2 7421.3 0.55018 0.2 2820.63402 0.2 1945.7 39397.1 13449.5 3329.7 15530.999quantile.0 9504.9 8139.7 0.5 71256.1 1011.9 1160.4 4811.2 20703.U 24765.3 12833.1 56445.6 40053.8 14181.6 0.U 46832.4 45328.8 17551.7 0.7 717.6 26037.3 6876.7 36739.0 22445.4quantile.1 7938.8 36918.4 16431.
88377 0.93318 p(12000).62534 p(15000).91231 0.e.50388 p(12000).89891 0.96669 0.13911 p(6000).59592 0.98596 90% 0.65055 0.82993 0.L 0. see Figure 12.33149 p(9000).80141 p(9000).23898 0. The legend in the lower right explains the red ﬁtted line (representing the mle ﬁt) and the various pointwise conﬁdence bound curves.82667 0.12173 0.L 0.U 0.46351 0.U 0.U 0.U 0.94321 0.20130 p(7000).31017 0.62988 0.U 0.98837 p(15000).83590 0.72545 0. giving 95% conﬁdence intervals (blue dashed curves) for pquantiles xp for any p on the ordinate and 95% conﬁdence intervals (green dotdashed line) for p(y) for any y on the abscissa.65435 0.76280 0.44219 0.30561 0.L 0.96267 0.5% 99% p(6000).50876 0. 15000.99653 0.99092 95% 0.46448 0.85624 0.30397 0.74451 0.91227 0.23300 0.99449 97.69856 0.76148 0.96794 0.71215 0.59276 0.L 0.85462 p(10000).98278 0.23647 0.54776 p(13000).L 0.L 0.L 0.26838 p(8000).54776 0.67426 0.87805 0.37276 0.65374 0.83973 0.87042 0.16954 0. based on respective normal approximation theory for the mle’s.53589 0.73194 0.87425 0.80178 0.58089 0.60262 0.28311 0.96268 The above output was produced with WeibullPivots(threshold = seq(6000.U 0.U 0.5% 0.97764 80% 0.26985 0. The second gives a Weibull ˆ b b plot of the generated sample with a variety of information and with several types of conﬁdence bounds.45097 p(11000).57670 0.17411 0. The ﬁrst gives the two intrinsic pivot distributions of u/ˆ and ˆ in Figure 11.94491 0. Since we entered graphics=T as argument we also got two pieces of graphical output..95756 0.99201 0.L 0.73981 p(8000).Probability.49023 0. i. graphics = T).U 0.77045 0.36523 0.60133 0. 37 .70218 0.U 0.64619 0.69359 0. and the mean µ = αΓ(1 + 1/β) together with 95% conﬁdence intervals.82187 0.44487 0.97459 0.78068 0.65671 0.50030 0.74260 0.67056 p(7000).41748 0.34488 0.78641 0.39257 p(10000).68389 0. 1000).54631 0.97742 p(14000).70414 0.19782 0.U 0. Unfortunately these two types of bounds are not dual to each other.91745 0.98205 0. Both of these interval types use normal approximations from large sample mle theory.55531 0.58696 p(14000).94650 0.91728 0.77310 0.36871 0.94149 0.95936 p(13000). one is not the inverse to the other. The legend in the upper left gives the mle’s of α.93210 0.59749 0. β (agreeing with the output above).L 0.54668 0. Nsim = 10000.38942 0.70837 0.89889 p(11000).63572 0.L 0.48549 0.$Tail.80361 0. don’t coincide or to say it diﬀerently.Bounds 99.52203 0.41612 0.
0 Figure 11: Pivot Distributions of u/ˆ and ˆ ˆ b b 38 .5 2.5 1.Frequency 0 200 600 1000 −2 −1 ^ ^ u b 0 1 Frequency 0 200 400 600 800 0.0 ^ b 1.
interval ( 5690 .001 0.991 ) MTTF ^ µ = 7960 .200 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q .001 q q 0. 11140 ) n = 10 . 95 % conf.632 q q probability q q q q q q q . r = 10 failed cases .e.010 q q q q m. 2.900 q . 95 % q−confidence bounds 95 % p−confidence bounds 95 % monotone qp−confidence bounds . 95 % conf.l. 12600 ) ^ β = 1. 95 % conf.Weibull Plot .100 .500 .01 0.999 q q q q q q q q q q q q q q q q ^ α = 8976 .267 . interval ( 1.947 . interval ( 6394 .1 1 10 100 cycles x 1000 Figure 12: Weibull Plot Corresponding to Previous Output 39 .
Thus we could not (at least not generally) have taken either to take the role of serving both purposes. Also seen on these curves are solid red dots that correspond to the abscissa values x = 6000. subject to the Nsim < ∞ limitation. Figure 13 represents an extreme case where we have a sample of size n = 2 and here another aspect becomes apparent. i. If instead of using the approximating normal distribution one uses the parametric bootstrap approach (simulating samples from an estimated Weibull distribution) the unifying principle reduces to the pivot simulation approach. i.5% lower and upper bounds to xp given in our output example above. This illustrates that the same curves are used. (1000). depending on the direction in which the curves are used. However. 40 . The curves representing the latter (pivots with simulated distributions) are the solid black lines connecting the solid black dots which represent the xp 95% conﬁdence intervals (using the 97. is basically exact except for the simulation aspect Nsim < ∞. as providing bounds for xp and p(x) simultaneously. The pivot based curves are also strictly monotone and they have exact coverage probability. although its coverage probability properties are bound to be aﬀected badly by the small sample size n = 2. Both of the ﬁrst two types of bounds (blue and green) are no longer monotone in p or x respectively. the principle behind these bounds is a unifying one in that the same curve is used for quantile and tail probability bounds. These latter bounds are also based on normal mle approximation theory and the approximation will naturally suﬀer for small sample sizes. 15000 and viewed vertically they represent 95% conﬁdence intervals for p(x). or we read vertically up from an abscissa value x to read oﬀ upper and lower bounds for p(x) on the ordinate axis as we go from the respective curves at that x value to the left.A third type of bound is presented in the orange curve which simultaneously provides 95% conﬁdence intervals for xp and p(x).e. the orange curve is still monotone and still serves that dual purpose.. However.. This is in the nature of an imperfect normal approximation for these two approaches.e. We either read sideways from p and down from the curve (at that p level) to get upper and lower bounds for xp .
200 q q q q q q q q q q q q q q q q .632 q probability .1 0.100 . 95 % q−confidence bounds 95 % p−confidence bounds 95 % monotone qp−confidence bounds q .962 . 9217 ) n = 2 .999 q ^ α = 8125 .010 q q m. 95 % conf. 29.900 q q q q q q q q . 95 % conf. 95 % conf.2 0.e.001 q 0.l.500 .85 ) MTTF ^ µ = 7709 .5 1 2 5 10 20 50 100 cycles x 1000 Figure 13: Weibull Plot for Weibull Sample of Size n = 2 41 . interval ( 6448 .q Weibull Plot . 9493 ) ^ β = 9. r = 2 failed cases q q q q q q q q q q . interval ( 6954 . interval ( 2.404 .
1 . ζk ) and ˆ of ζ and b exist and are unique b provided the covariate matrix C. . . When the plotting commands are commented out the elapsed time reduces to about 9 seconds.. .1 + . . . These two usages are illustrated in the function WeibullReg which is given on the next page. i = 1. consisting of the rows ci = (ci. . + ζk ci. . has full rank k and n > k. + ζk ci. .1 . . with independent Zi ∼ G(z) = 1 − exp(− exp(z)). It is very instructive to run this function as part of the following call: {\tt system.k for the ith observation as follows: ui = ζ1 ci. . the logWeibull data are plotted because of its more transparent relationship to the true line. . . 1996) that the mle’s ζ = (ζ1 . i = 1. . ci. n. . .e.k + bZi . This promises reasonable behavior with respect to the computing times that can be anticipated for the conﬁdence bounds to be discussed below. ci. . . . .12 Regression Here we extend the results of the previous location/scale model for logtransformed Weibull samples to the more general regression model where the location parameter u for Yi = log(Xi ) can vary with i. . i = 1. which exempliﬁes ANOVA situations. Yn Yi = ui + bEi = ζ1 ci. .1 + .k . . speciﬁcally it varies as a linear combination of known covariates ci. n . 42 .k ). . Thus we have the following model for Y1 .time(for(i in 1:1000)WeibullReg())} . . ˆ ˆ ˆ It can be shown (Scholz. . and this includes the plotting time. Alternatively. one can also only specify the remaining k − 1 columns and implicitly invoke the default option in survreg that augments those columns with such a 1vector. The ﬁxed line represents the true line with respect to which the Weibull data are generated by simulation. Two concrete examples of this general linear model will be discussed in detail later on. Of course. . n. while the scale parameter b stays constant. . . The ﬁrst is the simple linear regression model and the other is the ksample model. It is instructive to see the variability of the data clouds around the true line. i. . . i = 1. On my laptop the elapsed time for this call is about 15 seconds. and unknown parameters b > 0 and ζ1 . . It is customary that the ﬁrst column of C is a vector of n 1’s. n . . ζk ∈ R. but also the basic stability of the overall cloud pattern as a whole. . . . . . we execute the function WeibullReg a thousand times in close succession. . . . The rapidly varying plots give a good visual image of the sampling uncertainty and the resulting sampling variation of the ﬁtted lines.
function (n=50. that we did not use a status vector (of ones) in the creation # of dat. out } 43 . # The estimate out$scale is the mle of b=1/beta # and out$coef is a vector that gives the mle’s # of intercept and the various regression coefficients.alpha=10000.n) # dat <.out$coef[2].1.dist="weibull") # The last two lines would give the same result as the next three lines # after removing the # signs.e.x=NULL.05) { # We can either input our own covariate vector x of length n # or such a vector is generated for us (default). # x0 <.x0. dist = "weibull") # Here we created the vector x0 of ones explicitly and removed the implicit # vector of ones by the 1 in ~ x0+x1.data. abline(log(alpha).min(uvec)+b*log(log(11/(3*n+1))) M <.x) # survreg(formula = Surv(time) ~ x0 + x .slope=.rep(1.M)) dat <.(1:n(n+1)/2) uvec <.exp(uvec+b*log(log(1runif(n)))) # Creating good vertical plotting limits m <. # Note also.frame(time.frame(time.data.ylim=c(m.5.data=dat.1/beta # Create the Weibull data time <.beta=1. of which only # out$coef and out$scale of of interest to us.lty=2) # Here out has several components.survreg(Surv(time)~x. i.slope) #true line # estimated line abline(out$coef[1].col="blue". # treat the given time as a failure time as default.max(uvec)+b*log(log(1/(3*n+1))) plot(x.x) out <.log(time).null(x)) x <. data = dat. since survreg will use status = 1 for each observation.log(alpha)+slope*x b <.WeibullReg <. # if(is.
Zn ) consists of independent and identically distributed components with known cdf G(z) = 1−exp(− exp(z)). . . namely for r = Ca + σz we have ˆ ˆ ζ(r) = a + σ ζ(z) and ˆ = σˆ b(r) b(z) . b b and 44 . i. With ri = ci a + σzi we have sup b.e. The proof follows the familiar line used in the location/scale case.ζ 1 ˆ b(z) g 1 g ˆ zi − ci ζ(z) ˆ b(z) 1 ri − c i ζ g b i=1 b n = i=1 1 ˆ b(r) n i=1 ˆ ri − ci ζ(r) ˆ b(r) g ˆ zi − ci (ζ(r) − a)/σ ˆ b(r)/σ = 1 σn ˆ b(r)/σ and by the uniqueness of the mle’s the equivariance claim is an immediate consequence.12. where Z = (Z1 . From the equivariance property we have that ˆ ˆ ζ(Y ) = ζ + b ζ(Z) Thus ˆ ˆ ˆ ζ(Y ) − ζ b ζ(Z) ζ(Z) = = ˆ ) ˆ b(Y b(Z) bˆ b(Z) and ˆ )=bˆ b(Y b(Z) .1 Equivariance Properties From the existence and uniqueness of the mle’s we can again deduce the following equivariance properties for the mle’s.. . .ζ ri − c i ζ 1 g b i=1 b and ˜ = b/σ b n = 1 sup σ n b.ζ 1 sup σn ˜ ζ b.2 Pivots and Conﬁdence Bounds ˆ From these equivariance properties it follows that (ζ − ζ)/ˆ and ˆ have distributions that do not b b/b depend on any unknown parameters. ˜ 1 σn n n i=1 zi − ci (ζ − a)/σ 1 g b/σ i=1 b/σ ˜ 1 zi − ci ζ g ˜ ˜ b i=1 b n n ˜ using ζ = (ζ − a)/σ = = On the other hand sup b. The logtransformed Weibull data have the following regression structure Y = Cζ + bZ. 12. ˆ ) b(Y bˆ b(Z) ˆ = = b(Z) . b and ζ.
. 45 . . . c0 ) denotes the γquantile of this simulated distribution then we ˆ ˆ ˆ b(Y can view c0 ζ(Y ) − η2 (γ. The only exception is the conﬁdence bound for b. . ˆ For any target covariate vector c0 = (c0. . We can calculate ˆ ˆ ˆ b(Y c0 ζ(Y ) − kp (γ)ˆ ) as an approximate 100γ% lower bound for yp (c0 ).. . Nsim . Nsim . Not only does ˆ this dependence arise through the use of c0 ζ(Y ) in each case but also through the simulated distributions which incorporate c0 in each of these three situations. (ζ(Z ˆ ˆ ˆ (ζ(Z 1 )/b 1 b 1 Nsim )/b(Z Nsim ). ˆ1 (Z )). b(Z i = 1. . We note here that these quantiles η1 (γ) b(Y η ˆ and η2 (γ. c0. . ˆ Nsim )) and thus the empirical distribution of b b(Z ˆ ˆ1 (Z ). . the log of the characteristic life at the covariate vector c0 . . (ζ(Z Nsim ). . i. which makes sense since we assumed a constant scale for all covariate situations. Similarly. i = 1. c0 )ˆ ) as an approximate 100γ% lower bound for c0 ζ. We note here that the above conﬁdence bounds for the log(characteristic life) or regression location. This can be demonstrated as in the location/scale case for the location parameter u. except that we will need to incorporate the known covariate matrix C in the call to survreg in order to get the Nsim simulated ˆ ˆ parameter vectors (ζ(Z 1 ). ˆ1 (Z 1 )). . they diﬀer from those used in the ˆ location/scale case. where kp (γ) is the γquantile ˆ b(Z of the simulated (c0 ζ(Z i ) − wp )/ˆ i ). then we can view ˆ b(Z ˆ )/ˆ1 (γ) as approximate 100γ% lower bound for b.1 . b(Z Nsim )). . . . . . . . . For a given covariate vector c0 we can target the pquantile yp (c0 ) = c0 ζ + bwp of the Y distribution with covariate dependent location parameter u(c0 ) = c0 ζ and scale parameter b.e. obtained under the covariate conditions speciﬁed through V . i = 1. . . . if η1 (γ) is the γquantile of the simulated ˆ i ). . Nsim . Nsim . The same comment applies to the other conﬁdence bound procedures following below. ˆ where x = (y0 − c0 ζ(y)/ˆ b(y) and y is the originally observed sample vector.k ) the distribution of (c0 ζ(Y ) − c0 ζ)/ˆ ) is free b(Y of unknown parameters since ˆ ˆ c ζ(Z) c0 ζ(Y ) − c0 ζ = 0 ˆ ) ˆ b(Y b(Z) ˆ b(Z and we can use the simulated values (c0 ζ(Z i ))/ˆ i ). to approximate this parameter free distribution. . c0 ) depend on the original covariance matrix C. This distribution can be approximated to any desired degree via simulation. .which have a distribution free of any unknown parameters. i = 1. For the tail probability p(y0 ) = G((y0 − c0 ζ)/b) with given threshold y0 and covariate vector c0 we obtain an approximate 100γ% upper bound by using the γquantile of the simulated values ˆ G(c0 ζ(Z i ) − xˆ i )). pquantiles and tail probabilities depend on the covariate vector c0 that is speciﬁed. If η2 (γ. . . just as in the location scale case.
Each call generates its own data set of 20 points using 5 diﬀerent levels of covariate values. cn are not all the same.3 The Simple Linear Regression Model Here we assume the following simple linear regression model for the Yi = log(Xi ) Yi = ζ1 + ζ2 ci .10quantile line. . it gets on the wrong side of it. Estimated lines are indicated by the corresponding color coded dashed lines. i.. . . Note that these bounds should be interpreted pointwise for each covariate value and should not be viewed as simultaneous conﬁdence bands. .12. This curvature adjusts to some extent to the sampling variation swivel action in the ﬁtted line. n . Nsim . . Yn 1 c1 . . The R function call system. c) is involved. . = Cζ + b Z . In the context of the general regression model we have k = 2 here and c1. Zn Here ζ1 and ζ2 represent the intercept and slope parameters in the straight line regression model for the location parameter and b represents the degree of scatter (scale) around that line. It can easily be modiﬁed to handle any simple linear regression Weibull data set. . i.i = 1 and c2. . . . Also shown in these plots is the . In contrast. n. In matrix notation this becomes Y1 Y = . i = 1. because we have only 95% conﬁdence in the bound. Multiple regression relationships could also be accommodated quite easily. The second feature shows up in the bottom plot where the conﬁdence curve crosses the true percentile line. . We repeated the above with a sample of size n = 50 (taking about 85 seconds for each plot) and the corresponding two plots are shown in Figure 15. We point out two features. but using Nsim = 1000 for faster response.e. . is the γquantile of the simulated values (c0 ζ(Z i ) − wp )/ˆ i ). To get a feel for the behavior of the conﬁdence bounds it is useful to exercise this function repeatedly. as shown in the plots. 46 .e. . the quantile lower conﬁdence bounds based on Nsim = 10000 simulations are represented ˆ by a curve. .. In this second set of plots the lower conﬁdence bound curve is generally closer to the ﬁtted quantile line than in the ﬁrst set of plots. . Such things happen. The function WeibullRegSim is part of the R workspace on the class web site. = . The conditions for existence and uniqueness of the mle’s are satisﬁed when the covariate values c1 . . . This results from the fact that the factor kp (γ) used in the construction of the lower ˆ ˆ ˆ ˆ b(Z bound. + b . This illustrates the sample size eﬀect. b(Y i = 1. . The data are generated from a true Weibull distribution with a known true regression line relationship for log(α) in relation to the covariates. and these values change depending on which c0 = (1. . . . we are getting better or less conservative in our bounds.i = ci for i = 1. . 1 cn ζ1 ζ2 Z1 .Nsim=10000)) (done twice and recording an elapsed time of about 76 seconds each) produced each of the plots in Figure 14.time(WeibullRegSim(n=20. ζ1 (Y ) + ζ2 (Y )c − kp (γ)ˆ ).
1−quantile estimated characteristic life estimated 0.1−quantile 95% lower bound to 0.1−quantile q q q 11 q 10 q q q q q log(time) 9 q q q q q q q q q 8 q q 7 total sample size n = 20 confidence bounds based on 10000 simulations −2 0 2 covariate 4 6 8 Figure 14: Weibull Regression with Quantile Bounds (n = 20) 47 .1−quantile q q q q q q q q q 10 11 q log(time) q q q q 9 q q q q 7 8 q total sample size n = 20 confidence bounds based on 10000 simulations −2 0 2 covariate 4 6 8 true characteristic life true 0.true characteristic life true 0.1−quantile estimated characteristic life estimated 0.1−quantile 95% lower bound to 0.
1−quantile 95% lower bound to 0.1−quantile estimated characteristic life estimated 0.1−quantile estimated characteristic life estimated 0.1−quantile 95% lower bound to 0.true characteristic life true 0.1−quantile q q q q q q q q q 11 q q q q q q q q q q q q q q q q q 10 q q q q q q q q q q q log(time) 9 q q q q 8 q q q 7 q q q 6 total sample size n = 50 confidence bounds based on 10000 simulations 4 covariate 6 8 −2 0 2 true characteristic life true 0.1−quantile q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 10 11 q q q q q q q q q q log(time) 9 q q q q q q 8 q 7 q 6 total sample size n = 50 confidence bounds based on 10000 simulations −2 0 2 covariate 4 6 8 Figure 15: Weibull Regression with Quantile Bounds (n = 50) 48 .
Our reason for following this choice is its similarity to the standard test statistic used in the corresponding normal distribution model. Zn1 +n2 +n3 Here the Yi have location u1 = ζ1 for the ﬁrst n1 observations. . . . . . . If we were interested in the question whether the three samples come from the same location/scale model we would consider testing the hypothesis H0 : ζ2 = ζ3 = 0 or equivalently H0 : u1 = u2 = u3 . . ζ3 )t . . . they have location u2 = ζ1 + ζ2 for the next n2 observations and they have location u3 = ζ1 + ζ3 for the last n3 observations. equation (6. . . . . 0 .12)) for which the same approximate null distribution is invoked. . Instead of using the likelihood ratio test. Without going into the detailed derivation one can give the following alternate and more transparent expression for Λ1 Λ1 = 3 i=1 ni (ˆi (Y ) − u(Y ))2 u ˆ . 1 1 1 0 . . Also. ζ3 )K11 (ζ2 . . . ζ + b . .e. . Z1 .. . Yn +n 1 2 Yn1 +n2 +1 . α3 ) but same unknown shape β for each sample. ζ3 ).12. = Cζ + b Z . . . .4 The kSample Problem A second illustration example concerns the situation of k = 3 samples with same scale but possibly diﬀerent locations. 2 ˆ ) b(Y 49 . 1 0 1 1 . α2 . Y1 . . when Zi ∼ Φ(z) instead of Zi ∼ G(z) as in the above regression model. . 2 . In terms of the untransformed Weibull data this means that we have possibly diﬀerent unknown characteristic life parameters (α1 . 0 ζ3 Zn +n 1 2 1 Zn1 +n2 +1 . Thus we can consider u1 = ζ1 as the baseline location (represented by the ﬁrst n1 observations). The formal deﬁnition of the test statistic proposed by Lawless is as follows: −1 ˆ ˆ ˆ ˆ Λ1 = (ζ2 . 1 0 . Yn1 +1 Y = . . the modiﬁcation of this test statistic for general k(= 3) is obvious. In matrix notation this model is Yn1 .4. ζ2 can be considered as the incremental change from u1 to u2 and ζ3 is the incremental change from u1 to u3 . 302. . . ˆ ˆ where K11 is the asymptotic 2 × 2 covariance matrix of (ζ2 . . 0 ζ1 Zn1 +1 . . Yn1 +n2 +n3 1 0 1 Zn1 0 . The modiﬁcations for k = 3 should be obvious. i. . . = . . we will employ the test statistic suggested in Lawless (1982) (p. . which invokes the χ2 = χ2 distribution as approximate k−1 2 null distribution. . .
. i = 2. . k ni ˆ )2 = 1 ¯ (Yij − Yi )2 . ˆ ) b(Y and thus ui (Y ) − u(Y ) ˆ ˆ ui (Y ) − u u(Y ) − u ˆ ˆ = − ˆ ) ˆ ) ˆ ) b(Y b(Y b(Y have distributions free of any unknown parameters which in turn implies the above claim about Λ1 . = . . In the normal case Λ1 reduces to the traditional F test statistic (except for a ¯ constant multiplier. 2. b(Y k−1 We don’t have to use a large sample approximation either. ˆ ) b(Y u(Y ) − u ˆ . Nsim . u2 (Y ) = ζ1 (Y )+ ζ2 (Y ) . which are the corresponding mle’s in the normal case. using a large sample argument. Nsim = 10000 should be suﬃcient for most practical purposes. i = 2. . i = 1. . b(Y n i=1 j=1 . in the normal case one uses the Fk−1. However. namely (n − k)/((k − 1)n) = (n − 3)/(2n)) when writing ui (Y ) = Yi . The distribution of these Nsim values Λ1 (Z j ) will give a very good approximation for the true null distribution of Λ1 . This is seen as follows from our equivariance properties. . Under the hypothesis H0 when u1 = u2 = u3 (= u) we thus have that ui (Y ) − u ˆ . 3 and ˆ ˆ ˆ ˆ b(Z j ) b(Z j ) b(Z j ) b(Z j ) and thus Λ1 (Z j ) = 3 i=1 u(Z j ) ˆ = ˆ b(Z ) j 3 i=1 ni ui (Z j )/N ˆ ˆ b(Z ) j ni (ˆi (Z j ) − u(Z j ))2 u ˆ ˆ b(Z )2 j j = 1. 3 ˆ ¯ ¯ ¯ ¯ and u(Y ) = Y = (n1 /N )Y1 + (n2 /N )Y2 + (n3 /N )Y3 and ˆ .N −k distribution as the exact null distribution of the properly scaled Λ1 and the uncertainty in ˆ )2 is not ignored by simply referring to the χ2 distribution. Recall that ζ1 (Y ) − ζ1 ui (Y ) − ui ˆ ζ1 (Y ) + ζi (Y ) − (ζ1 + ζi ) u1 (Y ) − u1 ˆ = . 50 . ˆ Thus we can estimate the null distribution of Λ1 by using the Nsim simulated values of ζi (Z j )/ˆ j ) b(Z to create ˆ ˆ ˆ u1 (Z j ) ˆ ζ1 (Z j ) ui (Z j ) ˆ ζ1 (Z j ) + ζi (Z j ) = . u i=1 with N = n1 +n1 +n3 . since the null distribution of Λ1 (in the logWeibull case) is free of any unknown parameters and can be simulated to any desired degree of accuracy. 3 ˆ ) ˆ ) ˆ ) ˆ ) b(Y b(Y b(Y b(Y have distributions free of unknown parameters. . . The accuracy of this approximation is entirely controllable by the choice of Nsim .where ˆ ˆ ˆ ˆ ˆ u1 (Y ) = ζ1 (Y ) . . = . u3 (Y ) = ζ1 (Y )+ ζ3 (Y ) and u(Y ) = ˆ ˆ ˆ ˆ 3 (ni /N )ˆi (Y ) .
there are strong diﬀerences. Using 2 the χ2 distribution would result in much smaller pvalues than appropriate when these are on the 2 low side. For a good approximation the points should follow the shown main diagonal.6 0.5 asymptotic χ2 density 2 Density 0. Each point on this plot corresponds to a pquantile where the abscissa value of such a point gives us the approximate pquantile of the Λ1 null distribution and the corresponding ordinate gives us the pquantile of the χ2 distribution which 2 is suggested as asymptotic approximation. 51 .The following plots examine the χ2 approximation to the Λ1 null distribution in the case of 3 2 samples of respective sizes n1 = 5. This is far from qualifying for a large sample situation. The vertical probability scale facilitates the reading oﬀ of p for each quantile level on either axis. The discrepancy is quite strong. Although the superimposed χ2 density is similar in character. n2 = 7 and n3 = 9. The histogram in Figure 16 is based on Nsim = 10000 simulated values of Λ1 (Z ).4 0.1 0.2 0. 0.0 0.3 0 2 4 6 Λ* 1 8 10 12 Figure 16: Histogram of Λ1 Null Distribution with Asymptotic Approximation Figure 17 shows a QQplot of the approximating χ2 quantiles corresponding to the Nsim = 10000 2 simulated and ordered Λ1 (Z i ) values.
98 qq qq q qq qq qq qq qq q q q q q q q 0.95quantile for the Λ1 null distribution 2 would yield a p ≈ .997 qq q qq qq qq q q q qq 0. tested whether three normal random samples with common σ have the same mean. the 52 .95 q qq q q q q q q q q 0.998 qq q q q qq qq 0.9999 q q 15 0.e. the . the approximation is not good.96 q q q q q q q q q q 0. 2 Figure 18 shows the comparison of the χ2 quantiles with the corresponding quantiles of the 2×F2. Again.006 according to the χ2 approximation.9 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q χ2 −quantiles 2 0 5 10 0 20 simulated & sorted 40 Λ* 1 60 Figure 17: QQPlots Comparing Λ1 (Z ) with χ2 2 Clearly the pquantiles of the χ2 distribution are smaller than the corresponding pquantiles of the 2 the Λ1 null distribution.05 would translate into a very 2 overstated observed signiﬁcance level of .993 q qq qq qq q q q q q qqq qqq 0.9995 0. Thus a true Λ1 pvalue of . we only need to go up to the main diagonal at that abscissa location and 2 look up the p on the χ2 scale to the left. i.94 q q q q q q q q q q 0. The departure from the main diagonal is not as severe as in Figure 17 .21−3 2 distribution (the factor 2 adjusts for the fact that the numerator of the F statistic is divided by k − 1 = 2).999 q q q q q q q q q q q qq q 0.97 q q q qq qq q q q q q qq q 0. i.994 on the χ2 scale.99 q qq qq qq qq qq qq q q q q q q q q qq qq q q q qq qq qq 0. If we take the p corresponding to a Λ1 on the abscissa and look up its p according to the χ2 scale..20 q 0.995 q q qqq qq qq q q q 0.93 q q q q q q q q q q q q q q q q q q q q 0. For example. but the eﬀect is similar.e. This comparison is the counter part to the previous situation if we were to use the asymptotic χ2 distribution as approximation to the exact and true 2 × F ratio distribution had 2 we carried out the corresponding test in a normal data situation..
18) 15 20 25 Figure 18: QQPlots Comparing 2 × F2. This is diﬀerent from doing so for each sample separately since we use all k samples to estimate b.5)/1000.998 0.997 q q 0. i.96 qq q qq q q q q q 0.9995 q 15 0.21−3 with χ2 2 Aside from testing the equality of logWeibull distributions we can also obtain the various types of conﬁdence bounds for the location.94 qq qq qq q q q q q q 0.93 q q qq qq qq qq q q q qq qq qq qq qq qq 0. whether the k locations are the same or not.993 0.5)/1000.95 qq qq qq 0.97 q q q q qq qq 0.2) 10 0.2.98 0 5 0 5 10 2*qf((1:1000−. when they become critical. 0.. We omit the speciﬁc details since they should be clear from the general situation as applied in the simple linear regression case. This will result in tighter conﬁdence bounds than what would result from the corresponding conﬁdence bound analysis of individual samples.χ2 distribution would yield much smaller pvalues than appropriate when these pvalues are on the 2 small side.99 q qq qq qq q q q q q 0.999 q 0. which was assumed to be the same for all k populations. 53 . pquantiles and tail probabilities for each sampled population.e. scale.9 qq qq qq qq qq qq qq q q q q q q q q q qq qq qq qq q q q q q qq qq qq qq qq q q q q q q qq qq q qq q q q q q q q q q qq qq q q q q qq q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0.995 q q q q q q qchisq(1:1000−.
i = 1.12. .i. ≤ V(n) of these modiﬁed versions of Vi . and from that V . are respectively the same as in the location/scale case. Using the previous computational formulas for the D. . .e. i. . 54 . . . All we need to do is generate vectors Z = (Z1 . W 2 and A2 can be approximated to any desired degree. . Zi ∼ G(z). A2 . ˆ ∼ G(z). namely ˆ Yi − vi ζ(Y ) . This assumes that the covariate matrix C contains a vector of ones. followed by D = D(V ).d. they do not depend on the additional covariates that may be present. using the sorted values V(1) ≤ . i = 1.5 Goodness of Fit Tests As in the location/scale case we can exploit the equivariance properties of the mle’s in the general regression model to carry out the previously discussed goodness of ﬁt tests by simulation. . . For any covariate matrix C and sample size n the null distributions of D. Zn ) i. . say Nsim = 10000. . W 2 and A2 we only need to deﬁne the appropriate Vi . Using the data representation Yi = ci ζ + bZi with i. or Y = Cζ + bZ this is seen from the equivariance properties as follows ˆ ˆ ˆ c ζ + bZi − ci (ζ + bζ(Z)) Zi − ci ζ(Z) Yi − ci ζ(Y ) = i = ˆ ) ˆ b(Y bˆ b(Z) b(Z) and thus Vi = G ˆ Yi − ci ζ(Y ) ˆ ) b(Y =G ˆ Zi − ci ζ(Z) ˆ b(Z) . compute the mle’s ζ(Z). W 2 and A2 . for ﬁnite sample sizes the eﬀects of these covariates may still be relevant. Vi = G ˆ ) b(Y Pierce and Kopecky (1979) showed that the asymptotic null distributions of D.i. one can easily simulate the null distributions of these statistics since they do not depend on any unknown parameters. Their respective empirical distributions would serve as excellent i approximations to the desired null distributions of these test of ﬁt criteria. would yield values Di . Nsim . . .. W 2 = W 2 (V ) and A2 = A2 (V ). ˆ b(Z). However. However.d. Wi2 . The eﬀect of using the small sample tables given by Stephens (1986) is not clear. Repeating this a large number of times. . n .
55 . . . i = 1. n with Z1 . 5. . where α = 10000.5 and characteristic life parameters αi = α exp(scale × ci ) which are aﬀected in a multiplicative manner by the covariate values ci . b = 2/3.i. . . slope = 1. . .q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 10 12 q q q q q q q q q q q log(failure time) q 8 q q q q q q q q q q q q q q q q q 4 6 2 q −1 0 covariate c 1 2 Figure 19: Weibull Regression Example Figure 19 shows some Weibull regression data which were generated as follows: Yi = log(α) + slope × ci + bZi . The solid sloped line in Figure 19 indicates the true log(characteristic life) for the Weibull regression data while the dashed line represents its estimate. . . Zn i. .3 and there were 20 observations each at ci = i − 2. .d. Here Xi = exp(Yi ) would be viewed as the Weibull failure time data with common shape parameter β = 1/b = 1. i = 1. . . . . ∼ G(z) . 5. .5 for i = 1.
0 0.10 0.12 2000 p−value = 0. This example was produced by the R function WeibullRegGOF available in the R workspace on the class web site.5 Figure 20: Weibull Goodness of Fit for Weibull Regression Example For this generated Weibull regression data set Figure 20 shows the results of the Weibull goodness of ﬁt tests in relation to the simulated null distributions for the test criteria D.n = 100 .8001 Frequency 1000 W2 = 0.263 0 500 0.0501 0 400 0.5 A2* 1.20 0.10 0. Nsim = 10000 1200 p−value = 0. It took 105 seconds on my laptop.06 D * 0.30 0.25 0.7135 Frequency 1500 1000 A2 = 0.0325 0 500 0. This function performs Weibull goodness of ﬁt tests for any supplied regression data set.02 0.0 1. W 2 and A2 .05 0. 56 .00 0.15 W 2* 0.08 0. The hypothesis of a Weibull lifetime distribution cannot be rejected by any of the three test of ﬁt criteria based on the shown pvalues.7746 Frequency 800 D = 0.04 0.35 p−value = 0.00 0. When this data set is missing it generates its own Weibull regression data set of size n = 100 and uses the indicated covariates and parameters.
57 . Z <. . so that Zi ∼ Φ(z). . b = 2/3. Here the solid sloped line indicates the mean of the normal regression data while the dashed line represents the estimate according to an assumed Weibull model. slope = 1. .14 q q q q q q q q q q q q q q q q q q q 12 q q q q q q q q q q q q q q q q log(failure time) 10 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 8 q q q q q q q q q q q q q q q 6 −1 0 covariate c 1 2 Figure 21: Normal Regression Example Figure 21 shows some normal regression data which were generated as follows: Yi = log(α) + slope × ci + bZi . The simulation of the test of ﬁt null distributions remains essentially unchanged except that a diﬀerent random number starting seed was used.5 for i = 1.i. The reason for this wider gap is that the ﬁtted line aims for the .0.. This data set was produced internally within WeibullRegGOF by modifying the line that generated the original data sample. Such data would have a lognormal distribution.rnorm(n. Zn i. . . 5. i = 1.e. i. . .3 and there were 20 observations each at ci = i − 2. . . ∼ Φ(z) .632quantile and not the median of that data. Note the much wider discrepancy here compared to the corresponding situation in Figure 19. . where α = 10000. .1) is used. n with Z1 . Here Xi = exp(Yi ) would be viewed as the failure time data. .d.
265 0 500 0. 58 .00 0.2 W 2* 0.0 0. Any slight diﬀerences in the null distributions shown here and in the previous example are due to a diﬀerent random number seed being used in the two cases.5 1.0 0.0953 0 400 0.3 0.1 0. However.02 0.73 0 500 0.5 Figure 22: Weibull Goodness of Fit for Normal Regression Example Here the pvalues clearly indicate that the hypothesis of a Weibull distribution should be rejected.0213 Frequency 800 D = 0.n = 100 .10 0. although the evidence in the case of D is not very strong.0 A2* 1.08 0.04 0. for W 2 and A2 there should be no doubt in the (correct) rejection of the hypothesis.06 D* 0. Nsim = 10000 1200 p−value = 0.0 2.5 2.12 p−value = 5e−04 Frequency 1500 W2 = 0.4 2500 p−value = 4e−04 Frequency 1500 A2 = 1.
Vol 12.References Bain. New York. Dekker. (1996) (revised 2001). and Singpurwalla. Thoman. “Tests based on EDF statistics. No. Society for Industrial and Applied Mathematics.. (1986). Scholz.Barlow. R. 363371. Boeing Information and Support Services. L. N. No. (editors). J.W. Theory and Methods. Fussell. New York. Statistical Models and Methods for Lifetime Data. “Testing Goodness of Fit for the Distribution of Errors in Regression Models.”(informal note). S.A.. New York. and Engelhardt M. D. Theoretical and Applied Aspects of System Reliability and Safety Assessment. K. and Antle.W.R. D’Agostino. “Inferences on parameters of the Weibull distribution. (1982).E. J.” Biometrika. 59 .A.J. (1979). Scholz. M. Second Edition.E. “Birnbaum’s Contributions to Reliability. Lawless. D. “Maximum Likelihood Estimation for Type I Censored Weibull Data Including Covariates.” GoodnessofFit Techniques. Bain. Thoman. C. Bain.” Technometrics.J. 15 Saunders.. 445460. (1975). Philadelphia PA 19103. L. M. F. 33 South 17 Street. Dekker. Stephens. Dekker. editors.” Technometrics.J. 1. (1970). No. and tolerance limits in the Weibull distribution. “Exact conﬁdence intervals for reliability. L. New York. (1978) Statistical Analysis of Reliability and LifeTesting Models. Vol 11. 3.J.D. C. and Stephens.. (2008) “Weibull Probability Paper.F..J.B. (1969).” ISSTECH96022. Pierce and Kopecky.C. and Antle. F. R. Bain. L. Vol.(1991) Statistical Analysis of Reliability and LifeTesting Models.E.R. John Wiley & Sons. 66. 2.” Reliability and Fault Tree Analysis.
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue reading from where you left off, or restart the preview.