
Author's personal copy

Reliability Engineering and System Safety 94 (2009) 1618–1628


Reliability Engineering and System Safety


A practical procedure for the selection of time-to-failure models based on the assessment of trends in maintenance data
D.M. Louit a, R. Pascual b,*, A.K.S. Jardine c

a Komatsu Chile, Av. Americo Vespucio 0631, Quilicura, Santiago, Chile
b Centro de Minería, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Santiago, Chile
c Department of Mechanical and Industrial Engineering, University of Toronto, 5 King's College Road, Toronto, Ont., Canada M5S 3G8

Article info

Article history:
Received 24 April 2008
Received in revised form 6 April 2009
Accepted 10 April 2009
Available online 18 April 2009

Abstract

Many times, reliability studies rely on false premises, such as the assumption of independent and identically distributed times between failures (renewal process). This can lead to erroneous model selection for the time to failure of a particular component or system, which can in turn lead to wrong conclusions and decisions. A strong statistical focus, the lack of a systematic approach and sometimes inadequate theoretical background seem to have made it difficult for maintenance analysts to adopt the necessary stage of data testing before the selection of a suitable model. In this paper, a framework for model selection to represent the failure process for a component or system is presented, based on a review of available trend tests. The paper focuses only on single-time-variable models and is primarily directed to analysts responsible for reliability analyses in an industrial maintenance environment. The model selection framework is directed towards the discrimination between the use of statistical distributions to represent the time to failure (renewal approach) and the use of stochastic point processes (repairable systems approach), when system ageing or reliability growth may be present. An illustrative example based on failure data from a fleet of backhoes is included.
© 2009 Elsevier Ltd. All rights reserved.

Keywords:
Trend testing
Time to failure
Model selection
Repairable systems

1. Introduction
As described by Dekker and Scarf [1], maintenance optimization consists of mathematical models aimed at finding balances between costs and benefits of maintenance, or the most appropriate moment to execute maintenance. Many times, these models are fairly complex and maintenance analysts have been slow to apply them, since often data are scarce or, due to lack of statistical theoretical knowledge, models are very difficult to implement correctly in an industrial setting. Other, more qualitative techniques such as reliability centered maintenance (RCM) or total productive maintenance (TPM) have then played an important role in maintenance optimization. Nevertheless, data analysis and statistical modeling are definitely very valuable tools engineers can employ to optimize the maintenance of assets under their supervision.
Acknowledging that many reliability studies or maintenance
optimization programs do not require sophisticated statistical
inputs, Ansell and Phillips [2] reinforce that even at a basic level,
we should always be critical of the analysis and ask whether a
technique is appropriate.

* Corresponding author.

E-mail address: (D.M. Louit).


The gap between researchers and practitioners of maintenance has meant that, although many models rely on very specific assumptions for their proper application, practitioners do not normally check these assumptions against the real operating conditions of their plants or fleets, i.e. real-world data [3,22,43]. O'Connor (cited in [2]) points out that much reliability analysis is done under false premises such as independence of components, constant failure rates, identically distributed variables, etc. As critical constituents of any reliability analysis, time-to-failure models are not exempt from this situation; thus conventional time-to-failure analysis techniques are often adopted when they are, in fact, not appropriate.
The aim of this paper is to provide practitioners with a review of techniques useful for the selection of a suitable time-to-failure model, specifically looking at the case when the standard use of statistical distributions is not appropriate, given the presence of long-term trends in the maintenance failure data. The paper focuses on the selection of single time variable models, since they are the most commonly applied in practice, rather than on more complex multivariate models such as the proportional hazards model, which have also shown great value in their application to maintenance and reliability [3].
The above does not imply that we propose that time-to-failure
models should be the center of attention in a reliability
improvement study; on the contrary, they should only act as a tool for the objective the engineers assign to it. Actually, the logical priority is that of objective, data and, finally, model selection (as shown in Fig. 1). In other words, as suggested by Ansell and Phillips [4], an analysis should be problem led rather than technique or model centered. Nevertheless, correct assessment of the failure process and of time to failure is usually of critical importance to the (posterior) economic analysis required to find an optimal solution to the problem that originated the analysis. Discussion of approaches to maintenance and reliability optimization and models mixing reliability and economics can be found in several references, for example [5–7].

Fig. 1. General framework in a reliability improvement study (modified from [2]). Diagram labels: Maintenance Data; Model Selection; Failure Process; Optimization Model.

When dealing with reliability field data, practical problems such as the unavailability of large sets of data frequently occur. This paper will briefly touch on this and other problems, as they are relevant to the discussion of model selection techniques.
This document is structured as follows. Section 2 refers to common practical problems found in the analysis of reliability data. Section 3 describes the concept of repairable systems and identifies some of the models available for their representation. Section 4 presents a series of graphical and analytical tests used to determine the existence of trends in the data. Section 5 proposes a procedure based on these tests to correctly select a time-to-failure model, discriminating between a renewal approach and the use of an alternative, non-stationary model, such as the non-homogeneous Poisson process (NHPP). Section 6 presents numerical examples using data coming from a fleet of backhoes. Finally, Section 7 contains a summary of the paper.

2. Some practical problems in reliability data

2.1. Scarce data
One major problem associated with reliability data is, ironically, the lack of sufficient data to properly run statistical analyses, as many authors have noted. In fact, Bendell [8] points out that all statistical methodologies are limited when based on small data sets, since the amount of information contained in such sets is by nature small. Furthermore, as mentioned in [9], empirical evidence indicates that sets of failure times typically contain ten or fewer observations, which emphasizes the need to develop methods to deal adequately with small data sets (naturally, the larger the data set, the more precise the statistical analysis). Also, many data sets are collected for maintenance management rather than reliability. Hence the information content is often very poor and can be misleading without careful scrutiny of the material and cleaning if necessary.
Thomas, a discussant of Ansell and Phillips [10], states that the insufficient data quantity problem would never disappear: since the aim of maintenance is to make failures rare events, one could expect that as maintenance improves, fewer failures should occur. Thus the solution, he says, is based on better a priori modeling. Bayesian techniques, directed to incorporate into the models all the prior information available, are useful in this case. Barlow and Proschan [11], Lindley and Singpurwalla [12], Singpurwalla [13], Walls and Quigley [44] and Guikema and Paté-Cornell [45] are relevant sources for Bayesian methods in reliability.
Although it is beyond the scope of this paper, it is valuable to mention Bayesian analysis, because it provides a means to reach optimal decisions, using standard models such as the ones discussed later on in this paper, when lack of data is a problem. In simple words, and as described in [14], Bayesian estimation methods incorporate information beyond that contained in the data, such as:

• previous systems estimates;
• generic information coming from actual data from similar systems;
• generic information from reliability sources; and
• expert judgment and belief.

This information is converted into a prior distribution, which is then updated using new data gathered during the operation into posterior distributions representing the failure process of the component or system of interest. Scarf [3] warns that many times the expert judgment comes from the same people responsible for the maintenance actions; thus it is possible that prior distributions may reflect the current practice rather than the real underlying failure process; so special care should be taken when attempting to use a Bayesian approach.
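As an illustration of the prior-to-posterior update described above, the following minimal sketch assumes a constant failure rate (exponential times to failure) with a conjugate gamma prior. This modeling choice, the class name `GammaPrior` and all numbers are assumptions made for illustration only; they are not taken from the paper or from [14].

```python
from dataclasses import dataclass


@dataclass
class GammaPrior:
    """Gamma(shape, rate) belief about a constant failure rate (failures per hour)."""
    shape: float  # can be read as a prior count of "pseudo-failures"
    rate: float   # can be read as prior "pseudo-operating hours"

    def update(self, n_failures: int, total_time: float) -> "GammaPrior":
        # Conjugate update: add observed failures and observed operating time.
        return GammaPrior(self.shape + n_failures, self.rate + total_time)

    @property
    def mean(self) -> float:
        # Expected failure rate under the current belief.
        return self.shape / self.rate


# Hypothetical prior from expert judgment: about 2 failures per 1000 h.
prior = GammaPrior(shape=2.0, rate=1000.0)
# Hypothetical field data: 3 failures observed in 2000 operating hours.
posterior = prior.update(n_failures=3, total_time=2000.0)

print(f"prior mean rate:     {prior.mean:.5f} failures/h")
print(f"posterior mean rate: {posterior.mean:.5f} failures/h")
```

With few observations the posterior stays close to the prior, which is why the warning about priors that merely mirror current practice matters.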
2.2. Data censoring
Another common practical problem in a reliability study is the presence of censored data. A censored observation corresponds to an incomplete time to failure or to a non-failure event, but this does not mean it does not contain relevant information for the reliability modeler. Censoring can usually be classified as right, left or interval censoring. Truncation, which is commonly confused with censoring, may also be a practical problem in some data sets. An example of truncation is the case when the time to failure can only be registered if it lies within a certain observation window (failures that occurred outside this interval are not observed, thus no information is available to the modeler).
Another situation that may arise is when data collection begins at a specified moment of time and the operating time of the items under analysis is unknown before the start of the monitoring period. The monitored life to failure of a component under observation can then be called residual life.
If time-to-failure data can be represented by a renewal process (RP), a statistical distribution can be fitted to times between failures (see Section 3). There are well-known techniques to determine parameters for many distributions in the presence of these censoring types. Detailed descriptions of such techniques can be found in [15–18], among others. Tsang and Jardine [19] propose a methodology for the estimation of the parameters of a Weibull distribution using residual-life data.
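To show how a censored observation still contributes information, the sketch below uses the simplest case: the maximum likelihood estimate of an exponential time to failure under right censoring, where the mean time to failure is the total observed time divided by the number of failures. The exponential assumption, the function name `exponential_mle` and the data are illustrative assumptions, not a method taken from [15–19].

```python
def exponential_mle(times, failed):
    """MLE for exponential times to failure with right-censored observations.

    times  : observed durations (to failure, or to the censoring instant)
    failed : True if the duration ended in a failure, False if it was censored
    """
    n_failures = sum(1 for f in failed if f)
    if n_failures == 0:
        raise ValueError("at least one observed failure is required")
    # Censored units add operating time to the denominator but no failure.
    total_time = sum(times)
    rate = n_failures / total_time
    return rate, 1.0 / rate


# Hypothetical data: three failures and two suspensions (right-censored).
times = [120.0, 340.0, 200.0, 500.0, 90.0]
failed = [True, True, False, True, False]

rate, mttf = exponential_mle(times, failed)
print(f"estimated failure rate: {rate:.5f} per hour, MTTF: {mttf:.1f} hours")
```

Dropping the two suspensions would give a mean of 320 hours instead of about 417 hours, which illustrates why censored observations should not simply be discarded.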
2.3. Combining data
A valid alternative when data are scarce is the combination or pooling of data from similar pieces of equipment. This is a commonly used procedure in reliability analysis, especially in operations where a large number of identical systems are utilized, such as fleets of mobile equipment or parallel production lines. Stamatelatos et al. [14] provide a check list of coupling factors for common cause failure events (one triggering cause originates several failures), found to be helpful in the definition of conditions needed for proper data pooling, as usually these same conditions are met by equipment subject to data pooling:

• same design;
• same hardware;
• same function;
• same installation, maintenance or operations people (and conditions);
• same procedures;
• same system–component interface;
• same location; and
• same environment.

The word "same" in the list above could be replaced by "similar" in many cases, as engineers should use their judgment and experience in assessing the similarities between two or more items before they are combined for analysis. If a more rigorous analysis is needed, when in the presence of two or more samples of data from possibly different populations, various statistical methods can be used to determine if there are significant differences between two populations (the two-sample problem), even in heavily censored data sets [20,21].

2.4. Effect of repair actions

Usually in practice, components can be repaired or adjusted, rather than replaced, whenever a breakdown occurs. These interventions (here referred to as repair actions) are likely to modify the hazard rate of the component; so it can be argued that the expected time to failure after an intervention takes place is different from the expected time to the first failure of a new component. But the most common approach to reliability assessment does not take this into account, as time to failure is modeled using statistical distributions assumed to be valid for every failure of the component or system (first, second, third, etc.). This is called the "as good as new" or renewal assumption. In fact, in the reliability literature, as Ascher and Feingold [22] notice, there is a fixation on mortality, or rather, its equivalent in reliability terms, time to failure in a non-repairable item or time to first failure of a repairable system.
In order to take repair actions into account (when they effectively affect the behavior of the component or system under study), a so-called repairable systems approach has to be adopted.
3. Repairable systems
A non-repairable system is one which, when it fails, is discarded (as repair is physically not feasible or non-economical). The reliability figure of interest is, then, the survival probability. The times between failures of a non-repairable system are independent and identically distributed, iid [23]. This is the most common assumption made when analyzing time-to-failure data, but as many authors mention, it might be unrealistic in some situations. Many examples have been given of systems that, rather than being discarded (and replaced) on failure, are repaired. In this case, the usual non-repairable methodologies (statistical distribution fitting such as Weibull analysis, for instance) are simply not appropriate [24].
Repairable systems, on the other hand, are those that can be restored to their fully operational capabilities by any method, other than the replacement of the entire system [22]. In this sense, reliability is interpreted as the probability of not failing for a particular period. This analysis does not assume that times between failures are independent or identically distributed. When dealing with repairable systems, reliability is not modeled in terms of statistical distributions, but using stochastic point processes.
The number of failures in an interval of time can be represented through a stochastic point process. Furthermore, in this case the point process can be interpreted as a counting process, and what it counts is the number of events (failures) in a certain time interval. In reliability analysis, such a process is said to be time truncated when it stops counting at a particular instant. It is called failure truncated when it stops counting when a certain number of failures is reached.
The five main stochastic process models applied to modeling of repairable systems are [22]:


• renewal process (RP);
• homogeneous Poisson process (HPP);
• branching Poisson process (BPP);
• superposed renewal process (SRP); and
• non-homogeneous Poisson process (NHPP).

The RP assumes that the system is returned to an "as new" condition every time it is repaired, so that it actually converts time between failures into time to first failure of a new system or, in other words, leads to a "non-repairable system" approach, in which time to failure can be modeled by a statistical distribution and the iid assumption is valid. The HPP is a special case of the RP that assumes that times between failures are independent and identically exponentially distributed, so the iid assumption is also valid and the time to failure is described by an exponential distribution (constant hazard rate).
The BPP is used to represent time-to-failure data that can be
assumed to be identically distributed, but not independent. As
Ascher and Feingold [22] mention, this process is applicable when
a primary failure (or a sequence of primary failures having iid
times to failure) can trigger one or more subsidiary failures; thus
there is dependence between the subsidiary failures and the
occurrence of the primary, triggering failure. Very few practical
applications of this model are found in the literature. A thorough
description of the BPP and its application to the study of
repairable systems can be found in [25].
The SRP is a process derived from the combination of various independent RPs, and in general it is not an RP. For example, think of a set of parts within a system that are discarded and replaced every time they fail, independently. Each part can be modeled as an RP, and then the system would be modeled using an SRP. But as a possibility exists to investigate the times between failures for the system as a whole, the question arises whether this approach is justified or not [4]. In addition, the superposition of independent RPs converges to a Poisson process (possibly non-homogeneous) when the number of superimposed processes grows (by the theorem of Grigelionis, see [26]).
Since the RP and the HPP are equivalent to the regular, non-repairable items methods, and the BPP and SRP have either not been largely applied or, in the case of the latter, can be approximated by an NHPP (with a relatively large number of constituent processes), they will not be described in greater detail here. These models are covered in detail by Ascher and Feingold [22].


When the repair or substitution of the failed part in a complex system does not involve a significant modification of the reliability of the equipment as a result of the repair action, the NHPP is able to correctly describe the failure-repair process. Then, the NHPP can be interpreted as a minimal repair model [27]. Note that for (i) hazardous maintenance, i.e. when the condition of the equipment is worse after repair than it was before failure, or for (ii) imperfect repair, i.e. when reliability after repair is better than just before failure, but not as good as new, other models have been proposed. These models are even more flexible than the NHPP, as they allow for better representation of imperfect repair scenarios (see, e.g. [28,29], among others). Nevertheless, we concentrate on the NHPP given its simplicity, along with the following reasons (as listed by Coetzee [30]):


i. it is generally suitable for the purpose of modeling data with a trend, due to the fact that the accepted formats of the NHPP are monotonically increasing/decreasing functions;
ii. NHPP models are mathematically straightforward and their theoretical base is well developed;
iii. models have been tested fairly well, and several examples are available in the literature for their application.

Under the NHPP, times between successive failures are neither independent nor identically distributed, which makes this model the most important and widely used in the modeling of repairable systems data [31,32]. Actually, whenever a trend is found to be present in time between failures data, a non-stationary model such as the NHPP is mandatory, and the regular distribution fitting methods are not valid.
The next section of the paper reviews a series of trend-testing techniques found very helpful for model selection purposes, focusing on the discrimination between a renewal approach and the need for an alternative model, such as the NHPP. Should the reader decide to pursue the modeling of times to failure using other non-stationary models (e.g. imperfect repair models), the techniques presented in this paper are equally valuable to establish the existence of trends in the data, which justify the decision of not using the standard distribution fitting methods such as Weibull analysis.
Fig. 2 shows three theoretical situations that may occur in practice, in relation to time to failure of a particular system. From the figure, it can be noticed that the three data sets generated from systems A–C are very different. For A, no clear trend can be observed from the failure data, thus an iid assumption may be made, since time between failures is apparently independent from the age of the equipment. For B and C, however, a trend is clearly present; for B a decrease in reliability is evident, while for C reliability growth is taking place. Whenever these latter situations occur, and there is significant evidence of an ageing process taking place, the usual RP approach has to be disregarded, and an alternative non-stationary approach would usually be used to model time between failures for the system. Note that equipment age refers to the age of the system under analysis, measured from the moment it was first put into operation, as opposed to the time elapsed since the last repair (which is significant for an RP).

3.1. The non-homogeneous Poisson process

In practical terms, the NHPP permits the modeling of trend in the number of failures to be found in an interval in relation to total age of the system, through the intensity function. Two popular parameterizations for the intensity function of an NHPP are the power law intensity (also called Weibull intensity) and the log-linear intensity.
The power law intensity gives its name to the power law process (NHPP with Weibull intensity) and is given by

$$\lambda(t) = \eta \beta t^{\beta - 1}, \tag{1}$$

with $\eta, \beta > 0$ and $t \ge 0$.

The log-linear intensity function has the form

$$\lambda(t) = e^{\alpha + \beta t}, \tag{2}$$

with $-\infty < \alpha, \beta < \infty$ and $t \ge 0$.

Several practical examples reviewed show that the power-law process is preferred, because of its similarities with the Weibull distribution fitting method used regularly for non-repairable data. Actually, η is the scale parameter and β the shape parameter of the power law intensity, and the intensity function is of the same form as the failure rate of a Weibull distribution.
Details concerning the fitting of power-law and log-linear intensity functions to data from water pumps in a nuclear plant are discussed in [23]. A practical example using jet engine data combined from several pieces of equipment is found in [24]. Ascher and Kobbacy [33], and Baker [34] also provide applications of both log-linear and power-law processes.
When using an NHPP, engineers could imagine it as a two-stage problem: the first (say, inner stage) relates to the fitting of an intensity function to data and the second (outer stage) uses the cumulative intensity function to estimate reliability (or probability of failure) for the system. When trends are not present and data can be assumed to be iid, only one stage is needed, where a time-to-failure distribution is fitted directly to the data and reliability (or probability of failure) estimates are obtained from it.
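The two stages can be sketched for the power law process as follows. The inner stage below uses the standard closed-form maximum likelihood estimates for a single system observed up to a fixed time (time truncated); the outer stage uses the cumulative intensity, written here as η t^β, to estimate the probability of no failure in a future interval. The function names and the failure times are hypothetical, and this is a sketch under those assumptions rather than the fitting procedure used in [23] or [24].

```python
import math


def fit_power_law(failure_times, end_time):
    """Inner stage: MLE of (eta, beta) for intensity eta*beta*t**(beta-1).

    Assumes one system observed from age 0 to end_time (time truncated).
    """
    n = len(failure_times)
    beta = n / sum(math.log(end_time / t) for t in failure_times)
    eta = n / end_time ** beta
    return eta, beta


def cumulative_intensity(t, eta, beta):
    """Expected number of failures from age 0 to age t."""
    return eta * t ** beta


def reliability(age, duration, eta, beta):
    """Outer stage: probability of no failure in (age, age + duration]."""
    expected = (cumulative_intensity(age + duration, eta, beta)
                - cumulative_intensity(age, eta, beta))
    return math.exp(-expected)


# Hypothetical cumulative operating hours at each failure of one system.
failure_times = [200.0, 450.0, 620.0, 750.0, 840.0, 910.0]
end_time = 1000.0

eta, beta = fit_power_law(failure_times, end_time)
print(f"beta = {beta:.3f} (beta > 1 suggests deterioration)")
print(f"R(next 100 h) = {reliability(end_time, 100.0, eta, beta):.3f}")
```

Note that the reliability depends on the current age of the system, not only on the length of the interval, which is the practical difference from the one-stage renewal approach.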

4. Trend testing techniques


Fig. 2. Possible trends in time between failures.

Given that a clear need for trend testing in time between failures data is identified, a first step in model selection should be that of assessing the existence of trend or time dependency in the data. Several techniques accomplish this task, and a selection of them is described here.
Before presenting the testing techniques, it is of great
importance to identify the possible trends one may encounter
when analyzing reliability data. A trend in the pattern of failures
can be monotonic or non-monotonic. In the case of a monotonic
trend (such as the ones shown in Fig. 2), the system is said to be improving (or a "happy system") if the time between arrivals tends to get longer (decreasing trend in number of failures); and it is said to be deteriorating (or a "sad system") if the times tend to get shorter (increasing trend in number of failures). Non-monotonic trends are said to occur when trends change with time or they repeat themselves in cycles. One common form of non-monotonic trend is the bath-tub shape trend, in which time between failures increases in the beginning of the equipment life, then tends to be stable for a period and decreases at the end.
It should be remembered, when testing for trend, that the choice of the time scale could have an impact on the pattern of failures; so special attention has to be given to the selection of the time unit (calendar hours, operating hours, production throughput, etc. [31]).

4.1. Graphical methods

4.1.1. Cumulative failures vs. time plot
The simplest method for trend testing is a plot of the cumulative failures against time for the system observed (Newton, in Ansell and Phillips [4]). When a linear plot results, data can be assumed to have no trend and the same distribution for time between failures is accepted. Fig. 3 shows generic plots expected. Plot A clearly shows the existence of a trend in the data, while plot B shows no evidence of trend. Sometimes, a curve like plot C occurs, where instead of a smooth trend, two or more straight lines can be plotted. This may be the consequence of changes in the maintenance policy or changes in the operational conditions of the equipment, for example, dividing the failure behavior of the system into two or more clearly different periods. When this situation arises, one alternative is to discard data not representative of the current situation; thus a no-trend plot would result for the most recent data set, and a renewal assumption could then be made. When a plot like D occurs, a non-monotonic trend may be present in the data set.
This kind of test is very simple to perform, does not require any calculations and is very powerful when there are strong trends in the data. When slight trends are present, this solution may not be enough and an analytical test should be performed. A weakness of this test is that assessment of trend is based on interpretation (as in all graphical procedures). Ascher and Feingold [22] provide an example of the use of this graphical test for diesel propulsion engines in a US Navy ship.
Also in [22], it is noted that this test may result in masking local variations when very large samples are available. An alternative procedure is to divide the total observation interval into several equally sized intervals, calculating (and then plotting, if necessary) the average rate of occurrence of failures for each of them, using

$$\lambda_i(t) = \frac{N_i(t) - N_{i-1}(t)}{\Delta t}, \quad \text{with } (i-1)\Delta t \le t \le i\Delta t, \tag{3}$$

where $N_i(t)$ is the total number of failures observed from time zero to the ith interval and $\Delta t$ the length of each interval. If there is a trend in the data, then it will be reflected in the average rate of occurrences calculated. Then, if the system is improving, the successive values of $\lambda_i(t)$ calculated will decrease and vice versa.

Fig. 3. Cumulative failures vs. time plots: examples (A: increasing trend, B: no trend, C: two clearly different periods, D: non-monotonic trend).

4.1.2. Scatter plot of successive service lives
A complementary test to the cumulative failures against time plot consists of plotting the service life of the ith failure against that of the (i−1)th failure. If no trend is observable, only one cluster of points should be obtained. Two or more clusters, or linear plots, indicate trend.
This test is also very helpful in checking for unusual values of the failure times in a set of data, which may be related to poor data collection, accidents or other situations not representative of the failure process, and thus provides a means for identification of candidates for data filtering. Knights and Segovia [35], for example, applied this test to data coming from mining shovel cables (see Fig. 4 for an example of this type of plot; points out of the cluster suggest a revision of some failure times).

Fig. 4. Example of a successive service life plot (highlighted points indicate

The tests described up to this point are for a single system only. When multiple systems are present, two alternatives are available for combining the systems to assess trend. The first is based on the assumption that all systems follow the same failure process (independently), and leads to the use of the total time on test (TTT) transform of failure times (see definition in Section 4.1.4). This approach results in one single process with times to failure given by the TTT transformed values; thus single-system tests can be applied to the transformed data set. The second alternative assumes that all systems follow possibly different failure processes, and leads to the combination of the results of various single-system tests into what is called a combined test. This is usually performed by combining test statistics from single systems (for examples see Sections 4.2.2 and 4.2.4).
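The interval-averaged rate of occurrence of failures described in Section 4.1.1 (count the failures in each equally sized interval and divide by the interval length) can be sketched as follows. The function name, the handling of a failure that falls exactly on the end of the observation period, and the data are assumptions made for illustration.

```python
def interval_rocof(failure_times, end_time, n_intervals):
    """Average rate of occurrence of failures in equally sized intervals.

    failure_times : cumulative times (system age) at which failures occurred
    end_time      : length of the total observation interval
    n_intervals   : number of equally sized intervals
    """
    dt = end_time / n_intervals
    counts = [0] * n_intervals
    for t in failure_times:
        # A failure exactly at end_time is assigned to the last interval.
        index = min(int(t / dt), n_intervals - 1)
        counts[index] += 1
    return [count / dt for count in counts]


# Hypothetical failure ages (hours) for one system observed for 1000 h.
failure_times = [200.0, 450.0, 620.0, 760.0, 840.0, 910.0]
rates = interval_rocof(failure_times, end_time=1000.0, n_intervals=4)

for i, rate in enumerate(rates, start=1):
    print(f"interval {i}: {rate:.4f} failures/h")
```

In this made-up data set the rate in the last interval is three times that of the earlier ones, the kind of increase that would point to a deteriorating system and call for an analytical test.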
4.1.3. Nelson–Aalen plot
Another useful graphical test is the Nelson–Aalen plot. This test uses a non-parametric estimate of the cumulative intensity function of an NHPP, $\Lambda(t)$, and plots it against time [31]. The estimate is given by

$$\hat{\Lambda}(t) = \sum_{T_{ij} \le t} \frac{1}{Y(T_{ij})}, \tag{4}$$

where $T_{ij}$ is the time to the ith failure of the jth process under observation, $Y(T_{ij})$ the number of systems operating immediately before time $T_{ij}$, and $\hat{\Lambda}(t) = 0$ for $t < \min\{T_{ij}\}$. The formula in Eq. (4) is valid for multiple systems under observation (multiple processes, $j = 1, 2, \ldots, m$).
If there is no trend, then the plot would tend to be linear, and any deviation from a straight line indicates some kind of trend. It should be noted that when only one system is observed, the Nelson–Aalen plot is equivalent to the cumulative failures vs. time plot. It is also interesting to notice that the Nelson–Aalen plot counts the number of systems operating before a certain time; thus it may include suspensions to assess trend.

Fig. 5. Typical shape of TTT plots (A: increasing, B: decreasing, C: bath-tub shape intensities. Modified from [31]).
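A minimal sketch of the Nelson–Aalen estimate for several systems follows. It assumes that every system is observed from time zero until its own end of observation, so the number of systems operating just before a failure time is the number whose observation end is not earlier than that time. The function name and the data are hypothetical.

```python
def nelson_aalen(failures_by_system, end_times):
    """Nelson-Aalen estimate of the cumulative intensity.

    failures_by_system : list of lists of failure times, one list per system
    end_times          : end of observation for each system (start assumed 0)
    Returns a list of (failure time, cumulative estimate) pairs.
    """
    events = sorted(t for times in failures_by_system for t in times)
    points = []
    cumulative = 0.0
    for t in events:
        # Systems still under observation immediately before time t.
        at_risk = sum(1 for end in end_times if end >= t)
        cumulative += 1.0 / at_risk
        points.append((t, cumulative))
    return points


# Hypothetical data: three systems with different observation lengths.
failures_by_system = [[100.0, 300.0], [150.0], [250.0, 400.0]]
end_times = [350.0, 200.0, 450.0]

points = nelson_aalen(failures_by_system, end_times)
for t, value in points:
    print(f"t = {t:6.1f}  cumulative intensity = {value:.3f}")
```

Plotting the pairs and checking for departures from a straight line gives the graphical test described above. With a single system, each step has height one and the result reduces to the cumulative failures vs. time plot.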
4.1.4. Total time on test (TTT) plot
As mentioned above, sometimes we are in the presence of several pieces of equipment. Now, the combined failure process for the entire group of components observed may or may not present a trend. This test is directed to the identification of trend for the combined behavior. So, if there are m independent processes with the same intensity function (i.e. several identical systems under observation) and the observation intervals for each one are all contained in the interval [0,S], then the total number of failures will be $N = \sum_{i=1}^{m} n_i$, where $n_i$ is the number of failures observed for each process in its particular observation interval.
For the superposed process (combination of the m individual processes), let $S_k$ denote the time of the kth failure, and let $p(u)$ denote the number of processes under observation at time u. If all processes are observed from time 0 to time S, then $p(u)$ is equal to m. Then, $T(t) = \int_0^t p(u)\,du$ is the total time on test from time 0 to time t (this is known as the total time on test, or TTT, transform; see [36]).
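Because p(u) is a step function, the integral T(t) reduces to adding up, for each system, the part of its observation window that lies before t. The sketch below assumes each system is observed over a single window (start, end) within [0, S]; the function name and the windows are hypothetical.

```python
def total_time_on_test(t, windows):
    """Total time on test T(t): integral of p(u) from 0 to t.

    windows : list of (start, end) observation intervals, one per system,
              with 0 <= start <= end. p(u) is the number of windows that
              contain u, so T(t) is the summed overlap of each window
              with [0, t].
    """
    return sum(max(0.0, min(t, end) - start) for start, end in windows)


# Hypothetical fleet: two systems observed over [0, 1000] h and one
# system that entered observation at 200 h and left at 800 h.
windows = [(0.0, 1000.0), (0.0, 1000.0), (200.0, 800.0)]

print(total_time_on_test(500.0, windows))   # time accumulated up to 500 h
print(total_time_on_test(1000.0, windows))  # time accumulated over [0, S]
```

When every system is observed over the whole interval, the function returns m times t, in line with p(u) = m.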
The TTT plot test for NHPPs is given by a plot of the total time on test statistic, calculated as

$$\frac{T(S_k)}{T(S)} = \frac{\int_0^{S_k} p(u)\,du}{\int_0^{S} p(u)\,du} \qquad (5)$$

against the scaled failure number k/N, with k = 1, 2, ..., N. When p(u) = m, that is, when all processes are observed during the complete interval, the TTT plot is also called a scaled Nelson-Aalen plot with axes interchanged [31]. Fig. 5 shows different forms that may be obtained when constructing a TTT plot. As in other graphical techniques, a linear plot is representative of a no-trend situation (thus validating a renewal assumption for the entire group of observed items). The TTT plot is especially useful to identify non-monotonic trends in time-to-failure data, such as the bath-tub failure intensity (see Fig. 5C).
It is important to mention that spacing between points will not be constant in a TTT plot. Rather, in an increasing trend (Fig. 5A), larger spacing between points will occur at the lower left section of the plot. In the presence of a decreasing trend (Fig. 5B), points should tend to be further from each other at the upper right section, whereas for a bath-tub shape (Fig. 5C), further spacing will occur in the middle section of the curve.
Some other graphical tools, such as control charts for reliability monitoring (described by Xie et al. [37]), can also constitute a useful method to identify if improvement or deterioration has occurred in a particular parameter of interest, such as the rate of occurrences of failures (ROCOF) or failure intensity. Nevertheless, they rely on an RP assumption and are not directed to test for trend when evaluating the use of a repairable systems approach.
4.2. Analytical methods
If preferred over the graphical approach, analytical testing methods are available to test data for trends. Additionally, the null and alternative hypotheses of these tests are of great help in the determination of the most suitable model for the data.
Ascher and Feingold [22] provide a very complete survey of analytical trend tests, and present them organized according to their null hypothesis (i.e. RP, HPP, NHPP, monotonic trend, non-monotonic trend, etc.). Here, only the most popular tests will be described, following primarily Elvebakk [38]. Other methods are described and referenced in [46].
4.2.1. The Mann test
The null hypothesis for this non-parametric test is an RP. Then, if this hypothesis is not rejected, we can continue the reliability analysis, fitting a distribution to time-to-failure data. The alternative hypothesis is a monotonic trend.
The test statistic is calculated by counting the number of reverse arrangements, M, among the times between failures. Let $T_1, T_2, \ldots, T_n$ be the interarrival times of n failures. Then a reverse arrangement occurs whenever $T_i < T_j$ for $i < j$. For example, if the following times between failures were observed for a system:

21, 17, 48, 37, 64, 13,

then, for the first failure time, 3 reverse arrangements occur, as 21 < 48, 21 < 37 and 21 < 64. For the whole sample, M = 3 + 3 + 1 + 1 + 0 = 8.
In general,

$$M = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(T_i < T_j) \qquad (6)$$

where $I(\cdot)$ is an indicator variable used for counting the reverse arrangements present in the data set. It takes the value of 1 whenever the condition is met, in this case, when $T_i < T_j$. Mann [39], who originally developed the test, showed that M is approximately normally distributed for $n \geq 10$ and tabulated probabilities for smaller samples.
If the hypothesis of an RP is correct, then the expected number of reverse arrangements is equal to n(n-1)/4, so large deviations from this number would indicate the presence of a trend. This test
considers a single system under observation.
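The counting rule of the Mann test is simple to automate. The sketch below counts reverse arrangements for the six-value example given in the text; the function names are hypothetical. The normal approximation uses the variance n(n-1)(2n+5)/72, which is an assumption brought in from the standard result for random permutations and is not stated in this paper; as noted above, the approximation is only suggested for n of at least 10.

```python
from statistics import NormalDist


def mann_reverse_arrangements(tbf):
    """Count pairs i < j with tbf[i] < tbf[j] (reverse arrangements)."""
    n = len(tbf)
    return sum(1 for i in range(n - 1) for j in range(i + 1, n)
               if tbf[i] < tbf[j])


def mann_normal_approximation(tbf):
    """Return (M, expected M, z, lower-tail probability) under the RP hypothesis."""
    n = len(tbf)
    m = mann_reverse_arrangements(tbf)
    expected = n * (n - 1) / 4
    # Assumption: null variance of M taken as n(n-1)(2n+5)/72.
    variance = n * (n - 1) * (2 * n + 5) / 72
    z = (m - expected) / variance ** 0.5
    return m, expected, z, NormalDist().cdf(z)


sample = [21, 17, 48, 37, 64, 13]
print(mann_reverse_arrangements(sample))  # 8
```

A value of M well below n(n-1)/4 means that later times between failures tend to be shorter than earlier ones, while a value well above it points the other way.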
4.2.2. The Laplace test
This well-known test has a null hypothesis of HPP vs. an alternative hypothesis of NHPP with monotonic intensity. In other words, if the null hypothesis is not rejected, then we can assume that times between failures are iid exponentially distributed. If not, then an NHPP should be used. The test is optimal for NHPP with log-linear intensity function.
The general idea behind the test is to compare the mean value of the failure times in an interval with the midpoint of the interval. If the mean of the failure times tends to deviate from the midpoint, then a trend is present and data cannot be assumed to be independent and identically distributed.
The test statistic, L, approximately follows a standard normal distribution under the null hypothesis, and is calculated as

$$L = \frac{\sum_{j=1}^{\hat{n}} T_j - \frac{1}{2}\hat{n}(b + a)}{\sqrt{\frac{1}{12}\hat{n}(b - a)^2}} \qquad (7)$$

where $T_j$ is the age at failure for the jth failure, [a, b] is the interval of observation and $\hat{n}$ is given by

$$\hat{n} = \begin{cases} n & \text{if the process is time truncated} \\ n - 1 & \text{if the process is failure truncated} \end{cases}$$

with n the observed number of failures.
Note that Eq. (7) can be simplified when the starting point of observation is time t = 0, since (b + a) and (b - a) both equal the end point of the observation interval. The statistic above is applicable for the case when only one process is being observed. Generalization of the Laplace test to more than one process is fairly simple, and for m processes, the statistic is given by the following expression (combined Laplace test statistic):

$$L = \frac{\sum_{i=1}^{m}\sum_{j=1}^{\hat{n}_i} T_{ij} - \sum_{i=1}^{m} \frac{1}{2}\hat{n}_i(b_i + a_i)}{\sqrt{\frac{1}{12}\sum_{i=1}^{m} \hat{n}_i (b_i - a_i)^2}} \qquad (8)$$

L is expected to be equal to zero (or very close to zero) when no trend is present in the data. Then, the null hypothesis is rejected for large absolute values of L (strongly negative or strongly positive), and the sign is an indication of the type of trend. If L > 0, then an increasing trend (deterioration) is detected. Analogously, if L < 0, then a decreasing trend (improvement) is detected.
4.2.3. The Lewis-Robinson test
This test, used for testing of the RP assumption, was derived by Lewis and Robinson [40]. The test statistic LR is obtained by dividing the Laplace test statistic L by the estimated coefficient of variation of the times between failures, $\widehat{cv}$, which is calculated as

$$\widehat{cv}[X] = \frac{\hat{\sigma}[X]}{\bar{X}} \qquad (9)$$

where X is a random variable representing the times between failures of the system. Then, the LR statistic is given by

$$LR = \frac{L}{\widehat{cv}} \qquad (10)$$

with L given by Eq. (7). If the failure times follow an HPP, then LR is asymptotically equivalent to L, as $\widehat{cv}$ is equal to 1 when the times between failures are exponentially distributed. That is, LR is asymptotically standard normally distributed. As in the Laplace test, the expected value of the statistic is zero when no trend is present; thus deviations from this value indicate trend. The sign is an indication of the type of trend.
4.2.4. The military handbook test
As in the Laplace test, the null hypothesis for this test is an HPP, and the alternative an NHPP with monotonic intensity. This test is optimal for NHPP with increasing power-law intensity (reliability deterioration with Weibull intensity function).
The test statistic for a single system (process) is $\chi^2$ distributed with $2\hat{n}$ degrees of freedom under the null hypothesis, and is defined as

$$MH = 2\sum_{j=1}^{\hat{n}} \ln\left(\frac{b - a}{T_j - a}\right) \qquad (11)$$

where a, b, $T_j$ and $\hat{n}$ have the same meaning as in the Laplace test. As before, the generalization to m processes is given by (combined MH test statistic):

$$MH = 2\sum_{i=1}^{m}\sum_{j=1}^{\hat{n}_i} \ln\left(\frac{b_i - a_i}{T_{ij} - a_i}\right) \qquad (12)$$

In this case, the MH statistic is $\chi^2$ distributed with 2p degrees of freedom under the null hypothesis of HPPs, where $p = \sum_{i=1}^{m} \hat{n}_i$.
TTT-based statistics for both the Laplace and the Military Handbook test are also available for the pooling of data from several systems (see [31]).
Another test, known as the Anderson-Darling test (derived by Anderson and Darling [41]), has been found to be very powerful against non-monotonic trends, but normally simpler graphical tests are able to detect this situation. For this reason, it will not be
described here.
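The three analytical statistics of Sections 4.2.2 to 4.2.4 can be computed with the Python standard library, as in the minimal sketch below. The `System` class, the function names and the two-system data set are hypothetical. The coefficient of variation uses the sample standard deviation, which is an assumption, since the text does not say which estimator is intended.

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class System:
    times: list              # ages at failure, in ascending order
    a: float                 # start of the observation interval
    b: float                 # end of the observation interval
    failure_truncated: bool  # True if observation ends at the last failure

    def used_times(self):
        """Failure ages that enter the statistics (n-hat of them)."""
        return self.times[:-1] if self.failure_truncated else list(self.times)


def laplace(systems):
    """Combined Laplace statistic; a one-element list gives the single-system form."""
    numerator = variance = 0.0
    for s in systems:
        used = s.used_times()
        numerator += sum(used) - 0.5 * len(used) * (s.b + s.a)
        variance += len(used) * (s.b - s.a) ** 2 / 12
    return numerator / math.sqrt(variance)


def military_handbook(systems):
    """Combined MH statistic and its chi-square degrees of freedom (2p)."""
    total, p = 0.0, 0
    for s in systems:
        used = s.used_times()
        total += sum(math.log((s.b - s.a) / (t - s.a)) for t in used)
        p += len(used)
    return 2 * total, 2 * p


def lewis_robinson(system):
    """LR = L / cv for one system; cv from the sample standard deviation (assumption)."""
    ages = [system.a] + list(system.times)
    tbf = [later - earlier for earlier, later in zip(ages, ages[1:])]
    return laplace([system]) / (stdev(tbf) / mean(tbf))


# Hypothetical failure ages (hours) for two systems; not data from the paper.
fleet = [
    System([120.0, 300.0, 410.0, 480.0], a=0.0, b=500.0, failure_truncated=False),
    System([200.0, 350.0, 450.0], a=0.0, b=450.0, failure_truncated=True),
]
print("combined L  :", round(laplace(fleet), 3))
print("combined MH :", military_handbook(fleet))
print("LR, system 1:", round(lewis_robinson(fleet[0]), 3))
```

L and LR are compared with standard normal quantiles, while MH is compared with chi-square quantiles for the returned degrees of freedom. Small MH values correspond to failures concentrated late in the interval.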

5. Model selection procedure

Vaurio [42] and Ascher and Feingold [22] proposed procedures based on various trend tests, directed to the proper selection of models for time-to-failure data. Both methodologies are robust and incorporate a set of tests leading to the selection of a model, but are subject to simplification in order to achieve a larger use of the testing techniques by maintenance analysts. Based on this, a new diagram consisting of several steps to model selection is proposed. This procedure explicitly considers only two models with practical application, the RP and the NHPP (though it leaves the option open to the user to select other non-stationary models). The procedure also reduces the number of tests considered in order to concentrate the user's efforts on the techniques that seem to be subject to easier practical implementation. The diagram presented below is similar to that of Vaurio [42], which, though very complete in its procedure for model selection, appears to be too complex for regular industrial application. The Ascher and Feingold [22] flow diagram (A-F flow diagram) is simpler, but as they consider a broader review of tests and do not include them explicitly in the graphical representation of the procedure, it may misguide the practitioner.
Fig. 6 presents the suggested guideline for model selection, applying the testing techniques reviewed here. As mentioned, this procedure is believed to be a simple way for maintenance analysts to correctly assess the failure processes in their operations and to discriminate whether a standard renewal approach or a repairable systems approach should be used to represent them. Although the use of an NHPP is suggested in this paper as it is capable of representing data with a trend, the reader should note that the NHPP is best interpreted as a minimal repair (or "as bad as old") model; thus it will not necessarily be the most appropriate model for imperfect repair situations (neither "as good as new" nor "as bad as old" system after repair). Minimal repair is defined here as the situation when the component's reliability characteristics are not noticeably changed by the repair action. Nevertheless, the procedure remains valid for the testing of the renewal assumption, even in the scenario of imperfect repair. When imperfect repair is observed, more flexible models (e.g. models I and II by Kijima [29]) could provide a better fit to the failure data.

Fig. 6. Framework for time-to-failure model selection: a practitioner's approach. (Flow diagram; legible box labels: CMMS databases; collect operating time for each failure registered; order them chronologically (failures only); graphical tests; test for renewal assumption; Mann test; Laplace test; LR test; Military HB; test against NHPP; (or other non-stationary model).)

The box for testing for renewal assumption and the box for testing against NHPP have been grouped in Fig. 6, under "TEST". Intuitively, all tests considered (graphical tests, Mann, Laplace, Lewis-Robinson and Military Handbook) have the same objective of testing for trends in the data, but only the Laplace and Military Handbook tests explicitly have an HPP as null hypothesis (that is, if no trend is found when applying these tests, then an HPP can be used, and an exponential distribution should be fitted to failure times). It is important to notice that suspended data are not considered in trend testing, as, to the authors' knowledge, no technique is currently available to assess the existence of trends incorporating the effect of censored observations, with the exception of the Nelson-Aalen plot. This plot, as it counts the number of systems in operation before each failure, may indirectly use suspensions in the trend assessment calculations. This is not necessarily a shortcoming of the framework proposed, since the tests reviewed usually need few failure times to be validated (using only failures and ignoring censored

Two-sample techniques allow for evaluation of the pooling

of censored data sets, so that intensity functions or distributions
may be tted using more information. In a repairable systems
approach, the simplest extension of the treatment of a single
system to multiple systems is when they are all observed from the
same time, as in this combination the group may be thought of as
a single system. In this case, the rate of occurrences of failures for
the combined set must be divided by the number of systems to
determine the ROCOF for a single unit.
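The Laplace-based branch of the procedure can be sketched as a small decision helper. This is only an illustration of the rules stated in the text (a standard normal reference distribution, HPP retained when no trend is found, and the sign of L indicating the type of trend). The function name, the two-sided critical value and the wording of the returned messages are assumptions, and the sketch does not reproduce the full diagram of Fig. 6.

```python
from statistics import NormalDist


def interpret_laplace(L, alpha=0.05):
    """Map a Laplace statistic to the modelling route suggested by the test."""
    # Two-sided critical value of the standard normal distribution.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(L) <= critical:
        return "no trend detected: HPP not rejected, fit an exponential distribution"
    if L > 0:
        return "increasing trend (deterioration): consider an NHPP"
    return "decreasing trend (improvement): consider an NHPP"


print(interpret_laplace(2.189))        # value reported for backhoe #7
print(interpret_laplace(0.4))
print(interpret_laplace(-3.0, 0.01))
```

The same pattern applies to the LR statistic, which shares the standard normal reference distribution but has the RP as its null hypothesis.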

6. Case study
Failure data coming from a fleet of backhoes, collected between 1998 and 2003, are used in the following numerical example to illustrate the use of the trend tests and selection procedure described in the paper. This equipment is operated by a construction firm in the United States. The data consist of the age at failure for each of 11 pieces of equipment, with a total of 43 failures. Table A1 in the Appendix presents the complete data set. The following example considers two cases: (i) single-system analysis, for which all calculations are based on backhoe #7 (with 7 failures during the observation period) and (ii) multiple-systems analysis, using the pooled data for all 11 backhoes. Time to failure is expressed in operating hours.


6.1. Single-system example

Backhoe number 7 had seven failures during the years 1998–2003. The times between failures are: 3090, 850, 904, 166, 395, 242 and 496 h. By looking at these numbers, deterioration appears to be present in the data, that is, times between failures seem to become shorter as the equipment ages. Fig. 7 (cumulative failures vs. time plot) confirms this belief. Similar plots are obtained for 7 out of the 11 backhoes analyzed. When plotting the successive service lives, only two points lie outside the main cluster (implying that one failure time might be an anomaly, see Fig. 8). This is the first service life of 3090 hours, which is much larger than the rest. This failure time (as well as all other failure times in the data set) was validated by the fleet operator; thus no need for further revision or elimination of data is identified.
According to the procedure depicted in Fig. 6, the Mann test could be used to confirm the finding obtained from the cumulative failures vs. time plot, that a trend is present in the data. In this example, M = 0 + 1 + 0 + 3 + 1 + 1 = 6. This number differs significantly from the expected value of 10.5. Actually, it is significantly low, as $P(M < M \mid H_1) = 0.881$, implying that the data show a degradation trend ($H_1$ is the alternative hypothesis, of degradation trend). As a trend was identified, a renewal approach is not valid for modeling time to failure, and a repairable systems approach is required. Alternatively, testing through parametric tests such as the Laplace, Lewis-Robinson or Military Handbook tests was performed, with similar results (NHPP with monotonic intensity is the model selected). The Laplace statistic L equals 2.189 for backhoe #7, given that the process in this case is failure truncated (i.e. the final point of the observation interval is given by the failure time of the last recorded failure event). This result is statistically significant (at 0.05 significance level), indicating an increasing trend in the intensity of the failure process (i.e. degradation). The LR statistic equals 10.187 and the MH statistic equals 3.5698, leading us to the same conclusion (the latter at 0.01 significance level). Fitting of an intensity function to the data is the next step considered in the procedure presented in Fig. 6, but will not be included here for brevity. It is important to mention that not all tests need be performed in this case. On the contrary, the idea is to choose a test that accommodates the user and, only in the case that not enough evidence is available, then validate its
results using a second testing technique.
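The Mann, Laplace and Military Handbook figures quoted for backhoe #7 can be checked directly from the seven times between failures listed above. The short Python sketch below does so with the standard library only; the variable names are hypothetical, and the two-sided normal p-value is an added illustration rather than a quantity reported in the text.

```python
import math
from itertools import accumulate
from statistics import NormalDist

tbf = [3090, 850, 904, 166, 395, 242, 496]   # hours, as listed in the text
ages = list(accumulate(tbf))                 # ages at failure
b = ages[-1]          # failure truncated: observation ends at the last failure
used = ages[:-1]      # n-hat = n - 1 = 6 failure ages enter the statistics
n_hat = len(used)

# Mann test: reverse arrangements among the times between failures.
n = len(tbf)
M = sum(1 for i in range(n - 1) for j in range(i + 1, n) if tbf[i] < tbf[j])
expected_M = n * (n - 1) / 4

# Laplace and Military Handbook statistics with a = 0.
L = (sum(used) - 0.5 * n_hat * b) / math.sqrt(n_hat * b ** 2 / 12)
MH = 2 * sum(math.log(b / t) for t in used)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(L)))

print(M, expected_M)                 # 6 10.5
print(f"L  = {L:.3f}")               # the text reports 2.189
print(f"MH = {MH:.4f}")              # the text reports 3.5698
print(f"two-sided p for L = {p_two_sided:.3f}")
```

Dropping the last failure age from the sums is what makes the calculation failure truncated, which is the case described above for this backhoe.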

6.2. Multiple-system example

If we were interested in modeling the time to failure for the pooled group of backhoes, the existence of trends in the pooled behavior should be assessed. A common mistake is to assume that every piece of equipment operated for the same number of hours over the entire interval, which in this case is not true. As these backhoes are operated by a construction company in different projects, during the same observation period of 1998–2003, some of them operated more than 6000 h, whereas others barely reached 2500 h of operation. Then, for the performance of graphical tests such as the TTT or Nelson-Aalen plots, the modeler should take special care in identifying the number of units in operation for different ages of the fleet. Note that time is expressed in operating hours, so in the superposed failure process, 11 backhoes can be considered to be in operation just for the interval between t = 0 and t = 2028 operating hours. For more advanced ages of the fleet, the number of backhoes in operation decreases. Fig. 9 shows a TTT plot constructed for this example. From the form of the curve, a clear indication of an increasing trend in the intensity of failures is observed (i.e. degradation). Fig. 10 presents a Nelson-Aalen plot for the same data, again suggesting the same conclusion.
Results for the combined Laplace and Military Handbook tests are the following: L = 3.096 (significant indicator of degradation, at 0.01 significance level) and MH = 31.859 (significant indicator of degradation, at 0.01 significance level). Then, the pooled failure behavior for the fleet effectively presents a trend; thus an RP cannot be assumed (the use of an NHPP is suggested). This was expected not only because the graphical tests indicated a similar trend, but also because the results obtained for single systems showed that many of the backhoes individually presented a deterioration trend. The next step in this example would be, then, to estimate parameters for an NHPP representing the failure process of the backhoes. Coetzee [30] contains expressions for parameter estimation and goodness-of-fit procedures for an NHPP.

Fig. 7. Cumulative failures vs. time plot: backhoe #7. (Axis: age (hours).)
Fig. 8. Successive service life plot: backhoe #7. (Axes: service life of the (i-1)th failure (hours); service life of the ith failure (hours).)
Fig. 9. TTT plot: all backhoes combined. Increasing intensity is suggested. (Axes: scaled failure number; TTT statistic.)
Fig. 10. Nelson-Aalen plot: all backhoes combined. Increasing intensity is suggested. (Axis: cumulative intensity.)

Table A1. Failure times for a fleet of 11 backhoes. (Columns: Equipment #; Failure #; Age at failure (h); TBF (h).)

7. Conclusions
This paper reviews several tests available to assess the
existence of trends, and proposes a practical procedure to
discriminate between (i) the common renewal approach to model
time to failure and (ii) the use of a non-stationary model such as
the NHPP, which is a model believed to be subject to an easy
practical implementation, within the alternatives available
for a repairable systems approach. The procedure suggested
is simple, yet it is believed that it will lead to better representation of the failure processes commonly found in industrial
operations. Through numerical examples, the use of the several
tests reviewed is illustrated. Some practical problems that
one may encounter when analyzing reliability data are also
briefly discussed and references are given in each case for further

Acknowledgements

We would like to thank Dr. Dragan Banjevic, of the Center for
Maintenance Optimization and Reliability Engineering at the
University of Toronto, for his valuable comments on an earlier
version of this paper.

Appendix. Data used in the numerical example

See Table A1.

References

[1] Dekker R, Scarf PA. On the impact of optimisation models in maintenance decision making: the state of the art. Reliability Engineering and System Safety 1998;60:111–9.
[2] Ansell JI, Phillips MJ. Strategies for reliability data analysis. In: Comer P, editor. Proceedings of the 11th advances in reliability technology symposium. London: Elsevier; 1990.
[3] Scarf PA. On the application of mathematical models in maintenance. European Journal of Operational Research 1997;99:493–506.
[4] Ansell JI, Phillips MJ. Practical problems in the statistical analysis of reliability data (with discussion). Applied Statistics 1989;38:205–31.
[5] Jardine AKS, Tsang AHC. Maintenance, replacement and reliability: theory and applications. Boca Raton: CRC Press; 2006.
[6] Campbell JD. Uptime: strategies for excellence in maintenance management. Portland: Productivity Press; 1995.
[7] Campbell JD, Jardine AKS, editors. Maintenance excellence: optimizing equipment life-cycle decisions. New York: Marcel Dekker; 2001.
[8] Bendell T. An overview of collection, analysis, and application of reliability data in the process industries. IEEE Transactions on Reliability 1998;37:


[9] Percy DF, Kobbacy KAH, Fawzi BB. Setting preventive maintenance schedules when data are sparse. International Journal of Production Economics
[10] Ansell JI, Phillips MJ. Discussion of practical problems in the statistical analysis of reliability data (with discussion). Applied Statistics 1989;38:
[11] Barlow RE, Proschan F. Inference for the exponential life distribution. In: Serra A, Barlow RE, editors. Theory of reliability, Proceedings of the International School of Physics Enrico Fermi. Amsterdam: North-Holland; 1986. p. 143–64.
[12] Lindley DV, Singpurwalla ND. Reliability and fault tree analysis using expert opinions. Journal of the American Statistical Association 1986;81:87–90.
[13] Singpurwalla ND. Foundational issues in reliability and risk analysis. SIAM Review 1988;30:264–81.
[14] Stamatelatos M, et al. Probabilistic risk assessment procedures guide for NASA managers and practitioners. Washington, DC: Office of Safety and Mission Assurance NASA Headquarters; 2002.
[15] Meeker WQ, Escobar LA. Statistical methods for reliability data. New York: Wiley; 1998.
[16] O'Connor PDT. Practical reliability engineering. 3rd ed. New York: Wiley; 1991.
[17] Mann NR, Shafer RE, Singpurwalla ND. Methods for statistical analysis of reliability and life data. New York: Wiley; 1974.
[18] Barlow RE, Proschan F. Mathematical theory of reliability. New York: Wiley;
[19] Tsang AH, Jardine AKS. Estimators of 2-parameter Weibull distributions from incomplete data with residual lifetimes. IEEE Transactions on Reliability
[20] Bohoris GA. Parametric statistical techniques for the comparative analysis of censored reliability data: a review. Reliability Engineering and System Safety
[21] Bohoris GA, Walley DM. Comparative statistical techniques in maintenance management. IMA Journal of Mathematics Applied in Business and Industry
[22] Ascher HE, Feingold H. Repairable systems reliability. Modeling, inference, misconceptions and their causes. New York: Marcel Dekker; 1984.
[23] Saldanha PLC, de Simone EA, Frutoso e Melo PF. An application of non-homogeneous Poisson point processes to the reliability analysis of service water pumps. Nuclear Engineering and Design 2001;210:125–33.
[24] Weckman GR, Shell RL, Marvel JH. Modeling the reliability of repairable systems in the aviation industry. Computers and Industrial Engineering
[25] Rigdon SE, Basu AP. Statistical methods for the reliability of repairable systems. New York: Wiley; 2000.
[26] Thompson WA. On the foundations of reliability. Technometrics 1981;23:
[27] Calabria R, Pulcini G. Inference and test in modeling the failure/repair process of repairable mechanical equipments. Reliability Engineering and System Safety 2000;67:41–53.

[28] Brown M, Proschan F. Imperfect repair. Journal of Applied Probability
[29] Kijima M. Some results for repairable systems with general repair. Journal of Applied Probability 1989;26:89–102.
[30] Coetzee J. The role of NHPP models in the practical analysis of maintenance failure data. Reliability Engineering and System Safety 1997;56:161–8.
[31] Kvaløy JT, Lindqvist BH. TTT-based tests for trend in repairable systems data. Reliability Engineering and System Safety 1998;60:13–28.
[32] Miller AG, Kaufer B, Carlsson L. Activities on component reliability under the OECD Nuclear Energy Agency. Nuclear Engineering and Design 2000;198:
[33] Ascher HE, Kobbacy KAH. Modelling preventive maintenance for repairable systems. IMA Journal of Mathematics Applied in Business and Industry
[34] Baker RD. Some new tests of the power law process. Technometrics 1996;38:
[35] Knights PF, Segovia R. Reliability model for the optimal replacement of shovel cables. Transactions of the Institution of Mining and Metallurgy, Section A: Mining Industry 1999;108:A8–A16.
[36] Bergman B. On age replacement and the total time on test concept. Scandinavian Journal of Statistics 1979;6:161–8.
[37] Xie M, Goh TN, Ranjan P. Some effective control chart procedures for reliability monitoring. Reliability Engineering and System Safety 2002;
[38] Elvebakk G. Extending the use of some traditional trend tests for repairable systems data by resampling techniques, 1999. /
[39] Mann HB. Nonparametric tests against trend. Econometrica 1945;13:245–59.
[40] Lewis PA, Robinson DW. Testing for monotone trend in a modulated renewal process. In: Proschan F, Serfling RJ, editors. Reliability and biometry. Philadelphia: SIAM; 1974. p. 163–82.
[41] Anderson TW, Darling DA. Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Annals of Mathematical Statistics
[42] Vaurio JK. Identification of process and distribution characteristics by testing monotonic and non-monotonic trends in failure intensities and hazard rates. Reliability Engineering and System Safety 1999;64:345–57.
[43] Tukey JW. The future of data analysis. Annals of Mathematical Statistics
[44] Walls L, Quigley J. Building prior distributions to support Bayesian reliability growth modelling using expert judgement. Reliability Engineering and System Safety 2001;74(2):117–28.
[45] Guikema SD, Paté-Cornell ME. Probability of infancy problems for space launch vehicles. Reliability Engineering and System Safety 2005;87:
[46] Viertava J, Vaurio JK. Testing statistical significance of trends in learning, ageing and safety indicators. Reliability Engineering and System Safety