fm Page 1 Monday, January 24, 2005 11:11 PM


Although we rarely think of it, reliability and maintenance
are part of our everyday lives. The equipment, manufactured
products, and fabricated infrastructure that contribute sub-
stantively to the quality of our lives have finite longevity. Most
of us recognize this fact, but we do not always fully perceive
the implications of finite system life for our efficiency and
safety. Many, but not all, of us also appreciate the fact that
our automobiles require regular service, but we do not gen-
erally think about the fact that roads and bridges, smoke
alarms, electricity generation and transmission devices, and
many others of the machines and facilities we use also require
regular maintenance.
We are fortunate to live at a time in which advances in
understanding of materials and energy have resulted in the
creation of an enormous variety of sophisticated products and
systems, many of which (1) were inconceivable 100 or 200 or
even 20 years ago, (2) contribute regularly to our comfort,
health, happiness, efficiency, or success, (3) are relatively inex-
pensive, and (4) require little or no special training on our
part. Naturally, our reliance on these devices and systems is
continually increasing, and we rarely think about failure and
the consequences of failure.

Occasionally, we observe a catastrophic failure. Fatigue
failures of the fuselage of aircraft [1], the loss of an engine
by a commercial jet [1], the Three Mile Island [1] and Cher-
nobyl [1] nuclear reactor accidents, and the Challenger [2]
and Columbia [3] space shuttle accidents are all widely known
examples of catastrophic equipment failures. The relay circuit
failure at the Ohio power plant that precipitated the August
2003 power blackout in the northeast United States and in
eastern Canada [4] is an example of a system failure that
directly affected millions of people. When these events occur,
we are reminded dramatically of the fallibility of the physical
systems on which we depend.

Downloaded by [Engineers Australia ] at 01:24 15 May 2014

Nearly everyone has experienced less dramatic product
failures such as that of a home appliance, the wearout of a
battery, and failure of a light bulb. Many of us have also
experienced potentially dangerous examples of product fail-
ures such as the blowout of an automobile tire.
Reliability engineering is the study of the longevity and
the failure of equipment. Principles of science and mathemat-
ics are applied to the investigation of how devices age and
fail. The intent is that a better understanding of device failure
will aid in identifying ways in which product designs can be
improved to increase life length and limit the adverse conse-
quences of failure. The key point here is that the focus is upon
design. New product and system designs must be shown to
be safe and reliable prior to their fabrication and use. A
dramatic example of a design for which the reliability was
not properly evaluated is the well-known case of the Tacoma
Narrows Bridge, which collapsed into the Puget Sound in
November 1940, a few months after its completion [1].
The study of the reliability of an equipment design also
has important economic implications for most products. As
Blanchard [5] states, 90% of the life-cycle costs associated
with the use of a product are fixed during the design phase
of a product’s life.
Similarly, an ability to anticipate failure can often imply
the opportunity to plan for the efficient repair of equipment
when it fails or, even better, to perform preventive maintenance
in order to reduce failure frequency.

There are many examples of products for which
system reliability is far better today than it was previously.
One familiar example is the television set, which historically
experienced frequent failures and which, at present, usually
operates failure free beyond its age of obsolescence. Improved
television reliability is certainly due largely to advances in
circuit technology. However, the ability to evaluate the reli-
ability of new material systems and of new circuit designs
has also contributed to the gains we have experienced.
Perhaps the most widely recognized system for which preventive
maintenance is used to maintain product reliability
is the commercial airplane. Regular inspection, testing, repair,
and even overhaul are part of the normal operating life of
every commercial aircraft. Clearly, the reason for such intense
concern for the regular maintenance of aircraft is an appre-
ciation of the influence of maintenance on failure probabilities
and thus on safety.
On a personal level, the products for which we are most
frequently responsible for maintenance are our automobiles.
We are all aware of the inconvenience associated with an in-
service failure of our cars and we are all aware of the rela-
tively modest level of effort required to obtain the reduced
failure probability that results from regular preventive
maintenance.
It would be difficult to overstate the importance of main-
tenance and especially preventive maintenance. It is also dif-
ficult to overstate the extent to which maintenance is
undervalued or even disliked. Historically, repair and espe-
cially preventive maintenance have often been viewed as
inconvenient overhead activities that are costly and unpro-
ductive. Very rarely have the significant productivity benefits
of preventive maintenance been recognized and appreciated.
Recent reports [6,7,8] suggest that it is common
for factory equipment to lose 10 to 40% of
productive capacity to unscheduled repairs and that preventive
maintenance could drastically reduce these losses. In fact,
the potential productivity gains associated with the use of
preventive maintenance strategies to reduce the frequency of
unplanned failures constitute an important competitive
opportunity [8]. The key to exploiting this opportunity is careful
planning based on cost and reliability.
This book is devoted to the analytical portrayal and eval-
uation of equipment reliability and maintenance. As with all
engineering disciplines, the language of description is math-
ematics. The text provides an exploration of the mathematical
models that are used to portray, estimate, and evaluate device
reliability and those that are used to describe, evaluate, and
plan equipment service activities. In both cases, the focus is
on design. The models of equipment reliability are the primary
vehicle for recognizing deficiencies or opportunities to
improve equipment designs. Similarly, using reliability as a
basis, the models that describe equipment performance as a
function of maintenance effort provide a means for selecting
the most efficient and effective equipment service strategies.
The examples of various failures mentioned above share
some common features, and they also have differences that
are used here to delimit the extent of the analyses and dis-
cussions. Common features are that (1) product failure is
sufficiently important that it warrants engineering effort to
try to understand and control it, and (2) product design is
complicated so the causes and consequences of failure are not
obvious.
There are also some important differences among the
examples. Taking an extreme case, the failure of a light bulb
and the Three Mile Island reactor accident provide a defining
contrast. The accident at Three Mile Island was precipitated
by the failure of a physical component of the equipment. The
progress and severity of the accident were also influenced by
the response by humans to the component failure and by
established decision policies. In contrast, the failure of a light
bulb and its consequences are not usually intertwined with
human decisions and performance. The point here is that there
are very many modern products and systems for which opera-
tional performance depends upon the combined effectiveness of
several of the following: (1) the physical equipment, (2) human
operators, (3) software, and (4) management protocols.
It is both reasonable and prudent to attempt to include
the evaluation of all four of these factors in the study of
system behavior. However, the focus of this text is analytical,
and the discussions are limited to the behavior of the physical
equipment.
Several authors have defined analytical approaches to
modeling the effects of humans [9] and of software [10] on
system reliability. The motivation for doing this is the view
that humans cause more system failures than does equip-
ment. This view seems quite correct. Nevertheless, implemen-
tation of the existing mathematical models of human and
software reliability requires the acceptance of the view that
probability models appropriately represent dispersion in
human behavior. In the case of software, existing models are
based on the assumption that probability models effectively
represent hypothesized evolution in software performance
over time. The appropriateness of both of these points of view
is subject to debate. It is considered here that the human
operators of a system do not comprise a homogeneous popu-
lation for which performance is appropriately modeled using
a probability distribution. Similarly, software and operating
protocols do not evolve in a manner that one would model
using probability functions. As the focus of this text is the
definition of representative probability models and their anal-
ysis, the discussion is limited to the physical devices.
The space shuttle accidents serve to motivate our focus
on the physical behavior of equipment. The 1986 Challenger
accident has been attributed to the use of the vehicle in an
environment that was more extreme than the one for which
it was designed. The 2003 Columbia accident is believed to
have been the result of progressive deterioration at the site
of damage to its heat shield. Thus, the physical design of the
vehicles and the manner in which they were operated were
incompatible, and it is the understanding of this interface
that we obtain from reliability analysis.
The text is organized in four general sections. The early
chapters describe in a stepwise manner the increasingly complete
models of reliability and failure. These initial discussions
include the key result that our understanding of design
configurations implies that system reliability can usually
be studied at the component level. This is followed by an
examination of statistical methods for estimating reliability.
A third section comprises five chapters that treat increasingly
more complicated and more realistic models of equipment
maintenance activities. Finally, several advanced topics
are treated in the final chapter.
It is hoped that this sequence of discussions will provide
the reader with a basis for further exploration of the topics
treated. The development of new methods and models for
reliability and maintenance has expanded our understanding
significantly and is continuing. The importance of preventive
maintenance for safety and industrial productivity is receiving
increased attention. The literature reporting
new ideas is expanding rapidly. This book is
intended to prepare the reader to understand and use the
new ideas as well as those that are included here.
As a starting point, note that it often happens that tech-
nical terms are created using words that already have collo-
quial meanings that do not correspond perfectly with their
technical usage. This is true of the word reliability. In the
colloquial sense, the word reliable is used to describe people
who meet commitments. It is also used to describe equipment
and other inanimate objects that operate satisfactorily. The
concept is clear but not particularly precise. In contrast, for
the investigations we undertake in this text, the word reli-
ability has a precise technical definition. This definition is the
departure point for our study.