You are on page 1of 395

RELIABILITY

ENGINEERING
Probabilistic Models and
Maintenance Methods
Second Edition
RELIABILITY
ENGINEERING
Probabilistic Models and
Maintenance Methods
Second Edition

JOEL A. NACHLAS

Boca Raton London New York

CRC Press is an imprint of the


Taylor & Francis Group, an informa business
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20161019

International Standard Book Number-13: 978-1-4987-5247-3 (Pack - Book and Ebook)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copy-
right.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that pro-
vides licenses and registration for a variety of users. For organizations that have been granted a photo-
copy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Dedicated to the memory of Betty Nachlas
Contents

Preface ................................................................................................................... xiii


Author .....................................................................................................................xv

1 Introduction ..................................................................................................... 1

2 System Structures ...........................................................................................5


2.1 Status Functions .................................................................................... 5
2.2 System Structures and Status Functions ...........................................7
2.2.1 Series Systems ..........................................................................7
2.2.2 Parallel System ......................................................................... 8
2.2.3 k-out-of-n Systems .................................................................. 10
2.2.4 Equivalent Structures ............................................................ 12
2.3 Modules of Systems ............................................................................ 17
2.4 Multistate Components and Systems ............................................... 18
Exercises .......................................................................................................... 19

3 Reliability of System Structures ............................................................... 23


3.1 Probability Elements........................................................................... 23
3.2 Reliability of System Structures ........................................................ 24
3.2.1 Series Systems ........................................................................ 24
3.2.2 Parallel Systems...................................................................... 25
3.2.3 k-out-of-n Systems .................................................................. 25
3.2.4 Equivalent Structures ............................................................ 26
3.3 Modules ................................................................................................ 31
3.4 Reliability Importance ........................................................................ 32
3.5 Reliability Allocation .......................................................................... 35
3.6 Conclusion............................................................................................ 36
Exercises .......................................................................................................... 37

4 Reliability over Time ................................................................................... 39


4.1 Reliability Measures ........................................................................... 39
4.2 Life Distributions ................................................................................44
4.2.1 Exponential Distribution ...................................................... 45
4.2.2 Weibull Distribution.............................................................. 46
4.2.3 Normal Distribution .............................................................. 49
4.2.4 Lognormal Distribution ........................................................ 51
4.2.5 Gamma Distribution ............................................................. 52
4.2.6 Other Distributions ............................................................... 52
4.3 System-Level Models ..........................................................................54
Exercises .......................................................................................................... 58

vii
viii Contents

5 Failure Processes ........................................................................................... 61


5.1 Mechanical Failure Models ............................................................... 62
5.1.1 Stress–Strength Interference ................................................ 62
5.1.2 Shock and Cumulative Damage ..........................................64
5.2 Electronic Failure Models .................................................................. 71
5.2.1 Arrhenius Model .................................................................... 71
5.2.2 Eyring Model .......................................................................... 72
5.2.3 Power Law Model .................................................................. 72
5.2.4 Defect Model........................................................................... 72
5.3 Other Failure Models.......................................................................... 73
5.3.1 Diffusion Process Model....................................................... 73
5.3.2 Proportional Hazards ........................................................... 78
5.3.3 Competing Risks ....................................................................80
Exercises ..........................................................................................................83

6 Age Acceleration ...........................................................................................85


6.1 Age Acceleration for Electronic Devices .......................................... 87
6.2 Age Acceleration for Mechanical Devices ....................................... 89
6.3 Step Stress Strategies .......................................................................... 92
6.4 Concluding Comment ........................................................................ 93
Exercises .......................................................................................................... 93

7 Nonparametric Statistical Methods .......................................................... 95


7.1 Data Set Notation and Censoring ..................................................... 96
7.2 Estimates Based on Order Statistics ................................................. 98
7.3 Estimates and Confidence Intervals ................................................. 99
7.4 Kaplan–Meier Estimates .................................................................. 102
7.4.1 Continuous Monitoring of Test Unit Status ..................... 102
7.4.2 Periodic Monitoring of Test Unit Status ........................... 105
7.5 Tolerance Bounds .............................................................................. 107
7.6 TTT Transforms ................................................................................. 109
7.6.1 Theoretical Construction .................................................... 109
7.6.2 Application to Complete Data Sets.................................... 113
7.6.3 Application to Censored Data Sets.................................... 118
7.7 Nelson Cumulative Hazard Estimation Method ......................... 122
Exercises ........................................................................................................ 124

8 Parametric Statistical Methods ................................................................ 129


8.1 Graphical Methods ........................................................................... 129
8.2 Method of Moments ......................................................................... 135
8.2.1 Estimation Expressions ....................................................... 136
8.2.2 Confidence Intervals for the Estimates ............................. 139
8.3 Method of Maximum Likelihood ................................................... 143
8.4 Maximum Likelihood Method with Data Censoring ................. 159
Contents ix

8.5 Special Topics..................................................................................... 161


8.5.1 Method of Moments with Censored Data ........................ 161
8.5.2 Data Analysis under Step Stress Testing .......................... 164
Exercises ........................................................................................................ 167

9 Repairable Systems I: Renewal and Instantaneous Repair ............... 173


9.1 Renewal Processes ............................................................................ 174
9.2 Classification of Distributions and Bounds on Renewal
Measures ............................................................................................ 181
9.3 Residual Life Distribution ............................................................... 186
9.4 Conclusion.......................................................................................... 189
Exercises ........................................................................................................ 190

10 Repairable Systems II: Nonrenewal and Instantaneous Repair ....... 193


10.1 Minimal Repair Models ................................................................... 194
10.2 Imperfect Repair Models ................................................................. 200
10.3 Equivalent Age Models .................................................................... 203
10.3.1 Kijima Models ...................................................................... 203
10.3.2 Quasi-Renewal Process ....................................................... 210
10.4 Conclusion.......................................................................................... 214
Exercises ........................................................................................................ 214

11 Availability Analysis ................................................................................. 217


11.1 Availability Measures....................................................................... 220
11.2 Example Computations ....................................................................223
11.2.1 Exponential Case ................................................................. 223
11.2.2 Numerical Case ....................................................................225
11.3 System-Level Availability ................................................................ 227
11.4 Nonrenewal Cases ............................................................................ 232
11.4.1 Availability under Imperfect Repair ................................. 233
11.4.2 Availability Analysis for the Quasi-Renewal Model ...... 235
11.5 Markov Models ................................................................................. 239
Exercises ........................................................................................................ 245

12 Preventive Maintenance............................................................................ 247


12.1 Replacement Policies......................................................................... 248
12.1.1 Elementary Models .............................................................. 248
12.1.2 Availability Model for Age Replacement ......................... 253
12.1.3 Availability Model for Block Replacement....................... 255
12.1.4 Availability Model for Opportunistic Age
Replacement ................................................................... 257
12.1.4.1 Failure Model ........................................................ 262
12.1.4.2 Opportunistic Failure Replacement Policy ...... 265
x Contents

12.1.4.3 Partial Opportunistic Age Replacement


Policy ................................................................ 268
12.1.4.4 Full Opportunistic Age Replacement Policy .... 271
12.1.4.5 Analysis of the Opportunistic Replacement
Models.................................................................... 271
12.2 Nonrenewal Models ......................................................................... 274
12.2.1 Imperfect PM Models .......................................................... 275
12.2.2 Models Based on the Quasi-Renewal Process ................. 277
12.2.3 Models Based on the Kijima Model .................................. 281
12.3 Conclusion.......................................................................................... 283
Exercises ........................................................................................................284

13 Predictive Maintenance............................................................................. 287


13.1 System Deterioration ........................................................................ 288
13.2 Inspection Scheduling ...................................................................... 289
13.3 More Complete Policy Analysis ...................................................... 290
13.4 Models and Analysis Based on Continuous Process
Monitoring ......................................................................................... 294
13.4.1 Observable Degradation Processes ................................... 294
13.4.2 Unobservable Degradation Processes ............................... 297
13.4.2.1 Time Series Methods ........................................... 298
13.4.2.2 Conditional Probability Methods ......................300
13.5 Conclusion..........................................................................................304
Exercises ........................................................................................................305

14 Special Topics .............................................................................................. 307


14.1 Statistical Analysis of Repairable System Data............................. 307
14.1.1 Data from a Single System .................................................. 307
14.1.2 Data from Multiple Identical Systems .............................. 310
14.2 Warranties .......................................................................................... 314
14.2.1 Full Replacement Warranties ............................................. 315
14.2.2 Pro Rata Warranties............................................................. 317
14.3 Reliability Growth ............................................................................ 319
14.4 Dependent Components .................................................................. 323
14.5 Bivariate Reliability........................................................................... 325
14.5.1 Collapsible Models............................................................... 326
14.5.2 Bivariate Models................................................................... 327
14.5.2.1 Stochastic Functions ............................................ 327
14.5.2.2 Correlation Models .............................................. 330
14.5.2.3 Probability Analysis............................................. 331
14.5.2.4 Failure and Renewal Models .............................. 335
Exercises ........................................................................................................ 341
Contents xi

Appendix A: Numerical Approximations ....................................................343


Appendix B: Numerical Evaluation of the Weibull Renewal
Functions ......................................................................................................... 347
Appendix C: Laplace Transform for the Key Renewal Theorem ............. 353
Appendix D: Probability Tables ..................................................................... 355
References ........................................................................................................... 359
Index ..................................................................................................................... 365
Preface

The motivation for the preparation of a second edition was my wish to expand
the treatment of several topics while maintaining an integrated introductory
resource for the study of reliability evaluation and maintenance planning.
The focus across all of the topics treated is the use of analytical methods to
support the design of dependable and efficient equipment and the planning
for the servicing of that equipment. The orientation of the topical develop-
ment is that probability models provide an effective vehicle for portraying
and evaluating the variability that is inherent in the performance and lon-
gevity of equipment.
The book is intended to support either an introductory graduate course
in reliability theory and preventive maintenance planning or a sequence of
courses that address these topics. A fairly comprehensive coverage of the
basic models and of various methods of analysis is provided. An under-
standing of the topics discussed should permit the reader to comprehend
the literature describing new and advanced models and methods.
Notwithstanding the emphasis upon initial study, the text should also
serve well as a resource for practicing engineers. Engineers who are involved
in the design process should find a coherent explanation of the reliability
and maintenance issues that will influence the success of the devices they
create. Similarly, engineers responsible for the analysis and verification of
product reliability or for the planning of maintenance support of fielded
equipment should find the material presented here to be relevant and easy
to access and use.
In preparing this second edition, the treatment of statistical methods for
evaluating reliability has been expanded substantially. Several methods for
constructing confidence intervals as part of the parametric estimation effort
are described and methods for treating data derived from operating repair-
able devices have also been added. In addition, the analysis of nonstation-
ary models of repairable equipment maintenance has been updated and
expanded. These expansions along with numerous other minor improve-
ments to the text should make this book an even more useful resource for
both students and practitioners.
The background required of the reader is a sound understanding of prob-
ability. This subsumes capability with calculus. More specifically, the reader
should have an understanding of distribution theory, Laplace transforms,
convolutions, stochastic processes, and Markov processes. It is also worth
mentioning that the use of the methods discussed in this book often involves
substantial computational effort, so facility with numerical methods and
access to efficient mathematical software is desirable.

xiii
xiv Preface

One caveat concerning the coverage here is that the treatment is strictly
limited to hardware. Reliability and maintenance models have been devel-
oped for applications to software, humans, and services systems. No criti-
cism of those efforts is intended but the focus here is simply hardware.
The organization of the text is reasonably straightforward. The elemen-
tary concepts of reliability theory are presented sequentially in Chapters 1
through 6. Following this, the commonly used statistical methods for eval-
uating component reliability are described in Chapters 7 and 8. Chapters
9 through 13 treat repairable systems and maintenance planning models.
Here again the presentation is sequential in that simple failure models pre-
cede those that include preventive actions and the renewal cases are treated
before the more realistic nonrenewal cases. In the final chapter, four inter-
esting special topics, including warranties, are discussed. It is worth noting
that four appendices that address aspects of numerical computation are pro-
vided. These should be quite useful to the reader.
Naturally, many people have contributed to the preparation of this text.
The principal factor in the completion of this book was the support and
encouragement of my wife Beverley. An important practical component of
my success was the support of Virginia Tech, especially during sabbaticals
when progress with writing is so much easier.
I acknowledge the significant computational capability provided to me by
the Mathematica software. Many of the analyses included in this text would
have been much more taxing or even impossible without the strength and
efficiency the Wolfram software provides.
I also wish to extend my thanks directly to three of my students, each of
whom contributed to my efforts. Edvin Beqari stimulated my increased inter-
est in and analysis of the diffusion models of degradation. He also directed
much of my analysis of that topic. Elliott Mitchell-Colgan helped to expand
the sets of exercises included at the end of the chapters. Paul D’Agostino
invested very many hours in verifying a majority of the complicated numeri-
cal analyses used for examples or for exercise solutions.
I express my profound gratitude to all of my graduate students who have
taught me so much about these topics over the years. May we all continue to
learn and grow and to enjoy the study of this important subject.
Author

Joel A. Nachlas received his BES from Johns Hopkins University in 1970, his
MS in 1972 and his PhD in 1976, both from the University of Pittsburgh. He
served on the faculty of the Grado Department of Industrial and Systems
Engineering at Virginia Tech for 41  years and retired in March 2016. His
research interests are in the applications of probability and statistics to prob-
lems in reliability and quality control. In addition to his normal teaching
activities during his time at Virginia Tech, he served as the coordinator for
the department’s graduate program in operations research and for their
dual master’s degree that is operated with École des Mines de Nantes in
France. From 1992 to 2011, he regularly taught reliability theory at the École
Polytechnique de l’Université Nice Sophia Antipolis. He is the coauthor of
more than 50 refereed articles, has served in numerous editorial and referee
capacities, and has lectured on reliability and maintenance topics through-
out North America and Europe.

xv
1
Introduction

Although we rarely think of it, reliability and maintenance are part of our
everyday lives. The equipment, manufactured products, and fabricated
infrastructure that contribute substantively to the quality of our lives have
finite longevity. Most of us recognize this fact, but we do not always fully
perceive the implications of finite system life for our efficiency and safety.
Many, but not all, of us also appreciate the fact that our automobiles require
regular service, but we do not generally think about the fact that roads and
bridges, smoke alarms, electricity generation and transmission devices, and
many other machines and facilities we use also require regular maintenance.
We are fortunate to live at a time in which advances in the understanding
of materials and energy have resulted in the creation of an enormous variety
of sophisticated products and systems, many of which (1) were inconceiv-
able 100 or 200 or even 20 years ago; (2) contribute regularly to our comfort,
health, happiness, efficiency, or success; (3) are relatively inexpensive; and
(4) require little or no special training on our part. Naturally, our reliance
on these devices and systems is continually increasing and we rarely think
about failure and the consequences of failure.
Occasionally, we observe a catastrophic failure. Fatigue failures of the fuse-
lage of aircraft [1], the loss of an engine by a commercial jet [1], the Three Mile
Island [1] and Chernobyl [1] nuclear reactor accidents, and the Challenger [2]
and Discovery [3] space shuttle accidents are all widely known examples of
catastrophic equipment failures. The relay circuit failure at the Ohio power
plant that precipitated the August 2003 power blackout in the northeastern
United States and in eastern Canada [4] is an example of a system failure
that directly affected millions of people. When these events occur, we are
reminded dramatically of the fallibility of the physical systems on which
we depend.
Nearly everyone has experienced less dramatic product failures such as
that of a home appliance, the wear out of a battery, and the failure of a light
bulb. Many of us have also experienced potentially dangerous examples of
product failures such as the blowout of an automobile tire.
Reliability engineering is the study of the longevity and failure of equip-
ment. Principles of science and mathematics are applied to the investiga-
tion of how devices age and fail. The intent is that a better understanding
of device failure will aid in identifying ways in which product designs can
be improved to increase life length and limit the adverse consequences of
failure. The key point here is that the focus is upon design. New product and

1
2 Reliability Engineering: Probabilistic Models and Maintenance Methods

system designs must be shown to be safe and reliable prior to their fabrica-
tion and use. A dramatic example of a design for which the reliability was
not properly evaluated is the well-known case of the Tacoma Narrows Bridge,
which collapsed into the Puget Sound in November 1940, a few months after
its completion [1].
A more recent example of a design fault with significant consequences is
the 2013 lithium-ion battery fire that occurred on a new Boeing 787 aircraft
while it was parked at the Boston airport [5]. Fortunately, the plane was
empty, so no one was injured, but the fire and two subsequent fires of the
same type resulted in all 787s being grounded until a modification to the
battery containment was made. The cost to the airlines using the planes was
estimated to be $1.1 million per day.
The study of the reliability of an equipment design also has important eco-
nomic implications for most products. As Blanchard [6] states, 90% of the life
cycle costs associated with the use of a product are fixed during the design
phase of a product’s life.
Similarly, an ability to anticipate failure can often imply the opportunity
to plan for an efficient repair of equipment when it fails or even better to per-
form preventive maintenance in order to reduce failure frequency.
There are many examples of products for which system reliability is far
better today than it was previously. One familiar example is the television
set, which historically experienced frequent failures and which, at present,
usually operates without failure beyond its age of obsolescence. Improved
television reliability is certainly due largely to advances in circuit technol-
ogy. However, the ability to evaluate the reliability of new material sys-
tems and new circuit designs has also contributed to the gains we have
experienced.
Perhaps the most well-recognized system for which preventive mainte-
nance is used to maintain product reliability is the commercial airplane.
Regular inspection, testing, repair, and even overhaul are part of the nor-
mal operating life of every commercial aircraft. Clearly, the reason for such
intense concern for the regular maintenance of aircraft is an appreciation of
the influence of maintenance on failure probabilities and thus on safety.
On a personal level, the products for which we are most frequently respon-
sible for maintenance are our automobiles. We are all aware of the inconve-
nience associated with an in-service failure of our cars and we are all aware
of the relatively modest level of effort required to obtain the reduced failure
probability that results from regular preventive maintenance.
It would be difficult to overstate the importance of maintenance and espe-
cially preventive maintenance. It is also difficult to overstate the extent to
which maintenance is undervalued or even disliked. Historically, repair and
especially preventive maintenance have often been viewed as inconvenient
overhead activities that are costly and unproductive. Very rarely have the
significant productivity benefits of preventive maintenance been recognized
and appreciated. Recently, there have been reports [7–9] that suggest that
Introduction 3

it is common experience for factory equipment to lose 10%–40% of pro-


ductive capacity to unscheduled repairs and that preventive maintenance
could drastically reduce these losses. In fact, the potential productivity gains
associated with the use of preventive maintenance strategies to reduce the
frequency of unplanned failures constitute an important competitive oppor-
tunity [9]. The key to exploiting this opportunity is careful planning based
on cost and reliability.
This book is devoted to the analytical portrayal and evaluation of equip-
ment reliability and maintenance. As with all engineering disciplines, the
language of description is mathematics. The text provides an exploration
of the mathematical models that are used to portray, estimate, and evalu-
ate device reliability and those that are used to describe, evaluate, and plan
equipment service activities. In both cases, the focus is on design. The models
of equipment reliability are the primary vehicle for recognizing deficiencies
or opportunities to improve equipment design. Similarly, using reliability
as a basis, the models that describe equipment performance as a function
of maintenance effort provide a means for selecting the most efficient and
effective equipment service strategies.
These examples of various failures share some common features and they
also have differences that are used here to delimit the extent of the analyses
and discussions. Common features are that (1) product failure is sufficiently
important that it warrants engineering effort to try to understand and con-
trol it and (2) product design is complicated, so the causes and consequences
of failure are not obvious.
There are also some important differences among the examples. Taking
an extreme case, the failure of a light bulb and the Three Mile Island reac-
tor accident provide a defining contrast. The Three Mile Island accident was
precipitated by the failure of a physical component of the equipment. The
progress and severity of the accident were also influenced by the response
by humans to the component failure and by established decision policies.
In contrast, the failure of a light bulb and its consequences are not usually
intertwined with human decisions and performance. The point here is that
there are very many modern products and systems for which operational
performance depends upon the combined effectiveness of several of (1) the
physical equipment, (2) human operators, (3) software, and (4) management
protocols.
It is both reasonable and prudent to attempt to include the evaluation of all
four of these factors in the study of system behavior. However, the focus of
this text is analytical and the discussions are limited to the behavior of the
physical equipment.
Several authors have defined analytical approaches to modeling the effects
of humans [10] and of software [11] on system reliability. The motivation for
doing this is the view that humans cause more system failures than does
equipment. This view seems quite correct. Nevertheless, implementation of
the existing mathematical models of human and software reliability requires
4 Reliability Engineering: Probabilistic Models and Maintenance Methods

the acceptance of the view that probability models appropriately represent


dispersion in human behavior. In the case of software, existing models
are based on the assumption that probability models effectively represent
hypothesized evolution in software performance over time. The appropri-
ateness of both of these points of view is subject to debate. It is considered
here that the human operators of a system do not comprise a homogeneous
population for which performance is appropriately modeled using a proba-
bility distribution. Similarly, software and operating protocols do not evolve
in a manner that one would model using probability functions. As the focus
of this text is the definition of representative probability models and their
analysis, the discussion is limited to the physical devices.
The space shuttle accidents serve to motivate our focus on the physical
behavior of equipment. The 1986 Challenger accident has been attributed to
the use of the vehicle in an environment that was more extreme than the one
for which it was designed. The 2002 Discovery accident is believed to have
been the result of progressive deterioration at the site of damage to its heat
shield. Thus, the physical design of the vehicles and the manner in which
they were operated were incompatible and it is the understanding of this
interface that we obtain from reliability analysis.
The text is organized in four general sections. The early chapters describe
in a stepwise manner the increasingly complete models of reliability and
failure. These initial discussions include the key result that our understand-
ing of design configurations usually implies that system reliability can be
studied at the component level. This is followed by an examination of statis-
tical methods for estimating reliability. A third section is comprised of five
chapters that treat increasingly more complicated and more realistic models
of equipment maintenance activities. Finally, several advanced topics are
treated in the final chapter.
It is hoped that this sequence of discussions will provide the reader with
a basis for further exploration of the topics treated. The development of new
methods and models for reliability and maintenance has expanded our
understanding significantly and is continuing. The importance of preventive
maintenance for safety and industrial productivity is receiving increased
attention. The literature that is comprised of reports of new ideas is expand-
ing rapidly. This book is intended to prepare the reader to understand and
use the new ideas as well as those that are included here.
As a starting point, note that it often happens that technical terms are
created using words that already have colloquial meanings that do not cor-
respond perfectly with their technical usage. This is true of the word reli-
ability. In the colloquial sense, the word reliable is used to describe people
who meet commitments. It is also used to describe equipment and other
inanimate objects that operate satisfactorily. The concept is clear but not par-
ticularly precise. In contrast, for the investigations we undertake in this text,
the word reliability has a precise technical definition. This definition is the
departure point for our study.
2
System Structures

The point of departure for the study of reliability and maintenance planning
is the elementary definition of the term reliability. As mentioned in Chapter
1, the technical definition of reliability is similar to the colloquial definition
but is more precise. Formally, the definition is as follows:

Definition 2.1

Reliability is the probability that a device properly performs its intended


function over time when operated within the environment for which it is
designed.

Observe that there are four specific attributes of this definition of reliability.
The four attributes are (1) probability, (2) proper performance, (3) qualifica-
tion with respect to environment, and (4) time. All four are important. Over
this and the next several chapters, we explore a series of algebraic models
that are used to represent equipment reliability. We develop the models suc-
cessively by sequentially including in the models each of the four attributes
identified in the previous definition. To start, consider the representation of
equipment performance to which we refer as function.

2.1 Status Functions


The question of what constitutes proper operation or proper function for a
particular type of equipment is usually specific to the equipment. Rather
than attempt to suggest a general definition for proper function, we assume
that the appropriate definition for a device of interest has been specified and
we represent the functional status of the device as

ì1 if the device functions properly


f=í
î0 if the device is failed

5
6 Reliability Engineering: Probabilistic Models and Maintenance Methods

Note that this representation is intentionally binary. We assume here that the
status of the equipment of interest is either satisfactory or failed. There are
many types of equipment for which one or more derated states are possible.
Discussion of this possibility is postponed until the end of this chapter.
We presume that most equipment is comprised of components and that
the status of the device is determined by the status of the components.
Accordingly, let n be the number of components that make up the device and
define the component status variables, xi, as

ì1 if component i is functioning
xi = í
î0 if component i is failed

so the set of n components that comprise a device is represented by the com-


ponent status vector

x = {x1 , x2 , ⋯ , xn }

Next, we represent the dependence of the device status on the component


status as the function

f = f( x ) (2.1)

and the specific form for the function is determined by the way in which the
components interact to determine system function. In the discussions that
follow, ϕ(x) is referred to as a “system structure function” or as a “system sta-
tus function” or simply as a “structure.” In all cases, the intent is to reflect the
dependence of the system state upon the states of the components that com-
prise the system. A parenthetical point is that the terms “device” and “sys-
tem” are used here in a generic sense and may be interpreted as appropriate.
An observation concerning the component status vector is that it is defined
here as a vector of binary elements so that an n component system has 2n pos-
sible component status vectors. For example, a three-component system has
23 = 8 component status vectors. They are

{1, 1, 1} {1, 0, 1}

{1, 1, 0} {1, 0, 0}

{0, 1, 1} {0, 0, 1}

{0, 1, 0} {0, 0, 0}

Each component status vector yields a corresponding value for the system
status function, ϕ.
System Structures 7

From a purely mathematical point of view, there is no reason to limit the


definition of the system status function so forms that have no practical inter-
pretation can be constructed. In order to avoid any mathematically correct
but practically meaningless forms for the system status function, we limit
our attention to coherent systems.

Definition 2.2

A coherent system is one for which the system structure function is nonde-
creasing in each of its arguments.

This means that for each element of the component status vector, xi, there
exists a realization of the vector for which
f( x1 , … , xi -1 , 0, xi +1 , … , xn ) < f( x1 , … , xi -1 , 1, xi +1 , … , xn ) (2.2)

Throughout our study of reliability, we will limit our attention to algebraic


forms that comply with this restriction.
Generally, we expect that the physical relationships among the compo-
nents determine the algebraic form of the system status function, ϕ.

2.2 System Structures and Status Functions


Among reliability specialists, it is generally accepted that there are four
generic types of structural relationships between a device and its compo-
nents. These are (1) series, (2) parallel, (3) k out of n, and (4) all others. Consider
each of these forms in sequence.

2.2.1 Series Systems


The simplest and most commonly encountered configuration of components
is the series system. The formal definition of a series system is as follows:

Definition 2.3

A series system is one in which all components must function properly in


order for the system to function properly.

The conceptual analog to the series structure is a series-type electrical


circuit. However, unlike a series circuit, it is specifically not implied here that
the components must be physically connected in sequence. Instead, the point
of emphasis is the requirement that all components function. An example of
a series system in which the components are not physically connected is the
8 Reliability Engineering: Probabilistic Models and Maintenance Methods

1 2 3

FIGURE 2.1
Reliability block diagram for a series system.

set of legs of a three-legged stool. Another is the set of tires on an automobile.


In both examples, the components are not physically connected to each other
in a linear configuration. Nevertheless, all of the components must function
properly for the system to operate.
The concept of a series circuit is commonly used to define a graphical rep-
resentation of a series structure. For the three components, this is presented
in Figure 2.1.
In general, representations of system structures such as the one in Figure 2.1
are referred to as reliability block diagrams. They are often helpful in under-
standing the relationships between components.
For the series structure, the requirement that all components must func-
tion in order for the system to function implies that a logical algebraic form
for the system structure function is
f( x ) = min {xi } (2.3)
i

but an equivalent and more useful form is


n

f( x ) = Õx
i =1
i (2.4)

As examples, consider a three-component series system and the cases

x1 = x2 = 1, x3 = 0, and f( x ) = 0
x1 = 1, x2 = x3 = 0, and f( x ) = 0
x1 = x2 = x3 = 1 and f( x ) = 1

Only the functioning of all components yields system function.

2.2.2 Parallel System


The second type of structure is the parallel structure. The conceptual analog
is again the corresponding electrical circuit and the definition is as follows:

Definition 2.4

A parallel system is one in which the proper function of any one component
implies system function.
System Structures 9

It is again emphasized that no specific physical connections among the


components are implied by the definition or by the reliability block dia-
gram. The reliability block diagram for a three-component parallel system
is shown in Figure 2.2.
One example of a parallel system is the set of two engines on a two-
engine airplane. As long as at least one engine functions, flight is sustained.
However, this example implies that simply maintaining flight corresponds
to proper function. It is a worthwhile debate to discuss when it is and when
it is not an appropriate example of a parallel system.
Another example that is more appealing is the fact that the commu-
nications satellites presently in use have triple redundancy for each
communications channel. That is, three copies of each set of transmit-
ting components are installed in the satellite and arranged in parallel in
order to assure continued operation of the channel. In view of the fact
that this implies significant weight increases over the use of only single-
configuration transmitters, the satellite provides an example of the impor-
tance of reliability as well as one of a parallel structure.
In a similar manner to that for the series system, the structure function for
the parallel system may be defined as
f( x ) = max {xi } (2.5)
i

An alternate form that is more amenable to analytical manipulation can be


defined using a shorthand developed by Barlow and Proschan [12]:
n

f( x ) = ∐x
i =1
i (2.6)

FIGURE 2.2
Reliability block diagram for a three-component parallel system.
10 Reliability Engineering: Probabilistic Models and Maintenance Methods

The inverted product symbol, Π, is called “ip” and is defined as

n n

∐i =1
xi = 1 - Õ (1 - x )
i =1
i (2.7)

Once mastered, this shorthand is very convenient. Example cases for the
three-component parallel system are

x1 = x2 = 1, x3 = 0, and f( x ) = 1
x1 = 1, x2 = x3 = 0, and f( x ) = 1
x1 = x2 = x3 = 0, and f( x ) = 0

Conceptually, a parallel system is failed only when all system components


are failed.
Before leaving the discussion of parallel structures, it is appropriate
to mention the fact that the parallel arrangement of components is often
referred to as “redundancy.” This is because the proper function of any
of the parallel components implies proper function of the structure.
Thus, the additional components are redundant until a component fails.
Frequently, parallel structures are included in product designs specifi-
cally because of the resulting redundancy. Often but not always, the par-
allel components are identical. At the same time, there are actually several
ways in which the redundancy may be implemented. A distinction is
made between redundancy obtained using a parallel structure in which
all components function simultaneously and that obtained using parallel
components of which one functions and the other(s) wait as standby units
until the failure of the functioning unit. Models that describe the reliabil-
ity of active redundancy and of standby redundancy are presented at the
end of Chapter 4.

2.2.3 k-out-of-n Systems


The third type of structure is the k-out-of-n structure. There is no obvious
conceptual analog for this structure. A formal definition of it is as follows:

Definition 2.5

A k-out-of-n system is one in which the proper function of any k of the n com-
ponents that comprise the system implies proper system function.
System Structures 11

The usual approach to constructing the reliability block diagram for the
k-out-of-n system is to show a parallel diagram and to provide an additional
indication that the system is k out of n.
An example of a k-out-of-n system is the rear axle of a large tractor trailer
on which the functioning of any three out of the four wheels is sufficient to
assure mobility. Another example is the fact that some (1 – k) electronic mem-
ory arrays are configured so that the operation of any 126 of the 128 memory
addresses corresponds to satisfactory operation.
The algebraic representation of the structure function for a k-out-of-n sys-
tem is not as compact as those for series and parallel systems. Given the defi-
nition of the relationship between component and system status, the most
compact algebraic form for the structure function is

ì n

ï1
f( x ) = í
if åx ³ k
i =1
i
(2.8)
ï
î0 otherwise

Example cases for a 3-out-4 system are

x1 = x2 = x3 = 1, x4 = 0, and f( x ) = 1
x1 = x2 = 1, x3 = x4 = 0, and f( x ) = 0
x1 = x2 = x3 = 0, x4 = 1, and f( x ) = 0

Note that a series system may be considered as an n-out-of-n system and a


parallel system may be viewed as a 1-out-of-n system. Thus, the k-out-of-n
form provides a generalization that is sometimes useful in analyzing system
performance in a generic context.
Before leaving the discussion of k-out-of-n systems, it is interesting to note
that a special case of this type of system is the consecutive k-out-of-n system of
which there are two types. The consecutive k-out-of-n:F system is comprised of
n-aligned independent components that fail when k or more consecutive com-
ponents fail. The k-out-of-n:G system functions when k or more consecutive
components function. These two realizations of the consecutive system have
been shown to be mirror images so it is sufficient to analyze either of them.
Examples of the k-out-of-n:F system have been provided by Chiang and
Niu [13]. One of those examples is an oil pipeline in which each of the equally
spaced pumps is capable of pushing the oil to a distance equal to k pumping
points so a failed pump does not fail the system as the next pump will main-
tain flow. If k consecutive pumps fail, the system fails.
As indicated earlier, the fourth class of component configurations is the set
of all others that are conceivable. This statement is not intended to be mis-
leading. Instead, it is intended to imply that we can establish an equivalence
12 Reliability Engineering: Probabilistic Models and Maintenance Methods

between any arbitrary component configuration and one based on series


and parallel structures. The process of constructing equivalent structures is
explained in the next section and is illustrated with a classic example.

2.2.4 Equivalent Structures


The selection of a component configuration is usually made by the device
designer in order to assure a specific functional capability. The configuration
selected may not match one of the classes discussed earlier. In such a case, there
are two ways by which we can obtain equivalent structural forms that may be
easier to analyze than the actual one. The two ways are to use either minimum
path or minimum cut analyses of the network representation of the system.
As a vehicle for illustrating the two methods, we use the Wheatstone
bridge. The reliability block diagram for the bridge is shown in Figure 2.3.
Notice that the bridge structure is not series, parallel, or k out of n. Thus,
the earlier algebraic representations cannot be used directly to provide a
statement of the system status function.
We can obtain a system status function for the bridge in several ways. One
obvious approach is to enumerate all of the component status vectors, to
determine the system status for each vector and to construct a table of system
status values. For the five-component bridge structure, this is readily done
and the result is shown in Table 2.1.
On the other hand, systems having a greater number of components can-
not be handled so easily.
The use of minimum paths will permit us to analyze the bridge structure
and other larger systems as well. Start with some definitions:

Definition 2.6

A path vector, x, is a component status vector for which the corresponding


system status function has a value of 1.

1 4

2 5

FIGURE 2.3
Reliability block diagram for a Wheatstone bridge.
System Structures 13

TABLE 2.1
System Status Values for the Bridge Structure
x ϕ(x) x ϕ(x)
{1, 1, 1, 1, 1} 1 {0, 1, 1, 1, 1} 1
{1, 1, 1, 1, 0} 1 {0, 1, 1, 1, 0} 1
{1, 1, 1, 0, 1} 1 {0, 1, 1, 0, 1} 1
{1, 1, 1, 0, 0} 0 {0, 1, 1, 0, 0} 0
{1, 1, 0, 1, 1} 1 {0, 1, 0, 1, 1} 1
{1, 1, 0, 1, 0} 1 {0, 1, 0, 1, 0} 0
{1, 1, 0, 0, 1} 1 {0, 1, 0, 0, 1} 1
{1, 1, 0, 0, 0} 0 {0, 1, 0, 0, 0} 0
{1, 0, 1, 1, 1} 1 {0, 0, 1, 1, 1} 0
{1, 0, 1, 1, 0} 1 {0, 0, 1, 1, 0} 0
{1, 0, 1, 0, 1} 1 {0, 0, 1, 0, 1} 0
{1, 0, 1, 0, 0} 0 {0, 0, 1, 0, 0} 0
{1, 0, 0, 1, 1} 1 {0, 0, 0, 1, 1} 0
{1, 0, 0, 1, 0} 1 {0, 0, 0, 1, 0} 0
{1, 0, 0, 0, 1} 0 {0, 0, 0, 0, 1} 0
{1, 0, 0, 0, 0} 0 {0, 0, 0, 0, 0} 0

Definition 2.7

A minimum path vector, x, is a path vector for which any vector y < x has a cor-
responding system status function with a value of 0.

Definition 2.8

A minimum path set, Pj, is the set of indices of a minimum path vector for
which the component status variable has a value of 1.

These definitions identify the component status vectors that correspond to


system function and those vectors that are minimal in the sense that any
reduction in the number of functioning components implies system failure.
For the bridge structure, the minimum path vectors and minimum path
sets are

{1, 0, 0, 1, 0} P1 = {1, 4}

{0, 1, 0, 0, 1} P2 = {2, 5}

{1, 0, 1, 0, 1} P3 = {1, 3, 5}

{0, 1, 1, 1, 0} P4 = {2, 3, 4}
14 Reliability Engineering: Probabilistic Models and Maintenance Methods

Next, consider the elements of a minimum path and define a status func-
tion for each minimum path. That is, represent the functional status of each
path using the functions ρ(x). Since all of the components in a minimum
path must function in order for the path to represent proper function, the
components in a minimum path may be viewed as a series system. Hence,
in general,

r j ( x) = Õx
iÎPj
i (2.9)

and for the example bridge structure,

r1( x ) = Õx = x x
iÎP1
i 1 4

r2 ( x ) = Õx = x x
iÎP2
i 2 5

r3 ( x ) = Õx = x x x
iÎP3
i 1 3 5

r4 ( x) = Õx = x x x
iÎP4
i 2 3 4

Now, observe that the original system will function if any of the mini-
mum paths is functioning. Therefore, we may view the system as a parallel
arrangement of the minimum paths. Algebraically, this means

f( x ) = ∐ r ( x ) = ∐Õ x
j
j
j iÎPj
i (2.10)

For the bridge structure, this expression expands to

f( x ) = 1 - (1 - x1x4 )(1 - x2 x5 )(1 - x1x3 x5 )(1 - x2 x3 x4 )

The most important point here is that for any component status vector,
Equation 2.10 will always give the same system status value as in Table 2.1.
That is, the parallel arrangement of the minimum paths of a system with the
components of the respective minimum paths arranged in series constitutes
a system that is equivalent to the original system. The graphical realization
of this equivalence for the bridge structure is presented in Figure 2.4.
It is appropriate to emphasize here the fact that the equivalent struc-
ture has exactly the same status function value as the original structure
System Structures 15

1 4

2 5

1 3 5

2 3 4

FIGURE 2.4
Minimum path equivalent structure for the Wheatstone bridge.

for all realizations of the component status vector. Consequently, the


minimum path analysis permits us to identify a form for the system sta-
tus function that can be computed using only series and parallel alge-
braic forms.
There is a comparable construction using the idea of cut vectors rather
than path vectors. The method based on cut vectors may also be used for the
bridge and other structures. Again, start with some definitions:

Definition 2.9

A cut vector, x, is a component status vector for which the corresponding sys-
tem status function has a value of 0.

Definition 2.10

A minimum cut vector, x, is a cut vector for which any vector y > x has a cor-
responding system status function with a value of 1.

Definition 2.11

A minimum cut set, Ck, is the set of indices of a minimum cut vector for which
the component status variable has a value of 0.

These definitions identify the component status vectors that correspond


to system failure and those vectors that are minimal in the sense that any
increase in the number of functioning components implies satisfactory
16 Reliability Engineering: Probabilistic Models and Maintenance Methods

system function. For the bridge structure, the minimum cut vectors and
minimum cut sets are

{0, 0, 1, 1, 1} C1 = {1, 2}

{1, 1, 1, 0, 0} C2 = {4, 5}

{0, 1, 0, 1, 0} C3 = {1, 3, 5}

{1, 0, 0, 0, 1} C4 = {2, 3, 4}

Based on the definition of the minimum cuts, we see that the system will
function if any of the elements of the minimum cut function. Hence, we may
define a structure function for the minimum cuts as

kk ( x) = ∐x
iÎCk
i (2.11)

in general and for the specific case of the bridge structure, we have

k1 ( x ) = ∐ x = 1 - (1 - x )(1 - x )
iÎC1
i 1 2

k2 ( x) = ∐ x = 1 - (1 - x )(1 - x )
iÎC2
i 4 5

k3 ( x) = ∐ x = 1 - (1 - x )(1 - x )(1 - x )
iÎC3
i 1 3 5

k4 ( x) = ∐ x = 1 - (1 - x )(1 - x )(1 - x )
iÎC4
i 2 3 4

We observe further that the system will function only if all of the minimum
cuts are inactive—if all are functioning. If any minimum cut is active, the
system is failed, so the minimum cuts act as a series system with respect to
system operation. Therefore, the minimum cut equivalent structure has the
min cuts arranged in series with the elements of each cut in parallel. This is
illustrated in Figure 2.5.
Here again, it is appropriate to emphasize the fact that the equivalent struc-
ture and the original structure have the same status function value for each
component status vector. Thus, the system status may be calculated using
only the simple series and parallel forms.
One further observation concerning the equivalent structures is that one
may use either the minimum cut or the minimum path method. Both yield
equivalent expressions for the system status so we may use the one that
appears easier or preferable for some other reason.
System Structures 17

1 2

1 4

3 3

2 5

5 4

FIGURE 2.5
Minimum cut equivalent structure for the Wheatstone bridge.

2.3 Modules of Systems


Most modern equipment is fairly complicated and is comprised of relatively
many components. For example, depending upon how one counts them, one
might say that there are about 100 components in a television, 300 components
in a personal computer, and 600 components in a medium-size automobile. Even
with minimum paths or minimum cuts, evaluating the status function for sys-
tems of this scale is too difficult. However, it is also common to find that a com-
plicated system such as a television is actually comprised of subsystems called
modules. A television usually has a power management module, a video signal
reception module, an audio signal reception module, a sound production mod-
ule, a video projection module, and some sort of system control module. Thus,
the system may be viewed as comprised of “super components” called modules.
System status may be defined as a function of the status modules and the status
of each module should be a function of the components that comprise it.
To formalize this idea algebraically, assume that the n components that
comprise a system can be partitioned into m sets of components correspond-
ing to m modules. Let ψl(x) represent the module status function for module
l where l = 1, …, m. Then,
y( x ) = {y1( x ), y 2 ( x ), … , y m ( x )}

is a vector of binary module status values and the system status is defined as

f( x ) = f(y( x )) (2.12)

Naturally, the module status functions may be evaluated using minimum


paths or cuts and the system status may be analyzed by treating the modules
as components and applying the minimum path or minimum cut methods
as appropriate.
18 Reliability Engineering: Probabilistic Models and Maintenance Methods

Of course, the partition of the set of components may be performed at sev-


eral levels if that is appropriate. The principle remains the same. Any mean-
ingful decomposition of the system components may be used to simplify the
representation and analysis of the system status.
A final point here is that systems generally have only one of each module
and usually require that all modules function properly in order for the sys-
tem to operate satisfactorily. Thus, while it is not always the case, the mod-
ules often comprise a series structure. Consequently, the analysis of system
behavior in terms of modules can be quite efficient.

2.4 Multistate Components and Systems


The treatment of component and system status throughout most of this text
is limited to the case in which system state and component states are binary.
Nevertheless, it should be recognized that for some equipment, multiple
states may be meaningful. It is reasonable to define derated or otherwise
incomplete levels of performance for some equipment. It may even be appro-
priate for some devices to define a continuous state variable on the interval
[0, 1]. Algebraic models for system state using multistate components and
for multistate systems with binary components have been studied. The key
references for this work are the papers by Natvig [14] and by Barlow and Wu
[15]. In each case, the key to the construction is the algebraic representation
of the effect of component state on system state.
Both Natvig and Barlow and Wu start by defining the state space for the
components as {0, 1, …, m} where xi = j represents the condition that component
i is in state j, the state 0 corresponds to component failure, and the state m cor-
responds to perfect functioning. The interpretation of the intermediate states
depends upon the physical characteristics of the specific component. Using
this basic format, there are two immediately apparent approaches to defining
the system state as a function of the states of the components of a system.
One approach is to define a binary system state by partitioning all of the
possible component status vectors into two subsets, one for which system
performance is acceptable and the other for the case that system-level per-
formance is unacceptable. Naturally, this cannot be done in general as the
selection of members of the two subsets depends upon the particular system.
An alternate, and more general, approach is to define the system state to
also be an element of the set {0, 1, …, m} and to define the value of the system
state using minimum path and minimum cut concepts. To do this, we first
specify that as in the case of the binary state space, the state of a series system
of multistate components is the minimum of the component state values.
Thus, Equation 2.3 applies:

f( x ) = min {xi } (2.13)


i
System Structures 19

In the same manner, the state of a parallel system is the maximum of the
component state values as stated in Equation 2.5:
(2.14)
f( x ) = max {xi }
i

Then for more general structures, we use the min paths or min cuts to define

ì
j î iÎPj
ü
þ k iÎCk
{
f( x ) = max ímin {xi }ý = min max {xi } } (2.15)

This general formulation may be tailored to nearly any application. For


example, one can restrict some of the components to only a subset of
the m  + 1 states. One may also incorporate the multistate status mea-
sure within a modular decomposition in whatever manner is meaningful.
Finally, this formulation has the appealing feature that it subsumes the
binary case.

Exercises
2.1 Construct the system structure function for the following system:

1
4
2 3

1
5
2 3

2.2 Construct the minimum path and minimum cut equivalent structures
for a 2-out-of-3 system.
2.3 Construct the minimum path and minimum cut equivalent structures
for a three-component series system.
2.4 Construct the minimum path and minimum cut equivalent structures
for a three-component parallel system.
20 Reliability Engineering: Probabilistic Models and Maintenance Methods

2.5 Show that the status function value of any system structure is
bounded below by the state of the series system comprised of the same
components and is bounded above by the state of the parallel system
comprised of the same components. That is, show that
n n

Õ x £ f(x) £ ∐ x
i =1
i
i =1
i

2.6 Construct the minimum path and minimum cut equivalent structures
for the following system:

1 2 3

4 5

6 7

2.7 Construct the minimum path and minimum cut equivalent structures
for the following system:

5 7

3 4 6
9

2.8 Construct the minimum path and minimum cut equivalent structures
for the following system:

1 3 5

4
System Structures 21

2.9 Construct and compare the minimum path and minimum cut equiva-
lent structures for the following systems:

1 3
1 3

2
4 2 4

2.10 Consider a structure having status function f( x ) and define the parallel
and series composition operations as component by component. That
∐ y = ( x ∐ y ,…, x ∐ y ) and Õ xy = (x y ,…, x y ). Show
is, let x 1 1 n n 1 1 n n

that (a) f ( x∐ y ) ³ f( x )∐ f( y ) and (b) f ( Õ xy ) £ f( x )f( y ). Then, use


(a) to interpret the relationships of Problem 2.9.
2.11 Consider the following 6-component system in which components 1
and 6 are identical and components 2 and 3 are identical.

4
3

5
2

Identify the min paths, the min paths equivalent system and the struc-
ture function for the equivalent system.
Then, identify the min cuts, the min cut equivalent system and the
structure function for the equivalent system.
2.12 Suggest a system structure model for a bicycle. Indicate which
component failures could occur without putting the rider in danger.
2.13 For the system structure in Problem 2.6, suggest a structural represen-
tation using modules.
2.14 Identify the primary modules of an automobile.
2.15 Identify three systems that are appropriately viewed as multistate
systems because the systems or their components are sometimes
operated at reduced levels.
3
Reliability of System Structures

The next logical step in our construction of mathematical models of reliabil-


ity is to enhance the system structure models by the addition of probabil-
ity—the second of the attributes of the definition of reliability. As we do this,
we will refer to the probabilities as reliabilities despite the fact that we have
not yet included all four attributes in our models. Naturally, we expect that
the reliability of a system will be represented as a function of the reliabilities
of its constituent components.

3.1 Probability Elements


Keeping in mind the fact that we represent system state as a binary variable,
ϕ, define the system reliability, Rs, to be the probability that the system is
functioning:

Rs = Pr[f = 1] (3.1)

Observe that an artifact of the binary definition of the system state is that
the system reliability is also the expected value of the system state variable:

E[f] = 1 × Pr[f = 1] + 0 × Pr[f = 0] = Pr[f = 1] (3.2)

A similar pair of definitions applies to the component status variables. That


is, we let

ri = Pr[xi = 1] (3.3)

where, because of the fact that the xi are binary, it is again the case that the
reliability and expected value correspond. For a system comprised of n com-
ponents, we take

r = {r1 , r2 , … , rn }

to be the vector of component reliability values. Given the defined notation,


it is reasonable to expect that the system reliability can be expressed as a

23
24 Reliability Engineering: Probabilistic Models and Maintenance Methods

function of the component reliabilities. In terms of general notation, we rep-


resent this as

Rs ( r ) = Pr[f( x ) = 1] (3.4)

and we devote the next section to the realizations of this expression.

3.2 Reliability of System Structures


The formulation of the system reliability function is often relatively straight-
forward, but the general forms must be constructed carefully. The key issue
to consider is whether or not the components are mutually independent. In
this construction, we follow the same four cases that we examined in the
previous chapter.

3.2.1 Series Systems


Based on the form of the system structure function, the general statement of
the reliability function for a series system is

é n
ù
Rs = Pr[f( x ) = 1] = Pr ê
ëê
Õ
i =1
xi = 1ú
úû
(3.5)

Now, in general,

é n
ù n

Pr ê
êë
Õ i =1
xi = 1ú ³
úû
Õ Pr[x = 1]
i =1
i

and equality holds only when the components are mutually independent.
Thus, for a series system comprised of independent components, the system
reliability may be stated as
n

Rs = Õr
i =1
i (3.6)

We should note that regardless of whether or not the components are inde-
pendent, the system reliability function is an increasing function of the
component reliability values and is a decreasing function of the number of
components.
Reliability of System Structures 25

In most series systems, the components are independent with respect


to their probabilities of proper function. One noteworthy class of struc-
tures for which the components are not independent is the set of sys-
tems for which the components share loads. This is discussed further in
Chapter 14.

3.2.2 Parallel Systems


The expression corresponding to Equation 3.5 for a parallel system is

é n
ù é n
ù
Rs = Pr[f( x ) = 1] = Pr ê
êë
∐i =1
xi = 1ú = Pr ê
úû êë
Õ
i =1
(1 - xi ) = 0 ú
úû
(3.7)

and it is again the case that

é n
ù n

Pr ê
êë
∐ i =1
xi = 1ú £
úû
∐ Pr[x = 1]
i =1
i

so a parallel system of independent components has the following reliability


function:

Rs = ∐r
i =1
i (3.8)

Examination of this function indicates that the system reliability function for
a parallel system is increasing both in the component reliability values and
in the number of components.
For system structures other than series and parallel, the computation of the
system reliability from the component reliabilities is not as straightforward.

3.2.3 k-out-of-n Systems


For the generic k-out-of-n system, there is no compact statement of the reli-
ability function. We can only state that

é n
ù
Rs = Pr ê
ëê
å x ³ k úúû
i =1
i (3.9)

and even when the components are independent, there is no convenient form
for this function. The single exception occurs when the n components are
independent and identical (have the same reliability). In that case, the system
26 Reliability Engineering: Probabilistic Models and Maintenance Methods

reliability is given by the sum of binomial probabilities for k or more func-


tioning components:

n
æ nö
Rs = å çè j ÷ø r (1 - r)
j=k
j n- j
(3.10)

For most other k-out-of-n systems, the use of the minimum path– and mini-
mum cut–based methods of the next section provides the most effective
approach to evaluating system reliability.
For the special case of the consecutive k-out-of-n:F systems described in
Chapter 2, calculation of the system reliability is quite complicated. Lambiris
and Papastavridis [16] obtained an exact formula for the system reliability
using binomial probabilities as

æ n - jk ö æ n - jk - k ö
n n

å ( ) å çè ( )
j j
Rs = ç ÷ (-1) r(1 - r )
j k
- (1 - r )k ÷ (-1) r(1 - r )
j k
(3.11)
j =0 è j ø j =0
j ø

3.2.4 Equivalent Structures


The fact that every system structure has an equivalent representation as a
combination of series and parallel forms suggests that we can use the equiv-
alent structures to evaluate system reliability. While this is true, the fact that
the minimum paths and minimum cuts that comprise the equivalent forms
are usually not independent implies that we will be able to obtain bounds on
system reliability rather than the exact system reliability values. The process
of using the minimum paths and minimum cuts is again illustrated by the
Wheatstone bridge shown in Figure 2.3.
Before developing the minimum path– and minimum cut–based bounds,
let us note that using the same approach as we did for the system status, we
can determine the system reliability by complete enumeration. Specifically,
we can enumerate all possible system states, obtain the probability that each
state occurs, and sum those probabilities that correspond to system function.
For the case in which the components are independent, the result of the enu-
meration is presented in Table 3.1.
Based on the expressions in Table 3.1, the system reliability is

Rs = r1r2r3 r4 r5 + r1r2r3 r4 (1 - r5 ) + r1r2r3 (1 - r4 )r5 + r1r2 (1 - r3 )r4 r5 + r1r2 (1 - r3 )r4 (1 - r5 )

+ r1r2 (1 - r3 )(1 - r4 )r5 + r1(1 - r2 )r3 r4 r5 + r1(11 - r2 )r3 r4 (1 - r5 ) + r1(1 - r2 )r3 (1 - r4 )r5

1 - r2 )(1 - r3 )r4 (1 - r5 ) + (1 - r1 )r2r3 r4 r5 + (1 - r1 )r2r3 r4 (1 - r5 )


+ r1(1 - r2 )(1 - r3 )r4 r5 + r1(1
+ (1 - r1 )r2r3 (1 - r4 )r5 + (1 - r1 )r2 (1 - r3 )r4 r5 + (1 - r1 )r2 (1 - r3 )(1 - r4 )r5 (3.12)
Reliability of System Structures 27

TABLE 3.1
Reliability Values for Paths of the Wheatstone Bridge
x ϕ(x) Pr[f( x ) = 1] x ϕ(x) Pr[f( x ) = 1]
{1, 1, 1, 1, 1} 1 r1r2r3r4r5 {0, 1, 1, 1, 1} 1 (1 − r1)r2r3r4r5
{1, 1, 1, 1, 0} 1 r1r2r3r4(1 − r5) {0, 1, 1, 1, 0} 1 (1 − r1)r2r3r4(1 − r5)
{1, 1, 1, 0, 1} 1 r1r2r3(1 − r4)r5 {0, 1, 1, 0, 1} 1 (1 − r1)r2r3(1 − r4)r5
{1, 1, 1, 0, 0} 0 r1r2r3r(1 − r4)(1 − r5) {0, 1, 1, 0, 0} 0 (1 − r1)r2r3r(1 − r4)(1 − r5)
{1, 1, 0, 1, 1} 1 r1r2(1 − r3)r4r5 {0, 1, 0, 1, 1} 1 (1 − r1)r2(1 − r3)r4r5
{1, 1, 0, 1, 0} 1 r1r2(1 − r3)r4(1 − r5) {0, 1, 0, 1, 0} 0 (1 − r1)r2(1 − r3)r4(1 − r5)
{1, 1, 0, 0, 1} 1 r1r2(1 − r3)(1 − r4)r5 {0, 1, 0, 0, 1} 1 (1 − r1)r2(1 − r3)(1 − r4)r5
{1, 1, 0, 0, 0} 0 r1r2(1 − r3)(1 − r4) {0, 1, 0, 0, 0} 0 (1 − r1)r2(1 − r3)(1 − r4)
(1 − r5) (1 − r5)
{1, 0, 1, 1, 1} 1 r1(1 − r2)r3r4r5 {0, 0, 1, 1, 1} 0 (1 − r1)(1 − r2)r3r4r5
{1, 0, 1, 1, 0} 1 r1(1 − r2)r3r4(1 − r5) {0, 0, 1, 1, 0} 0 (1 − r1)(1 − r2)r3r4(1 − r5)
{1, 0, 1, 0, 1} 1 r1(1 − r2)r3(1 − r4)r5 {0, 0, 1, 0, 1} 0 (1 − r1)(1 − r2)r3(1 − r4)r5
{1, 0, 1, 0, 0} 0 r1(1 − r2)r3(1 − r4) {0, 0, 1, 0, 0} 0 (1 − r1)(1 − r2)r3(1 − r4)
(1 − r5) (1 − r5)
{1, 0, 0, 1, 1} 1 r1(1 − r2)(1 − r3)r4r5 {0, 0, 0, 1, 1} 0 (1 − r1)(1 − r2)(1 − r3)r4r5
{1, 0, 0, 1, 0} 1 r1(1 − r2)(1 − r3) {0, 0, 0, 1, 0} 0 (1 − r1)(1 − r2)(1 − r3)
r4(1 − r5) r4(1 − r5)
{1, 0, 0, 0, 1} 0 r1(1 − r2)(1 − r3) {0, 0, 0, 0, 1} 0 (1 − r1)(1 − r2)(1 − r3)
(1 − r4)r5 (1 − r4)r5
{1, 0, 0, 0, 0} 0 r1(1 − r2)(1 − r3) {0, 0, 0, 0, 0} 0 (1 − r1)(1 − r2)(1 − r3)
(1 − r4)(1 − r5) (1 − r4)(1 − r5)

With considerable algebraic effort, this reduces to

Rs = r1r4 + r2r5 + r1r3 r5 + r2r3 r4 - r1r2r3 r4 - r1r2r3 r5 - r1r2r4 r5 - r1r3 r4 r5 - r2r3 r4 r5 + 2r1r2r3 r4 r5
(3.13)

In addition, for a structure such as the bridge, it would be reasonable for all
five components to be identical and to have the same reliability. Then,

ri = r "i

and the system reliability function reduces to the polynomial


Rs = 2r 2 + 2r 3 - 5r 4 + 2r 5 (3.14)

Clearly, system reliability evaluation by this method is rather demanding


and offers plenty of possibility for error.
An alternate approach is to construct bounds on system reliability. Three
sets of bounds on system reliability have been defined. It is appropriate to
compute all three sets and to combine the information they provide to obtain
the narrowest possible interval within which the system reliability will lie.
28 Reliability Engineering: Probabilistic Models and Maintenance Methods

The first set of bounds is reasonably obvious and is generally not very tight.
These are the series and parallel bounds. To compute these bounds, we sim-
ply treat the system components as if the system configuration were a series
structure of independent components and calculate a lower bound and we
then calculate an upper bound assuming a parallel configuration. Thus,
n n

bs = Õ i =1
ri £ Rs £ ∐r = b
i =1
i p (3.15)

For a system such as the bridge structure, assuming all of the components
are identical implies that these bounds reduce to

bs = r n £ Rs £ 1 - (1 - r )n = bp

Example calculations of these bounds are presented in Table 3.2.


A second set of reliability bounds may be constructed using minimum
paths and minimum cuts. Remember that the minimum cut equivalent
structure has the minimum cuts arranged in series and recall that for a series
structure in general

é n
ù n

Pr ê
êë
Õ i =1
xi = 1ú ³
úû
Õ Pr[x = 1]
i =1
i

Applying this inequality to the minimum cut structures yields the mini-
mum cut lower bound on system reliability:

é ù
bmcl = Õ Pr[k (x) = 1] £ Pr êêëÕ k (x) = 1úúû = Pr[f(x) = 1] = R
k
k
k
k s (3.16)

TABLE 3.2
Computed Values of the System Reliability
and the Bounds on System Reliability
r bs b mcl b mml Rs b mmu b mpu bp
0.99 0.951 0.999 0.980 0.999 1.000 1.000 1.000
0.95 0.774 0.995 0.903 0.995 0.998 1.000 1.000
0.90 0.591 0.978 0.810 0.979 0.990 0.997 1.000
0.75 0.273 0.852 0.563 0.861 0.938 0.936 0.999
0.60 0.078 0.618 0.360 0.660 0.840 0.748 0.990
0.50 0.031 0.431 0.250 0.500 0.750 0.569 0.969
0.25 0.001 0.064 0.063 0.139 0.438 0.148 0.763
0.10 0.000 0.003 0.010 0.022 0.190 0.022 0.410
Reliability of System Structures 29

The same reasoning can be applied to the minimum paths. The minimum
path equivalent structure has the minimum paths arranged in a parallel con-
figuration. Therefore, applying the inequality

é n
ù n

Pr ê
êë
∐ i =1
xi = 1ú £
úû
∐Pr[x = 1]
i =1
i

to the minimum paths yields the minimum path upper bound on system
reliability:

é ù
bmpu = ∐ Pr[r j ( x ) = 1] ³ Pr ê
ê ∐ r j ( x ) = 1ú = Pr[f( x ) = 1] = Rs
ú
(3.17)
j ë j û

For the example bridge structure having five identical components, these
bounds are computed as follows:

k1 ( x ) = ∐ x = 1 - (1 - x )(1 - x )
iÎC1
i 1 2 so Pr[k1( x ) = 1] = ∐r = 1 - (1 - r)
iÎC1
i
2

k2 ( x) = ∐ x = 1 - (1 - x )(1 - x )
iÎC2
i 4 5 so Pr[k 2 ( x ) = 1] = ∐r = 1 - (1 - r)
iÎC2
i
2

k3 ( x) = ∐ x = 1 - (1 - x )(1 - x )(1 - x )
iÎC3
i 1 3 5 so Pr[k 3 ( x ) = 1] = ∐ x = 1 - (1 - r)
iÎC3
i
3

k 4 ( x) = ∐ x = 1 - (1 - x )(1 - x )(1 - x )
iÎC4
i 2 3 4 so Pr[k 4 ( x ) = 1] = ∐ x = 1 - (1 - r)
iÎC4
i
3

r1( x ) = Õx = x x
iÎP1
i 1 4 so Pr[r1( x ) = 1] = Õr = r
iÎP1
i
2

r2 ( x ) = Õx = x x
iÎP2
i 2 5 so Pr[r2 ( x ) = 1] = Õr = r
iÎP2
i
2

r3 ( x ) = Õx = x x x
iÎP3
i 1 3 5 so Pr[r3 ( x ) = 1] = Õr = r
iÎP3
i
3

r4 ( x) = Õx = x x x
iÎP4
i 2 3 4 so Pr[r4 ( x ) = 1] = Õr = r
iÎP4
i
3

Then,

( ) (1 - (1 - r) ) ( )
2 2
bmcl = 1 - (1 - r )2 3
£ Rs £ 1 - (1 - r 2 )2 (1 - r 3 )2 = bmpu

Example calculations of these bounds are also included in Table 3.2.


30 Reliability Engineering: Probabilistic Models and Maintenance Methods

The minimum paths and minimum cuts may be used to define a third
set of bounds. These are known as the minimax bounds. Starting with
the minimum paths, recall that the structure function for a parallel sys-
tem may also be expressed in terms of a maximum. That is, as stated in
Equation 2.5,

f( x ) = max {xi }
i

so

Pr[f( x ) = 1] = Pr[max {xi } = 1]


i

and for a parallel arrangement of minimum paths,

Pr[f( x ) = 1] = Pr[max {r j ( x )} = 1]
j

In general, for any set of probabilities,

max {Pr[r j ( x ) = 1]} £ Pr[max {r j ( x )} = 1]


j j

so the minimax lower bound on system reliability is

bmml = max {Pr[r j ( x ) = 1]} £ Rs (3.18)


j

Applying the same logic to the minimum cuts, we have the following from
Equation 2.3:

f( x ) = min {xi }
i

so

Pr[f( x ) = 1] = Pr[min {xi } = 1]


i

and for the parallel configuration of the minimum cuts,

Pr[f( x ) = 1] = Pr[min {k k } = 1]
k

In general,

Pr[min {k k ( x )} = 1] £ min {Pr[k k ( x ) = 1}


k k
Reliability of System Structures 31

And the resulting minimax upper bound on system reliability is

Rs £ min {Pr[k k ( x ) = 1} = bmmu (3.19)


k

Therefore, the minimax reliability bounds are

bmml = max {Pr[r j ( x ) = 1]} £ Rs £ min {Pr[k k ( x ) = 1} = bmmu (3.20)


j k

For the bridge with identical components, the computation of the minimax
bounds proceeds as follows:

{ }
bmml = max {Pr[r j ( x ) = 1]} = max r 2 , r 2 , r 3 , r 3 = r 2
j

and

{
bmmu = min {Pr[k k ( x ) = 1} = min 1 - (1 - r )2 , 1 - (1 - r )2 , 1 - (1 - r )3 , 1 - (1 - r )3
k
}
= 1 - (1 - r )2

so

r 2 £ Rs £ 1 - (1 - r )2

To illustrate the computation and behavior of the bounds, we have calculated


the values of each of the bounds for the bridge with identical components
for several values of the component reliability. These values and the actual
system reliability value (from Equation 3.14) are shown in Table 3.2.
Note that one may not, in general, assume that one of the sets of bounds
will be tighter than the others. A reasonable approach to using the bounds is
to take the greatest of the lower bounds and the smallest of the upper bounds
to obtain the tightest possible bounds on system reliability. Doing this for
the entries of Table 3.2 indicates that very satisfactory bounds are obtained.

3.3 Modules
The idea that the components that comprise a system may sometimes be
partitioned into modules may be extended to the calculation of system reli-
ability bounds. There are three key ways in which this may be pursued.
Recall that the algebraic representation of system state using modules is

f( x ) = f(y( x ))
32 Reliability Engineering: Probabilistic Models and Maintenance Methods

where

y( x ) = {y1( x ), y 2 ( x ), … , y m ( x )}

One possible approach to the use of the partition is to calculate the reliability
of each module and to use the resulting values as component values in the
bounds defined in the previous section. If the modules contain relatively
few components or are configured in either a series of parallel structure, this
method will be fairly straightforward.
If, on the other hand, some of the modules are themselves rather compli-
cated but the system is designed with the modules in series, then the series
computation applied to each of the upper and the lower bounds provides a
pair of bounds on system reliability. In this case, the bounds on module reli-
ability are obtained using the methods of the previous section.
The third possibility is that one or more of the modules are complicated
and the system configuration of the modules is not a simple one. In this case,
a lower bound on system reliability can be computed by applying the mini-
mum cut lower bound calculation at the system level to the minimum cut
lower bounds for the modules.

3.4 Reliability Importance


In view of the influence of the reliabilities of the components, it is reason-
able to ask which components have the greatest (or least) impact on sys-
tem reliability. One possible reason for examining this question is to help
decide which of the components should be improved first. The idea that
the effect of each component is worth considering has led several authors
to define various measures of “reliability importance.” Among the several
forms that have been suggested, the one based on derivatives was proposed
by Birnbaum [17] and seems to be the most logical. The formal definition
is as follows:

Definition 3.1

The reliability importance of component i, IR(i), of a coherent system is the


derivative of the system reliability function with respect to the component i
reliability. That is,

d
I R (i ) = Rs (3.21)
dri
Reliability of System Structures 33

An appropriate interpretation of this definition is that the component


for which the component reliability imposes the greatest gradient on the
system reliability function is the most important. Consider some specific
example cases.
For a series structure comprised of n independent components, the realiza-
tion of expression is

n n
d d
I R ( j) =
drj
Rs =
drj Õr = Õr
i =1
i
i =1
i

i¹ j

Thus, for a series system of independent components, the reliability impor-


tance of any component is equal to the product of the reliabilities of the other
components. Since component indices are usually arbitrary, assume the com-
ponents have been numbered so that

r1 £ r2 £ r3 £ ⋯ £ rn

Under this indexing, we can see that

I R (1) ³ I R (2) ³ ⋯ ³ I R (n)

so that the weakest component has the greatest importance. For a series sys-
tem, this seems intuitively reasonable.
For a parallel system comprised of independent components, the corre-
sponding analysis is

d æ ö
n n n

Õ Õ (1 - r )
d d
I R ( j) =
drj
Rs =
drj ∐
i =1
ri = ç1-
drj çè i =1
(1 - ri ) ÷ =
÷
ø i =1
i

i¹ j

Assuming the same ordering of the component indices as indicated earlier,


the most important component in a parallel system is the strongest. That is,

I R (1) £ I R (2) £ ⋯ £ I R (n)

Intuitively, it seems reasonable that the strongest component is the most


important to reliability. The corresponding point that investments in compo-
nent reliability improvement should begin with the most reliable component
is less apparent but equally accurate.
An alternate approach to defining component reliability importance is
suggested by Fussell and Vesely [18]. Their definition is essentially the prob-
ability that a component is the one that failed when the system is failed.
34 Reliability Engineering: Probabilistic Models and Maintenance Methods

The probability for each component is obtained by computing the failure


probabilities for each min cut containing the component and dividing by the
system failure probability. Consider the example of a 2-out-of-3 system.
For a 2-out-of-3 system, the cut sets are {1, 2}, {1, 3}, and {2, 3} and the system
reliability is
Rs = p1 p2 + p1 p3 + p2 p3 - 2 p1 p2 p3

where
pi is the reliability of component i
qi = 1 - pi is the probability that component i is failed

Now, given that the system is failed, the chance that the failure was caused
by component 1 is the probability that one of the cut sets containing compo-
nent 1 is failed and the Fussell–Vesely importance of component 1, I FV (1), is
the probability that exactly one of those cut sets is failed divided by the prob-
ability of system failure. Thus, the denominator is 1 - Rs and the numerator is
Pr[( k1 = 0) È ( k 2 = 0)] = Pr[k1 = 0] + Pr[k 2 = 0] - Pr[( k1 = 0) Ç ( k 2 = 0)]

( ∐ p p ) + (1 -∐ p p ) - (1 -∐ p p p ) = q q + q q - q q q
= 1- 1 2 1 3 1 2 3 1 2 1 3 1 2 3

Therefore,

q1q2 + q1q3 - q1q2q3


I FV (1) =
1 - ( p1 p2 + p1 p3 + p2 p3 - 2 p1 p2 p3 )

The same construction yields


q1q2 + q2q3 - q1q2q3
I FV (2) =
1 - ( p1 p2 + p1 p3 + p2 p3 - 2 p1 p2 p3 )

q1q3 + q2q3 - q1q2q3


I FV (3) =
1 - ( p1 p2 + p1 p3 + p2 p3 - 2 p1 p2 p3 )

Assuming p1 = 0.95, p2 = 0.90, and p3 = 0.85 yields Rs = 0.974 , so

0.0118
I FV (1) = = 0.452
0.026

0.0193
I FV (2) = = 0.740
0.026

0.0218
I FV (3) = = 0.837
0.026
Reliability of System Structures 35

and we note that in a 2-out-of-3 system, it is the weakest component that


has the highest Fussell–Vesely importance measure. The reader may con-
firm that it is the weakest component of a series system that has the highest
Fussell–Vesely importance measure, and in a parallel system, it is the stron-
gest component.
Resolving questions of how to enhance system reliability has been one
of the areas in which reliability analysts have contributed to system design
efforts. Reliability importance measures provide a basis for evaluating the
cost-effectiveness of investments in component redesign or other improve-
ment strategies.

3.5 Reliability Allocation


Another approach to enhancing system reliability is by introducing
redundancy at selected component locations. That is, the system configura-
tion is altered by replacing a single component with two or more copies of the
component in parallel. The problem of selecting the components for which
this is done is known as the reliability allocation problem. It is assumed that
each copy of a component included in the system has a cost. The cost might
actually represent the price of the component or may represent weight or any
other consequence of allocating the component to the design. Then, the prob-
lem of designating the locations and magnitudes of component redundancy
can be stated as an integer mathematical program. In fact, there are two
plausible algebraic forms for the optimization problem. We may minimize
system cost subject to a reliability constraint or we may maximize system
reliability subject to a budget constraint.
Regardless of the system configuration, we assume that we have an expres-
sion for the system reliability, Rs. Let mi represent the number of copies of
component i placed in parallel at the component i location in the system con-
figuration and let ci represent the unit cost for component i. Then, one integer
programming representation of the system design problem is to

n
Minimize åc m
i =1
i i

subject to:
Rs ³ Rtarget

mi ³ 1 "i

mi integer
36 Reliability Engineering: Probabilistic Models and Maintenance Methods

Here, we minimize the cost to obtain a target system reliability level. The
alternate problem is to

Maximize Rs

subject to:
n

åc m £ C
i =1
i i budget

mi integer

Naturally, for both of the optimization problems, the algebraic statement of


the system reliability function has terms of the form

1 - (1 - ri )mi

for the contributions of the reliabilities of the component positions. Equally


clear should be the fact that it is the algebraic form of the system reliability
function that determines how difficult it is to solve either of the optimization
problems. A recent paper by Majety et al. [19] provides an efficient algorithm
for solving the integer programs in general and recent work by Rice et al.
[20] suggests that because of the relatively small number of feasible solutions,
enumeration strategies will often yield solutions efficiently.
A distinction should be made here between reliability allocation and reli-
ability apportionment. Reliability apportionment is the process of assigning
reliability targets to subsystems during system design. There are algorithms
for doing this, and the most popular of which is called the ARINC method,
which is described by Lloyd and Lipow [21]. Usually, the algorithms are
based upon a balance between the reliability importance of the subsystems
and the cost of enhancing the reliability of existing subsystems designs. The
key difficulty with the apportionment task is that the assignment process is
driven by subjective criteria in that enhancement cost is usually represented
by an effort function. The origin of the effort function is often unclear. For
this reason, the algorithm is not developed here. In contrast, the allocation
models defined earlier treat the question directly.

3.6 Conclusion
The analyses considered to this point provide a means for relating system
reliability to component reliability for many types of equipment designs.
Several exceptions have been noted and some of these will be addressed
Reliability of System Structures 37

later in the text. For the system configurations that are based on binary com-
ponent states and independent components, the models and analyses treated
so far are sufficient to permit a reductionist approach to reliability analysis.
That is, for these simplest of systems, reliability may be studied at the com-
ponent level because the dependence of system reliability on component reli-
ability is well defined. For very many system designs, the ability to focus
independently on individual component reliability performance is essential
to achieving the high levels of reliability we now enjoy.

Exercises
3.1 Compute the reliability of a three-independent-component parallel
system in which all components have a reliability of 0.75.
3.2 Compute the reliability of a 3-out-of-4 system for which all components
have a reliability of 0.85 and the components are independent.
3.3 Compute the system reliability for the following system assuming the
components are independent and r = (0.96, 0.88, 0.92).

3.4 Assuming that the components are independent, compute


the system reliability for the system of Problem 2.1 when
r = (0.86, 0.90, 0.92, 0.85, 0.82, 0.78, 0.82).
3.5 For the system of Problem 2.5, assume that r = (0.96, 0.96, 0.96, 0.90, 0.90,
0.94, 0.94). Compute the three types of reliability bounds and the actual
system reliability.
3.6 For the system of Problem 2.7, assume that r1 = 0.85, r2 = 0.80, r3 = 0.95,
r4 = 0.75, and r5 = 0.90. Compute the three types of reliability bounds
and the actual system reliability.
3.7 For the system of Problem 2.6, assume that r1 = r2 = 0.96, r3 = 0.95,
r4 = 0.98, r5 = r6 = 0.92, r7 = 0.93, and r8 = r9 = 0.90. Compute the three
types of reliability bounds and the actual system reliability.
3.8 Problem 2.12 involved defining a modular decomposition to the system
of Problem 2.6. Construct the reliability bounds for the revised system
obtained in Problem 2.12.
38 Reliability Engineering: Probabilistic Models and Maintenance Methods

3.9 For the systems of Problem 2.8, assume that r1 = r2 = 0.90 and r3 = r4 = 0.80.
Compute the three types of reliability bounds and the actual system
reliability.
3.10 Suggest an alternative reliability importance measure that is different
from the Birnbaum and the Fussell–Vesely measures. Illustrate your
measure by application to the system of Problem 2.6.
3.11 Show that for a series system of n components, the Fussell–Vesely
importance measure is the greatest for the weakest component.
3.12 For a parallel system of n independent components, which component
has the greatest value of the Fussell–Vesely importance measure?
4
Reliability over Time

The definition of reliability given in Chapter 2 indicates that reliability is


the probability of proper function over time. An implicit feature of this
definition is the underlying assumption that across a population of identical
devices, survival over time (or life length) is dispersed in a manner that is
well modeled by probability and hence by a probability distribution function.
Thus, the extension of the measures of reliability to include time involves the
specification of probability distributions that constitute reasonable models
for the dispersion of life length. This is the subject of this chapter.

4.1 Reliability Measures


The logical extension of the models examined in the previous two chapters
to include time is to indicate that component and system states may be
represented as functions of time. That is, we map xi to xi(t) and ϕ to ϕ(t).
The interpretation of this extension is that the random variable implied in
Definition 2.1 is the duration of a proper system function. In similar manner,
we adjust the definitions of the reliability functions as

Rs (t) = Pr[f(t) = 1] and ri (t) = Pr[xi (t) = 1]

As before, the complement of reliability is failure.


As indicated earlier, the introduction of the time variable carries the implicit
assumption that the time at which a component or system state changes from
proper function to failed, the failure time, is a random variable and across
a population of identical devices, the dispersion in the failure time can be
represented using a probability distribution function. The failure time of a
device is also called its life length. Denoting life length by T, the distribution
function on T is represented by FT(t) where FT(t) = Pr[T ≤ t] is the probability
that device life length is less than or equal to t.
Given this basic definition, reliability in time is the probability that device
life length exceeds t. Thus, R(t) = FT (t) is the probability that life length
exceeds t.

39
40 Reliability Engineering: Probabilistic Models and Maintenance Methods

These basic definitions raise two points concerning notation. First, note
that it is reasonable to use either R(t) or the survivor function FT (t) to repre-
sent reliability. As these forms are truly synonymous, the survivor function
form will be used here whenever reasonable. Second, note that the subscript
denoting the component or system has been dropped. Much of the discus-
sion to follow is general in that it applies equally well to the system or to any
component. As no specific component is being identified, the subscript is
excluded unless needed for clarification. In addition, the comment at the end
of Chapter 3 concerning the fact of our ability to reduce our focus from the
system to the component in many designs suggests that we can often exam-
ine reliability independent of component identity.
Returning to the inclusion of time in our models, note that the distribution
function on life length is the basis for four equivalent algebraic descriptors
of longevity. These four descriptors are the distribution function, the sur-
vivor (reliability) function, the density function, and the hazard function.
Reiterating the previous definitions, we have

Distribution function: FT (t) = Pr[T £ t] (4.1)

Reliability function: FT (t) = Pr[T > t] (4.2)

d
Density function: fT (t) = FT (t) (4.3)
dt

The hazard function, zT(t), is the instantaneous conditional probability of


failure given survival to any time. That is, it is the instantaneous rate of fail-
ure for surviving devices. Algebraically,

FT (t + Dt) - FT (t)
Pr[T £ t + Dt|T > t] =
FT (t)

so we obtain the hazard function as the limit of this expression:

ì1 ü ì 1 FT (t + Dt) - FT (t) ü fT (t)


zT (t) = lim í Pr[T £ t + Dt|T > t]ý = lim í ý= (4.4)
Dt ®0 î Dt þ Dt ® 0
î Dt FT (t) þ FT (t)

As indicated by the algebraic form, the hazard function is the rate at which
surviving units fail. For this reason, it is often called the “failure rate.”
However, because it can apply to other failure phenomena, the terminology
“failure rate” can be misleading, so in this text, the function zT(t) is called the
hazard function.
Knowledge of any one of the four reliability measures implies knowl-
edge of all of them. They are all functionally related and actually comprise
Reliability over Time 41

alternate statements of the model of life length dispersion. As an example,


note that given f T(t), we obtain

FT (t) =
ò f (u)du
0
T

FT (t) = 1 - FT (t)

fT (t)
zT (t) =
FT (t)

As a further example, suppose we are given zT(t). Treating the definition of


the hazard function

fT (t)
zT (t) =
FT (t)

as a differential equation, we have

d
fT (t) = - FT (t) = zT (t)FT (t)
dt

so

d
FT (t) + zT (t)FT (t) = 0
dt

The solution for this equation is

FT (t) = e ò0
- zT ( u ) du

or alternately

FT (t) = e - ZT (t )

where
t

ò
ZT (t) = zT (u)du
0
(4.5)

is called the cumulative hazard function. Later in this text, there are several
topics in which the cumulative hazard is a useful part of the analysis.
42 Reliability Engineering: Probabilistic Models and Maintenance Methods

As the four reliability measures are all faces of the same description of
the failure behavior of a device, we could use any of them as a basis for
distinguishing failure patterns. The hazard function is commonly used
by reliability analysts to describe the failure behavior of a device. The
use of the hazard function started with the concept that a population of
devices displays a “bathtub”-shaped hazard over the lives of the members
of the population. The “bathtub curve” is shown in Figure 4.1. The shape is
intended to illustrate the view that aging in a device population proceeds
through phases. Early in the lives of the devices, failures occur at a rela-
tively high rate. This “infant mortality period” is often attributed to the
failure of members of the population that are “weak” as a result of material
flaws, manufacturing defects, or other physical anomalies. Following the
“early life” or “infant mortality” period, the device population proceeds
through the “functional life period” during which the hazard function is
relatively low and reasonably stable. Finally, toward the end of the lives of
the population members, survivors fail with an increasing rate as a conse-
quence of “wear out.”
It is reasonable to observe that actuarial curves for human and other
biological entities often display the bathtub shape so that the analogy to
human mortality is often informative. It is also interesting that early life
failure behavior has been observed so extensively that most durable goods
manufacturers include some sort of run-in as part of their product testing
activities. In addition, it has long been common for government and mili-
tary procurement policies to mandate run-in efforts as a condition of sale for
equipment suppliers.
The concept of the bathtub curve has been discussed and debated widely
by reliability analysts. Some authors such as Wong and Lindstrom [22] argue
that device populations actually comprise numerous subpopulations, each
zT (t)

Time

FIGURE 4.1
Example of a bathtub curve.
Reliability over Time 43

of which has a unique hazard behavior. According to Wong and Lindstrom,


mixing the subpopulations results in a hazard curve that is “roller coaster”
rather than bathtub shaped. The key point is that the hazard function is
viewed as the most informative descriptor of device failure behavior. It is
the measure that is usually used to select the distribution function to model
life length.
In view of the importance of the hazard functions to descriptions of device
failure patterns, an extensive classification of hazard function behavior has
been devised. The simplest elements of that classification are presented in
the following:

Definition 4.1

A life distribution, FT(t), is said to be an increasing failure rate (IFR) distribu-


tion if

d
zT (t) ³ 0 0 £ t < ¥ (4.6)
dt

An alternate condition for the IFR classification is that

FT (t + t)
FT (t + t|t) =
FT (t)

be nonincreasing in τ for all t ≥ 0. Note that in the earlier definition and the
ones that follow, the terminology “failure rate” is used. This is done to show
the correspondence to the abbreviations, which were defined by Barlow
et al. [23] at a time when the ambiguity in the terminology had not yet been
recognized.
In any case, the previous expression says that if the conditional survival
probability is a nonincreasing function of age, then the rate at which failures
occur is increasing and the life distribution is IFR. By similar reasoning, we
obtain the following.

Definition 4.2

A life distribution, FT(t), is said to be a decreasing failure rate (DFR) distribution if

d
zT (t) £ 0 0 £ t < ¥ (4.7)
dt
44 Reliability Engineering: Probabilistic Models and Maintenance Methods

The alternate condition for the DFR classification is that FT (t + t|t) be non-
decreasing in τ for all t ≥ 0. The following is the third possible form for the
hazard function:

Definition 4.3

A life distribution, FT(t), is said to be a constant failure rate (CFR) distribution if


d
zT (t) = 0 0 £ t < ¥ (4.8)
dt
The equivalent condition for a CFR classification is that

FT (t + t|t) = FT (t)

for all t ≥ 0. This is an interesting special case that is examined further in the
next section.
There are situations in which the conditions for designation as IFR or DFR
are only partially met. For these cases, we consider the following situations:

Definition 4.4

A life distribution, FT(t), is said to be an increasing failure rate on average (IFRA)


distribution if
t
1
zT (t) >

0
zT (u)du 0 £ t < ¥ (4.9)

and is said to be a decreasing failure rate on average (DFRA) distribution if


t
1
zT (t) <

0
zT (u)du 0 £ t < ¥ (4.10)

While more detailed and extensive classifications of hazard functions have


been defined, the ones enumerated earlier are sufficient for our study here.
The classification of the hazard behavior serves to direct our choice of distri-
bution to model device reliability.

4.2 Life Distributions


In principle, any distribution function may be used to model equipment lon-
gevity. In practice, distribution functions having monotonic hazard functions
seem most realistic, and within that class, there are a few that are generally
Reliability over Time 45

thought to provide the most reasonable models of device reliability. The most


common choices of life distribution models are described in the next few pages.

4.2.1 Exponential Distribution


The most widely used distribution function for modeling reliability is the
exponential distribution. It is such a popular model of device reliability
because (1) it is algebraically simple and thus tractable and (2) it is considered
representative of the functional life interval of the device life cycle. Some
firms try to manage components or devices by aging them through the early
life period before putting them in service. The devices are expected to be
obsolete before reaching the wear-out period, so an appropriate model of
device reliability is one having constant hazard. This point of view is con-
troversial. Nevertheless, the exponential model is widely used. The general
statement of the exponential distribution is

FT (t) = 1 - e - lt (4.11)

The corresponding density function is

fT (t) = le - lt (4.12)

and the hazard function is constant over time. It is

fT (t) le - lt
zT (t) = = =l (4.13)
FT (t) e - lt

The exponential distribution is the only probability distribution that has a


constant hazard function. In fact, if we observe that a device appears to have
constant hazard, we may conclude that the exponential distribution is an
appropriate life distribution model and the converse statement also applies.
To see that constant hazard implies an exponential distribution, solve the
differential equation implied by Equation 4.13:

d
FT (t) = fT (t) = lFT (t)
dt

and obtain Equation 4.11 as the result.


Constant hazard is both a desirable and an undesirable feature of the expo-
nential model. The appeal of the result lies in its simplicity. Problems with
the constant hazards model revolve around the associated “memoryless”
property that it displays and the corresponding fact that the conditional sur-
vival probability is independent of age. That is,

FT (t + t) e - l(t + t)
Pr[T > t + t|T > t] = FT (t + t|t) = = - lt = e - lt = FT (t)
FT (t) e
46 Reliability Engineering: Probabilistic Models and Maintenance Methods

The interpretation of this result is that a used device has the same reli-
ability as a new one. Clearly, this is quite contrary to intuition and is
unlikely to be true of most devices. The lack of memory feature of the
exponential model is therefore a weakness in its representation of real
equipment.
One final observation concerning the exponential model is the fact that
the life distribution of a series system comprising independent components,
each of which has an exponential life distribution, is exponential. That is,

n æ n ö
n - å l it - ç l i ÷t
å
Õ F (t) = Õ e - l it ç ÷
Rs (t) = Ti =e i =1
=e è i=1 ø

i =1 i =1

Note further that this expression confirms the fact that the system-level haz-
ard function for a series system of independent components is computed as
the sum of the component hazards.

4.2.2 Weibull Distribution


An alternate life distribution model that is also widely used is the Weibull
distribution. The distribution is named for its developer, Waloddi Weibull,
who was a highly talented Swedish scientist. He developed the Weibull dis-
tribution [24] to describe the observed strengths of tensile test specimens.
It has subsequently been found that the distribution provides a reasonable
model for the life lengths of very many devices. The Weibull distribution
function may be stated in several ways. The most general is

b
æ t -d ö
-ç ÷
è q- d ø
FT (t) = 1 - e (4.14)

For this three-parameter form of the distribution function, the parameter δ is
a minimum life parameter that is often assumed to have a value of zero.
The interpretation of the parameter δ is that it is the time before which no
failures occur. When expressed in this manner, it seems reasonable to set
δ = 0. On the other hand, if the “time variable” is actually cycles to failure
or applied force in the case of a mechanical component, δ > 0 may be an
appropriate feature of the failure model. For example, a tensile specimen
made of steel will not fail when subjected to forces of 10–20 newtons. To
model the dispersion in failure strength of such specimens, a minimum
applied force of perhaps δ = 100 newtons might be appropriate. When the
minimum life parameter is nonzero, the distribution function appears as is
shown in Figure 4.2.
Reliability over Time 47

FT (t)

δ t

FIGURE 4.2
Three-parameter Weibull life distribution.

Because it is simply a coordinate location parameter, there is no loss in gen-


erality in assuming it has a value of zero. For the balance of the discussion
here, we take δ = 0. Then, the form of the two-parameter Weibull distribution
function is

b
ætö
-ç ÷
èqø
FT (t) = 1 - e (4.15)

The parameter θ is a scale parameter in that it determines the range of dis-


persion of the distribution. It is also called the “characteristic life” parameter
because the value of the distribution at t = θ is independent of the value of
the second parameter, β. That is,

FT (t = q) = 1 - e -1 = 0.632

so θ is a characteristic feature of any realization of the distribution.


The parameter β is the shape parameter. It determines the relative shape
of the distribution and also the behavior of the hazard function. The general
form of the hazard function is

btb-1
zT (t) = (4.16)
qb

Figure 4.3 provides an illustration of the Weibull hazard function. Observe


that by setting β = 1, we obtain the exponential distribution as a special case
and thus have constant hazard. Note further that the hazard monotonically
increases when β > 1 and monotonically decreases when β < 1.
The Weibull distribution is very widely used in reliability modeling. It has
the advantages of flexibility in modeling various types of hazard behavior
48 Reliability Engineering: Probabilistic Models and Maintenance Methods

β>1
zT (t)

β=1

β<1

Time

FIGURE 4.3
Weibull hazard functions.

and of algebraic tractability. In addition, as with any two-parameter distribu-


tion, it can be made to fit many actual situations reasonably well.
There are two further reasons that the Weibull is so widely used. One is that
when Weibull [24] first developed the distribution form, it was to represent
the failure behavior of tensile specimens, and the other is that the Weibull is
one possible realization of the extreme value distribution. Considering that
a device is subject to failure due to any of several causes, the first failure
mechanism to be activated (smallest time to occurrence) determines device
failure. Thus, failure times are the smallest extremes of a set and might rea-
sonably be modeled using a distribution from the class of extreme value
distributions.
Another perspective on the extreme value concept is that one might
consider that there are some number, say, k, of possible sites (links in a chain,
units of material, reactive species, etc.) of failure and that each cause is actu-
ated in time according to an identical and independent distribution; the
time of device failure is the minimum of the cause actuation times. This also
implies an extreme value. The extreme value argument may be stated as

Õ F (t) = F (t) ( F (t))


k -1
FT (t) = FTj (t) Ti Tj Tj
i =1
i¹ j

where the subscripts on the time variable indicate a correspondence to the


pertinent “units.” Gumbel [25] presents the results developed by Fisher
and Tippet [26], which show that in the limit, as k goes to infinity, the life
Reliability over Time 49

distribution can only have one of the three forms and the forms corre-
spond to the cases of negative, unconstrained in sign, and positive random
variables. The form for nonnegative random variables has the Weibull as a
representative case. In addition, the variable Y = ln T, where T has a Weibull
distribution, has an extreme distribution of the form

b ( y -ln q )
FY ( y ) = 1 - e - e (4.17)

Thus, the Weibull has a fairly plausible physical interpretation.


There are two special cases of the Weibull that are used fairly exten-
sively. One is when the value of the shape parameter β is 3.26. At this value,
the Weibull is nearly “bell shaped” so it appears like and can be used as a
substitute for the normal distribution. The other special case is when the
value of the shape parameter β is 2. In this case, the distribution is usually
called the Rayleigh distribution. It is special because the hazard function is
linear, that is,

2t
zT (t) =
q2

and many people find this linearity appealing.

4.2.3 Normal Distribution


Another popular model of device life length is provided by the normal dis-
tribution. Because of its algebraic intractability, the normal distribution is
usually expressed in terms of its density function:

2 2
e -(t -m ) /2s
fT (t) = (4.18)
2ps2

The corresponding distribution function cannot be expressed algebraically


in closed form. Hence, the same statement applies to the reliability and haz-
ard function. The distribution function is widely tabulated in its standard
form. A table of standard normal probabilities is included in Appendix A.
It is also reasonably straightforward to compute normal variates and cumu-
lative probabilities numerically. The expressions for these calculations are
provided in Appendix A.
The parameters of the normal distribution correspond to the distribution
moments. That is, μ = E[t] is the mean of the distribution and σ2 is the variance.
The normal distribution displays monotonically increasing hazard. We can
50 Reliability Engineering: Probabilistic Models and Maintenance Methods

prove that fact as follows. Start with the definition of the hazard function as
given in Equation 4.4 and take the derivative of the hazard function to obtain

d
zT (t) =
{ }
fT¢ (t) FT (t) - fT (t){- fT (t)} FT (t) fT¢ (t) + fT2 (t)
=
{ }
2
dt FT (t) FT2 (t)

Next, take the indicated derivative of the density function:


2 2 2 2
d e -(t -m ) /2s 2(t - m) e -(t -m ) /2s (t - m)
fT¢ (t) = =- 2
=- fT (t)
dt 2ps 2 2s 2ps 2 s2

and substitute it back into the derivative of the hazard function:


(t - m)
- fT (t)FT (t) + fT2 (t)
d s2
zT (t) =
dt FT2 (t)

d (t - m)
Note that FT2 (t) > 0 so zT (t) ³ 0 if - fT (t)FT (t) + fT2 (t) ³ 0. When t < μ,
dt s2
(t - m)
- >0
s2
fT (t) ³ 0

FT (t) > 0

fT2 (t) ³ 0
d
so zT (t) ³ 0
dt
When t ≥ μ,

(t - m) æ (t - m) ö
- fT (t)FT (t) + fT2 (t) = ç - FT (t) + fT (t) ÷ fT (t)
s 2
è s 2
ø

and
fT (t) ³ 0

(t - m)
so the question is whether or not - FT (t) + fT (t) ³ 0. To decide this, we
recall that s2

FT (t) =
ò f (x)dx
t
T
Reliability over Time 51

Necessarily, for any fixed value t,


¥ ¥

ò ò
t fT ( x)dx < xfT ( x)dx
t t

Therefore,
¥ ¥
(t - m) (t - m) ( x - m)
s2
FT (t) =
s2 òt
fT ( x)dx <
ò t
s2
fT ( x)dx

¥ 2 2
( x - m ) e - ( x - m ) /2 s
ò
¥
= dx = - fT ( x) t = fT (t)
s2 2ps 2
t

and hence,

(t - m)
FT (t) < fT (t)
s2

(t - m)
This inequality implies that - FT (t) + fT (t) > 0 so the derivative of the
s2
normal hazard function is nonnegative and the hazard function is increasing.
As is discussed in Chapter 5, the normal distribution is often consid-
ered a very appropriate model for the reliability of structural components.
On the other hand, many reliability analysts have resisted the use of the
normal distribution as a life length model for time-indexed ages because
of the fact that the normal is defined over the entire real line, negative as
well as positive.

4.2.4 Lognormal Distribution


A suggested response to the criticisms of the normal distribution is the use
of the lognormal. In this case, ln t is taken to be normally distributed. As a
consequence, the density function for the life distribution model becomes
2 2
e -(ln t -m ) /2s
fT (t) = (4.19)
t 2ps2

Note the appearance of the time variable in the denominator of this function.
It also appears that the lognormal is an appealing distribution indepen-
dent of the issues related to positive random variables. The life lengths of
quite a few microelectronic components have been found to be well modeled
by the lognormal distribution.
52 Reliability Engineering: Probabilistic Models and Maintenance Methods

The lognormal distribution suffers from a comparable level of algebraic


intractability to the normal. The model is quite useful and it displays the
unique feature that for appropriately selected parameter values, the hazard
function increases and then decreases.

4.2.5 Gamma Distribution


Another distribution that is widely used in reliability modeling is the gamma
distribution. The representation of the density function for the gamma is

lb b-1 - lt
fT (t) = t e (4.20)
G(b)

The distribution function can be stated in closed form only if the shape
parameter, β, is an integer. In this case, the distribution function is
¥
( lt ) b-1
( lt )
k k

FT (t) = å
k =b
k!
e - lt = 1 - å
k =0
k!
e - lt (4.21)

Note that when the shape parameter is an integer, the gamma distribution is
usually called the Erlang (or Erlang-β) distribution.
As in the case of the Weibull, the gamma distribution model displays increas-
ing hazard when β > 1, decreasing hazard when β < 1, and constant hazard when
β = 1. Thus, setting β = 1 collapses the gamma distribution to the exponential.
The gamma has the disadvantage of being rather difficult algebraically,
but it has the advantage that it arises naturally as the convolution of identi-
cal exponential distributions. It therefore has considerable practical appeal.
Strategies for the numerical evaluation of gamma functions and gamma dis-
tributions are provided in Appendix A.

4.2.6 Other Distributions


A wide variety of other probability distributions are used for reliability
modeling but most with relatively low frequencies. One of those other distri-
butions, the Birnbaum–Saunders distribution [27,28], is discussed briefly in
Chapter 5. Another is the Makeham distribution [29] for which the distribu-
tion function is

é b ù
- êat + ( e gt -1)ú
g
FT (t) = 1 - e ë û
(4.22)

The hazard function for this distribution is

zT (t) = a + be gt (4.23)
Reliability over Time 53

so there are three parameters that can be selected to provide whatever type
of model is desired. More importantly, the parameters can be selected so that
the function matches failure data quite well. For this reason, the Makeham
distribution [29] is widely used in actuarial studies. Note that it also corre-
sponds to an extreme value type of distribution.
Still another model that is based on actuarial data analysis is the Gompertz
distribution [29] for which the hazard function is

lbt (4.24)

so the distribution function is


é l (bt -1) ù
-ê ú
êë ln b úû
FT (t) = 1 - e (4.25)

While this form is rather intricate, the rationale for its construction is that the
reciprocal of the hazard function should be decreasing. That is,

dæ 1 ö
ç ÷<0
dt è zT (t) ø

which implies increasing hazard. For those working with actuarial data, this
seems a reasonable way to treat such behavior and the distribution has been
adopted by some reliability specialists for the same reason.
One final model that is particularly worth mentioning is the one suggested
by Hjorth [30].
He calls it the IDB (Increasing, Decreasing, Bathtub) distribution because
depending on the choice of parameter values, it can have increasing, decreas-
ing, or bathtub-shaped hazard function. The general statement of the IDB
distribution is
2
e - dt /2
(4.26)
FT (t) = 1 - q/b
(1 + bt )
and the corresponding hazard function is

q
zT (t) = dt + . (4.27)
1 + bt

Clearly, with three parameters, many behaviors can be modeled. In particu-


lar, setting θ = 0 yields the Rayleigh type of Weibull distribution. Similarly,
setting δ = β = 0 yields the exponential distribution. Observe that δ = 0 implies
decreasing hazard, while δ > θβ implies increasing hazard and 0 <  δ < θβ
yields a bathtub-shaped hazard function. If only for its flexibility, the model
is worthy of consideration.
54 Reliability Engineering: Probabilistic Models and Maintenance Methods

In summary, we might note that there are very many types of equipment
for which life distributions provide a meaningful model of life duration. The
possible choices are comparably wide. The distributions described earlier are
the principle but not the only distributions used to model life length. Each
has advantages and each has shortcomings. The key is to select one that is
appropriate for its application.

4.3 System-Level Models


The discussion of equivalent structures in Chapter 2 assures that system reliabil-
ity may be expressed in terms of component reliability. In addition, the discus-
sion in Chapter 3 on the use of minimum paths and minimum cuts to construct
bounds on system reliability further supports the focus on component-level reli-
ability measures. Notwithstanding this fact, system-level (or subsystem) analy-
ses are often meaningful or informative. For example, consider a series system of
independent components. For this system, the reliability function is
n
n n - å ZTi (t )
FT (t) = Õ F (t) = Õ e
i =1
Ti
i =1
- ZTi ( t )
=e i =1
(4.28)

Clearly, this form implies that for a system of independent components


arranged in a series, the system-level hazard function is the sum of the com-
ponent hazard functions. Thus, system hazard may be managed by reducing
component hazard function levels.
Another point of interest is the question of whether or not component haz-
ard function behavior is preserved in the formation of systems. For example,
does a system comprise independent IFR components? The answer is no.
Such a system is IFRA but not IFR. The most well-known example of the
accumulation of IFR behavior is the parallel arrangement of three indepen-
dent components, all of which are CFR (which may be viewed as simultane-
ously IFR and DFR). For this system, the reliability is
3 3 3

FT (t) = ∐i =1
FTi (t) = 1 - Õ(
i =1
)
1 - FTi (t) = 1 - Õ (1 - e )
i =1
- l it

= e - l1t + e - l2t + e - l3t - e -( l1 +l2 )t - e -( l1 +l3 )t - e -( l2 +l3 )t + e -( l1 +l2 +l3 )t

so the corresponding density function on life length is

fT (t) = l1e - l1t + l 2e - l2t + l 3 e - l3 t - (l1 + l 2 )e -( l1 + l2 )t - (l1 + l 3 )e -( l1 + l3 )t

- (l 2 + l 3 )e -( l2 + l3 )t + (l1 + l 2 + l 3 )e -( l1 + l2 + l3 )t
Reliability over Time 55

0.1

0.08
zT (t)

0.06

0.04

0.02

2 4 6 8 10
Time

FIGURE 4.4
Hazard function for a three (exponential)-component parallel system.

and the hazard function is the ratio of these two functions. A plot of the haz-
ard function for the case in which the parameters are normalized to sum to
one (λ1 = 0.6, λ2 = 0.3, λ 3 = 0.1) is shown in Figure 4.4.
Naturally, the specific behavior of the hazard function depends upon the
values of the parameters of the life distributions of the components and upon
the specific system structure. Nevertheless, we may conclude that IFR behav-
ior is not preserved when IFR components are combined to form a system.
The same is true for DFR components.
In view of the previous results concerning the aggregation of IFR compo-
nents, it is clear that for any system, the behavior of the system-level hazard
function and the system-level reliability function should be examined care-
fully. The approach most likely to lead to a successful investigation of the
hazard function at the system level is to form the system reliability function
and to differentiate it, either algebraically or numerically, and to then evalu-
ate the expression

d
- FT (t)
zT (t) = dt (4.29)
FT (t)

again either algebraically or numerically.


As an example, consider the system shown in Figure 4.5. This simple sys-
tem has the reliability function

(
FT (t) = e - ZT1 (t ) e - ZT2 (t ) + e - ZT3 (t ) - e - ZT2 (t )- ZT3 (t ) )
= e - ZT1 (t )- ZT2 (t ) + e - ZT1 (t )- ZT3 (t ) - e - ZT1 - ZT2 (t )- ZT3 (t )
56 Reliability Engineering: Probabilistic Models and Maintenance Methods

FIGURE 4.5
System example.

The density function corresponding to this reliability function is

fT (t) = ( zT1 (t) + zT2 (t))e - ZT1 (t )- ZT2 (t ) + ( zT1 (t) + zT3 (t))e - ZT1 (t )- ZT3 (t )

- ( zT1 (t) + zT2 (t) + zT3 (t))e - ZT1 - ZT2 (t )- ZT3 (t )

and the hazard function is the ratio of those two expressions. If all three
components have Weibull life distributions with θ1 = 10, β1 = 1.25, θ 2 = θ 3 = 8,
and β2 = β 3 = 2.25, the system-level hazard function is shown in Figure 4.6.
Finally, before leaving the discussion of life distributions, let us con-
sider the standby redundant component configuration. As mentioned in
Chapter 2, this is a system structure that does not fall within the set of
structural forms enumerated and is one that is used frequently. The system

0.25

0.2

0.15
zT (t)

0.1

0.05

2 4 6 8 10
Time

FIGURE 4.6
System-level hazard function.
Reliability over Time 57

FIGURE 4.7
Standby redundant system.

configuration has several possible realizations. Assuming that the system


structure is represented by Figure 4.7, there are two components in parallel
but only one is performing the system task at a time. Component 1 operates
until it fails at the time the switch instantaneously activates component 2.
Then component 2 operates until it fails at which time the system is failed.
We may consider that the switch is perfect and always makes the change in
components, or we may allow for the possibility that with probability, say,
p, the switch functions successfully and that with probability (1 − p), the
switch fails and the system is therefore failed. We may also consider that
component 2 is either “warm” or “cold” while waiting to be activated. If we
say that it is “warm,” we imply that it is aging in some manner, while the
statement that it is “cold” implies that component 2 is new when the switch
moves to initiate its operation.
Now, the key quantity to be determined is the distribution on the length of
time the system operates before failure. That distribution is

ò
FTS (t) = (1 - p)FT1 (t) + p fT1 (u)FT2 (t - u)du
0
(4.30)

The rationale for this construction is that the system survives as long as the
first component if the switch fails and as long as the sum of the failure times
of the two components if the switch functions properly. As a special case of
this model, setting p = 1 represents the case of the perfect switch. One may
also observe that for this model, if FT1 (t) and FT2 (t) are both IFR, then the sys-
tem life distribution is also IFR.
To conclude this chapter, it is appropriate to note that there are many
other models that have been defined for particular types of equipment with
specific types of operating profiles. The models discussed in this chapter
represent the majority but not all of the useful models.
58 Reliability Engineering: Probabilistic Models and Maintenance Methods

Exercises
4.1 Show that an IFR distribution can be characterized as one for which
- ln FT (t) is convex and a DFR distribution can be characterized as one
for which - ln FT (t) is concave.
4.2 Construct the life distribution for a component having hazard function
zT(t) = 1.5t.
4.3 Construct an algebraic expression for the hazard function for a gamma
distribution having shape parameter β = 3.0.
4.4 Plot the hazard function for a gamma distribution having β = 3.0 and
λ = 0.02.
4.5 Show that an exponential life distribution has FT (t1 + t2 ) = FT (t1 )FT (t2 ).
4.6 Compute the values of the reliability function and the hazard function at
t = 2500 hours for a Weibull distribution having β = 1.8 and θ = 4000 hours.
4.7 Plot the density function for a Weibull distribution having β = 3.2 and
θ = 500 hours.
4.8 Determine the expression for the mean of the Weibull distribution.
4.9 Determine the expression for the mean of the gamma distribution.
4.10 Compute the values of the reliability function and the hazard function
at t = 6000 hours for a gamma distribution having β = 3.0 and λ = 0.004
hours.
4.11 Use the numerical approximation of Appendix A to compute Γ(7.45).
Compare the value you obtain to Γ(7) = 6! and Γ(8) = 7!.
4.12 Compute the value of the reliability function at t = 6000 hours for a
gamma distribution having β = 2.75 and λ = 0.004 hours.
4.13 Compute the value of the reliability function at 1500 hours for a device
having a normal life distribution with μ = 2000 hours and σ = 300 hours.
4.14 Consider a device for which the life length is well modeled by a log-
normal distribution having a mean life of 150 hours and a standard
deviation in life length of 4.5 hours. Compute the reliability for this
device at 20.0 hours. Then, plot the hazard function for the device.
Finally, characterize the point at which the hazard function changes
from increasing to decreasing.
4.15 Use the approximations in Appendix A to compute the reliability at
4500 hours of a component for which the life distribution is normal
with μ = 4000.0 and σ = 250.0.
4.16 Construct an algebraic expression for the mean life of a parallel system
of two independent components, each of which has an exponential life
distribution.
Reliability over Time 59

4.17 Construct an algebraic expression for the reliability function and the
system hazard function for a two-out-of-three system comprising
identical components each having an exponential life distribution. Plot
the hazard function for the case in which λ = 0.05.
4.18 Construct the algebraic expression for the system-level hazard func-
tions for the structure:

1 2

4.19 Construct algebraic expressions for the reliability and hazard functions
for the following two systems. Compare several of the values of these
functions for the case in which all components have Weibull life distri-
butions and in both structures β1 = β2 = 2.5, θ1 = θ2 = 100, β3 = β4 = 1.5,
and θ3 = θ4 = 200.

1 3 1 3

2 4 2 4

4.20 In general, for a distribution function FT(t), tγ is the γth quantile of the
distribution if FT(tγ) = γ. Determine the 0.90 quantile and the 0.99 quan-
tile for the Weibull distribution having β = 1.5 and θ = 20,000 and for the
exponential distribution having λ = 0.001.
4.21 Analyze the standby redundant system for two identical exponential
components having λ = 0.001 for the cases of p = 0.50, 0.80, and 1.0. Plot
the resulting distribution function and its hazard function.
4.22 Consider a set of n independent spot lamps that are installed to illu-
minate a parking lot. Suppose the life length of each lamp is well
modeled by Weibull distribution having β = 1.8 and θ = 450 hours.
How many lamps must be installed in order to have a probability of
at least 0.99 that at least one lamp is still functioning after 600 hours
of operation?
4.23 Consider a parallel system of two independent but unequal compo-
nents, each of which has an exponential life distribution. Construct
60 Reliability Engineering: Probabilistic Models and Maintenance Methods

expressions for the reliability function and the hazard function for this
system. If the components have λ1 = 0.005 and λ2 = 0.004, compute the
system reliability at 250 hours.
4.24 Consider the two-unit standby system shown in Figure 4.7. Suppose
the switch is perfect but the two components are independent and
unequal. If the two components have exponential life distributions
with λ1 = 0.008 and λ2 = 0.003, compute the system reliability at 400 and
at 750 hours.
4.25 Consider a three-independent-component standby system for which
the switch is perfect. Construct a general model for the life length of
this system. Suppose the three components are identical and have
exponential life distributions with λ = 0.0005. Compute the system reli-
ability at 1500 and 8000 hours.
4.26 Consider a three-independent-component standby system for which
the switch is imperfect and has a probability of p of proper function
each time it is activated. Construct a general model for the life length
of this system. Suppose the three components are identical and have
exponential life distributions with λ = 0.0004 and p = 0.98. Compute the
system reliability at 2500 hours.
4.27 Suppose the previous system was designed simply as a three-unit par-
allel system. Compute the system reliability at 2500 hours. For what
values of the switch reliability, p, is the parallel system more reliable at
2500 hours than the standby system?
4.28 (Extension) Consider a shared load system of two identical components.
As long as both components are functioning, each has an exponential
life distribution with λ = 0.0025. When one of the components fails, the
remaining component must support the load initially carried by both
so its hazard increases to λ = 0.0075. Construct a model for the life dis-
tribution of this system and compute its reliability at 500 hours.
5
Failure Processes

The fourth constituent of the definition of the reliability of equipment


is the environment for which it has been designed. In saying “environ-
ment,” we are really referring to the forces that are imposed on a device
and the processes that lead to its deterioration and failure. We start with
the following question: Why does equipment fail? Many people agree
that equipment fails because specific components fail and components
fail because the operation of a system implies the application of forces
(energy) upon the system and its components. Sudden and/or excessive
forces may precipitate immediate failure, while usual forces induce and
sustain the progress of various types of deterioration processes under
which the component eventually can no longer resist the force levels it
encounters.
While the preceding explanation seems reasonable, there has never really
been any consensus on the causes of failure and how they should be mod-
eled. This situation persists today. A substantial portion of the scientific and
engineering effort that has constituted the evolution of the reliability disci-
pline has been focused on the study and modeling of component degrada-
tion processes. Numerous theories and models have been proposed and very
many heated debates have centered on the various points of view concerning
the nature of failure processes.
In the pages that follow here, I will enumerate several of the principal
models and identify some of the points of contention. I will try to suggest
reasonable choices among the models and will finally offer my view on
how failure models should be defined. A key feature of the models that
have been proposed is that they are usually defined separately for mechan-
ical systems as opposed to electrical/electronic systems. That is, models
of failure have traditionally been developed from either a mechanical or
an electrical (electronic) perspective. Reliability of mechanical equipment
has often been viewed as a problem in structural integrity as influenced
by applied loads and inherent strength. In contrast, the reliability of elec-
trical devices has usually been viewed as dependent upon material stabil-
ity despite exposure to hostile chemical reactions such as oxidation. It is
only recently that some analysts have suggested that both types of reli-
ability result from common classes of phenomena. It is my view that this

61
62 Reliability Engineering: Probabilistic Models and Maintenance Methods

distinction is unnecessary as the failure processes for mechanical and for


electrical devices are essentially the same. I believe that they correspond
to the progress and completion of physicochemical reactions that occur
in the materials that comprise devices and that are driven by the transfer
of energy within those materials. Recently, I have articulated this view
formally [31] with typically mixed response and I will include my discus-
sion later in this chapter.

5.1 Mechanical Failure Models


5.1.1 Stress–Strength Interference
An early and still popular representation of mechanical device reliability is
the “stress–strength interference” model. Under this model, there is random
dispersion in the stress, Y, that results from applied loads. The dispersion
in the stress realized can be modeled by a distribution function, say, H Y(y).
Similarly, there is also random dispersion in inherent device strength, X,
and this can be modeled by GX(x). Then, device reliability corresponds to the
event that strength exceeds stress. That is,

¥ ¥ ¥ x

F = Pr[X > Y ] =
ò ò h (y)g (x)dxdy = ò ò h (y)g (x)dydx
-¥ y
Y X
-¥ -¥
Y X (5.1)

for which an equivalent statement is

¥ ¥

F = Pr[X > Y ] =
ò

hY ( y ) ( 1 - GX ( y ) ) dy =
ò H (x)g (x)dx

Y X (5.2)

The failure probability is the complement of the reliability. This time-


independent model has been studied extensively. Kapur and Lamberson [32]
provide solutions for several different choices of G and H.
Since Equation 5.2 is independent of time, the definition of a time-based
reliability model centers on selecting the distributions G and H and on repre-
senting the time evolution of those distributions. One popular model of this
type is based on the assumption that stress is normal in distribution with
constant mean and variance while strength is also normal but with declin-
ing mean and increasing variance. The idea is that the aging of a device
Failure Processes 63

results in a gradual decrease in mean strength and a gradual increase in the


inconsistency (variability) in strength. In this case,
¥
æ y - my öæ æ y - m x (t) ö ö
FT (t) = Pr[X(t) > Y ] =
òf çç
-¥ è
sy
÷÷ çç 1 - F ç
øè è s x (t) ø ø
÷ ÷÷ dy

¥
æ x - m y ö æ x - m x (t) ö
=
ò F ççè

÷fç
s y ÷ø è s x (t) ø
÷ dx (5.3)

where
μy and σy are the constant parameters of the normal stress distribution
μx(t) and σx(t) are the time-dependent parameters of the strength distribution
ϕ denotes the standard normal density
Φ represents the cumulative distribution for the standard normal

Clearly, the time evolution of the parameters of the strength distribution


may be assigned any plausible form. For example, if the mean declines lin-
early in time and the standard deviation increases linearly in time, possible
functions are
m X (t) = m X (0) - at and sX (t) = sX (0) + bt

Consider an example realization of this model. Suppose the distribution


on imposed stress has μy = 100 N/cm2 and σy = 2 N/cm2 and the distribution on
device strength has μx(0) = 102.5 N/cm2 and σx(0) = 1 N/cm2. Then the basic con-
cept of the stress–strength interference model is well illustrated in Figure 5.1.

0.4

0.3
Pr

0.2

0.1

97.5 100 102.5 105 107.5 110


Time

FIGURE 5.1
Basic stress–strength interference model.
64 Reliability Engineering: Probabilistic Models and Maintenance Methods

0.3
110
0.2
Pr

0.1
0 105

th
ng
0

e
Str
200 100

Tim
e 400
95

FIGURE 5.2
Time evolution of the strength distribution.

Note that the distribution on stress lies below that on strength so that, in general,
the strength values will probably exceed the stress values.
Next, suppose that the “decay” parameters are α = 0.005 and β = 0.002.
With these values, the gradual deterioration of the device strength is rep-
resented by a gradual change in the center and the width of the strength
distribution. This is illustrated in Figure 5.2. At each point in time, the cor-
responding “slice” of the distribution corresponds to the strength distribu-
tion at that time.
Consider the interference at time values of t = 250 and 500. The cor-
responding plots are shown in Figure 5.3. Clearly, as the mean of the
strength distribution declines, the probability that strength exceeds stress
also decreases. Since both the strength and the stress distributions have
been assumed to be normal, the values of realizations of the distribution
and for the reliability must be computed numerically. Table 5.1 later lists
values of the reliability function of Equation 5.3 for the defined numerical
example.
A plot of the reliability function over the same range is shown in Figure 5.4.
Once the numerical analysis of the model has been performed, one can fit
a distribution to the reliability function as desired. For example, the function
represented in Figure 5.4 (Table 5.1) is well represented by a Weibull distribu-
tion with parameters β = 1.083 and θ = 1204.

5.1.2 Shock and Cumulative Damage


An alternate and more widely used set of models are the shock and the cumu-
lative damage models. An appealing description of the simplest shock model
Failure Processes 65

0.25

0.2

0.15
Pr

0.1

0.05

97.5 100 102.5 105 107.5 110


(a) Time

0.2

0.15

0.1
Pr

0.05

97.5 100 102.5 105 107.5 110


(b) Time

FIGURE 5.3
(a) Interference at t = 250. (b) Interference at t = 500.

TABLE 5.1
Example of Reliability Function Values
Time 100 200 500 750 1000 1500 2000 2500 3000 4000 5000
Reliability 0.933 0.890 0.702 0.531 0.391 0.217 0.132 0.088 0.064 0.039 0.028

is offered by Gertsbakh and Kordonskiy [33], who was one of the pioneers in
the development of the reliability theory in Europe. Gertsbakh suggests that
we consider a sequence of equipment actuations (or events), each of which
imposes a stress on a component of interest. As long as the stress is below a
threshold, the component does not fail and the first time the stress exceeds
the threshold, the component fails. As an example, he considers that each
66 Reliability Engineering: Probabilistic Models and Maintenance Methods

0.8

0.6
Reliability

0.4

0.2

500 1000 1500 2000 2500


Time

FIGURE 5.4
Example Reliability Function for the Interference Model.

time an airplane lands, the landing imposes a gravitational force on the com-
munications radio and that variations in weather conditions and pilot skill
imply substantial variation in the loads experienced. When a sufficiently
severe load occurs, the radio is damaged and fails. The life of the radio may
thus be measured in the numbers of landings, and if γ is the probability that
failure occurs on any landing, the distribution on life length is geometric, so
k
FK (k ) = g å (1 - g )
n =1
n -1
= 1 - (1 - g )
k
(5.4)

gives the cumulative probability of failure on or before the kth landing. One
may then argue that for any individual landing, γ is quite small, so

(1 - g )
k
» e - kg

Or alternately, one may observe that


1
E[K ] =
g
so that the approximation
-k
E[ K ]
FK (k ) » e

is reasonable. In either case, one obtains a life distribution model in terms of


the number of cycles or actuations to failure that is exponential.

FK (k ) ≃ 1 - e -gk (5.5)
Failure Processes 67

Naturally, if we assume an average time between landings (or actuations),


the model is easily converted to one expressed in terms of time. Of course,
the designation and interpretation of a “time” scale are always somewhat
arbitrary and can be defined according to the application.
Gertsbakh’s example is informative in two ways. One is that it illustrates
the fact that reliability models have usually been based on an engineering
analysis of observed physical behavior. The second is that the example helps
to emphasize the evolution that has occurred in modeling failure as engi-
neers have made the models more and more representative of the actual
operating behavior. Starting from the earlier model, the transition across the
cumulative damage models to the diffusion models shows this evolution.
The early development of cumulative damage models was performed
by Birnbaum and Saunders [28] in their study of fatigue in aircraft and by
Karlin [34]. The elementary form of the model starts with the assumption that
a device is subjected to “shocks” that occur randomly in time. Each shock
imparts a random quantity, say, Yi, of damage to the device that fails when
a capacity or endurance threshold is exceeded. The most common realiza-
tion of this model includes the assumption that the shocks occur according
to a Poisson process with intensity λ and the amounts of damage per shock
are independently and identically distributed according to some arbitrarily
selected common distribution, say, H. If FT (t) represents the reliability over
time and K is the number of shocks that occur over the interval [0, t], the reli-
ability function based on a threshold of L is

¥
( lt )
k

FT (t) = å
k =0
e - lt
k!
HY( k ) (L) (5.6)

Note that the sum is taken over all possible numbers of shocks and the nota-
tion HY( k ) ( y ) represents the k-fold convolution of H Y(y) and thus the sum of k
damage magnitudes that total Y. Thus, Equation 5.6 represents the probabil-
ity that k shocks occur and their sum does not exceed the strength/damage
threshold L—summed over all possible values of k. By convention, HY( 0 ) ( y ) = 1
for all values of y ≥ 0 and, of course, HY(1) ( y ) = HY ( y ). Parenthetically, inde-
pendent of the choice of distribution H Y(y) for modeling shock magnitudes,
Equation 5.6 will correspond to an increasing failure rate on average (IFRA)
distribution.
Consider an example. Assume that a device has an endurance threshold
L  =  μx(0) = 102.5 N/cm2 as in the case of the stress–strength interference
model. Let H Y(y) be a normal distribution with μY = 50  N/cm2 and σY =
1 N/cm2 and assume λ = 0.004/h. Solving Equation 5.6 as a function of time
yields the reliability curve shown in Figure 5.5.
Note the conceptual duality of the cumulative damage model and the
stress–strength interference model. In the stress–strength interference
68 Reliability Engineering: Probabilistic Models and Maintenance Methods

0.8
Reliability

0.6

0.4

0.2

500 1000 1500 2000 2500


Time

FIGURE 5.5
Reliability function for the cumulative damage model.

model, we treated stress as constant and strength as variable. In the basic


cumulative damage model, the strength (damage threshold) is the constant
quantity and the stress (damage) is variable.
A special case of the cumulative damage model was suggested by
Gertsbakh and Kordonskiy [33]. He considered that shocks occurred accord-
ing to a Poisson process and were always of the same fixed magnitude. In
this case, the model reduces to that of a birth process with a termination
barrier and the life distribution is a gamma distribution. Also, Nachlas [35]
generalized Gertsbakh’s model by treating the damage rate as state depen-
dent and the initial and failure states as random. The result is that the life
distribution model becomes a generalized gamma distribution. In the gener-
alized model, the state-dependent arrival rate is defined by the polynomial

l j (t) = j namt m-1 (5.7)

for a component in state j. The state variable represents the degree of deg-
radation of the device. For a failure threshold of state k and a random initial
state, say, i, the time to failure is described by

æ ö
æ k -1
ö k -1 ç 1 k -1 ÷ n m
FT (t|i , k ) = 1 - ç
ç
è
Õ å Õ
l =i
l ÷
n
÷
ø r =i
ç n
çr l =i
( l - r ) ÷ e - r at
n n

÷
(5.8)
ç l¹r ÷
è ø

This model is further enhanced by taking a Poisson mixture over the distance
(k − i) between the initial and final state.
Failure Processes 69

Subsequent to its initial development, the shock model has been studied
extensively and enhanced in several ways. One of the first extensions is to
permit successive shocks to cause increasing amounts of damage. That is,
we continue to assume that the shocks occur according to a Poisson pro-
cess and that the damage caused by the shocks are independent, but succes-
sive shocks are increasingly harmful, so the damage associated with the ith
shock, Yi, has distribution HYi ( y ) with the values of the HYi ( y ) decreasing in i.
With this assumption, the survival function becomes

( lt )
¥ k

FT (t) = å
k =0
e - lt
k!
HY0 * HY1 * ⋯ * HYk (L) (5.9)

where HY0 * HY1 * ⋯ * HYk (L) denotes the convolution of the distributions.
Further extensions include (1) eliminating the assumption of indepen-
dence of the successive shocks, (2) allowing attenuation of damage (partial
healing) between shock events, and (3) treating the failure threshold as a
decreasing function of time or damage. In each case, the basic model format
is the same and the resulting life distribution is IFRA.
The cumulative damage models have been viewed as very useful and
have been applied to some very important problems. Perhaps the most
noteworthy application was the study by Birnbaum and Saunders [28] of
fatigue failures in aircraft fuselage. Their model is specific to fatigue failure
but has been used to define reinforcement designs for aircraft structures.
The original model for fatigue failure was defined by Miner [36] and is
known as Miner’s rule. The model is based on the idea that stress is applied
cyclically and that the magnitude of the imposed stress in each cycle falls
within one of a set of K possible intervals. Miner’s rule states that fatigue
failure will occur when

K
nk
åN
k =1 k
=1 (5.10)

where
nk is the number of cycles in which the stress falls in the kth interval
Nk is the number of cycles that would cause failure if all stresses occurred
in the kth interval. Miner’s rule is deterministic

Birnbaum and Saunders made the model stochastic by assuming that


each stress imposed a random increase, Xn, in the accumulated damage to
a device. The damage increase is assumed to have a mean value μ and stan-
dard deviation σ and successive increases are assumed to be independent.
Then if Yn represents the sum of the damage increments over n cycles, failure
70 Reliability Engineering: Probabilistic Models and Maintenance Methods

occurs when Yn exceeds the failure threshold. The probability on the number
of cycles to failure is

é n
ù é n
X i - m L - nm ù
Pr[N £ n] = Pr[Yn ³ L] = 1 - Pr ê
êë
å
i =1
Xi £ L ú = 1 - Pr ê
úû êë
ås
i =1
n
£
s n úû
ú (5.11)

and for large values of the number of cycles, application of the central limit
theorem implies that the distribution is normal with mean value nμ and vari-
ance nσ2 so
æ L - nm ö
Pr[N £ n] » F ç ÷ (5.12)
è s n ø

0.4

0.3
F

0.2
α = 1.25
0.1
α = 0.75

1 2 3 4 5
(a) Time

α = 1.25

3 α = 0.75
z

1 2 3 4 5
(b) Time

FIGURE 5.6
(a) Density functions for the Birnbaum–Saunders distribution having a scale parameter 2.5.
(b) Hazard functions for the Birnbaum–Saunders distribution having a scale parameter 2.5.
Failure Processes 71

Replacing the discrete number of cycles, N, by time, T, and making the


parameter transformations a = s and l = m yield the final form of the
mL L
Birnbaum–Saunders distribution:

æ1æ 1 öö
FT (t) = F ç ç lt - ÷÷ (5.13)
è è
a lt ø ø

For this distribution, α is the shape parameter and λ is the scale parameter.
Although the distribution looks cumbersome, it is easily analyzed as it is
defined as a standard normal probability. Figure 5.6a shows examples of the
life density and Figure 5.6b shows the corresponding examples of the hazard
function.

5.2 Electronic Failure Models


To a large extent, models of electrical and electronic device reliability are
motivated by empirical observations and were developed subsequent to the
mechanical reliability models. The rapid evolution of electronic and espe-
cially microelectronic devices provided considerable impetus to the devel-
opment of reliability models for electronic devices. In addition, because the
life lengths of early electronic devices were often considerably shorter than
those of mechanical devices, electronic device reliability received substantial
attention.

5.2.1 Arrhenius Model


Most of the models developed are based on the idea that electronic device
degradation processes are essentially chemical conversion reactions within
the materials that comprise the devices. Consequently, many of the models
are based on the Arrhenius [37] reaction rate equation that is named after
the nineteenth-century chemist who developed the equation in the study of
irreversible reactions such as oxidation. The basic form of the equation is that
reaction rate, ρ, is given by

r = he - Ea /KT (5.14)

where
η is an electron frequency factor
K is Boltzmann’s constant (8.623 × 10 –5 eV/K)
T is the temperature in degrees Kelvin
Ea is Gibb’s free energy of activation
72 Reliability Engineering: Probabilistic Models and Maintenance Methods

For reliability modeling, the product of the reaction rate, ρ, and time yields
the extent of the progress of the deterioration reaction and is therefore con-
sidered to correspond to the cumulative hazard at any time. For example, for
the Weibull life distribution,
b
æ rt ö
ZT (t) = ç ÷ (5.15)
è qø

and it may be further noted that alternate rate functions may be (and some-
times are) used but are applied in the same manner. Until recently, the
scientific and algebraic links between rate functions such as Equation 5.14
and life distributions and hazard functions have not been widely studied.
Nevertheless, the implied models such as Equation 5.15 are used extensively
and have proven to be quite accurate.

5.2.2 Eyring Model


A more general model was suggested by Krausz and Eyring [38]. In their
model, the rate of progress of the deterioration reaction depends upon both
temperature and voltage, so

r = he - Ea /KT e - g1V - g 2V/KT (5.16)

where γ1 and γ2 also represent activation energies. Note that in this model,
Krausz and Eyring allow for the possibility that, in addition to their direct
effects, temperature and voltage have a synergetic (combination) effect on
reaction rate. For this model, we again incorporate the rate function in the
cumulative hazard function to obtain a life distribution.

5.2.3 Power Law Model


An alternate model for the effect of voltage is provided by the power law
model. In this case, it is assumed that the mean life length is proportional
to the (reciprocal of the) voltage at which a device is operated raised to an
appropriate power. That is,
c
E[T ] = (5.17)
Vp
Clearly, the two parameters of this model provide sufficient flexibility that
any observed value of the mean life can be matched by the model.

5.2.4 Defect Model


For the specific case of integrated circuits, Stevenson and Nachlas [39] sug-
gest that metallization lines contain both macroscopic and microscopic flaws
Failure Processes 73

and that these two flaw types both contribute to device failure. Their basic
premise is that the flaws correspond to material impurities or crystal lattice
anomalies that serve as reactive sites for degradation reactions such as metal
migration and oxidation. Based on this basic premise, they show that pub-
lished microelectronic failure data are reasonably well described by either
the Weibull or the IDB life distributions depending upon the relative magni-
tudes of the initial concentrations of the two defect types.

5.3 Other Failure Models


In recent years, some more useful and realistic, or at least more general,
models have been proposed. Of these, the proportional hazards models [40]
have been widely used and the diffusion models have not yet been widely
adopted but should be in the future. Another more general model that has
been used in actuarial studies for a long time is the competing risk model.
This model has not yet received much attention from the reliability commu-
nity, but it soon will. These three model forms are examined next.

5.3.1 Diffusion Process Model


The diffusion process models may be viewed as a generalization of the
cumulative damage models. They seem to apply well to mechanical failure
processes, but, as is discussed later in this section, they are certainly not lim-
ited to mechanical components and actually apply equally well to electronic
devices. The elementary idea for the models is that damage to a device accu-
mulates in a manner that is well represented by a diffusion process. The state
of a device over time is represented by Xt, and if successive changes in this
state variable conform to a gamma process [41], then one obtains
dXt = aXt dg(t) + bXt (5.18)

where
α and β are constants based on the material constituents of the device
γ(t) represents the gamma process in damage increments

Alternately, if it is assumed that state variable changes conform to a Poisson


process [41], one obtains
dXt = l(t)Xt + sXt dW (t) (5.19)

where
λ(t) is the Poisson rate function
σ is a constant
W(t) is a standard Wiener process
74 Reliability Engineering: Probabilistic Models and Maintenance Methods

More generally, one may let p(x,t) represent the density on the state variable
and use a Markov process to obtain

¶ ¶ ¶2
p( x , t) = - ( m( x , t)p( x , t) ) + 2 ( D( x , t)p( x , t) ) (5.20)
¶t ¶x ¶x

in which the first term represents drift and the second represents diffu-
sion. Irrespective of the definition of the constituent terms in this equa-
tion, it reduces to the Fokker–Planck equation, which is also known as the
Chapman–Kolmogorov backward differential equation [41]:

dXt = m(Xt , t) + s(Xt , t)dWt (5.21)

In all three of the earlier formulations, the damage accumulates gradually


with device use. More recently, Kharoufeh and Cox [42] incorporate the ini-
tial idea of Esary et al. [43] that shocks arriving according to a Poisson pro-
cess cause a traumatic increase in damage. Under this concept, the system
state change may be represented as

dXt = a(Xt ) + l(t)dXt + sXt dW (t) (5.22)

where the first term represents gradual damage growth, the second term
represents the traumatic increments that occur as a result of the shocks, and
the third term captures the random variation in the damage growth.
The best representation of all of the earlier models is provided in terms of
the survival function by Lemoine and Wenocur [44] as

¶FT ( x , t) ¶F ( x , t) s2 (t) ¶ 2 FT ( x , t)
= -k( x)FT ( x , t) + m(t) T + (5.23)
¶t ¶x 2 ¶x 2

where the interpretation of FT ( x , t) is that it is the joint probability that a com-


ponent having wear level x survives beyond time t.
The model indicates that the accumulation of age or stress or the gradual
reduction in strength of a component may be modeled as a diffusion pro-
cess. In the absence of other phenomena, the diffusion process represents
the deterioration of the component, and the first passage distribution of
the process represents the failure time. They then add the possibility of
sudden catastrophic events occurring in time according to an arbitrary and
selectable rule. In this case, the component ages gradually and fails accord-
ing to the first passage time distribution unless the component encounters
a sufficiently severe shock to cause it to fail “early.” A familiar example
is an automobile tire that gradually wears and ultimately is worn out.
Failure Processes 75

As it wears, the tire has reduced tread or strength and thus less resistance
to randomly occurring road hazards, some (but not all) of which are fatal
to the tire. A conceptually equivalent process is the gradual deterioration
of an incandescent light bulb that will also fail suddenly if subjected to a
voltage surge.
In Equation 5.23, the first term represents the “killing process” associated
with shocks, the second captures the drift in the diffusion process, and the
third term represents the randomness in the damage accumulation diffusion
process. The randomness may result from material variations, ambient oper-
ating conditions, intensity of use, or simple entropy in energy application to
the device.
The general model that Lemoine and Wenocur define can be quite dif-
ficult to analyze or evaluate computationally. However, this rather compli-
cated model is actually quite rich. As they point out, the practically useful
model realizations are generally tractable. In fact, appropriate choices of
initial conditions and of functions μ(t) and σ2(t) lead to a wide variety of
useful and conceptually reasonable specific models. For example, assum-
ing σ(t) = 0 and that the wear tolerance is infinite implies that k(x) cor-
responds to the Weibull hazard function. For that model, setting β = 1 to
obtain an exponential distribution corresponds to the case in which the
wear rate is zero. On the other hand, taking k(x) = x and X(t) = a + bt yields
the Rayleigh distribution. Other choices yield normal, gamma, inverse nor-
mal, and extreme value life distributions. Thus, as Lemoine and Wenocur
point out, selecting the mean value and diffusion functions and the killing
function to represent specific types of degradation processes leads to quite
reasonable reliability models.
As indicated, analysis of Equation 5.23 can be quite challenging. In
fact, the solution of stochastic differential equations is a topic of ongoing
study. Nevertheless, for the simplest diffusion processes, an algorithm for
obtaining numerical solutions has been defined. For Equation 5.23, identify
the discretized approximations to the derivative terms as

¶FT ( x , t) F(i , j + 1) - F(i , j)


®
¶t Dt

¶FT ( x , t) F(i + 1, j) - F(i - 1, j)


®
¶x 2Dx

¶ 2 FT ( x , t) F(i + 1, j) - 2 F(i, j) + F(i - 1, j)


®
¶x 2 (Dx)2

and the corresponding approximation to the first term as


-k( x)FT ( x , t) ® -k(i)F(i , j)
76 Reliability Engineering: Probabilistic Models and Maintenance Methods

Then, the discretized form of the equation is

F(i , j + 1) - F(i , j) æ F(i + 1, j) - F(i - 1, j) ö


= - k ( j )F (i , j ) + a 1 ç ÷
Dt è 2Dx ø
æ F(i + 1, j) - 2F(i , j) + F(i - 1, j) ö
+ a2 ç ÷ (5.24)
è (Dx)2 ø

where α1 and α2 are placeholders for the drift and diffusion coefficients. Then,
grouping all of the like survivor function terms, rewrite the expression as

æ -a Dt a Dt ö æ 2a Dt ö
F(i , j + 1) = ç 1 + 2 2 ÷ F(i - 1, j) + ç 1 - k( j) - 2 2 ÷ F(i , j)
è 2Dx (Dx) ø è (Dx) ø

æ a Dt a Dt ö
+ ç 1 + 2 2 ÷ F(i + 1, j)
è 2Dx (Dx) ø

This recursive form of the equation can be evaluated numerically. To do so,


the killing function and the boundary conditions must be specified and val-
ues for the coefficients must be set. For example, assume that k(x) = λx2 and
we take λ = 0.1, μ = 0.05 and σ = 0.5. Take the boundary condition on the time
variable to be FT ( x , 0) = 1 and suppose that the boundary conditions on the
state variable are that for a ≤ x ≤ b, FT ( a, t) = 0, and FT (b , t) = 0.
For this problem definition, numerical analysis of the survivor function
yields the surface shown in Figure 5.7a. In contrast, if the state variable
boundary conditions are changed to FT ( a, t) = 1 and FT (b , t) = 0, the surface
obtained is shown in Figure 5.7b. Thus, it is possible to obtain approximate
solutions to Equation 5.23 for some model realizations.
A further illustration of the utility of the diffusion model is provided by
Figures 5.8 and 5.9. In the first of these, the state variable is the operating
temperature of a machine, while in the second one, the state variable is the
extent of a crack propagation on a road surface. One can see that the choice of
boundary conditions has a strong influence on the survivor function.
It is appropriate to reiterate the fact that Lemoine and Wenocur show how
selecting the drift and diffusion coefficients appropriately leads to survivor
function solutions in the form of commonly used life distribution models.
One further point concerning the diffusion models is the fact that the exact
same modeling format is widely used to describe the progress of chemical
reactions. Nachlas and Beqari [31] have summarized the general format for
these models and have argued that the applicability of the diffusion model
to both mechanical and chemical degradation processes implies that the tra-
ditional separation of mechanical and electronic reliability models should be
abandoned. It is time to use this common model form as a basis for identify-
ing a unified theory of failure.
Failure Processes 77

1.0

0.5
4

0.0
0

x
2
2
t
4
(a) 0

2 t

1.0

0.5

0.0

2 x

(b) 0

FIGURE 5.7
(a) Example of a solution of the diffusion model for the state boundary conditions FT ( a, t) = 0
and FT (b , t) = 0. (b) Example of a solution of the diffusion model for the state boundary condi-
tions FT ( a, t) = 1 and FT (b , t) = 0.
78 Reliability Engineering: Probabilistic Models and Maintenance Methods

Close to optimal operating temperature—


survival probability high

1.0 Above optimal


Below optimal
operating temperature—
operating
0.8 survival probability
temperature—
very low
survival

Fbar(x,t)
0.6
probability low
0.4

0 0.2
1
2 0.0
Ti 3 5
m 4
e 4 3
2
5 1 X
0

FIGURE 5.8
Example of a solution of the diffusion model when the state boundary condition applies to a
minimum value.

Crack propagation large—


Crack propagation near zero—
survival probability low
survival probability very high

1.0

0.8
Fbar(x,t)

0.6

0.4

0.2
0
1 0.0
2 5
4
Tim 3 3
e 4 2
1 X
5 0

FIGURE 5.9
Example of a solution of the diffusion model when the state boundary condition applies to a
maximum value.

5.3.2 Proportional Hazards


A relatively unified representation of the relationship between environment
and failure processes is provided by the proportional hazards model, which was
originally proposed by Cox and Oakes [45]. There are several forms in which
this model may be stated. A general form that is conceptually appealing is

zT (t) = y( x )z0 (t) (5.25)


Failure Processes 79

The interpretation of the equation is that the device has a base hazard func-
tion, z0(t), that represents the core dispersion in life length, and this hazard is
increased by a function, ψ, of the vector of variables, x, that describe the spe-
cific operating environment in which the device is used. The two commonly
I
used forms for the function y( x ) are a linear function y( x ) = b0 +
b0 + å bi xi
I
i =1 å
bi xi
and an exponential form y( x ) = e i =1
where the second of these is often
preferred.
There are actually two approaches to using the proportional haz-
ards model. One is to identify the vector x as a set of covariates or explan-
atory variables and to view the model as a statistical basis for fitting a life
distribution to observed failure data. An alternate approach is to view the
vector x as a description of the operating environment and to treat the func-
tion ψ as a description of the effect of operating conditions on the failure
frequency. Practically, the two views are essentially the same as the function
ψ(x) is usually determined by regression analysis on accumulated failure
data. On the other hand, when the failure process for a device is understood,
the proportional hazards model in which the function ψ represents that fail-
ure process is very appropriate.
Observe that under the basic proportional hazards model, the shape of the
hazard function is preserved. The environmental effects on the hazard are
essentially additive (directly or as logarithms) and the cumulative hazard
function has the following comparable form:

t t

ò
0
ò
ZT (t) = y( x )z0 (u)du = y( x ) z0 (u)du = y( x )Z0 (t)
0
(5.26)

Thus, the reliability function can be stated as an exponential function of the


environmental variables:

( )
y( x )
FT (t) = e - y( x )Z0 (t ) = e - Z0 (t ) (5.27)

The advantages of this model are that it applies to any assumed base life
distribution, appears to be equally appropriate for mechanical and elec-
tronic devices, and allows for the very specific representation of a general
and arbitrary set of environmental effects. Conceptually, the chief drawback
of the model is that it may not directly portray the mechanisms of failure.
Practically, the drawback of the model is that substantial volumes of data are
required to obtain satisfactory regression models for ψ(x). Enhancements to
the model have been based on allowing some of the environmental variables
to be time dependent.
As an example of the proportional hazards model, consider a satellite-
based electronic device for which the gamma distribution with β = 2.0 and
80 Reliability Engineering: Probabilistic Models and Maintenance Methods

λ = 0.00025 provides a reasonable life length model. Suppose that the damage
effects of radiation on the device are modeled by y( x ) = e 0.024+0.25 x1 +0.40 x2 where
the variables x represent the (normalized) particle spectral density and peak
energy intensity in the ambient particle streams. For these circuits, when
x = (0.80, 0.45), the base cumulative hazard at 5000 hours is ZT(5000) = 1.0021
and the corresponding reliability is FT (5000) = 0.644. The cumulative hazard
achieved while in service in orbit is ZT(5000) = 1.501 with a corresponding
reliability of FT (5000) = 0.223. Installation of a radiation shield that reduces
the operating environment to x = (0.30, 0.25) improves the 5000 in orbit reli-
ability to FT (5000) = 0.294.

5.3.3 Competing Risks


Competing risk models have been used for actuarial studies for a very long
time. Some people say their use started with Sir Thomas Bayes when he
attempted to compute the probabilities of death from various causes given
that one cause, tuberculosis, was eliminated. In any case, the concept of the
competing risk model is that a person or a device is subject to failure as a
result of the action of multiple failure (or deterioration or disease) processes
that are competing to be the cause of failure (death). In the case of human
disease processes, it is inappropriate to treat the various causes as indepen-
dent so the models are often quite complicated. This may be the reason that
the application of the competing risk models to equipment reliability has
been fairly limited.
The basic approach to constructing a competing risk model is to treat the
various risks as components of a series system. The system fails at the time
of the earliest component failure, which is the time that the first of the com-
peting risks reaches completion. Thus, if a device is subject to failure from K
causes, Ck, k = 1, …, K, and if the time at which cause Ck reaches completion is
tk, then the life length of the device is

T = min{tk }
k

The analysis of the competing risk model starts with a consideration


of the “net” probabilities that are essentially the marginal probabilities
for each risk—the life distribution assuming other risks are not present.
Assuming there exists a joint life distribution, the associated joint survi-
vor function is

FT1, T2 ,…,TK (t1 , t2 , … , tK ) = Pr[T1 ³ t1 , T2 ³ t2 , … , TK ³ tK ]


¥ ¥

ò ò
= ⋯ fT1 ,T2 ,…,TK (t1 , t2 , … , tK )dt1 ⋯ dtK
t1 tK
(5.28)
Failure Processes 81

so the marginal (net) survival function is

FTj (t j ) = FT1 ,T2 ,…,TK (0, 0,…, 0, t j , 0,…, 0) (5.29)

and the “net” life distribution may be stated as

FTj (t j ) = 1 - FT1 ,T2 ,…,TK (0, 0,…, 0, t j , 0,…, 0) (5.30)

This expression is conceptually informative but is not useful as failure, when


it occurs, may not be caused by risk j.
To analyze the failure behavior of a device that is subject to competing
risks, one needs the “crude” probabilities of failure, which are the probabili-
ties of failure from a specific risk when the other risks are present. Since the
device reliability function is

FT1 ,T2 ,…,TK (t1 , t2 ,…, tK ) = Pr[T1 > t1 , T2 > t2 ,…, TK > tK ] (5.31)

the probability that failure is caused by risk j is


¥

p j = Pr[ J = j] = -
ò0
¶t j
FT1 ,T2 ,…,TK (t1 , t2 , … , tK )
t1 = u ,…, tK = u
du (5.32)

The survivor function for the crude life distribution for each risk is

Pr[T > t , J = j] = Pr[Tj > t , Tj < Ti "i ¹ j]

¥æ¥ ¥ ö
= ç ⋯ fT1 ,T2 ,…,TK (t1 , t2 , … , tK )dt1dt2 ⋯ dt j -1dt j +1 ⋯ dtK ÷ dt j (5.33)
òò ò
çç
t è tj
÷÷
tj ø

so the overall survivor function is

K
FT (t) = Pr[T > t] = å Pr[T > t, J = j]
j =1
(5.34)

and at t = 0, each term in Equation 5.34 equals πj because Pr[T ≥ 0, J = j] =


Pr[J = j]. Using these relationships, the survivor function for the crude life
distribution is

Pr[T ³ t , J = j]
FT|j (t) = Pr[T ³ t| J = j] = (5.35)
pj
82 Reliability Engineering: Probabilistic Models and Maintenance Methods

Observe that the described analysis applies regardless of whether or not the
various risks are independent. Of course, the analysis depends upon having
an initial multivariate model of life length as a function of the various risks.
This presents a problem as it is difficult to justify any particular multivariate
model prior to observing failures and when failures are observed, they
correspond to observations of crude failures rather than net failures.
On the other hand, if the risks are independent, then the joint life
distribution is the product of the marginal life distributions and the model
analysis is quite manageable. For example, suppose a device is subject to
two failure risks for which the marginal survival probability functions are
) . Then F (t , t ) = e -2.4t1 -(t2 0.25) so
1.4
(
1.4
-2.4 t - t
FT1 (t) = e and FT2 (t) = e 0.25
T1 ,T2 1 2

¥ ¥
-2.4 u - ( u
¶ 0.25 )
1.4

p1 = -
ò
¶t1
0
FT1 ,T2 (t1 , t2 )
t1 = t2 = u
du = 2.4e
ò
0
du = 0.3808

Necessarily, π2 = 0.6192.
Next, to compute survival probabilities conditioned on the cause of failure,
use the following equation:

¥æ¥ ö

t è t1
òò
Pr[T > t , J = 1] = ç fT1 ,T2 (t1 , t2 )dt2 ÷ dt1
ç ÷
ø

to obtain Pr[T > 0.025, J = 1] = 0.323 and Pr[T > 0.025, J = 2] = 0.582 so that the
survival functions for the crude life distributions have

Pr[T ³ 0.025, J = 1]
FT|1(0.025) = Pr[T ³ 0.025| J = 1] = = 0.848
p1

Pr[T ³ 0.025, J = 2]
FT|2 (0.025) = Pr[T ³ 0.025| J = 2] = = 0.940
p2

Finally, note that when the risks are independent, the life distributions on
the net life lengths can be computed using

pi fT|i (t) pi fT|i (t)


FTi (ti ) = = (5.36)
å å
K K
p j Pr[Tj > t| J = j] p j FT|j (t)
j =1 j =1

The appeal of the competing risk model is that many types of equipment
really are subject to multiple competing risks. A further advantage of the
model is that the combination of risks, even when each risk has a monotone
Failure Processes 83

hazard function, often yields a “crude” failure distribution that has a bath-
tub shape. Thus, the competing risk model provides a means to combine
failure distributions to yield conceptually appealing model behavior in the
overall hazard function.
Note finally that like the proportional hazards model, the competing
risk model applies to both mechanical and electronic devices and may be
employed with any of the life distribution models. Thus, it is a model form
with wide applicability.

Exercises
5.1 Assume that a component has a Weibull strength distribution with
β = 2.0 and θ = 1267 kN/cm2 and that it is subject to a normal stress dis-
tribution having μ = 105.5 kN/cm2 and σ = 1.76 kN/cm2 where in both
cases the distributions are time invariant. Compute the reliability of
the component.
5.2 Suppose that the component of Problem 5.1 has an increasing stress
distribution in which mean and variance change according to the equa-
tions μY(t) = 105.5 + 0.014t and σY(t) = 1.76 + 0.005t. Compute the compo-
nent reliability at 2,000, 4,000, 6,000, and 10,000 hours.
5.3 Suppose a component has a normal strength distribution that declines
with use so that the mean strength is μX(t) = 400−0.01t and the standard
deviation in strength increases according to σX(t) = 15 + 0.004t. If this
component is subject to normally distributed stress with μY = 375 and
σY = 16, compute the device reliability at 1500 hours.
5.4 Suppose the stress distribution for the component of Problem 5.3 is
increasing with μY(t) = 375 + 0.008t and σY(t) = 16 + 0.002t. Compute the
device reliability at 1500 hours.
5.5 Solve the cumulative damage model of Equation 5.6 for the 400 hour
reliability for a component having a strength threshold L = 60 and sub-
jected to shocks that impart a normally distributed damage with mean
μ = 17.5 and standard deviation σ = 1.5. Assume shocks occur according
to a Poisson process with rate λ = 0.01/h.
5.6 For the Birnbaum–Saunders life distribution of Equation 5.13, compute
the reliability function value at 40 and 60 hours for the case in which
L = 75.0, μ = 1.5, and σ = 2.5. Then plot the reliability function over the
range [0,100].
5.7 Using the Arrhenius reaction rate model of Equation 5.14 and assuming
a Weibull life distribution, compute the component reliability at 10,000,
25,000, and 50,000 hours when T = 55°C, Ea = 0.80, η = 1.5 × 1012, β = 0.75,
84 Reliability Engineering: Probabilistic Models and Maintenance Methods

and θ = 40,000 hours. Then plot the reliability at 25,000 hours versus the
activation energy for 0.5 ≤ Ea ≤ 1.2.
5.8 Using the proportional hazards model, plot the baseline and overall
hazard functions and the reliability function for a component having a
Weibull life distribution with β = 1.75 and θ = 2500 hours and ψ(x) = 1.4.
5.9 Consider a competing risk model in which the number of risks is 2 and
the joint density on life length is

fT (t) = fT1 ,T2 (t1 , t2 ) = 3e -t1 -3t2

Determine the probabilities of failure due to each cause, the net failure
probability functions, and the survival functions for each of the crude
life length distributions.
6
Age Acceleration

Models of failure processes and the effects of environmental conditions


on failure behavior are often used as a basis for defining age acceleration
regimes. As the term implies, these are methods for causing a device to age
more rapidly than normal. The basic principle is that altering the operat-
ing environment so that it is harsher than normal operating conditions will
increase the rate at which the device ages. That is, applying “stress” to the
device will cause it to age more rapidly. The parallel idea of increased aging
in humans due to excessive stress is considered representative.
Age acceleration of components or systems is generally used for one of
two purposes—accelerated life testing or stress screening. In both cases, the
environmental models are the same.
As the term implies, accelerated life testing is the use of age acceleration
to study device longevity. The focus is on testing and the support of prod-
uct design and development. Estimates of life distribution parameter values
or at least tolerance bounds on reliability may be obtained. Unsatisfactory
reliability performance may be recognized or sensitivity to specific envi-
ronmental factors may be investigated. The idea in accelerated life testing
is that the time needed to test a new component design in order to verify
its reliability can be excessive. In fact, many current durable good products
such as computers and automobiles have components for which typical life
lengths are measured in years. In principle, testing a new component design
to verify that it is sufficiently reliable should involve observation of a sample
of the devices operating until at least some of them fail. In the absence of an
age acceleration method, the design verification effort would be unreason-
ably long. Age acceleration is used to reduce the test duration.
In contrast to accelerated life testing, stress screening is the use of age
acceleration to age a device population beyond the early-life phase of a prod-
uct population so that the units put into customer use have reduced and pre-
sumably low hazard. It is a means to insulate the customers of a fixed product
design from initially high hazard rates of DFR devices. This should reduce
the level of expense for warranty support and also assure a favorable prod-
uct reputation. In fact, optimization models have been defined for selecting
the most cost-effective stress regimen. In these models, the costs of apply-
ing the environmental stress to the devices and the costs of lost new prod-
ucts are balanced against the avoidance of the costs of failures that would
be experienced as a result of failures during customer use of the product.

85
86 Reliability Engineering: Probabilistic Models and Maintenance Methods

The  solutions obtained indicate that for many types of electronic devices,
stress screening is a worthwhile strategy for managing device reliability.
Here again, there is an interesting contrast between mechanical and elec-
tronic devices. Experience has shown that many electrical and electronic
devices display high and decreasing hazard during early life, so stress
screening can be economically sensible. On the other hand, most mechanical
components display increasing hazard from the start of life. Consequently,
stress screening is rarely used for mechanical components and models of its
use have not been constructed.
For both applications, the operating environment of a device is modified
within the limits that the device should be able to tolerate. That is, environ-
ment is manipulated in a manner that the aging process remains unchanged
except that it occurs more rapidly. It is important to maintain the nature of

0.0007

0.0006

0.0005

0.0004
Pr

0.0003

0.0002

0.0001

500 1000 1500 2000


(a) Time

0.8

0.6
Pr

0.4

0.2

500 1000 1500 2000


(b) Time

FIGURE 6.1
The (a) density functions and the (b) distribution functions with and without age acceleration.
Age Acceleration 87

the failure processes so that the observed aging behavior is consistent with
actual use. When the age acceleration is done properly, it is usually believed
that a proportional hazards type of hazard rate enhancement is obtained,
so the shape of the distribution is maintained. Thus, age acceleration yields
only timescale compression and

FTa (t) = FT ( at) (6.1)

where a is the acceleration factor, so Equation 6.1 states that the life distribu-
tion under enhanced stress is the same as the life distribution under nominal
operating conditions but evaluated for a compressed timescale.
Consider the example of a component having a Weibull life distribution
with β = 1.5 and θ = 1000 hours. If the application of a stress to the component
population results in an age acceleration of a = 10, the density distribution
functions are compressed as shown in Figure 6.1. Representative values of the
distribution functions are FTa (50) = FT (500) = 0.298, FTa (100) = FT (1000) = 0.632,
and FTa (200) = FT (2000) = 0.941. Note the compression in the range of
dispersion. A key point is that the shape of the distribution is unchanged
and this reflects the fact that the basic failure process is unchanged.

6.1 Age Acceleration for Electronic Devices


The models used to represent age acceleration are essentially those for the
effects of operating environment presented in the preceding chapter. In the
case of electronic components, the application is direct. Over the past 50 or
60  years, age acceleration has been performed regularly on electrical and
electronic components. It may be the particular failure processes to which
electrical devices are subject or it may be their relatively short lives, but sub-
stantial evidence of the effectiveness of age acceleration has been accumu-
lated. In fact, the extensive experience in the application of the Arrhenius,
Eyring, and power law models has led to general acceptance of their accuracy.
In the case of the Arrhenius model, the age acceleration factor a is com-
puted as the ratio of the reaction rate under enhanced stress to that for the
nominal conditions. Thus, if T0 is the nominal operating temperature and
Ta is the stress temperature, the age acceleration is

Ea æ 1 1 ö
r(Ta ) ç - ÷
a= = e K è T0 Ta ø (6.2)
r(T0 )

As an example, consider a microelectronic device that is normally operated


at a temperature of T0 = 55°C = 328 K. If the device has an activation energy
88 Reliability Engineering: Probabilistic Models and Maintenance Methods

of Ea = 0.8 and is tested at Ta = 95°C = 368 K, it will age at a rate a = 21.6 times
faster than under normal operating conditions. That is, for each hour of oper-
ation at 95°, the device gains 21.6 hours of age.
An interesting extension to this model was studied by Nachlas [46] in
response to evidence presented by Wong [47] that for CMOS devices, the activa-
tion energy is often temperature dependent. Recognizing that performing ther-
mal age acceleration implies heating and cooling intervals, Nachlas defined a
model in which the net age acceleration is obtained as a function of the heating
and cooling rate and of the soak temperature. The basic form of that model is
Ea ( T0 )
D E ( T ( t ))
e KT0 - a
a=
D òe
0
KT ( t )
dt (6.3)

where D is the test duration including heating and cooling intervals, the
function T(t) describes the temperature profile over time, and the function
Ea(T) depends upon the device identity and may be selected on the basis
of the findings of Jensen and Wong. A representative temperature profile is
shown in Figure 6.2.
The cycle represented shows heating and cooling intervals of 200 minutes,
each with a dwell (or soak) period of 800 minutes. Representing the tempera-
ture profile by
ì æ t ö
2.5

ï 294 + 74 ç ÷ 0 £ t £ 200
ï è 200 ø
ï
Ta (t) = í368 200 £ t £ 1000
ï 2.5
ï 368 - 74 æ (t - 1000) ö 1000 £ t £ 1200
ïî ç ÷
è 200 ø

0.8

0.6
Temp

0.4

0.2

200 400 600 800 1000 1200


Time

FIGURE 6.2
Representative thermal cycle for age acceleration.
Age Acceleration 89

and, on the basis of the Jensen and Wong paper, taking the activation energy
function as

Ea (Ta ) = 1.843e -0.00264Ta

numerical integration indicates an acceleration factor of a = 167.5.


For the Eyring model, assume that the two temperatures are the same as
in the earlier example and that the nominal and stress voltages are V0 = 5.0
millivolts and Va = 8.0 millivolts, respectively. The general acceleration factor
equation is
Ea æ 1 1 ö g 2 æ Va V0 ö
r(Ta , Va ) ç - ÷ g V -V ç - ÷
a= = e K è T0 Ta ø e 1 ( a 0 )e K è Ta T0 ø (6.4)
r(T0 , V0 )

so taking Ea = 0.8, γ1 = 0.2, and γ2 = 0.008, the age acceleration is a = 72.0.


Note that some types of components may not experience a synergistic
effect of the combined stresses. In this case, γ2 = 0 and the stresses are inde-
pendent. They, nevertheless, both contribute to the acceleration in aging.
The (inverse) power law model is defined in terms of the expected
life length. The age acceleration factor is therefore the ratio of the expected life
under nominal conditions to the expected life length under increased stress.
The expression is
p
E[T |V0 ] c/V0p æ Va ö
a= = = (6.5)
E[T |Va ] c/Vap çè V0 ÷ø

In the case of the microelectronic device of the previous example, taking


p = 2 implies an age acceleration of a = 2.56.
The two final points related to age acceleration of electronic devices are (1) that
thermal stress is generally considered to be the most effective environmental
modification for age acceleration and (2) that Equations 6.1 and 6.2 each imply
that the device hazard function may be represented in a proportional hazards
format in which the temperature effect is a time-independent coefficient.
Note further that the evolution of modern computers, especially the laptop
models, has been dominated by the development of more efficient and com-
pact cooling systems.

6.2 Age Acceleration for Mechanical Devices


In contrast to the case of the Arrhenius model for electronic devices, no gen-
eral model has been defined for age acceleration of mechanical components.
In fact, it is generally considered that for each mechanical device, an age
90 Reliability Engineering: Probabilistic Models and Maintenance Methods

acceleration strategy, if there is one, must be identified specifically. Usually,


this means evaluating the forces that drive the deterioration process and
seeking a way to intensify those forces.
It is important to note that “stress” testing of mechanical devices is often
misrepresented. For example, the hinge on the door of an automobile is typi-
cally used a few (<20) times per day. In advertising the durability of their
automobiles, some firms show a door being opened and closed continuously
and imply that this is a form of testing routinely used to verify reliability.
While this demonstration may be impressive and while the time until the
hinge fails may be reduced, the increase in the frequency of use of the hinge
does not constitute age acceleration. The forces imposed on the hinge dur-
ing each open–close cycle are not changed, and thus the distribution on the
number of cycles to failure is not compressed. The rate of actuation (or use) is
increased but aging is not accelerated. In the case of the hinge, age accelera-
tion might be accomplished by increasing the weight of the door.
Another example of a test that is presented to the public is one in which
an automotive tire is spun continuously and at relatively high speed on a
test fixture. Here again, the usual assertion is that this test format accelerates
aging and demonstrates the reliability of the product. The frequency of actu-
ation is being used to reduce test time, but the distribution on the number
of kilometers to failure is probably not being altered by the continuous use.
It is true that increased turning speed might imply some age acceleration
and continuous use may imply heat accumulation that yields acceleration.
These are worth studying, but the representation to the consumer public is
misleading and inaccurate.
The guiding principle for age acceleration is therefore that the operating
conditions experienced by a device must be modified in a manner that results
in a compression in scale of the life distribution. Vassiliou and Mettas [48]
provide a simple but very informative example of this idea. They describe the
“testing” of a paper clip by bending the (small) inner loop of the clip out of
the plane that contains the entire unit. They consider that bending the inner
loop out of the plane and back into position constitutes a single functional
cycle, and they define a “test” in which the usual angle of displacement of
45° is increased to the “stress” condition of 90°. An additional “test” at 180°
is also possible. This paper clip example serves well to illustrate the fact that
it is the modification of the degree to which the forces that drive failure that
provides age acceleration. For mechanical devices, the failure processes must
be determined and the driving forces must be identified. Then a means to
intensify those forces must be found. This process is usually specific to the
component design.
The models presented in Chapter 5 do provide a format for the study of
age acceleration of mechanical devices. In the case of the cumulative damage
model of Equation 5.6, the frequency of activation effects can be represented
easily by adjusting the rate parameter, λ, of the Poisson process for the arrival
of shocks. While this does not comprise age acceleration, the adjustment
Age Acceleration 91

permits the modeling of the revised operating profile. More useful models
are obtained if the strength threshold parameter, L, is expressed as a func-
tion of the quantities that define operating conditions.
In a similar manner, both the stress–strength and proportional hazards
models offer the potential to represent age acceleration. In the case of the
stress–strength models, representing the parameters of the strength distribu-
tion in terms of operating conditions may provide a useful model. Normally,
the proportional hazards models are based on variables (covariates) that repre-
sent the operating conditions, so the representation of age acceleration should
be direct.
Recently, Cassady and Nachlas [49] suggested a general model that can be
used to capture the effects of service loads and device vintage as well as those
of the ambient operating environment. They assume that a device is subjected
to operating conditions that can be captured using two sets of variables: n
measures of intensity of use (e.g., speed, vibration) and m measures of ambient
operating conditions (e.g., temperature, relative humidity). They let Xi(t) denote
the value of intensity of use measure i at time t and X(t) = {X1(t), X 2 (t), … , X n (t)}.
Also, X0i denotes the nominal value of intensity of use measure i. Then, they let
Yj(t) denote the value of ambient operating conditions measure j at time t with
Y(t) = {Y1(t), Y2 (t), … , Ym (t)} where Y0j denotes the nominal value of ambient
operating conditions measure j. These quantities are used to obtain a measure
of the “equivalent age” of the device over time.
The unit’s operating conditions govern its aging. Let αk(t) denote the equiv-
alent age of a unit of vintage k at time t. The instantaneous rate of aging at
time t may be represented by

d qk ( t )
a¢k ( t ) = ak (t ) = ( ak0 ) (6.6)
dt

where αk0 is a vintage-dependent parameter (k is an index used to represent


vintage)
n m
qk ( t ) = å dki ( Xi ( t ) ) + åg kj (Y (t )) + e
j k (6.7)
i =1 j =1

The functions {δk1, δk2, …, δkn} and {γk1, γk2, …, γkm} are vintage dependent and
defined so that dki ( X0 i ) = 0, i = 1, … , n and g kj ( Y0 j ) = 0, j = 1, … , m .
Note that ek is a vintage-dependent parameter. In addition, note that they
assume the effects of individual measures of intensity of use and ambient
operating conditions on equipment aging are independent. Therefore,
t

ò
a k ( t ) = a¢k ( t ) dt
0
(6.8)
92 Reliability Engineering: Probabilistic Models and Maintenance Methods

The complexity of this function depends entirely upon the complexity of the
functions {δk1, δk2, …, δkn} and {γk1, γk2, …, γkm}. Then, if Θ denotes the equiva-
lent age of the unit at failure, then the cumulative distribution function on
failure age is FQ ( q ) = Pr ( Q £ q ), and the transformation Tk = a -k 1 ( Q ) yields the
clock time to failure for items of vintage k. This time value may then be used
for the life distribution and reliability functions. The appeal of this model
is that it captures acceleration due to operating intensity as well as environ-
ment and permits a distinction between operating time and clock time.
Finally, note that the diffusion process models have a “killing function”
that represents the process of deterioration. Here again, if the model that
comprises the “killing function” is defined (as would be expected) in terms
of the forces that drive failure, the representation of age acceleration should
again be direct.

6.3 Step Stress Strategies


The effectiveness of age acceleration in testing the reliability of new device
designs has led to the development of numerous stress application strategies.
One that has been found to be particularly useful is the step stress strategy.
Under this plan, the key environmental stress variable, which is often tem-
perature, is applied at successively higher levels. Thus, surviving devices are
subjected to an increasingly aggressive environment. The failure data that
are obtained in this manner will often support both the construction of a life
distribution model for the device population and the estimation of failure
model parameters such as the activation energy.
The number of levels at which the stress is applied depends upon the
device being tested, the capability of the equipment used for stress applica-
tion, and the extent of the information sought. Thus, the environmental pro-
file is specific to each application. An example of a step stress strategy would
be to test a sample of electronic devices having a constant activation energy
of Ea = 0.70 at, say, 95°C for 80 hours and to then increase the temperature to
110°C for an additional 40 hours. Under this regime, the devices that survive
the entire test will experience an average age acceleration over the 120 hours
of a = 21.47. A more informative measure is that devices that survive the test
will have a net equivalent age of 2577 hours. The average age acceleration
factor is computed as

a=
(80e 0.7 æ 1
ç -
1 ö
÷
0.00008623 è 328 368 ø
+ 40e
0.7 æ 1
ç -
1 ö
÷
0.00008623 è 328 383 ø ) = (80(14.73) + 40(34.96)) = 21.47
120 120
Age Acceleration 93

Of course, the failure data obtained during the test should provide consider-
able information about the failure characteristics of the device population.
One further example of a step stress test is provided by the paper clips.
Suppose the stress regimen is to bend each member of a sample set of clips
20 cycles of 45° displacement followed by 20 cycles of 90° displacement and
further followed by 20 cycles of 180° displacement. Clearly, the successive
stress levels are increasingly aggressive and should produce increasing
numbers of failures.

6.4 Concluding Comment


In recent years, the use of HALT and HASS has increased considerably.
HALT is “highly accelerated life testing” and HASS is “highly accelerated
stress screening.” Both have strong advocates. In both cases, the idea is to
impose very severe environmental stresses upon a device even if those stress
levels exceed the conditions the device is designed to tolerate. The motiva-
tion for these two practices is to obtain very high levels of acceleration for
devices that nominally have very long lives. The fact that additional failure
modes that are not typical of device behavior occur is accepted under the
assumption that those overstress failures will be recognized as such and
conclusions about the device behavior in service will be adjusted appropri-
ately. The use of these methods is still quite controversial, and the soundness
of their use needs to be examined individually for each device to which their
application is proposed.

Exercises
6.1 Consider an electronic component having activation energy Ea = 0.6
and normal operating temperature To = 55°C. Compute and plot the
age acceleration factor for the Arrhenius model that is realized for the
range of temperatures 70°C–110°C.
6.2 Assuming a normal operating temperature of 55°C and an accelerated
temperature of 95°C, compute and plot the acceleration factor defined
by the Arrhenius model for Ea in the range of 0.4–0.9.
6.3 Suppose copies of a device having activation energy of 0.82 eV/K are
subjected to a thermal cycling regime in which the dwell time at 90°C
is 4 hours, heating is linear over 30 minutes, cooling is linear over
90  minutes, and normal operating temperature is 55°C. Compute the
net acceleration per cycle.
94 Reliability Engineering: Probabilistic Models and Maintenance Methods

6.4 Suppose copies of a device for which temperature-dependent life


length is believed to be well represented by a Weibull distribution were
subjected to a step stress accelerated life test with a stress regimen of
([85°C, 40 h], [95°C, 80 h], [105°C, 40 h]). Assuming that the normal oper-
ating temperature for the device is 55°C and that its activation energy is
0.68 eV/K, compute the acceleration achieved during each test interval
as well as the overall acceleration.
6.5 A step stress acceleration regime was used in the test of a new device.
Under the test plan, copies of the device were tested at 90°C for 50 hours
and then at 110°C for an additional 50 hours. The normal operating
temperature for the device is 55°C. The analysis of the test data sug-
gests that during the second interval, the acceleration achieved relative
to the first test interval was 4.35. What is the best estimate of the activa-
tion energy for the device?
7
Nonparametric Statistical Methods

The specific description of actual failure characteristics of components is nat-


urally based on the analysis of observed failure data. Sometimes, failure data
are accumulated during the operation of a population of devices, but more
often, failure data are obtained through controlled testing of a sample of the
devices. In either case, statistical methods are then used to obtain estimates
for the parameters of life distribution models or to determine estimates of
the reliability without considering specific models.
To a large extent, the statistical methods used to analyze reliability data
are the same methods that have been defined for other statistical applica-
tions. In some cases, the methods require tailoring for application to failure
data, but usually the application is direct. As suggested earlier, data may
be analyzed to obtain estimates of the parameters of an assumed life dis-
tribution. The methods for computing estimates of distribution parameters
are called “parametric” statistical methods. A variety of parametric meth-
ods are presented in the next chapter. Data analysis that is directed toward
“model-free” estimates of reliability values and failure process characteris-
tics are called “nonparametric” methods and these methods are the subject
of this chapter.
The chief advantage provided by nonparametric statistical methods for
studying device reliability is that they do not require any assumption con-
cerning the identity or the form of the life distribution. No potentially con-
fining restrictions on behavior need be assumed. Also, for cases in which
the choice of life distribution model is difficult, perhaps because the hazard
function is not monotonic, the use of nonparametric methods permits the
analyst to avoid the issue of model choice altogether. A second and also sig-
nificant advantage of the nonparametric methods is that the quantities esti-
mated are often easier to compute and manipulate than those derived from
parametric methods. One further advantage of the nonparametric methods
is that some permit the use of grouped data sets in which not all of the failure
times are actually observed.
The key disadvantage of the use of nonparametric methods is that the
estimates obtained usually do not support inferences about the identities
of the hazard or distribution functions. In general, the quantities computed
using nonparametric methods contain less information than those obtained
with parametric analysis.

95
96 Reliability Engineering: Probabilistic Models and Maintenance Methods

7.1 Data Set Notation and Censoring


Before examining the specific analytical methods, it is worthwhile to estab-
lish some standard notation and some terminology that is descriptive of the
process of collecting failure data. To define a standard notation, assume that
(unless otherwise specified) the failure data that are to be analyzed have
been obtained in a controlled test of a sample of “n” identical copies of a
device. Assume further that each copy of the device that is tested is indexed
so that its identity is known. We might think of the index as the “test slot”
occupied by the device during the test.
We then take the set of observed failure times to be ti, i = 1, …, n. Thus, ti is
the time at which the component with index “i” fails. Consider an example.
Suppose a sample of n = 50 copies of a device are tested and the (scaled)
failure times are recorded as shown in Table 7.1. Associated with each copy
of the device is an index, and for each indexed device, the time of failure is
recorded.
Now, for many of the methods of analysis of the data, it is necessary that
we reorder it so that the values are increasing. To represent the fact that the
data have been reordered, we rename the values as xj where

x j = téë jùû (7.1)

This expression should be read as xj is the jth smallest failure time observed
and it corresponds to the failure time of the device copy indexed [j], for exam-
ple, in the data set of Table 7.1, x1 = t[1] = t19 = 0.006, x2 = t[2] = t47 = 0.019, and
x50 = t[50] = t3 = 5.292. The completely reordered data set is shown in Table 7.2.

TABLE 7.1
Example of a Failure Data Set
i ti i ti i ti i ti i ti
1 0.883 11 1.555 21 0.129 31 0.829 41 0.894
2 0.875 12 3.503 22 0.455 32 0.548 42 0.336
3 5.292 13 1.541 23 2.008 33 1.016 43 0.129
4 0.038 14 1.218 24 0.783 34 0.223 44 1.373
5 4.631 15 1.285 25 1.803 35 3.354 45 0.613
6 1.690 16 2.190 26 2.505 36 1.559 46 1.272
7 0.615 17 0.720 27 0.465 37 3.785 47 0.019
8 2.877 18 0.056 28 1.494 38 0.599 48 0.068
9 1.943 19 0.006 29 0.795 39 0.090 49 0.658
10 3.106 20 0.279 30 0.299 40 0.026 50 3.085
Nonparametric Statistical Methods 97

TABLE 7.2
Reordered Example of Failure Data
j xj j xj j xj j xj j xj
1 0.006 11 0.279 21 0.720 31 1.285 41 2.190
2 0.019 12 0.299 22 0.783 32 1.373 42 2.505
3 0.026 13 0.336 23 0.795 33 1.494 43 2.887
4 0.038 14 0.455 24 0.829 34 1.541 44 3.085
5 0.056 15 0.465 25 0.875 35 1.555 45 3.106
6 0.068 16 0.548 26 0.883 36 1.559 46 3.354
7 0.090 17 0.599 27 0.894 37 1.690 47 3.503
8 0.129 18 0.613 28 1.016 38 1.803 48 3.785
9 0.129 19 0.615 29 1.218 39 1.943 49 4.631
10 0.223 20 0.658 30 1.272 40 2.008 50 5.292

Now, the statistical methods that we consider will usually be applied to the
values xj.
With our notation defined, we next consider the question of whether or
not a test is run to completion. It is easy to imagine a case in which the time
required to test a complete sample of copies of a component until all have
failed can be excessive. Even with age acceleration, the time required for a
complete test can be unmanageable or infeasible. In order to perform a reli-
ability test within a reasonable length of time, tests are often truncated early.
A decision to use a truncated test should be made before the test is per-
formed. When the test is truncated, we say that the test data obtained are
censored.
There are two basic approaches to test truncation—Type I and Type II.
A test may be truncated at a preselected point in time or after a predeter-
mined number of item failures. If the test is to be terminated after a fixed
time interval, the test duration is known (and therefore limited) in advance,
but the number of device failures that will be observed is a random vari-
able. This is Type I test truncation and the data set that results is said to be a
Type I censored data set. On the other hand, truncation after a fixed number
of failures yields a Type II censored data set in which the number of data
values is known in advance, but the test duration is random. In many cases,
the distinction is not crucial, but for some statistical estimation methods, the
difference between the two types of censoring is important. When a life test
is not truncated, it is said to be a complete test and the data set obtained is
said to be a complete data set.
Now that our notation is established and the basic classes of data sets
that might be used are defined, we may examine the methods of analysis.
To start with, we consider only those cases in which a complete data set is
available.
98 Reliability Engineering: Probabilistic Models and Maintenance Methods

7.2 Estimates Based on Order Statistics


A direct method for treating failure data is to use the number of observed
failures or the number of surviving devices at any time to estimate the failure
or survival probability. It seems intuitively reasonable to define a point esti-
mate for the failure probability at any test time to be the ratio of the number
of observed failures to the number of items on test. Similarly, the survivor
probability can be estimated by the complement of the failure probability.
That is, for any time t,

k
FˆT (t) =
k = { j|x j £ t , x j +1 > t}
n (7.2)
ˆ n-k
FT (t) =
n

where the caret “^” above a quantity indicates an estimate. Now, while these
expressions seem intuitively logical, they can be shown to be based on statis-
tical reasoning and to be one of the two reasonable forms.
Assume an arbitrary time interval, say, (t, t + ΔT), and let pj denote the
probability that the jth device failure observed occurs during that interval.
In principle, any of the failures could occur during that interval so the prob-
ability that it is the jth failure is a probability on the index of the failure time
that happens to be the one in the interval of interest. Next, we ask how it
could occur that the jth failure would fall in the selected interval and the
answer is as follows:

1. Each of the preceding j − 1 failures would have to occur before the


start of the interval and this occurs with probability FT(t) for each
one.
2. Each of the succeeding n − j failures must occur after the end of the
interval and this happens with probability FT (t + Dt) for each one.
3. The jth failure must fall within the interval and this happens with
probability dFT(t).
4. The number of ways the n copies of the device put on test can be
separated into sets of j − 1, 1, and n − j is given by the multinomial
coefficient. Therefore,

n!
( )
n- j
( FT (t)) ( fT (t)dt ) FT (t)
j -1
pj = (7.3)
( j - 1)!(1)!(n - j)!

This is the probability on the number of the failure that occurs during an
arbitrary time interval and in the limit as the length of the interval is reduced;
it is the probability that it is the jth failure that has occurred  by  time  t.
Nonparametric Statistical Methods 99

Thus,  it  is the distribution on the “order statistic,” which is the count of
the number of failures. Properties of distributions on order statistics have
been studied extensively. Among the results that are known and have been
found useful are the facts that the mean of the distribution on the jth order
statistic implies that, as stated in Equation 7.2, an appropriate estimate for
the fraction failed is

j
FˆT (t) = (7.4)
n

while the median of the distribution on the jth order statistic implies that an
appropriate estimate for the fraction failed is

j - 0.3
FˆT (t) = (7.5)
n + 0.4

Although it may seem counterintuitive, both of these estimates are reasonable


and each has advantages. The chief advantage of the estimator based on the
mean (Equation 7.4) is that it is unbiased. That is,

E[Fˆ T (t)] = FT (t)

The estimator based on the median is not unbiased, but it has the advantage
that it does not assign a value of 1.0 to the estimate associated with the nth
failure time. This is often considered important in that one does not expect
that the greatest failure time observed during a test really corresponds to
the maximum achievable life length. In general, either of the estimators may
be used and each is treated at various points in the discussion of statistical
methods presented here.

7.3 Estimates and Confidence Intervals


The estimators defined in Equations 7.4 and 7.5 are called point estimates
because they yield scalar values. From both a statistical and a design per-
spective, an interval estimate is usually more informative as the point esti-
mate is unlikely to exactly match the actual value of the estimated quantity.
A confidence interval is a range for which there is an arbitrarily selected
probability that the interval contains the true value of the estimated quan-
tity. The construction of confidence intervals starts with the estimators and
is described in the following.
It is also the case that the estimators defined in Equations 7.4 and 7.5 may
be viewed and analyzed in two related but rather different ways. Specifically,
100 Reliability Engineering: Probabilistic Models and Maintenance Methods

one may base the computation of the estimate of the reliability (or failure
probability) on the number of survivors (or observed failures) or on the pro-
portion of the test items that have survived (or failed). Consider first the use
of the number of survivors.
The number of survivors at the end of a fixed time interval is a random
variable for which the dispersion is best modeled using the binomial dis-
tribution. That is, survival of each copy of the component may be viewed as
a Bernoulli trial so that the quantity n − j has a binomial distribution with
success probability FT (t). In this case,

E[n - j] = nFT (t) and Var[n - j] = nFT (t) 1 - FT (t) ( ) (7.6)

so the corresponding moments for the reliability estimates are

én- jù

é n - j ù FT (t) 1 - FT (t)
= FT (t) and Var ê =
( ) (7.7)
ú
ë n û ë n úû n

In general, when the sample size is large (or at least not too small) and the
success probability is not too close to 0 or to 1, a confidence interval for the
binomial parameter may be computed using the standard normal distribu-
tion. For an arbitrarily selected confidence level of 100(1 − α)%, a confidence
interval of

ˆ
FT (t) + za/2
ˆ
( ˆ
FT (t) 1 - FT (t) ) £ F (t) £ Fˆ (t) + z
T T 1-a/2
ˆ
( ˆ
FT (t) 1 - FT (t) ) (7.8)
n n

is valid asymptotically. Note that a table of standard normal probabilities is


provided in Appendix A along with numerical approximation strategies for
computing normal probabilities.
Consider an example. For the ordered data set provided in Table 7.2, the
point estimates of the failure probability and of the reliability at t = 0.5 time
units are
j 15 ˆ
FˆT (0.5) = = = 0.30 FT (0.5) = 1 - FˆT (0.5) = 0.70
n 50

j - 0.3 14.7 ˆ
FˆT (0.5) = = = 0.292 FT (0.5) = 1 - FˆT (0.5) = 0.708
n + 0.4 50.4

Taking α = 0.05 so that za/2 = -1.96, the first of the estimates provides the
confidence interval:

0.573 £ FT (0.5) £ 0.827


Nonparametric Statistical Methods 101

and the second one yields

0.582 £ FT (0.5) £ 0.834

A reasonably comparable but more involved estimate and confidence inter-


val may be defined using the proportion of test items that have failed by any
time. In general, the fraction failed, say, u = FT(t), is best represented by the
beta distribution for which the density function is
G(h + d + 2)
fU (u, h, d) = uh (1 - u)d du (7.9)
G(h + 1)G(d + 1)

In fact, this beta distribution form also follows directly from the distribution
on the order statistics stated in Equation 7.3. With the definition u = FT(t) and
the parameter identities of η = j − 1 and δ = n − j, replacing f T(t)dt by du yields
Equation 7.9. Keep in mind that the proportion u must be in the interval (0, 1).
The point estimate of Equation 7.4 is an appropriate estimate for the frac-
tion failed by time t so a confidence bound on that estimate is obtained using
the quantiles of the beta distribution. For integer values of the parameters,
the distribution function for the beta distribution can be obtained by succes-
sive integration by parts and is
h
G(h + d + 2)
FU (u, h, d) = 1 - å G(k + 1)G(h + d - k + 2) u (1 - u)
k =0
k h+ d - k +1
(7.10)

Then, a 100(1 − α)% confidence interval on the failure probability at the time
of the jth observed failure is

ulower £ FT (t) £ uupper (7.11)

where ulower is the solution to the equation


a
FU (ulower , j - 1, n - j) = (7.12)
2
and uupper is the solution to the equation
a
FU (uupper , j - 1, n - j) = 1 - (7.13)
2

Clearly, these confidence limits must be computed numerically, but the effort
required to do this is not great. For the same example case as the one treated
earlier in which n = 50 and j = 15 at t = 0.5, the point estimate for FT(0.5) is still
j 15
FˆT (0.5) = = = 0.30
n 50
102 Reliability Engineering: Probabilistic Models and Maintenance Methods

The corresponding 95% confidence interval is

uupper = FU-1(0.975, j - 1 = 14, n - j = 35) = 0.418

ulower = FU-1(0.025, j - 1 = 14, n - j = 35) = 0.170

0.170 £ FT (0.5) £ 0.418

and for the reliability, this interval corresponds to

0.582 £ FT (0.5) £ 0.830

It is appropriate to reiterate the point that these estimates and confidence


intervals do not depend upon any assumed form for the life distribution.
They apply to any set of test data provided the data set is complete.

7.4 Kaplan–Meier Estimates


The estimates based on order statistics can be applied directly and without
modification to censored data sets provided the censoring is implemented as
described earlier—Type I or Type II. This can be rather restrictive as the con-
trol of observed data can sometimes be less effective. A method that allows
for a more general censoring pattern was defined by Kaplan and Meier [50]
and is thus referred to as the Kaplan–Meier estimate. The more general cen-
soring pattern is that individual copies of a device on test may be removed
during the test and this also allows for the possibility that the information
for some units is simply lost. In addition, the estimation procedure is defined
for the case in which device status is observed periodically as well as when
it is observed continuously.

7.4.1 Continuous Monitoring of Test Unit Status


Assume that n copies of a device are placed on test and that their failure
behavior is then observed until τk, the time of the kth failure (or removal).
Associate with each of the values τi another variable, say, wi, which takes the
value 1 if τi is a failure time and 0 if it is a removal time. Note that during the
interval between τi−1 and τi, there are ni = n − (i − 1) = n − i + 1 units operating,
and at the end of the interval, one of those units fails or is removed from the
test. Thus, for devices that are functioning at time τi−1, the conditional prob-
ability of survival at time τi is pi where

pi = Pr[T > ti |T > ti -1 ] (7.14)


Nonparametric Statistical Methods 103

and this probability may be estimated by

ni - wi
p̂i = (7.15)
ni

The Kaplan–Meier estimator for the device survivor function at any time is
defined as

ˆ n - i + 1 - wi
FT (t) = Õ
i|ti £ t
n-i+1
(7.16)

and given the definition of the wi, this is equivalent to

wi
æ n-i ö
ˆ
FT (t) = Õ Õ
i|ti £ t
pˆ i =
i|ti £ t
ç ÷
è n-i+1ø
(7.17)

Because the estimator is equivalent to the product of the successive interval


conditional survival probabilities, it is sometimes referred to as the product
limit estimator. Consider an example.
Suppose that a sample of 25 copies of a device were placed in operation
and resulted in the observations listed in Table 7.3 where the asterisks
indicate that the times are censoring or data loss times. Implementation
of Equation 7.17 yields the estimates shown in the right-hand column of
Table 7.4. Observe that the data set could have been truncated at any time
and the estimates of the survivor function shown in Table 7.4 would be
the same.
Now, as indicated earlier, a confidence interval on an estimate is preferred
to a simple point estimate. In order to compute a 100(1 – α)% confidence inter-
val for any of the survival probability estimates, the variance of the estimate
is required. The variance of the estimates is given by Greenwood’s formula
[51] and is

( ) ( ) å n (nw- w )
2 i
 FˆT ( xi ) = FˆT ( xi )
Var
j
(7.18)
j =1 j j j

Using this expression, confidence intervals may be defined on the estimates


in the same manner as is used in Equation 7.8. Specifically,

ˆ
( )
 FˆT ( xi ) £ FT ( xi ) £ FˆT ( xi ) + z1-a/2 Var
FT ( xi ) - za/2 Var  FˆT ( xi )
( ) (7.19)
104 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 7.3
Data for Kaplan–Meier Estimation Example
1.20 10.93 16.49* 20.47 43.21
3.27 11.73 18.13 25.35 47.10
3.46 13.75* 18.16 31.43 54.95
3.57* 14.14 18.74* 31.66 60.43*
7.19 16.09 18.99 33.54* 80.40
Note: * Data loss time.

TABLE 7.4
Calculation of Kaplan–Meier Estimates using
Example Data for the Survivor Function 
ˆ
i xi wi p̂i FT ( xi )

0 1.0 1.000
1 1.20 1 0.960 0.960
2 3.27 1 0.958 0.920
3 3.46 1 0.957 0.088
4 3.57 0 1 0.088
5 7.19 1 0.952 0.838
6 10.93 1 0.950 0.796
7 11.73 1 0.947 0.754
8 13.75 0 1 0.754
9 14.14 1 0.941 0.710
10 16.09 1 0.938 0.666
11 16.49 0 1 0.666
12 18.13 1 0.929 0.618
13 18.16 1 0.923 0.570
14 18.74 0 1 0.570
15 18.99 1 0.909 0.518
16 20.47 1 0.900 0.467
17 25.35 1 0.889 0.415
18 31.43 1 0.875 0.363
19 31.66 1 0.857 0.311
20 33.54 0 1 0.311
21 43.21 1 0.800 0.249
22 47.10 1 0.750 0.187
23 54.95 1 0.667 0.124
24 60.43 0 1 0.124
25 80.40 1 0 0
Nonparametric Statistical Methods 105

ˆ
Applying this expression to the estimate FT ( x7 = 11.73) yields

( ) ( ) å n (nw- w )
2 7
 FˆT (11.73) = FˆT (11.73)
Var
j

j =1 i j j

2æ 1 1 1 1 1 1 ö
= ( 0.754 ) ç + + + + + ÷
è 25(24) 24(23) 23(22) 21(20) 20(19) 19(18) ø
= 0.0076

0.583 = 0.754 - 1.96 0.0076 £ FT (11.73) £ 0.754 + 1.96 0.0076 = 0.925

While it is true that the bounds are relatively wide, they are based on a very
general censoring pattern and thus permit estimation using even very lim-
ited data sets.

7.4.2 Periodic Monitoring of Test Unit Status


There are situations in which a sample of copies of a device are put on test
and the test specimens cannot be or simply are not monitored continuously.
Instead, the test specimens are examined periodically to see how many are
still surviving and the data recorded are the number of failed devices. In the
most general case, the observation times are also considered to provide an
opportunity to remove some specimens from the test. Thus, the analysis of
this case is labeled “group censoring.” In addition, while it is most common
to use equal length intervals, the general construction can be based on inter-
vals of unequal lengths.
The Kaplan–Meier estimation scheme can be modified to handle the group
censoring (or interval-based) data analysis. To allow for the general case,
assume that a total of n copies of a device are placed on test (or in opera-
tion) and that the status of the specimens is inspected at each of the times
(τ1,τ2,…,τk) where the time between any two adjacent inspection times need
not necessarily equal that for any of the other intervals. We let

ni be the number of functioning copies of the device at the beginning of


the ith interval (i.e., at time τi−1)
di be the number of copies of the device that fail during the ith interval
(i.e., between time τi−1 and time τi)
mi be the number of copies of the device that are removed from the test
at time τi−1, so observation for those devices is lost during the ith
(and subsequent) intervals
106 Reliability Engineering: Probabilistic Models and Maintenance Methods

Using these definitions, the number of units tested during the ith interval is
i -1
ni = n - å (d + m )
j =1
j j (7.20)

and the conditional probability of failure during the ith interval may be
estimated by

d d
pˆ i = i = i
(7.21)
å
i -1
ni n - (d j + m j )
j =1

The survivor function at any inspection time τi may be estimated as

i i æ ö
ç ÷
Õ Õ
ˆ dj
FT (ti ) = (1 - pˆ j ) = ç1- ÷ (7.22)
å
j -1
j =1 j =1 ç n- (dl + ml ) ÷
è l =1 ø

The variance of this survivor function estimate may also be estimated using
Greenwood’s formula [51]:

( ) ( ) å n (1pˆ- pˆ )
2 i
 FˆT (ti ) = FˆT (ti )
Var
j
(7.23)
j =1 j j

Once we have estimates of the reliability and the variance of the estimates,
the normal distribution confidence intervals described in Section 7.3 may be
applied directly.
Consider an example. Meeker and Escobar [52] present heat exchanger
tube failure data in which n = 300, m = (99, 95), and d = (4, 5, 2). The three
intervals are each 1 year long. Using these data, n1 = 300, n2 = 197, and n3 = 97.
Therefore,
4 5 2
pˆ 1 = = 0.0133 pˆ 2 = = 0.0254 pˆ 3 = = 0.0253
300 197 97

Then,
ˆ  FˆT (t1 ) = 4.385 ´ 10 -5
FT (t1 ) = 1 - pˆ 1 = 0.9867 and Var ( )
ˆ  FˆT (t2 ) = 1.578 ´ 10 -4
FT (t2 ) = (1 - pˆ 1 )(1 - pˆ 2 ) = 0.9616 and Var ( )
ˆ  FˆT (t3 ) = 3.438 ´ 10 -4
FT (t3 ) = (1 - pˆ 1 )(1 - pˆ 2 )(1 - pˆ 3 ) = 0.9418 and Var ( )
Nonparametric Statistical Methods 107

Finally, the 95% confidence intervals for the survivor function values are

0.9737 £ FT (t1 ) £ 0.9996

0.9370 £ FT (t2 ) £ 0.9862

0.9054 £ FT (t3 ) £ 0.9781

7.5 Tolerance Bounds


A tolerance bound is a confidence interval on a quantile of the life distribu-
tion. Thinking in similar terms to the beta density of Equation 7.9 for the pro-
portion failed, let τγ represent the time at which the true cumulative failure
probability is γ. That is,

FT (t g ) = g (7.24)

Note that when γ is relatively small, the corresponding survival prob-


ability is 1 − γ and the quantity τ γ is called the “reliable life.” For example,
τ0.01 is the age beyond which 99% of the population will fail. It is the time
for which the reliability is 0.99. Similarly, τ0.10 is the age for which the reli-
ability is 0.90. In an effort to standardize design practices and reliability
demonstration for new component designs, the U.S. Air Force has identi-
fied, in its design guide [53], the 95% confidence limit on τ0.10 as a “Type
A design allowable” and the 95% confidence limit on τ0.01 as a “Type B
design allowable.” The determination of these quantities is an important
aspect of verifying the reliability of devices that are being proposed for
Air Force use. In statistical terminology, these quantities are known as
tolerance bounds.
The construction of tolerance bounds using test data is based on the
distribution of order statistics and hence the beta density and distribution
discussed earlier. To start, we consider the jth failure time xj, and as before,
we let

u = FT ( x j )

Then for an arbitrarily selected value of γ, we wish to have a confidence level


of 100(1 − α)% that τγ exceeds xj. That is, we wish to have

Pr[t g ³ x j ] ³ 1 - a
108 Reliability Engineering: Probabilistic Models and Maintenance Methods

in which case, we will have 100(1 − α)% confidence that the reliability at xj is at
least 1 – γ. Using our established definitions, this means that the cumulative
probability of failure at xj is smaller than that at τγ, so

Pr[t g ³ x j ] = Pr[FT ( x j ) £ FT (t g )] = Pr[u £ FT (t g )] = FU ( FT (t g ), j - 1, n - j) ³ 1 - a


(7.25)

Now, examining the beta distribution on a proportion u for the specific case
of our test results, we find that
j -1
ænö
FU (u, j - 1, n - j) = 1 - å çè k ÷øu (1 - u)
k =0
k n-k
= 1 - B( j - 1, n, u) (7.26)

where the notation B(k, n, p) represents the cumulative binomial probability.


Substituting this identity back into Equation 7.25 yields

Pr[t g ³ x j ] = 1 - B( j - 1, n, FT (t g )) = 1 - B( j - 1, n, g ) ³ 1 - a (7.27)

so equivalently, the tolerance bound is defined by

B( j - 1, n, g ) £ a (7.28)

There are several ways in which we can use this result.


First, consider the cases associated with j = 1. For this case, Equations 7.26
and 7.28 simplify to

FU (u, 1, n - 1) = 1 - (1 - u)n = 1 - (1 - g )n ³ 1 - a

so we find that for the example data set of Table 7.2, using α = 0.05 and n = 50,
we compute γ = 0.058 and we say that we have 95% confidence that the reliabil-
ity of the component population is at least 1 − γ = 0.942 at a time of x1 = 0.006.
An alternate use of Equation 7.28 is to ask how many items should we
have tested in order for x1 to correspond to a Type A design allowable.
To answer this question, we set α = 0.05 and γ = 0.10 and we compute n to
be 29. For the corresponding case of the Type B design allowable, γ = 0.01
and n = 298.
Next, for other values of j, we might compute the reliability at the time
of the 15th failure for which we have 95% confidence. To do this we solve
Equation 7.28 for the smallest value of γ for which the expression holds.
That value is γ = 0.403 so the reliability value we seek is 0.597 and we can
say that

Pr[FT (t = x15 = 0.465) ³ 0.597] ³ 0.95


Nonparametric Statistical Methods 109

Yet another possible computation would be to ask, “What level of confidence


do we have that, say, FT (t = x10 = 0.223) ³ 0.75?”
The answer is

Pr[FT (t = x10 = 0.223) ³ 0.75] ³ 0.738

Thus, using Equation 7.28, we can calculate nearly any sort of tolerance
bound we need and we can also calculate the sample size required to obtain
any specific level of confidence for any desired level of reliability.
The various methods described up to this point in this chapter treat esti-
mation of failure probabilities or reliability values. An alternate idea is to
develop nonparametric methods that address the hazard function. This
topic is treated next.

7.6 TTT Transforms


A rather powerful nonparametric method for simply determining the
behavior of the hazard function has been developed by Barlow and Campo
[54] and explained very effectively by Klefsjo [55]. The method is known
as the total time on test (TTT) transform. It has the appeal that after some
theoretical development, we obtain a simple graphical test. The underlying
idea is that for any life distribution (having finite mean), the relationship
between the mean and the hazard function may be exploited to character-
ize the behavior of the distribution. In addition, the characterization can be
performed for an empirical representation of the life distribution based on
test data.

7.6.1 Theoretical Construction


In order to understand the behavior that is to be exploited, assume that a
population of devices has a life distribution FT(t). Then, for any cumulative
failure probability, say, u, the inverse of the distribution function is denoted
by FT-1(u) and represents the time for which the cumulative probability of
failure is u. Formally,

FT-1(u) = min {t|FT (t) ³ u} (7.29)

For example, if a population of devices has an exponential life distribution


with λ = 0.84, then FT(0.75) = 0.467 and FT-1(0.467 ) = 0.75. Now, using this nota-
tion, we have the following definition:
110 Reliability Engineering: Probabilistic Models and Maintenance Methods

Definition 7.1

The total time on test transform of a life distribution having finite mean, μ, is
denoted by H F -1 (u) and is expressed as
T

FT-1 ( u )

H F -1 (u) =
T ò0
FT (w)dw (7.30)

Continuing with the previous example of the exponential life distribution,


the TTT transform value at u = 0.467 is

FT-1 ( u = 0.467 ) 0.75


(1 - e -0.75 l
) = 0.467 = 0.556
H F -1 (u = 0.467 ) =
T ò0
FT (w)dw =
ò0
e - lw dw =
l l

The TTT transform has two useful and important properties that will lead us
to a method for characterizing the behavior of a life distribution. The first is
that it yields the mean when evaluated at u = 1.0. That is,
FT-1 ( u =1.0 ) ¥

H F -1 (u = 1.0) =
T ò
0
ò
FT (w)dw = FT (w)dw = m
0
(7.31)

To clarify, note that the time at which the life distribution has value 1.0 is
infinity (or some very large maximum value) and the integral of the survivor
function over the full range of any random variable yields the mean value of
that random variable.
The second useful property of the transform is that its derivative, when
evaluated at any value of the cumulative failure probability, equals the recip-
rocal of the corresponding value of the hazard function. This is shown as
follows:
FT-1 ( u )
d d
H -1 (u) =
du FT du ò0
FT (w)dw

FT-1 ( u )
æ d -1 ö æ d ö

è du ø
(
FT (u) ÷ FT ( FT-1(u)) - ç 0 ÷ FT (0) +
è du ø
) ( ) ò
d
du
FT (w)dw
0

Now, clearly the second and third terms of this derivative equal zero. For the
first term of the derivative, we observe that in the expression

FT ( FT-1(u))
Nonparametric Statistical Methods 111

FT-1(u) is the time for which the cumulative failure probability equals u so
evaluating the survivor function at that value yields 1 − u. That is,

FT ( FT-1(u)) = 1 - u

To evaluate the derivative of the inverse function, we proceed as follows:

dt d -1
t = FT-1(u) so = FT (u)
du du

Then,

dt 1 1 1
= = =
du d
du FT (t) f (t)
dt dt

and therefore,

d -1 1
FT (u) = -1
du f ( FT (u))

Combining our two results and evaluating the derivative at the cumulative
failure probability associated with any time yields

d 1- u FT (t) 1
H F -1 (u) = = = (7.32)
du T f ( FT-1(u)) u = F f (t) zT (t)
T (t )

where the time indicated is that for which t = FT-1(u).


The algebraic representations of the two properties of the TTT transform
are combined in the following to obtain a characterization of the life distri-
bution, but first, we can scale the transform to make the ultimate test data
analysis as generic as possible. We formally define a scaled TTT transform
as follows:

Definition 7.2

The scaled total time on test transform of a life distribution having finite mean,
μ, is denoted by Q F -1 (u) and is expressed as
T

H F -1 (u) H F -1 (u)
Q F -1 (u) = T
= T
(7.33)
T H F -1 (1) m
T
112 Reliability Engineering: Probabilistic Models and Maintenance Methods

Now, clearly the derivative of the scaled transform equals the derivative of
the transform divided by the scaling constant, μ. Thus,

æ d ö
ç H F -1 (u) ÷ -1
d
Q F -1 (u) = è
du T
ø = 1/zT ( FT (u)) (7.34)
du T m m

Consider what the derivative of the scaled transform tells us. Suppose the
life distribution happens to be exponential. In that case, zT(t) = λ and m = 1/l.
Thus, at all values of u, the derivative of the scaled transform is

d 1/l
Q -1 (u) = =1
du FT 1/l

When the life distribution has constant hazard, the scaled TTT transform
has a slope equal to one. In that case, the scaled transform plots as a straight
line. Suppose the hazard is not constant. For the Weibull distribution, the
mean is θΓ(1 + 1/β) and the hazard function is
b -1
bæ t ö
zT (t) = ç ÷
qèqø

For β > 1, the hazard function is increasing in time and the scaled TTT trans-
form displays the form shown in Figure 7.1a. Similarly, for β < 1, the scaled
transform has the form shown in Figure 7.1b as this is indicative of a distri-
bution with decreasing hazard function. Thus, we have the very powerful
result that the scaled transform is concave for increasing hazard, a straight
line for constant hazard, and convex for decreasing hazard distributions.

0.8

0.6
TTT

0.4

0.2

0.2 0.4 0.6 0.8 1


(a) CDF

FIGURE 7.1
(a) Total time on test transform for increasing hazard functions. (Continued)
Nonparametric Statistical Methods 113

0.8

0.6
TTT

0.4

0.2

0.2 0.4 0.6 0.8 1


(b) CDF

FIGURE 7.1 (Continued)


(b) Total time on test transform for decreasing hazard functions.

7.6.2 Application to Complete Data Sets


The application of our understanding of the behavior of the scaled TTT
transform is to plot the transform values computed from test data to deter-
mine the likely form of the hazard function for the device being tested. To
construct the application, we first define the quantities τ(xj) as
j -1

t( x j ) = å x + (n - j + 1)x
k =1
k j (7.35)

where the xj are our ordered failure times so τ(xj) is the total amount of test-
ing time that is accumulated by the time of the jth failure. This may be seen
as follows:

During the Interval Number of Items on Test Test Time Accrued


(0, x1) n nx1
(x1, x2) n−1 (n − 1) (x2 − x1)
(x2, x3) n−2 (n − 2) (x3 − x2)
· · ·
· · ·
· · ·
(xj − 1, xj) n−j+1 (n − j + 1) (xj − xj − 1)
· · ·
· · ·
· · ·
(xn − 1, xn) 1 (xn − xn − 1)
114 Reliability Engineering: Probabilistic Models and Maintenance Methods

Taking the sum after any failure time, one has


j

t( x j ) = å (n - k + 1)(x - x
k =1
k k -1 )

and this reduces to the form shown in Equation 7.35.


To apply the TTT transform to test data, we first construct the empirical life
distribution corresponding to the test observations. For a sample of n items
placed on test, let FXn(t) represent the empirical life distribution. Thus,

ì 0 0 £ t < x1
ï1/n x1 £ t < x2
ï
ïï ⋮ ⋮
FXn (t) = í (7.36)
ï j/n x j £ t < x j +1
ï ⋮ ⋮
ï
ïî 1 xn £ t < ¥

Given this definition, the inverse function is defined by

FX-n1 (u) = min {x j |FXn ( x j ) ³ u} (7.37)

which is the smallest failure time with estimated cumulative probability


equal to or greater than u. For example, for the data set shown in Table 7.2,
we have

ì 0 0 £ t < 0.006
ï0.02 0.006 £ t < 0.019
ï
ï0.04 0.019 £ t < 0.026
ï
ï0.06 0.0226 £ t < 0.038
FXn (t) = í
ï ⋮ ⋮
ï ⋮ ⋮
ï
ï0.98 4.631 £ t < 5.292
ï1.00 5.292 £ t < ¥
î

A few values of the corresponding inverse function are FX-n1 (0.04) = 0.019,
FX-n1 (0.30) = 0.465, and FX-n1 (0.99) = 4.631. For the same data set, note that
τ(x1) = 50(0.006) = 0.30, τ(x2) = 50(0.006) + 49(0.013) = 0.937, and τ(x3) = 50(0.006) +
49(0.013) + 48(0.007) = 1.273.
Nonparametric Statistical Methods 115

The use of the TTT transform to analyze test data proceeds by applying the
transform to FXn(t) in the same manner as for FT(t). That is,

FX-1 ( u )
n

H F -1 (u) =
Xn ò0
FXn (w)dw (7.38)

However, recognizing that FXn (t) is a step function, the integral may be
expressed as a sum. Consider the transform evaluated at u = j/n:

FX-1 ( j/n )
n
æ jö
H F -1
Xn
ç ÷=
ènø ò
0
FXn (w)dw

FX-1 (1/n ) FX-1 ( 2/n ) FX-1 ( j/n )


n n n

=
ò
0
FXn (w)dw +
FX-1
ò(
1/n )
FXn (w)dw + ⋯ +
FX-1 (
ò j -1/n )
FXn (w)dw
n n

x1 x2 xj
n n -1 n- j+1
=
ò
0
n
dw +
ò
x1
n
dw + ⋯ +
ò
x j-1
n
dw

1
=
n
( nx1 + (n - 1)(x2 - x1 ) + (n - 2)(x3 - x2 ) + ⋯ + (n - j + 1)(x j - x j -1 ))
1
= t( x j )
n

Thus, the TTT transform applied to the test data reduces to the sum of test
times—the total testing time—defined in Equation 7.35. In addition, in the
limit, the empirical transform corresponds to the theoretical transform for
the underlying life distribution. That is,

FX-1 ( j/n ) FT ( s )
n
æ jö
lim H F -1 ç ÷ = lim
n ®¥
j /n ® s
Xn
è n ø jn/®¥
n®s
ò
0
FXn (w)dw =
ò F (w)dw = H
0
T FT-1
(s)

so the transform based on the empirical distribution is representative of the


transform for the actual life distribution. Consequently, we can use the TTT
transform obtained from the data to characterize the underlying life distri-
bution. We actually use the scaled TTT transform. For the test data, the limit
116 Reliability Engineering: Probabilistic Models and Maintenance Methods

mentioned earlier also applies to the case of s = 1 so the scaling constant is


τ(xn). Thus, the scaled transform is

1
æ j ö H FX-n1 ( j/n ) n
t( x j ) t( x )
j
Q F -1 ç ÷ = = = (7.39)
xn
ènø H F -1 (1) 1 t( x )
Xn t( xn ) n
n

The application to sample data implied by the earlier discussion and results
is that the quantities τ(xj) and τ(xn) are computed and their ratio is plotted
against j/n. If the result is approximately a 45° line, one concludes that the
life distribution is constant hazard. On the other hand, if the plot is concave
and lies mostly above the 45° line, one concludes that the life distribution is
increasing failure rate (IFR), and if the plot is convex and lies below the 45°
line, one concludes that the life distribution is decreasing failure rate.
Consider the example provided by the data of Table 7.2. The computed val-
ues of the scaled transform are shown in Table 7.5. Figure 7.2 shows the plot
of the values in the table. Note that the points appear to resemble a 45° line.
In fact, the plot seems to cross the 45° line several times and to generally lie
close to it. An examination of the values in the table confirms this. It seems
reasonable to conclude that the test data correspond to a constant hazard life
distribution.

TABLE 7.5
Empirical Values of the Scaled Total Time on Test Transform
xj j/n Θ(x j) xj j/n Θ(x j) xj j/n Θ(x j)
0.006 0.02 0.0046 0.613 0.36 0.3665 1.555 0.70 0.6951
0.019 0.04 0.0143 0.615 0.38 0.3674 1.559 0.72 0.6960
0.026 0.06 0.0194 0.658 0.40 0.3878 1.690 0.74 0.7240
0.038 0.08 0.0281 0.720 0.42 0.4162 1.803 0.76 0.7464
0.056 0.10 0.0407 0.783 0.44 0.4441 1.943 0.78 0.7721
0.068 0.12 0.0489 0.795 0.46 0.4492 2.008 0.80 0.7830
0.090 0.14 0.0637 0.829 0.48 0.4633 2.190 0.82 0.8108
0.129 0.16 0.0893 0.875 0.50 0.4815 2.505 0.84 0.8541
0.129 0.18 0.0893 0.883 0.52 0.4846 2.887 0.86 0.8947
0.223 0.20 0.1482 0.894 0.54 0.4886 3.085 0.88 0.9212
0.279 0.22 0.1824 1.016 0.56 0.5315 3.106 0.90 0.9231
0.299 0.24 0.1943 1.218 0.58 0.5993 3.354 0.92 0.9420
0.336 0.26 0.2158 1.272 0.60 0.6167 3.503 0.94 0.9511
0.455 0.28 0.2831 1.285 0.62 0.6206 3.785 0.96 0.9641
0.465 0.30 0.2886 1.373 0.64 0.6462 4.631 0.98 0.9899
0.548 0.32 0.3329 1.494 0.66 0.6794 5.292 1.00 1.00
0.599 0.34 0.3594 1.541 0.68 0.6916
Nonparametric Statistical Methods 117

0.8

0.6
TTT

0.4

0.2

0.2 0.4 0.6 0.8 1


j/n

FIGURE 7.2
Plot of the scaled total time on test transform.

In order to illustrate the contrasting behavior to constant hazard, consider


the set of ordered test data in Table 7.6.
For this data set, the values of the scaled transform are listed in Table 7.7.
The corresponding scaled TTT plot for the data is shown in Figure 7.3. Note
that except for some crossing near zero and near 1.0, the entire plot lies above
the 45° line. The appropriate interpretation is that the devices that produced
the test data appear to display increasing hazard. The observed behavior is
confirmed by an inspection of the transform values that lie well above the
corresponding values of j/n.
Both of the data analyses displayed so far used the entire complete data
set. As discussed earlier, it is often impractical to run a test until all test
units have failed. When the test is terminated early and a censored data set

TABLE 7.6
Another Example Life Test Data Set 
j xj j xj j xj j xj j xj
1 0.023 11 0.303 21 0.511 31 0.754 41 1.224
2 0.025 12 0.370 22 0.522 32 0.767 42 1.252
3 0.081 13 0.371 23 0.532 33 0.795 43 1.344
4 0.110 14 0.373 24 0.571 34 0.802 44 1.378
5 0.185 15 0.394 25 0.579 35 0.873 45 1.562
6 0.226 16 0.400 26 0.596 36 0.884 46 1.580
7 0.230 17 0.412 27 0.605 37 0.936 47 1.653
8 0.278 18 0.435 28 0.627 38 0.993 48 1.659
9 0.278 19 0.449 29 0.673 39 1.001 49 1.764
10 0.287 20 0.494 30 0.753 40 1.087 50 2.520
118 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 7.7
Empirical Values of the Scaled Total Time on Test Transform
xj j/n Θ(x j) xj j/n Θ(x j) xj j/n Θ(x j)
0.023 0.02 0.0315 0.435 0.36 0.5121 0.873 0.70 0.7880
0.025 0.04 0.0342 0.449 0.38 0.5243 0.884 0.72 0.7925
0.081 0.06 0.1078 0.494 0.40 0.5625 0.936 0.74 0.8125
0.110 0.08 0.1451 0.511 0.42 0.5765 0.993 0.76 0.8328
0.185 0.10 0.2396 0.522 0.44 0.5852 1.001 0.78 0.8354
0.226 0.12 0.2901 0.532 0.46 0.5929 1.087 0.80 0.8613
0.230 0.14 0.2949 0.571 0.48 0.6217 1.224 0.82 0.8988
0.278 0.16 0.3514 0.579 0.50 0.6274 1.252 0.84 0.9057
0.278 0.18 0.3514 0.596 0.52 0.6391 1.344 0.86 0.9259
0.287 0.20 0.3615 0.605 0.54 0.6450 1.378 0.88 0.9324
0.303 0.22 0.3790 0.627 0.56 0.6588 1.562 0.90 0.9626
0.370 0.24 0.4506 0.673 0.58 0.6865 1.580 0.92 0.9651
0.371 0.26 0.4516 0.753 0.60 0.7325 1.653 0.94 0.9731
0.373 0.28 0.4537 0.754 0.62 0.7331 1.659 0.96 0.9735
0.394 0.30 0.4744 0.767 0.64 0.7398 1.764 0.98 0.9793
0.400 0.32 0.4801 0.795 0.66 0.7536 2.520 1.00 1.00
0.412 0.34 0.4913 0.802 0.68 0.7569

0.8

0.6
TTT

0.4

0.2

0.2 0.4 0.6 0.8 1


j/n

FIGURE 7.3
Plot of the scaled total time on test transform for an increasing failure rate distribution.

is obtained, it is still possible to use the TTT transform. The adjustments


necessary to do this are treated in the following section.

7.6.3 Application to Censored Data Sets


Suppose a life test has been performed on a sample of n = 50 items and that
the test was terminated early so that r of the n items were observed to fail
Nonparametric Statistical Methods 119

and n − r had not yet failed. For purposes of illustration, assume that only
the first r data values of Table 7.2 had been recorded. In that case, the total
observed test time is
r -1
t( xr ) = å x + (n - j + 1)x
k =1
k r (7.40)

For the censored data set, this quantity is used in place of τ(xn) as the scaling
constant and the scaled transform is defined as

1
t( x j ) t( x )
æ jö n j
Q F -1 ç ÷ = = (7.41)
xr
è ø
r 1
t( xr ) t( x r)
n

and this quantity is plotted versus j/r. For the example data set of Table 7.2,
the computed values for r = 12 and for r = 20 are listed in Table 7.8. The cor-
responding plots of the scaled transform are shown in Figure 7.4a and b,

TABLE 7.8
Empirical Values of the Scaled Total Time
on Test Transform for Censored Data Sets
r = 12 r = 20
xj j/r Θ(x j) j/r Θ(x j)
0.006 0.083 0.0236 0.05 0.0118
0.019 0.167 0.0736 0.10 0.0369
0.026 0.250 0.1000 0.15 0.0501
0.038 0.333 0.1444 0.20 0.0723
0.056 0.417 0.2094 0.25 0.1050
0.068 0.500 0.2519 0.30 0.1262
0.090 0.583 0.3280 0.35 0.1643
0.129 0.667 0.4598 0.40 0.2304
0.129 0.750 0.4598 0.45 0.2304
0.223 0.833 0.7627 0.50 0.3822
0.279 0.917 0.9387 0.55 0.4704
0.299 1.00 1.00 0.60 0.5011
0.336 0.65 0.5565
0.455 0.70 0.7299
0.465 0.75 0.7441
0.548 0.80 0.8585
0.599 0.85 0.9268
0.613 0.90 0.9450
0.615 0.95 0.9475
0.658 1.00 1.00
120 Reliability Engineering: Probabilistic Models and Maintenance Methods

0.8

0.6
TTT

0.4

0.2

0.2 0.4 0.6 0.8 1


(a) j/n

0.8

0.6
TTT

0.4

0.2

0.2 0.4 0.6 0.8 1


(b) j/n

FIGURE 7.4
(a) Plot of the scaled total time on test transform for censored data with r = 12. (b) Plot of the
scaled total time on test transform for censored data with r = 20.

respectively. These plots serve to illustrate the facts that the transform may
be applied to censored data and that our ability to interpret the plots is sig-
nificantly influenced by the degree of censoring. For the case in which r = 20,
we can be reasonably confident that the hazard is constant. For the plot cor-
responding to the data censored after r = 12 observations, our conclusion of
constant hazard is rather more tenuous.
Once again, to provide a contrast to the constant hazard case, consider
the data from an IFR distribution that is listed in Table 7.6. If that data had
been generated in a test with censoring at either r = 12 or r = 20, the cor-
responding data values would have been those shown in Table 7.9 and the
corresponding plots of the scaled transforms would have been those shown
Nonparametric Statistical Methods 121

TABLE 7.9
Empirical Values of the Scaled Total Time on
Test Transform for Censored Data from an
Increasing Failure Rate Distribution
r = 12 r = 20
xj j/r Θ(x j) j/r Θ(x j)
0.023 0.083 0.0699 0.05 0.0560
0.025 0.167 0.0759 0.10 0.0608
0.081 0.250 0.2392 0.15 0.1916
0.110 0.333 0.3220 0.20 0.2580
0.185 0.417 0.5317 0.25 0.4260
0.226 0.500 0.6438 0.30 0.5157
0.230 0.583 0.6545 0.35 0.5243
0.278 0.667 0.7798 0.40 0.6247
0.278 0.750 0.7798 0.45 0.6247
0.287 0.833 0.8023 0.50 0.6427
0.303 0.917 0.8411 0.55 0.6738
0.370 1.00 1.00 0.60 0.8011
0.371 0.65 0.8028
0.373 0.70 0.8066
0.394 0.75 0.8434
0.400 0.80 0.8535
0.412 0.85 0.8734
0.435 0.90 0.9104
0.449 0.95 0.9321
0.494 1.00 1.00

0.8

0.6
TTT

0.4

0.2

0.2 0.4 0.6 0.8 1


(a) j/n

FIGURE 7.5
(a) Plot of the scaled total time on test transform for censored data with r = 12. (Continued)
122 Reliability Engineering: Probabilistic Models and Maintenance Methods

0.8

0.6
TTT

0.4

0.2

0.2 0.4 0.6 0.8 1


(b) j/n

FIGURE 7.5 (Continued)


(b) Plot of the scaled total time on test transform for censored data with r = 20.

in Figure 7.5a and b. Here again, it is clear that the degree of censoring affects
the confidence we have in our interpretations of the plots.
In closing this discussion, it should now be reasonably clear that the TTT
transform provides a method that is very simple to perform for characterizing
the hazard behavior of a device population.

7.7 Nelson Cumulative Hazard Estimation Method


As noted in Chapter 3, the reliability function for any plausible choice of life
distribution may be expressed in terms of the hazard function by

FT (t) = e ò0
- zT ( x ) dx
= e - ZT (t )

Nelson [56] suggests that since this expression may be solved to obtain the
cumulative hazard as a function of the reliability, estimates of the cumula-
tive hazard are implied by the Kaplan–Meier estimates of reliability. That is,
cumulative hazard may be estimated by

æ ö
( )
ˆ
Zˆ T (t) = - ln FT (t) = - ln ç
è i
Õpˆ i ÷ = -
ç i|x £t ÷
ø
å
i|xi £ t
ln ( pˆ i ) (7.42)
Nonparametric Statistical Methods 123

TABLE 7.10
Computed Estimates for the
Cumulative Hazard Function

i xi wi p̂i ẐT ( xi )

0 0 0
1 1.20 1 0.960 0.041
2 3.27 1 0.958 0.083
3 3.46 1 0.957 0.128
4 3.57 0 0 0.128
5 7.19 1 0.952 0.177
6 10.93 1 0.950 0.228
7 11.73 1 0.947 0.282
8 13.75 0 0 0.282
9 14.14 1 0.941 0.343
10 16.09 1 0.938 0.407

This expression may be simplified as

ni - ci - di
å ln ( pˆ ) = - å ln æçè ö æ di ö
Zˆ T (t) = -
i|xi £ t
i
i|xi £ t
ni - ci ÷=-
ø
å
i|xi £ t
ln ç 1 -
è n ÷
i - ci ø
(7.43)

which may be implemented directly. Applying Equation 7.43 to the first 10


observations of Table 7.3 yields the estimates shown in Table 7.10. The esti-
mates themselves are informative as they show how the hazard is accumu-
lating. In addition, each of the cumulative hazard estimates may be used to
obtain an estimate of the reliability using

ˆ ˆ
FT (t) = e - ZT (t )

ˆ
For example, using Zˆ T (16.09) = 0.407 yields FT (16.09) = 0.665, which may be
compared to the Kaplan–Meier estimate of 0.666.
In closing, it should now be apparent that nonparametric methods can pro-
vide us with substantial information concerning the reliability of a device
design without requiring us to assume an underlying distribution model of
the dispersion in failure times.
124 Reliability Engineering: Probabilistic Models and Maintenance Methods

Exercises
7.1 The following data set was obtained from a life test of n = 50 copies of
a component.

i ti i ti i ti i ti i ti
1 551.881 11 297.883 21 122.750 31 539.933 41 141.582
2 964.448 12 78.966 22 119.677 32 175.578 42 329.841
3 687.943 13 526.061 23 568.533 33 465.506 43 570.971
4 206.215 14 558.106 24 453.852 34 208.198 44 929.433
5 844.059 15 484.969 25 267.140 35 326.713 45 67.964
6 439.283 16 282.293 26 128.874 36 154.290 46 294.060
7 170.110 17 589.303 27 675.259 37 703.458 47 23.774
8 273.522 18 1032.227 28 347.812 38 327.022 48 295.930
9 475.883 19 726.202 29 283.398 39 511.423 49 514.202
10 255.646 20 573.447 30 357.552 40 560.902 50 251.874

For these data, identify the values of x1, x5, x10, and x50.
7.2 For the data set of Problem 7.1, compute the point estimates of the reli-
ability at x1, x5, x10, and x50 with both the mean- and the median-based
estimation equations. How do these estimates compare?
7.3 Using the data in Problem 7.1, compute 95% confidence intervals for the
failure probability at each of x1, x5, x10, and x50 using each of Equations 7.8
and 7.11. How do these intervals compare?
7.4 Suppose a sample of 40 copies of a device were placed on test and that
the observed failure times are:

1080.75 1860.35 1282.37 1861.67


1990.16 1218.02 1392.7 1447.29
727.969 1760.22 2025.54 1116.86
1396.47 1967.9 1849.47 2166.7
1459.98 1897.26 2634.16 1725.15
972.731 1081.11 1452.74 2141.54
1365.97 1891.58 1331.48 2058.84
1889.62 1907.9 822.497 1213.68
1401.61 1805.05 2171.56 1386.75
2296.4 1945.56 2546.97 991.637

ˆ
For this data set, use the median rank–based statistics to compute FT ( x 4 )
and the beta distribution–based limits to construct a 95% confidence
interval for that value.
7.5 Test data for a prototype battery [52] were accumulated over 550 hours
with an initial test sample of n = 68 specimens. The batteries were only
Nonparametric Statistical Methods 125

inspected at 50 hour intervals so τ = (50, 100, 150, 200, 250, 300, 350, 400,
450, 500, 550). The associated observed failures and specimen removals
were x = (1, 0, 1, 4, 1, 1, 1, 4, 4, 2, 2) and m = (5, 6, 1, 6, 2, 1, 2, 2, 3, 1, 0).
Compute reliability estimates for each of the inspection times and con-
fidence intervals for the estimates at 200 and 400 hours.
7.6 Using the data of Problem 7.1, compute a 95% tolerance bound on the
reliability at x10 and x15. Then compute the level of confidence the data
provide that the reliability at x20 exceeds 0.55.
7.7 The following ordered data set was obtained during a life test of n = 20
copies of a component.

0.230 1.059 1.814 6.533


0.347 1.537 2.633 7.237
0.499 1.568 2.740 7.246
0.811 1.696 2.930 7.380
1.027 1.725 3.732 7.499

Compute 95% confidence intervals for the failure probability at x2 and


x10 using each of Equations 7.8 and 7.11.
7.8 For the data of Problem 7.7, compute the reliability tolerance bound at x1.
7.9 Suppose a sample of 28 copies of a device were put on test and that the
following data were obtained:

j tj wj j tj wj
1 2663.42 1 15 1391.43 1
2 1031.09 1 16 1020.26 1
3 463.70 1 17 1494.93 1
4 413.09 1 18 1847.83 0
5 866.53 0 19 256.93 1
6 1827.96 1 20 1307.72 0
7 1788.34 1 21 1033.41 0
8 1282.64 1 22 1994.37 1
9 640.61 0 23 1286.57 1
10 1151.34 1 24 692.62 1
11 852.12 1 25 847.15 1
12 659.83 1 26 1562.79 1
13 1429.42 1 27 706.13 1
14 1474.34 1 28 541.41 1

where wj = 1 indicates that the recorded time is a failure and wj = 0


implies that the recorded time is a removal. Construct the Kaplan–
Meier reliability estimates for these data and plot the estimates along
with the corresponding 95% confidence intervals.
126 Reliability Engineering: Probabilistic Models and Maintenance Methods

7.10 Use the data set of Problem 7.7 to compute and plot the TTT transform.
Indicate what type of hazard behavior the data display.
7.11 For the following data set obtained from a life test, order the data and
then plot the TTT transform. Indicate what form of the hazard function
is suggested by the plot.

3.27765 8.38154 2.51122 12.8582


12.3234 23.0741 9.82042 23.0129
25.952 1.78442 12.0395 8.0937
19.4943 4.36944 17.3134 18.6497
12.0167 16.4399 21.0961 36.6558

7.12 For the following data set obtained from a life test, order the data and
then plot the TTT transform. Indicate what form of the hazard function
is suggested by the plot.

0.4182 0.1721 0.0056 0.3914


0.0004 0.7963 0.7501 0.0161
2.6405 0.0144 1.1699 0.2276
0.00003 0.0006 0.0027 0.9726
0.0317 1.2510 0.0179 0.7354

7.13 For the following data set obtained from a life test, order the data and
then plot the TTT transform. Indicate what form of the hazard function
is suggested by the plot.

i ti i ti i ti i ti i ti
1 635.655 11 456.731 21 335.464 31 282.015 41 170.998
2 369.012 12 459.482 22 100.790 32 172.954 42 465.023
3 312.489 13 420.944 23 453.539 33 216.222 43 254.582
4 196.092 14 306.064 24 82.843 34 204.064 44 319.789
5 72.393 15 216.330 25 356.053 35 228.195 45 285.048
6 22.150 16 180.638 26 255.021 36 528.971 46 307.34
7 302.257 17 137.704 27 302.217 37 270.25 47 318.541
8 114.434 18 159.855 28 181.568 38 117.524 48 242.783
9 68.381 19 231.442 29 93.694 39 70.280 49 458.005
10 200.899 20 203.094 30 314.594 40 93.151 50 130.900

7.14 For the data set of Problem 7.13, assume that because of test termina-
tion, only the earliest 16 data values are available. Construct the plot of
the TTT transform for the resulting censored data set.
7.15 For the data set of Problem 7.13, assume that because of test termina-
tion, only the earliest 25 data values are available. Construct the plot of
the TTT transform for the resulting censored data set.
Nonparametric Statistical Methods 127

7.16 Use the complete set of the following life test data to construct a plot of
the TTT transform and indicate what type of behavior the hazard func-
tion appears to have.

i ti i ti i ti i ti
1 29.835 11 1048.13 21 126.097 31 154.884
2 1262.860 12 641.953 22 434.761 32 103.444
3 804.623 13 762.882 23 170.046 33 176.225
4 691.363 14 206.062 24 1880.470 34 252.424
5 654.951 15 593.040 25 1058.727 35 333.961
6 409.087 16 224.793 26 957.271 36 1989.75
7 1615.690 17 203.809 27 2.970 37 1646.63
8 470.408 18 309.879 28 75.239 38 344.135
9 918.823 19 3094.740 29 346.605 39 48.831
10 68.348 20 55.854 30 801.645 40 131.215

7.17 For the data set of Problem 7.16, assume that because of test termina-
tion, only the earliest 12 data values are available. Construct the plot of
the TTT transform for the resulting censored data set.
7.18 For the data set of Problem 7.16, assume that because of test termina-
tion, only the earliest 20 data values are available. Construct the plot of
the TTT transform for the resulting censored data set.
7.19 For the data set in Problem 7.4, use the complete data set to compute
and plot the TTT transform and to characterize the hazard function.
7.20 Use the data in Problem 7.7 to compute Nelson cumulative hazard
estimates x5, x10, x14, x18, and x22 and then plot those estimates.
8
Parametric Statistical Methods

Parametric statistical methods for analyzing reliability data start with


an assumption of the form of the life distribution. Usually, the choice of a
distribution model is based on experience with similar types of devices or
an understanding of the phenomena that determine item failure. In some
cases, a general model such as the Weibull or gamma is assumed because the
availability of two distribution parameters makes it likely that a reasonable
representation of the failure probabilities will be obtained.
Once a life distribution is assumed, the statistical methods are used to
obtain estimates for the parameters of those distributions—hence the ter-
minology parametric methods. There are a large number of parametric reli-
ability estimation methods. Only the principal methods are presented here.
As each of the methods is discussed, it is illustrated by application to various
life distributions with particular emphasis on the Weibull and exponential
life distributions. The three principal methods that are discussed here are
(1) the graphical method, (2) the method of moments, and (3) the likelihood
method. Whenever necessary, the notational conventions discussed at the
start of the last chapter are used here as well.

8.1 Graphical Methods


Most of the graphical methods are based on the general relationship between
the cumulative hazard function and the reliability function. As we know,

FT (t) = e - ZT (t ) (8.1)

so consequently,

- ln FT (t) = ZT (t) (8.2)

If test data are used to compute estimated values of the reliability and these
estimates are plotted as a function of time, the resulting graph will pro-
vide estimated values for the distribution parameters that form the hazard
function.

129
130 Reliability Engineering: Probabilistic Models and Maintenance Methods

To be more specific, assume the life lengths of a population of components


are believed to be well represented by an exponential distribution. The real-
ization of Equation 8.2 for the exponential is

1
- ln FT (t) = ln = lt (8.3)
FT (t)

which is the equation for a line. Hence, if we represent our successive reli-
ability estimates by
1 1 n
y j = ln = ln = ln (8.4)
ˆ ( n - j ) n n -j
F T (t)

then a plot of yj versus xj should be a line with slope of λ. Of course, while we


can physically plot the observed data, we would be more likely to use linear
regression to determine the line that best fits the data.
Consider an example. Suppose 20 copies of a component are placed on test
and their ordered failure times are those listed in Table 8.1. Suppose further
that we calculate the corresponding values of the yj, which are also shown in
the table. Then the actual plot of the data is shown in Figure 8.1. As we can see,
within the variation expected in sample data, the graph is approximately lin-
ear. Using the plot to identify the slope and thus the estimated value of λ is not
particularly precise, but a reasonable choice of value appears to be lˆ = 0.0058. If
we compute the actual regression line, the algebraic expression for the slope is

å x y -å x å
n n n
n j j j yj
j =1 j =1 j =1
l̂ = 2 (8.5)
nå x - æç å x ö÷
n n
2
j j
è j =1 ø j =1

TABLE 8.1
Example Ordered Failure Data
j xj n–j/n yj j xj n–j/n yj
1 4.740 0.950 0.051 11 128.756 0.450 0.799
2 12.636 0.900 0.105 12 150.393 0.400 0.916
3 17.358 0.850 0.163 13 168.101 0.350 1.050
4 22.099 0.800 0.223 14 194.277 0.300 1.204
5 29.085 0.750 0.288 15 238.897 0.250 1.386
6 32.732 0.700 0.357 16 303.383 0.200 1.609
7 41.725 0.650 0.431 17 340.621 0.150 1.897
8 57.518 0.600 0.511 18 382.142 0.100 2.303
9 62.864 0.550 0.598 19 492.023 0.050 2.996
10 65.288 0.500 0.693 20 544.017 0.0
Parametric Statistical Methods 131

y
3

2.5

1.5

0.5

x
100 200 300 400 500

FIGURE 8.1
Example failure data plot.

This equation gives lˆ = 0.0055 as the estimate. As a point of information, the


example data were generated using a value of λ = 0.005.
Before we leave this example, note that we did not include an estimate of
the reliability for the last of the observed failure times. This is because we
used the following mean order statistic-based estimation equation (7.4):
j
FˆT (t) =
n

and in this case - ln Fˆ T (t) is undefined as the reliability estimate is zero. If


we had used the equation based on the median of the distribution on order
statistics—Equation 7.5—we would have been able to include the final data
value in our calculations.
The application of Equation 8.2 to the Weibull distribution proceeds in a
comparably direct manner. First, we obtain
b
ætö
- ln FT (t) = ç ÷
èqø

and taking the logarithm again yields

( )
ln - ln FT (t) = b ln t - b ln q (8.6)

which is again the equation for a line. In this case, the intercept is nonzero.
We again represent the dependent variable, which is the estimated reliability
at each failure time, by yj and the result is a data set such as the example set
132 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 8.2
Example Weibull Failure Data

ˆ ˆ
j xj ln(x j) FT (t ) yj j xj ln(x j) FT (t ) yj
1 390.896 5.968 0.972 −3.577 14 932.309 6.838 0.461 −0.255
2 509.925 6.234 0.933 −2.670 15 957.288 6.864 0.421 −0.146
3 540.671 6.293 0.894 −2.186 16 984.191 6.892 0.382 −0.038
4 594.520 6.388 0.854 −1.849 17 1003.160 6.911 0.343 0.069
5 621.604 6.432 0.815 −1.587 18 1018.753 6.926 0.303 0.177
6 626.117 6.440 0.776 −1.370 19 1030.576 6.938 0.264 0.287
7 679.096 6.521 0.736 −1.183 20 1082.845 6.987 0.224 0.402
8 664.210 6.499 0.697 −1.018 21 1222.792 7.109 0.185 0.523
9 710.355 6.566 0.657 −0.869 22 1279.176 7.154 0.146 0.656
10 714.938 6.572 0.618 −0.732 23 1285.361 7.159 0.106 0.807
11 746.485 6.615 0.579 −0.603 24 1392.606 7.239 0.067 0.995
12 763.342 6.638 0.539 −0.482 25 1577.441 7.364 0.028 1.279
13 775.172 6.653 0.500 −0.367

shown in Table 8.2. Note that the values of the yj listed in the table are com-
puted using Equation 7.5:

j - 0.3
FˆT (t) =
n + 0.4

so all of the data may be included in our analysis. The independent variable
in this case is the logarithm of the failure time as specified in Equation 8.6.
The plot of the data is shown in Figure 8.2. Clearly, using the plot to obtain
parameter estimates would be quite difficult for this case. Instead of trying

0 x
6.0 6.2 6.4 6.6 6.8 7.0 7.2

–1

–2

–3

FIGURE 8.2
Plot of logarithms of Weibull failure data.
Parametric Statistical Methods 133

to judge the behavior of the plot, we use the regression analysis equations to
calculate our estimates.
The slope and intercept of a line fit to this type of data are given by
Equations  8.7 and 8.8. For the data listed in Table 8.2, the computed value
of the slope is 3.344 and that for the intercept is −23.05. Now, inverting
Equation 8.6, these numerical values correspond to

bˆ = 3.344

ˆ
qˆ = e -( -23.05)/b = e 23.05/3.344 = 984.626

å y j ln( x j ) - æç å y j ö÷ æç å ln( x j ) ö÷
n n n
n
Slope =
j =1 è j =1 øè j =1 ø (8.7)
2

å ( ln(x j )) - çèæ å ö
n 2 n
n ln( x j ) ÷
j =1 j =1 ø

æ
å å y - æçè å ln(x ) ö÷ø æçè å
ö y j ln( x j ) ö÷
n n n n
( ln(x j ))
2
ç ÷ j j
Intercept = è ø ø
j =1 j =1 j =1 j =1
2 (8.8)
nå ( ln( x ) ) - çæ å ln( x ) ö÷
n 2 n
j j
è
j =1 ø j =1

Equations 8.7 and 8.8 are the standard linear regression forms. They have
a corresponding matrix form that is actually easier to understand and use.
For a general linear fit, the data pairs, say, (uj, vj), that correspond to a model

v = a + bu

are arranged in matrix/vector form with

é1 u1 ù é v1 ù
ê1 u2 úú êv ú
U=ê and V = ê ú
2

ê⋮ ⋮ú ê⋮ú
ê ú ê ú
ë1 un û ë vn û

so that the basic model is

V = UM

with the two coordinates of the matrix M being the intercept “a” and the
slope “b.” The regression solution of this model for a set of data is

M = [U ¢U ]-1U ¢V (8.9)
134 Reliability Engineering: Probabilistic Models and Maintenance Methods

As noted earlier, for the Weibull distribution, this solution corresponds to

ébˆ ln qˆ ù
M=ê ú
êë bˆ úû

There are three final points related to the graphical method. The first is
that the method can be applied to other distributions. However, because
most other distributions used to model life length do not have a closed-
form representation of the cumulative distribution function, estimation of
the parameters for those distributions is usually performed using methods
other than the graphical ones.
Second, if a plot of Weibull data displays curvature, particularly near the
ends, this is evidence of the existence of a third parameter, the minimum life
parameter (Figure 8.3). This was the parameter δ in Equation 4.14. The best
approach to this situation is to use a search method to identify the value of
the minimum life parameter for which the regression fit is best (Table 8.3).
Consider the example of the following data set:
Notice the decided nonlinearity of the initial data plot and the curvature at
the ends of the plot. By successive trials using a search strategy, we find that
a value of δ = 400.0 results in the plot shown in Figure 8.4. Then applying the
regression analysis to the data adjusted by δ, we obtain estimates of bˆ = 2.85
and qˆ = 606.771.
Finally, it is important to note that the graphical method (and its regression
equivalent) applies directly to censored data. If, for example, a sample of
n = 40 items are placed on test and the test is terminated after only 18 failures,
the resulting failure times can be plotted against the corresponding esti-
mated failure probabilities and the parameter estimation equations are the
slope and intercept expressions. As a further example, suppose only the first

0 x
6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1

–1

–2

–3

FIGURE 8.3
Initial plot of logarithms of Weibull failure data.
Parametric Statistical Methods 135

TABLE 8.3
Example Weibull Failure Data With Minimum Life

ˆ ˆ
J xj ln(x j) FT (t ) yj j xj ln(x j) FT (t ) yj
1 589.614 6.379 0.966 −3.355 11 1003.984 6.912 0.475 −0.297
2 640.936 6.463 0.917 −2.442 12 1008.317 6.916 0.426 −0.160
3 670.372 6.508 0.868 −1.952 13 1030.989 6.938 0.377 −0.026
4 731.327 6.595 0.819 −1.609 14 1040.955 6.948 0.328 0.107
5 828.633 6.720 0.770 −1.340 15 1052.332 6.959 0.279 0.243
6 859.266 6.756 0.721 −1.116 16 1062.327 6.968 0.230 0.384
7 870.480 6.769 0.672 −0.921 17 1103.345 7.006 0.181 0.535
8 881.452 6.782 0.623 −0.747 18 1103.629 7.006 0.132 0.704
9 894.234 6.796 0.574 −0.587 19 1192.351 7.084 0.083 0.910
10 934.200 6.840 0.525 −0.438 20 1230.343 7.115 0.034 1.216

0 x
5.4 5.6 5.8 6.0 6.2 6.4 6.6

–1

–2

–3

FIGURE 8.4
Revised plot of logarithms of Weibull failure data using δ = 400.0.

12 of the data points in Table 8.1 had been observed. This means that 60% of
the copies of the device on test had failed, but the total test time would then
have been around 150.393 units rather than the full 544.017 indicated in the
table. This is a significant savings in test time and the resulting parameter
estimate is an identical lˆ = 0.0058 to the one obtained using the full data set.

8.2 Method of Moments


The method of moments is actually well known and is quite intuitive. The
idea is that the sample moments are equated to the algebraic expressions
of the distribution moments and the resulting equations are solved for
136 Reliability Engineering: Probabilistic Models and Maintenance Methods

parameter estimates. The most common realization of this idea is the use of
a sample mean to estimate a population parameter such as its mean.

8.2.1 Estimation Expressions


For the exponential distribution, the population mean is the reciprocal of the
distribution parameter:

1
E[T ] = (8.10)
l

As the sample mean is a reasonable estimate for the population mean, one
may use Eˆ [T ] = x so

1
l̂ = (8.11)
x

For the example data in Table 8.1, we obtain x = 164.43, which yields the esti-
mate lˆ = 0.0061.
For the Weibull distribution, two equations are needed to estimate
two parameters. In principle, we could use the mean and the variance
expressions along with the sample mean and the sample variance. However,
there is a slightly simpler approach. In general, the moments of the Weibull
distribution may be determined to be

æ kö
E[T k ] = qk G ç 1 + ÷ (8.12)
è bø

Therefore, the mean of a Weibull distribution is

æ 1ö
E[T ] = qG ç 1 + ÷ (8.13)
è bø

and the corresponding variance is

2
æ 2ö æ æ 1 öö
Var[T ] = E[T ] - ( E[T ])
2
2
= q G ç 1 + ÷ - ç qG ç 1 + ÷ ÷
2

è b ø è è b øø

æ æ 2ö æ 1 öö
= q2 ç G ç 1 + ÷ - G 2 ç 1 + ÷ ÷ (8.14)
è è bø è b øø
Parametric Statistical Methods 137

Now, rather than equate the sample mean and sample variance to the distri-
bution mean and variance, we take the coefficient of variation

æ æ 2ö æ 1 öö æ 2ö
q2 ç G ç 1 + ÷ - G 2 ç 1 + ÷ ÷ G ç 1 + ÷
Var[t] è bø è b øø bø
c= 2 = è = è -1 (8.15)
E [T ] æ 1 ö æ 1ö
q 2G 2 ç 1 + ÷ G2 ç 1 + ÷
è bø è bø

which contains only one of the parameters. Given a set of failure data, we
solve this expression numerically for the estimate of β and we then use that
estimate in the expression for the mean to compute an estimate for θ.
Consider an example. The failure data in Table 8.2 have a sample mean value
of 884.153 and a sample variance of 91,548 (standard deviation of 302.569).
Thus, the sample value of the coefficient of variation is 0.1171. A numerical
search for the value of β in Equation 8.15 that most closely matches this value
yields an estimate of bˆ = 3.208. The sequence of functional evaluations used
to find this estimate is shown in Table 8.4. Note that only eleven trials were
needed to obtain the parameter estimate. Once the estimate of β is obtained,
we compute the estimate of the scale parameter using the sample mean in
place of the population mean in Equation 8.13:

x 884.153
qˆ = = = 987.04
æ 1ö G(1.3117 )
Gç1+ ÷
è bˆ ø

In the case of the normal distribution, the estimation process using the
method of moments is direct as the moments of the distribution are the mean

TABLE 8.4
Numerical Search Values
β c
1.0 1.0
5.0 0.0525
1.5 0.4610
3.5 0.1001
3.0 0.1321
3.3 0.1113
3.2 0.1177
3.21 0.1170
3.205 0.1173
3.208 0.11712
3.209 0.11706
138 Reliability Engineering: Probabilistic Models and Maintenance Methods

and variance. Thus, for a sample mean of x and sample variance of s2, the
estimation equations are

mˆ = x
(8.16)
ˆs2 = s2

The gamma distribution may also be analyzed using the method of moments.
The approach used for the Weibull distribution is the most efficient one for
the gamma. In general, the moments of the gamma distribution stated in
Equation 4.21 are
k -1

Õ (b + i )
1
E[T ] = k
k
(8.17)
l i =0

The application of this form yields a mean value of

b
E[T ] = (8.18)
l

and the corresponding variance is

b(b + 1) b2 b
Var[T ] = E[T 2 ] - ( E[T ]) =
2
- 2 = 2 (8.19)
l2 l l

Thus, as in the case of the Weibull, the coefficient of variation

Var[t] b/l 2 1
c= = 2 2 = (8.20)
E [T ] b /l
2
b

is expressed in terms of a single parameter. To obtain estimates of the param-


eters of a gamma distribution using sample data, we equate the reciprocal of
the sample coefficient of variation to β and then use the resulting estimate along
with the sample mean to estimate λ. For example, the life test data in Table 8.5
were obtained during the test of a component population for which the gamma
distribution is believed to represent well the dispersion in life lengths.
As these data display a sample mean value of 2948.75 and a sample standard
deviation of 1845.25, the application of Equation 8.20 and then Equation 8.18
yields

bˆ = 2.554 and lˆ = 0.00086

It should now be apparent that the method of moments is usually quite easy
to apply. It has intuitive appeal. The disadvantage of the method is that one
Parametric Statistical Methods 139

TABLE 8.5
Gamma Distributed Life Test Data
j xj j xj
1 692 11 2623
2 995 12 2881
3 1239 13 2972
4 1314 14 3271
5 1530 15 3618
6 1740 16 3889
7 1949 17 4493
8 2056 18 4973
9 2199 19 6214
10 2348 20 7979

does not expect to be able to use it with censored data. An approach for using
censored data has been suggested and is discussed in the final section of this
chapter.

8.2.2 Confidence Intervals for the Estimates


An important feature of the method of moments is that confidence inter-
vals may be defined for most of the parameter estimates. Those confidence
intervals may then be transferred to the associated reliability estimates. The
foundation for the construction of the confidence intervals is the fact that
each of the estimates has a sampling distribution and in most cases, the use
of reasonable sample sizes implies that the sampling distributions on the
estimators are normal or approximately normal. Consider the estimate for
the parameter of the exponential distribution.
In order to construct confidence intervals, let θ = 1/λ and work with
q̂ = x because sample means are often normal in distribution. In this case,
Var( x ) = q2/n, which can be manipulated to obtain a chi-square variate.
Consequently, an exact 1 – α probability confidence interval for θ is obtained as

2nx 2nx
2
£q£ (8.21)
c 2 n , 1- a / 2 c 22 n , a/2

Referring to the data in Table 8.1, the computed value of the sample mean is
x = 164.43, so

2(20)(164.43) 2(20)(164.43)
110.84 = £q£ = 269.23
59.34 24.43
140 Reliability Engineering: Probabilistic Models and Maintenance Methods

Then, using θ = 1/λ the confidence interval for the distribution parameter is

1 1
0.0037 = £l£ = 0.0090
269.23 110.84

This analysis provides an exact confidence interval for λ. The normal


distribution may be used to obtain an approximate confidence interval using
the expression

xe
(
- z1-a/2 n ) £ q £ xe -( za/2 n ) (8.22)

For the data in Table 8.1, Equation 8.22 yields

106.08 = 164.43e
(
- 1.96 20 ) £ q £ 164.43e -( -1.96 20 ) = 254.87

and inverting these expressions gives

1 1
0.0039 = £l£ = 0.0094
254.87 106.08

Given a confidence interval for the parameter of the distribution, confidence


intervals for the reliability function follow directly by substitution. That is,
for the exponential distribution, a 1 − α probability confidence interval for
the reliability at time t is
- l upper t
e £ FT (t) £ e - llower t (8.23)

For example, using the approximate confidence limits 0.0039 ≤ λ ≤ 0.0094 for
a time of 100 hours leads to

0.390 = e -( 0.0094 )(100 ) £ FT (100) £ e -( 0.0039)(100 ) = 0.675

With considerable difficulty, comparable analyses can be performed for the


Weibull parameter estimates. The source of the difficulty is that the sam-
pling distributions for the estimators have not been identified. For relatively
large sample sizes, the sampling distributions on the Weibull parameters are
approximately normal (or may be treated as if they are). Using this idea and
using the extreme value distribution form of the Weibull, Nelson [57] identi-
fies a method for computing approximate 1 – α probability confidence inter-
vals for the Weibull parameters. To implement Nelson’s method, the data are
first transformed into its extreme value equivalent by taking wj = ln xj and
then operating on the values wj. For the failure data listed in Table 8.2, the
logarithms of the failure times are shown in the third column. Those values
Parametric Statistical Methods 141

display a sample average of w = 6.728 and a sample standard deviation of


sw = 0.347. Using these values, the approximate 1 – α probability confidence
intervals are

p 6 (
1.049 za/2 n ) £ b £ p 6 e1.049( z1-a/2 n )
e (8.24)
sw sw

w+
sw
( g +1.081 za/2 n
) w+
sw
( g +1.081
z1-a/2 n
)
e p 6
£q£e p 6
(8.25)

where γ = 0.5772 is Euler’s constant. Applying these expressions to the esti-


mates bˆ = 3.208 and qˆ = 987.04, which were obtained for the failure data in
Table 8.2, yields

2.450 £ b £ 5.576 and 870.89 £ q £ 1095.33

It is acknowledged that these intervals are quite wide, but they represent
the most direct and simplest intervals available for the Weibull distribution
parameters when using the method of moments.
As in the case of the exponential distribution, the confidence intervals on
the parameters may be used to define confidence intervals on the system reli-
ability. However, the probability content of the intervals is no longer equal
to that of the parameters and is not generally known. It may be conjectured
to be approximately (1 – α)2. For the Weibull, the intervals for the reliability
function are

b
£ FT (t) £ e ( upper )
bupper lower
e ( lower )
- tq - tq
(8.26)

Applying this expression to the estimates computed for the data in Table 8.2
yields

0.301 £ FT (900) £ 0.539

for the confidence interval on the reliability at 900 hours.


In the case of the normal distribution, it is known that the sampling distri-
bution on the sample mean is normal, but since the variance is unknown, the
confidence interval is defined using the Student’s t distribution. Therefore,
a 1 – α probability confidence interval for the mean is

s s
x + ta/2 , n -1 £ m £ x + t1- a/2 , n -1 (8.27)
n n
142 Reliability Engineering: Probabilistic Models and Maintenance Methods

The sampling distribution on the sample variance is chi-square, so a 1 − α


probability confidence interval for the standard deviation is

1/2 1/2
æ n -1 ö æ n -1 ö
sç 2 ÷ £ s £ sç 2 ÷ (8.28)
è c1- a / 2 , n - 1 ø è c a /2 , n - 1 ø

As an example, suppose that the data in Table 8.2 were assumed to represent
the behavior of a device for which the normal distribution is an appropriate
model. The sample mean and sample standard deviation for that data are
884.153 and 302.569, respectively. Applying Equations 8.27 and 8.28 to those
values yields

302.569 302.569
759.25 = 884.153 - 2.064 £ m £ 884.153 + 2.064 = 1009.05
25 25

1/2 1/2
æ 24 ö æ 24 ö
236.27 = 302.569 ç ÷ £ s £ 302.569 ç ÷ = 420.94
è 39.36 ø è 12.40 ø

Note that the same approach as is suggested in Equation 8.26 for the Weibull
distribution applies to the normal distribution reliability values.
The sampling distributions for the method of moments parameter esti-
mates for the gamma distribution have not been determined. Consequently,
confidence intervals for the parameter estimates cannot be computed.
However, when the shape parameter is known, it is possible to define a
confidence interval on the scale parameter. The key to constructing that
interval is the fact that the sum of gamma random variables has a gamma
distribution. Thus, if a sample is taken from a population having a gamma
life distribution with parameters β and λ, the sum of the sample obser-
vations, say, ξ, will be gamma distributed with parameters nβ and λ.
Furthermore, the quantity λξ will be gamma distributed with parameters
nβ and 1. Using these facts, a 1 − α probability confidence interval defined
on that ratio is

éc c ù
Pr [ cl £ lx £ cu ] = Pr ê l £ l £ u ú = 1 - a
ëx xû

where cl and cu are the α/2 and 1 – α/2 quantiles of the gamma distribution
having parameters nβ and 1. Therefore, the corresponding confidence inter-
val for the scale parameter is

cl c
£l£ u (8.29)
x x
Parametric Statistical Methods 143

In view of the possibility to construct the confidence interval on λ when β is


known, a heuristic approach to confidence interval construction for gamma
distribution parameter estimates is to use Equation 8.29 in combination with
the relation
cl c
£b£ u (8.30)
n n

Clearly, this construction has no rigorous basis, but it may provide some prac-
tical guidance in judging device life length. For example, the analysis of the
data in Table 8.5 yielded the parameter estimates of bˆ = 2.554 and lˆ = 0.00086.
For 1 – α = 0.95, the quantiles of the gamma distribution with parameters
nβ = 20(2.554) = 51.08 and 1 are cl = 38.042 and cu = 66.010, so Equations 8.29
and 8.30 yield

38.042 66.010
0.00065 = £l£ = 0.00110
58, 975 58, 975

38.042 66.010
1.902 = £b£ = 3.300
20 20

8.3 Method of Maximum Likelihood


The most widely used of the parametric techniques is the method of maxi-
mum likelihood. Under this very intuitive strategy, one selects as parameter
estimates those values “most likely” to have produced the observed data.
Consider an example of this concept. Suppose a presumably fair coin is
tossed 40 times with the result that heads appears on 16 of the trials. What
estimate of the probability of heads, p, is appropriate? One possible answer
is that since presumably, the coin is a fair coin, pˆ = 0.50. The method of maxi-
mum likelihood indicates that pˆ = 0.40. The reason for this value is that the
binomial probability of observing 16 heads in 40 trials is greatest when the
value of p is 0.40. That is,

ìïæ 40 ö üï
max{b(16, 40, p} = max íç ÷ p16 (1 - p)24 ý = 0.128
îïè 16 ø þï
p p

occurs at p = 0.40. This is illustrated in Figure 8.5 in which the binomial prob-
ability of 16 heads in 40 trials is plotted against the value of p. Note that one
usually plots event probabilities against the events but that this plot provides
an alternate view of the probabilities—one focused on the parameter of the
distribution.
144 Reliability Engineering: Probabilistic Models and Maintenance Methods

Prob

0.12

0.1

0.08

0.06

0.04

0.02

p
0.2 0.4 0.6 0.8

FIGURE 8.5
Likelihood function for a binomial sample.

In terms of algebraic methodology, one forms the likelihood function,


which is the joint distribution of the parameter(s) and the sample outcome,
and one then maximizes the function relative to the parameter value. For the
binomial, the likelihood function is

ænö
L( x , n, p) = b( x , n, p) = ç ÷ p x (1 - p)n - x
èxø

and to find the maximum, we take the derivative with respect to p and set it
equal to zero:

d ænö x ænö
d
L( x , n, p) = n- x x -1
(
n- x
ç ÷ p (1 - p) = ç ÷ xp (1 - p) - (n - x)p (1 - p)
x
x n - x -1
=0 )
dp dp è ø èxø

= x(1 - p) - (n - x)p = x - px - np + px = x - np = 0

and from this we obtain the following maximum likelihood estimation


equation:

x
p̂ =
n

For our example, this means that pˆ = 0.40 is the maximum likelihood esti-
mate for the probability of heads.
The application of the principle of maximum likelihood to life test data
is direct. We form the likelihood function as the joint distribution of the
Parametric Statistical Methods 145

observed failure data and the unknown distribution parameters and we


maximize that function with respect to the parameters. That is, we form the
likelihood function

L( x , q) = fT ( x , q) = fT ( x1 , x2 , … , xn , q) (8.31)

We next note the fact that the individual failure times are mutually indepen-
dent so the joint distribution may be expressed as the product of the mar-
ginal distributions:
n

fT ( x1 , x2 , … , xn , q) = Õ f ( x , q)
j =1
T j (8.32)

Then we observe further that the values of the parameters that maximize the
likelihood function are the same as the ones that maximize the logarithm of
the likelihood function, so in those cases in which this equivalence is useful,
we can exploit it.
Consider the exponential distribution for which there is only one param-
eter so q = {l} and
lx j
f T ( x j , q ) = f T ( x j , l ) = le

For the exponential distribution, the likelihood function is


n
å j=1 x j
n
- lx j -l
L( x , l) = Õ j =1
le = l ne (8.33)

and the corresponding logarithm of the likelihood function is


n
ln ( L( x , l) ) = n ln l - l åx j (8.34)
j =1

To maximize this function we take the derivative and set it equal to zero
n
d n
ln ( L( x , l) ) = - åx = 0 j
dl l j =1

and solving for the parameter estimate yields

1 1 (8.35)
l̂ = =
å
1 n
x
xj
n j =1
146 Reliability Engineering: Probabilistic Models and Maintenance Methods

A check of the second derivative condition indicates that the second deriva-
tive is clearly negative, so the solution we have found is a maximum. In this
particular case, but not in general, the maximum likelihood and method
of moments estimates are the same. As noted earlier, the example data in
Table 8.1 display a mean value of x = 164.43 so our estimate of the param-
eter is lˆ = 0.0061. In addition, since the estimation expressions are the same
for the methods of moments and for the maximum likelihood method, the
confidence intervals for the estimates are also the same for the two methods.
Thus, when computing maximum likelihood parameter estimates for expo-
nential distributions, Equations 8.21 through 8.23 may be used to construct
the associated confidence intervals.
In the case of the Weibull distribution, the concept is the same but there are
two parameters. Thus, we form the likelihood function

b n b
æ xj ö
b æ ö -å ç q ÷
æ xj ö
bxbj -1
n n -ç ÷ n n

L( x , b, q) = Õ f (x, b, q) = Õ
T
qb
e è q ø
= nb ç
q ç Õ xbj -1 ÷ e j=1
÷
è ø
(8.36)
j =1 j =1 è j =1 ø

Taking the logarithm of Equation 8.36 yields

n n

å åx
1
ln ( L( x , b, q) ) = n ln b - nb ln q + (b - 1) ln x j - b
(8.37)
qb
j
j =1 j =1

and to maximize this function, we set the partial derivatives equal to zero.
We solve the resulting expressions for our parameter estimates. First,
n
¶ nb b
ln ( L( x , b, q) ) = - + b +1 åx b
j =0
¶q q q j =1

åx
ˆ 1 bˆ
qˆ b = j
n j =1

1/bˆ
æ1 n ö

qˆ = ç
çn
è
åj =1
x ÷
÷
ø
j (8.38)

and then
n n n
¶ n ln q 1
ln ( L( x , b, q) ) = - n ln q + å ln x j + b å x - b
b
j å x ln x = 0b
j j
¶b b j =1
q j =1
q j =1
Parametric Statistical Methods 147

Substituting the expression obtained for θ in this equation yields

å x ln x
n

n j
+ å ln x -
j
1 1 j =1
j =0 (8.39)
bˆ n
å x
n

j =1 j
j =1

This final expression must be solved numerically for the value of the
estimate for β and that value is then used to compute the estimated
value of θ.
For maximum likelihood estimation for distributions that are expressed
using more than one parameter, the second derivatives are important for
several reasons. First, they may confirm that maxima have been identified.
In the case of the Weibull distribution, the second partial derivatives of the
likelihood function are

n
¶2 nb b(b + 1)
ln ( L( x , b, q) ) = 2 - b + 2 åx b
j (8.40)
¶q2 q q j =1

n n
¶2 n 1 - b ln q b
ln ( L( x , b, q) ) = - + å xbj + å x ln x b
j j (8.41)
¶b ¶q q qb +1 j =1
qb +1 j =1

n n n
¶2 n (ln q)2 2 ln q 1
2
ln ( L( x , b, q) ) = - 2 - å x + b
b
j å x ln x j - b
b
j å x (ln x )b
j j
2
¶b b qb j =1
q j =1
q j =1

(8.42)

Arranging the second partial derivatives in matrix form yields the Hessian
matrix, which, when multiplied by −1 in a statistical context, is known as the
Fisher information matrix. In an optimization context, a sufficient condition
for a stationary point to be a maximum is that the Hessian matrix be nega-
tive semidefinite. In a statistical context, the inverse of the Fisher information
matrix is the variance–covariance matrix for the sampling distribution on
the estimators. Thus, for the maximum likelihood estimators of the Weibull
parameters, the Hessian matrix is

é ¶ 2 ln L ¶ 2 ln L ù
ê ú
¶b2 ¶b¶q ú
H=ê 2 (8.43)
ê ¶ ln L ¶ 2 ln L ú
ê ú
ë ¶b ¶q ¶q2 û
148 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 8.6
Trial Values of the Parameter Estimate

b l.h.s. b̂
b l.h.s.
2.0 0.288 3.20 0.002
4.0 −0.113 3.22 −0.001
3.0 0.038 3.21 0.0004
3.5 −0.046 3.215 −0.0005
3.25 −0.006 3.212 0
3.15 0.011

the corresponding Fisher information matrix is

é ¶ 2 ln L ¶ 2 ln L ù
ê- - ú
¶b2 ¶b¶q ú
F=ê 2 (8.44)
ê ¶ ln L ¶ 2 ln L ú
ê- - ú
ë ¶b ¶q ¶q2 û

and the variance–covariance matrix is

é Var(bˆ ) Cov(bˆ , qˆ )ù
(8.45)
F -1 = ê ˆ ˆ
ë Cov(b , q ) Var(qˆ ) úû

Consider the example of the failure data in Table 8.2. Using a numerical
search strategy to solve Equation 8.39, the values of the estimate for β that are
evaluated are shown in Table 8.6.
Observe that only eleven values were needed to obtain substantial preci-
sion in the estimate of bˆ = 3.212. Substituting that value into Equation 8.38
yields qˆ = 988.257. These parameter estimates and the data are included in
Equations 8.40 through 8.42 to obtain

é ¶ 2 ln L ¶ 2 ln L ù
ê = -4.725 = 1.164 ´ 10 -2 ú
¶b2 ¶b ¶q
H=ê 2 ú
ê ¶ ln L ¶ 2 ln L ú
ê = 1.164 ´ 10 -2 -4
= -2.64 ´ 10 ú
ë ¶b ¶q ¶q 2
û

which is clearly negative semidefinite so the computed parameter estimates do


maximize the likelihood function. In addition, the parameter estimates yield

é Var(bˆ ) = 0.237 Cov(bˆ , qˆ ) = 10.469 ù


F -1 = ê ˆ ˆ
ë Cov(b , q ) = 10.469 Var(qˆ ) = 4249.44 úû
Parametric Statistical Methods 149

Given the values of the variances of the sampling distributions on the parame-
ter estimates, it is possible to construct confidence intervals on those estimates.
To obtain the intervals, Nelson [57] and Lawless [58] show that the most efficient
approach is to transform the Weibull data and estimates into their extreme
value forms, construct the intervals, and then invert the transformations.
The transformation from the Weibull is to let the random variable of the
distribution, say, w, be defined by w = ln t. With this identity, the distribution
is reparameterized by δ = 1/β and η = ln θ so that the distribution becomes
w -h

FW (w) = 1 - e - e
d
(8.46)

While it may not be apparent, the normal distribution may also be trans-
formed to this form, and consequently, the sampling distribution on the
parameter estimate set (dˆ , hˆ ) is nearly normal. As a result, the confidence
intervals on the extreme value distribution parameter estimates may be used
to define the confidence intervals on the Weibull parameter estimates. This is
done by inverting the transformation, and when the process is implemented,
the approximate 1 – α probability confidence intervals are

Var (bˆ ) bˆ Var (bˆ ) bˆ


bˆ e za/2 £ b £ bˆ e z1-a/2 (8.47)

qˆ e za/2
( Var ( qˆ ) qˆ ) £ q £ qˆ e z (
1-a/2
Var ( qˆ ) qˆ ) (8.48)

so for 1 – α = 0.95 the example estimates of bˆ = 3.212 and qˆ = 988.257 yield

2.387 £ b £ 4.323 and 868.40 £ q £ 1124.65

Further use of the methods defined by Lawless [58] permits the construc-
tion of confidence limits on quantiles of the distribution and on reliability
estimates. The transformation of the Weibull to the extreme value distribu-
tion that is defined by w = ln t implies that wγ = ln tγ where, as defined in
Chapter 7, tγ is the time at which FT(t) = γ. Therefore, the corresponding value
of the extreme value random variable is wg = h + dFW-1(1 - g ). Recognizing that
FW(wγ) = γ implies that FW (wg ) = 1 - g and FW-1(1 - g ) = ln(- ln(1 - g )), the expres-
sion for wγ is analyzed by replacing η and δ with their estimates, which are
dˆ = 1/bˆ and hˆ = ln qˆ , to obtain the interval

ˆ g + za/2 Var(w
w ˆ g ) £ wg £ w
ˆ g + z1- a/2 Var(w
ˆ g)

where the variance term is computed as

Var(wˆ g ) = Var(hˆ ) + 2wˆ g Cov(hˆ , dˆ ) + wˆ 2g Var(dˆ ) (8.49)


150 Reliability Engineering: Probabilistic Models and Maintenance Methods

The variance and covariance values are obtained from the inverse of the
Fisher information matrix for the extreme value distribution where the like-
lihood function is defined by replacing f T(t) by f W(w) in Equation 8.36. Thus,

é ¶ 2 ln L ¶ 2 ln L ù
ê- - ú
¶h2 ¶d¶h ú
F=ê 2
ê ¶ ln L ¶ 2 ln L ú
ê- - ú
ë ¶d¶h ¶d2 û

These follow from the derivative expressions that are as follows:


n wi - h
¶ ln L(w , h, d)
å
n 1
=- + e d
¶h d d i =1

n n wi - h
¶ ln L(w , h, d) wi - h wi - h
å å
n
=- - + e d
¶d d i =1
d2 i =1
d2

n wi - h
¶ 2 ln L(w , h, d)
å
1
2
=- 2 e d
¶h d i =1

n n wi - h n wi - h
¶ 2 ln L(w , h, d) n wi - h wi - h (wi - h)2
¶d2
= 2 +2
d å
i =1
d 3
-2
i =1
åd3
e d - å
i =1
d4
e d

n wi - h n wi - h
¶ 2 ln L(w , h, d) n 1 wi - h
¶d¶h
= 2- 2
d d å i =1
e d - åi =1
d3
e d

Using the data in Table 8.2, the Fisher information matrix is


é ¶ 2 ln L ¶ 2 ln L ù
ê- = 257.926 - = 118.733 ú
ê ¶h2 ¶d¶h ú
F=
ê ¶ 2 ln L ¶ 2 ln L ú
ê- = 118.733 - = 502.892 ú
ë ¶d¶h ¶d 2
û

The inverse of this matrix is

é 0.00435 -0.00103 ù
F -1 = ê
ë -0.00103 0.00223 úû

Based on these calculations, an estimate of the 10% quantile of the extreme


w g -h

ˆ 0.10 = 6.192. Because FW (wg ) = 1 - g and FW (wg ) = e - e


value distribution is w
d
,
Parametric Statistical Methods 151

the estimated value of any quantile is calculated using wg = hˆ + dˆ ln(- ln(1 - g )).
Taking dˆ = 1/bˆ and hˆ = ln qˆ yields

ˆ 0.10 = ln qˆ + 1 ln(- ln 0.90) = 6.192.


w

and the 95% confidence interval on this value is

5.648 = 6.192 - 1.96(0.278) £ w0.10 £ 6.192 + 1.96(0.278) = 6.736

where the value of Var( wg ) = 0.0771 is obtained using Equation 8.48.


Corresponding to these values, tˆ0.10 = e w0.10 = 488.823 and

283.663 = e 5.648 £ t0.10 £ e 6.736 = 842.364

A similar construction is used for reliability estimates and confidence


intervals. Using Equation 8.46 and u = w – η/δ, the survivor function for the
extreme value distribution may be represented as
w -h
d u
FW (w) = e - e = e-e

h
and the estimator for u is approximately normal with a mean of E[uˆ ] = E[w] -
and variance of d

æ 1 ö
(
Var(uˆ ) = ç 2 ÷ Var(hˆ ) + 2uˆ Cov(hˆ , dˆ ) + uˆ 2 Var(dˆ )
è dˆ ø
)
Using these definitions,

uˆ + za/2 Var(uˆ ) £ u £ uˆ + z1- a/2 Var(uˆ )

uˆ + z1-a/2 Var ( uˆ ) uˆ + za/2 Var ( uˆ )


e-e £ FW (w) = FT e w £ e - e
( ) (8.50)

ˆ
ln(900) - h
For the example data of Table 8.2, take t = 900 so u = = -0.3005
ˆd
and Var(û ) = 0.0533. Using these values, the 95% confidence interval for the
reliability at 900 hours is

0.312 £ FT (900) £ 0.624


152 Reliability Engineering: Probabilistic Models and Maintenance Methods

As will be shown shortly, the method of maximum likelihood can be applied


to the exponential and Weibull distributions even when the data are cen-
sored. The method can be applied to the gamma distribution but only for
complete data sets.
Recall that in general, the gamma distribution does not have a closed-form
expression for the distribution function, but the density function is
l b b - 1 - lt
fT (t) = t e (8.51)
G(b)

Using this form in the likelihood function yields


n n
lb xbj -1 l nb æ n ö -l ån x j
L( x , b, l) = Õ f T ( x , b, l ) = Õ G(b) e
- lx j
= n
ç
( G(b)) çè Õ xbj -1 ÷ e
÷
j =1
(8.52)
j =1 j =1 j =1 ø

Taking the logarithm yields


n n
ln ( L( x , b, l) ) = nb ln l - n ln G(b) + (b - 1) å ln x - lå x
j =1
j
j =1
j (8.53)

We proceed in the same manner as we did with the Weibull. We take the
partial derivatives
n
¶ nb
ln ( L( x , b, l) ) = - åx = 0 j
¶l l j =1

b b
l̂ = = (8.54)
å
1 n
x
xj
n j =1

n

ln ( L( x , b, l) ) = n ln l - ny(b) + å ln x = 0 j
¶b j =1

æ1 n ö n

å å ln x = 0
1
ln bˆ - ln ç x j ÷ - y(bˆ ) + j (8.55)
çn ÷ n
è j =1 ø j =1

where the psi function is the derivative of the logarithm of the gamma
function

d G¢(b)
y(b) = ln G(b) =
db G(b)
Parametric Statistical Methods 153

TABLE 8.7
Trial Values of the Parameter Estimate

b l.h.s. b̂
b l.h.s.
2.0 0.093 2.9 0.005
4.0 −0.048 2.98 −0.006
3.0 −0.002 2.97 0

This function is relatively well behaved. There are both tables of the function
and numerical strategies for computing it. One such algorithm is included in
Appendix A.
The final form of Equation 8.55 is obtained by substituting Equation 8.54
into the partial derivative with respect to β. Once again, we solve numeri-
cally for the estimate for β and then use that value to compute l̂ using
Equation 8.54. For the example data of Table 8.5, Table 8.7 shows that only six
function evaluations are needed to obtain bˆ = 2.97, and using this value, we
obtain lˆ = 0.00101.
In principle, confidence intervals for the parameter estimates, quantiles,
and reliability values for the gamma distribution cannot be defined. A key
reason for this is that there is no closed-form expression for the derivative of
the psi function. Nevertheless, a heuristic algorithm for pursuing the con-
fidence interval analysis like the one defined for the Weibull distribution
earlier can be constructed. However, it is heuristic and has no demonstrable
statistical properties.
To start, note that the second partial derivatives of the logarithm of the
gamma likelihood function are

¶2 nb
ln ( L( x , b, l) ) = - 2 (8.56)
¶l 2 l

¶2 n
ln ( L( x , b, l) ) = (8.57)
¶b ¶l l

¶2 ¶
ln ( L( x , b, l) ) = -n y(b) (8.58)
¶b2 ¶b

As a result, the Fisher information matrix is

é ¶ 2 ln L ¶ ¶ 2 ln L nù
ê- = n y(b) - =- ú
¶b 2
¶b ¶b ¶l lú
F=ê (8.59)
ê 2
¶ ln L n ¶ ln L nb ú
2
ê - =- - = 2ú
ë ¶ b ¶l l ¶l 2 l û
154 Reliability Engineering: Probabilistic Models and Maintenance Methods

and the resulting variance–covariance matrix is

é Var(bˆ ) Cov(bˆ , lˆ )ù
(8.60)
F -1 = ê ˆ ˆ
ë Cov(b , l ) Var(lˆ ) úû

Clearly, given estimates for the parameters of the distribution, all of the

terms in the information matrix are well defined except y(b).
¶b
There are several equivalent formulas for the psi function and one that
appears helpful is [59]
¥
x
y( x + 1) = - g + å j( j + x)
j =1
(8.61)

where γ is Euler’s constant. Taking the derivative of this form of the psi func-
tion yields
¥
dy( x + 1)
å ( j + x)
1
= 2
(8.62)
dx j =1

A reasonable heuristic to computing the variance–covariance matrix is to


use the derivative defined in Equation 8.62 with truncation of the infinite
sum after a sufficient number of terms to assure adequate accuracy.
Consider the example of the maximum likelihood estimates of bˆ = 2.97 and
lˆ = 0.00101 that were obtained using the data in Table 8.5. For those values,
the information matrix is

é 7.992 -19856.9 ù
F=ê
ë -19856.9 5.855 ´ 107 úû

where the derivative of the psi function was truncated at 0.39 after 100 terms
of the sum were included. For this matrix,

é 0.795 0.00027 ù
F -1 = ê
ë0.00027 1.085 ´ 10 -7 úû

Using these variance estimates, the confidence intervals on the parameters


are

1.222 = bˆ + za/2 Var(bˆ ) £ b £ bˆ + z1- a/2 Var(bˆ ) = 4.718

0.00036 = lˆ + za/2 Var(lˆ ) £ l £ lˆ + z1- a/2 Var(lˆ ) = 0.00166


Parametric Statistical Methods 155

bˆ bˆ
Using the same variance estimates, take tˆg = Eˆ [T ] + zg sˆ T = + zg 2 and
ˆl ˆl
compute

( )
Var(tˆg ) = Var(bˆ ) + 2tˆg Cov(bˆ , lˆ ) + tˆ 2g Var(lˆ )

The confidence interval for the quantile is then

tˆg + za/2 Var(tˆg ) £ tg £ tˆg + z1- a/2 Var(tˆg )

For this example, the estimate of the 10% quantile is tˆ0.10 = 755.199 and the
associated confidence interval is
752.995 £ t0.10 £ 757.403

Finally, for reliability values, the best approach to obtaining an approxi-


mate confidence interval is to use the normal distribution. For any time, say,
t = 1000.0, compute


ˆ [T ] t-
t - E lˆ
zˆ = =
sˆ T bˆ /lˆ 2

Next, take

(
Var( zˆ ) = Var(bˆ ) + 2zˆ Cov(bˆ , lˆ ) + zˆ 2 Var(lˆ ) )
and compute the confidence intervals as

( ) (
F zˆ + za/2 Var( zˆ ) £ F(t) £ F zˆ + z1- a/2 Var( zˆ ) )
For the example, t = 1000.0, zˆ = -1.139 and Var( ẑ ) = 0.794 so

0.002 = F ( -2.886 ) £ F(1000) £ F ( 0.608 ) = 0.728

and
0.272 £ F(1000) £ 0.998

In conclusion, working with the gamma distribution requires a heuristic


approach and some of the analysis are useful despite being approximate.
In addition, the true probability content of the confidence intervals is not
computable.
156 Reliability Engineering: Probabilistic Models and Maintenance Methods

One final case that is quite intuitive (and more precise) is the normal dis-
tribution. Taking the density function for the normal as stated in Chapter 4,
the likelihood function is

n
1
n n
1 2 2 1
- å ( x j -m )2
2 s2 j=1
L( x , m , s2 ) = Õ f ( x , m, s ) = Õ
j =1
T
2

j =1 2ps2
e
- ( x j - m ) /2 s
=
( 2p )
n/ 2
sn
e

(8.63)

and taking the logarithm yields

å( x - m)
n 1 2
( )
ln L x , m , s2 = -
2
ln(2p) - n ln s - 2
2s
j (8.64)
j =1

The partial derivatives are

n

¶s
( n 1
ln L x , m , s2 = - + 3
s s
) å( x - m) j
2
=0
j =1

n

( ) å( x - m) = 0
1
ln L x , m , s2 = 2 j
¶m s j =1

Solving the second of these equations first yields


n

åx = x
1
m̂ = j (8.65)
n j =1

and then the first yields

å ( x - mˆ )
1 2
sˆ = j (8.66)
n j =1

These expressions should seem familiar. Applying these expressions to the


data in Table 8.8 yields mˆ = 1990.025 and sˆ = 29.63.
The construction of variance measures and confidence intervals for the
parameter estimates, for quantiles, and for reliability values follows the
same path as that defined for the Weibull distribution. The matrix obtained
by multiplying the second partial derivatives by minus one is the Fisher
information matrix and the inverse of that matrix is the variance–covariance
Parametric Statistical Methods 157

TABLE 8.8
Normally Distributed Life Test Data
2029.37 2018.04 2051.76
2006.77 1990.25 1968.18
1971.75 2041.18 1913.2
1994.76 2002.67 1968.34
1982.65 1945.68 1983.37
1972.79 1985.5 1975.8
2017.15 1993.63 1994.95
1967.66 1974.04 2025.12
1976.01

matrix for the parameter estimates. For the normal distribution, the second
partial derivatives are
n
¶2
¶s 2 ( n
)
3
ln L x , m , s2 = 2 - 4
s s å( x - m) j
2
(8.67)
j =1

n
¶2
( ) å( x - m)
2
ln L x , m , s2 = - 3 j (8.68)
¶m¶s s j =1

¶2
¶m 2 ( ) n
ln L x , m , s2 = - 2
s
(8.69)

For the data in Table 8.8, following the same pattern as in the analyses of the
Weibull and gamma distributions yields

é ¶ 2 ln L ¶ 2 ln L ù
ê - = 0.0285 - = 3.844 ´ 10 -7 ú
ê ¶ mˆ 2 ¶mˆ ¶sˆ ú
F=
ê ¶ 2 ln L ¶ 2 ln L ú
ê- = 3.844 ´ 10 -7 - = 0.0570 ú
ˆ
m
ë ¶ ¶s ˆ ¶ŝ 2
û

so

é 35.088 -2.366 ´ 10 -4 ù
F -1 = ê -4 ú
ë -2.366 ´ 10 17.544 û

The 1 – α probability confidence intervals on the parameters are

mˆ + t a Var(mˆ ) £ m £ mˆ + t a Var(mˆ ) (8.70)


, n -1 1- , n - 1
2 2
158 Reliability Engineering: Probabilistic Models and Maintenance Methods

(n - 1)sˆ 2 (n - 1)sˆ 2
£ s £ (8.71)
c2 a c 2a
1- , n - 1 , n -1
2 2

so for the example data of Table 8.8,

1977.80 £ m £ 2002.25

24(29.63)2 24(29.63)2
23.125 = £s£ = 41.222
39.4 12.4

Distribution quantiles are analyzed using the standard normal distribution.


Take tˆg = mˆ + zg sˆ and compute

Var(tˆ g ) = ( Var(mˆ ) + 2Cov(mˆ , sˆ ) + Var(sˆ ) )

to obtain

tˆg + z a Var(tˆg ) £ tg £ tˆg + z a Var(tˆg )


1-
2 2

For the example data of Table 8.8, t0.10 = 1952.04 and Var(tˆg ) = 52.63 so

1937.82 £ t0.10 £ 1966.26

Finally, for the confidence intervals on reliability values, use of the standard
normal distribution leads to

æ t - mˆ ö
FˆT (t) = F ç ÷ (8.72)
è sˆ ø

and

FˆT (t) + za/2 Var( FˆT (t)) £ FT (t) £ FˆT (t) + z1- a/2 Var( FˆT (t)) (8.73)

where

Fˆ T (t) ( 1 - Fˆ T (t) )
Var( Fˆ T (t)) =
n
Parametric Statistical Methods 159

For the example data in Table 8.8, take t = 1975 so zˆ = -0.507 and FˆT (t) = 0.306
so
0.125 £ FT (1975) £ 0.487

and
0.513 £ FT (1975) £ 0.875

8.4 Maximum Likelihood Method with Data Censoring


Returning to the exponential and Weibull distributions, the fact that these
distributions have algebraic forms for their cumulative distribution func-
tions and reliability functions permits their analysis under test censoring.
Suppose a sample of n copies of a component is placed on test and the test
is terminated before all the items have failed. Suppose that only r failure
times have been observed. In this case, there are r failure times x1, x2, …, xr
and n − r components have survived over xr time units. Thus, the likelihood
function is

æ r ö
( )
n-r
L( x , q) = ç Õ fT ( x j , q) ÷ FT ( xr ) (8.74)
ç ÷
è j =1 ø

Since the exponential distribution is subsumed by the Weibull, consider the


application of this form to the Weibull:

æ b
æ xj ö ö æ -æ xr öb ön - r r æ
b
ö - å rj=1 æç x j ö÷ æx ö
b
bxbj -1
r r - ( n - r )ç r ÷
÷ ç e çè q ÷ø ÷ = b ç
-ç ÷
L( x , b, q) = ç Õ e è q ø
Õx b -1 ÷e è q ø è q ø
ç qb ÷ç ÷ qbç
r j
÷
ç j =1 ÷è ø è j =1 ø
è ø
(8.75)

Taking the logarithm yields


b b
æ xj ö æ æ xr ö ö
r r
ln ( L( x , b, q) ) = r ln b - rb ln q + (b - 1) å ln x - å
j =1
j
j =1
ç q ÷ - çç (n - r ) ç q ÷ ÷÷ (8.76)
è ø è è ø ø

The partial derivatives are

b æ ö
r
¶ rb
ln ( L( x , b, q) ) = - + b +1 ç åx b
j + (n - r )xrb ÷
¶q q q ç ÷
è j =1 ø
160 Reliability Engineering: Probabilistic Models and Maintenance Methods

ln q æç ö
r r
¶ r
ln ( L( x , b, q) ) = - r ln q + å ln x j + åx b
+ (n - r )xrb ÷
qb ç
j
¶b b ÷
j =1 è j =1 ø

1 æ ö
r
- bç
q ç å x ln x + (n - r)x ln x ÷÷
b
j j
b
r r

è j =1 ø

Solving the first of the equations, we obtain

1/bˆ
æ æ r öö
1
qˆ = ç ç xbj + (n - r )xrb ÷ ÷
ˆ ˆ

çrç å ÷÷
(8.77)
è è j =1 øø

and substituting this expression into the second of the partial derivative
equations, we have

æ r
xbj ln x j + (n - r )xrb ln xr ö÷
ˆ ˆ

1 1
r ç
ln x j - è
å ø =0
å
j =1
+ (8.78)
bˆ r æ r ˆb ˆb ö
j =1 ç
è å j =1
x j + ( n - r )x r ÷
ø

Once again, we solve (8.78) numerically for b̂ and then compute q̂ using (8.77).
Note that the form of Equations 8.77 and 8.78 are consistent with those
obtained for complete data sets. In fact, if we take r = n, these equations
reduce to those obtained for complete data sets.
As an example, suppose only the first r = 15 of the failure times of Table 8.2
had been observed. Using the equations developed earlier, we obtain the
parameter estimates bˆ = 3.513 and qˆ = 965.571. A check of the Hessian matrix
evaluated at these values confirms that they constitute a maximum.
For the method of maximum likelihood, even with censored data, the sec-
ond derivatives provide an estimate of the Fisher information matrix and
hence of the variance–covariance matrix for the estimators. This means that
the same methods that were used for complete data sets can be applied to the
censored data to obtain confidence intervals for the parameter estimates, for
estimates of distribution quantiles, and for reliability estimates.
To complete this discussion, it is appropriate to note that the method of
maximum likelihood is very appealing because of its intuitive foundation.
It is widely used and the fact that it can sometimes be applied to censored
data is an additional positive feature of the method. The main disadvantage
of the method is that the estimators obtained are not always unbiased. In
particular, the estimators for the Weibull distribution are not unbiased. Even
for the normal distribution, the estimator for the mean is unbiased, but the
one for the standard deviation is not.
Parametric Statistical Methods 161

8.5 Special Topics


The statistical methods for parameter estimation that have been presented
in this chapter are generally adequate for nearly all test situations. However,
there are two additional topics that are worthy of our attention. The first of
these is the extension of the method of moments to right censored data. As
indicated earlier, the method of moments has historically been applied only
to complete data sets. Nachlas and Kumar [60] suggest a heuristic strategy
for implementing the method of moments using data sets that are censored
because of test termination. The construction of the method is based on the
substitution of the expected remaining life lengths of the unfailed test speci-
mens for their actual failure times.
The second topic is the use of step stress testing in which a sequence of
progressively more severe environmental stresses is used to accelerate com-
ponent aging. As is shown, the statistical analysis is essentially the same as
for usual life testing.

8.5.1 Method of Moments with Censored Data


In the case of the exponential distribution, a test for which the first r failure
times are observed and the remaining n − r are not, the expression for the
“surrogate” sample mean is

1 æç ö
r
x=
nç å x j + (n - r ) ( xr + E[Y ]) ÷
÷
(8.79)
è j =1 ø

where Y is the remaining life of the components that have not yet failed. The
general expression for the expected value of Y has been shown by Cox [61]
to be

m 2 + s2
E[Y ] = (8.80)
2m

where
μ is the mean of the life distribution
σ2 is the variance of the life distribution

For the exponential distribution, this expected value is 1/λ so Equation 8.79
becomes

1 æç 1 ö
r
x=
nç å x + (n - r) æçè x + l̂ ö÷ø ÷÷
j r

è j =1 ø
162 Reliability Engineering: Probabilistic Models and Maintenance Methods

and we equate this to the population mean to solve for the estimate of the
distribution parameter:

1 1 æç 1 öö
r
æ
=
lˆ n ç å x j + (n - r ) ç xr + ÷ ÷
è lˆ ø ÷
è j =1 ø

This yields the reasonably intuitive expression


-1
æ æ r öö
1
lˆ = ç ç å x j + ( n - r )x r ÷ ÷ (8.81)
çrç ÷÷
è è j =1 øø

The equation is intuitive as it represents the sample mean as the average of


the observed failure times plus an additional term to compensate for the
unobserved life lengths. For r = 12, the data in Table 8.1 yield the estimate
lˆ = 0.0065.
In the case of the Weibull distribution, Cox’s result implies

qG ( 1 + 2/b )
E[Y ] =
2G ( 1 + 1/b )

and substituting this expression into Equation 8.78 yields

1æ æ qG ( 1 + 2/bˆ ) öö
r
x= ç
nç å x j + (n - r ) ç xr +
ç
è 2G ( 1 + 1/bˆ )
÷÷ ÷
ø ÷ø
è j =1

1æ ö n - r æ qG ( 1 + 2/bˆ ) ö
r
= ç

è
å
j =1
x j + ( n - r )x r ÷ +
÷
ø
ç ÷
n çè 2G ( 1 + 1/bˆ ) ÷ø

The corresponding representation of the sample variance is most conve-


niently stated as

1 æç qG ( 1 + 2/bˆ ) ö ö
2
r
æ
2
s = å x + (n - r ) ç xr +
2
÷ - nx 2 ÷
n-1ç
j
ç 2G ( 1 + 1/bˆ ) ÷ø ÷
è j =1 è ø

To obtain parameter estimates, the expression for x is equated to the distri-


bution mean and the resulting expression is solved for θ. This yields

2 æç x j + (n - r )xr ö÷ G(1 + 1/bˆ )


r

ˆq = è
å
j =1 ø
2nG 2 (1 + 1/bˆ ) - (n - r )G(1 + 2/bˆ )
Parametric Statistical Methods 163

This expression is used in place of θ in the expressions for both the sample
mean and the sample variance that are included in the coefficient of varia-
tion as in the case of uncensored data. A numerical search yields an estimate
for β, which is then used in the previous equation to compute an estimate
for θ. As an example, the first r = 15 data values from Table 8.2 have been used
to obtain parameter estimates of bˆ = 2.391 and qˆ = 1175.1.
In the case of the gamma distribution, Cox’s result implies

b+1
E[Y ] =
2l

and substituting this expression into Equation 8.65 yields

1æ b + 1 ö ö÷ 1 æç ö n-r b+1
r r
æ æ ö
x= ç
nç å x j + (n - r ) ç xr +
è
÷ =
2l ø ÷ n ç å x j + ( n - r )x r ÷ +
÷
ç ÷
n è 2l ø
è j =1 ø è j =1 ø

The corresponding representation of the sample variance is most conve-


niently stated as

1 æç ö
r 2
æ b+1ö
s2 =
n-1ç å x 2j + (n - r ) ç xr +
è
÷ - nx ÷
2l ø
2
÷
è j =1 ø

To obtain parameter estimates, the expression for x is equated to the distri-


bution mean, β/λ, and the resulting expression is solved for λ. This yields

2nbˆ - (n - r )(bˆ + 1)
lˆ =
2 çæ x j + (n - r )xr ÷ö
r

èå j =1 ø

This expression is used in place of λ in the expressions for both the sam-
ple mean and the sample variance, which are included in the coefficient
of variation as in the case of uncensored data. A numerical search yields
an estimate for β, which is then used in the previous equation to compute an
estimate for λ. As an example, the first r = 12 data values from Table 8.5 have
been used to obtain parameter estimates of bˆ = 3.540 and lˆ = 0.00118.
As should now be evident, it is possible to use right censored data to con-
struct estimators for distribution parameters using the method of moments.
The process is fairly involved, but it does yield reasonable estimator values.
However, the sampling distributions for the estimator values have not been
obtained so confidence intervals cannot be constructed.
164 Reliability Engineering: Probabilistic Models and Maintenance Methods

8.5.2 Data Analysis under Step Stress Testing


Given the efficiencies associated with accelerated life testing, the idea of
using a sequence of increasingly severe stress levels has been considered
worthwhile for some components. The implementation of the idea is to test
at each stress level for a set time interval so that the test may be characterized
by the sequence {(s1, τ1), (s2, τ2), …, (sm, τm)} in which the si are the stress levels
and the τi are the change times [62,63]. For a sequence of three stress levels,
the progress of the test is illustrated in Figure 8.6. Note that the interval over
which each stress is applied is τi − τi−1.
There is a significant risk in the use of the step stress approach. The risk
is that an excessive stress level (an overstress) will be used and that this will
alter the nature of the failure process and thus change the life distribution.
Usually, the motivation for using a step stress test is to provoke failures very
rapidly in order to study a new design for which the failure behavior is not
yet understood. In this case, the risk of overstress and erroneous inference
is substantial. Thus, the use of a step stress regimen must be undertaken
carefully.
Assuming, as we did for a single stress, that the life distribution under
all stresses is shape invariant and that we avoid the application of an exces-
sive stress level, the data from a step stress test permit two useful analyses.
First, we can compute an acceleration factor for each stress level and map the
observed failure times to the life distribution under normal operating condi-
tions. Second, we can use the test data to estimate physical parameters such
as activation energy.

s3

s2

s1

τ1 τ2 τ3

FIGURE 8.6
Representation of a three-level step stress regimen.
Parametric Statistical Methods 165

If the acceleration model is already known or assumed and a set of test


data is obtained in the form of observed failure times t = (t1, t2, …, tn), then the
corresponding equivalent data set, x = (x1, x2, …, xn), is defined by

ì a1t j 0 < t j £ t1
ï a1t1 + a2 (t j - t1 ) t1 < t j £ t2
ïï
x j = í a1t1 + a2 (t2 - t1 ) + a3 (t j - t2 ) t2 < t j £ t3 (8.82)
ï ⋮ ⋮
ï
a t + a ( t - t
ïî 1 1 2 2 1 ⋯ + am (t j - tm -1 )
) + tm -1 < t j £ tm

Then, this data set may be analyzed using any of the parametric (or nonpara-
metric) methods described previously.
Consider an example. Suppose n = 25 copies of a device for which life
length is temperature dependent and the nominal operating temperature is
55°C are subjected to a step stress test with the regimen {(85°C, 40 h), (95°C,
100 h), (105°C, 60 h)}. Suppose further that the device has an activation energy
of 0.80 ev/°k and that the failure times recorded during the test are those
shown in Table 8.9. The corresponding equivalent failure times are also
shown in the table. The equivalent failure times are the life lengths to which
the accelerated ages correspond. Subjecting the equivalent failure times to
the method of moments estimation procedure for a Weibull distribution
yields the parameter estimates of bˆ = 0.835 and qˆ = 886.427.

TABLE 8.9
Example Data for a Step Stress Test
j tj xj j tj xj
1 1.316 14.056 14 52.155 689.435
2 3.411 36.427 15 56.620 785.784
3 6.031 64.396 16 61.629 893.898
4 9.122 97.403 17 67.305 1016.375
5 12.675 135.346 18 73.811 1156.797
6 16.702 178.343 19 81.387 1320.296
7 21.228 226.667 20 90.390 1514.574
8 26.290 280.722 21 101.389 1751.944
9 31.940 341.051 22 143.300 2723.871
10 38.242 408.348 23 160.333 3439.661
11 41.236 453.784 24 191.578 4752.684
12 44.524 524.749 25 No failure
13 48.146 602.917
166 Reliability Engineering: Probabilistic Models and Maintenance Methods

An alternate use of data generated in an accelerated life test under a step


stress regimen is the estimation of the parameters of the physical model of
failure. Specifically, we can compute jointly the parameter estimates and an
estimate for the activation energy. We can use the data in Table 8.9 to illus-
trate the computations. Assume the activation energy is not known and that
we have recorded only the “raw failure times,” tj.
Using the data from the first 40 hours of testing, we may use any of the
estimation methods discussed previously to obtain estimates of the distribu-
tion parameters. For this example, the graphical estimation methods yield
estimates of the distribution parameters at a temperature of 85°C of bˆ = 0.834
and qˆ = 87.632. Now, the failure times recorded after 40 hours and before
140 hours are obtained at a higher temperature so aging is accelerated rela-
tive to the temperature of 85°C. Assuming shape parameter invariance of
the life distribution, we can compute the extrapolated times at which the
reliability should take values corresponding to any of the observed survivor
counts. For example, the extrapolated reliability value at the time of the 13th
failure is obtained as
ˆ 12.7
= 0.5 = e ( e ,13
87.632 )
0.834
- t
FT (te ,13 ) = Þ te ,13 = 56.469
25.4

This is to say that if 85°C were the normal operating temperature and if the
Weibull distribution having parameters bˆ = 0.834 and qˆ = 87.632 were an
accurate model of device length, then the device population should have a
reliability value of 0.50 at a time near to te,13 = 56.469.
The shape invariance of the life distribution implies that following an ini-
tial operating period of length τ, age acceleration implies that observed and
equivalent ages are related by

a(t j - t) = te , j - t (8.83)

For the example time values earlier, the realization of this expression is

a( 48.146 - 40) = 56.469 - 40

So the acceleration factor produced by increasing the temperature to 95°C is


a = 2.021. Since the temperatures and acceleration factor are known, the use
of the Arrhenius acceleration equation yields
Ea æ 1 1 ö
ç - ÷
a = 2.021 = e K è T0 Ta ø
Þ Ea = 0.801

Naturally, once the activation energy is estimated, the characteristic life at


normal use temperature may be computed directly. For a normal use tem-
perature of 55°C, the acceleration at 85°C is 10.678 so qˆ = 935.734.
Parametric Statistical Methods 167

Note that the choice of the 13th failure time was arbitrary. If we had used
the 16th recorded failure time, the extrapolated failure time would have
been

ˆ 9.7
= 0.382 = e ( e ,16
/87.632 )
0.834
- t
FT (te ,16 ) = Þ te ,16 = 83.719
25.4

and as a result, the relation

a(61.629 - 40) = 83.719 - 40

also yields a = 2.021.


Finally, note that the same procedure works for other failure distributions.
The recorded failure data during the first stress interval allow the computa-
tion of initial parameter estimates and these can be used to obtain extrapo-
lated life lengths, which, in turn, permit the calculations of an estimate of the
acceleration factor.

Exercises
8.1 Assume the following data have been generated in the life test of a
component that is believed to have exponential life distribution. Use
the complete data set to obtain a graphical estimate of the distribution
parameter.

30.950 343.959 118.522 117.887 21.685


68.513 193.503 42.602 64.028 60.502
319.443 200.655 258.156 205.073 247.614
135.314 282.421 5.153 227.674 103.803
211.208 426.444 240.096 15.180 536.120
425.731 433.697 79.7491 258.614 26.505
541.114 8.043 75.666 272.753 50.566
247.343 379.959 220.658 415.477 211.587
123.526 324.697 54.672 67.085 77.181
61.702 1011.308 104.253 547.470 237.281

8.2 Assume that only the earliest 20 data values for Problem 8.1 are avail-
able and use them to construct the graphical estimate for λ.
8.3 Use the data set of Problem 8.1 to compute the method of moments
estimate for λ. Then compute a 95% confidence interval for the estimate
and for the reliability at 150 hours.
168 Reliability Engineering: Probabilistic Models and Maintenance Methods

8.4 Assume the following data have been generated in the life test of a
component that is believed to have exponential life distribution. Use
the complete data set to obtain a graphical estimate of the distribution
parameter.

158.033 9.964 247.922 324.953 984.249


579.983 268.967 211.775 252.635 185.474
142.382 524.376 455.421 268.403 9.362
73.521 6.956 145.474 9.527 58.538
799.290 274.426 1500.586 229.539 45.400
2609.448 175.002 50.447 252.747 847.613
711.720 491.442 982.752 2894.425 61.204
505.976 416.891 138.747 12.911 392.515
12.908 497.851 1280.381 213.168 648.615
302.120 780.516 109.058 875.372 197.162

8.5 Assume that only the earliest 20 data values for Problem 8.4 are avail-
able and use them to construct the graphical estimate for λ.
8.6 Use the data set of Problem 8.4 to compute the method of moments
estimate for λ. Then compute a 95% confidence interval for the estimate
and for the reliability at 500 hours.
8.7 Assume the following data have been generated in the life test of a com-
ponent that is believed to have Weibull life distribution. Use the complete
data set to obtain graphical estimates of the distribution parameters.

176.225 1058.727 918.823 29.835 203.809


252.424 695.579 68.348 1262.860 309.879
333.961 2.970 76.218 191.423 3094.748
141.948 75.239 1048.132 804.623 55.853
1989.75 1.340 641.953 152.969 126.097
1646.635 346.605 762.882 691.363 434.761
344.135 801.645 206.062 654.951 382.691
48.831 154.884 1122.269 409.087 170.046
480.549 103.444 593.04 1615.691 1880.473
131.215 724.292 224.793 470.408 957.271

8.8 Assume that only the earliest 20 data values for Problem 8.7 are avail-
able and use them to construct the graphical estimate for the distribu-
tion parameters.
8.9 Use the complete data set of Problem 8.7 to compute the method of
moments estimate for the distribution parameters.
8.10 Using the definitions of the matrix U and the vector V of Section 8.1,
derive Equations 8.7 and 8.8.
Parametric Statistical Methods 169

8.11 Assume the following data have been generated in the life test
of a component that is believed to have Weibull life distribution.
Compute the method of moments estimates of the distribution
parameters. Also compute the 95% confidence bounds on the param-
eter estimates.

170.998 216.222 335.464 456.731 635.655


465.023 204.064 100.790 459.482 369.012
254.582 228.195 453.539 420.944 312.489
319.789 528.971 82.843 306.064 196.092
285.048 270.256 356.053 216.330 72.393
307.343 255.021 181.568 137.704 22.150
318.541 302.217 93.694 180.638 302.257
242.783 117.524 314.594 159.855 114.434
458.005 70.280 282.015 231.442 68.381
130.900 93.151 172.954 203.094 200.899

8.12 Suppose a sample of 40 copies of a device were placed on test and that
the observed failure times are

1080.75 1860.35 1282.37 1861.67


1990.16 1218.02 1392.7 1447.29
727.969 1760.22 2025.54 1116.86
1396.47 1967.9 1849.47 2166.7
1459.98 1897.26 2634.16 1725.15
972.731 1081.11 1452.74 2141.54
1365.97 1891.58 1331.48 2058.84
1889.62 1907.9 822.497 1213.68
1401.61 1805.05 2171.56 1386.75
2296.4 1945.56 2546.97 991.637

Assume that the appropriate choice of life distribution model for the
devices is the Weibull distribution and use the graphical/regression
analysis method to compute estimates of the distribution parameters.
8.13 For the data of Problem 8.11, compute the maximum likelihood esti-
mates of the distribution parameters and the associated 95% confidence
intervals for those estimates.
8.14 Use the solutions of Problem 8.13 to compute confidence bounds on t0.10
and on FT (250).
8.15 Suppose the life test of Problem 8.11 had been censored so that only the
first 24 data values were observed. Compute the maximum likelihood
estimates for the distribution parameters.
170 Reliability Engineering: Probabilistic Models and Maintenance Methods

8.16 Assume the following data have been generated in the life test of a
component that is believed to have gamma life distribution. Use the
complete data set to obtain the method of moments estimates of the
distribution parameters.

2765.078 5218.866 1283.310 3893.011


4976.570 3472.228 1461.590 2779.887
1982.422 2262.938 777.843 2106.880
3038.784 838.939 4147.827 4622.490
2480.552 6357.559 2281.027 4269.130

8.17 Use the method of maximum likelihood to compute parameter


estimates for the data of Problem 8.16.
8.18 For the data of Problem 8.16, compute 95% confidence intervals on the
maximum likelihood parameter estimates obtained in problem 16 and
also on t0.10 and on FT (2500).
8.19 Suppose a sample of 40 copies of a device were placed on test and that
the observed failure times are:

798.846 965.81 776.534 294.581


723.834 2192.25 556.55 1168.84
294.037 959.446 1095.09 499.323
946.261 615.946 716.619 1158.25
655.747 1143.47 905.078 1264.29
1464.04 827.645 131.267 1584.32
1159.82 952.722 624.413 478.481
2074.19 1084.76 1458.15 194.864
789.998 1087.74 1461.84 347.816
1440.89 979.906 809.7 793.874

Assuming that the gamma distribution is an appropriate model for the


device failure times, use the method of moments with the tabulated
failure data to compute distribution parameter estimates.
8.20 Assume the following data have been generated in the life test of a
component that is believed to have normal life distribution. Use the
complete data set to obtain the maximum likelihood and the method
of moments estimates of the distribution parameters.

404.271 399.481 377.337 372.954


421.809 407.544 407.888 416.883
391.095 403.850 443.486 408.798
402.554 411.579 414.333 394.265
419.868 380.270 400.514 372.843
Parametric Statistical Methods 171

8.21 For the data of Problem 8.18, compute 95% confidence intervals on
the maximum likelihood parameter estimates and also on t0.10 and on
FT ( 425).
8.22 Suppose 42 copies of a device for which temperature-dependent life
length is believed to be well represented by a Weibull distribution were
subjected to a step stress accelerated life test with a stress regimen of
{(85°C, 40 h), (95°C, 80 h), (105°C, 40 h)} and that the data in the follow-
ing table were obtained. Compute estimates for the distribution param-
eters at the normal operating temperature of 85°C and for the activation
energy.

j tj j tj
1 0.3799 22 40.5757
2 0.3894 23 44.3383
3 0.5062 24 48.9265
4 1.3523 25 49.5716
5 2.9877 26 51.2090
6 3.0075 27 51.5310
7 3.1151 28 53.6824
8 3.4273 29 54.7999
9 4.3191 30 63.3631
10 6.2674 31 64.6060
11 6.6553 32 69.4093
12 8.3147 33 69.6571
13 13.8439 34 75.79811
14 15.7537 35 101.3423
15 16.2473 36 110.6855
16 17.7215 37 117.9570
17 22.3923 38 125.9469
18 23.1186 39 129.8984
19 24.5316 40 130.6095
20 26.7895 41 140.1335
21 36.8783 42 No failure
9
Repairable Systems I: Renewal
and Instantaneous Repair

Most modern equipment is designed under the assumption that it will be


maintained in some manner. Complicated equipment is usually expected
to operate for substantial lengths of time so upkeep and repair activities are
assumed to be part of the device operating experience. A familiar example
is the personal automobile that is comprised of many components, is expen-
sive, and is usually expected to operate properly for several years if serviced
appropriately. Military hardware, manufacturing machinery, and electrical
power generation equipment are other well-known examples of assets that
require maintenance support.
A comprehensive examination of the operation of repairable systems must
necessarily include a wide variety of equipment behaviors. We consider that
there are essentially two classes of maintenance—preventive and corrective—
and for each of these classes, there are numerous specific policy realizations.
There are also several classes of possible outcomes of maintenance activity.
In addition, the maintenance activity itself is subject to model representation.
The discussion of repairable systems starts here with a consideration of
corrective maintenance only. In addition, repair is assumed to be instanta-
neous and to yield device renewal. In the following chapters repair times
of random duration are considered, nonrenewal models are discussed, and
both instantaneous preventive maintenance and preventive maintenance of
random duration are treated.
A defining feature of the analyses of this chapter is that the state of the
system or component of interest is assumed to be the same as new follow-
ing a maintenance activity. The assumption of “good as new” implies device
renewal. As will be discussed subsequently, the assumption about the state
of a system following maintenance is one of the principal defining features
of models of the behavior of repairable systems. The foundations for both the
renewal and the nonrenewal cases are treated in this chapter.
To start our examination of models of the operation of repairable systems,
consider the following simple definition:

Definition 9.1

A repairable system is an equipment entity that is capable of being restored to


an operating condition following a failure.

173
174 Reliability Engineering: Probabilistic Models and Maintenance Methods

While simple, this definition permits us to clearly distinguish between


our models of life length prior to failure and the models we now develop to
represent periods of operation that may extend across several failures and
multiple life lengths.
Naturally, to build our models, we start with the most basic forms and then
expand their features. The logical first step in studying repairable systems
is to represent the simplest sequence of operating periods. We use a renewal
process for this model, so we begin with a review of some concepts from
renewal theory and some corresponding classifications of life distributions.
We then move on to the more complicated models.

9.1 Renewal Processes


A key feature in our study of repairable systems is that we change the way
in which we view the system and use probabilities. Rather than considering
simple life lengths and their associated distributions, we consider a series
of operating intervals and we say that the end points of these intervals form
a point process. Naturally, the duration probabilities for the intervals of the
point processes are based on the life distributions. The simplest general sto-
chastic model of a representative point process is that for a renewal process.
We start with this one. Formally, consider the following definition:

Definition 9.2

A renewal process is an indexed sequence of independent and identically dis-


tributed nonnegative random variables, say, T1, T2, ….

Conceptually, we think in terms of a device that is operated until it fails


at which time it is immediately replaced with an identical but new device
that is, in turn, operated until failure. For the moment, we consider that the
replacement is accomplished instantaneously so the duration of the replace-
ment activity can be ignored. Because they are mutually independent and
have the same distribution function on length, the sequence of operating
times forms a renewal process.
The earlier rather terse definition of a renewal process is assumed to imply
that the random variables of the sequence have a well-defined probability
distribution function and that this distribution function does not change
over time (or as the process continues). A distinction that is often made when
the process is a sequence of device operating times is that the first device in
use may be new or used. (The process may have started before we began
observing it.) If the first device is new, then T1 has the same distribution as
Repairable Systems I 175

all other Ti and the process is said to be an “ordinary” renewal process. On


the other hand, if the first device is used, it may not have exactly the same
life distribution as the remaining copies of the device. (It has a residual life
distribution.) In this case, the renewal process is said to be “modified” or
“delayed.” For now, consider only the ordinary renewal processes.
Note also that the definition of a renewal process is the starting point
for an extensive body of study of which reliability and equipment main-
tenance form only a small part. It is a rich model format that has many
applications.
A simple graphical representation of the point process corresponding to
the operation of a sequence of identical devices is shown in Figure 9.1. Note
that the duration of each of the successive intervals is denoted by Ti, while
the variables Sk represent the total time elapsed until the time of the kth
replacement. That is,
k
Sk = åT
i =1
i (9.1)

and we say that Sk is the duration of the interval over which k copies of the
device are operated to failure. As each of the Ti has the same probability dis-
tribution function, FT(t), the probability distribution on Sk is the k-fold convo-
lution of FT(t). We represent this as

FSk (t) = FT( k ) (t) (9.2)

where
t

ò
F (t) = FT( k -1) (t - u) fT (u)du (9.3)
(k )
T
0

and by convention, FT( 0 ) (t) = 1 for all values of time.


The conceptual application of this model form to individual system com-
ponents is straightforward. We may consider that the model represents the
sequence of copies of a component that are used in a particular position
(or slot) in a system structure. For example, we might consider that it repre-
sents the sequence of light bulbs used in a lamp.

T1 T2 T3

0 S1 S2 S3

FIGURE 9.1
Illustration of a renewal process.
176 Reliability Engineering: Probabilistic Models and Maintenance Methods

There are two natural questions that one can address using this basic
model. These are as follows: (1) How long until the kth failure? (2) How many
failures will occur over a fixed time interval? That is, how soon and how
many? It is easy to see that the answers to these questions form the basis for
planning the volume of spare parts that are purchased, the extent of invest-
ment in repair facilities and equipment, the levels of staffing that are estab-
lished and perhaps the extent to which substitute systems are acquired.
The general expression of these questions is that we wish to examine the
time until an arbitrary number of renewals have occurred and the probabili-
ties on the number of renewals over a defined time interval. The probabilities
for these two measures are related by the important fundamental relation

Pr[Nt ³ k ] = Pr[Sk £ t] (9.4)

where Nt represents the number of renewals that occur during the interval
[0, t]. This expression is a bit subtle and is worth pondering. An example
realization is that the number of renewals during 1000 hours can equal or
exceed 4 only if the time of the fourth renewal is equal to or earlier than
t = 1000 hours.
We exploit this relationship to address the questions of how soon and
how many. Note also that we are really considering a measure in the “time
domain” and a measure in the “frequency domain” and that Equation 9.4
is the bridge between the two. One further point is that once we have used
Equation 9.4 to address the questions of time until the kth renewal and the
number of renewals over time, we can use the results of that analysis to deter-
mine (1) the expected number of renewals during an interval, (2) the identity
of the renewal density, (3) higher moments of the distribution on renewals,
(4) the distribution of backward recurrence times, and (5) the distribution on
forward recurrence times. The forward recurrence time is the time from an
arbitrary point in time until the next event (failure) and the backward recur-
rence time is the time that has elapsed since the last event.
To exploit Equation 9.4, we assume that we know the distribution FT(t) on
the length of the individual operating intervals. In principle, this means we
also know the distribution on Sk as we presume that we can construct the
convolution of FT(t). Then, to determine the probability distribution on the
number of renewals, we use

Pr[Nt = n] = Pr[Nt ³ n] - Pr[Nt ³ n + 1] = FT( n ) (t) - FT( n +1) (t) (9.5)

Observe that one realization of Equation 9.5 is

Pr[Nt = 0] = FT( 0 ) (t) - FT(1) (t) = 1 - FT (t) = FT (t)

Now, to proceed with the identification of the specific forms of the distribu-
tions on Nt and Sk, we must specify FT(t). The most well-known construction
Repairable Systems I 177

of this type, and one of the few that is tractable at all, starts with the selection
of the exponential distribution for FT(t). In this special case, the point process
is called a Poisson process.
If the lengths of the individual intervals are exponential in distribution, then
the variable Sk has a gamma distribution. This result is relatively well known,
but its construction is repeated here to remind the reader of the method of
analysis. We start with the statement of the exponential distribution as
FT (t) = 1 - e - lt
and we state the corresponding Laplace transform for its density function as
¥ ¥
l
ò
0
ò
fT*(s) = L fT (s) = L ( fT (t) ) = e - st fT (t)dt = le - st - lt dt =
0
s+l

so the transform for the distribution function is

l
FT*(s) = L ( FT (t) ) =
s(s + l)

Often, constructing the transform for the density is easier than doing so for
the distribution function.
The transform for the convolution is the product of the transforms for the
distributions included in the convolution. This means that the transform for
the distribution on Sk is the product of k identical terms, each of which is the
transform on the exponential distribution. Thus,

( )
k
k
æ l ö
fS*k (s) = fT*(s) =ç ÷
è s+l ø

This is the transform for a gamma distribution so

l k k - 1 - lt
fSk (t) = t e
G(k )

and since k is clearly an integer, we have


¥
(l t ) j
FSk (t) = å G( j + 1) e
j=k
- lt
(9.6)

The corresponding realization of Equation 9.5 is


¥ ¥
(l t ) j - lt (l t ) j - lt ( l t )n - l t
Pr[Nt = n] = F (t) - F
( n)
T
( n + 1)
T (t) = å
j=n
G( j + 1)
e - å
j = n +1
G( j + 1)
e =
G(n + 1)
e

(9.7)
178 Reliability Engineering: Probabilistic Models and Maintenance Methods

In other words, if the durations of the individual operating intervals are


exponentially distributed, then the time until the nth renewal is gamma dis-
tributed and the number of renewals over a fixed time interval has a Poisson
distribution.
The example of the exponential case is a bit misleading. In general, the
construction of the distribution on the number of renewals is much more
difficult. In fact, there are many cases in which one cannot obtain a trac-
table solution. Nevertheless, the review of the procedure is worthwhile, and
it leads us to examine the expectation on the number of renewals that can
often be obtained even if the distribution cannot be constructed.
The expected number of renewals over a fixed time interval is called
the renewal function. It is the expectation for the distribution on Nt and is
defined as
¥ ¥ ¥
MFT (t) = E[Nt ] = ån=0
n Pr[Nt = n] = å(
n=0
n FT( n ) (t) - FT( n +1) (t) = ) åF
n =1
( n)
T (t) (9.8)

The analysis of the renewal function is often accomplished by first trans-


forming it into a recursive form:
¥ ¥ ¥
MFT (t) = å
n =1
FT( n ) (t) = FT(1) (t) + å
n=2
FT( n ) (t) = FT (t) + åF j =1
( j + 1)
T (t)

¥ t t ¥
= FT (t) + åò j =1 0
F (t - u) fT (u)du = FT (t) +
( j)
T
òåF0 j =1
( j)
T (t - u) fT (u)du

ò
MFT (t) = FT (t) + MFT (t - u) fT (u)du
0
(9.9)

Equation 9.9 is the very well-known fundamental form called the “key
renewal theorem.” Very often, this form serves as the basis for the analysis of
a renewal process. The main reason for its extensive use is that the associated
Laplace transform yields a direct relationship between the renewal function
and the underlying distribution of the process FT(t). That relationship is

æt ö
ç
è0
ò
MF*T (s) = FT*(s) + L ç MFT (t - u) fT (u)du ÷ = FT*(s) + MF*T (s) fT*(s)
÷
ø
(9.10)

The transform for the integral is shown in Appendix C. Then, we have

FT*(s) FT*(s)
MF*T (s) = = (9.11)
1 - f T*(s) 1 - s FT*(s)
Repairable Systems I 179

and equivalently,

MF*T (s) MF*T (s)


FT*(s) = = (9.12)
1 + sMF*T (s) 1 + mF*T (s)

Of course, the utility of these results depends upon our ability to invert the
transforms in any particular application.
Note that in Equation 9.12, the lowercase m represents the transform of the
renewal density. The renewal density is the first derivative of the renewal
function. As a derivative, the renewal density necessarily represents a rate. It
is the rate at which renewals occur. Thus, it is the rate at which the number of
failures (renewals) increases. It is the “rate of failure.” The quotation marks
are intended to signal the fact that there is considerable opportunity for mis-
understanding here.
Some people refer to the hazard function for a life distribution as the failure
rate while others use the term failure rate to mean the renewal density. These
two entities are not the same conceptually, and except for the case of the
exponential distribution, they are not the same algebraically. As explained
in Chapter 4, the hazard function is the conditional probability of failure for
members of a population given survival to any time. As shown previously, the
renewal density is the unconditional probability of another event in a sequence
of events. In an effort to clearly distinguish between the two entities, the term
failure rate is not used here. The hazard function is called the hazard func-
tion or hazard rate and the renewal density is called the renewal density, the
intensity function, or the failure intensity. In fact, the most appropriate label
for the renewal density (aside from renewal density) is the failure intensity.
The renewal density really is the intensity with which new renewals occur.
Algebraically, we can represent the derivative as
d
mFT (t) = MFT (t) (9.13)
dt
and we note that the derivative extends to both Equation 9.8 and the key
renewal theorem. Hence, we have
¥
mFT (t) = åf
n =1
( n)
T (t) (9.14)

ò
mFT (t) = fT (t) + mFT (t - u) f T (u)du
0
(9.15)

and

fT*(s) mF*T (s)


mF*T (s) = and fT*(s) = (9.16)
1 - fT*(s) 1 + mF*T (s)
180 Reliability Engineering: Probabilistic Models and Maintenance Methods

Clearly, depending upon their relative complexity, one may work with either
the renewal function or the renewal density.
The single case in which the analysis is relatively straightforward is the
exponential case. As indicated earlier, the forms for the Laplace transforms
for the exponential are
l l
fT*(s) = and FT*(s) =
s+l s(s + l)

so
l l
mF*T (s) = and MF*T (s) =
s s2

In this case, the inversion of the transform is easy and yields


mFT (t) = l and MFT (t) = lt (9.17)

Note that these results imply that when the individual interval lengths are
exponentially distributed, the expected number of renewals (failures) over
an interval of length t is λt and the intensity with which new failures occur
is λ. Thus, for the Poisson process, the failure intensity is constant.
For many other life distributions, the analysis of the renewal function is
rather difficult. In those cases, there are some basic results that can be useful.
A few of those results are
t
MFT (t) ³ -1
EF [T ]
N 1
lim t =
t ®¥ t EF [T ]
(9.18)
M (t) 1
lim FT =
t ®¥ t EF [T ]
x
lim ( MFT (t + x) - MFT (t) ) =
t ®¥ EF [T ]
where the notation EF[T] is used to represent the mean of the distribution on
the lengths of the individual intervals and it is assumed that mean is finite.
As an example, applying these expressions to the exponential distribution
for which the expected value is 1/λ yields

MFT (t) ³ lt - 1
N
lim t = l
t ®¥ t

M (t)
lim FT = l
t ®¥ t
lim ( MFT (t + x) - MFT (t) ) = lx
t ®¥
Repairable Systems I 181

The corresponding results for a Weibull life distribution having shape


parameter β = 1.5 and scale parameter θ = 2500 hours are

t
MFT (t) ³ -1
2256.86

Nt 1
lim = = 4.43 ´ 10 -4
t ®¥ t 2256.86

MFT (t)
lim = 4.43 ´ 10 -4
t ®¥ t

x
lim ( MFT (t + x) - MFT (t) ) =
t ®¥ 2256.86

Note particularly the fact that Equation 9.18 applies to nearly any choice of
distribution so the previous results can be very useful. The last of the four
relationships is known as Blackwell’s theorem. It is well treated in Feller [64].
In effect, that last relationship states that as time advances, a renewal pro-
cess “settles down” and experiences renewals about once per EF[T] so the
expected number of renewals during any interval is the length of the inter-
val divided by the mean of the underlying distribution.
To close this discussion, note that a sequence of intervals over which a
series of copies of a component are used in a machine can reasonably be
represented using the renewal process model provided each interval has
the same stochastic characteristics. When one is examining components
and even some modules, this is often the case. For other levels of equip-
ment aggregation and other types of operational profiles, we will modify the
renewal models later in this text. Note also that several of the most popular
choices of distribution for representing operating durations yield renewal
functions that are impossible or quite taxing to analyze. However, modern
computing power has made these distributions much more manageable. The
example of the Weibull is treated at the end of the next section on the basis of
a numerical strategy that is described in Appendix B.

9.2 Classification of Distributions and Bounds


on Renewal Measures
The next topic we should consider as a foundation for our examination of
repairable systems is a reclassification of the basic life distributions and
some associated bounds on the numbers of renewals. The idea here is that
the length of each of the stochastically identical operating intervals is well
182 Reliability Engineering: Probabilistic Models and Maintenance Methods

represented by one of the life distributions with which we are already famil-
iar. We now describe those distributions in terms that are meaningful for a
series of intervals. To start, we compare life lengths of new and used devices.

Definition 9.3

A life distribution, FT(t), is said to be new better than used (NBU) if

FT (t + u) £ FT (t)FT (u) (9.19)

and to be new worse than used (NWU) if

FT (t + u) ³ FT (t)FT (u) (9.20)

In other words, an NBU component has a life distribution that assigns a


higher probability of survival for a fixed time to a new device that it does for a
device that has already been used. An algebraic equivalent to Equation 9.19 is
FT (t + u)
FT (t + u|t) = Pr[T ³ t + u|T > t] = £ FT (u) (9.21)
FT (t)

For an NWU device, the direction of the inequality is reversed.


As with our original classification of life distributions, a weaker classifica-
tion than NBU and NWU has also been defined. This is as follows:

Definition 9.4

A life distribution, FT(t), is said to be new better than used in expectation (NBUE) if
¥

ò F (u)du £ E [T]F (t)


t
T F T (9.22)

and to be new worse than used in expectation (NWUE) if


¥

ò F (u)du ³ E [T]F (t)


t
T F T (9.23)

This definition may also be considered consistent with intuition as rearrang-


ing the condition (9.22) yields
¥
1
FT (t) ò
FT (u)du £ EF [T ]
t
(9.24)
Repairable Systems I 183

for NBUE and the inequality is reversed for an NWUE component.


Equation  9.24 says that the mean of the residual life distribution for a
used component is shorter than the mean of the life distribution for a new
component.
The two final points concerning classifications of distributions are that
(1) there are numerous additional classifications that are not treated here and
(2) the classifications that we have defined yield a natural nested ordering
that can be useful. An example that is a classification not treated here is the
increasing mean residual life. The nested ordering is the increasing failure
rate (IFR) ⊆ IFRA ⊆ NBU ⊆ NBUE and the decreasing failure rate (DFR) ⊆
DFRA ⊆ NWU ⊆ NWUE.
The primary reason we define the classes of life distributions is that the
classifications permit us to define bounds on renewal behaviors that we
would not otherwise be able to obtain. Keep in mind the fact that when
we model the failure and renewal behavior of a device with a distribution
other than the exponential, the renewal expressions may become intractable.
In some such cases, we may use the exponential forms to obtain various
bounds. Here are the most commonly used examples of these bounds:

1. If FT(t) is IFR and has finite mean EF[T], then for 0 ≤ t < EF[T],

FT (t) ³ e -t/EF [T ] (9.25)

2. If FT(t) is IFR and has finite mean EF[T], then for 0 ≤ t < EF[T],

( t/EF [T ])
n -1 j
( n)
F (t) £ 1 -
T å j =0
j!
e -t/EF [T ] (9.26)

3. If FT(t) is IFR and has finite mean EF[T], then for 0 ≤ t < EF[T],

( t/EF [T ])
¥ j

Pr[Nt ³ n] £ åj=n
j!
e -t/EF [T ] (9.27)

4. If FT(t) is NBU and has cumulative hazard function ZT(t), then for t ≥ 0,
n -1 j
( ZT (t))
Pr[Nt < n] ³ å j =0
j!
e - ZT (t ) (9.28)

5. If FT(t) is IFR and has cumulative hazard function ZT(t), then for t ≥ 0,

( nZ (t/n ) )
j
n -1
T
Pr[Nt < n] £ åj =0
j!
e - nZT (t/n ) (9.29)
184 Reliability Engineering: Probabilistic Models and Maintenance Methods

6. If FT(t) is NWU and has cumulative hazard function ZT(t), then for t ≥ 0,

n -1 j
( ZT (t))
Pr[Nt < n] £ å
j =0
j!
e - ZT (t ) (9.30)

7. If FT(t) is DFR and has cumulative hazard function ZT(t), then for t ≥ 0,

j
n -1
( nZ (t/n ) )
T
Pr[Nt < n] ³ å
j =0
j!
e - nZT (t/n ) (9.31)

8. If FT(t) is NBUE and has finite mean EF[T], then for t ≥ 0,

MFT (t) £ t/EF [T ] (9.32)

9. If FT(t) is NWUE and has finite mean EF[T], then for t ≥ 0,

MFT (t) ³ t/EF [T ] (9.33)

Consider the interpretations and some examples of these set of


conditions.

Suppose we are studying the behavior of a population of components for


which the Weibull distribution provides a good model of the dispersion in
life lengths. Suppose further that we have estimated the parameters of that
Weibull distribution to be β = 2.75 and θ = 4000 hours. For this IFR distribu-
tion, the mean life is EF[T] = 3559.43 hours so the reciprocal of the mean life
is 2.809 × 10−4 and Equation 9.25 states that for t ≤ 3559.43,

FT (t) ³ e -t/3559.43 = e -0.0002809t

so

FT (1000) ³ e -( 0.0002809)(1000 ) = 0.755

In this case, we can compute the reliability at 1000 hours to be 0.978. Perhaps
more interesting is the fact that the probability that the third component life
length is completed by 5000 hours is bounded by Equation 9.26 as

( t/3559.43 )
2 j
(3)
F (2000) £ 1 -
T åj =0
j!
e -t/3559.43 = 0.0195
Repairable Systems I 185

On the other hand, Equation 9.27 indicates that the probability of 3 or more
renewals in 5000 hours is bounded by the same quantity. That is,

Pr[N 2000 ³ 3] £ 0.0195

For the same device population, Equations 9.28 and 9.29 indicate that

( ZT (t))
2 j

0.718 = å
j =0
j!
e - ZT (t ) £ Pr[N 5000 < 3]

( 3Z (1.25/3 ))
j
2
T -3 ZT ( 1.25/3 )
£ åj =0
j!
e = 0.997

Finally, for the same population, Equation 9.32 states that


t t
MFT (t) £ =
EF [T ] 3559.43

so
5000
MFT (5000) £ = 1.405
3559.43

Note further that Equation 9.32 can be combined with the first of the expres-
sions of (9.18) to yield

t t
- 1 £ MFT (t) £
EF [T ] EF [T ]

so
0.405 £ MFT (5000) £ 1.405
Naturally, we can perform similar example computations for Equations 9.30,
9.31, and 9.33. If our Weibull population had a shape parameter of β = 0.75
rather than 2.75, the mean life would be 4762.56 hours, ZT(2000) = 0.595, and
ZT(6000) = 1.355 so Equation 9.30 indicates that

Pr[N 2000 < 3] £ 0.977 and Pr[N 6000 < 3] £ 0.844

On the other hand, Equation 9.31 indicates that

Pr[N 2000 < 3] £ 0.955 and Pr[N 6000 < 3] £ 0.735

In the case of Equation 9.33, we obtain

MFT (2000) ³ 0.420 and MFT (6000) ³ 1.260


186 Reliability Engineering: Probabilistic Models and Maintenance Methods

These examples illustrate the information one can obtain in cases in which
the convolutions or renewal equations are computationally difficult.
As noted earlier, modern computing power has made previously tax-
ing computations much more manageable. For example, the renewal func-
tion for the Weibull distribution cannot be expressed in closed form at all.
However, Lomnicki [65] defined an equivalent infinite series expansion
for the Weibull renewal function. The series is exact until it is truncated
to finitely many terms in which case it provides an approximation that is
often quite accurate. The series form is provided in Appendix B. When
the series is truncated at 15 terms and the value of the shape parameter is
β = 2.75, we obtain MFT (1000) = 0.022, MFT (2000) = 0.140, MFT (6000) = 1.272, and
MFT (8000) = 1.821. For the case in which β = 0.75, we obtain MFT (1000) = 0.370,
MFT (2000) = 0.641, MFT (6000) = 1.585, and MFT (8000) = 2.029.
In closing this discussion, it is appropriate to note that the computational
effort involved in using Lomnicki’s method depends on the time over which
the renewal function is evaluated. Truncating the infinite series after rela-
tively few terms yields quite accurate values for MFT (t) when t is smaller than
around 2EF [T ]. As the time interval is increased, more terms must be included
in the truncated sum and the computational effort grows exponentially. For
the values provided here, 15 terms were more than adequate for all of the val-
ues other than MFT (8000), and for that value, 18 terms were required.

9.3 Residual Life Distribution


In some cases, we wish to examine the behavior of a device after it has been
operating for some period of time. The basic concept of a residual life distri-
bution was developed in Chapter 4. The question of how much life remains to
the device is a key question and is reasonable within the context of a renewal
process as well as within the context of the life distribution. In addition, the
residual life distribution can also be used to define bounds on renewals.
When examining a renewal process, we might reasonably ask the question
of how long it is likely to be until the next renewal in view of the fact that the
previous renewal took place a particular number of hours ago. This is the
same question as how long will a device of a given age continue to function.
For a device having a known age, say, τ, the residual life distribution is the
probability distribution on the longevity of the device from age τ onward.
We may consider the residual life distribution as the conditional distribu-
tion on additional life length given survival to age τ. Thus, if we denote the
residual life distribution for a device of age τ by FU(τ)(u), we can say that

FT (u + t) - FT (t) FT (t) - FT (u + t) F (u + t)
Pr[U (t) £ u] = FU ( t) (u) = = = 1- T (9.34)
FT (t) FT (t) FT (t)
Repairable Systems I 187

and the corresponding reliability statement is

FT (u + t)
Pr[U (t) > u] = FU ( t) (u) = (9.35)
FT (t)

Of course, in both of these expressions, FT(t) is the underlying life distribution.


Consider two examples. First, if the underlying life distribution is exponen-
tial, the memoryless property of the distribution surfaces again and we find
that the residual life distribution is the same as the life distribution. That is,

FT (u + t) e - l(u + t)
FU ( t) (u) = 1 - = 1 - - lt = 1 - e - lu
FT (t) e

If the underlying life distribution is Weibull with shape parameter β = 2.75


and scale parameter θ = 4000 hours, then at 500 hours after a renewal, the
distribution on the time to the next renewal is
2.75
æ u + 500 ö
-ç ÷ æ u ö
2.75
F (u + t) e è 4000 ø -ç + 0.2 ÷
FU ( t = 500 ) (u) = 1 - T = 1- 2.75 = 1 - 1.012e
è 4000 ø
FT (t) æ 500 ö
-ç ÷
e è 4000 ø

so FU(τ = 500)(1000) = 0.208 and FU(τ = 500)(2500) = 0.806. The corresponding reli-
ability values are the complements of the failure probabilities.
Here is another way to look at the residual life distribution. If we consider
the progress of a renewal process over the time domain, the age, say, τ, of a
functioning device at any point in time, say, t, is a random variable. The value
of the random variable, device age may be represented by

t = t - SNt (9.36)

and the time until the next renewal is


u(t) = SNt +1 - t (9.37)

Given these definitions, we can represent the probability that at any point
in time, t, the residual life exceeds a specific value, u, as the probability that
the first device in the process survives beyond t + u plus the probability that a
renewal occurred at some point in time prior to t and the device started at
that time survives longer than t + u. That is,
t

ò
FU ( t) (u) = Pr[U (t) > u] = FT (t + u) + FT (t + u - x)mFT ( x)dx
0
(9.38)

Notice the similarity of Equation 9.38 to the key renewal theorem.


188 Reliability Engineering: Probabilistic Models and Maintenance Methods

The approach to the analysis of Equation 9.38 is the same as that for the
key renewal theorem and can be just as taxing. Nevertheless, there are some
useful results we can obtain. First, the limiting form of the residual life dis-
tribution at any point in time is

u
1
FU (u) =
ò
EF [T ]
0
FT ( x)dx (9.39)

Note that this form does not depend upon the age of the operating device. As
an example, consider again the Weibull distributions having scale parameter θ =
4000 hours and shape parameters β = 2.75 and β = 0.75. The corresponding limit-
ing residual life distributions of Equation 9.39 are shown in Figure 9.2a and b,

1.0

0.8

0.6
F

0.4

0.2

2,000 4,000 6,000 8,000 10,000 12,000


(a) Time

1.0

0.8

0.6
F

0.4

0.2

2,000 4,000 6,000 8,000 10,000 12,000


(b) Time

FIGURE 9.2
(a) Limiting forms of residual life distributions. (b) The corresponding underlying life
distributions.
Repairable Systems I 189

where the underlying life distributions are shown. The contrast in the behaviors
of the residual life distributions and the life distributions is striking. In addition,
observe that the limiting forms seem to fit our intuition concerning their individ-
ual and their compared behaviors as the upper curve corresponds to the larger
value of β.
As indicated in Chapter 8, Cox [61] has shown that the mean of this distri-
bution will be

EF2 [T ] + VarF [T ]
EU [U ] = (9.40)
2EF [T ]

In addition, we have the following additional conditions:

1. If FT(t) is NBU, then FU (u) £ FT (u).


2. If FT(t) is NWU, then FU (u) ³ FT (u).

Here again, these apply at any point in time.


To close this discussion, consider the following points. First, looking
forward from any arbitrary point in time, the time until the next renewal
is called the forward recurrence time. We have just obtained the primary
results related to that quantity. Second, from any point in time, we may also
look back and ask how long has it been since the most recent renewal. This
quantity is called the backward recurrence time. Cox [61] has shown that the
forward recurrence time and the backward recurrence time have the same
distributions so the results described here apply to both. Finally, a key point
here is that it is the use of the renewal density in Equation 9.38 that permits
us to analyze the residual life (and recurrence times).

9.4 Conclusion
The models presented in this chapter serve to highlight the questions one
should consider in the study of repairable systems. For some systems, it
is the periods of operation that are the greatest concern and the duration
of repair is either negligible or unimportant. For those systems, treating
repair as instantaneous is appropriate. Similarly, the study of individual
components and the analysis of some systems may reasonably be based
on renewal processes. Finally, the operating performance of many systems
is improved by the use of preventive maintenance, while for some other
systems, preventive maintenance is unproductive or impossible. As we
study specific equipment items, we should tailor our models in terms of
these operating features. The models presented in this chapter apply to the
190 Reliability Engineering: Probabilistic Models and Maintenance Methods

instantaneous repair case with renewal. More important, they establish the
basic approaches to model formulation and analysis and emphasize the
choices we must make.

Exercises
9.1 Construct the Laplace transform for the Poisson distribution. Use the
transform to identify the distribution on a sum of two identically dis-
tributed Poisson random variables.
9.2 Construct the Laplace transform for the normal probability density
function. Use the transform to show that the sum of independent nor-
mal random variables has a normal distribution.
9.3 Using the transform obtained in Problem 9.2, state the transform for
the renewal density and the renewal function for the normal renewal
process.
9.4 If the length of the operating intervals in a renewal process is well
modeled by an exponential distribution having parameter λ = 0.004,
what is the probability that the number of failures during 1250 hours
will exceed 7?
9.5 Consider a renewal process in which times between failures have a
normal distribution with a mean of 400 hours and a standard devia-
tion of 50 hours. Construct the functional form for FSk (t) = FT( k ) (t) and the
specific realization of this function when k = 3. Compute the mean and
standard deviation for the distribution on S3.
9.6 For the distribution described in Problem 9.5, construct the bounds
of Equation 9.18 using t(or x) = 1000 hours. Then compute the bounds
of Equations 9.25, 9.26, 9.27, and 9.32 for t = 100 hours.
9.7 Consider a Weibull distribution for which β = 0.60 and θ = 1000 hours.
For this distribution, compute the bounds at 2000 hours defined by
Equations 9.18, 9.30, 9.31 (for n = 3), and 9.33.
9.8 Let FT(t) be a Weibull distribution with β = 1.75 and θ = 800 hours.
For this distribution, compute the bounds at 1500 hours defined by
Equation 9.18.
9.9 For the Weibull distribution of Problem 9.8, compute the bounds of
Equations 9.25 through 9.29 and 9.32 at t = 500 hours and n = 3.
9.10 Prove that an IFR distribution is NBU.
9.11 Consider a Weibull distribution for which β = 0.60 and θ = 1000 hours.
Use Lomnicki’s method to obtain approximate values for the renewal
function at 1000, 2000, and 5000 hours.
Repairable Systems I 191

9.12 For the distribution of Problem 9.8, use Lomnicki’s method to obtain
approximate values for the renewal function at 2000, 4000, and 5000
hours.
9.13 Let FT(t) be a Weibull distribution with β = 1.75 and θ = 800 hours and
let u(τ) be the residual life at age τ of a component having this life distri-
bution. Compute the reliability at u = 100, u = 1000, and u = 1500 hours
for devices that have achieved ages of τ = 500 hours, τ = 1000 hours, and
τ = 2000 hours.
9.14 Let FT(t) be an NBU distribution and let u(τ) be the residual life at age τ
of a component having life distribution FT(t). Show that EU[U(τ)] ≤ EF[T]
where EF[T] is the mean of FT(t).
9.15 Consider a population of devices for which the gamma distribution
having β = 3 and λ = 0.005 is used to model life length. For the elements
of that population that have survived for 200 hours, compute the value
of the survivor function for the residual life distribution at t = 475 hours
(τ = 200, t + τ = 475). If it were decided that a more accurate parameter
set for the gamma distribution model is β = 3.6 and λ = 0.006, what
would be the corrected value of GT (t = 475|t = 200)?
10
Repairable Systems II: Nonrenewal
and Instantaneous Repair

As indicated in the previous chapter, there are many types of devices for
which repair implies renewal. For those cases in which we are studying
individual components in a specific equipment “slot,” treating component
repair as a component renewal point is clearly appropriate. For some other
systems, repair corresponds either exactly or approximately to system
renewal, so the models described in the preceding chapter provide a rea-
sonable portrayal of operating behavior.
On the other hand, there are many types of devices for which a repair does
not return the unit to a new condition. There are also large complex systems,
such as automobiles, for which the replacement of a few of its many compo-
nents does not appreciably change the “age” of the system. For equipment of
this sort, the unit is not as good as new following repair, so unit age follow-
ing repair may not be taken to be zero. Clearly, the state of a device following
its repair determines whether or not a sequence of operating periods is well
modeled by a common distribution. When system age following repair is
nonzero, successive operating periods do not have a common distribution
and the renewal model does not apply.
Several models based on nonstationary processes have been suggested for
those devices that are not renewed by repair. We shall explore some of them
here. While doing so, we will continue to assume that repair is instantaneous
or, equivalently, that the operating periods are our key concern and repair
intervals are negligible or unimportant.
The idea of a nonstationary process is that we have a sequence or oper-
ating intervals, each of which ends with unit failure. As in the case of the
renewal process, we denote the lengths of the intervals by T1, T2, T3, …, but for
the nonstationary process, each interval has a distinct distribution. Several
useful models have been developed to treat the nonstationary sequence of
operating intervals. Three of them are treated here.

193
194 Reliability Engineering: Probabilistic Models and Maintenance Methods

10.1 Minimal Repair Models


The earliest model of system behavior for the case in which repair does
not imply renewal was suggested by Barlow and Hunter [66]. Their model
is called the “minimal repair” model as it is constructed by assuming that
when failure of a device occurs, the unit is repaired and is placed back in
operation without any change in its age. That is, the repaired unit has the
same age and the same failure hazard following repair as it had just prior to
failure. It is “as bad as old.” Consequently, over time, the restart points form
a nonstationary counting process.
The algebraic form of this model follows directly the construction of a
counting process. In fact, because the number of failure events has a “Poisson
type” of probability structure, the model is the nonhomogeneous Poisson
process (NHPP) model. The construction starts with the time process. We
denote the number of failures over the interval (0, t] by Nt and let

L(t) = E[Nt ] (10.1)

represent its expected value. We call Λ(t) the “cumulative rate function” for
the process and we also define the failure intensity function, λ(t), so that
t

ò
L(t) = l(u)du
0
(10.2)

In general, for an NHPP, the integrand of Equation 10.2 may be any function,
but for reliability applications, we will take it to be the hazard function of the
underlying life distribution for the device of interest.
The definition of the accumulating operating time, Sk, is the same as
defined in Equaion 9.1 and the basic relationship between the variables in the
time domain and those in the frequency domain (Equation 9.4) still applies.
Therefore, we can say that in the frequency domain,

( L(t + x) - L(t))
k

e (
- L ( t + x ) - L ( t ))
Pr[Nt + x - Nt = k ] = (10.3)
k!

and in the time domain,

Pr[Tk +1 - Tk > t] = e ( k
- L ( T + t ) - L ( Tk ))
(10.4)

Note that special cases of Equations 10.3 and 10.4 are

( L(t))
n

Pr[Nt = n] = e - L (t ) and Pr[T1 > t] = e - L(t ) = FT (t)


n!
Repairable Systems II 195

In many respects, this is a very appealing model. It is relatively easy to use


and to analyze. One needs to only specify a failure intensity function and this
is reasonably obvious as the choice must be consistent for T1 so by implication
L(t) = ZT (t) (10.5)
which is to say the cumulative failure intensity function must correspond to
the cumulative hazard function of the device life distribution. Reversing this
logic, if the failure intensity corresponds to the device hazard function, then
the NHPP model provides the appropriate representation of system behavior.
In this case, Equations 10.3 and 10.4 yield the system performance information
of interest. Furthermore, direct application of Equation 9.4 to the NHPP per-
mits the construction of the distributions on Sk and its expected value. That is,
¥ ¥
( L(t))
j

FSk (t) = Pr[Sk £ t] = Pr[Nt ³ k ] = åj=k


Pr[Nt = j] = å
j=k
j!
e - L (t ) (10.6)

so
k -1 k -1
d
fSk (t) = FSk (t) =
( L(t)) æ d L(t) ö e - L(t ) = ( L(t)) zT (t) e - L(t ) (10.7)
ç ÷
dt (k - 1)! è dt ø (k - 1)!

In addition, since the expectation of any random variable, say, y, may be com-
puted using either

E[Y ] =
ò yf (y)dy
Y
Y or E[Y ] = FY ( y )dy
ò
Y

we may compute E[Sk] as

¥ ¥ k -1
( L(t)) zT (t) - L(t )
ò
E[Sk ] = tfSk (t)dt = t
0
ò
0
(k - 1)!
e dt (10.8)

or as
¥ ¥ k -1 k -1 ¥
( L(t)) ( L(t))
j j

ò
E[Sk ] = FSk (t)dt =
0
òå
0 j =0
j!
e - L(t )dt = åò
j =0 0
j!
e - L((t )dt (10.9)

Two further points here are (1) that the realization of FSk(t) for k = 1 implies
that E[Nt] = Λ(t) and (2) that since E[X + Y] = E[X] + E[Y] regardless of whether
the variables are independent,

E[Sk ] = E[Sk - Sj + Sj ] = E[Sk - Sj ] + E[Sj ]


196 Reliability Engineering: Probabilistic Models and Maintenance Methods

so

E[Sk - Sj ] = E[Sk ] - E[Sj ]

for k > j. Finally, observe that just as with the renewal models, Equation 10.9
can be used to define the calculation of E[Tk] as E[Sk] − E[Sk−1].
b
Consider the example of a device for which L(t) = ( t/q ) with θ = 4000
hours and β = 2.75. Then, Λ(2000) = 0.149, Λ(4000) = 1.0, Λ(6000) = 3.049, and
Λ(8000) = 6.727. Also, using Equation 10.3, we obtain results such as

- L ( 2000 ) - L ( 0 )) ( L( 2000) - L(0) )


2

Pr[N 2000 = 2] = e ( = 0.0095


2!

- L ( 2000 ) - L ( 0 )) ( L( 2000) - L(0) )


Pr[N 2000 = 1] = e ( = 0.1281
1!

- L ( 2000 ) - L ( 0 )) ( L( 2000) - L(0) )


0

Pr[N 2000 = 0] = e ( = 0.8619


0!

- L ( 4000 ) - L ( 2000 )) ( L( 4000) - L( 2000) )


2

Pr[N 4000 - N 2000 = 2] = e ( = 0.1547


2!

- L ( 4000 ) - L ( 2000 )) ( L( 4000) - L( 2000) )


1

Pr[N 4000 - N 2000 = 1] = e ( = 0.3634


1!

( L(4000) - L(2000))
0
- ( L ( 4000 ) - L ( 2000 ))
Pr[N 4000 - N 2000 = 0] = e = 0.4268
0!

- L ( 6000 ) - L ( 4000 )) ( L(6000) - L( 4000) )


2

Pr[N 6000 - N 4000 = 2] = e ( = 0.2705


2!

- L ( 6000 ) - L ( 4000 )) ( L(6000) - L( 4000) )


1

Pr[N 6000 - N 4000 = 1] = e ( = 0.2640


1!

( L(6000) - L( 4000))
0
- ( L ( 6000 ) - L ( 4000 ))
Pr[N 6000 - N 4000 = 0] = e = 0.1288
0!

and

( L(8000) - L(6000))
2 k

Pr[N 8000 - N 6000 £ 2] = e (


- L ( 8000 ) - L ( 6000 ))
åk =0
k!
= 0.2892
Repairable Systems II 197

Similarly, using Equation 10.4, we can compute

Pr[T2 - T1 > 2000] = e ( 1


- L ( T + 2000 ) - L ( T1 ))

so

T1 = 500 ® Pr[T2 - T1 > 2000] = e (


- L ( 2500 ) - L ( 500 ))
= 0.7624

T1 = 1500 ® Pr[T2 - T1 > 2000] = e (


- L ( 3500 ) - L (1500 ))
= 0.5351

T1 = 3500 ® Pr[T2 - T1 > 2000] = e (


- L ( 5500 ) - L ( 3500 ))
= 0.1812

and
Pr[T1 > 2000] = e (
- L ( 2000 ))
= 0.8619

Note the implied decline in the 2000 hour reliability with increasing refer-
ence time. This is a feature identified as indicative of increasing hazard in
Chapter 4. Finally, observe that using either Equation 10.8 or 10.9, we may
compute values of E[Sk] such as

E[S3 ] = 5736.27 and E[S2 ] = 4853.76

so
E[S3 - S2 ] = E[S3 ] - E[S2 ] = 882.510.

Consider another example. Suppose we are interested in a device for which


we believe the life distribution is gamma with parameters λ = 0.05 and β = 3
and for which minimal repair is assumed. In this case, since β is integer, the
hazard function may be defined as

lbtb -1/G(b) lbtb -1/(b - 1)! (lt)b/t(b - 1)!


zT (t) = = =
å
b -1 b -1 b -1

k =0
(lt)k/k ! å k =0
(lt)k/k ! å k =0
(lt)k/k !

and then using Equation 10.5, we obtain a plot of the cumulative intensity
function as shown in Figure 10.1.
It is also possible to compute values such as Λ(100) = 2.082. Thus, we can
compute the probabilities for the possible number of failures over the first
100 hours of device operation. These are shown in Table 10.1. Finally, appli-
cation of either Equation 10.8 or Equation 10.9 to the gamma distribution–
based example yields

E[S3 ] = 123.338 and E[S2 ] = 94.085


198 Reliability Engineering: Probabilistic Models and Maintenance Methods

6
Cumulative intensity

50 100 150 200 250


Time

FIGURE 10.1
Cumulative intensity function for a nonhomogeneous Poisson process based on a gamma
distribution.

TABLE 10.1
Frequency Probabilities for a Nonhomogeneous
Poisson Process Based on the Equivalent HPP
N100 Pr
0 0.125
1 0.260
2 0.270
3 0.186
4 0.098
5 0.041
6 0.014
7 0.004
8 0.001

so

E[S3 - S2 ] = E[S3 ] - E[S2 ] = 29.253.

The NHPP also has the feature that it can be transformed into an equiva-
lent homogeneous Poisson process (HPP). Conceptually, the transformation
is revision in the timescale. Formally, for a nonhomogeneous process with
cumulative intensity function Λ(t) = E[Nt], as stated in Equation 10.1, the
inversion is defined by

t(t) = inf {s|L(s) > t} (10.10)


Repairable Systems II 199

To treat this in a practical sense, let the random variable Uk = Λ(Tk) represent
the cumulative rate function at the time of the kth failure event. It has been
shown that Uk has a gamma distribution for which the density function is

dt u k -1 - u
gUk (u) = fSk (t) = e (10.11)
du (k - 1)!

so
n -1
uk
GUn (u) = e - u å
k =0
k!

and these expressions do not depend on the specific NHPP that we wish to
analyze. We can plot GUk (u) for any value(s) of k that are interesting. Figure 10.2
shows the function for k = 2, 3, and 4, respectively. Some of the values rep-
resented by the graph are GU2 (3) = 0.8008 , GU2 ( 4) = 0.9084 , GU3 (3) = 0.5768 ,
GU3 ( 4) = 0.7619, GU4 (3) = 0.3528 , and GU4 ( 4) = 0.5665.
Recall that a gamma distribution is the convolution of identically distrib-
uted exponential random variables, so U k may be viewed as the sum of k
exponentials, each of which has λ = 1. Thus, if we take Mu to be the count-
ing process associated with those exponentials, then setting t = Λ−1(u) and
Mu = Nt defines the mapping from the NHPP to an HPP having the rate
function λ = 1.
For the example of the Weibull distribution-based NHPP, we com-
pute t = Λ−1(2) = 5146.66, t = Λ−1(3) = 5964.29, and t = Λ−1(4) = 6602.03 and we
may then say that E[N(5146.66)] = E[M2] = 2, E[N(5964.29)] = E[M3] = 3, and

G
1.0

0.8

0.6

0.4

0.2

4000
2 4 6 8

FIGURE 10.2
Distribution functions for the nonhomogeneous Poisson process cumulative intensity function
at a sequence of failure times.
200 Reliability Engineering: Probabilistic Models and Maintenance Methods

E[N(6602.03)] = E[M4] = 4. Similarly, for the gamma distribution–based example,


the inversion of the cumulative rate function is more difficult, but it is manage-
able and yields t = Λ−1(2) = 97.56, t = Λ−1(3) = 126.03, and t = Λ−1(4) = 152.60 so we
conclude that E[N(97.56)] = 2, E[N(126.03)] = 3, and E[N(152.60)] = 4.
It is the correspondence between the homogeneous and the nonhomoge-
neous processes that permits us to define the probabilities on the number of
events in Equation 10.3. The real utility of the relation is that we can some-
times use our results for renewal processes to obtain probabilities and other
information for nonhomogeneous processes such as the one used to repre-
sent minimal repair.

10.2 Imperfect Repair Models


The minimal repair models were the first to capture the feature of real oper-
ating equipment that often is not renewed by repair. They were thus the
first of the nonstationary models and provided an advance in realism. On
the other hand, most devices do not remain “as bad as old” when repaired.
Actual device behavior is often somewhere between the extremes of renewal
and simple continuation. In an attempt to make the nonstationary models
more realistic, Brown and Proschan [67] defined “imperfect repair.” Under
an imperfect repair regime, following repair, a device is renewed (as good as
new) with probability p and minimally repaired (as bad as old) with prob-
ability q = 1 − p. The result is that the probability process that represents
equipment experiences is a mixture of the renewal process and the minimal
repair NHPP.
The key to the analysis of this basic imperfect repair model is the fact that
the times at which a perfect repair is performed constitute a renewal process
that is embedded in the more general point process produced by the device
failures. The points at which a perfect repair occurs are restart points. Brown
and Proschan [67] point this out and then proceed with the analysis of the
process as follows. Denote the distribution on the time interval between per-
fect repairs by FT|p(t) and its hazard function by zT|p(t). Now, if a device has
life distribution FT(t) and only minimal repairs occur during (0, t), then at
time t, the device behaves as if it has age t. Its hazard function is zT(t), and if a
failure occurs, the associated repair will be perfect with probability p. Thus,
the conditional intensity of the occurrence of a perfect repair is
zT|p (t) = pzT (t) (10.12)
As with any hazard function, the survivor function corresponding to zp(t) is
t t

FT|p (t) = e ò0 = e ò0 ( ) = ( F (t))


- z p ( u ) du - p zT ( u ) du p p
= e - pZT (t ) = e - ZT (t ) T (10.13)
Repairable Systems II 201

Consequently, the distribution on the duration of the intervals between per-


fect repairs, which is to say renewals, is

( )
p
FT|p (t) = 1 - FT (t) = 1 - e - pZT (t ) (10.14)

This is a very useful and very general result. It is useful first because it applies
to all choices of underlying life distribution. In addition, Equation  10.12
and  those that follow it show that the distribution on the time between
perfect repairs will be of the same class as the life distribution—increasing
failure rate (IFR), increasing failure rate on average (IFRA), decreasing fail-
ure rate (DFR), and so on. Within the intervals between perfect repairs,
Equations 10.3 and 10.4 describe the probabilities on the frequency of
minimal repairs and the times between them. Finally, all of the results for
renewal processes that we have examined apply to the intervals between
perfect repairs.
Consider a simple example. Suppose that failures of a particular device
are well modeled by a Weibull distribution with parameters θ = 4000 hours
and β = 2.75. Suppose further that the probability of perfect repair is p = 0.25.
Using these values, we compute the values for the life distribution as usual
and the calculation of the times between renewals has the distribution
b
( q ) = 1 - e -0.25( t 4000 )
2.75

( )
p -p t
FT|p (t) = 1 - FT (t) = 1- e

The results of these calculations are plotted in Figure 10.3. Note that the
distribution on individual life lengths is stochastically smaller than the dis-
tribution on renewal times. Naturally, as p is increased, the times between

0.8

0.6
Pr

0.4

0.2

2,000 4,000 6,000 8,000 10,000 12,000


Time

FIGURE 10.3
Example of life and renewal time distributions under imperfect repair.
202 Reliability Engineering: Probabilistic Models and Maintenance Methods

perfect repairs tend to be shortened, and as p is decreased, the times tend to


become longer.
Given the apparent utility of the previous imperfect repair model, Block
et  al. [68] suggested that the models could be enhanced by making p age
dependent. They define the probability of a perfect repair for a device of age
t to be p(t) with the complementary probability of q(t) = 1 − p(t) for a minimal
repair. In a sequence of failures with minimal repairs, the device is consid-
ered to continue to age so the time since the last perfect repair is the age of
the device. Using the logic that is similar to but rather more intricate than
that for the simple (p, q) model, Block et al. [68] show that the distribution on
the times between perfect repairs has the same basic form as that for the case
of age-independent perfect repair probabilities. That is,

p(t) fT (t)
zT|p (t) = p(t)zT (t) = (10.15)
FT (t)

leads to the distribution


t p ( u ) fT ( u )
- ZT|p ( t )
- ò0 du
FT|p (t) = 1 - e = 1- e FT ( u )
(10.16)

As in the simpler case, the times of perfect repairs form a renewal process
and it is again the case that we can compute device behavior measures using
the methods discussed earlier. It is also often but not always the case that
the distribution on times between renewals has the same behavior as the
life distribution. Behavior is preserved for IFR, IFRA, new better than used,
DFR, decreasing failure rate on average, and new worse than used hazards
but not for new better than used in expectation.
Consider an example device for which failures are well modeled by a
Weibull distribution with parameters θ = 4000 hours and β = 2.75. Suppose
further that the probability function for perfect repair is p(t) = 1 − e−ρt where
ρ  = 0.0001. Using these parameter values, we find that the distribution on
the time between perfect repairs (renewals) is the one shown in Figure 10.4.
In addition, as noted earlier, during the intervals between renewals, we use
the cumulative intensity function (Equation 10.5) of

L(t) = ZT|p (t) (10.17)

and Equations 10.3 and 10.4 to compute specific event probabilities.


Clearly, the imperfect repair models permit us to move away from the
“pure renewal” models of device behavior. However, the use of the mixture
of repair types is rather artificial and does not really provide a true picture of
operating experiences. More recent models using what are called equivalent
age measures have been developed to try to provide more realistic represen-
tations of repair processes.
Repairable Systems II 203

0.8

0.6

0.4

0.2

2,000 4,000 6,000 8,000 10,000

FIGURE 10.4
Example of renewal time distributions under age-dependent imperfect repair.

10.3 Equivalent Age Models


In order to provide a more realistic portrayal of equipment state following
repair, two sets of authors have developed models that define an equivalent
age following repair. Kijima [69,70] defines two models for what he calls gen-
eral repair. Wang and Pham [71,72] define a “quasi-renewal” process model to
represent postrepair device state. We shall examine all of these models here.
Before examining these models, recall that the residual life distribu-
tion is the distribution on the remaining life length for a device that has
already operated for some time. In Chapter 9, we found that the residual
life distribution for a device of age τ can be defined in terms of the device life
distribution (Equation 9.34) as
FT (u + t)
Pr[U (t) £ u] = FU ( t) (u) = 1 -
FT (t)
Clearly, as with all distributions, the residual life distribution has a mean
value of E[U(τ)], and if this mean value is decreasing in τ, we say the distri-
bution is in the class of decreasing mean residual life (DMRL) distributions.
All IFR distributions are DMRL and the models developed by Kijima and by
Wang and Pham apply specifically to cases in which the underlying life dis-
tribution of the device of interest is DMRL. The condition DMRL is weaker
than IFR but implies device deterioration over time for all repair scenarios.

10.3.1 Kijima Models


The Kijima I model is constructed as follows. Let Tj, j = 1, 2, …, represent
the lengths of successive equipment operating intervals. Then assume that a
repair, say, the nth, can ameliorate the damage or aging experienced by the
204 Reliability Engineering: Probabilistic Models and Maintenance Methods

equipment during only the most recent operating interval and not any of the
damage or aging that was incurred during earlier intervals. Thus, following
the nth repair, the virtual (or equivalent) age of the device, An, is defined to be

An = An -1 + pnTn (10.18)

In this expression, πn is the fraction of the age that is accumulated during


the nth operating interval that is not “healed” by repair and is thus the addi-
tional age that is accumulated during the nth interval. We may consider that
1 − πn is the degree of repair for the nth repair action. In general, we refer to
the quantities πn as “repair effectiveness factors.” (Repair ineffectiveness fac-
tors might be more appropriate.)
The Kijima II model is similar. The difference is that the amelioration of
accumulated damage or age is assumed to apply to all of the accumulated
age. Thus, the virtual age of a device following the nth repair is taken to be

An = pn ( An -1 + Tn ) (10.19)

The selection of the values πn determines the form of these models. In both
cases, 0 ≤ πn ≤ 1 for all n. If we take πn = 0 for all n, both Model I and Model
II reduce to the renewal model. On the other hand, if we set πn = 1 for all n in
Model I, we obtain the minimal repair model. If we take πn to be a Bernoulli
random variable for each n in Model II, the result is the imperfect repair
model. Because the various choices of coefficients lead to the previous mod-
els, Kijima called his models general repair models. He also specified that in
the most general case, the coefficients πn should be taken to be random vari-
ables with any arbitrary and not necessarily identical distributions.
For all choices of the coefficients, the analysis of the Kijima models is com-
plicated. In general, for any set of coefficients, the distribution on the dura-
tion of any operating interval can be defined for both Model I and Model II
using the residual life distribution. The general form is

FT (t + u) - FT (u)
Pr[Tn £ t| An -1 = u] = FTn ( u ) (t) = (10.20)
FT (u)

where naturally we take A0 = 0. Note that the complement is

FT (t + u)
Pr[Tn > t| An -1 = u] = (10.21)
FT (u)

Now, we would like to be able to identify the probabilities and the expecta-
tions for the number of repairs over time and the time until a given number
of repairs have been made. If we let Π = {π1, π2, …} represent the sequence of
repair effectiveness factors, then we can identify algebraic representations
Repairable Systems II 205

for these measures. Specifically, we note that the sequence of random vari-
ables {An} represents the “virtual age” stochastic process and that the ran-
dom variable
n
Sn = åT j =1
j (10.22)

is the real age (or elapsed time) for the sequence of operating intervals. We
can denote the distribution on the real age by

FSn (t) = Pr[Sn £ t] (10.23)

and the number of repairs by

Nt = sup{n ³ 1|Sn £ t} (10.24)

Then, the usual relationship between the time and frequency domains
implies that

Pr[Nt ³ n] = FSn (t) (10.25)

and (in principle) we can compute expectations on the measures as

¥
E[Nt ] = åF n =1
Sn (t) (10.26)

and
¥

ò
E[Sn ] = FSn (u)du
0
(10.27)

Unfortunately, for the most general definitions of the vector Π, these mea-
sures of interest are extremely difficult to determine. As shown in the
succeeding texts, even for quite simple choices of Π, the exact analysis of
Equations 10.26 and 10.27 requires successive numerical integrations that
are intricate. Rather than attempt the difficult numerical analysis, we usu-
ally compute an upper bound on E[Sn] and use that as our key measure of
system behavior.
To appreciate the utility of the bounds, we examine the analysis of the
basic models. Assume that the factors πn are independently and identically
distributed (i.i.d.) random variables having expected value E[π]. Start with
Model I.
206 Reliability Engineering: Probabilistic Models and Maintenance Methods

Suppose that πn + 1 = π for π ∈ [0, 1] and assume that π ≠ 0 so that the applica-
tion of Equation 10.21 yields
t

ò
FAn+1 (t) = Pr[ An +1 > t] = Pr[ An > t] + Pr[pTn > t - u| An = u]Pr[ An = u]du
0

æ t-uö æ t-uö
t FT ç u + ÷ t FT ç u + ÷
è p ø è p ø
= FAn (t) +
ò
0
FT (u)
dFAn (u) = FAn (t) +
ò
0
FT (u)
f An (u) du

(10.28)
and the density function is

æ t-uö
t fT ç u + ÷
1 è p ø
f An+1 (t) =
p ò
0
FT (u)
f An (u)du (10.29)

with
1
f A1 (t) = fT (t/p) (10.30)
p
Now that the density on An is defined, we can construct the expected value of
each operating interval using Equations 10.21 and 10.29. Specifically,
¥ ¥
FT (t + u)
ò
E[Tn | An -1 = u] = Pr[Tn > t| An -1 = u]dt =
0
ò
0
FT (u)
dt (10.31)

and
¥

ò
E[Tn ] = E[Tn | An -1 = u] f An-1 (u)du
0
(10.32)

Clearly, one possible approach to computing E[Sn] is as the sum of the


terms E[Tj].
Kijima [69] suggests that Equation 10.28 be restated as
t

FAn+1 (t) =
ò g ( u) f
0
t An (u)du (10.33)

where

ì t-u
ï FU ( u ) ( ) u£t
g t ( u) = í p (10.34)
ïî 1 u>t
Repairable Systems II 207

He then notes that when the elements of the vector of repair effectiveness
factors are i.i.d. random variables having distribution function FΠ(π), the sto-
chastic process, {An} having A0 = 0, is a Markov process with the following
transition probability function:
1

Pr[ An +1 £ t| An = u] =
ò (1 - g (u)) f (p)dp
0
t P (10.35)

Using this result and the assumption that the underlying life distribution is
DMRL, Kijima shows that bounds for E[Sn] can be obtained. A lower bound
is obtained when it is assumed that πn = π for all n and π is simply a constant.
In that case, successive evaluation of the integrals stated in Equations 10.31
and 10.32 are demanding but possible. For an upper bound for E[Sn], we use
the corresponding quantity for the imperfect repair case. That is, taking the
πn to be Bernoulli with E[πn] = p, we have the imperfect repair case. For that
model, we can compute E[Sn|p] using the following recursion:
v(m, n) = m m + (1 - p)v(m, n - 1) + pv(m + 1, n - 1) (10.36)

where v(m, n) is the expected value of Sn for a device that is subject to an imper-
fect repair regime and has already had m − 1 minimal repairs. The quantity μm
is the mean length of m intervals when only minimal repair is used and p is the
probability that repair is perfect (π = 0). The boundary conditions for the recur-
sion are v(m, 1) = μm and the values v(1, n) = E[Sn|p] provide the upper bounds.
The interpretation of Equation 10.36 is that the expected value of the sum
of the lengths of the next n operating intervals for a device that has had m
minimal repairs so far is the expected value length of the next operating
interval, μm, plus the expected value of the sum of the lengths of the following
n − 1 intervals, for which the number of minimal repairs will be m + 1 with
probability p and m with probability 1 − p. The upper bound value that we
obtain using Equation 10.36 applies to the “general repair case” in which the
distribution on the πn has expected value E[πn] = p.
Based on our understanding of the minimal repair case, we can compute
the value of the mean length of any of the operating intervals under minimal
repair using Equation 10.9 as
¥
1
ò ( Z ( u) )
m -1
mm = T e - ZT ( u )du (10.37)
G(m)
0

That is, μm is really just E[Tm]. This quantity forms the basis for the numerical
computations for the recursion of Equation 10.36.
Consider an example. Suppose the underlying life distribution for a device
is Weibull with β = 2.75 and θ = 4000 hours. In this case, we calculate
b 2.75
ætö æ t ö
ZT (t) = ç ÷ = ç ÷
è q ø è 4000 ø
208 Reliability Engineering: Probabilistic Models and Maintenance Methods

so the evaluation of Equation 10.37 is straightforward and yields the


entries of Table 10.2.
Notice the clearly decreasing values of these mean residual life lengths.
Using the values of Table 10.2, we can solve the recursion equation of
Equation 10.32 to obtain the values of E[Sn|p], which are our upper bounds on
E[Sn] for the general repair model. Example results are shown in Table 10.3.
As indicated previously, the nested integrals make the calculation of the
lower bounds very difficult. Using Equations 10.32 and 10.29, the calculation
of E[T2] is manageable. We find that for π = 0.50, E[T2] = 760.168; for π = 0.75,
E[T2] = 591.923; and for π = 0.90, E[T2] = 514.559. The corresponding lower
bounds on E[S2] are obtained by adding the known value of E[T1] = 3559.43
to these quantities to obtain E[S2] ≥ 4319.60, E[S2] ≥ 4151.35, and E[S2] ≥ 4073.99,
respectively.

TABLE 10.2
Values of the Mean Residual Life
under Minimal Repair
n μn
1 3559.43
2 1294.34
3 882.50
4 695.31
5 584.69
6 510.27
7 456.15
8 414.69
9 381.70
10 354.71

TABLE 10.3
Values of E[Sn|p] for Minimal Repair as Upper Bounds
on E[Sn] for General Repair
n E[πn] = p = 0.90 E[πn] = p = 0.75 E[πn] = p = 0.50
1 3559.43 3559.43 3559.43
2 5080.27 5420.04 5936.81
3 6063.38 6624.28 7743.96
4 6823.51 7527.55 9092.12
5 7457.78 8266.61 10179.47
6 8009.35 8902.29 11093.31
7 8501.38 9465.85 11886.39
8 8948.05 9975.54 12591.67
9 9358.77 10443.04 13230.49
10 9740.17 10876.37 13817.18
Repairable Systems II 209

The extension of our analysis to the Kijima II model is reasonably direct.


We start with the basic model stated in Equation 10.19 and assume the πn are
i.i.d. random variables. We then use Equation 10.21 again to obtain
t/p
é tù é t ù
FAn+1 (t) = Pr[ An +1 > t] = Pr ê An > ú +
ë p û ò Pr êëT
0
n >
p
- u| An = u ú Pr[ An = u]du
û
t/p
FT ( t/p ) FT ( t/p )
t
ætö
= FAn ç ÷ +
èpø ò
0
FT (u)
dFAn (u) = FAn (t) +
ò
0
FT (u)
f An (u) du (10.38)

and the density function is


t /p
1 ætö f An (u)
f An+1 (t) = fT ç ÷
p èpø ò0
FT (u)
du (10.39)

with
1 ætö
f A1 (t) = fT ç ÷ (10.40)
p èpø

as before. The general statements of the expected values for the Sn of


Equations  10.31 and 10.32 are the same for Model II as for Model I. Their
analysis is no less complicated for Model II so our approach is the same.
To compute upper bounds for the E[Sn], we again use the corresponding
quantity for the imperfect repair case. We take the πn to be Bernoulli with
E[πn] = p and we compute E[Sn|p] using the following revised recursion:

v(m, n) = m m + (1 - p)v(1, n - 1) + pv(m + 1, n - 1) (10.41)

where the values v(1,n) = E[Sn|p] are the upper bounds on E[Sn] for the general
repair case when the πn are i.i.d. and the distribution on the πn has expected
value E[πn] = p.
For Model II, we also note that repeated substitution within the recursion
Equation 10.41 leads to the relation that under imperfect repair, when n ≥ 2,

n -1
E[Tn ] = E[Sn ] - E[Sn -1 ] = (1 - p) åp
j =1
j -1
m j + p n -1m n (10.42)

The quantities μn in each of these expressions are the same as for Model I.
They are the successive mean residual life lengths for a device subjected to a
minimal repair regime.
210 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 10.4
Values of E[Sn|p] for Minimal Repair as Upper
Bounds on E[Sn] for General Repair
n E[πn] = p = 0.90 E[πn] = p = 0.75 E[πn] = p = 0.50
1 3559.43 3559.43 3559.43
2 5080.27 5420.04 5986.31
3 6267.53 7048.99 8310.23
4 7318.33 8598.97 10610.76
5 8296.54 10113.95 12904.37
6 9230.82 11611.27 15195.65
7 10136.33 13098.95 17486.09
8 11022.02 14581.11 19776.21
9 11893.50 16059.96 22066.19
10 12754.52 17536.78 24356.13

Consider the same example as the previous one. When the life distribution
is Weibull with β = 2.75 and θ = 4000.0, the mean residual life values are those
listed in Table 10.2 and the upper bounds on E[Sn] defined in Equation 10.41
are shown in Table 10.4. Here again, the computational effort associated with
the computation of the lower bounds is excessive.
In summary, we may observe that the Kijima models are very appealing as
they provide a significantly more realistic image of the state of equipment follow-
ing repair. Unfortunately, the models are correspondingly difficult to analyze.
On the other hand, it is not difficult to simulate the models and to use the simu-
lation output to describe system behavior. This approach is now widely used.

10.3.2 Quasi-Renewal Process


An alternative approach to modeling the postrepair state of a device is sug-
gested by Wang and Pham [71]. Rather than directly adjusting the age of the
unit following repair, Wang and Pham adjust the life distribution. For the
sequence of operating intervals, T1, T2, T3, …, each of which ends with device
failure and repair, they assume that

Tn = a n -1X n (10.43)

where α (>0) is a constant that alters the scale of the distribution and the Xn
are i.i.d. random variables. Under this definition, the sequence {Tn} is said
to form a quasi-renewal process. Actually, Lam [73,74] and Finkelstein [75]
studied the same process earlier and called it a geometric process. Clearly, if
α = 1, the sequence is a renewal process. An interesting feature of this model
is that a choice of α < 1 implies that the operating intervals are decreasing in
magnitude as might occur with aging and deterioration. On the other hand,
a choice of α > 1 might be used to represent an ongoing improvement in the
Repairable Systems II 211

α1–n f t α1–n
0.06

0.05

0.04

0.03

0.02

0.01

t
100 120 140 160

FIGURE 10.5
Normal density functions on Tn for n = 4, 8, and 12.

quality of replacement units with a corresponding gradual increase in the


duration of operating intervals. This is illustrated in Figure 10.5 for α = 1.04
applied to normal density having μ = 100 and σ2 = 6.
For the models developed here, we assume that α < 1 as the unit of
interest is aging. The expressions developed apply to other choices of the
parameter, but this is not pursued here. For any particular choice of α, we
find that

æ t ö
FTn (t) = FX ç n -1 ÷
èa ø
1 æ t ö
fTn (t) = n -1 f X ç n -1 ÷ (10.44)
a èa ø
n -1
E[Tn ] = a E[X ]

Now, the similarity to the renewal process permits us to construct some


descriptive expressions for the model. In particular, we again define Sn to be
the sum of the Tj and Nt to be the number of failures during (0, t]. An example
of the behavior of FSn (8000) is shown in Figure 10.6 for d = 0.95 when applied
to a normal distribution having μ = 2500 and σ2 = 400. Then, in contrast to the
Kijima models, we can immediately compute

n n
1 - an
E[Sn ] = å
j =1
E[Tj ] = E[X ] åaj =1
j -1
=
1- a
E[X ] (10.45)

In addition, we use the usual time frequency relationship

Pr[Nt ³ n] = Pr[Sn £ t]
212 Reliability Engineering: Probabilistic Models and Maintenance Methods

F(8000)
1.0

0.8

0.6

0.4

0.2

n
2 3 4 5 6

FIGURE 10.6
Normal distribution functions on Sn at 8000 hours as n increases.

and the same logic that we used for the renewal process to construct the
quasi-renewal function
¥
QFT (t) = E[Nt ] = åF
n =1
Sn (t) (10.46)

Of course, the distributions FSn (t) are all distinct as they are convolutions of
distinct distributions from a common class. However, since the process {Tn}
is constructed so regularly, the Laplace transforms for the distinct distribu-
tions and hence for the quasi-renewal function are readily constructed as

FT*n (s) = FX* (a n-1s) (10.47)

and
¥ ¥ n

QF*T (s) = å
n =1
FS*n (s) = å Õ F * (a
n =1 j =1
X
j -1
s) (10.48)

Also, as with the renewal process, the derivative of the quasi-renewal func-
tion is the quasi-renewal intensity function
¥ ¥ n

qF*T (s) = å
n =1
fS*n (s) = å Õ f * (a
n =1 j =1
X
j -1
s) (10.49)

As in the case of the renewal model, it is usually quite difficult to obtain


numerical values for the device performance measures. As indicated earlier,
Equation 10.45 permits us to compute E[Sn]. On the other hand, the quasi-
renewal function tends to be quite difficult.
Repairable Systems II 213

Consider an example. If the life distribution is normal with mean μ


and standard deviation σ, then using the second of the relationships in
Equation 10.44, we find that each of the Tn has a normal distribution:

- ( t/a n-1 - m )
2
1 æ t ö 1 2 s2
fTn (t) = n -1
f X ç n -1 ÷ = e
a èa ø a n - 1
2ps 2

( )
2
1 - t - ma n-1 2( a n-1s )2
= e
n -1
2p(a s) 2

but with a mean value of αn−1μ and a standard deviation of αn−1σ. We can thus
construct the convolutions of the distributions on the Tn to obtain the distri-
butions on the Sn. We do this most easily using the Laplace transforms. For
the normal life distribution,
s2 s2
- sm +
f X* (s) = e 2

so
s2 ( a n-1s )2
- sa n-1m +
fT* (s) = e
n
2

and

n
s2 s2 n æ 1- a n ö s 2 s 2 æ 1- a 2 n ö
n n
- sa j-1m +
s2 ( a j-1s )2 - sm å a j-1 + ( a j-1 )2
å - sm ç ÷+ ç ÷

Õ f * (s) =Õ e
2 j=1 ç 1- a ÷ 2 çè 1- a 2 ÷ø
fS*n (s) = Tn
2 =e j =1
=e è ø

j =1 j =1

and we can conclude that the distributions on the Sn are normal with mean
1
1 - an æ 1 - a 2n ö 2
m and standard deviation ç 2 ÷
s . Then, we observe that as n
1- a è 1- a ø
increases, both the mean and the standard deviations values converge—
often quite rapidly. In addition, for any time interval, t, the sum of the quan-
tities FSn (t) will also converge. Algebraically, it is difficult to identify the
convergent form, but we can easily compute approximate values of the limit
numerically. As a general observation, the closer the value of α is to 1.0, the
more quickly the values FSn (t) decline and, therefore, the more quickly the
numerical convergence of the sum.
Suppose the underlying life distribution has μ = 4000.0 and σ = 1000.0.
The corresponding (approximate) limits for several values of α are shown
in Table 10.5. As the value of α is decreased, the number of terms that must
be computed increases. For example, for α = 0.95 and a time of 5,000 hours,
Q(5,000) is determined using only 2 terms FSn (t), and Q(10,000) requires
214 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 10.5
Values of Q(t) for the Quasi-Renewal Model
with Normal Life Distribution
t α = 0.95 α = 0.85 α = 0.75
1,000 0.001 0.001 0.001
2,500 0.067 0.067 0.067
5,000 0.863 0.875 0.897
8,000 1.577 1.741 1.992
10,000 2.145 2.447 3.084

6 terms. On the other hand, for α = 0.75, computing Q(5,000) requires 6 terms
and computing Q(10,000) requires 200 terms.

10.4 Conclusion
As in the previous chapter, the models presented earlier serve to highlight the
questions one should consider in the study of repairable systems. The mod-
els in this chapter focus on the more realistic representation of the postrepair
equipment state. By treating repair as instantaneous, we obtain the simplest
model forms possible and develop the methods best suited to their analysis.
With these methods now defined, we are ready to move on to the investiga-
tion of equipment performance when repair is noninstantaneous.

Exercises
10.1 Suppose that the minimal repair model is to be applied to a device
for which the underlying life distribution is a gamma distribution.
For each of the parameters pairs obtained by crossing β = 2, 3, 5, and 8
with λ = 0.05 and 0.005, compute E[T] and then compute Λ(0.5E[T]),
Λ(1.5E[T]), and Λ(2.5E[T]).
10.2 Assume the operation of a device is to be represented using the mini-
mal repair model with a Weibull life distribution having β = 1.5 and
θ = 5000.0 hours. Compute Pr[T4 − T3 > 1200] for T3 values of 6,000,
9,000, and 12,000 hours. Also compute Pr[Nt = 4] for times of 6,000,
9,000, and 12,000 hours. Finally, compute Pr[Nt = 4], Pr[N4000 − N1000 > 2],
Pr[N6000 − N3000 > 2], and Pr[N8000 − N5000 > 2].
10.3 Plot the cumulative intensity function for the minimal repair model
for the case in which the underlying life distribution is gamma with
β = 2 and λ = 0.20.
Repairable Systems II 215

10.4 Compute and plot the distribution on the time between device renew-
als for the imperfect repair model having a Weibull life distribution
with β = 1.5 and θ = 5000.0 hours and p = 0.4. Then compute compari-
son values of FT|p(t) and FT(t) for t = 2,000, 5,000, 10,000,  and 20,000.
10.5 Resolve Problem 10.4 using the time-dependent form of the renewal
probability p(t) = 1−e−ρt with ρ = 0.0005. (Hint: The function p(t) is not a
CDF and should be used as stated.)
10.6 Compute E[T4−T3] and E[T5−T4] for the gamma distribution–based
NHPP where β = 2 and λ = 0.008.
10.7 Compute E[T4−T3] and E[T5−T4] for the gamma distribution–based
NHPP where β = 4 and λ = 0.005.
10.8 Return to your analysis of the Weibull renewal process and use your
coefficients to obtain a plot of the approximation to the convolutions
FT( k ) (t) for the distribution having β = 2.75 and θ = 4000. Do this for
k  =  1, 3, and 5, and in each case, carry the plot to 8,000 and 12,000
hours. Comment on your results.
1 æyö
10.9 Prove that in general, f cX (cx) = f X ç ÷ and construct the realization
c ècø
of this relationship for the Weibull distribution.
10.10 Prove that an IFR distribution is DMRL.
10.11 Replicate the bounds on E[Sn] for the Kijima I stated in Table 10.3 by
computing the values of E[Sn, p] using Equation 10.32 for the case in
which p = 0.90.
10.12 Suppose the Weibull life distribution for the device subjected to mini-
mal repair has shape parameter β = 0.75. Compute the first 10 values
of the mean residual life and the resulting upper bounds for E[Sn] for
general repair when θ = 4000 and p = 0.90. How do these values com-
pare to those of Problem 10.11? Why?
10.13 Assume that the base distribution for a quasi-renewal process, FX(x),
is normal with μ = 2500 and σ = 600. Construct the algebraic form of
the quasi-renewal function and evaluate it using α = 0.98 at t = 2600,
t = 5000, and t = 7500.
10.14 For the quasi-renewal model having α = 0.90, assume FX(x) is a gamma
distribution having β = 2.0 and λ = 0.05. Construct general expressions
for fTn (t) and FTn (t) and plot these functions for n = 2, n = 4, and n = 6.
Next, construct the Laplace transform for fSn (t) and use it as a basis for
constructing FSn (t). Finally, use your expressions to compute values of
the quasi-renewal function at t = 25 and t = 100 using 2 or 3 truncation
levels. Comment on your results.
11
Availability Analysis

The models and results described in Chapters 9 and 10 are constructed with-
out considering the duration of the repair activity. In many situations, the
focus of our analysis is upon questions for which the answers do not depend
upon the duration of the repair process. This is especially true when the
time a system spends down is relatively less important than the fact that a
failure has occurred and when the duration of repair is small or negligible
in comparison to the device life length. In contrast to such devices, there are
components and systems for which the duration of the repair activity has an
impact on the meaningful device performance measures. For items of this
sort, we must include repair times in our models.
Naturally, when we include repair time in our models, we may represent
the possible repair durations in whatever manner seems most representa-
tive of actual experience. In some cases, repair time is taken to be a constant,
while in most cases, repair time is treated as a random variable and a specific
distribution is selected to portray the dispersion in repair times. We will
consider both of these possibilities.
We may consider that a typical sample path for a device is one in which
periods of operation are terminated by device failure and that therefore a
repair period follows each failure. Upon completion of the repair, the device
is placed in operation again. A representative sample path is shown in
Figure 11.1. Note that we will modify the labels soon but for now, we indicate
that the periods of repair have durations Rj while the operating periods are
labeled as Tj. Observe also that the device’s state is denoted by Xi and that the
value of the state variable is shown to be 1 when the device is operating and
0 when it is being repaired. Note further that each pair of periods, operating
and repair, is shown to have total duration Vj where

Vj = Tj + Rj (11.1)

It is reasonable to consider that each interval that includes an operating


period and a repair period is a “cycle.”
In the particular case in which all of the operating times Tj are random
variables with a common probability distribution and all of the repair times
Rj are random variables with a common probability distribution, the series
Vj forms a renewal process. This is because the intervals of operation Tj form
a renewal process as do the intervals of repair Rj and thus those of the Vj
form an alternating renewal process, which is also a renewal process. For the

217
218 Reliability Engineering: Probabilistic Models and Maintenance Methods

1
State

T1 R1 T2 R2

0
0 V1 V2
Time

FIGURE 11.1
Representative sample path.

majority of this chapter, the cycles are assumed to form a renewal process
and the types of analysis that can be performed are discussed. Toward the
end of the next chapter, we examine the relatively few results that have been
constructed for the nonrenewal case.
To study the cycles in the renewal case, we first modify the definition of the
quantities Vj. In general, our analyses are based on an interest in the times of
device failure rather than in times of device restart. Therefore, we modify the
labels on the sample paths so that the Vj corresponds to failure times. This
is illustrated in Figure 11.2. The difference is quite subtle. The change is that
the values of the Vj now correspond to failure times. That is,

V1 = T1
V2 = R1 + T2
(11.2)

Vj = Rj -1 + Tj

Using this definition, the series Vj forms a “modified” renewal process


because the distributions for all of the Vj other than V1 are the same and that
for V1 is different.

1
State

T1 R1 T2 R2

0
0 V1 V2
Time

FIGURE 11.2
Representative sample path with revised labeling of the Vj.
Availability Analysis 219

As a matter of convention, let the distributions FTj (t) represent the distribu-
tions on the lengths of the operating intervals (which could be just the life
distribution), and let GRj (t) represent the distributions on the lengths of the
repair intervals. In that case, the distributions on the durations of the cycles
are constructed as the convolution of FTj (t) and GRj (t) and are denoted by
HVj (t). Thus, in general,
t

ò
HVj (t) = FTj (t) * GRj-1 (t) = GRj-1 (t - u)dFTj (u)
0
(11.3)

When the distributions are common for all cycles, the subscript “j” is
dropped.
An interesting consequence of the relabeling of the sample path is that
the renewal functions for the expected number of failures and the expected
number of repairs are different. We will exploit that difference in our analy-
sis later. For now, note that the Laplace transform for the expected number of
repair completions during the interval (0, t) is given by

fT* (s) g R* (s)


MG*R (s) = (11.4)
(
s 1 - fT* (s) g R* (s) )
while the transform for the expected number of failures during an interval
(0, t) is

fT* (s)
MH*V (s) = (11.5)
(
s 1 - fT* (s) g R* (s) )
The expected number of failures and the expected number of points Vj are
the same. These arise from the applicable forms of the key renewal theorem
that are as follows:
t

ò
MGR (t) = HV (t) + MGR (t - u)hV (u)du
0
(11.6)

for the expected number of repair completions and

ò
MHV (t) = FT (t) + MHV (t - u)hV (u)du
0
(11.7)

for the expected number of failures.


220 Reliability Engineering: Probabilistic Models and Maintenance Methods

11.1 Availability Measures


Regardless of whether or not the cycles form a renewal process, the fact that
there are periods during which the device is not functioning leads naturally
to the definition of “availability” as a measure of device performance. In fact,
four distinct but related availability measures have been defined. The basic
one is described here.

Definition 11.1

The (point) availability at time t, A(t), for a device is the probability that it is
functioning (X(t) = 1) at the time. Thus,

A(t) = Pr[X(t) = 1] = E[X(t)] (11.8)

The distinction that this is the point availability is important. Nevertheless,


A(t) is usually called simply the availability and this is done here unless
doing so is ambiguous. The other three availability measures are as follows:

Definition 11.2

The limiting availability for a device is the limit of the point availability func-
tion. That is,

A¥ (t) = lim A(t) (11.9)


t ®¥

The limiting availability can be very useful. As we will see, many devices
experience an interval of transience before they “settle down” into a consis-
tent pattern of operation. At that point, the devices often display availability
behavior that is stable and similar (or equal) to the limiting form. In addi-
tion, there are many analytical cases in which the point availability is very
difficult (or impossible) to compute but the limiting availability measure is
manageable.
It may also be useful at certain times to compute averages, so we have the
following definition:

Definition 11.3

The average availability over an interval (t1, t2) for a device is


t2
1
A(t1, t2 ) =
t2 - t1 ò A(u)du
t1
(11.10)
Availability Analysis 221

The average availability is often evaluated for t1 = 0. In addition, we have the


limiting average availability.

Definition 11.4

The limiting average availability for a device is the limit of the average avail-
ability over (0, t). That is,

A¥ = lim A(0, t) (11.11)


t ®¥

Each of the four measures has utility in specific cases. In general, the point
availability is the most informative measure but it is usually the most difficult
to obtain. The most commonly used of the measures is the limiting avail-
ability. The primary reason for its popularity is that in the case in which the
operating and repair cycles form a renewal process, it is easily computed as

EF [T ] MTTF
A¥ = = (11.12)
EF [T ] + EG [T ] MTTF + MTTR

where
MTTF is the “mean time to failure”
MTTR is the “mean time to repair”

Of course, when no repair is possible, the availability measures reduce to


their corresponding reliability terms.
For most applications, we should find that A¥ = A¥. Also, A(t1, t2 ) is the
proportion of the time the device operates during the interval (t1, t2). A fur-
ther point that is somewhat subtle is the fact that for a population of devices,
each of the availability measures is an expected value relative to the num-
ber (or  proportion) of the members of the population that are operational.
The distribution on the frequency for which the availability measures are
expected values is the binomial [76].
Given the elementary definitions stated earlier, we may proceed to con-
struct an availability function. The analysis of availability can become quite
intricate, so it is usually performed relative to a very carefully defined state-
ment of the assumed operating and repair scenario. This is because very
minor differences in operation plan and particularly in the assumed failure
and repair distributions may have quite pronounced effect on the identity of
the appropriate model and its solution.
For now, we assume that each repair operation returns a device to a
“good as new” state so that all of the operating periods have the same dis-
tribution on duration. Similarly, we assume that all of the repair activities
have a common distribution on duration. Therefore, as noted previously,
222 Reliability Engineering: Probabilistic Models and Maintenance Methods

each cycle has the same probability distribution on its length and the cycles
form a renewal process. Therefore, the availability function may be defined
as the sum of two probabilities:

1. The probability that the device has never failed and is thus still
functioning
2. The probability that a new cycle was started at some recent point in
time and no failure has occurred since then

Algebraically, this is
t

ò
A(t) = FT (t) + FT (t - u)mHV (u)du
0
(11.13)

where, as indicated in this equation, HV(t) is the convolution of the distribu-


tions on operating intervals and on repair times. Note particularly the simi-
larity of the availability function to the key renewal theorem.
Generally, when we can, we use Laplace transforms to obtain the final
form of the availability function. In that case, we construct

A* (s) =
1
s
æ1 ö 1
( )(
- FT* (s) + ç - FT* (s) ÷ m*HV (s) = 1 - fT* (s) 1 + mH* V (s)
ès ø s
) (11.14)

and the inverse transform gives the availability function. Note that
Equation 9.16 of Chapter 9 implies that we may use

A* (s) =
1
s(1 - fT* (s) ) *
1
1 - fT (s) g R* (s)
(11.15)

instead of Equation 11.14, especially if it is easier to avoid determining the


specific form of the renewal density. In addition, as indicated previously,
we can exploit the difference in the repair completion and the failure time
renewal functions to obtain an additive form. Specifically,

1
A* (s) = MG*R (s) - MH* V (s) + (11.16)
s

The advantage of this form is that its inverse is

A(t) = MGR (t) - MHV (t) + 1 (11.17)

Also, we again avoid specifically constructing the renewal density.


Availability Analysis 223

11.2 Example Computations


Nearly all of the possible realizations of the availability model are algebra-
ically and computationally quite difficult. The exception is the case in which
both the life distribution and the repair time distribution are exponential
and the repairs return the system to a “good as new” state. In order to dem-
onstrate the method of analysis, the availability function for this renewal
case is constructed here. Following that analysis, a more difficult case is
examined.

11.2.1 Exponential Case


To start, assume the exponential life distribution has parameter λf and the
exponential repair time distribution has parameter λr. It is worthwhile to
construct the distribution on cycle length as this illustrates the process of
constructing convolutions. As we know,

lf lr l f lr
hV* (s) = fT* (s) g R* (s) = =
s + l f s + l r s2 + (l f + l r )s + l f l r

For this transform, we use the method of partial fractions to solve

1 A B
= +
s2 + (l f + l r )s + l f l r s + l f s + l r

for the forms

1 -1
A= and B =
lr - l f lr - l f

Thus, the inverse of the transform is

hV (t) =
l f lr
lr - l f
(
-l t
e f - e -lr t )
with the corresponding distribution function

HV (t) = 1 -
(l er
-l f t
- l f e -lr t )
lr - l f

Observe that this is neither an exponential nor a gamma distribution but that
it is a proper distribution function and it provides an unambiguous model of
224 Reliability Engineering: Probabilistic Models and Maintenance Methods

the dispersion in the lengths of the cycles of operation and repair. Naturally,
it also correctly yields
EHV [T ] = EF [T ] + ER [T ]

Next, in order to construct the availability function, we use Equation 11.15:


-1
1
(
A* (s) = 1 - fT* (s)
s
)
*
1 1æ
= çç 1 -
1 - fT (s) g R* (s) s è
lf
s+lf
öæ
÷÷ çç 1 -
øè
lf lr
s + l f s + lr
ö
÷÷
ø

which we can reduce to


s + lr
A* (s) =
s(s + l f + l r )

Using the partial fraction expansion,


s + lr A B
= +
s(s + l f + l r ) s (s + l f + l r )

we obtain
lr lf
A= and B =
lr + l f lr + l f

and the inverse transform is

lr lf - ( l + l )t
A(t) = + e f r
l f + lr l f + lr

This result is quite useful for examining the behavior of the availability
measures for the renewal case. Note first that the availability function is
comprised of a constant term and a term that diminishes over time so the
function displays an initial transience and then settles down to a stable and
essentially constant value. This is illustrated in Figure 11.3 for λf = 0.01 and
λr = 0.1. The limiting value is
lr
A¥ = = 0.909
l f + lr

Note also that we may use Equation 11.5 to compute

A(t1 , t2 ) =
lr
+
lf
l f + l r (t2 - t1 )(l f + l r )2
(
- ( l + l )t - ( l + l )t
e f r 1 -e f r 2 )
so, for example, A(10, 20) = 0.927 . Also, we may note that A¥ = A¥.
Availability Analysis 225

0.945

0.94

0.935
Availability

0.93

0.925

0.92

0.915

20 40 60 80 100
Time

FIGURE 11.3
Availability function example.

Examine the limiting availability that we obtained for the exponential


case. Specifically, we can restate the limiting availability as

lr 1/EG [T ] EF [T ]
A¥ = = =
l f + l r 1/EF [T ] + 1/EG [T ] EF [T ] + EG [T ]

which is the form given in Equation 11.12.


As we have seen earlier, the exponential model provides a relatively
simple illustration of all of the basic availability analyses. It must be
emphasized that other choices of life and repair time distributions can
be analyzed but the other cases often require considerable computational
effort.
On the other hand, there are quite a number of nonexponential cases in
which the bounds described in Chapter 9 can be used to provide information
on device performance. For cases in which the life distribution and the repair
time distribution are both increasing failure rate (IFR), the distribution on
cycle length, HV(t), will also be an IFR. In that case, we can compute bounds
on the lengths of cycles and on the number of cycles using Equations 9.25
through 9.29 and 9.32.

11.2.2 Numerical Case


If the distribution that best represents the life lengths of a device is not expo-
nential, the computation of the point availability function can be quite taxing.
However, given modern computing power, the calculations are frequently
manageable. The advantages of the capabilities of modern computers and
226 Reliability Engineering: Probabilistic Models and Maintenance Methods

software should not be taken lightly as many of the interesting results we


can now obtain were completely impossible in the past. In any case, the com-
putation of the point availability function may be pursued in any of several
ways. One approach is to construct the relevant Laplace transforms and then
use numerical algorithms to invert A* (s). An alternate approach is to directly
perform the integration of Equation 11.13 numerically.
Consider an example. Suppose the life lengths for a population of a device
are well represented by a Weibull distribution having parameters β = 2.0 and
θ = 200 hours. Suppose further that repair of the device involves replacement
and takes a random time for which an exponential distribution with λ = 0.10
provides a reasonable model. For these distributions, we first note that

EF [T ] 177.245
A¥ = = = 0.94659
EF [T ] + EG [T ] 177.245 + 10.0

Next, we can evaluate the point availability by performing the numerical


integration of Equation 11.13 in which we make the substitution:

¥ t

mHV (t) = åò
k =1 0
f F( k ) (u) g R( k ) (t - u)du (11.18)

and then replace fT( n ) (t) with its approximate form as defined in Appendix B.
The resulting values of the function are fully accurate until we introduce
the numerical error associated with the integration and the error resulting
from the finite truncation of the infinite sums. The actual numerical effort
involved in this analysis is quite taxing. However, it is manageable. Several
of the values of the point availability obtained for the given distributions
are shown in Table 11.1. The values show a transient interval and a relatively
rapid convergence to the limiting value. In view of the fact that both the

TABLE 11.1
Computed Values of the Availability
Function for Weibull Failures
and Exponential Repairs
Time A(t)
50 0.9807
100 0.9614
150 0.9498
200 0.9454
250 0.9451
300 0.9466
350 0.9466
400 0.9466
Availability Analysis 227

Weibull life distribution and the exponential repair time distribution cen-
ter around exponential terms, the rapid convergence is exactly the type of
behavior one expects to see.

11.3 System-Level Availability


Naturally, it is most often the case that we are interested in the availability
of a system for which repair involves “servicing” one or several components
of a multicomponent system. Examples of systems for which availability is
important and repair frequently involves component or module replacement
include aircraft, computers, production equipment, and motorized vehicles.
For the present analysis, consider that the repair activity involves the replace-
ment of a component. Of course, there are very many possible operating sce-
narios each of which may involve distinct models. Consider two cases here.
For the first case, suppose that a system is comprised of independent com-
ponents arranged in an arbitrary configuration and that each component has
an associated life distribution and repair time distribution. Also assume that
components are replaced on failure but that while any single component is
being repaired, the other components continue to operate and thus to age.
For the second case, we will assume a series system in which the functioning
components do not age during the replacement of a failed component.
The first case fits very well with the reliability and system status con-
struction we studied earlier. Specifically, suppose that for component j, the
life distribution is Fj,T(t) and the repair time distribution is Gj,R(t) so that the
renewal function for that component is MH j ,V (t). Then, the availability func-
tion for the component is Aj(t) where as before

A j (t) = Pr[X j (t) = 1]

The availability function Aj(t) is computed as the solution to Equation 11.13


without regard to the other components. Then, since the system status is

AS (t) = E[f(X(t))] = f(E[X(t)]) (11.19)

the system’s availability function are

AS (t) = f( A1(t), A2 (t), … , An (t)) (11.20)

Thus, we compute the system availability in exactly the same manner as


our calculation of system reliability. All of the computation rules we devel-
oped in Chapter 3 including the rules for calculating bounds for general
228 Reliability Engineering: Probabilistic Models and Maintenance Methods

structures apply. In addition, these rules apply to the limiting availabilities


as well as to the point availabilities.
Consider two examples. Suppose that each of the components of a five
component series structure has an exponential life distribution with the
parameters λf1 = 0.833, λf2 = 1.250, λf3 = 2.500, λf4 = 1.111, and λf5 = 0.200 and that
for each component, the distribution on repair time is exponential with the
parameters λr1 = 0.0625, λr2 = 0.040, λr3 = 0.100, λr4 = 0.025, and λr5 = 0.050. Then,
we know from our earlier analysis that

l jr l jf - ( l + l )t
A j (t) = + e jf jr
l jf + l jr l jf + l jr

and

l jr
A j¥ =
l jf + l jr

Therefore, since the system is a series system,


5 5

Õ Õl
l jr
AS¥ = A j¥ = 0.800) = 0.678
= (0.930)(0.969)(0.962)(0.998)(0
j =1 j =1 jf + l jr

and the point availability function is the one shown in Figure 11.4. We can see
that the product form of the structure function implies a fairly rapid decline
in the value of the exponential term and the system thus “settles down” into
its limiting behavior in a short time.
As an example of the more complicated analysis we might encounter, con-
sider the bridge structure of Chapter 2 and assume each of the components

0.9

0.85

0.8

0.75

10 20 30 40

FIGURE 11.4
Point availability function for a series system example.
Availability Analysis 229

has a Weibull life distribution with parameters β = 2.0 and θ = 200 hours and
an exponential repair time distribution with parameter λ = 0.10. As we have
already computed the availability for a single device with these character-
istics, we know that Aj(50) = 0.9807 and Aj(200) = 0.9454. We may also recall
that the series–parallel bounds, the min path upper–min cut lower bounds,
and the minimax bounds on reliability for the bridge structure with identical
components are

r 5 £ Rs £ 1 - (1 - r )5

(1 - (1 - r) ) (1 - (1 - r) ) ( )
2 2
2 3
£ Rs £ 1 - (1 - r 2 )2 (1 - r 3 )2

and

r 2 £ Rs £ 1 - (1 - r )2

respectively. Replacing the reliabilities with availability terms yields

( Aj (t)) £ As (t) £ 1 - ( 1 - A j (t) )


5 5

(1 - (1 - A (t)) ) (1 - (1 - A (t)) ) £ A (t) £ æçè 1 - (1 - ( A (t)) ) (1 - ( A (t)) ) ö÷ø


j
2 2
j
3 2
s j
2 2
j
3 2

and

( Aj (t)) £ As (t) £ 1 - ( 1 - A j (t) )


2 2

Therefore, we can compute the availability bounds:

0.9072 £ As (50) £ 1.0

0.9992 £ As (50) £ 0.9999

0.9618 £ As (50) £ 0.9996

and

0.7552 £ As (200) £ 1.0

0.9937 £ As (200) £ 0.9997

0.8938 £ As (200) £ 0.9970


230 Reliability Engineering: Probabilistic Models and Maintenance Methods

Of course, we also have the limiting availabilities of

E[Tj ] 177.245
A j¥ = = = 0.947
E[Rj ] + E[Tj ] 10.0 + 177.245

so we can compute the bounds:

0.760 £ As¥ £ 1

0.994 £ As¥ £ 0.969

and

0.896 £ As¥ £ 0.997

Thus, we can see that for the assumed operating scenario, the results we devel-
oped earlier using the system structure can be very useful and informative.
A second model for which we can obtain results is that in which we have
an m component series system. We assume that the failure of any component
stops the operation of the system and thus of the other components. The
other components do not age while the failed component is being replaced.
Thus, when system operation is resumed, the replaced component is new
and the other components have the age they had when the replaced compo-
nent failed. Several key structural results for this model were developed by
Barlow and Proschan [12]. These results are general in that they apply to any
choice of continuous life and repair time distributions.
The sample path shown in Figure 11.1 at the start of this chapter is rep-
resentative of the operating experience for a series system. The individual
failure and repair times may (usually do) correspond to the failures and
replacements of different components. Thus, let Tji represent the life length
of the ith copy of component j used in the system, and let Rji represent the ith
replacement time for the jth component. Next, let U(t) represent the system
operating time (uptime) during the real time interval (0, t). Note that U(t) is
not an availability measure but that for any realization of the system sample
path, U(t)/t is the proportion of the time the system functioned and hence is
the average availability for that sample path. Now, Barlow and Proschan [12]
prove that
-1
U (t) æç EGj [R] ö
m
AS¥ = lim
t ®¥ t
= 1+
ç å ÷
EFj [T ] ÷
(11.21)
è j =1 ø

That is, we take the ratio of the expected repair time to the expected life
length for each component and sum across components. Add this to one and
take the reciprocal and the result is the limiting average system availability.
Availability Analysis 231

In addition, for typical choices of the life distributions—those that are


continuous—the limiting system availability is the same as the limiting aver-
age system availability. Also, the limiting values for the component replace-
ment rates are the respective reciprocals of the mean life times:

ìï é N j (U (t)) ù üï 1
lim íE ê
( ) ú ý = E [T ] (11.22)
t ®¥
îï ë U t û þï Fj

whereas in our previous analyses


n
Sjn = åT
i =1
ji

and
N j (t) = sup{n|Sjn £ t}

A conceptually equivalent statement to Equation 11.22 is that the average life


length of a component divided by the total uptime equals (in the limit) the
average number of copies of the component that are used. Additional useful
results are

U (t) » tAS¥ (11.23)

and
ì N j (t) ü AS¥
lim í ý» (11.24)
t ®¥
î t þ EFj [T ]

Over time, the uptime is approximated by the product of time and limit-
ing average system availability, and the number of replacements for any
component is approximately the limiting average availability divided by
the average life length. Note that dividing both sides of Equation 11.24 gives
(approximately) Equation 11.22.
At the system level, we can state that the average system uptime per cycle
converges in the limit to
-1
æ m
1 ö÷
U =ç
ç å EFj [T ] ÷
(11.25)
è j =1 ø

and the corresponding limit for the average time down per cycle is
æ m
EGj [R] ö
D =Uç
ç å E [T] ÷÷ (11.26)
è j =1 Fj
ø
232 Reliability Engineering: Probabilistic Models and Maintenance Methods

Clearly, these imply the average cycle length and they are consistent with the
average availability expressions.
Consider again the example of a system comprised of m = 5 components in
series each of which has an exponential life distribution with the parameters
λf1 = 1.20, λf2 = 0.800, λf3 = 0.400, λf4 = 0.90, and λf5 = 5.00 and an exponential
repair time distribution with parameters λr1 = 19.01, λr2 = 25.00, λr3 = 10.00,
λr4 = 40.00, and λr5 = 20.00. In this case, we find

-1
æ
AS¥ = ç 1 +
( 0.0625 ) + ( 0.0640 ) + ( 0.100 ) + ( 0.025 ) + ( 0.050 ) ö = 0.695
ç ÷
è ( 0.833 ) (1.250 ) ( 2.500 ) (1.111) ( 0.200 ) ÷ø
-1
æ m
1 ö÷
U =ç
ç å EFj [T ] ÷
-1
= ( 0.833 + 1.25 + 2.50 + 1.111 + 0.20 ) = 0.11668
è j =1 ø

D = U (0.4387 ) = 0.0732

so the average cycle length is 0.24 time units.


There are two interesting final observations concerning the system avail-
ability measures. First, Equations 11.21, 11.25, and 11.26 jointly imply that

U
AS¥ =
U+D

as we would expect to be the case. Similarly, if the number of components of


the system is m = 1, the limiting average system availability of Equation 11.21
reduces to

-1
æ E [R] ö EF [T ]
AS¥ = ç1+ G ÷ =
è EF [T ] ø EF [T ] + EG [R]

and this is also what we would expect.

11.4 Nonrenewal Cases


Very few availability results have been developed for the cases in which
repair does not imply renewal. Numerous authors have used cost models
rather than availability models to describe the consequences of downtime.
The reason for the shift to cost models is the complexity (or impossibility) of
Availability Analysis 233

the probability analysis for availability. Some of these cost models are dis-
cussed in the next chapter. In the meantime, we can construct some single
unit availability results for the imperfect repair model and for the quasi-
renewal model.

11.4.1 Availability under Imperfect Repair


Recall that under an imperfect repair regime, following repair, a device is
returned to a good as new condition by a perfect repair with probability p
and remains in a bad as old condition with probability q = 1 − p. The general-
ization of this model defined by Block et al. [68] and discussed in Chapter 10
is to make the probability of perfect repair age dependent. Then, the prob-
ability of perfect repair is p(t). As discussed in Chapter 10, the distribution on
the total operating time from a renewal point until a failure that is followed
by a perfect repair for the first time is (Equation 10.10)

( )
p
Fp (t) = 1 - FT (t) = 1 - e - pZT (t ) (11.27)

This expression applies when the minimal repair actions are instantaneous.
However, it also represents the distribution on total operating time for the
case in which minimal repairs have nonzero durations. In addition, it is use-
ful to note that the average time of operation per renewal cycle may be deter-
mined as
¥

Ep [T ] = Fp (u)du
ò
0
(11.28)

Now, if the minimal repair times have distribution, Gm(t) and if it is the nth
device failure that produces the first perfect repair, then the total time spent
in minimal repair will have the distribution, Gm( n -1) (t), the n − 1 fold convolu-
tion of Gm(t). For this conceptual model, Iyer [77] has shown that the distri-
bution on the total time devoted to minimal repairs in any renewal cycle is
n -1
¥
( L m (t))
G(t) = e - L m (t ) å
n =1
(n - 1)!
Gm( n -1) (t) (11.29)

where
s

ò
L m (s) = q(u)zT (u)du
0
(11.30)

represents the cumulative intensity function for the occurrence of a failure


with a minimal repair response. Clearly, the distribution in Equation 11.29
234 Reliability Engineering: Probabilistic Models and Maintenance Methods

is constructed by using a Poisson distribution on the number of minimal


repair events as the mixing distribution for the extent of the minimal repair
time convolution.
The convolution of the distribution G(t) with the distribution on total oper-
ating time per cycle yields the time until the occurrence of a failure, which
is to be followed by a perfect repair. That is, the distribution on time until a
perfect repair is started:
t

ò
Fp* (t) = G(t - u) f p (u)du
0
(11.31)

Similarly, the distribution on the total duration of the renewal cycle is the
convolution of this distribution with the one on perfect repair time, say, Gp(t).
We may also determine the limiting device (average) availability using the
expectations for the durations already defined. To do this, note first that in
general, each renewal cycle will include, say, N periods of device operation
terminated by a device failure. The first N − 1 of the operating periods is
followed by periods of minimal repair and the last failure is followed by a
period of perfect repair. Now Iyer [77] has shown that
¥

ò
E[N ] = 1 + L m (u)Fp (u)du
0
(11.32)

and since E[N − 1] = E[N] − 1, the limiting availability is

EFp [T ]
A¥ = (11.33)
EFp [T ] + E[N - 1]EGm [T ] + EGp [T ]

That is, the average uptime divided by the average cycle length gives us the
average availability. The average cycle length is comprised of the average
uptime plus the average number of minimal repair intervals times their
average duration plus the average duration of the perfect repair interval.
Notice that we can also state that the average fraction of the time the device
is undergoing minimal repair is

E[N - 1]EGm [T ]
EFp [T ] + E[N - 1]EGm [T ] + EGp [T ]

and the average fraction of the time the device is undergoing perfect repair is

EGp [T ]
EFp [T ] + E[N - 1]EGm [T ] + EGp [T ]
Availability Analysis 235

Finally, we note that for the simpler case in which p(t) = p, the probability
of perfect repair is age independent so the expected number of failures to a
perfect repair is geometric and

1
E[N ] = (11.34)
p

In addition, the distribution G(t) reduces to

¥
(1 - q)qn -1 ( n -1)
G(t) = å
n =1
(1 - qn )
Gm (t) (11.35)

Other simplifications are also consistent with our results. If we take p = 0,


we obtain the minimal repair model either with or without instantaneous
minimal repair times. If we take p = 1, we have the renewal case. Assuming
the basic distributions are exponential yields several simple models that all
behave as Markov processes as we would expect.

11.4.2 Availability Analysis for the Quasi-Renewal Model


Because the structure of the quasi-renewal model is sufficiently similar to
that of the renewal models, Rehmert and Nachlas [78] were able to obtain
approximate values for and tight bounds on device availability by using the
clever idea of computing both availability and unavailability measures. To
construct their model, Rehmert and Nachlas represent the distributions on
the operating intervals using

æ t ö
FTn (t) = FX1 ç n -1 ÷
èa ø

and the distributions on the repair times as

æ 1 ö
GYi (t) = GY1 ç i -1 t ÷
èb ø

where the repair process is also assumed to be quasi-renewal with β serving


as the timescale parameter. In this case, setting β to be greater than one
implies that an aging device is increasingly more difficult to repair. Then,
referring to Figure 11.1, they take Φn = T1 + T2 + ⋯ + Tn and in an equivalent
å
n
manner, Y n = Ri with distributions FF n (t) and GY n (t). Further, they
i =1
236 Reliability Engineering: Probabilistic Models and Maintenance Methods

define Vi = Ti + Ri so that HVi (t) is the convolution of FTi (t) and GYi (t), and
å
n
Qn = Vi has distribution H Qn (t) so that the quasi-renewal function for
i =1

the process is QH(t).


The key to their analysis is that Rehmert and Nachlas next observe that
availability may be viewed from both an “uptime” and a “downtime” per-
spective. Specifically, using the concept on which Equation 11.13 is based
yields an availability function that represents the probability that a device is
functioning at any point in time:

t t

ò
A(t) = FX1 (t) + FX2 (t - u)hQ1 (u)du + FX3 (t - u)hQ2 (u)du
0
ò
0

ò
+ ⋯ + FXn+1 (t - u)hQn (u)du + ⋯
0

¥ t

= FX1 (t) + åòF


n =1 0
X n+1 (t - u)hQn (u)du (11.36)

The Laplace transform for this expression is

¥ ¥
1
s (
A* (s) = 1 - f X*1 (s) + ) å n =1
H Q* n (s) - å F*
n =1
X n+1 (s)hQ* n (s)

1æ ö
¥ ¥

s çè (
= ç 1 - f X*1 (s) + ) å n =1
hQ* n (s) - å f*n =1
X n+1 (s)hQ* n (s) ÷
÷
ø

1æ ö
¥

s çè (
= ç 1 - f X*1 (s) + ) å (1 - f * (s)) h* (s) ÷÷ø
n =1
X n+1 Qn (11.37)

Device availability is also equal to one minus the probability that the device
is down for repair. This event can be represented as

æt
ç
è0
ò
A(t) = 1 - ç GY1 (t - u) f X1 (u)du

¥ t t-w
ö
+ åò ò G Yn+1 (t - w - u) f Xn+1 (u)hQn (w)du ÷
÷
(11.38)
n =1 0 0 ø
Availability Analysis 237

and the Laplace transform for this expression is

A* (s) =
1
s { (
1 - f X*1 (s)[1 - gY*1 (s)]

+ f X*1 (s) gY*1 (s) f X*2 (s)[1 - gY*2 (s)] + ⋯ )}


1 ïì
¥
= í1 -
s îï å[1 - g* (s)]( f * (s) f * (s)⋯
n =1
Yn X1 X2

üï
⋯ f X*n (s) gY*1 (s) gY*2 (s)⋯ gY*n-1 (s) ý )
þï
1 ìï æ 1 ö üï
¥
= í1 -

î
å ç
ç *
1 è g Yn ( s)
n =1
- 1 ÷hQ* n (s)ý
÷
ø ïþ
(11.39)

This second form of the availability function is called the downtime-based form.
As stated, the two forms are equal but both the uptime- and the downtime-based
functions include infinite sums. As it is usually impossible to obtain closed-form
expressions for the inverse transforms of the availability expressions, the trans-
forms are inverted numerically with truncation of the infinite sums. The result-
ing solutions are approximate and because the two expressions are based on
complementary concepts, they form bounds on the actual availability.
Truncating the infinite sum in the transform of the uptime-based function
after “c” terms yields

1æ ö
( ) å (1 - f * (s)) h* (s) ÷÷ø
c
Al*( c ) (s) = ç 1 - f X*1 (s) + Qn (11.40)
s çè
X n+1
n =1

This function is increasing in c so it is a lower bound on device availabil-


ity. Similarly, truncating the infinite sum in the transform of the downtime-
based function after “c” terms yields

1 ìï æ ö ü
c

å ççè g* (s) - 1÷÷øh* (s)ïýïþ


1
Au*( c ) (s) = í1 - Qn (11.41)

î n =1 Yn

which is decreasing in c so it is an upper bound on device availability. In


addition, for any value of c, the difference between the two bounds is a mea-
sure of the error that results from the approximating strategy.
Rehmert and Nachlas analyze the approximate functions for several cases
and also consider both the case of independently and identically distributed
238 Reliability Engineering: Probabilistic Models and Maintenance Methods

(i.i.d.) repairs where β = 1 and quasi-renewal repairs for which β > 1. As an


example, suppose the operating intervals and the repair intervals are both
exponentially distributed. In this case, the transform for the uptime-based
approximation is

c n
æ (aml)n ö
å Õ (l + a
1 1
A* (s) = + ç ÷ j -1
(11.42)
l+s n =1 è
l + a ns ø j =1
s)(m + b j -1s)

and the transform for the downtime-based approximation is

c n

A* (s) =
1 1
-
s m å( bn -1(ml)n ) Õ (l + a j -1
1
s)(m + b j -1s)
(11.43)
n =1 j =1

If the repair intervals are i.i.d. rather than quasi-renewal, these two equa-
tions reduce to

c n
æ (aml)n ö
å Õ (l + a
1 1
A* (s) = + ç n ÷ j -1
(11.44)
l+s n =1 è
(l + a s)(m + s) ø
n
j =1
s)

and

c n n
æ ml ö
å Õ (l + a
1 1 1
A* (s) = - ç ÷ j -1
(11.45)
s m n =1 è
m+sø j =1
s)

Rehmert and Nachlas actually obtain closed-form expressions for the avail-
ability measures for the case in which operating periods and repair inter-
vals are normally distributed. The analysis of the functions using various
other distributions yields a common structure for the availability measures.
They show that the contribution of successive terms in the summations
behave as shown in Figure 11.5a and b for the uptime and downtime models,
respectively.
They show further that the approximations may be made arbitrarily close
by the choice of c. This is illustrated for an exponential example using c = 30
in Figure 11.6. Note that as time is increased, the number of terms needed
to keep the approximations close together increases. By selecting the value
of c, one may attain any preferred level of accuracy. In addition, analysis of
a large number of terms in the downtime expression can be used on its own
to obtain a tight upper bound on availability, and this may be adequate for
device management decisions. In both of these analyses, there will be a bal-
ance between computational effort and accuracy.
Availability Analysis 239

0.8
A(t) contribution

0.6

0.4

0.2

200 400 600 800 1000


(a) Time

1
0.975

0.95
A(t) contribution

0.925

0.9

0.875

0.85

0.825

200 400 600 800 1000


(b) Time

FIGURE 11.5
(a) Contributions of the individual terms in the lower bound on point availability. (b) Contribu-
tions of the individual terms in the upper bound on point availability.

11.5 Markov Models


To conclude this chapter, note that there is an entirely different approach
than the ones discussed previously to modeling the time evolution of repair-
able systems. Many people, and most notably Birolini [79], have used Markov
chains and Markov processes to represent device and system-level behav-
iors and have used the Markov models to obtain availability or cost mea-
sures. The core idea for the Markov models is to represent system status in
a manner like the one shown in Figure 11.7. In the figure, the failure haz-
ard is represented by λ(t) and the repair hazard is denoted by μ(t). These
functions represent the intensity of transition to and from the functioning
240 Reliability Engineering: Probabilistic Models and Maintenance Methods

0.8

0.6
A(t)

0.4

0.2

200 400 600 800 1000 1200 1400


Time

FIGURE 11.6
Curves of A H(t) for the exponential case for values of c = {1, 2, …, 13} (λT = 0.01 and λ R = 0.05).

λ(t)

X=1 X=0

μ(t)

FIGURE 11.7
Two-state transition diagram.

state  (state  1). Under this format, the device operation is represented as a
continuous time Markov process with the Chapman–Kolmogorov forward
differential equations [80] being
d
p1,1(t) = -l(t)p1,1(t) + m(t)p2 ,1(t)
dt
d
p1, 2 (t) = -m(t)p1, 2 (t) + l(t)p1,1(t)
dt
(11.46)
d
p2 ,1(t) = -l(t)p2 ,1(t) + m(t)p2 , 2 (t)
dt
d
p2 , 2 (t) = -m(t)p2 , 2 (t) + l(t)p2 ,1(t)
dt

where pi,j(t) is the probability that the process passes from state i to state j
during an interval of length t. For the case in which the intensity functions
Availability Analysis 241

are constant, the fact that p1,1(t) + (t)p1,2(t) = 1 implies that the first of the
Equation 11.46 becomes

d
p1,1(t) = -(l + m)p1,1(t) + m
dt

for which the Laplace transform is

p1*,1(s) - p1,1(0) = -(l + m)p1*,1(s) + ms

so
s+m
p1*,1(s) =
s(s + l + m)

which is the form given in Section 11.2.1. Clearly, the result may be con-
structed using either approach.
For some of the more complicated cases, the use of the Markov process
model is easier than the direct analysis of the renewal process model. As a
case in point, consider a repairable parallel system of two components.
Assume that two identical components are arranged in a parallel configu-
ration and that the system is operated so that one of the components is
functioning and has constant failure hazard λ, while the second component
is in reserve and is therefore subject to a constant but reduced failure haz-
ard, say, λ r. Suppose that whenever a component fails, repair is undertaken
immediately. The repair time is exponential with intensity μ, and if the
repair is completed before a second component failure, the system contin-
ues to function with the repaired component in the reserve role. System
failure occurs when a component failure precedes the completion of a com-
ponent repair. This seemingly simple system is actually quite difficult to
analyze. The state transition diagram for the representative Markov pro-
cess is shown in Figure 11.8. The state space E = {0, 1, 2} represents the
number of failed components.

λ + λr λ

E=0 E=1 E=2

μ μ

FIGURE 11.8
State transition diagram for the repairable parallel system.
242 Reliability Engineering: Probabilistic Models and Maintenance Methods

The associated Chapman–Kolmogorov forward differential equations are


the following nine equations. To evaluate system availability, we need only
to solve the first three equations because the availability corresponds to the
time spent in states 1 and 2:

d
p0 , 0 (t) = -(l + l r )p0 , 0 (t) + mp0 ,1(t)
dt

d
p0 ,1(t) = -(l + m)p0 ,1(t) + mp0 , 2 (t) + (l + l r )p0 , 0 (t)
dt
d
p0 , 2 (t) = -mp0 , 2 (t) + lp0 ,1(t)
dt
d (11.47)
p1, 0 (t) = -(l + l r )p1, 0 (t) + mp1,1(t) + lp1, 2 (t)
dt
d
p1,1(t) = -(l + m)p1,1(t) + mp1, 2 (t) + (l + l r )p1, 0 (t)
dt

d
p1, 2 (t) = -m(t)p1, 2 (t) + l(t)p1,1(t)
dt
d
p2 , 0 (t) = -(l + l r )p2 , 0 (t) + mp2 ,1(t)
dt
d
p2 ,1(t) = -(l + m)p2 ,1(t) + mp2 , 2 (t) + (l + l r )p2 , 0 (t)
dt
d
p2 , 2 (t) = -mp2 , 2 (t) + lp2 ,1(t)
dt

Keeping in mind the fact that

p j , 0 (t) + p j ,1(t) + p j , 2 (t) = 1

we use the same method as for the previous model to obtain

(s + l + l r )p0*, 0 (s) - mp0*,1(s) = 1


m (11.48)
-(l + l r - m)p0*, 0 (s) + (s + l + 2m)p0*,1(s) =
s
Availability Analysis 243

The solution of these two equations requires considerable effort but finally
one finds the solutions to be

æ ì ü
ç m 2 + ïí (l + l r )(l + m) + (l + l r )(2lm - l rm - ll r ) ïý
ç ïî 2 2 4lm + l 2r ïþ
è

ì t t
üï - t ( 2l + 2m + lr ) ö
´ ïíe 2
4 lm + l r2 - 4 lm + l r2
-e 2
ýe 2 ÷
÷
îï þï ø
p0 , 0 (t) =
(l + l r )(l + m) + m 2

æ ìï m (2l 2 + 2ll r + l rm) üï


(l + l r ) ç m + í - + ý
ç ï 2 lm + l 2
è î 2 4 r þï

ì t ïü - 2 ( 2l + 2m + lr ) ö
t t
´ ïíe 2
4 lm + l r2 - 4 lm + l r2
-e 2
ýe ÷
ïî ïþ ÷
p0 ,1(t) = ø
(l + l r )(l + m) + m 2

Assuming the system starts with both components being new, the sum of
these two probabilities is the availability function:

A(t) =
( m(l + l r + m))
(l + l r )(l + m) + m 2

æ ìï l(l + l ) l(2l + 2m + l )(l + l ) üï ìï t 4 lm + lr2 -


t
4 lm + l r2 üï - t ( 2l + 2m + lr ) ö
çí r
+ r r
ýí e 2 - e 2
ýe 2 ÷
çï 2 2 4lm + l 2r ïþ îï þï ÷
è î ø
+
(l + l r )(l + m) + m 2
(11.49)

and the complement is the solution for p0,2(t). Clearly, the limiting availability is

A¥ =
( m(l + l r + m)) (11.50)
(l + l r )(l + m) + m 2

Note that if we set λr = 0, the model and its results represent the standby
redundant case in which the second component is not activated until the first
one fails. For that case, the limiting availability reduces to

A¥ =
( m(l + m))
l(l + m) + m 2
244 Reliability Engineering: Probabilistic Models and Maintenance Methods

Similarly, taking λr = λ, implies active redundancy in which the second com-


ponent ages while waiting to be used. In that case, the limiting availability is

A¥ =
( m(2l + m))
2l(l + m) + m 2

For both of these special cases, the corresponding forms for the point avail-
ability function apply.
The Markov process models are used widely to represent system behavior.
It is appropriate to emphasize that the earlier case is the simplest one of its
class. A more general case is to assume a k out of n structure for which com-
ponent failures are repaired as they occur by a single repair person. System
failure occurs when the number of functioning components is reduced by
k + 1 to n − k − 1. The problem may again be represented as a Markov pro-
cess. In fact, since only adjacent states are accessible in single transitions, it
is a birth–death process [80]. This makes analysis possible. However, it is
exceedingly complicated to construct the time-dependent transition func-
tions. Birolini [79] gives the expected first passage time to the system down
state and the limiting availability. To state his solution, let Ei represent the
state that i components are failed and let δi denote the failure intensity for
state Ei. Then,

di = kl + (n - k - 1)l r

and the performance measures depend upon the quantity n − k. When


n − k = 1, the expected first passage time to system failure is

d0 + d1 + m
E[T ] =
d0d1

and
m(d0 + m)
A¥ =
d0d1 + md0 + m 2

and by similar analysis, when n − k = 2, the expected first passage time to


system failure is

d2 (d0 + d1 + m) + m(d0 + m) + d0d1


E[T ] =
d0d1d2

and

m(d0d1 + d0m + m 2 )
A¥ =
d0d1d2 + md0d1 + d0m 2 + m 3
Availability Analysis 245

Finally, Birolini provides the general solution for the limiting availability for
any value of n − k:

æ i -1 ö
å Õ
n-k
1+ ç d j ÷ /m i
i =1 ç ÷
A¥ = è j=0 ø (11.51)
n - k +1 æ ö
i -1

1+ å i =1 Õ
ç
ç j=0 ÷
d j ÷ /m i
è ø

Clearly, more complicated system behavior or even less regular life and
repair time distributions may make the analysis of system behavior quite
intricate. For the more complicated systems, the general results defined in
Section 11.3 provide one approach and a simulation of the general Markov
process is often a worthwhile approach.

Exercises
11.1 Assume a device has an exponential life distribution with parameter
λf = 0.005 and an exponential repair time distribution with parameter
λr = 0.08. Plot the point availability function for the device and plot also
the Laplace transform of the availability function, A* (s).
11.2 For the device of the previous problem, compute the limiting availabil-
ity and also the average availability over the interval (40, 80) hours.
11.3 Reconstruct the numerical results shown in Table 11.1 for the Weibull
life distribution and exponential repair time distribution for 50 hours
and for 100 hours.
11.4 Assume the bridge structure is comprised of 5 identical copies of the
device described in Problem 11.1. Plot the point availability function
for the system.
11.5 Suppose a series system is comprised of four components, each of
which has an exponential life distribution and an exponential repair
time distribution. Assume the parameters for these distributions are
λ1f = 0.001, λ1r = 0.02, λ2f = 0.003, λ2r = 0.07, λ 3f = 0.004, λ 3r = 0.03, λ 4f = 0.002,
and λ 4r = 0.08. Plot the point availability function for the system and
compute the limiting system availability.
11.6 Consider the following 6-component system that was treated in an
earlier chapter. Recall that components 1 and 6 are identical and com-
ponents 2 and 3 are identical.
246 Reliability Engineering: Probabilistic Models and Maintenance Methods

5
2

Suppose the life distributions for components 1 and 6 are Weibull with
β = 2.25 and θ = 750.0 and the other components have exponential life
distributions with λ2 = λ 3 = 0.0016, λ 4 = 0.0025, and λ 5 = 0.001. Assume
further that the repair times for the components are exponential with
μ1 = 0.05, μ2 = 0.08, μ 3 = 0.08, μ 4 = 0.06, μ 5 = 0.02, and μ6 = 0.05. Use the
min paths and min cuts to compute bounds on the system availability
at 50 hours and to compute bounds on the limiting system availability.
11.7 Suppose a device has a Weibull life distribution with parameters β = 2.0
and θ = 200 hours and is subject to imperfect repair with p(t) = p = 0.25.
The duration of perfect repair has an exponential distribution with
parameter λp = 0.08 and the duration of minimal repair has an expo-
nential distribution with parameter λr = 0.02. Compute the limiting
availability for the device.
11.8 Construct an algebraic statement of the uptime and the downtime
availability functions for the quasi-renewal model for the case in
which both failure times and repair times have normal distributions
and both failure and repair processes are quasi-renewal.
11.9 Suppose that for the previous problem, μo = 250, σo = 60 and μr = 150,
σr = 25 and that α = 0.975, β = 1.01. Compute the values of the point
availability functions for truncation values of c = 2,3, and 4 for times
of 200, 400, 900, and 1200 hours. Comment on the comparison of the
uptime- and downtime-based values. Then, plot the uptime-based
availability function and comment on its behavior.
12
Preventive Maintenance

Manufacturing equipment, electrical power generation stations, airplanes,


and automobiles are examples of types of equipment for which operation
to failure is usually inefficient and dangerous. While it is true that com-
ponent and system failures do occur for these equipment types, it is also
true that failure events can be very important and the frequency of fail-
ures and also the severity of failures can often be reduced by preventive
maintenance (PM).
PM is the practice of removing a functioning device from operation in
order to repair, replace, adjust, test, or simply inspect it. Naturally, the spe-
cific PM action depends upon the particular system, but all PM activity is
undertaken with the intent to provide enhanced assurance of function. In
some cases, PM involves the replacement of a functioning device prior to
failure. Generally, the motivation for replacing a functioning device is that
the cost of doing so is small in comparison to the expense of responding to
a failure that occurs during device operation—a field failure. Actual failure
may imply damage to the device or to other components or equipment. For
other cases such as alarm systems, simple inspection and testing are the logi-
cal forms of PM. For still other equipment types, adjustment is the appropri-
ate action, while for others, lubrication, annealing, or reinforcement are the
required forms of service.
In general, performing PM implies some sort of resource commitment and
it is expected that the PM effort will reduce the probability of device failure.
The expense of the resources devoted to PM is expected to be balanced by
the savings in (potentially substantial) expenses associated with in service
failure.
Because of the wide variety of possible applications and the diversity of
conceivable policies, numerous models of PM have been developed. We will
examine several classes of PM strategies and their associated models here.
Some of the models are defined in terms of availability. However, because
the availability function is usually very difficult to construct and analyze,
most of the models are constructed using cost functions for the measure of
performance instead of availability. We should recognize that in general, PM
reduces availability even though it also usually reduces cost.

247
248 Reliability Engineering: Probabilistic Models and Maintenance Methods

12.1 Replacement Policies


12.1.1 Elementary Models
Historically, two classes of replacement-type preventive maintenance (PM)
policies have been defined. They are referred to as “age replacement” and
“block replacement” policies. An age replacement policy operates as follows:

The device is replaced by a new copy upon failure or if it achieves an


age equal to the “policy age,” τa.

In both cases, the device is assumed to be as good as new following replace-


ment so renewal theory–based models may be used to represent device
performance. Also, the traditional models were constructed under the
assumption that the duration of the replacement activity has the same dis-
tribution for both PM and replacement following failure. Replacement or
repair following failure is referred to here as “corrective” maintenance in
contrast to PM. The assumption of a common replacement time distribu-
tion is used here initially and a model based on distinct distributions is pre-
sented subsequently.
In contrast to an age replacement policy, a block replacement policy is
based on scheduled actions rather than on device age. The policy operates
as follows:

The device is replaced by a new copy upon failure or at uniformly spaced


time intervals, τb, 2τb, 3τb, … where τb is called the “policy time.”

A slight modification to the policy has been studied by several people. It is


that when the device fails at a time close to the policy time, either the device
is left idle until the replacement time or the replacement time is advanced
slightly. As they constitute minor revisions to the models, these adjustments
to the policy are not pursued here.
For both age replacement and block replacement, optimal values for the
policy times may be determined by analyzing appropriate cost models.
This is shown in the following. First, relationships between the two policies
and general patterns of behavior are examined. Start by defining a conve-
nient notation and an ordering relationship among distribution functions.
Specifically, let

N(t, τa) be the number of device failures during an interval (0, t) when the
device is operated under an age replacement policy with policy age τa
N(t, τb) be the number of device failures during an interval (0, t) when
the device is operated under a block replacement policy with policy
time τb
Preventive Maintenance 249

N(t) be the number of device failures during an interval (0, t) when the
device is operated to failure with no PM
ɶ (t , ta ) be the number of device replacements (failures and PM) during
N
an interval (0, t) when the device is operated under an age replace-
ment policy with policy age τa
ɶ (t , tb ) be the number of device replacements (failures and PM) during
N
an interval (0, t) when the device is operated under a block replace-
ment policy with policy time τb

The general points of interest revolve around the relationships among


these quantities. As all are random variables, we use stochastic ordering for
which we have the following definition:

Definition 12.1

A random variable X is stochastically greater than or equal to the random


st
variable Y, written X ³ Y if

Pr[X > z] ³ Pr[Y > z] "z (12.1)

This definition is widely used but is somewhat counterintuitive as it implies


st
that X ³ Y when FX(z) ≤ FY(z) for all z. The appropriate interpretation is that
the distribution on X has its mass concentrated on greater values of the ran-
dom variable than does the distribution on Y.
Now, using the previous definitions, we find that for all devices with new
better than used (NBU) life distributions over all values of t and all choices
of τa,

st
N (t) ³ N a (t , t a ) (12.2)

Thus, using an age replacement policy makes lower numbers of failures


more likely than they are when PM is not used. This provides a reasonable
starting justification for the use of PM. Some additional relationships are
as follows:

st
1.  For all t ≥ 0 and τb ≥ 0, N (t) ³ N b (t , tb ) if and only if FT(t) is NBU.
2. For all t ≥ 0 and τa ≥ 0, Na(t, τa) is stochastically increasing in τa if and
only if FT(t) is NBU.
3. For all t ≥ 0 and τb ≥ 0, Nb(t, τb) is stochastically increasing in τb if and
only if FT(t) is NBU.
250 Reliability Engineering: Probabilistic Models and Maintenance Methods

With regard to the comparisons of the policies, the relationships are quite inter-
esting. Specifically, for all t ≥ 0 and for any value assigned to the policy param-
eters, τ

st
ɶ a (t , t) £ N
N ɶ b (t , t)
st
(12.3)
N a (t , t) ³ N b (t , t)

where the first of these relations holds in general and the second applies
when the life distribution is increasing failure rate (IFR). Basically, these two
inequalities state that block replacement tends to yield more device remov-
als, while age replacement tends to yield more device failures. This seems to
conform to intuition as one has the sense that the block replacement policy
involves removal of relatively young copies of the device.
Now, the cost models for these two basic PM strategies are usually for-
mulated without considering the durations of the maintenance tasks. It is
considered that the costs adequately capture the implications of failure and
of planned replacement so the total cost per unit time is an informative
measure of device performance. It is also considered that the use of PM
will reduce the frequency of “field failures” (failures while in operation)
and presumably, this implies a cost savings. Start with the block replace-
ment policy.
Suppose it is possible to identify the costs of a planned replacement and of
field failures and these quantities are represented by c1 and c2, respectively.
Then, a model for the total cost per unit time associated with a block replace-
ment PM strategy is

c1 + c2 MFT (tb )
E[Cost|tb ] = (12.4)
tb

The interpretation of this model is that there will be one planned replace-
ment per period at a cost of c1, and the expected number of failures with
corrective replacements per period is given by the renewal function. Each
corrective replacement has a cost of c2. Dividing by the length of the period
gives the expected cost per unit time.
Once the cost model is defined, we can use it to determine an optimal
choice to the policy time, τb, by conventional optimization methods. Taking
the derivative,

d
c2 t b MFT (tb ) - c1 - c2 MFT (tb )
d dt b
E[Cost|tb ] =
dtb tb2
Preventive Maintenance 251

and equating it to zero, we find that the optimal choice of the policy time is
the value for which

d c
tb MFT (tb ) - MFT (tb ) = 1 (12.5)
dtb c2

Intuitively, it is appealing that the policy time should depend upon the ratio
of preventive to corrective replacement costs. Before analyzing the derivative
equation further, observe that the second derivative condition becomes

d2 d
c2tb3 MFT (tb ) - 2c2tb2 MFT (tb ) - 2c1tb - 2c2tb MFT (tb )
d2 dtb2 dt b
E[Cost|tb ] =
dtb2 tb4

c2 d 2
= MFT (tb )
tb dtb2

Since c2 and τb are positive, the sign of the second derivative is determined by
the life distribution. Recall that a distribution that is NBU has

MFT ( x + y ) ³ MFT ( x) + MFT ( y )

so, the slope of the renewal density is positive. Thus, if the device life dis-
tribution is NBU, the value of τb computed using Equation 12.5 will corre-
spond to a minimum of the expected cost function. Logically, if replacement
improves reliability, it is worthwhile and if replacement does not improve
reliability, PM is not appropriate.
Consider an example. Suppose a device has a Weibull life distribution with
parameters β = 2.00 and θ = 2000 hours. Suppose further that the ratio of
preventive replacement cost to corrective replacement cost is ρ. Recall that
the derivative of the renewal function is the renewal density, so we wish to
determine the value of τb for which

c1
tb mFT (tb ) - MFT (tb ) = =r
c2

Numerical solution of this expression using Lomnicki’s [65] coefficients


yields the solutions shown in Table 12.1 as a function of ρ.
For the age replacement policy, there is only one replacement per cycle but
it may be either corrective or preventive. If failure occurs before the device
has an age of τa, replacement will be corrective. If the device survives to age
τa, replacement is preventive. Thus, the cost per cycle is

c1FT (ta ) + c2 FT (ta )


252 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 12.1
Optimal Block Replacement Intervals
as a Function of Cost Ratio
ρ τb
0.05 350
0.20 730
0.40 1102
0.50 1282
0.75 1808

and we must distribute this cost over the expected cycle length, which is
ta ta

ò
ta FT (ta ) + tfT (t)dt =
0
ò F (t)dt
0
T

Therefore, the expected cost per unit time is

c1FT (ta ) + c2 FT (ta )


E[Cost|ta ] = ta (12.6)
ò 0
FT (t)dt

Here again, we take the derivative and obtain

ta

d ( -c1 fT (ta ) + c2 fT (ta )) ò ( )


FT (t)dt - c1FT (ta ) + c2 FT (ta ) FT (ta )
E[Cost|ta ] = 0
2
dta æ ta
ö
ç
è ò 0
FT (t)dt ÷
ø

and equating the derivative to zero, we obtain the optimality condition:


ta
c2
ò
FT (ta ) + zT (ta ) FT (t)dt =
0
c2 - c1
(12.7)

Once again, the solution has the intuitive appeal that the length of the
replacement interval depends directly upon a ratio of the replacement costs.
Consider the same example as was used for block replacement. A plot of the
cost function is shown in Figure 12.1. Solution of Equation 12.7 indicates that
the optimal replacement age is τa = 460.86 hours.
The block replacement and age replacement policies and models provide
an informative starting point for our examination of PM. Once the initial
models were defined, numerous extensions and improvements were defined
Preventive Maintenance 253

0.05

0.04
Cost

0.03

0.02

0.01

500 1000 1500 2000


Time

FIGURE 12.1
Expected cost function for the age replacement.

and then more intricate and more efficient policies were defined. We will
examine a variety of these extensions in the pages that follow.

12.1.2 Availability Model for Age Replacement


One of the first and most realistic extensions to the basic models is to distin-
guish between the times required for preventive and corrective maintenance
actions. Consideration of the times to perform service implies the analysis
of availability. Arguably, the magnitudes of the two cost coefficients in the
previous models represent the difference in the service efforts. On the other
hand, the availability measures are very appealing descriptors of device
performance.
Barlow and Hunter [66] first studied the age replacement policy with dis-
tinct service times but took the times to be fixed so that they were essentially
the same as cost factors. Using the fixed times, they obtained the limiting
availability but not the point availability function, and they showed that PM
is not efficient if the cost of PM is the same as the cost of corrective replace-
ment. More recently, Murdock and Nachlas [81] constructed a general avail-
ability model for age replacement when the service time distributions are
distinct. Examine their model.
To start, let Gp(t) be the distribution function on the times to perform PM
and let Gc(t) be the distribution on corrective maintenance times. Assume
that all service implies device renewal and that the life distribution is rep-
resented by FT(t). Now, each time the renewed device starts an operation,
two types of operating and service intervals are possible. Operation may
end in failure with corrective repair or it may end after τa hours of operation
with a PM replacement. Thus, the renewal interval may look like either of
the ones shown in Figure 12.2. The interval that involves repair following
254 Reliability Engineering: Probabilistic Models and Maintenance Methods

T1 R1 τa PM

(a) Time (b) Time

FIGURE 12.2
(a) Operation to failure with repair. (b) Operation to preventive maintenance replacement.

failure occurs with probability FT(τa) and the one that involves PM occurs
with probability FT (ta ), so the overall renewal process is a mixture of the
two processes created by the operation/repair renewals and the operation/
PM renewals.
Now, the length of the renewal intervals for the case of operation and
repair has distribution Ha,c(t) where the distribution is the convolution of the
repair time distribution with the truncated life distribution:
t
f T ( u)
ò
H a , c (t) = Gc (t - u)
0
FT (ta )
du (12.8)

The life distribution is truncated because device operation is not allowed to


continue beyond the age replacement policy age. In order to construct this
convolution, we will use the partial Laplace transform:
ta

ò
fT*, c (s, ta ) = e - st fT (t)dt
0
(12.9)

Then, the transform for the density on the length of the interval is

1
ha*, c (s) = fT*, c (s, ta ) gc* (s) (12.10)
FT (ta )

The renewal function for the process associated with operation and repair
intervals is denoted by MHa ,c (t).
In the case of the renewal intervals that involve PM, the distribution on the
duration of the intervals is
H a , p (t) = Gp (t - ta ) (12.11)

as the interval includes operation for τa hours followed by a PM activity of


random duration. The Laplace transform for the density on the length of this
type of interval includes the Heaviside function because of the delayed start
of the probability mass. It is

ha*, p (s) = e - sta g *p (s) (12.12)


Preventive Maintenance 255

The renewal function associated with this process is denoted by MHa ,p (t),
and the density on the duration of the renewal intervals is the mixture of the
densities for the lengths of the two types of possible intervals:

ha* (s) = FT (ta )ha*, p (s) + FT (ta )ha*, c (s) = FT (ta )e - sta g *p (s) + fT*, c (s, ta ) gc* (s) (12.13)

The associated renewal function is denoted by MHa (t).


Given the definition of the renewal function, the availability function
is defined in a similar manner to that used to construct Equation 11.13 of
Chapter 11:

ì t

ïï ò
ï FT (t) + FT (t - u)mHa ,c (u)du
0
0 < t £ ta
A(t) = í t (12.14)
ï
ï ò
ïî t - ta
FT (t - u)mHa (u)du t > ta

Observe that in this expression, for times before τa, there will be no PM,
so all intervals are failure intervals, and therefore, the renewal density on
restart times is mHa ,c (t). As with the availability functions of Chapter 11, there
are two reasonable approaches to constructing the actual point availability
function for a particular choice of life and service time distributions. Direct
numerical integration using appropriate numerical approximations can be
very effective. Murdock and Nachlas [81] constructed the Laplace transform
for Equation 12.14 as

FT*, a (s, ta )
A* (s) = (12.15)
1 - ha* (s)

They then used numerical inversion algorithms to obtain values for the
time-dependent function for several combinations of life and service time
distributions. An example of their results for the Weibull life distribution
and two different service time distributions is shown in Figure 12.3.

12.1.3 Availability Model for Block Replacement


A similar analysis may be performed for the block replacement policy.
Suppose, for example, that the block replacement schedule calls for a device
replacement at the end of each month. Then, each month begins with a PM
interval of random duration, and this is followed by a period of operation
that may in turn be followed by one or more periods of repair and opera-
tion. If we formulate our model of this process as the sequence of intervals
256 Reliability Engineering: Probabilistic Models and Maintenance Methods

1.00

0.95

0.90

0.85

0.80

0.75

0 10 20 30 40 50

FIGURE 12.3
Availability function for an age replacement policy example.

(PM + T1), (R1 + T2), …, then the failure times form a delayed renewal process.
The availability function is thus

t t

A(t) =
ò
0
ò
g p (u)FT (t - u)du + FT (t - u)mHb (u)du
0
(12.16)

In this expression, the renewal density for the delayed renewal process is
constructed by first forming the convolutions:

t
f T ( u)
hb , c (t) =
ò g (t - u) F (t - u) du
0
c
T b
(12.17)

t
f T ( u)
hb , p (t) =
ò g (t - u) F (t - u) du
0
p
T b
(12.18)

where the terms in the denominators reflect the truncation of the life dis-
tribution at the end of the block replacement interval less the time already
consumed by replacement. Then, the renewal function is the convolution of
hb,p(t) and the renewal density for the ordinary renewal process based on
hb,c(t). That is,

mHb (t) = hb , p (t) * mHb ,c (t) (12.19)


Preventive Maintenance 257

A (t)

0.95

0.9

0.85

Time
100 200 300 400 500 600 700

FIGURE 12.4
Availability function for a block replacement policy example.

A point here related to the evaluation of this model is that the Laplace trans-
form for the availability function, Equation 12.16, is likely to be completely
unmanageable so that direct numerical integration appears to be the most
efficient approach to the analysis. The numerical analysis is not too difficult.
In addition, for most plausible cases, the first of the two terms of the avail-
ability function dominates the other in magnitude. Consider an example.
Suppose we have a device with a Weibull life distribution with parameters
β = 2.0 and θ = 2000 and exponential corrective and preventive replacement
times with λ c = 0.005 and λp = 0.025. If we take τb = 730 hours, the availability
function will be as shown in Figure 12.4. Finally, note that Equation 12.16
applies for t in the interval [0, τb] or to say it another way, it is defined relative
to the length of the block replacement policy time.

12.1.4 Availability Model for Opportunistic Age Replacement


There are a number of alternate replacement-type PM policies that may be
defined. One particular possibility is the use of an “opportunistic” PM strat-
egy. This is an appealing and quite realistic policy in which the equipment
down state is exploited to perform additional maintenance tasks. Clearly, such
a policy yields efficiencies. It is also commonly used. People often choose to
combine PM actions in order to reduce the number of times a machine or vehi-
cle must be removed from service. However, models for opportunistic mainte-
nance have not been developed until recently because of the complexity of the
probability analysis required. Barlow and Proschan [12] do give the limiting
availability for the opportunistic policy, but it was only recently that Degbotse
and Nachlas [82] constructed the time-dependent availability function for a
general model of opportunistic age replacement. Examine that model.
258 Reliability Engineering: Probabilistic Models and Maintenance Methods

To start, assume as before that replacement implies renewal. Then, con-


sider that any system may be viewed as comprised of two components—one
of interest and the balance of the system. The policy operates as follows:

1. If either component fails, it is replaced and the other component


does not age during that replacement.
2. If either component attains an age corresponding to its age replace-
ment policy age, it is replaced preventively and the other component
does not age during that activity.
3. If at any time the system is “down” in order to replace one of the
components, the other component will also be replaced if its age
exceeds the opportunistic replacement policy age ω.

A model for this policy is rather intricate and requires considerable notation.
Let τai represent the age replacement policy age for component i and ω ai rep-
resent the opportunistic replacement policy age for component i. Necessarily,
ωai < τai. Also, let FTi(t) represent the life distribution function for component
i and GTi(t) represent the distribution on the time to replace component i.
These quantities are used to define a nested renewal process that represents
the operation of the components and thus of the system.
An illustration of the nested renewal process is shown in Figure 12.5.
Notice that the renewal points for “component 1,” points t1, t2, t4, and t5, are
“nested” within those for “component 2,” points t3 and t5. To describe oper-
ating profiles that correspond to nested renewal processes, we define two
classes of operating intervals, major intervals and minor intervals and we
further distinguish between initial and general minor intervals. A “major
interval” is a period of system operation and repair that starts with both
components being new and ends with the completion of a service period
in which both components are replaced (renewed). Thus, a major interval
is a system renewal period and a sequence of major intervals forms a
renewal process. In Figure 12.5, the interval (0, t5) is a major interval.

0 t1 t2 t3 t4 t5

Component 1 Component 2

FIGURE 12.5
Illustration of a nested renewal process for opportunistic replacement. (Reprinted with permis-
sion from Degbotse, A.T. and Nachlas, J.A., Use of nested renewals to model availability under
opportunistic maintenance policies, Proceedings of the Annual Reliability and Maintainability
Symposium, pp. 344–350. Copyright 2003 IEEE.)
Preventive Maintenance 259

In contrast, a “minor interval” is a period between two successive restarts


of the system following replacement of a specific one of the components.
Since one may use either of the two components to delimit a minor inter-
val, we arbitrarily use “component 2.” Thus, minor intervals are the periods
between component 2 restarts. The sequence of minor intervals thus forms a
renewal process for component 2 but not for the system.
Next, distinguish further between an “initial minor interval,” which
starts with both components being new, and a “general minor interval,”
which starts with component 2 being new and component 1 being used. In
Figure 12.5, the interval (0, t3) is an initial minor interval as both components
are new when it starts. It ends with the system restart following replace-
ment of component 2. During the interval, component 1 has been replaced
several times. The interval (t3, t5) is a general minor interval as it starts with
component 1 being used and component 2 being new. A major interval will
be comprised of an initial minor interval and some number (possibly zero)
of general minor intervals.
Now, given the life and replacement time distributions, we define sev-
eral quantities that help describe system operation. We focus on cases in
which  system renewal does not occur and these lead to expressions that
represent system renewal and availability. Let

HI(t, k) be the cumulative distribution function on the accumulated


operating time during an initial minor interval that ends with the
replacement of component 2 following k component 1 replacements
and without system renewal
HG(t, k) be the cumulative distribution function on the accumulated
operating time during a general minor interval that ends with the
replacement of component 2 following k component 1 replacements
and without system renewal

Denote the density functions as hI(t, k) and hG(t, k). The functions HI and HG
are the key building blocks of the model of system behavior. They are con-
structed using the life and the residual life distributions. That construction
is postponed for now so that the general model structure can be defined
without treating special cases.
Since a minor interval that ends without system renewal includes k compo-
nent 1 replacements and one component 2 replacement, the density function
on the total repair time during a minor interval is

gT (t , k ) =
òg ( x) g 2 (t - x)dx
(k )
1 (12.20)
0

where g1(k ) (t) is the k-fold convolution of the replacement time density.
260 Reliability Engineering: Probabilistic Models and Maintenance Methods

The density function on the length of a minor interval is the convolution of


the operating and the repair time densities. That is, for QI(t), which is the dis-
tribution function on the length of an initial minor interval that ends without
system renewal, and QG(t), which is the distribution function on the length of
a general minor interval that ends without system renewal, we obtain

¥ t

qI (t) = å ò h (x, k)g (t - x, k)dx


k =0 0
I T (12.21)

¥ t

qG (t) = å ò h (x, k)g (t - x, k)dx


k =0 0
G T (12.22)

ò
QI (t) = qI ( x)dx
0
(12.23)

ò
QG (t) = qG ( x)dx
0
(12.24)

Corresponding relationships are defined for the minor intervals that end
with system renewal. Let

UI(t) be the distribution function on the length of an initial minor inter-


val that ends with system renewal
UG(t) be the distribution function on the length of a general minor
interval that ends with system renewal

To construct these functions, we use the following:

VI1(t, k) is the cumulative distribution function on the accumulated


operating time during an initial minor interval that ends with sys-
tem renewal due to the replacement of both components following a
component 1 failure.
VI2(t, k) is the cumulative distribution function on the accumulated
operating time during an initial minor interval that ends with sys-
tem renewal due to the replacement of both components following a
component 2 failure.
VG1(t, k) is the cumulative distribution function on the accumulated
operating time during a general minor interval that ends with sys-
tem renewal due to the replacement of both components following a
component 1 failure.
Preventive Maintenance 261

VG2(t,k) is the cumulative distribution function on the accumulated


operating time during a general minor interval that ends with sys-
tem renewal due to the replacement of both components following a
component 1 failure.

Finally, we define the following:

ZI(t) is the distribution function on the length of an initial minor inter-


val that ends with or without system renewal.
ZG(t) is the distribution function on the length of a general minor inter-
val that ends with or without system renewal.

Naturally,

¥ ¥ ¥
ZI (t) = å
k =0
H I (t , k ) + å k =0
VI 1(t , k ) + åV (t, k)
k =0
I2 (12.25)

and
¥ ¥ ¥
ZG (t) = å H (t, k) + åV
k =0
G
k =0
G1 (t , k ) + åV
k =0
G2 (t , k ) (12.26)

The functions defined previously can be used to construct two availability


measures. First, we obtain the availability within a sequence of minor inter-
vals that do not yield system renewal and we then incorporate that availabil-
ity measure in an overall system availability function. The renewal density
for the minor intervals without having system renewal is given by

¥ t

mQ (t) = å ò q (x)q
n=0 0
I
(n)
G (t - x)dx (12.27)

This represents the probability that a minor interval starts at any time. For
the sequence of minor intervals, the system availability is the probability
that the system is operating at any point in time and is thus given by

ò
A(t) = ZI (t) + ZG (t - x)mQ ( x)dx
0
(12.28)

where ZI (t) = 1 - ZI (t) represents the probability that the length of the operat-
ing period in an initial minor interval exceeds t.
262 Reliability Engineering: Probabilistic Models and Maintenance Methods

The system-level availability starts with the distribution function for the
lengths of the major intervals, Φ(t), and the corresponding renewal density,
mΦ(t). The distribution on the lengths of the major intervals is

ò
F(t) = U I (t) + mQ ( x)UG (t - x)dx
0
(12.29)

as either an initial or a general minor interval may be the last minor interval
in a major interval. Given this definition, the system-level availability func-
tion is

ò
AS (t) = A(t) + A( x)mF (t - x)dx
0
(12.30)

This form of the system-level availability function reflects the nesting of the
minor intervals within the major intervals.
Next, observe that we can apply the general model structure to four problem
classes and that the stepwise application over the four classes will promote
understanding of the model structure and will allow us to connect our avail-
ability results to the few others that are available.
Recall that the two components each have an age replacement policy age τai
and an opportunistic replacement age ω ai. The four cases are defined by the
assumed magnitudes of those policy parameters. Specifically, we consider (1)
τa1 = τa2 = ωa1 = ωa2 = ∞; (2) τa1 = τa2 = ∞ with ωa1 and ωa2 finite; (3) τa1 = ∞ with
τa2, ωa1, and ω a2 finite; and (4) all policy ages finite. For each case, the specifica-
tion of the policy ages leads to the definition of the functions HI and HG. The
first case is a pure failure model.

12.1.4.1 Failure Model


When we take all of the policy ages to be infinite, we say that there will be
no PM and no opportunistic replacement. Components will only be replaced
upon failure. This implies that system renewal will never occur. Refer again
to Figure 12.5 and note that for an initial minor interval, the probability that
component 2 fails on or before t time units of operation during which com-
ponent 1 fails k times is

ì F1(t)F2 (t) k=0


ïï t
H I (t , k ) = í (12.31)
ò (k )
ï F1(t - u) f1 (u)F2 (t)du k ³1
ïî 0
Preventive Maintenance 263

The associated density function is

ì F1(t) f 2 (t) k=0


ïï t
hI (t , k ) = í (12.32)
òï F1(t - u) f1 (u) f 2 (t)du
ïî 0
(k )
k ³1

The corresponding forms for the general minor interval are

ì Fɶ1(t)F2 (t) k=0


ïï
HG (t , k ) = í t (12.33)
ò ɶ(k )
ï F1(t - u) f1 (u)F2 (t)du
ïî 0
k ³1

and

ì Fɶ1(t) f 2 (t) k=0


ïï
hG (t , k ) = í t (12.34)
ò ɶ(k )
ï F1(t - u) f1 (u) f 2 (t)du
ïî 0
k ³1

where it should be particularly noted that Fɶ1(t) denotes the survivor function
for the residual life distribution, Fɶ1(t). The residual life distribution is used
because the general minor intervals begin with component 1 being used.
Note further that the convolution, ɶf1( k ) (t), includes a first operating period
during which component 1 is used followed by k − 1 periods in which com-
ponent 1 is new at the start.
In general, it is difficult to specify the residual life distribution for a pro-
cess such as the one studied here. We use the approximation to the residual
life distribution defined by Cox [61]:
t
1
FɶT (t) =
mT ò F (x)dx
0
T (12.35)

where μT is the mean of the original life distribution.


Next, we construct
t

qI (t) =
ò f (u)F (u)g (t - u)du
0
2 1 2

¥ t
æ x
ö æ t-x ö
+ åò ç
0 è
ò ÷ç ò
ç f 2 ( x) F1( x - u) f1( k ) (u)du ÷ ç g1( k ) (w) g 2 (t - x - w)dw ÷ dx
÷
(12.36)
k =1 0 øè 0 ø
264 Reliability Engineering: Probabilistic Models and Maintenance Methods

and
t

qG (t) =
ò f (u)Fɶ (u)g (t - u)du
0
2 1 2

¥ t
æ x
ö æ t-x ö
+ åò ç
0 è
ò ÷ç ò
ç f 2 ( x) F1( x - u) ɶf1( k ) (u)du ÷ ç g1( k ) (w) g 2 (t - x - w)dw ÷ dx
÷
(12.37)
k =1 0 øè 0 ø

These forms are substituted into Equation 12.27 to obtain the renewal den-
sity for minor intervals. The distributions on total operating time during
minor intervals are

¥ ¥ t x

ZI (t) = å
k =0
H I (t , k ) = åò
k =0 0
ò
f 2 ( x) F1( x - u) f1( k ) (u)dudx = F2 (t)
0
(12.38)

and

¥ ¥ t x

ZG (t) = å H (t, k) = å ò f (x)ò F (x - u) ɶf


k =0
G
k =0 0
2
0
1 1
(k )
(u)dudx = F2 (t) (12.39)

which is an appealing and rather intuitive result. An important point here


is that the simplification of the general model to the simple case of the
failure process provides some validation of the general formulation. The
expressions for uI(t) and uG(t) are constructed in the same fashion as is
shown shortly.
Returning to the functions QI(t) and QG(t), note that QI(∞) and QG(∞) are the
probabilities for the two types of minor intervals that opportunistic replace-
ment does not occur. If QI(∞) < 1 and QG(∞) < 1, then there is a positive prob-
ability that an opportunistic replacement and hence a system renewal occurs.
In this case, the minor intervals are transient. On the other hand, if QI(∞) = 1
and QG(∞) = 1, then there is no system renewal and the minor intervals are
recurrent. For the failure model, QI(∞) = 1 and QG(∞) = 1 and also ZI(∞) = 1
and ZG(∞) = 1. Hence, the minor intervals are recurrent. As a result, the system
availability function reduces to the within minor interval availability defined
in Equation 12.28. Further, based on Equations 12.38 and 12.39, the system avail-
ability is

ò
A(t) = F2 (t) + F2 (t - x)mQ ( x)dx
0
(12.40)

As usual, depending upon the life and repair time distributions, eval-
uation of the availability function can be rather intricate. As noted
Preventive Maintenance 265

previously, Barlow and Proschan [12] present the limiting availability for
the failure model as
-1
æ n n ö
A¥ = ç 1 + 1 + 2 ÷ (12.41)
è m1 m 2 ø

where
νi is the mean of the repair time distribution Gi(t)
μi is the mean of the life distribution for component i

Using the derivatives of the Laplace transforms of Equation 12.40 yields the
same result.

12.1.4.2 Opportunistic Failure Replacement Policy


Next, suppose that each of the components is replaced either upon failure
or if it has age ωai when system operation is interrupted by the failure of
the other component. We first construct expressions for the probabilities that
various sample paths yield restarts for the minor intervals without system
renewal. We must perform the construction relative to the magnitudes of
ωa1 and ωa2. We could consider both cases but for explanation, let us simply
take ωa1 < ω a2. The corresponding analysis and results for the case in which
ωa1 > ωa2 are implied by this analysis. To start, note that when k = 0,
hI (t , 0) = f 2 (t)F1(t) (12.42)

provided t ≤ ω a1. If k ≥ 1, the cases that imply no system renewal are as follows:

1. Component 2 fails at time t where 0 ≤ t < ω a1.


2. Component 2 fails at time t where ω a1 ≤ t < ω a2 and the kth component
1 failure occurred at time x where t − x < ωa1. That is, x > t − ωa1.
3. Component 2 fails at time t where t ≥ ωa1 and the kth component 1
failure occurred at time x where x < ω a2 and t − x < ω a1. That is,
t − ωa1 < x < ωa2 and by implication, t ≤ ωa1 + ωa2.

Combining these cases, the realization of Equation 12.33 is

ì t

ï ò
ï fT 2 (t) FT 1(t - u) f ( k ) (u)du
T1 0 £ t < wa 1
ï 0
ï t
ï
hI (t , k ) = í fT 2 (t)
ï
ò
t - wa 1
FT 1(t - u) fT( k1) (u)du wa 1 £ t < wa 2 (12.43)

ï wa 2
ï
ï fT 2 (t)
ïî
ò
t - wa 1
FT 1(t - u) fT( k1) (u)du wa 2 £ t < w a 1 + wa 2
266 Reliability Engineering: Probabilistic Models and Maintenance Methods

The expressions in Equations 12.42 and 12.43 are used in Equation 12.21 to
obtain the density on the lengths of the initial minor intervals.
Next, we consider the general minor intervals that end without system
renewal. The reasoning and hence construction for the general minor
intervals are identical to that for the initial minor intervals except that com-
ponent 1 is used and is subject to a residual life distribution at the start of
a general minor interval. To reflect the age of component 1 at the start of a
general minor interval, we take the component 1 age to be the average back-
ward recurrence time based on the life distribution, FT1(t). Denote that age as
a1 and note that Cox [61] shows that age to be
mT2 1 + sT2 1
a1 = (12.44)
2mT 1
Given the average starting age a1,

hG (t , 0) = fT 2 (t)FɶT 1(t) 0 £ t < wa1 - a1 (12.45)


and
ì t

ï ò
ï fT 2 (t) FT 1(t - u) ɶf ( k ) (u)du
T1 0 £ t < wa 1
ï 0
t
ïï
hG (t , k ) = í fT 2 (t)
ï
ò
t - wa 1
FT 1(t - u) ɶfT( k1) (u)du wa 1 £ t < wa 2 (12.46)

ï wa 2
ï f (t)
ï T2
ïî
ò
t - wa 1
FT 1(t - u) ɶfT( k1) (u)du wa 2 £ t < w a 1 + wa 2

Again, these expressions are combined with the total repair time to deter-
mine the density on the general minor interval length.
Observe that the conditions under which renewal does not occur corre-
spond to a specific set of sample paths. The set of possible sample paths that
is the complement of the set described earlier defines the cases in which sys-
tem renewal does occur. Considering initial minor intervals, we note that
system renewal will occur as a result of a component 2 replacement if k = 0
and component 2 fails at time t where ωa1 ≤ t. That is,
vI 2 (t , 0) = fT 2 (t)FT 1(t) wa1 £ t (12.47)

and a component 2 failure will precipitate system renewal for the sample
paths having a component 2 failure time of t where
1. ωa1 ≤ t < ω a2 and the kth component 1 failure time occurred at x where
0 ≤ x ≤ t − ωa1
2. ω a2 ≤ t < ω a1 + ω a2, the kth component 1 failure occurred at x where
0 ≤ x ≤ t − ωa1, and it is also the case that x < ωa2
3. ωa1 + ωa2 < t and the kth component 1 failure occurred at x where x < ωa2
Preventive Maintenance 267

while it will be the kth component 1 failure that precipitates the system
renewal at time t if that failure occurs at time t where ωa2 ≤ t < ωa1 + ωa2 and
the k − 1st failure occurred at time x where x < ωa2 or if that failure occurs at
time t > ωa1 + ωa2 and the k − 1st failure occurred at time x < ωa2. These cases
exhaust the complement of the set of sample paths that do not yield system
renewal at the end of an initial minor interval. Algebraically,
t - wa 1
ì
ï fT 2 (t)
ï ò FT 1(t - u) fT( k1) (u)du wa 1 £ t < wa 2
ï 0
t - wa 1
ïï
vI 2 (t , k ) = í fT 2 (t)
ï
ò0
FT 1(t - u) fT( k1) (u)du wa 2 £ t < w a 1 + wa 2 (12.48)

ï wa 2
ï
ò
ï fT 2 (t) FT 1(t - u) fT 1 (u)du wa 1 + wa 2 £ t
(k )

ïî 0

and
wa 2

vI 1(t , k ) = FT 2 (t)
òf
0
T1 (t - u) fT( k1-1) (u)du wa 2 £ t (12.49)

To combine the operating and repair times, we observe that when a compo-
nent 2 failure precipitates system renewal, the accumulated repair time is
gT(t, k + 1) and in contrast, when a component 1 failure precipitates system
renewal, the accumulated repair time is gT(t, k). Thus, the density function
for the duration of an initial minor interval that ends with system renewal is
¥ t ¥ t

uI (t) = åò
k =0 0
vI 1( x , k ) gT (t - x , k )dx + åòv
k =0 0
I2 ( x , k ) gT (t - x , k + 1)dx (12.50)

For the general minor intervals that end with system renewal, we again con-
sider the sample paths that form the complement of the set of paths that do
not yield renewal so,

vG 2 (t , 0) = fT 2 (t)FɶT 1(t) wa1 - a1 £ t (12.51)

and

t - wa 1
ì
ï fT 2 (t)
ïï ò0
FT 1(t - u) ɶfT( k1) (u)du wa 1 £ t < wa 1 + wa 2
vG 2 (t , k ) = í wa 2
(12.52)
ï
ïî
ò
0
ɶ(k )
ï fT 2 (t) FT 1(t - u) fT 1 (u)du wa 1 + wa 2 £ t
268 Reliability Engineering: Probabilistic Models and Maintenance Methods

and
wa 2

vI 1(t , k ) = FT 2 (t)
òf
0
T1 (t - u) ɶfT( k1-1) (u)du wa 2 £ t (12.53)

Then, the density on the length of the general minor intervals that end with
system renewal is
¥ t ¥ t

uG (t) = åòv
k =0 0
G1 ( x , k ) gT (t - x , k )dx + åòv
k =0 0
G2 ( x , k ) gT (t - x , k + 1)dx (12.54)

Having enumerated and represented all of the sample paths for the minor
intervals, the system availability is determined by the successive application
of Equations 12.25 through 12.30. That is, the specification of any particular
realization of the general model structure hinges upon the careful construc-
tion of the quantities qI(t) and qG(t) using hI(t, k) and hG(t, k) and the quantities
uI(t) and uG(t) using vI1(t, k), vI2(t, k), vG1(t, k), and vG2(t, k). Once these prob-
abilities have been obtained, Equations 12.25 through 12.30 accumulate their
content to provide the system availability measure.
To complete this model, we observe that there is a finite probability of
system renewal when the system is subjected to an opportunistic failure
replacement policy and this is confirmed for the stated expressions in that
QI(∞) < 1 and QG(∞) < 1. Also, one can obtain a partial validation of the
stated expressions by noting that taking ω a1 = ω a2 = ∞ yields the equations for
the failure replacement model. Finally, note that it is possible to obtain the
limiting system availability by taking derivatives of the pertinent Laplace
transforms. Doing this yields
m ZI (1 - QG (¥)) + m ZGQI (¥)
AS (¥) = (12.55)
(mUI + mQI + (n1 + n 2 ))(1 - QG (¥)) + (mUG + mQG )QI (¥)

where the mean values are identified by their subscripts that correspond to
the distributions to which they apply. Note that this result conforms to that
of Barlow and Proschan [12].

12.1.4.3 Partial Opportunistic Age Replacement Policy


A partial opportunistic age replacement policy is one in which both ω a1 and
ωa2 are finite but only one of τa1 and τa2 is. Arbitrarily, let τa2 be finite and
assume ωa1 < ωa2. Then, system renewal occurs when either component 1 fails
and component 2 has age of at least ωa2 or component 2 fails or achieves an
age of τa2 and the age of component 1 is at least ωa1. Keep in mind the fact that
the analysis is slightly different depending upon whether τa2 < ω a1 + ω a2 or
τa2 > ωa1 + ωa2. For now, we consider only the first of those cases.
As with the previous model, consider first the initial minor interval that
ends without system renewal. Let t denote the time of the component 2 failure
Preventive Maintenance 269

(if there is one) and let x denote the time of the kth component 1 failure (if there
is one). The sample paths for which there is no system renewal are
ì ì0 £ t < wa1 , 0 £ x < t ü
ï ïw £ t < w , t - w £ x < t ï
ï ï a1 ï
Sk = 0 = {(t|0 £ t < w1 )} and Sk ³1
a2 a1
= ít , x|í ý
ï w
ï a2 £ t < t a2 , t - w a1 £ x < w a2 ï
ï ïît = ta 2 , ta 2 - wa1 £ x < wa 2 ï
î þ
These sets imply that

hI (t , 0) = fT 2 (t)FT 1(t) 0 £ t < wa1 (12.56)


and

ì t

ï ò
ï fT 2 (t) FT 1(t - u) fT( k1) (u)du 0 £ t < wa 1
ï 0

ï t

ï fT 2 (t)
ïï ò
t - wa 1
FT 1(t - u) fT( k1) (u)du wa 1 £ t < wa 2

hI (t , k ) = í wa 2
(12.57)
ï
ï
ï
f T 2 (t )
ò
t - wa 1
FT 1(t - u) fT( k1) (u)du wa 2 £ t < t a 2

ï wa 2
ï
ï FT 2 (t)
ïî
ò
ta 2 - wa 1
FT 1(t - u) fT( k1) (u)du t = ta 2

The same sample paths apply to the general minor interval except that the
initial age must be considered when k = 0. Thus,
hG (t , 0) = fT 2 (t)FT 1(t) 0 £ t < wa1 - a1 (12.58)

and
ì t

ï ò
ï fT 2 (t) FT 1(t - u) ɶfT( k1) (u)du 0 £ t < wa 1
ï 0

ï t

ï fT 2 (t)
ïï ò
t - wa 1
FT 1(t - u) ɶfT( k1) (u)du wa1 £ t < wa 2

hG (t , k ) = í wa 2
(12.59)
ï
ï fT 2 (t)
ï
ò
t - wa 1
FT 1(t - u) ɶfT( k1) (u)du wa 2 £ t < t a 2

ï wa 2
ï
F
ï T2
ïî
( t )
ò
ta 2 - wa 1
FT 1(t - u) ɶfT( k1) (u)du t = ta 2
270 Reliability Engineering: Probabilistic Models and Maintenance Methods

In order to identify the probability functions for the system renewal cases,
we take the complements of the sets Sk = 0 and Sk ≥ 1. Clearly,

ì ì wa 1 £ t < wa 2 , x £ t - wa 1 ü
ï ïw £ t < t , x £ t - w ï
ï ï a2 a1 ï
Sk = 0 = {(t|wa1 £ t)}
a2
and Sk ³1 = ít , x|í ý
ï ï wa 2 £ t < t a 2 , x ³ wa 2 ï
ï ïî t = ta 2 , x £ ta 2 - wa1 ï
î þ
so,
vI 2 (t , 0) = fT 2 (t)FT 1(t) wa1 £ t (12.60)

and
t - wa 1
ì
ï fT 2 (t)
ïï ò 0
FT 1(t - u) fT( k1) (u)du wa 1 £ t < t a 2
vI 2 (t , k ) = í ta 2 - wa 1
(12.61)
ï
ï FT 2 (t)
ïî
ò0
FT 1(t - u) fT( k1) (u)du t = ta 2

and
t2

ò
vI 1(t , k ) = FT 2 (t) fT 1(t - u) fT( k1-1) (u)du
0
wa 2 £ t < t a 2 (12.62)

As with the nonrenewal case, the general minor intervals are similar to the
initial minor intervals, so,

vG 2 (t , 0) = fT 2 (t)FɶT 1(t) wa1 - a1 £ t (12.63)

and

t - wa 1
ì
ï fT 2 (t)
ïï ò 0
FT 1(t - u) ɶfT( k1) (u)du wa 1 £ t < t a 2
vG 2 (t , k ) = í ta 2 - wa 1
(12.64)
ï
ï FT 2 (t)
îï
ò 0
FT 1(t - u) ɶfT( k1) (u)du t = ta 2

and
wa 2

vG1(t , k ) = FT 2 (t)
òf
0
T1 (t - u) ɶfT( k1-1) (u)du wa 2 £ t < t a 2 (12.65)
Preventive Maintenance 271

Once the basic densities on interval duration have been defined relative to
the cases that do and do not yield system renewal, the calculation of avail-
ability measures follows the previously defined format.
The densities uI(t) and uG(t) are computed as given in Equations 12.50 and
12.54, respectively. These are then used in Equations 12.25 through 12.30 to
obtain the system availability measure. Observe that QI(∞) < 1 and QG(∞) <
1, so system renewals do occur for this model and Equation 12.55 again gives
the limiting system availability. Also, setting τa2 = ∞ causes the model to
reduce to the opportunistic failure replacement model as it should.

12.1.4.4 Full Opportunistic Age Replacement Policy


In a full opportunistic age replacement policy, all of the policy parameters
have finite values and both planned replacements and failure events have the
potential to precipitate a system renewal. A model of system operation under
a full opportunistic age replacement policy is defined in the same manner
as the previously constructed models. In fact, the previous models subsume
most of the model forms for the full opportunistic replacement case.
Note that each of the models defined previously is constructed in terms
of the relative magnitudes of the policy parameters. If we further assume
that τa1 ≥ τa2, then the models for the partial opportunistic age replacement
policy apply to the full policy case because the value τa2 bounds the length of
the minor intervals such that age replacement of component 1 cannot occur.
Thus, for the full opportunistic age replacement policy, we need to consider
only cases in which τa1 < τa2. However, we note further that the assignment
of the indices to the components is arbitrary, so either component may be
considered component 1 and it will always be the case that one of the age
replacement times limits the duration of the minor intervals. Hence, the
models defined previously exhaust the “practically interesting” cases and
are sufficient to study all of the opportunistic replacement policies enumer-
ated here.
To close the discussion of the construction of the models, observe that the
four models are mutually consistent in that each is successively a generaliza-
tion of the preceding model.

12.1.4.5 Analysis of the Opportunistic Replacement Models


As we have seen previously, availability models tend to be difficult to ana-
lyze, particularly in closed-algebraic form. The earlier models of opportunis-
tic replacement share the attribute of being relatively intractable. However,
as with most availability models, we can obtain the Laplace transform of
most of the time-dependent measures and can then invert the transform
numerically. We can also use the transform to obtain the limiting availabil-
ity. At the very least, this permits verification of the stability of the system
and validation of the models.
272 Reliability Engineering: Probabilistic Models and Maintenance Methods

It is not necessary to show all of the available results here but a few key
results will help to illustrate the power of the nested renewal concept. To
start, we note that

¥
qI* (s) = å h* ( s, k ) g * ( s, k )
k =0
I T (12.66)

is well defined so,

QI (¥) = qI* (s) (12.67)


s =0

and the same applies to QG(∞). Substituting into Equation 12.27 yields

qI* (s)
mQ* (s) = (12.68)
1 - qG* (s)

Then, using the derivatives of the Laplace transforms for Equation 12.26,

d * d QI (¥)
mF = - F (s) = - éêuI* (s) + mQ* (s)uG* (s)ùú = mUI + mQI + (mUG + mQG )
ds ds ë û 1 - QG (¥)
(12.69)

and this leads directly to the limiting availability Equation 12.55. The inter-
mediate steps are to take

AS* (s) = A* (s) éê1 + mQ* (s)ùú (12.70)


ë û

and

A* (s) = ZI* (s) + ZG* (s)mQ* (s) (12.71)

along with the standard definition of a renewal function for mF* (s) to obtain

ZI* (s)(1 - qG* (s)) + ZG* (s)qI* (s)


AS* (s) = (12.72)
(1 - f* (s))(1 - qG* (s))

Note that all of these results apply regardless of the choices of life and repair
time distributions.
Preventive Maintenance 273

Next, to illustrate the construction of numerical results, assume that the


distributions on the replacement times are exponential and are

g1(t) = l1e - l1t and g 2 (t) = l 2e - l2t

respectively, and that the life distributions are Weibull and are

FTj (t) = 1 - e ( )
b
- t/q j

For the purposes of illustration and recognizing that scale is arbitrary, take
β1 = β2 = 2.0 with θ1 = 1.5, θ2 = 1.0, λ2 = 0.4, and λ1 = 0.667. Then, take the age
replacement policy ages to be integer multiples of the mean of the life distri-
butions so that

taj = mm j

and take the opportunistic age replacement policy ages to be fractions of the
age replacement policy ages:

waj = gtaj

Using this construction, the limiting availability of Equation 12.55 for several
cases is shown in Table 12.2.
Similarly, the time-dependent availability function can be obtained using
numerical inversion of the Laplace transforms. These functions are shown
for the failure replacement, opportunistic failure replacement, and the par-
tial opportunistic age replacement models in Figure 12.6 for the specific case
in which ωa1 = 1.50, ωa2 = 2.13, and τa2 = 2.51 and the life and replacement time
distributions are the same as that given earlier. The figure serves to illustrate
the fact that numerical results are possible. The results also verify the inter-
nal consistency of the models.

TABLE 12.2
Representative Limiting Availability Values
γ m 1 = 1, m 2 = 1 m 1 = 1, m 2 = 2 m 1 = 2, m 2 = 1 m 1 = 2, m 2 = 2
0.2 0.47 0.51 0.48 0.54
0.4 0.49 0.52 0.64 0.66
0.6 0.54 0.53 0.62 0.65
0.8 0.63 0.53 0.55 0.61
274 Reliability Engineering: Probabilistic Models and Maintenance Methods

1
Opp. age
Opp. fail
0.9 Failure

0.8
A(t)

0.7

0.6

0.5
1 2 3 4 5 6 7
t

FIGURE 12.6
Example of availability functions. (Reprinted with permission from Degbotse, A.T. and
Nachlas, J.A., Use of nested renewals to model availability under opportunistic mainte-
nance policies, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 344–350.
Copyright 2003 IEEE.)

12.2 Nonrenewal Models


PM, like corrective maintenance, may not imply equipment renewal. For
many types of equipment, PM reduces failure probability but does not
return the system to a “good as new” state. Nevertheless, the reduction in
failure probability may make PM appealing. Both time- and calendar-based
PM policies have been defined.
As in the case of the corrective maintenance models of Chapter 10, the
portrayal of equipment performance under PM depends most heavily on the
representation of the post maintenance failure distribution. Most of the mod-
els that have been developed are based on the use of the imperfect repair for-
mat for representing the postrepair equipment state. We will examine a few
of these models as well as one based on the Kijima virtual age concept and
one based on the quasi-renewal process. Note that in some of the models, the
analysis requires that a finite end to the interval of analysis be defined, while
for others, one must assume an infinite time horizon.
One of the earliest and most general models of nonrenewal equip-
ment operation under PM is the one defined by Nakagawa [83]. In this
model, a block (or periodic) policy is assumed with minimal repair of
failures between PM times and system replacement after N PM intervals.
Preventive Maintenance 275

Nakagawa treats the post PM equipment state in the simplest possible


manner by assuming only that the system hazard function during the kth
interval is zk(t) where zk(t) < zk + 1(t) so that the system is gradually deteriorating
despite the PM schedule. He then assumes all service times are negligible
and defines a cost rate function:
tb

å ò
N
c2 zk (t) + ( N - 1)c1 + c3
k =1 0
C( t b , N ) = (12.73)
Nt b

where c3 is the cost of system replacement. Note that the assumption of even-
tual replacement was needed to obtain a tractable model. Note further that
N is a decision variable so the model analysis includes selection of optimal
values for both τb and N. In fact, Nakagawa shows that the optimal value for
τb may be obtained by differentiation to be the one for which
N æ tb
ö
ç tb zk (tb ) - zk (t)dt ÷ = ( N - 1)c1 + c3
å ç
k =1 è
ò
÷ c2
(12.74)
0 ø
and the optimal value of N may be determined using the difference expressions
C(tb , N ) < C(tb , N - 1)
(12.75)
C(tb , N ) £ C(tb , N + 1)
Clearly, the appeal of this model is the fact that the hazard function behavior
may be specified arbitrarily (as long as zk(t) < zk + 1(t)). The assumption of mini-
mal repair at failure may or may not be viewed as a drawback of the model.

12.2.1 Imperfect PM Models


Clearly, there are numerous possible operating scenarios one might consider.
Suppose that PM is imperfect in that a device is renewed with probability p
and the hazard is unchanged by PM with probability 1 − p. Assume also that
a device failure implies device renewal. In this case, a periodic (block) PM
policy will yield a cost per unit time [83] of

æc p ¥ ¥
(1 - p) j -1 FT ( jtb ) ö÷
ç 2
C( t b , p ) = è
å j =1
(1 - p) j -1 FT ( jtb ) + c1 å j =1 ø (12.76)
æ ¥ jtb
ö
ç
è
å j =1
(1 - p) j -1
ò( j -1) tb
FT (u)du ÷
ø
because the probability that it is a failure that precipitates renewal is
¥ jtb ¥

åj =1
(1 - p) j -1
ò
( j -1) tb
fT (u)du = p å (1 - p)
j =1
j -1
FT ( jtb )
276 Reliability Engineering: Probabilistic Models and Maintenance Methods

In contrast, if we assume that failure implies minimal repair rather than


replacement, then renewal only occurs as a result of (some) PM events. In
this case, the cost rate [83] is

jtb
æ ¥ ö
å
1
C( t b , p ) = ç c 2 p 2
ò zT (u)du + c1 ÷
j -1
(1 - p) (12.77)
tb ç ÷
è j =1 0 ø

where c2 is the cost of the (minimal) repair that is performed following


failure.
In the case of age replacement policies, fewer results have been developed.
Block et  al. [84] extend their age-dependent imperfect repair model [68] to
include PM that is perfect. Let the age-dependent probability of replacement
upon failure be denoted by p(t) and the probability of minimal repair upon
failure by q(t) = 1 − p(t). Using this format and the associated life distribution
of Equation 10.12, Block et al. [84] also allow for the cost of minimal repair
to depend upon both time and the number of minimal repairs. They define
the following:

c0k (t) is the cost for the kth minimal repair t time units after the most
recent device replacement.

Then, recalling that E[Nt] is the expected number of events during (0, t) in a
nonhomogeneous Poisson process that has mean value
t

ò q(u)z (u)du
0
T

the cost of minimal repairs during an interval (0, t) is c0E[ Nt ]+1(t) and the result-
ing cost per unit time is
ta x

q( x)c0E[ Nx ]+1( x)e ò0


ò
q( u ) zT ( u ) du
c2 Fp (ta ) + c1Fp (ta ) + dx
C(ta , p(t)) = 0
ta (12.78)
ò 0
Fp ( x)dx

One appealing feature of this model is that it generalizes several other age
replacement imperfect repair models. Specifically, if the minimal repair cost
is a constant c0, the cost model reduces to
ta

C( t a , p ) =
(c2 - c0 )Fp (ta ) + c1Fp (ta ) + c0
ta
ò0
Fp ( x)zT ( x)dx
(12.79)
ò 0
Fp ( x)dx
Preventive Maintenance 277

and if the probability of minimal repair is constant rather than age depen-
dent and the minimal repair cost equals the failure replacement cost, the
model becomes

c2
Fp (ta ) + c1Fp (ta )
p
C( t a , p ) = ta (12.80)
ò0
Fp ( x)dx

where, in this case, Fp(t) is defined by Equation 10.10.

12.2.2 Models Based on the Quasi-Renewal Process


Recall that in a quasi-renewal process [72], the distribution on the kth operat-
ing interval, say, Tk, is scaled by a constant, αk−1, so that

Tk = aTk -1 = a k -1T1

and

1 æ t ö
FTk (t) = FT1 ç k -1 ÷
a k -1 èa ø

where FT1 (t) is the underlying life distribution for the device. Recall further
that the quasi-renewal function, QFT (t), is defined in the same manner as a
renewal function:
¥
QFT (t) = E[Nt ] = åF
n =1
Sn (t)

Given these definitions, a block replacement policy (perfect PM) with quasi-
renewals upon failure has cost rate:
c2QFT (tb ) + c1
C( t b ) = (12.81)
tb
This corresponds directly to the expression for the renewal case and has the
corresponding optimality condition.
If the PM that is performed at the scheduled times is imperfect and failures
are quasi-renewal, we obtain a cost rate model of

å
¥
c2 p 2 (1 - p) j -1QFT ( jtb ) + c1
j =1
C( t b ) = (12.82)
tb
278 Reliability Engineering: Probabilistic Models and Maintenance Methods

Here again, differentiation yields an optimality condition for selecting the


value of τb.
A rather more interesting model is suggested by Wang and Pham [71].
They consider that when a new device is placed in operation, it does not
undergo PM until after it has experienced some number, say, K, of failures.
Then, it is subjected to a periodic PM plan. They further consider that the
cost of c2f + (j − 1)c2v is incurred at the jth failure. Here c2f is the fixed compo-
nent and c2v is the variable component of the repair cost. Thus, repair costs
are increasing in the number of repairs performed. Wang and Pham also
treat both the operating periods and the repair intervals as quasi-renewal
processes so that the length of the jth operating interval has mean αk−1E[T1],
and the elements of the sequence of repair times are scaled by βj−1 so that the
duration of the jth repair has mean βk−1E[R1]. Next, Wang and Pham assume
that after a device has experienced K failures, it is subjected to imperfect PM
at intervals of τb and minimal repair at subsequent failures. With this intri-
cate setup, they show that the expected maintenance cost per unit time is

jtb
K (K - 1) æ u ö
å
c ¥

C( t b , K , p ) =
Kc2 f +
2
c2v + 1 + pc2 f
p j =1 ò
(1 - p) j -1
0
zT ç k
èa
÷ du
ø
(12.83)
1 - ak 1 - bk tb
E[T1 ] + E[R1 ] + + E[R0 ]
1- a 1-b p

In this expression, the minimal repair cost is taken to be c2f, the time to per-
form minimal repair is taken to be zero, and the time to perform a perfect
repair (renewal) is denoted by R0. For this policy and model, the correspond-
ing average availability is

1 - ak t
E[T1 ] + b
1- a p
(12.84)
1 - ak 1 - bk t
E[T1 ] + E[R1 ] + b + E[R0 ]
1- a 1-b p

If we assume that K is specified on the basis of the equipment design, then we


can determine an optimal value of τb by differentiation. On the other hand,
we can also solve for τb as a function of K and then solve for K using finite dif-
ferences as shown in Equations 12.75. Finally, note that taking K = 0 or K = 1
in the earlier two models leads to additional interesting special cases.
For the case of age-based PM, Intiyot [85] uses the mixture of repair cycles
and PM cycles as defined by Murdock and Nachlas [81] and integrates that
formulation into the availability analysis method defined by Rehmert and
Nachlas [78]. The resulting model yields structural results that are very simi-
lar to those of Rehmert and Nachlas. As in [81], Intiyot [85] uses Ta to repre-
sent the PM policy age and then defines the availability functions for the
Preventive Maintenance 279

times within each cycle that are t ≤ Ta and t > Ta. Consequently, within each
operating interval, the life distribution is
ì FTi (t)
ï t £ Ta
FTi , p (t) = í FTi (Ta ) (12.85)
ï1 t > Ta
î
If a period of device operation ends before Ta as a result of a failure, the repair
time distribution is
æ t ö
GRi (t) = GR1 ç i -1 ÷ (12.86)
è bR ø
and if the operating intervals extend until Ta, the service time distribution is

æ t ö
GPi (t) = GP1 ç i -1 ÷ (12.87)
è bP ø
These functional forms are folded into the construction of the convolutions
that comprise the availability models.
As in Rehmert and Nachlas [78], Vi = Ti + Ri but two realizations of this
definition are possible. For a repair interval, Vi corresponds to a repair com-
pletion, so
t

H R ,Vi (t) = fTi , p (t) * GRi (t) =


òf
0
Ti , p (u)GRi (t - u)du (12.88)

On the other hand, for a PM interval

H P ,Vi (t) = GPi (t - Ta ), t > Ta (12.89)

With these definitions, the length of the ith cycle is a mixture of the two pos-
sible forms. Thus,

H· ,Vi (t) = FTi (Ta )H R ,Vi (t) + FTi (Ta )H P ,Vi (t) (12.90)

and the resulting distribution on a series of intervals is

ì H· ,V1 (t) n=1


H· , Qn (t) = í (12.91)
î h· , Qn-1 (t) * H· ,Vn (t) n³2

However, for a series of cycles that ends before Ta, all intervals must be repair
intervals, so

ì H R ,V1 (t) n=1


H R , Qn (t) = í (12.92)
î hR , Qn-1 (t) * H R ,Vn (t) n³2
280 Reliability Engineering: Probabilistic Models and Maintenance Methods

Then, the quasi-renewal function is

ì ¥
ï å
ï n =1
H R , Qn (t) t £ Ta
QH· ,Q (t) = í ¥ (12.93)
ï
ï å
î n =1
H· , Qn (t) t > Ta

and the uptime-based availability function is

ì ¥ t
ï FT1 (t) +
ïï åò
n =1 0
hR , Qn (u)FTn+1 (t - u)du t £ Ta
A(t) = í t
(12.94)
¥
ï
ï åò
ïî n =1 t -Ta
h· , Qn (u)FTn+1 (t - u)du t > Ta

Making the substitutions for the life distribution and service time distribu-
tions and taking the Laplace transform yields

ì 1é ¥
æ öù
( ) å
n
ï ê 1 - fT*1 (s) +
ïï s ê
ç 1 - fT*1 (a n s)
ç
n =1 è
Õ fT*1 (a i -1s) g R*1 (biR-1s) ÷ ú
÷ú
øû
t £ Ta
A* (s) = í ë i =1
¥
ï
å ( )Õ( )
n
1
ï 1 - fT*1 (a n s) fT*1 (a i -1s) g R*1 (s) + FT1 (Ta )e - sTa g P*1 (s) t > Ta
ïî n =1 s i =1

(12.95)
The downtime-based availability function is
ì t
ò
ï1 - fT1 (u)GR1 (t - u)du
ï
ï 0
ï ¥ t u
ï
ï - åòò hR , Qn (w) fTn+1 (u - w)GR , n +1(t - u)dwdu t £ Ta
ï n = 1 0 0
ï Ta
ï1 - F (T ) f (u)G (t - u)du - F (T )G (t - T )
A(t) = í
ï
T1 a
0
òT1 , P R1 T1 a P1 a (12.96)

ï t u
ï ¥

ï -
ï n =1
å FTn+1 (Ta )
òò
0 u - Ta
h· , Qn (w) fT1 , P (u - w)GRn+1 (t - u)dwdu t > Ta
ï
ï ¥ t - Ta
ï -
ï
î n =
å
1
FTn+1 (T a )
ò
0
h· ,Qn (u)GPn+1 (t - Ta - u)du
Preventive Maintenance 281

and the Laplace transform for this function is

ì é ¥ æ ö n ù
ï1 ê1 -
å Õ
1
ç - 1÷ fT*1 (a i -1s) g R*1 (biR-1s)ú t £ Ta
ïs ê ç * n -1 ÷ úû
n =1 è g R1 (b R s) ø i =1
ï ë
ï1
( ) (
A* (s) = í é1 - fT*1 (s) 1 - g R*1 (s) - FT1 (Ta )e - sTa 1 - g P*1 (s) ù
ê
ïs ë
) ûú
ï ¥ t > Ta
ï-
ï å
î n =1
( ( ) (
h·*, Qn (s) fT*n+1 (s) 1 - g R*1 (bnR s) + FTn+1 (Ta )e - sTa 1 - g P*1 (bnp s) ))
(12.97)

Just as in the cases studied by Rehmert and Nachlas [78], truncating the infi-
nite sums and numerical inversion of the two transforms provides arbitrarily
tight bounds on the availability function. Intiyot shows this by application to
several application cases that are defined by the choice of life and service time
distributions. The solutions display the same behaviors as those illustrated
in Figures 11.5a,b and 11.6 with the single difference that the initial transient
behavior extends over a slightly longer time. Thus, the quasi-renewal model
permits the investigation of the benefits of age-based PM policies for devices
that are well represented by this model.

12.2.3 Models Based on the Kijima Model


As discussed in Chapter 10, Kijima [69] uses the concept of equivalent or
“virtual” age to represent the fact that a device is improved but not renewed
by maintenance. The basic expressions that represent this, (10.14) and (10.15),
may be represented in general by

An = qSn (12.98)

where
Sn is the actual device age at the time of the nth failure æç Tj ö÷
n

è j =1 ø å
An is the resulting virtual age just after the completion of the nth repair

As shown in Chapter 10, this construction implies that

FT (t + u) - FT (u)
FTn+1 (t) = Pr[Tn +1 £ t| An = u] =
FT (u)

The corresponding hazard function during the n + 1st operating interval is

zTn+1 (t) = zT ( An + t - Sn ) = zT (qSn + t - Sn ) = zT (t - (1 - q)Sn ) (12.99)


282 Reliability Engineering: Probabilistic Models and Maintenance Methods

and the corresponding hazard at any point in time is

zTNt +1 (t) = zT ( ANt + t - SNt ) = zT (qSNt + t - SNt ) = zT (t - (1 - q)SNt ) (12.100)

Using this notation, Kijima [69] defines the generalized renewal density as

mFA (t) = E[zTNt +1 (t)] = E[zT (t - (1 - q)SNt )] (12.101)

and shows that this generalized renewal density satisfies

t
fT ( y - (1 - q)x)
mFA (t) = fT (t) + mFA ( x)
ò
0
FT (qx)
dx (12.102)

Clearly, this form corresponds to the key renewal theorem.


Using these definitions, Makis and Jardine [86] examine a modified age
replacement policy in which failures prior to τa are treated with repairs
that improve the device in the sense described by Kijima, and the device is
replaced at the first failure to occur after age τa. The costs for the repairs are
assumed to be time dependent, so the cost per replacement cycle is

ta

C( t a ) =
c1 +
ta
ò0
c2 (u)mFA (u)du
(12.103)
EFT [T ] +
ò
0
EFA [Tn +1 | An = qu]mFA (u)du

where the denominator is the expected length of the replacement cycle. Note
that the form of the time-dependent repair cost may be selected as appro-
priate to an application. Makis and Jardine also show that as long as the
repair cost is bounded (c2(t) ≤ K) and EFA [Tn +1 | An = y] ³ e > 0 , the cost func-
tion of Equation 12.103 has a unique minimum that can be computed using
the derivative.
One final model is the one defined by Kijima et al. [70] for a block replace-
ment policy with repairs that improve the device age following failure. Since
it is assumed that the PM activity involves replacement that is perfect, the
model for this policy appears only slightly different from the basic block
replacement model. The model is
c2 MFA (tb ) + c1
C( t b ) = (12.104)
tb

Clearly, for appropriate choices of the cost parameters and life distribution,
this cost function will have a unique minimum. The degree of difficulty for
Preventive Maintenance 283

the calculation of the minimum depends mostly on the choice of life distri-
bution. However, Kijima et al. show that using the now familiar form of the
residual life distribution

FT (t + qx) - FT (qx)
FA (t|x) =
FT (qx)

the generalized renewal function can be approximated by

MFA (t) »
t
-
ò F (u)du
0
T

EFT [T ] EFT [T ]

æ ì t
üì t-x
üö
ç ïï
t
ò FA (u|x)du ï ï
ïï ò FA (u|t - x)du ï ÷
ï
ò
+ ç1- í
ç ï
0
ýí
0
¥ ý÷
FA (u|t - x)du ï ÷÷
ò
EFT [T ] ïï
0 ç
è îï ïþ ïî 0 ïþ ø

æ x
ö
ç
´ ç fT ( x) + x
0 ò
FT ( x) f A ( x - u|u)du ÷
÷ dx (12.105)
ç
ç
è 0 ò
FA ( x - u|u)du ÷÷
ø

Depending on the choice of life distribution, this approximation can be very


accurate and quite manageable numerically.

12.3 Conclusion
PM policies may be defined in several ways and may ultimately have very
many operational realizations. The models discussed in this chapter serve
to illustrate the many forms the models may take and the two principal
approaches to analysis—availability and cost. PM is an essential ingredient
in any productivity assurance strategy and for many devices, it is critical
to system safety. Given the significance of PM, methods for selecting effi-
cient and effective PM policies are important and are worth the (sometimes
taxing) effort they require. It should now be clear that the state of a unit fol-
lowing repair or replacement is the aspect of the equipment behavior that
guides the analysis of maintenance policies. It should also be appreciated
that numerical approximations to complicated functions will often yield
policy solutions that are quite satisfactory.
284 Reliability Engineering: Probabilistic Models and Maintenance Methods

Exercises
12.1 Assume a population of devices display life lengths that are well
modeled by a gamma distribution having parameters β = 3.0 and
λ = 0.005. Compute and compare the quantities E[N(t, τa)], E[N(t, τb)],
and E[N(t)] for t = 100, 200, and 400 hours and τa and τb values of
0.2E[T], 0.40E[T], and 0.75E[T].
12.2 Establish the stochastic ordering between a Weibull distribution and
an exponential distribution having the same mean value for the cases
of IFR and DFR.
12.3 Compute an optimal block replacement policy (under the model in
Equation 12.4) for a system having a normal life distribution with
parameters μ = 400 and σ = 50 and assuming the repair cost is 12
times the PM cost.
12.4 For the system of the previous problem, assume operating and repair
periods are quasi-renewal and compute the optimal block replace-
ment policy time (under the model in Equation 12.81) for the case in
which repairs are also normal quasi-renewal with EG[T] = 20, σG = 4,
and α = 0.975, β = 1.02.
12.5 Compute an optimal age replacement policy for a system having
Weibull life distribution with β = 2.75 and θ = 5000 hours and the
repair cost that is 20 times the PM cost.
12.6 Assume a population of devices display life lengths that are well
modeled by a gamma distribution having parameters β = 3.0 and
λ = 0.002. Assume further that the devices are managed using an
age replacement policy with τb = 750 hours, Gc(t) is exponential with
parameter λ c = 0.02, and Gp(t) is exponential with parameter λp = 0.05.
Use Equation 12.15 to compute A(t) for t = 900, 1200, and 1500 hours.
12.7 Assume a population of devices display life lengths that are well
modeled by a Weibull distribution having parameters β = 2.25 and
θ = 2000 hours. Assume further that the devices are managed using
an age replacement policy with τa = 750 hours, Gc(t) is exponential
with parameter λ = 0.02, and Gp(t) is exponential with parameter
λ = 0.05. Use Equation 12.14 to compute A(t) for t = 100, 400, and 500
hours and also then for t = 800, 1000, 1500, and 2000 hours.
12.8 Suppose that PM is imperfect in that a device is renewed with prob-
ability p and the hazard is unchanged by PM with probability 1 − p.
Assume also that a device failure implies device renewal so that the
cost rate function is Equation 12.76. Analyze this model as a function
of p for a Weibull life distribution having parameters β = 2.25 and θ =
2000 hours, a block replacement policy time of τb = 1500 hours, and c2
= 20c1. Plot the cost rate divided by c1 as a function of p.
Preventive Maintenance 285

12.9 Assume a population of devices display life lengths that are well
modeled by a Weibull distribution having parameters β = 2.25 and
θ = 2000 hours. Use the approximation for the generalized renewal
function of Equation 12.105 with π = 0.90 to compute the values of
MFA (t) for t = 500, 1000, 2000, 3000, 4000, and 5000. Then, compute
the value of cost rate function of the Kijima et al. model of Equation
12.104 for each of these times when c2 = 20c1.
12.10 Assume a population of devices display life lengths that are well
modeled by a gamma distribution having parameters β = 3.0 and
λ = 0.002. Use the approximation for the generalized renewal func-
tion of Equation 12.105 with π = 0.90 to compute the values of MFA (t)
for t = 500, 1000, 2000, 3000, 4000, and 5000. Then, recompute the
values for π = 0.75 and π = 0.50. 
13
Predictive Maintenance

Increased recognition of magnitude of the potential benefits of preventive


maintenance (PM) has stimulated an interest in finding new and more efficient
PM strategies. One of the most promising new ideas is predictive mainte-
nance (which is also sometimes called condition-based maintenance and is
sometimes referred to as “prognostics”). The idea of predictive maintenance
is that monitoring equipment status should permit the recognition of failure
precursors and the corresponding opportunity for preventive intervention.
Just as with nonrepairable devices, failure of a repairable item is usually
considered to correspond to the completion of some type of degradation pro-
cess. Models such as the ones discussed in Chapter 5 describe the evolution
of the relevant degradation processes. The conceptual basis of “predictive
maintenance” is that observation of the degradation process can provide
warning of impending failure or advanced degradation so that a PM action
may be performed and failure avoided.
The development of this idea has followed two principal tracks. One ave-
nue of study has been the investigation of how ordinary process control and
process status variables can be interpreted to provide warning of impend-
ing failure. For example, one study noted that a rotating cutting tool drew
increased quantities of electricity as the tool wore down and approached
failure. Another example was an observed increase in the width of the vibra-
tion spectrum for a pump as the pump diaphragm deteriorated. As these
examples suggest, the focus of this research path is the identification of pro-
cess variables that provide warnings of failure, the selection of an instru-
mentation plan that permits monitoring of the variable, and the development
of pattern recognition rules that have a high probability of recognizing
pathological patterns in the quantities being observed. This is an exciting
and interesting domain of study. The selection of instrumentation is distinct
from the probability analysis aspects of this method, so this question is not
treated here. Instead, some general analytical methods for the probability
component of this approach are treated later in this chapter.
The second primary research direction has been the use of reliability and
cost measures to formulate predictive maintenance strategies. Reliability
function – based monitoring plans, policies for inspection and replacement
schedules based both on hazard functions and on cost functions, and adjust-
ment plans for potentially fallible measurements have been addressed. We
shall examine some of the models constructed to portray the implications of
predictive maintenance.

287
288 Reliability Engineering: Probabilistic Models and Maintenance Methods

As with all of the repairable systems analyses we have examined,


models that represent predictive maintenance plans differ depending on
the assumed state of the system following maintenance. They also differ in
the ways in which system deterioration is portrayed. The feature they tend to
have in common is the deterioration process and the assumption that there is
a measurable variable that represents or is highly correlated with the extent
of deterioration. Thus, we start with the representation of system evolution
and then move to the analysis of predictive maintenance policies.

13.1 System Deterioration


We have examined models of component and system failure in Chapter 5.
We noted there that the operation of a device usually implies the gradual
degradation of the unit and its ultimate failure. Equipment for which this is
a reasonable conceptual description are exactly the ones for which predic-
tive maintenance is appropriate. Recognizing this, Van Noortwijk et al. [87]
argued that an isotropic deterioration model is appropriate.
The term isotropic is used to represent the idea that between any two
points in time, the deterioration depends on the state of the device at the
start of the interval but not on the deterioration prior to the start of the inter-
val. Further, during the interval, deterioration is assumed to proceed in the
form of infinitely many very small steps as described by Feller [64] and by
Nachlas [35] with the consequence that deterioration is best modeled using
the generalized gamma process.
Algebraically, we may represent the condition of the device by the state
variable X(t) for which increments during any interval are random. The mag-
nitude of any increment has a gamma distribution with a shape parameter
that is proportional to the length of the interval. That is,

lb(t2 -t1 )
f X (t2 )- X (t1 ) ( x) = xb(t2 -t1 )-1e - lx (13.1)
G(b(t2 - t1 ))

Using this model, the average deterioration during the interval is β(t2−t1)/λ
and the variance is β(t2−t1)/λ2. Thus, there is considerable flexibility in the
model so many types of deterioration can be represented this way. Van
Noortwijk et al. [87] argue that this model provides an accurate representa-
tion of cumulative damage processes, erosion and corrosion, defect-based
degradation, and most other evolutionary deterioration behaviors. A graphi-
cal representation of this type of process is shown in Figure 13.1. Note that
the assumption of independent increments implies that the process is semi-
regenerative and the times T1, T2, and T3 are semiregeneration points.
Predictive Maintenance 289

X(t)

T1 T2 T3
Time

FIGURE 13.1
Representation of generalized gamma deterioration process.

13.2 Inspection Scheduling


The simplest of the predictive maintenance analyses is the definition of a
model for selecting inspection schedules. The idea is that a system is gradu-
ally deteriorating, but the state of the system may only be determined by
inspection, even if it has failed. The problem is to select an inspection sched-
ule and a replacement criterion that will permit us to monitor the state of the
system and to renew it prior to failure.
This elementary condition-based replacement strategy for preventive main-
tenance (PM) is defined like an age replacement policy. Device status is deter-
mined by inspections that are equally spaced in time at intervals of length τ.
Replacement is performed whenever inspection shows device deterioration
to be beyond a PM criterion, say, θ and below a failure threshold, say, Xf and
also following failure. This is illustrated in Figure 13.2. For equally spaced
inspections and a gamma deterioration process, the mean deterioration per
inspection interval is a linear function of the length of the interval and is βτ/λ,
so the average number of intervals until the PM threshold is crossed is

q ql (13.2)
=
bt/l bt

and the number of inspections required to discover that the threshold has
been passed is one greater than that number. Assuming an inspection cost of
cI and PM and corrective replacement costs of c1 and c2 as in Chapter 12, the
cost per unit time of an inspection policy τ is

cI ( ql/bt + 1) + c1FX (X f - q) - c2 FX (X f - q)
C(t) = (13.3)
( ql/bt + 1) t
290 Reliability Engineering: Probabilistic Models and Maintenance Methods

Xf

X(t)

τ 2τ 3τ 4τ
Time

FIGURE 13.2
Inspection intervals with PM criterion.

where FX(Xf − θ) is the probability that the deterioration in the final inspec-
tion interval does not exceed the difference between the failure threshold
and the PM threshold.
Clearly, the simple model of Equation 13.3 is very similar to the age replace-
ment model. One important difference is that analysis of the model permits
optimization of both the inspection interval and the PM criterion. A short-
coming of the model is the restriction to equally spaced inspection intervals.
A key feature of condition-based maintenance is sensitivity to device state
and a corresponding capability to adjust inspection frequency.

13.3 More Complete Policy Analysis


Recently, Grall et al. [88] defined a model that allows for variable inspection
intervals and can be optimized with respect to the replacement threshold. As
in the model earlier, they assume that there is a failure threshold, Xf, that is
fixed by the physical nature of the device. The device is replaced correctively
if its state variable passes that threshold and preventively if any inspection
shows device state to equal or exceed the PM threshold, θ. In addition, Grall,
Dieulle, Berenguer, and Roussignol assume that following each inspection,
the time of the next inspection is selected as a function of device state.
The analysis of a device that is managed adaptively and replaced as
described can be based on cost per renewal cycle. However, this can be quite
complicated so instead, Grall et al. [88] based their analysis on the limiting
expected cost rate. That is, they formulate the basic cost function as

C(t) = cI N I (t) + cP N P (t) + cC NC (t) + cdD(t) (13.4)


Predictive Maintenance 291

and then construct

é E[C(t)] ù
E[C¥ ] = lim ê
t ®¥ ë t úû

é E[N I (t)] ù é E[N P (t)] ù é E[NC (t)] ù é E[D(t)] ù


= cI lim ê úû + cP lim úû + cC lim úû + cd lim
t ®¥ ë t t ®¥ ê
ë t t ®¥ ê
ë t t ®¥ ê
ë t úû
(13.5)

where
D(t) is the quantity of device downtime over (0, t)
cd is the unit cost of downtime
NI(t) is the number of inspections
NP(t) is the number of PM replacements
NC(t) is the number of corrective (failure precipitated) replacements
over (0, t)

For mathematical convenience, they assume a minimum inspection inter-


val and then further assume an inspection scheduling decision policy that
the time to the next inspection is a linearly decreasing function of system
state. That is, following the nth inspection at which the system state is deter-
mined to be X(τn), the next inspection time is selected as

ì æ t öü
tn +1 = tn + 1 + max í0, ç t0 - 0 X(tn ) ÷ ý (13.6)
î è q - e øþ

Using this form, the minimum inspection interval is one time unit, and since
X(τ0) = 0, the maximum interval is 1 + τ0 time units. The minimum interval
is used whenever the device state is within ε units of the PM threshold, and
over time, inspection interval length declines from τ0 to 1.
As a matter of notational convenience, let In(X(τn)) represent the time inter-
val until the next inspection that is selected using Equation 13.6. That is,

I n (X(tn )) = tn +1 - tn

The gamma distribution that describes the independent increments in the


state variable over any interval, as defined in Equation 13.1, may now be spe-
cifically stated as

lbIn ( X ( tn ))
f X ( I ) ( x) = f X ( tn+1 )- X ( tn ) ( x) = xbIn ( X ( tn ))-1e - lx (13.7)
G(bI n (X(tn )))

which is to say that the shape parameter of the distribution depends upon
the selected length of the inspection interval.
292 Reliability Engineering: Probabilistic Models and Maintenance Methods

Given the definition of the replacement, inspection, and deterioration


processes, the device state, X(t), is a semiregenerative process with regeneration
times equal to the times of device replacement. In addition, the discrete time
process, Yn, corresponding to the device state after each inspection is defined by

Yn = X(tn )

and takes real values in [0, θ). The evolution of the variable Yn is a continuous
state space Markov chain. If we let Π(Y) represent the stationary distribution
on the state variable Yn, then the limits in Equation 13.5 can be shown [80]
to equal the corresponding first interval expectations with respect to Π(Y).
That is,

é E [N (t )] ù é E [N (t )] ù é E [N (t )] ù
E[C¥ ] = cI ê P I 1 ú + cP ê P P 1 ú + cC ê P C 1 ú
ë EP [ t1 ] û ë EP [ t1 ] û ë EP [t1 ] û
é E [D(t1 )] ù
+ cd ê P ú (13.8)
ë EP [t1 ] û

Thus, the first step in analyzing the PM strategy is the construction of Π(Y).
For the gamma deterioration process described by the distribution of (13.7),
the stationary distribution is obtained as the solution to
q

P(Y ) =
ò (F
0
X(I ) (q - x)d0 ( y ) + f X ( I ) ( y - x) P( x)dx ) (13.9)

in which δ 0(y) is the Dirac mass function. The solution of this equation takes
the form of the convex combination:

P( y ) = ad0 ( y ) + (1 - a) g( y ) (13.10)

where the parameter a (0 < a < 1) must be computed. To compute a, we start


with the convolutions:
y

òf
( k -1)
f (k )
X(I ) (y) = X ( tk -1 ) - X ( t0 ) ( y - x) f X ( tk )- X ( tk -1 ) ( x)dx (13.11)
0

Keep in mind that since the underlying distribution is a gamma distribution,


so is the convolution. Then, we solve the expression
y
æ ¥
ö
y( y ) = f X ( I ) ( y - 0) + ç
ç
0 è
òå k =1
f X( k()I ) ( y - x) ÷ f X ( I ) ( x)dx
÷
ø
(13.12)
Predictive Maintenance 293

and use the result to compute a as


-1
æ q
ö
ç
ç
è 0
ò
a = 1 + y( y )dy ÷
÷
ø
(13.13)

Once the value of a is determined, the function g(x) in Equation 13.10 is com-
puted as
a
g( x ) = y( x) (13.14)
1- a

and the stationary distribution is then well defined.


Once the stationary distribution of the Markov process is obtained, the
components of the limiting expected cost rate can be calculated. Specifically,
q

EP [N P (t1 )] =
ò (F
0
X(I ) )
(q - x) - FX ( I ) (X f - x) P( x)dx (13.15)

ò
EP [NC (t1 )] = FX ( I ) (X f - x)P( x)dx
0
(13.16)

æ I(x)
q ö
ç
0 è 0
ò ò
EP [D(t1 )] = ç FX ( I ( u )) (X f - u)du ÷ P( x)dx
÷
ø
(13.17)

and for a semigeneration interval corresponding to the time between inspec-


tions, EΠ[NI(τ1)] = 1. Finally, the average first cycle length is
q

ò
EP [t1 ] = I ( x)P( x)dx
0
(13.18)

The evaluation of these expressions appears to require considerable numeri-


cal effort, but actually, Grall et  al. [88] show that the computational effort
is quite manageable. Taking θ = 6 to be fixed and Xf = 12 with costs of cI = 25,
c1 = 50, c2 = 100, and cd = 250 and the distribution parameters of β = λ = 1.0, they
compute the optimal values of τ0 = 6.5 and ε = −3, which define the inspec-
tion policy. (The negative value of ε means the minimum feasible interval
duration is not used.) In a separate analysis, they assume Xf = 60 with costs
of cI = 2, c1 = 90, c2 = 100, and cd = 100 and distribution parameters of β = 1.0
and λ = 0.2 and compute the optimal policy parameters to be θ = 50, τ0 = 5.4,
and ε = 5. In general, they illustrate the fact that the numerical integration
necessary to evaluate the model can be accomplished efficiently.
294 Reliability Engineering: Probabilistic Models and Maintenance Methods

13.4 Models and Analysis Based on


Continuous Process Monitoring
Models that are based on continuous monitoring of device performance or
behavior are described here. For each of these models, it is assumed that the
underlying degradation process is well represented by the diffusion process
model described in Chapter 5 and specifically in Equation 5.23. Under this
model framework, failure occurs when the device state variable X(t) passes
a threshold value, say, Xf. For a process is described in this manner, the gen-
eral result is that the first passage time to the failure state has an inverse
Gaussian distribution [89].
The predictive maintenance concept leads to two classes of analysis. In
some cases, the extent of the deterioration can be directly observed. Examples
of this situation are crack length in a fatigue process and tread depth in tire
wear. More often, the extent of device deterioration is not directly observ-
able. The extent of oxidation in gallium arsenide – based visual displays and
the degree of metal depletion in the electromigration process in integrated
circuits are examples of deterioration that cannot be observed directly. For
cases in which deterioration is observable, the definition of warning strate-
gies is based on the direct observations. For cases in which degradation is
not observable, some sort of surrogate indicator measure is usually identi-
fied. The surrogate measure is often called a “marker” variable. The extent to
which the marker is correlated with the actual degradation process may be
unclear, but it is usually the only available source of information concerning
the state of a device.

13.4.1 Observable Degradation Processes


When the degradation process is observable, it is often possible to use reli-
ability testing data to provide a life distribution model for a device and
to use the corresponding residual life distribution as a basis for failure
warnings. Doksum and Hoyland [90] perform this analysis using accel-
erated life test data obtained from step stress experiments. The general
analytical approach can be defined without specifying the test regime, so
this is done here.
Assuming a degradation process is described by the progress to a failure
threshold, say, ω, by a diffusion process with drift η and diffusion coefficient
δ2, the distribution on life length will be an inverse Gaussian distribution [89]
for which the density function is

l ( t - m )2
l - 2tm 2
fT (t) = e (13.19)
2pt 3
Predictive Maintenance 295

and the distribution function is

æ læt ö ö ( 2l/m ) æ l æ t öö
FT (t) = F çç ç - 1 ÷ ÷÷ + e F çç - ç - 1 ÷ ÷÷ (13.20)
è t è m øø è t è m øø

where
μ = ω/η
λ = ω2/δ2

For a complete data set from a reliability test, the observed failure times
(t1,t2,…,tn) can be used to construct the likelihood function:

n n æ l ( t - m )2
- i ö
l
Õ Õ ç ÷
2
L( t , l , m) = fT (ti ) = e 2ti m
ç 2pti3 ÷
i =1 i =1
è ø
n
l ( ti - m )2
æ l ö
n/2
æ n
ö -å
=ç ÷
è 2p ø
ç
ç
è
Õt
i =1
-3/2
i ÷ e i=1
÷
ø
2ti m 2
(13.21)

for which the logarithm is

n n
l (ti - m)2
å å
n 3
ln L( t , l , m) = ( ln l - ln 2 - ln p ) - ln ti - (13.22)
2 2 i =1
2m 2 i =1
ti

Equating the partial derivatives of the logarithm of the likelihood function


to zero yields the following estimation equations:

mˆ 2
lˆ = (13.23)
n (t - m ˆ )2
å
1 i
n i =1 ti

and

n n
(ti - mˆ ) (ti - mˆ )2
å å
1
+ =0 (13.24)
i =1
ti mˆ i =1
ti

Once the historical data have been used to obtain estimates for the life
distribution parameters, there are two reasonable approaches to using device
monitoring measurements to evaluate failure risk.
296 Reliability Engineering: Probabilistic Models and Maintenance Methods

One approach is to use the residual life distribution to quantify failure


risk. If the degradation measure at time τ is X(τ), then the distribution on the
residual life length is

FT (t + t)
FT|t (t|t) = 1 - (13.25)
FT (t)

so a decision rule for initiating maintenance should have the form that PM
is done when FT|τ(t|τ) exceeds a selected value for an appropriately chosen
value of time. The choice of the probability threshold and the time interval
can be based on cost analyses or any other meaningful criteria.
An alternate approach that may be more precise is to take the degradation
measure as the starting point for the diffusion process and to consider that
the distance to the threshold is revised to be ω – X(τ). Consequently, the dis-
tribution on time to failure is adjusted to be

æ l¢ æ t ö ö -( 2l¢/m¢) æ l¢ æ t öö
FT|t (t) = F çç ç ¢ - 1 ÷ ÷÷ + e F çç - ç ¢ - 1 ÷ ÷÷ (13.26)
è t èm øø è t èm øø

where
μ′ = (ω – X(τ))/η
λ′ = (ω – X(τ))2/δ2

Here again, the decision criterion for initiating PM is a threshold value on


the failure probability.
Consider an example in which the underlying degradation process for a
device is believed to be a Wiener process with drift parameter η and diffusion
coefficient δ2. The failure threshold is ω = 48.0. Suppose a sample of 17 copies
of the device were subjected to a life test with the failure times in days being
those listed in Table 13.1. Application of the maximum likelihood equations
to these data yields the estimates mˆ = 1034.5 and lˆ = 14465.0. Using these esti-
mates, we may compute estimated values for η and δ of 0.0464 and  0.399,
respectively.
Next, suppose a copy of the device has been monitored while in ser-
vice and its most recent degradation measurement is x(t) = 44.0. Then, for

TABLE 13.1
Times to Reach an Example Failure Threshold (in Days)
1460 1193 797 1169
1385 1006 1125 1450
660 869 775 728
999 1035 1424 670
841
Predictive Maintenance 297

1.0

0.8
Reliability residual

0.6

0.4

0.2

100 200 300 400


Time

FIGURE 13.3
Residual reliability over time for three state variable measures.

Equation  13.26, the probability of failure within 100  days of additional


operation is 0.606, while at x(t) = 46.5, that probability is increased to 0.874.
Figure 13.3 shows the corresponding residual reliability curves for values of
x(t) of 40.0, 44.0, and 47.0. The definition of a PM policy using curves of this
type would be very reasonable and efficient.
Arguably, an ability to directly observe a degradation measure should
permit the definition of very efficient maintenance plans. Doing this hinges
on being able to estimate the life distribution. The earlier procedure based
on the diffusion process is one approach to obtaining descriptors of the life
distribution and using them to plan maintenance. An alternate approach is
to use a regression model of the degradation process as a function device
operating variables. This is the approach taken by Lu and Meeker [91]. The
advantage of the method is that aside from its compatibility with the data
regression analysis, no assumption concerning the nature of the degrada-
tion process is required. The disadvantage is that fairly demanding statistical
analysis is required and the inverse link to the residual life distribution is
very challenging.

13.4.2 Unobservable Degradation Processes


When the state variable that determines device failure is not observable
so that a marker variable is observed and used to estimate the remain-
ing device life length, two general types of analysis seem to be the most
widely used. One is based on time series analysis and the other is based
on conditional probability constructions. Both of these methods are
described here.
298 Reliability Engineering: Probabilistic Models and Maintenance Methods

13.4.2.1 Time Series Methods


It is suggested by Lu et  al. [92] that the progress of a device degradation
process will imply a corresponding change in an associated performance
measure that can be observed. Examples of the performance measure might
be the quality of product generated by the device, the energy or force output
by the device, or the efficiency of its operation. Thus, a sequence of observa-
tions of the performance measure may be treated as a time series. The series
may be subjected to second-order exponential smoothing in order to con-
struct a predictive model.
Suppose that at a sequence of equally spaced points in time, a performance
variable, {Yt}, that indirectly reflects device status is measured relative to a
performance criterion C that indicates device failure. Then, the evolution of
the performance measure may be subjected to a “trend analysis” as

st = ayt + (1 - a)(st -1 + bt -1 ) (13.27)

bt = b(st - st -1 ) + (1 - b)bt -1 (13.28)

First-order exponential smoothing excludes the series of coefficients bt


and thus smooths only the data. Second-order smoothing that includes
the series bt provides smoothing of the trend in the data series. α is called the
data smoothing factor and β is called the trend smoothing factor. The values of
both factors should be in the interval (0, 1), but aside from that restriction, the
choices for their values may be made in several ways. An effective approach
to selecting the two factors is to choose values that minimize the sum of the
squared values of errors in a one-step-ahead prediction. In general, future
values of the series, ŷt + r , are computed as

ŷt + r = st + rbt (13.29)

so choosing α and β to minimize


T T

å
t =1
( yˆ t +1 - yt +1 )2 = å (s + b - y
t =1
t t t +1 )2

over an observation interval (0, T) can be difficult but is reasonable.


It is also necessary to select initial values for the data series. Usually, the
initial value for the data series is s0 = y1 so that s1 = y1 and (yn – y0)/n for an
arbitrary choice of n is considered a reasonable choice for b0. If a sufficient
volume of data is accumulated, part of the data series may be used to select
the smoothing factors and initial values.
Note that once the time series model is defined, a data series permits predict-
ing future data values and these predictions can be used to set a maintenance
Predictive Maintenance 299

TABLE 13.2
Sequence of Observed Fraction Defective in Sampled Output
0.0027 0.0159 0.0306
0.0061 0.0197 0.0320
0.0088 0.0216 0.0348
0.0125 0.0252 0.0359
0.0144 0.0272

policy. This is often done by initiating PM when a predicted value exceeds a


defined performance-based failure limit, which is denoted earlier as C.
Consider an example of a cutting tool for which evolving wear results in
an increasing fraction of the machined pieces it generates being noncon-
forming with respect to dimensional tolerance. If output pieces are sampled
and inspected at regular intervals, the fraction nonconforming observed
in the samples over time forms a data series. Suppose the observations are
those in Table 13.2 and the tool is considered to have failed when the fraction
nonconforming exceeds 0.04.
The values of α = 0.018 and β = 0.027 were selected as were s0 = 0.0027 and
b0 = 0.0024. Using these values, the resulting values of st and bt are shown in
Table 13.3. Clearly, the computed values of the series st show the gradual per-
formance decline that is resulting from tool degradation. The best use of the
computed values is to note that at the 9th time point s9 + 3b9 > C, at the 10th
time point s10 + 3b10 > C, at the 11th time point s11 + 2b11 > C, and at the 12th
time point s12 + 3b12 > C. Thus, the series could have been used to choose any
of the several PM responses to the process changes.

TABLE 13.3
Computed Time Series Values
t bt st
1 0.0024 0.0052
2 0.0025 0.0079
3 0.0026 0.0108
4 0.0040 0.0135
5 0.0039 0.0117
6 0.0038 0.0209
7 0.0037 0.0242
8 0.0036 0.0274
9 0.0034 0.0303
10 0.0032 0.0332
11 0.0030 0.0356
12 0.0037 0.0380
13 0.0034 0.0406
300 Reliability Engineering: Probabilistic Models and Maintenance Methods

13.4.2.2 Conditional Probability Methods


Return to the idea that the degradation process is represented by the
unobserved state variable X(t) and that a correlated marker variable Y(t) is
measured periodically. As discussed by Whitmore et al. [93], if the state vari-
able evolution is well described by a Wiener process, then the vector (X(t),Y(t))
has a bivariate normal distribution for which the covariance matrix should
be positive definite.
Suppose a sample of copies of a device are placed in a life test for a
specified time, T, and the resulting observations of the marker variable,
Y(t), include both failed and censored readings. The failures correspond
to cases in which the unobserved state variable X(t) reaches its failure
threshold ω during the test interval. Whitmore et al. [93] point out that the
conditional density on the marker variable given the state variable differs
for surviving and failed copies of the device. For the surviving units, that
density is

æ æ rs y öö
ç y - ç my + (x - mx ) ÷ ÷
è sx ø÷
g1( y |x) = Pr[Y(t) = y |X(t) = x , S > T ] = f ç (13.30)
ç s y (1 - r2 ) ÷
ç ÷
è ø

where S represents the failure time. For devices that fail during the test, the
conditional density on Y(t) is

g 2 ( y |x) = Pr[Y(t) = y |S = s] (13.31)

They further state that the density on the value of the state variable for sur-
viving units is

æ ( x - m x T )2 ö
-ç ÷
ç 2 s2 T
è xæ -ç
æ 2 w( w- x ) ö ö
÷
ø ÷
e ç 1 - e çè s2x T ÷ø ÷
hX ( x) = Pr[X(T ) = x , S > T ] = (13.32)
2ps xT çç
2 ÷÷
è ø

where necessarily x < ω. Next, the failure time for the Wiener process is the
inverse Gaussian so

æ ( w- m x s )2 ö
-ç ÷
ç 2 s2 s ÷
we è x ø
hS (s) = Pr[S = s] = (13.33)
2ps2x s3
Predictive Maintenance 301

Using these definitions, Whitmore et  al. [93] give the joint density on the
marker variable and the chance of survival as
w

hY, survivor ( y ) = Pr[Y(T ) = y, S > T ] =


ò g (y|x)h (x)dx

1 X

æ æ s2xy öö
ç w - çç m xT + 2 ( y - m y ) ÷÷ ÷ æ ö -æ 2wmx ö
ç è sy ø ÷ f ç y - mY T ÷ - e çè sx ÷øF
= Fç ÷ ç
s2x (1 - r2 ) ÷
÷ è s yT ø
2
ç
ç ÷
è ø
æ æ s xy
2
öö æ æ s2xy ö ö
ç w - çç m xT + 2 ( y - m y ) + 2w(1 - r2 ) ÷÷ ÷ ç y - (mY T + 2w ç 2 ÷ ÷
ç è sy ø ÷ fç è sx ø ÷
´ç ÷ ç ÷
ç s x (1 - r )
2 2
÷ ç s yT
2
÷
ç ÷ ç ÷
è ø è ø
(13.34)

and the complementary joint density on the marker variable and failure
time as
1
- M ¢S -1 M
we 2
hY , failure,S ( y, s) = Pr[Y(s) = y, S = s] = g 2 ( y |x)hS (s) = (13.35)
2p |S|1/2 s2

where Σ is the covariance matrix and the transpose of the vector M is

é y - mys w - mxs ù
M¢ = ê ,
ë s s úû

Using the defined probability functions, the failure and survival probabili-
ties over time are
t t ¥

ò
FT (t) = hS (s)ds =
0
òòh
0 -¥
Y, failure, S ( y, s)dyds (13.36)

w ¥

FT (t) =
ò h (x)dx = ò h

X

Y, survivor ( y )dy (13.37)

The defined probability structure can be used in a maximum likelihood


analysis of observed life test data to obtain estimates of the parameters of
the model functions. If a set of n copies of a device are placed on test and m
302 Reliability Engineering: Probabilistic Models and Maintenance Methods

of the units fail by the termination of the test at time T, then the data observa-
tions will be (yi, si) for the items that fail and (yi, T) for those that survive. The
general form of the logarithm of the likelihood function is
m n

(
ln L m , S = ) å
i =1
ln ( hY, failure,S ( yi , si ) ) + å ln ( h
i = m +1
Y, survivor ( yi )) (13.38)

In general, computation of the parameter estimates must be done numeri-


cally. However, the least difficult case arises when the data form a complete
sample in that all units fail during the test. In that case, Whitmore et al. [93]
give the estimating equations to be

å
n
yi nw
mˆ y = i =1
mˆ x = (13.39)
å å
n n
si si
i =1 i =1

and
n n
(w - mˆ x si )2 ( yi - mˆ y si )2
å å
1 1
sˆ 2x = sˆ 2y =
n i =1
si n i =1
si
n
( yi - mˆ y si )(w - mˆ x si )
å
1
sˆ 2xy = (13.40)
n i =1
si

This is possible because the second summation in the log-likelihood equa-


tion is dropped. When the data set is censored, computation of parameter
estimates is best performed by numerical optimization of the log-likelihood
function.
Once the parameter estimates are obtained, the distribution on the remain-
ing life may be analyzed relative to any PM policy. If, at time τ, the marker
variable is observed to have value y(τ), then the conditional density on the
level of degradation is
g1( y |x)hX ( x)
f X|Y ( x| y ) = Pr[X(t) = x|Y(t) = y , S > t] = (13.41)
hY, survivor ( y )

where the densities are evaluated at τ rather than T and the time from τ to
failure for any given value of the degradation is the inverse Gaussian density
for s – τ when the distance to the threshold is ω – X(τ). Therefore, the density on
remaining time to failure when the marker variable value y(τ) is observed is

æ ˆ w
-
ˆl ( s - m
ˆ )2
ö
fS|Y (s| y ) = ç l 3 e 2 sm
ò ÷ f X|Y ( x| y )dx
ˆ 2
(13.42)
ç 2ps ÷
-¥ è ø
Predictive Maintenance 303

TABLE 13.4
Simulated Failure Times and Marker Variable Values
si yi si yi si yi
652 52.155 671 53.216 666 53.056
658 52.228 657 53.056 660 53.335
663 52.459 674 53.221 656 52.561
654 52.730 671 53.060 664 52.027
657 51.927 671 52.628 656 52.986
656 53.108 671 53.014 665 52.706
660 52.749 658 53.021 651 52.066
675 53.090 652 52.729 643 53.152
648 52.526

This density must be evaluated numerically, but once this is done, a PM


policy of initiating maintenance when FS|Y(s|y) exceeds a policy value or
when FS|Y (s| y ) falls below a policy value should prove very efficient.
Consider an example. The failure data listed in Table 13.4 were obtained
by simulating a degradation process for which the per time step parameter
values are μ x = 0.035, σx = 0.012, μy = 0.08, σy = 0.025, ρ = 0.6, and ω = 23 and
the time unit is arbitrary but may be considered to be hours. Since, the data
form a complete sample, the estimation equations may be used directly
and yield estimates of mˆ x = 0.0348, mˆ y = 0.0799, sˆ x = 0.0113, sˆ y = 0.0258, and
rˆ = 0.811.
An alternate simulated example is based on the same assumed param-
eter values with the addition of a censoring time of 660 hours. The result-
ing data observations are shown in Table 13.5. Rather than attempting the
construction or computation of partial derivatives, the calculation of the
parameter estimates is performed using a heuristic optimization routine.

TABLE 13.5
Simulated Marker Variable Values and Censored Failure Times
si yi si yi si yi
657 52.244 657 52.910 649 52.369
Censored 52.345 Censored 51.711 656 53.230
654 52.510 Censored 52.617 659 52.679
646 51.944 656 52.372 Censored 52.820
657 53.089 Censored 52.459 660 52.787
654 52.764 659 51.784 647 52.267
647 51.667 Censored 53.548 Censored 51.955
660 51.925 656 52.445 632 51.543
655 51.067
304 Reliability Engineering: Probabilistic Models and Maintenance Methods

0.0070

0.0065

0.0060

0.0055
f (s|y)

0.0050

0.0045

0.0040

0.0035
700 800 900 1000 1100 1200 1300
Time

FIGURE 13.4
Residual life density for the simulated marker variable.

The resulting estimates (which may correspond to a local optimum) are


mˆ x = 0.0356, mˆ y = 0.083, sˆ x = 0.0104, sˆ y = 0.0235, and rˆ = 0.564.
Now, suppose a copy of the device to which the estimates from the
complete sample apply is observed to have Y(640) = 51.5. The principal
section of the residual life density f S|Y(s|y) based on these values is shown
in Figure 13.4. The corresponding distribution has FS|Y (700| y ) = 0.844,
FS|Y (750| y ) = 0.594, FS|Y (800| y ) = 0.287 , and FS|Y (825| y ) = 0.123. By numerical
search, FS|Y (821| y ) = 0.150 so one reasonable policy would be to schedule
PM for 821 hours, which is 821 – 640 = 181 hours beyond the observa-
tion time. Other similar rules are equally appropriate, but the key is that
observations of the marker variable can be used to select efficient main-
tenance policies.

13.5 Conclusion
The concept of predictive maintenance is very appealing. Predictive
maintenance holds the promise of significant maintenance cost reductions
and associated increases in productivity. The definition of alternate policy
formats and the formulation and analysis of models for selecting predictive
maintenance policies are embryonic and represent one of the important new
frontiers in maintenance planning research.
Predictive Maintenance 305

Exercises
13.1 Assume that the gamma process of Equation 13.1 has parameters β = 3.0
and λ = 0.5 and simulate a sample path of the process over time with
inspections every 4 time units. Plot the observed values of X(t) relative
to a failure PM threshold of θ = 100.
13.2 Assume a gamma deterioration process has parameters β = 2.0 and
λ = 0.4 and cost parameters of c1 = 1 and c2 = 22 with failure threshold
Xf = 75.0 and PM criterion θ = 70.0. Solve the model of Equation 13.3 to
determine the optimal inspection schedule.
13.3 Assume a device has a gamma deterioration process that has param-
eters β = 2.0 and λ = 1.2; failure and PM thresholds of Xf = 45 and θ = 40,
respectively; and an inspection schedule that is defined by τ0 = 5 and
τn+1 = τ0 – 0.1X(τn). Use Equations 13.9 through 13.14 to determine the
stationary density on the state of the device.
13.4 For the device described in Problem 13.3, assume cost parameters of
cI = 1, cP = 10, cC = 125, and cd = 250 and calculate the limiting expected
cost rate for predictive maintenance of the device.
13.5 Suppose a sample of 40 copies of a device having failure threshold
ω = 20.0 had previously subjected to a life test and that the failure time
data obtained are

31.29 10.479 0.757 9.31


1.496 0.479 0.256 13.566
7.477 2.331 0.482 11.636
1.919 3.552 0.511 0.596
0.88 4.819 6.726 0.218
4.826 1.645 4.315 11.93
0.722 0.974 34.54 0.649
0.833 0.304 0.841 1.49
88.938 0.351 0.553 0.818
2.8765 4.098 11.84 3.522

Suppose further that observation of the degradation of an operating


copy of the device has shown that X(8.45) = 16.2. Compute estimates of
the parameters of the diffusion process model of device degradation
and use them to compute the residual life probability for the device
being observed.
306 Reliability Engineering: Probabilistic Models and Maintenance Methods

13.6 Suppose that a performance (or marker) variable for a device has
been monitored at regular time intervals of one week over a period of
24 weeks and that the observed data are

31.841 35.141 37.300


32.289 36.017 37.341
33.267 36.160 37.973
33.613 36.852 37.972
34.184 36.914 38.217
34.224 36.995 38.299
34.428 37.158 39.317
34.530 37.260 40.010

Use the time series methods of Equations 13.27 through 13.29


to construct the time series model for the data and then use the model
to predict future observations relative to a PM criterion of C = 45.
13.7 A life test of a sample of 25 copies of a device has produced the follow-
ing observations of the failure time and the corresponding values of the
marker variable. The failure criterion was ω = 24.0. Use the maximum
likelihood estimation Equations 13.39 and 13.40 to compute estimates
of the life distribution parameters.

si yi si yi si yi
449 53.486 446 53.256 453 54.533
444 51.894 442 53.39 455 53.613
451 53.537 443 52.47 452 52.836
442 53.626 453 52.926 453 53.3
437 53.313 442 52.821 443 53.231
454 53.043 439 52.312 439 52.761
441 52.464 455 54.051 438 52.902
441 52.914 447 55.008 449 54.444
444 53.352
14
Special Topics

To conclude our study of reliability and maintenance planning models, we


should examine some special topics that build upon the topics treated in
this text. These are not the only extensions to the material in this text, but
they are reasonably general and relatively close to the preceding discussions.
The  first of these extensions is the development of statistical methods that
can be applied to failure data for repairable systems. These methods permit
us to estimate the parameters of models that describe the failure patterns
despite the fact that they correspond to stochastic processes rather than life
distributions. The second of the topics is warranties. As much of the study of
reliability is motivated by a wish to make products and systems durable and
thus market competitive, warranties constitute a component of this effort.
Following a discussion of warranty policies, the idea of reliability growth is
examined. This is another aspect of assuring reliable product designs. Finally,
we consider reliability models for dependent components and we explore the
construction and use of bivariate and multivariate reliability models.

14.1 Statistical Analysis of Repairable System Data


14.1.1 Data from a Single System
The discussions of maintenance plans for repairable systems serve to dis-
tinguish the random processes of operation to failure and the stochastic
processes of function and repair. The elementary process of operation to
failure is represented by the life distribution, and the discussions in Chapters
7 and 8 address the statistical methods that are applied to data to support the
choice and use of various life distribution models. For repairable systems
that are managed using maintenance policies, a similar but different set of
statistical methods is appropriate.
For a device that is operated and repaired, the stochastic process
corresponding to the series of failure times (operating intervals) may be
homogeneous but, more often, is nonhomogeneous. In either case, the failure
intensity function is a useful nonparametric descriptor of device behavior.
A simple estimator for the failure intensity function is obtained by partitioning

307
308 Reliability Engineering: Probabilistic Models and Maintenance Methods

an interval of device operation into subintervals within each of which the


number of observed failures is denoted by Ni. Then, the failure intensity func-
tion estimate is

ì N1
ït 0 £ t < t1
ï 1
ï N2 t1 £ t < t2
ï
l(t) = í (t2 - t1 ) (14.1)
ï ⋮ ⋮
ï
ï Nm
tm -1 £ t < tm
ï ( tm - tm -1 )
î

where the m values τi are not necessarily of equal length but together define
the partition of the observation interval.
If the times τi are selected because they are failure times, then the Laplace
test [52] may be used to determine whether or not the failure intensity func-
tion appears to be homogeneous. If the intensity function is homogeneous,
then the quantity

ti T
å
n
-
L= n 2
i =1

T/ 12n

has a standard normal distribution. In this expression, n is the number of


failures and T is the total test time. Thus, if za/2 < L < z1- a/2, then we have
(1 – α) × 100% confidence that the failure intensity function is homogeneous,
and if L falls outside of that interval, the appropriate conclusion is that the
failure intensity function is nonhomogeneous.
As an example, suppose that a device has been monitored over 5000 hours
and that the observed sequence of failure times is shown in Table 14.1. For
these 11 failure times, we obtain L = −6.424, so we conclude that the process
is not homogeneous.
Once it is decided whether or not the sequence of failure times comprises a
homogeneous process, one can use the observed data to compute parameter
estimates for the failure intensity function. In the case of the homogeneous

TABLE 14.1
Failure Times for a Nonhomogeneous Failure Process
Example
676.4103 2289.604 4014.987
814.0804 3015.355 4222.475
1194.773 3142.368 4691.219
1790.756 3580.177
Special Topics 309

process, the times between failures constitute observations from a Poisson


process, so the estimation of the rate function is direct. We compute ti = τi – τi−1
for each of the failure times, and then just as in Chapter 8,
1 (14.2)
l̂ =
å
1 n
ti
n i =1

and also as shown in Chapter 8, a (1 – α) probability confidence bound on the


parameter is

c2 a c2 a
2n, 2 n , 1-
2
£l£ 2
(14.3)
2nt 2nt

As an example, consider the 20 data values in Table 7.8 that correspond to


the earliest failure times from Table 7.2. For those data, lˆ = 3.539, and using
c 22 n, 0.025 = 24.433 and c 22 n, 0.975 = 59.342, we obtain Pr[2.162 ≤ λ ≤ 5.251] = 0.95.
If the conclusion is that the process is nonhomogeneous, parameter estimates
for the model for Λ(t) can be computed once a model for the intensity function
is specified. As indicated by Ascher and Feingold [94], the nonhomogeneous
Poisson process (NHPP) is a sensible general choice for the model of the failure
intensity function. Since it is based on two parameters, it can be fit to nearly
any of the processes that are common to repairable systems. As discussed in
Chapter 10, the general model for the cumulative failure intensity is
b t t
ætö b ub - 1
L(t) = ç ÷ = l(u)du =
èqø 0 0
ò qb
du
ò (14.4)

Assuming that a system is to be observed until a time T, the number of


observed failures will be a random variable that has a Poisson distribution
b
with mean ( T/q ) and the conditional distribution on the failure times, given
the number of failures is the distribution on the order statistics given by
Rigdon and Basu [95]

ì 0 t<0
ï
G(t) = íL(t)/L(T ) 0£t£T (14.5)
ï 1 t>T
î

For the intensity function of Equation 14.4, the distribution on order statistics
simplifies to

L(t)/L(T ) = ( t/q )
b

G(t) = (14.6)
( T/q ) = ( t/T )
b b
310 Reliability Engineering: Probabilistic Models and Maintenance Methods

The likelihood function that applies to a set of observed failure times τi and
the associated random variable N is the joint density on these quantities:

nb b
- T /q
e ( )
n n
( T/q ) æ tb -11 ö
L(q, b|n, t1 , t2 , … , tn ) = f N (n) ´ n ! Õ i =1
g( t ) =
i
n!
n! Õ
i =1
bç i b ÷
è T ø
b -1
æ bn ö æ ö
n
b
- T/q
= ç nb ÷ ç Õ t ÷÷ø e ( ) (14.7)
è q ø çè
i
i =1

Taking the logarithm yields

n b
æT ö
ln L = n ln b - nb ln q + (b - 1) å i =1
ln ti - ç ÷
èqø
(14.8)

and solving the derivative expression yields the estimation expressions

n n T
bˆ = = and qˆ = 1/bˆ (14.9)
å å
n n
ln ( T/ti ) n ln T - ln ti n
i =1 i =1

Following the logic used in Chapter 8, the (1 – α) probability confidence inter-


val for β is

ˆ 2
bc a
ˆ 2
bc a
2n, 2 n , 1-
2
£b£ 2
(14.10)
2n 2n

Unfortunately, there is no defined method for computing confidence limits


for θ.
As an example, consider the data provided in Table 14.1. For the failure
times listed, applying the estimation Equations 14.9 yields bˆ = 1.248 and
qˆ = 732.47. The confidence interval for β is 0.623 ≤ β ≤ 2.087.

14.1.2 Data from Multiple Identical Systems


There are many instances in which a fleet (or set) of identical systems is oper-
ated and failure data are observed. Suppose there are J systems in the fleet
and each is monitored over a time interval of length Tj. Suppose further that
the number of failures observed for the various systems is denoted by nj and
Special Topics 311

that the observed times are τji for i = 1,…,nj. Here again, the numbers of fail-
ures and the times of failures are random variables.
If the failure process is homogeneous, then the failure intensity function is
defined by a single parameter and the likelihood function for the observed
failure data is

nj
J
( lTj ) J
å j=1 n j -l
J
å j=1 Tj
J nj
Tj
L(l|n, t1 , t2 , … , tJ ) = Õj =1
nj !
e
- lTj
=l e Õn! j =1 j
(14.11)

and the logarithm is

J æ J ö J

ln L = -l å Tj + ç
ç å n j ÷ ln l +
÷ å ( n ln T - ln n )
j j j (14.12)
j =1 è j =1 ø j =1

Setting the derivative of the logarithm of the likelihood function yields the
estimation equation

å
J
nj
j =1
l̂ = (14.13)
å
J
Tj
j =1

The (1 – α) probability confidence interval for λ is again based on the chi-


square distribution. For this application, the number of degrees of freedom
å
J
is n = 2 n j and the confidence interval is
j =1

c2 a c2 a
n, n , 1-
J
2
£l£ J
2 (14.14)
2å j =1
Tj 2 å j =1
Tj

For example, suppose 5 identical systems are observed over intervals of


length T1 = 2200, T2 = 750, T3 = 870, T4 = 1200, and T5 = 1900 hours and the
failure data obtained are that shown in Table 14.2. For these data, we obtain
lˆ = 16/6920 = 0.0023, and a 95% confidence interval on the parameter is
0.0013 ≤ λ ≤ 0.0036.
For a fleet of identical systems that display a nonhomogeneous failure
pattern, we adopt the same notation as for the homogeneous case. Let Tj rep-
resent the length of time that the jth system is observed, nj represent the
number of observed failures, and the values τji represent the observed failure
312 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 14.2
Example of Homogeneous Failure Times for a Fleet of 5 Identical
Systems
n1 = 6 n2 = 3 n3 = 0 n4 = 3 n5 = 4
234.076 150.168 378.700 110.829
480.5505 329.935 413.786 209.701
941.3557 501.633 572.794 1400.070
1505.591 1668.630
1792.566
1984.426

times. The failure times for the separate systems should be independent so
the likelihood function for the failure intensity function parameters is
J æ nj
ö J æ nj
btbji-1 ö -(Tj/q)b
L(q, b|n, t1 , t2 , … , tJ ) = ÕÕ
j =1
ç
ç
è i =1
÷
ø
- L(T )
l(t ji ) ÷ e j = ÕÕ j =1
ç
ç
è i =1
qb ÷
÷e
ø
b -1 J
å j=1 n j æ
J b
b
J nj
ö - å (Tj /q)
=
q
bå nj ç
J
j =1
ç
è
ÕÕ j =1 i =1
t ji ÷
÷
ø
e j =1
(14.15)

The logarithm of this function is not complicated and taking its derivative
leads to the estimation equations
1/bˆ
æ J
T ÷ö
ˆq = ç
ç
å j =1
÷
j
b

(14.16)
J
ç
è å j =1
nj ÷
ø

and
J

bˆ =
å j =1
nj
(14.17)
å ( )-
J J ˆ
å nj Tjb ln Tj J nj
j =1
J
j =1
bˆ å å ln t ji
å T j =1
j
j =1 i =1

or equivalently in terms of the derivative

å å ln t - å (T ln T ) = 0
J nj J

ji j j
1 j =1 i =1 j =1
+ (14.18)

å n å T
J J

j j
j =1 j =1
Special Topics 313

TABLE 14.3
Example of Nonhomogeneous Failure Times for a Fleet
of 4 Identical Systems
n1 = 11 n 2 = 13 n3 = 9 n4 = 9
676.410 399.084 303.763 570.445
814.080 440.728 461.063 841.766
1194.773 722.429 695.678 1201.825
1790.756 1316.966 1325.921 1508.394
2289.604 2011.369 1442.667 1653.020
3015.355 2587.809 1845.714 2189.773
3142.368 3139.724 2376.580 2591.334
3580.177 3490.071 2657.079 2863.896
4014.987 4057.239 2896.699 3621.946
4222.475 4190.488
4691.219 4970.179
5344.368
5858.649

For example, suppose 4 identical systems are monitored for T1 = 5000,


T2 = 6000, T3 = 3000, and T4 = 4000 hours and the observed failure times are
those shown in Table 14.3. The solution to Equation 14.18 is bˆ = 1.067 and then
Equation 14.16 yields qˆ = 498.8. A modified form of Equation 14.10 applies to
the confidence interval on β. The number of degrees of freedom is

n=2 ån
j =1
j

and

ˆ 2
bc a
ˆ 2
bc a
n, n , 1-
J
2
£b£ J
2 (14.19)
2å j =1
nj å
2
j =1
nj

so 0.770 ≤ β ≤ 1.414.
In summary, the statistical methods that were developed for estimating
the parameters of life distributions have been modified slightly. The revised
methods permit us to compute parameter estimates for useful models of the
behavior of the stochastic processes that represent the operation of repairable
equipment.
314 Reliability Engineering: Probabilistic Models and Maintenance Methods

14.2 Warranties
A warranty is a guarantee by a producer that a product will display a defined
level of reliability. It is increasingly common for manufacturers to provide
warranties for their products. The chief reason that manufacturers pro-
vide warranties is that customers often consider warranties to be appropri-
ate and warranties are thus an important ingredient in successful marketing.
Independent of this point, it is generally recognized that complex and espe-
cially expensive products should be guaranteed to function properly. In 1964,
the U.S. Congress enacted the Magnuson–Moss Act, which formally defined
how the warranties are to be structured and specified the responsibilities of
manufacturers in meeting warranty commitments.
There are essentially two types of warranties. These are full replacement
warranties and pro rata warranties. Pro rata warranties are usually offered
on products such as automobile components such as tires and batteries.
Common characteristics of products that carry pro rata warranties are that
(1) product use implies wear or at least accumulating deterioration, (2) repair
is either physically impractical or economically inefficient, and (3) evaluation
of product age is reasonably straightforward. Under a pro rata warranty, the
manufacturer returns a proportion of the original price of the product to the
customer in the event of product failure. The proportion of the price that is
returned is computed on the basis of an estimate of how much of the product
life the customer has lost due to failure of the product. In the case of an auto-
mobile tire, a blowout renders the tire unusable. If one occurs, it is common
to measure the depth of the tire tread that remains and to use the ratio of
the remaining tread depth to the original tread depth to determine the tire
life lost due to the failure. The formula for converting the lost life into a cash
settlement is usually specific to the manufacturer.
The full replacement warranty is a guarantee to repair or replace a failed
product (or component of a product) in order that the product be as good as
new. A key feature of the full replacement warranty is the definition of the
time limit on its applicability. For some products, that warranty is in force for
a fixed period of time from the date of product purchase. A 3-year warranty
on an automobile and a 1-year warranty on a stereo are examples. In the case
of the auto, the frequency of repair does not change the termination date of
the warranty.
For some products, the full replacement warranty is renewed when the
product is replaced under the warranty agreement. For example, a por-
table stereo that fails prior to the completion of its 1-year warranty period
might be replaced with another copy of the product that is also warranted
for 1 year. Most durable consumer goods carry one of the two types of
warranties.
The choice of which type of warranty to offer is made by the manufac-
turer and is highly influenced by the nature of the product and its reliability.
Special Topics 315

Questions about warranty policies for which analysis is meaningful include


the length of the warranty period, the cost that is expected to accrue in meet-
ing warranty commitments, and the formula to apply for the pro rata case.
We shall examine some simple models for the warranties in the following
discussion. For each model, we assume that the warranty promotes demand
and therefore increases profit.

14.2.1 Full Replacement Warranties


Most analysts distinguish between repairable and nonrepairable products.
From an analytical perspective, this distinction is unnecessary if the product
is as good as new after warranty service. Assume that the warranty for a
product is renewed at a repair time and let Tw represent the duration of the
full replacement warranty. Now there are many scenarios that one might
assume for the warranty. For now, we suppose that the cost of providing
warranty support for a single unit of product is the cost of manufacturing
the product plus an operating or processing cost that is proportional to the
selling price of the product. That is, the cost to provide a warranted replace-
ment is (C + δP) where C is the production cost for the unit of product, P is its
selling price, and δ is the proportionality factor that represents the fraction
of the selling price consumed in the processing of a warranty claim. Assume
further that we can represent the product generation cost as a fraction of the
selling price:

C = gP

Next, assume that that the likelihood of product sales is enhanced by the
warranty and that this can be modeled using

(
E [ demand|Tw ] = d(Tw ) = u1 + u2Twr )
where 0 ≤ r < 1. The expected profit realized from product sales adjusted by
the cost of warranty support is

E[ profit] = ( (P - C ) - E[N (Tw )](C + dP) ) d(Tw ) (14.20)

where

1
E[N (Tw )] = -1 (14.21)
FT (Tw )

is the expected number of replacement units provided for each purchaser.


The reason this is the number of copies used to meet each purchase is that
316 Reliability Engineering: Probabilistic Models and Maintenance Methods

the number of copies of the product used in order to achieve a life duration
in excess of the warranty interval is geometric with parameter FT (Tw ), the
survivor function for the product life distribution. The expected value of
the number of replacements, N(Tw), is one less than the number of copies
used. Substituting the demand and cost expressions along with E[N(Tw)] into
Equation 14.20 yields the following expected profit model:

æ æ 1 ö ö
E[ profit] = çç (P - gP) - ç (
- 1 ÷ ( gP + dP) ÷÷ u1 + u2Twr ) (14.22)
è è FT (Tw ) ø ø

Taking the derivative of this expression yields

dE[ profit] æ æ 1 ö ö
= çç (1 - g )P - ç - 1 ÷ ( g + d)P ÷÷ ru2Twr -1
dTw è è FT (Tw ) ø ø
ææ ö ö
f (T )
(
- ç ç T w 2 ÷ ( g + d)P ÷ u1 + u2Twr
çç ç F (T ) ÷ ÷÷ )
(
èè T w ø ) ø
æ æ 1 ö ö
= çç (1 - g ) - ç - 1 ÷ ( g + d) ÷÷ Pru2Twr -1
è è FT (Tw ) ø ø
æ æ z (T ) ö ö
(
- çç ç T w ÷ ( g + d) ÷÷ P u1 + u2Twr )
è è FT (Tw ) ø ø

Setting this derivative equal to zero allows us to divide out the price, P, and
then applying the hazard function identity, we obtain

æ æ FT (Tw ) ö ö r -1
æ æ zT (Tw ) ö ö
çç (1 - g ) - ç
F (T )
÷ ( g + d) ÷÷ ru2Tw - çç ç
F (T )
( )
÷ ( g + d) ÷÷ u1 + u2Tw = 0
r
(14.23)
è è T w ø ø èè T w ø ø

Numerical analysis seems to be the most promising approach to solving


for Tw. As an example, if we assume that production cost is 40% of the sell-
ing price for a product and that δ = 0.01, u2 = 25, u1 = 4.0, r = 0.1, and the
device has a Weibull life distribution with β = 2.25 and θ = 200 weeks. For
these values, a plot of the expected profit function is shown in Figure 14.1.
Numerical solution of Equation 14.23 yields an optimal warranty period of
Tw = 55.015 weeks.
Note that the previous formulation yields a solution that is independent of
the price of the product and depends instead upon the proportion of revenue
returned to the customer and upon the failure behavior of the product.
Special Topics 317

23.25

23.20

23.15

23.10

23.05

23.00

22.95

45 50 55 60 65 70 75

FIGURE 14.1
Expected profit function for a full replacement warranty example.

Next, consider the situation in which the warranty has a fixed duration
during which replacement (repair) service is provided but the duration of
the warranty is not extended. For this case, the cost of meeting the warranty
commitment for each unit of product is defined by the renewal function
based on the product life distribution function. If it is again assumed that
demand depends on the duration of the warranty as expressed by d(Tw), the
expected profit function is

(
E[ profit] = ( (P - gP) - MF (Tw )( gP + dP) ) u1 + u2Twr ) (14.24)

To identify the optimal warranty policy time, we again use the derivative

dE[ profit]
= ( (P - gP) - MF (Tw )( gP + dP) ) ru2Twr -1
dTw

(
+ ( -mF (Tw )( gP + dP) ) u1 + u2Twr ) (14.25)

which also is set equal to zero and analyzed numerically. For the same
parameter values as those used for the full replacement warranty example,
the optimal policy time is Tw = 57.5 weeks.

14.2.2 Pro Rata Warranties


The chief difference between a full replacement and a pro rata warranty
is that the resources committed to meeting the warranty vary for the pro
rata case. Under a full replacement warranty, the producer incurs the full cost
318 Reliability Engineering: Probabilistic Models and Maintenance Methods

of product replacement when a warranted item fails. While this may be a


fraction of the sale price of the product, it is still constant over the duration of
the warranty. In the case of a pro rata warranty, the producer usually “pays”
the customer a proportion of the price of the warranted product and the pro-
portion paid depends on the condition or age of the product when it fails.
To represent the effect of pro rata warranty policy on profitability, define
a model that describes the costs and profits associated with each unit of
product. An aspect of the transactions is that the customer may decide not
to repurchase the product. That is, we might assume that for each unit of
product that fails while under warranty, the probability of a repeat pur-
chase (warranty activation) is proportional to the ratio of failure age to
policy age, say,
s
æ t ö
Pr[repurchase] = h ç ÷ (14.26)
è Tw ø

Now, under a pro rata warranty, the residual value of a failed unit may be
represented as

a a
æ Tw - t ö æ t ö
ç T ÷ = 1-ç T ÷
è w ø è w ø

and this value is returned to any customer that decides to repurchase the
product. This form permits either linear or nonlinear reduction in warranty
value and also allows that the maximum warranty value be any fraction of
the purchase price. Using this expression, the repurchase price is

æ æ t öa ö
P - ç1- ç ÷P
ç è Tw ÷ø ÷
è ø

so the consequence of servicing the warranty is that the revenue from the
repurchase is

æ æ t öa ö
P - ç 1 - ç ÷ ÷ P - C - dP (14.27)
ç è Tw ø ÷
è ø

As in the case of the full replacement warranty, the revenue derived


from the initial purchase is (P – C) = (1 – γ)P, and it is again reasonable to
assume that demand depends on the warranty policy so the expected
demand may be represented in the same manner as was used for full
Special Topics 319

replacement warranties. Thus, the expected profit associated the use of


the pro rata warranty:

æ Tw
æ æ t öa ö æ t ös ö
ç
E[ profit] = (P - C ) + ç P ç
ç
è
ò ÷ - C - dP ÷ h ç ÷ fT (t)dt ÷ (u1 + u2Tw )
ç è Tw ø
0 è
÷ è Tw ø
ø
÷
ø
r

æ Tw
æ æ t öa ö æ t ös ö
ç
è
ò
= P ç (1 - g ) + ç ç
ç è Tw ÷ø
0 è
- ( g + d ) ÷ h
÷ çè Tw ÷ø
ø
f T (t )dt ÷
÷
ø
(u1 + u2Twr ) (14.28)

The first derivative condition for this function is

d
E[ profit] = P ( ( 1 - (d + g ) ) h fT (Tw )
dTw
Tw
æ æ at a ö æ t ös æ st s ö ö ö
-
ò ç è Tw ø è Tw ø ( a
)
ç ç a +1 ÷ h ç ÷ + ( t/Tw ) - ( g + d) h ç s +1 ÷ ÷ fT (t)dt ÷ (u1 + u2Twr )
è Tw ø ÷ø ÷
0 è ø
æ Tw
æ æ t öa ö æ t ös ö
ç
è
ò
+ ru2Twr -1 ç (1 - g ) + ç ç ÷ - ( g + d) ÷ h ç
ç è Tw ø
0 è
÷
÷ è Tw ø
ø
fT (t)dt ÷ = 0
÷
ø
(14.29)

For the numerical values of the previous examples, we add s = 0.2, a = 0.1, and
η = 0.8. The model solution is a warranty policy time of 51.5 weeks.
Naturally, there are many possible variations to the models described
here. One extension that is often considered worthwhile is to apply a dis-
count factor to future cash flows. Depending upon the application, this and
other modifications may be reasonable. The modeling format should be the
same and the result should be an effective approach to the selection of a
warranty policy.

14.3 Reliability Growth


New products and new product designs are usually enhanced during some
sort of “development process.” This development process generally includes
considerable testing and adjustment of both the design and the manufactur-
ing processes used to create the product. The adjustments are frequently of
sufficient significance that they are labeled “engineering design changes.”
An important element of the enhancement of the product design and
320 Reliability Engineering: Probabilistic Models and Maintenance Methods

manufacturing process formulation is an increase in the reliability of the


product. This improvement in product reliability is called reliability growth.
It has been observed in many products.
The general context within which reliability growth is observed is the
test-analyze-and-fix (TAAF) process in which observed failures lead to
redesign and product reliability is gradually improved. The idea is that
the observed failure points to failure modes that result from design flaws
or manufacturing errors that can be eliminated. Once those problems are
eliminated, the product reliability is improved and the improvement can
be described by a decrease in the hazard rate or equivalently by an increase
in the mean life length for the product. In addition, the test data that are
accumulated usually support estimation of hazard behavior and the anal-
ysis of failures yields improved knowledge of the underlying product
weaknesses. A key point concerning the reliability growth models is that
they are based on an assumption that the life distribution is exponential
(constant hazard).
There were some earlier embryonic models for reliability growth, but
the first widely adopted reliability growth model was defined by Duane
[96]. He indicated that improvement in reliability during product devel-
opment resembled a learning process, so his model is often described as a
learning curve model. Duane observed that the cumulative failure inten-
sity is a log-linear function of the total test time. Duane’s model, which is
deterministic, is expressed in terms of the “cumulative mean time before
failure,” μ, as

m = aT b (14.30)

where
T is the total accumulated test duration
α is a scale parameter that depends on the initial failure intensity
β is the rate of growth

Clearly, this model reminds us of the Weibull cumulative hazard function,


but it is not a hazard function at all. In fact, the nature of a TAAF process
implies that the failure intensity is changing in a stepwise manner so that the
failure process is nonhomogeneous.
Crow [97] formalized this idea by observing that failure occurring during
product development may be viewed as an NHPP with an intensity function
that resembles the Weibull hazard and is essentially the same as Duane’s
model but with a very different interpretation. The initial form of Crow’s
model is

l(t) = abtb -1 (14.31)


Special Topics 321

It is widely known as the Crow Army Material Systems Analysis Activity


(AMSAA) model. The models defined by both Duane and Crow are now
incorporated in Military Handbook 189C [98] that specifies procedures for per-
forming reliability growth evaluation tests. There have been some criticisms
of the Crow model largely because of its continuous format, but Benton and
Crow [99] responded to this point by indicating that the model serves as a
smoothed approximation to the step function form of the failure intensity
that arises from a sequence of design changes.
The AMSAA model provides three important improvements to the Duane
model. First, Crow specifically distinguishes between failures observed
during testing that result from failure modes that cannot be altered (Type
A modes) and those failures that result from design or manufacturing prob-
lems that can be addressed (Type B modes). Second, he allows for the possibil-
ity that adjustments to the product design or manufacturing process may not
eliminate the problem completely. Instead, for each failure mode addressed,
Crow includes an “effectiveness factor” in the calculation of the reliability
improvement. Finally, Crow defines the growth model for a sequence of test
cycles. He assumes that product design or manufacturing process changes are
implemented at the end of each of several test cycles. Thus, the development
test is described by the set of test cycle durations (τ1, τ2, …, τm). Since the life
distribution is assumed to be exponential, the number of failures that occur
during each test cycle is Poisson distributed with mean equal to λjτj where
λj is the sum of the hazard rates for the correctable and the uncorrectable
failure modes during cycle j. (The failure modes are all assumed to be mutu-
ally independent.) Thus, under the Crow model, the hazard rate during any
test cycle is

j -1 K
l j = lu + å å (1 - d )l
i =1 k =1
ik ck (14.32)

where
λu is the hazard rate associated with the uncorrectable failure modes
λ ck is the hazard rate for the kth correctable failure mode
dik is the effectiveness in modifying the kth failure mode of the engineering
change implemented at the end of cycle i
K is the number of correctable failure modes

Military Handbook 189C [98] provides a thorough explanation of the use


of the growth models and distinguishes between the process of tracking
growth during testing and projecting the eventual failure intensity upon
completion of the test process. The latter case is interesting and is illus-
trated in the handbook by an example. The data presented in the handbook
322 Reliability Engineering: Probabilistic Models and Maintenance Methods

TABLE 14.4
Test Data from Table XIV of Mil. HDBK 189C
First Observed
Type B Mode Failure Time Ni di
1 15.04 2 0.67
2 25.26 3 0.72
3 47.46 2 0.77
4 53.96 2 0.77
5 56.42 3 0.87
6 99.57 2 0.92
7 100.31 1 0.50
8 111.99 3 0.85
9 125.48 3 0.89
10 133.43 4 0.74
11 192.66 1 0.70
12 249.15 2 0.63
13 285.01 1 0.64
14 379.43 1 0.72
15 388.97 1 0.69
16 395.25 1 0.46
Totals 32 11.54

are shown in Table 14.4. The data shown are based on a simulation of a
TAAF process that includes 10 failures due to Type A failure modes and
32 failures caused by 16 out of 100 possible Type B failure modes. The total
test duration is 400 hours. Note that there are multiple failures for some
of the failure modes and that failure mode correction effectiveness factors
are listed.
Assuming that the expected value of the failure intensity function follow-
ing the TAAF process is

K
E[l(T )] = l u + å (1 - d )l + dm(T )
i =1
i i (14.33)

the maximum likelihood estimates (which are not unbiased) for the failure
intensity model of Equation 14.33 are

m m N N (14.34)
bˆ = = lˆ i = i lˆ u = A
æT ö
å
m

å T T
m
ln ç ÷ m ln T - ln ti
i =1
i =1
è ti ø
Special Topics 323

where
m is the number of Type B failure modes that are observed
d is the average failure mode correction factor
μ(T) is the failure intensity for new Type B failure modes that arise during
testing
T is the total test duration

μ(T) is estimated by mˆ (T ) = mbˆ/T and the estimate for the projected failure
intensity function is

1æ ö
m m
Eˆ [l(T )] = ç N A +
T çè å
i =1
å d ÷÷ø
(1 - di )N i + bˆ
i =1
i (14.35)

The Military Handbook further identifies the unbiased estimate for β as


bˆ u = (m - 1)bˆ/m and the corresponding estimate Eˆ u [l(T )] is obtained by
replacing b̂ by b̂u in Equation 14.35. For the data in Table 14.4, the estima-
tion expressions yield bˆ = 0.797, bˆ u = 0.747, lˆ A = 0.025, mˆ (T ) = 0.0023, and
Eˆ [l(T )] = 0.0675. The reciprocal of Eˆ [l(T )] is taken as the estimated mean time
before failure for the system following testing. For the given data, that value
is 14.81 hours. This value may then be compared to a target to determine
whether additional development testing is needed.
Two final points are that one may set a design reliability target and then
compute how long the development testing should be continued in order to
attain the target. Second, for any specific design, the assumption of constant
hazard requires careful consideration.

14.4 Dependent Components


The possibility of dependence among system components is an important
consideration in the analysis of system reliability. The dependence may arise
as a consequence of sharing of loads, susceptibility to the same failure causes,
or mutual interactions. Two examples of mutual interactions are (1) the case
of microelectronic components that are located near enough that the heat
generated by each component affects the thermal environment experienced
by the other component and (2) mechanical components such as gears and
struts that transmit forces from one component to another.
Naturally, there are several reasonable approaches to modeling component
dependencies. One general approach is to simply formulate a system reliabil-
ity model for which the system continues to operate, but the hazard function
324 Reliability Engineering: Probabilistic Models and Maintenance Methods

changes when a component fails. This model seems most representative of


the shared load case. The general model is
t t

FTs (t) = FT1 (t) +


ò
0
fT1 (u)FT2 (t - u)du = e - ZT1 (t ) +
òf
0
T1 (u)e - ZT2 (t - u )du (14.36)

where
Ts is the overall system life length
T1 is the time at which the first component failure occurs
T2 is the residual life length of the system following the first component
failure

If this model is applied to a shared load system, one often assumes that the
individual component life lengths are independent and it is only the sharing
of the load the produces the dependence at the system level. In that case,

FT1 (t) = FX1 (t| y(T1 ))FX2 (t| y(T1 )) (14.37)

where y(T1) describes the effect of load sharing on the life distributions.
Similarly,

t t
æ F (t - u) F (t - u) ö
ò
0 0
èò
fT1 (u)FT2 (t - u)du = ç f X1 (u)FX2 (u) X2¢
F X2 ( u )
+ f X2 (u)FX1 (u) X1¢
FX1 (u) ø
÷ du

=
ò( f
0
X1 )
(u)FX2¢ (t - u) + f X2 (u)FX1¢ (t - u) du (14.38)

where the notation Xi¢ is intended to signify the fact that the hazard function
for the surviving component is different than when both components are
functioning.
An alternate general model for component dependence was proposed by
Barlow and Proschan [12]. They argued that a reasonable interpretation of cer-
tain shock models leads directly to a bivariate exponential system life distri-
bution. Their construction proceeds as follows. Suppose a system composed
of two components (in arbitrary structure) is subject to shocks from three
sources. The shocks from each source occur according to Poisson processes
such that [Ni(t)] is the number of shocks from source i occurring during (0, t)
and λi is the parameter for shock process i. Suppose further that a shock from
source 1 causes failure of component 1 with probability θ1, while a shock
from source 2 causes failure of component 1 with probability θ2. In the case
of shocks from source 3, each shock causes failure of both components with
probability θ11, causes failure of component 1 only with probability θ10, causes
Special Topics 325

failure of component 2 only with probability θ01, and results in no failures


with probability θ 00. Then the joint survival probability for life lengths T1 and
T2 for the two components is

æ ¥
(l1t1 )k öæ ¥
(l 2t2 )k ö
Pr[T1 > t1 , T2 > t2 ] = ç
ç
è
åk =0
e - l1t1
k!
(1 - q1 )k ÷ ç
֍
øè
åk =0
e - l2t2
k!
(1 - q2 )k ÷
÷
ø

æ ¥ ¥
ì - l3 t1 (l 3t1 ) j j ü ì - l3 (t2 -t1 ) (l 3 (t2 - t1 ))k üö
´ç
ç åå íe
î j!
q00 ý íe
þî k!
(q00 + q10 )k ý ÷ (14.39)
þ ÷ø
è k =0 j =0

when 0 ≤ t1 ≤ t2. Taking the sums and simplifying, they obtain

Pr[T1 > t1 , T2 > t2 ] = e 1 ( 1 1 3 10 ) 2 ( 2 2 3 00 10 )


- t l q + l q - t l q + l ( 1- q - q )
(14.40)

and by symmetry, when 0 ≤ t2 ≤ t1,

Pr[T1 > t1 , T2 > t2 ] = e 1 ( 1 1 3 00 01 ) 2 ( 2 2 3 01 )


- t l q + l ( 1- q - q ) - t l q + l q
(14.41)

Combining these expressions yields the standard bivariate exponential sur-


vivor function
- l1* t1 - l *2 t2 - l *3 max ( t1 , t2 )
Pr[T1 > t1 , T2 > t2 ] = e (14.42)

where

l1* = l1q1 + l 3q10

l*2 = l 2q2 + l 3q01

l*3 = l 3q11

14.5 Bivariate Reliability


People frequently discuss equipment behavior in terms of age and usage.
Common examples are automobiles and automobile tires in which both
model year and accumulated mileage are usually included in discussions of
longevity. Less well-recognized examples for which two measurement scales
are quite important include factory equipment, power generation machines,
and aircraft. In fact, the longevity of many of the devices that reliability
specialists study is meaningfully described in terms of two measures.
326 Reliability Engineering: Probabilistic Models and Maintenance Methods

Device life is a resource that may be best represented and for which the
consumption may best be measured using a two (or higher)- dimensional
vector and the quantities that comprise the vector are specific to the equip-
ment. Years of usage and mileage are not the only two quantities that might
describe device longevity. We use the terms age and use here, but these
terms are generic and may represent quite different measures than dura-
tion of ownership and distance traveled. In the example of an automobile
tire, age might correspond to accumulated mileage and usage might be
measured as tread loss. Even more complicated measures such as current
flow and thermal history may be appropriate for some integrated circuits.
There are basically two ways to approach the definition of a bivariate reli-
ability model. The traditional approach has been to define the second vari-
able, usage, as a function of time so that the bivariate model can be collapsed
into a univariate model. Models of this type are discussed first. In fact, we
have already discussed some as the cumulative damage models portray
equipment reliability in terms of deterministically defined deterioration
occurring at random points in time and the proportional hazards models
treat age as a deterministic function of use covariates. Clearly, the nondeter-
ministic cases are more interesting.
A second approach that is more recent and was first treated by Singpurwalla
and Wilson [100] and subsequently by Yang and Nachlas [101] is to develop
reliability functions that are truly bivariate. These are presented subsequently.

14.5.1 Collapsible Models


The collapsible models are generally defined so that the distribution on
“usage” is a stochastic function of age. The cumulative damage models and
particularly the shot noise process discussed in Chapter 5 and the gamma
process models treated in Chapter 13 are examples of the type of formulation.
The basic formulation is to define age and usage variables for which the dis-
tribution on usage depends upon device age. From a statistical perspective,
this is a reasonable concept. For example, it would be appropriate to describe
a distribution on the age (or wear) for a population of 3-year-old cars.
Algebraically, the collapsible models are quite complicated. The most direct
formulation is obtained by defining the usage variable, say, X, as a stochastic
process and defining the failure to correspond to values of the state variable
that exceed a threshold, say, θ (as in Chapter 13). Then for the bivariate sur-
vival function, FT, X (t , x), the time evolution is described by
x

¶t ò
FT , X (t , x) = - ( l + zT , X )(t , x) ) FT , X (t , x) + l FT , X (t , u) f X (t ) (q - u)du
0
(14.43)

where f X(t)(x) is the density function for the distribution on the transition in
state over time.
Special Topics 327

In Chapter 13, we treated state transitions as a gamma process. This is


often an appropriate choice. If the state transition process is to be Gaussian,
the result is a Wiener (rather than gamma) process and this model may be
representative of some devices, particularly those that experience healing or
attenuation of damage.
In any case, the general model of Equation 14.43 permits very many spe-
cific cases. The following is one of the more manageable possibilities. If the
hazard function is assumed to be an additive function of independent time
and usage processes so that

zT , X (t , x) = zT (t) + hx (14.44)

then the Laplace transform of Equation 14.43 with respect to the state vari-
able X is

æ l s + ht t
ö
*
FT , X (t , s) = exp ç
çh
è
ò s
ò
f X (t ) (v)dv - lt - zT (v)dv ÷
*
0
÷
ø
(14.45)

Then, taking the state transition process to be a gamma process, the col-
lapsed time-dependent life distribution is obtained as the Laplace transform
evaluated at s = 0:

æ l ht t
ö
FT (t) = FT*, X (t , 0) = exp ç
çh
è 0
ò ò
f X*(t ) (v)dv - lt - zT (v)dv ÷
0
÷
ø
(14.46)

Nearly all other reasonable choices of constituents of this model are more
difficult to evaluate, but most that are practically interesting can be managed
numerically.

14.5.2 Bivariate Models


There are actually two classes of bivariate models that are not collapsed into
a single-dimensional form. These are the models based on a stochastic func-
tional relationship between the two variables and the models that represent
the variables as correlated.

14.5.2.1 Stochastic Functions


The definition of failure models on the basis of stochastic functions relating
age and use starts with the specification of how the stochastic feature of the
328 Reliability Engineering: Probabilistic Models and Maintenance Methods

longevity variables is portrayed. Assume that the time and use to failure are
related by the function u = B(t) and that the stochastic nature of this relation-
ship can be represented by treating one or more of the parameters of B(t) as
random variables.
The interpretation of the function B(t) is that across a population of devices,
the accumulated usage by age t is B(t). This is equivalent to saying that the
mileage traveled by 2-year-old cars is B(t = 2). Of course, we impose a prob-
ability distribution on B(t) to model its dispersion. To illustrate this construc-
tion, we consider the four example forms here:

1. B(t) = αt + β
2. B(t) = αt2 + βt + γ
3. B(t) = αtn
4. B(t) = (eαt − 1)/(eαt + β)

where the fourth form is the logistic model analyzed by Eliashberg et  al.
[102]. In each case, we introduce randomness into the function by treating
the parameter α as a random variable having distribution πα(·). This imposes
random variation on the extent of use experienced by any age. Consequently,
both age and usage at failure are random variables. Certainly, there may be
many other functional forms that may be defined and that may be repre-
sentative of observed behavior. The analytical methods described here may
apply to those other forms as well.
The use of the distribution πα(·) to construct the marginal probability distribu-
tion on usage is accomplished using well-known methods. In general, as indi-
cated by Eliashberg et al. [102], we may construct the marginal density on use as

da(u)
fU ( u) = f g ( t ) ( u) = pa (a(u)) (14.47)
du

For example, with B(t) = αt + β, solving for α yields

u -b da(u) 1
a(u) = and =
t du t

so

1 æ u -b ö
fU ( u) = p a ç ÷ (14.48)
t è t ø

Once the marginal distribution on usage is obtained, we construct the joint


failure density using the well-known conditioning relation

f T,U(t, u) = f T|U(t)f U(u) (14.49)


Special Topics 329

and the conditional density f T|U(t) is obtained using the well-known


relationship between a density and its hazard function:

ìï t üï
ò
fT|U ( t ) = zT|U ( t ) = zT|g (t ) ( t ) exp í- zT|g (t ) ( x)dx ý
ïî 0 ïþ
(14.50)

We use this form specifically so that we can focus upon the hazard function
in the definition of the failure model. We assume that the bivariate device
failure hazard function may be stated as

zT,U(t, u) = λ(t) + η(u) (14.51)

so that the definitions of the functions λ(t), η(u), and B(t) determine the hazard
and ultimately the bivariate life distribution. Here, we treat the simplest con-
ceivable form of the hazard function. More intricate, and perhaps realistic,
forms should be studied. Thus, we assume that λ(t) and η(u) are simple linear
functions. That is, we use λ(t) = λt and η(u) = ηu.
Under this modeling format, the bivariate life distribution corresponding
to the form (1) earlier is obtained by constructing

ææ u -b ö ö
zT|U ( x ) = zT|g (t ) ( x ) = lx + h ç ç ÷ x + b÷ (14.52)
èè t ø ø

and applying Equations 14.49 and 14.50 to obtain

l t + hu ì h(u + b) l ü æ u -b ö
fT ,U (t , u) = exp í- t - t 2 ý pa ç ÷ (14.53)
t î 2 2 þ è t ø

The same analytical approach yields

l t + hu ì h(u + 2g ü ì 3l + hb 2 ü æ u - bt - g ö
fT ,U (t , u) = exp í- t ý exp í- t ý pa ç ÷ (14.54)
t2 î 3 þ î 6 þ è t2 ø

l t + hu ì hu l ü æuö
fT ,U (t , u) = exp í- t - t 2 ý pa ç n ÷ (14.55)
tn î n + 1 2 þ èt ø

and

ì b+1 ü
ï h ïï
(1 + b)(lt + hu) æ 1 1 + b u ö ïh l 2 b 1
fT ,U (t , u) = pa ç ln ÷ exp í t - t - 1 + bu ln tý
t(1 - u)(1 + bu) è t 1- u ø ïb 2 ln (1 - u) ï
ïî 1- u ïþ
(14.56)
330 Reliability Engineering: Probabilistic Models and Maintenance Methods

for cases (2), (3), and (4), respectively. Note that in the case of (4), the definition
of the use function limits the variable U to [0, 1] so the functions may require
rescaling for some applications. Also, in cases (1) and (2), the forms of the
functions B(t) allow a nonzero minimum value for usage.
Finally, observe further that all four models are well defined and require
only the specification of the density πα(·) to be complete bivariate life distribu-
tions. On the other hand, for each of them, it is unlikely that a closed-form
expression can be obtained for the marginal distribution on age at failure.
In any case, the previous examples illustrate the construction of a model in
which the usage variable is a stochastic function of the age variable.

14.5.2.2 Correlation Models


In many applications, the two life variables appear to be correlated rather
than functionally dependent. The definition of models that can represent cor-
relations in the life variables appears initially to be somewhat simpler than
the construction earlier. We simply choose a bivariate distribution. However,
it is important that the distribution be capable of accurately representing
equipment behavior and in particular that it have marginal distributions
that are consistent with experience. Three example models that appear to
hold promise for representing bivariate failure processes in which the two
variables are correlated are described here. Once again our choices are not
the only conceivable ones. Observed equipment behavior may suggest the
use of a different distribution and the ideas developed here should apply to
those cases as well as the ones treated here.
The first of the candidate models is the generalization of the bivariate
exponential model defined by Baggs and Nagagaja [103]. In this model, the
reliability function is

(
F T ,U (t , u) = e -( lt + hu ) 1 + r(1 - e - lt )(1 - e - hu ) ) (14.57)

so the corresponding density function is

(
fT ,U (t , u) = lhe -( lt + hu ) 1 + r{1 - 2e - lt - 2e - hu + 4e -( lt + hu ) } ) (14.58)

and the marginal densities are the constituent exponentials regardless of the
value of ρ.
A second model that is an obvious choice is the bivariate normal. The den-
sity function for this model is well known so it is not restated here. As is also
well known, the marginal densities are normal.
One final model that we wish to consider here is the one stated by Hunter
[104] in a queueing context but also consistent with reliability interpretations:

lh æ 2 r ö ì l t + hu ü
fT ,U (t , u) = I0 ç lhtu ÷ exp í- ý (14.59)
ç
1-r è 1-r ÷ î 1-r þ
ø
Special Topics 331

where In(·) is the modified Bessel function of order n. The marginal densities
for this model are not obvious.
To summarize the construction to this point, we have defined examples of
two classes of models that might be used to portray the dispersion in equip-
ment longevity as defined using two variables. We next examine the general
probability concepts commonly associated with reliability analysis and use
some of the suggested example forms to illustrate the concepts discussed.
Subsequently, we construct reliability and maintenance models using the
probability concepts and some of the example model forms.

14.5.2.3 Probability Analysis


Consider a device for which longevity is defined in terms of two variables,
say, age and usage. Assuming the distribution function on longevity has
been constructed and is bivariate, there are some subtle and sometimes dif-
ficult questions and concepts that arise in the application of the bivariate
model to reliability. First, we interpret the cumulative failure probability
FT,U(t, u) as the probability that failure occurs by time t and usage u. That is,

FT ,U (t , u) = Pr[T £ t , U £ u] (14.60)

One may say that this probability corresponds to the proportion of the popu-
lation of devices that have longevity vector values at failure that do not exceed
(t, u) in either vector component. We emphasize this definition because of the
fact that for a bivariate distribution, probability is generally computed over
rectangles such as [t1 ≤ T ≤ t2, u1 ≤ U ≤ u2]. Consequently, for any specific lon-
gevity vector, (t, u), the range of age and use values implies that there are four
rectangles in the (T, U) plane for which probabilities may be meaningfully
calculated. Referring to Figure 14.2, observe that in addition to the rectangle
used in Equation 14.60, there are the rectangles Pr[T ≤ t, U ≥  u], Pr[T ≥ t,
U ≤ u], and Pr[T ≥ t, U ≥ u]. It is not obvious, but relative to the cumulative
probability FT,U(t, u), the probabilities
t ¥

Pr[T £ t , U > u] =
òò f
0 u
T ,U (s, v)dvds (14.61)

and
¥ u

Pr[T > t , U £ u] =
òò f
t 0
T ,U (s, v)dvds (14.62)

are survival probabilities. They correspond to the proportions of the popula-


tion that do not have longevity vectors inferior to (t, u) either because their
332 Reliability Engineering: Probabilistic Models and Maintenance Methods

F
Pr[T <= t, U > u]
Pr[T > t, U > u]

F
Pr[T > t, U <= u]
Pr[T < t, U < u]

T
t

FIGURE 14.2
Bivariate probability distributions. (Reprinted with permission from Yang, S.C. and Nachlas,
J.A., Bivariate reliability and maintenance planning models, IEEE Trans. Reliab., 50(1), 26–35.
Copyright 2001 IEEE.)

failure ages exceed t or because their failure usages exceed u. We do not have
informative names for the probabilities represented by Equations 14.61 and
14.62 but have considered names such as marginal survival probabilities.
A further point that is rather subtle is the fact that the reliability at (t, u)
does not include the probabilities represented by Equations 14.61 and 14.62.
The reliability at longevity vector value (t, u) corresponds to the proportion
of the population for which failure age exceeds t and failure usage exceeds u.
Therefore, the reliability function corresponding to FT,U(t, u) is
¥ ¥

F T ,U (t , u) = Pr[T ³ t , U ³ u] =
òò f
t u
T ,U (s, v)dvds (14.63)

Because it does not include the probabilities represented by Equations 14.61


and 14.62, we call this the reliability rather than the survivor function. We
also note that the cumulative failure probability and the reliability no longer
sum to one.
The apparent paradox in the definitions of FT,U(t, u) and F T ,U (t , u) arises
from distinctions in point of observation. When considering the distribu-
tion, all positive-valued longevity vectors can potentially occur and across a
population of devices all do occur. Relative to the distribution, the cumula-
tive probability at (t, u) does not include devices for which either T exceeds
Special Topics 333

t or U exceeds u. On the other hand, all copies of a device population that


have achieved a longevity of (t, u) will have longevity vectors at failure that
lie within the rectangle [t ≤ T < ∞, u ≤ U < ∞] so at (t, u) the rectangles
corresponding to the marginal survival probabilities are not accessible.
The computation of bivariate probabilities is reasonably clear. For any rect-
angle, say, [t1 ≤ T ≤ t2, u1 ≤ U ≤ u2] in the plane, the probability of observing
a failure at a point included in the rectangle is

Pr[t1 £ T £ t2 , u1 £ U £ u2 ] = FT ,U (t2 , u2 ) - FT ,U (t2 , u1 )

- FT ,U (t1 , u2 ) + FT ,U (t1 , u1 ) (14.64)

A useful special case of this expression applies to the reliability function,


which may be represented by

F T ,U (t , u) = Pr[t £ T < ¥ , u £ U < ¥] = 1 - FU (u) - FT (t) + FT ,U (t , u) (14.65)

Observe that this expression may also be used to compute cumulative


probabilities in cases in which the reliability function is easier than the dis-
tribution function to analyze.
Very often, the first question that follows the definition of a probability
model for device failure is that of the identity and behavior of the associ-
ated hazard function. For a bivariate failure distribution, a return to the first
principles yields

Pr[t £ T £ t + Dt , u £ U £ u + Du|T > t , U > u]


zT ,U (t , u) = lim
Dt ® 0
Du ® 0
DuDt

Pr[t £ T £ t + Dt , u £ U £ u + Du]
= lim
Dt ® 0
Du ® 0
DuDt Pr[T > t , U > u]

1 é F (t + Dt , u + Du) F (t + Dt , u)
= ê lim T ,U - lim T ,U
F T ,U (t , u) êë Du ®0
Dt ® 0 DuDt Dt ® 0
Du ® 0
DuDt

FT ,U (t , u + Dt) F (t , u) ù
- lim + lim T ,U ú
Dt ® 0 DuDt Dt ® 0 DuDt ú
Du ® 0 Du ® 0 û
fT ,U (t , u)
= (14.66)
F T ,U (t , u)

which is a very appealing result.


Naturally, the next question is whether or not the hazard function is
increasing. Barlow and Proschan [12] define multivariate increasing failure
334 Reliability Engineering: Probabilistic Models and Maintenance Methods

rate (MIFR), and the application of that definition to the bivariate life distri-
butions is that a distribution is MIFR if and only if

F T ,U (t + s, u + v)
(14.67)
F T ,U (t , u)

is nonincreasing in (t, u) and the same statement applies to the marginal


distributions. That is, it must also be the case that

F T (t + s) F U (u + v )
and (14.68)
F T (t) F U ( u)

are nonincreasing in t and u, respectively. The application of these conditions


is far from direct. Most models require numerical analysis to characterize
hazard function behavior and that behavior may be rather complicated. For
example, for some choices of its parameters, the bivariate exponential dis-
tribution of Equation 14.41 displays a hazard that is increasing in usage and
decreasing in time.
It should be noted that there is an important and subtle difference between
the construction of univariate reliability models on the basis of an assumed
hazard form and the corresponding model definition for a bivariate longev-
ity distribution. As is well known, an assumed univariate hazard function,
zT(t), directly implies the life distribution function by the relation

FT (t) = 1 - e ò0
- zT ( x ) dx

which follows as the solution of the differential equation

dFT (t)
fT (t) = = zT (t) ( 1 - FT (t) )
dt

The corresponding bivariate (partial) differential equation is

¶ 2 FT ,U (t , u)
fT ,U (t , u) = = zT ,U (t , u)FT ,U (t , u) (14.69)
¶t¶u

The solution for this equation has not yet been found so one may not
build a bivariate reliability model from the hazard as is done in the uni-
variate case.
A further question is how one computes the mean and other descriptive
measures for a bivariate longevity distribution. The answer is that as with
univariate distributions, one begins by constructing the moment-generating
function (or Laplace transform) and then obtains moments as successive
Special Topics 335

derivatives of the moment-generating function. The moment-generating


function for the bivariate failure distribution is

y T ,U (q1 , q2 ) = E[e q1t + q2 u ] (14.70)

and its construction is not always simple.


Also of critical importance to bivariate failure modeling is the question
how convolutions are constructed and how bivariate renewal functions are
defined and interpreted. Fortunately, the convolution theorem has been shown
to extend directly to the bivariate case. On the other hand, the definition and
interpretation of the associated counting process and the bivariate renewal
function is less obvious and may depend upon the application. This is a topic
that is treated later and for which considerable further study is needed.

14.5.2.4 Failure and Renewal Models


Consider next the renewal model based on the operation of a sequence of
identical copies of a device to failure with instantaneous replacement. For a
bivariate longevity distribution, the sequence of device lives forms a bivari-
ate renewal process.
Hunter [104] defines a bivariate renewal process using the vector (Xn, Yn)
where it is assumed that the (Tn, Un) is independently and identically distrib-
uted with the common joint distribution FT,U(t, u) and

æ n n
ö
(X n , Yn ) = ç
ç
è
å åU ÷÷ø
i =1
Ti ,
i =1
i (14.71)

Of course, by convention, (X0, Y0) = (0, 0). The renewal vector is illustrated
in Figure 14.3. Referring to the figure, we note that the number of renewals
at any coordinate point in the plane, say, (t, u), corresponds to the largest
value of n for which the nth renewal occurs on or before time t and usage u.
Therefore, it follows that the number of renewals by (t, u) is given by

NT ,U (t , u) = sup{n : n ³ 0, X n £ t , Yn £ u} (14.72)

Under this definition, N T,U(t, u) is a bivariate renewal counting process for


which the distribution is obtained using the usual “time–frequency duality”
relation. Hunter confirms that this leads to

P[NT ,U (t, u) = n] = FT( n,U) (t , u) - FT( n,U+1) (t , u) (14.73)

Observe that this construction implies that

P[NT ,U (t , u) = 0] = 1 - FT ,U (t , u) (14.74)
336 Reliability Engineering: Probabilistic Models and Maintenance Methods

Y3

U3

Y2

U2

Y1
U1

T1 T2 T3
X1 X2 X3

FIGURE 14.3
A bivariate renewal process. (Reprinted with permission from Yang, S.C. and Nachlas, J.A.,
Bivariate reliability and maintenance planning models, IEEE Trans. Reliab., 50(1), 26–35.
Copyright 2001 IEEE.)

which is intuitively appealing. In addition, using Equation 14.73 to obtain the


renewal function yields
¥
MF (t , u) = E[NT ,U (t , u)] = åF
n =1
( n)
(t , u) (14.75)

which corresponds to the univariate form. As in the case of the univariate func-
tion, the recursive statement of Equation 14.75 is the key integral renewal equation:
t u

MF (t , u) = FT ,U (t , u) +
ò ò M (t - x, u - y) dF
0 0
F T ,U (x, y) (14.76)

and this function is the basis for analysis of the renewal process. As a final
point, note that assuming FT,U(t, u) is absolutely continuous implies that the
renewal density exists and is
¥
¶2
mF (t , u) =
¶t¶u
MF (t , u) = åf
n =1
( n)
T ,U (t , u) = fT ,U (t , u)

t u

+
ò ò m (t - x, u - y) f
0 0
F T ,U ( x , y )dxdy (14.77)
Special Topics 337

The Laplace (or Laplace–Stieltjes) transform serves as the usual method of


analysis for the renewal models. For the bivariate case, the Laplace transform
of the density associated with the distribution function, FT,U(t, u), is
¥ ¥

fT*,U (s, v) = E[e - st - vu ] =


òòe - st - vu
fT ,U (t , u)dudt (14.78)
0 0

and
1 *
FT*,U (s, v) = f T ,U ( s , v ) (14.79)
sv

Using these forms in the analysis of the key renewal equation leads to

FT*,U (s, v) MF* (s, v)


MF* (s, v) = and FT*,U (s, v) = (14.80)
1 - fT*,U (s, v) 1 + svMF* (s, v)

and

fT*,U (s, v) m*F (s, v)


mF* (s, v) = and fT*,U (s, v) = (14.81)
1 - fT*,U (s, v) 1 + mF* (s, v)

which correspond to the univariate forms.


Application of these results to specific bivariate distributions is never sim-
ple. In the case of the stochastic functions, the algebraic complexity of the
models implies that the Laplace transforms cannot be obtained in closed
form but must be computed numerically. Even the numerical construction
of the transforms is taxing. The procedure is to select a grid size over the
(s, v) plane and to perform the integration of Equation 14.78 numerically. The
results may then be stored in an array for further use or fit (approximately)
using a second-order regression model.
For the bivariate distributions selected to represent correlation between
the variables, expressions for the Laplace transforms can be obtained. For the
bivariate exponential distribution shown in Equations 14.57 and 14.58,
the renewal function has

lh éë( v + 2h) ( s + 2l ) + svr ùû


MF* (s, v) = (14.82)
sv ëé( v + 2h) ( s + 2l ) ( s ( v + h) + vl ) - svhlr ùû

Similarly, the bivariate exponential distribution of Equation 14.59 has


-1
é æ s v sv öù
MF* (s, v) = ê sv ç + + (1 - r) ÷ ú (14.83)
ë è l h lh øû
338 Reliability Engineering: Probabilistic Models and Maintenance Methods

and the Laplace transform of the renewal function for the bivariate normal
distribution is

exp é -smt - vm u + 12 s2st2 + 2rsvstsu + v 2su2 ù


( )
MF* (s, v) = ë û (14.84)
é é (
sv ê1 - exp -smt - vm u + 2 s st + 2rsvstsu + v su ú
1 2 2 2 2 ùù
)
ë ë ûû

Unfortunately, the inverse transforms for these expressions cannot be con-


structed in closed form. For Equation 14.84, working with the transform of
the associated renewal density permits its inversion [104] to

lh æ 2 lhtu ö ì l t + hu ü
m f (t , u) = I0 ç ÷÷ exp í- ý (14.85)
1 - r çè 1 - r ø î 1-r þ

It is rare that the transform is invertible in closed form. For most of the bivar-
iate models, closed-form transform inversions are not available.
Next, consider the cases in which repair is no longer instantaneous.
Assume instead that repair effort extends over a bivariate interval that is
random. Assume further that the distribution function on the magnitude of
the repair effort is denoted by Gr(t, u) and is of the same family as the fail-
ure distribution. There is no justification for the assumption that the failure
and repair distributions are of the same family. The reason for using this
assumption here is the fact that it sometimes makes the analysis easier. In
addition, we observe that for cases in which it is appropriate, the repair dis-
tribution can easily be collapsed into a univariate form.
To construct renewal models for the failure with noninstantaneous repair
cases, we obtain the convolution on the operating and repair intervals and
then renewal function based on the convolution. As availability is the quan-
tity of interest for these cases, the renewal function is used to obtain the
availability function. That is, considering a longevity cycle to be the sum of
an operating interval and a repair interval, the cycle has distribution Hr(t,u)
that is obtained as the inverse transform of

H r* (s, v) = FT*,U (s, v)Gr* (s, v) (14.86)

Then using the same reasoning as for univariate models, we observe that a
device is available at coordinate point (t, u) if it experiences no failures prior
to that point, or else it is renewed at some earlier coordinate point and expe-
riences no further failures before (t, u). That is,
t u

A(t , u) = FT ,U (t , u) +
òòF
0 0
T ,U (t - x , u - y ) dMH ( x , y ) (14.87)
Special Topics 339

where MH(t, u) is obtained using Equation 14.61. Also comparable to the uni-
variate case is the fact that the availability function is ultimately obtained as
the inverse transform of

(
A* (s, v) = FT*,U (s, v) 1 + m*H (s, v) = ) FT*,U (s, v)
1 - h* ( s, v)
(14.88)

As indicated previously, inversion of the transform is quite a challenge.


Taking the bivariate exponential density of Equation 14.58 as the failure
density and assuming the repair time density comes from the same family
mean that the repair time density is

(
gT ,U (t , u) = l r hr e -( lr t + hr u ) 1 + rr {1 - 2e - lr t - 2e - hr u + 4e -( lr t + hr u ) } ) (14.89)

and

ll r hhr ( (n + 2hr )(s + 2l r ) + snrr ) ( (n + 2h)(s + 2l) + snr )


hr* (s, v) = (14.90)
((n + hr )(s + l r )(n + 2hr )(s + 2l r )) ((n + h)(s + l)(n + 2h)(s + 2l))

so that

a1
A * ( s, v) = (14.91)
a2 - a3

with

a1 = ( (s + 2l)(n + 2h) + lhr ) ( (s + l)(n + h)(s + 2l)(n + 2h) )

a2 = éë( s + l r ) ( v + hr ) ( s + 2l r ) ( v + 2hr ) ( s + l ) ( v + h) ( s + 2l ) ( v + 2h) ùû

and

a3 = ll r hhr ( (s + 2l r )(n + 2hr )snrr ) ( (s + 2l)(n + 2h) + snr )

The corresponding construction for Hunter’s bivariate exponential distribu-


tion of Equation 14.59 yields

æ lh öæ l r hr ö
hr* (s, v) = ç ÷ç ÷ (14.92)
è ( s + l )( n + h) - s nr øè ( s + l r )( n + h r ) - s nr r ø
340 Reliability Engineering: Probabilistic Models and Maintenance Methods

Then, using the probability identity of Equation 14.66 simplifies slightly the
construction of

1 s+ l -1 v + h-1 lh
- - +
sv s ( s + l ) v ( v + h ) ( s + l ) ( v + h) - svr
A * ( s, v) = (14.93)
æ lh öæ l1h1 ö
1-ç
ç ( s + l ) ( v + h) - svr ÷÷ çç ( s + l r ) ( v + hr ) - svrr ÷÷
è øè ø

Finally, for the bivariate normal models, we again use Equation 14.65 with
the result that

1 - vfT* (s) - sfU* (v) + sfT*,U (s, v)


A * ( s, v) = (14.94)
(
sv 1 - h* (s, v) )
where the joint distribution has

ë (
fT*,U (s, v) = exp é -smt - vm u + 12 s2st2 + 2rsvstsu + v 2su2 ù
û ) (14.95)

the marginal distributions have

ë ( )
fT* (s) = exp é -mt s + 12 s2st2 ù and
û (
fU* (v) = exp é -m uv + 12 v 2su2 ù
ë û ) (14.96)

and

hr* (s, v) = exp éë -s ( mt + mtr ) - v ( mu + mur )

( ( ) (
+ 12 s2 st2 + st2r + 2svr ( stsu + str sur ) + v 2 su2 + su2r ùú
û )) (14.97)

Each of the availability function expressions obtained is far too intricate to


allow for closed-form inversion. Numerical evaluation is difficult but is the
only practical approach.
To close this discussion, note that Yang and Nachlas [101] provide the
construction of a model of age replacement for the bivariate renewal models.
Comparable models for other preventive maintenance cases have not yet
been defined but they can be constructed. Thus, the extension of the tradi-
tional univariate maintenance planning models for application to the sys-
tems for which longevity is bivariate is appropriate and manageable.
Special Topics 341

Exercises
14.1 A system has been monitored over 4000 hours and the following
sequence of failure times has been observed. Apply the Laplace test to
these data to decide whether or not it is homogeneous. Then estimate
the parameter(s) for the failure process model.

466.0825 1323.455 2414.17


647.5001 1481.233 2664.088
862.0056 1580.979 2733.686
1026.451 2065.465 3724.515
1121.035 2372.361 3835.696

14.2 A system has been monitored over 6000 hours and the following
sequence of failure times has been observed. Apply the Laplace test to
these data to decide whether or not it is homogeneous. Then estimate
the parameter(s) for the failure process model.

42.84 611.66 2950.01


96.39 705.67 3737.79
228.48 900.83 4811.17
274.89 1587.46 5500.18
436.73 2470.44

14.3 A set of 3 identical systems has been monitored over T1 = 5000, T2 = 5000,
and T3 = 4000 hours and the sequence of observed failure times is shown
in the following table.

n1 = 14 n 2 = 12 n 3 = 14
272.5649 350.5998 314.9211
683.209 662.1392 594.3912
1056.697 1145.495 924.9023
1520.834 1463.103 1180.4
2170.058 1806.611 1461.752
2495.64 2353.335 1604.111
2774.353 2759.114 1909.145
3113.331 3320.393 2239.001
3471.008 3714.615 2579.923
3742.009 4052.621 2888.052
3930.438 4285.35 3120.781
4109.861 4707.776 3338.468
4337.591 3635.817
4854.601 3850.161
342 Reliability Engineering: Probabilistic Models and Maintenance Methods

The failure time process for these systems is nonhomogeneous.


Compute estimates of the parameters of the failure process model for
these systems.
14.4 Consider a product for which the life distribution is gamma having
β = 3.0 and λ = 0.0025 where time is measured in days. Compute the
optimal warranty policy times for this product under each of the full
replacement and pro rata strategies. Use δ = 0.01, r = 0.1, a = 0.1, s = 0.2,
and γ = 0.4 with u1 = 4.0, u2 = 25.0. Comment on the differences in the
optimal policy times.
14.5 Suppose a full replacement warranty period is to be set so that no more
than 4% of a population of product units are expected to fail during the
warranty interval. If the devices have a Weibull life distribution with
θ = 4000 days and β = 3.20, what warranty period (in operating days)
should be used?
14.6 Compute the optimal full replacement warranty period for the prod-
uct of Problem 14.2 for the case in which δ = 0.025, γ = 0.5, u1 = 2.5 and
u2 = 40.0.
14.7 Compute the optimal pro rata warranty period for the product of the
preceding problem for the case in which s = 0.2, a = 0.1, and η = 0.8.
14.8 For the Baggs and Nagagaja bivariate reliability function
(Equation 14.57), identify the marginal densities on age and use. Then,
using Equation 14.65 and ρ = 0.4, l = 0.0028 and η = 0.0016, compute
F T ,U (250, 400) and then plot the reliability function.
Appendix A: Numerical Approximations

There are many situations in which the analysis of a reliability problem


involves numerical calculations. For some of those calculations, precise
numerical computation is not possible but numerical approximations are
available. Several useful numerical approximations are provided here. Most
of these are based on descriptions provided by Abramowitz and Stegun [59].
If desired, alternate or additional algorithms may be found there.

A.1 Normal Distribution Function


For the general normal distribution with mean μ and variance σ2, one often
denotes the distribution by N(μ, σ2) and the standard normal distribution is
denoted by N(0, 1). In general,
x x
1
ò f (u)du = ò
2 2
FX ( x) = N ( x , m, s ) =
2
X e -( u -m ) /2s du (A.1)
-¥ -¥
2ps 2

and the standard normal transformation is


x -m
z= (A.2)
s
It is a convention that the standard normal distribution function be denoted
by Φ(z) and that the density function be denoted by ϕ(z). Under the standard
transformation,
æ x -m ö
FX ( x) = F ç z = ÷ (A.3)
è s ø
and Abramowitz and Stegun [59] provide the approximation
1
( )
-16
F( z) » 1 - 1 + d1z + d2 z 2 + d3 z 3 + d4 z 4 + d5 z 5 + d6 z 6 + e( z) (A.4)
2
where
d1 = 0.04986(73)
d2 = 0.02114(10)
d3 = 0.00327(76)
d4 = 0.00003(80)
d5 = 0.00004(89)
d6 = 0.00000(54)

343
344 Appendix A: Numerical Approximations

and the calculation yields a result with |ε(z)| < 1.5 × 10−7. This algorithm was used
to generate the table of cumulative normal probabilities that is in Appendix D.
There is also an algorithm for computing quantiles in the tail of the distri-
bution. Specifically, for γ in the range 0 ≤ γ ≤ 0.05, Abramowitz and Stegun
[59] indicate that

c0 + c1t + c2t 2
z1- g » t - + e( g ) (A.5)
1 + e1t + e2t 2 + e3t 3

where
c0 = 2.515517
c1 = 0.802853
c2 = 0.010328
e1 = 1.432788
e2 = 0.189269
e3 = 0.001308

and
æ 1ö
t = ln ç 2 ÷ (A.6)
èg ø

This approximation yields results for which |ε(γ)| < 4.5 × 10−4.

A.2 Gamma Function


When working with the gamma distribution, the Weibull distribution, and
some other probability functions, one sometimes needs to evaluate a gamma
function. In general, the gamma function is defined as the definite integral:
¥

ò
G( x) = t x -1e -t dt
0
(A.7)

In the cases in which x is an integer, we know that

G( x) = ( x - 1)! (A.8)

For cases in which x is not an integer, Abramowitz and Stegun [59] provide
the approximation

G( x + 1) = " x !" » 1 + a1x + a2 x 2 + a3 x 3 + a4 x 4 + a5 x 5 + e( x) (A.9)


Appendix A: Numerical Approximations 345

for values of x such that 0 ≤ x ≤ 1. In this expression, we use


a1 = −0.5748646
a2 = 0.9512363
a3 = −0.6998588
a4 = 0.4245549
a5 = −0.1010678
and the resulting error is bounded by |ε(x)| < 5 × 10−5.
As an example, observe that we compute Γ(5.64) as

Γ(5.64) = (4.64)(3.64)(2.64)(1.64) Γ(1.64) = (73.1252)(0.8986) = 65.7122

A.3 Psi (Digamma) Function


The psi (or digamma) function is defined as the derivative of the logarithm of
the gamma function. It is therefore the ratio of the derivative of the gamma
function and the gamma function:

d d/dxG( x)
y( x) = ln G( x) = (A.10)
dx G( x)

Tables of the psi function are available [59]. On the other hand, approximate
numerical evaluation of the psi function is reasonably straightforward except
for the fact that precise values may require that a large number of terms be
included in the calculation.
The general series expansion for the psi function is

¥
y( x + 1) = - g + å (-1) z(n)x
n=2
n n -1
|x|< 1 (A.11)

where
¥
z(n) = åk
k =1
-n
(A.12)

is the Riemann zeta function and γ is Euler’s constant for which the value is
γ = 0.57721(56649). For values of the argument of the function that exceed 1,
we use the recursion

1
y( x + 1) = y( x) + (A.13)
x
346 Appendix A: Numerical Approximations

Combining these expressions implies that a numerical approximation for


any psi function may be computed by truncating the infinite sums. As
stated earlier, it appears that the truncation must be made after rather many
terms to obtain good precision but modern computing power makes this
manageable.

A.4 Coordinates of the Student’s t Distribution


A relatively simple method for computing coordinates of the student’s t dis-
tribution is described by Nachlas [105] using the material provided in [59].
For values of the cumulative distribution function that are greater than 0.5,
the associated coordinates, represented by tν,γ for ν degrees of freedom and
cumulative distribution function (CDF) value γ, may be computed using

g1 ( z g ) g 2 ( z g ) g 3 ( z g ) g 4 ( z g )
tn , g = zg + + + + (A.14)
n n2 n3 n4

where zγ is the standard normal variate having CDF of γ and

x3 + x
g1 ( x ) =
4
5x 5 + 16 x 3 + 3 x
g2 (x) =
96 (A.15)
3 x + 19x + 7 x - 15x
7 5 3
g3 (x) =
384
79x 9 + 776 x 7 + 1, 482x 5 - 1, 920 x 3 - 945x
g 4 ( x) =
92, 160

If necessary, the value of zγ may be computed using Equation A.5. This method
was used to compute the values included in the t table in Appendix D.
Appendix B: Numerical Evaluation
of the Weibull Renewal Functions

The Weibull distribution provides a very useful model of life lengths. The
utility of the distribution results from its tractability when treating simple
probabilities for life lengths. When exploring renewal behavior, it is less
manageable. In fact, there are no closed-form expressions for the convolu-
tions or the renewal function for the Weibull distribution. Lomnicki [65]
shows that the Weibull renewal function has an equivalent representation
as the sum of terms of a Maclaurin series and that the same is true for con-
volutions of a Weibull distribution. Specifically, suppose we represent the
Weibull distribution function as
b
FT (t) = 1 - e ( )
- t/q
(B.1)
Then, define the “Poissonian function”

æ æ t öb ö ( t/q )
Pk ç ç ÷ ÷ =
(
e ( )
- t/q
b )
b k

(B.2)
çè q ø ÷ k!
è ø
with the corresponding remainder accumulation
æ æ t öb ö ¥
æ æ t öb ö
Dk ç ç ÷ ÷ =
çè q ø ÷
è ø
å
j=k
Pj ç ç ÷ ÷
çè q ø ÷
è ø
(B.3)

Then, the Weibull renewal function is represented as the infinite sum:


¥ b
ææ t ö ö
MFT (t) = å
k =1
c(k )Dk ç ç ÷
çè q ø
è
÷
÷
ø
(B.4)

for which the coefficients, c(k), are computed as indicated by Lomnicki and
explained in the following. Observe that the corresponding representation of
the k-fold convolution of the Weibull distribution is
¥
æ æ t öb ö
FT( k ) (t) = å
j=k
f j (k )Dj ç ç ÷ ÷
çè q ø ÷
è ø
(B.5)

in which the coefficients ϕj(k) are also defined by Lomnicki and shown as
follows. Before showing the calculation of the coefficients, we note that the
renewal density and the convolution of the probability density may also be
obtained by taking the derivatives of the functions in Equations B.4 and B.5,
respectively. This is shown here.

347
348 Appendix B: Numerical Evaluation of the Weibull Renewal Functions

Taking the derivatives of the renewal and convolution distribution func-


tions, we find that both depend primarily on the derivative of the function
Dk that comprises the key challenge. That is,
¥ ¥
d d æ æ t öb ö d æ æ t öb ö
mFT (t) =
dt
MFT (t) =
dt å k =1
c(k )Dk ç ç ÷ ÷ =
çè q ø ÷
è ø
å
k =1
c(k ) Dk ç ç ÷ ÷
dt çè è q ø ÷ø
(B.6)

¥ ¥
d (k ) d æ æ t öb ö d æ æ t öb ö
fT( k ) (t) =
dt
FT (t) =
dt åj=k
f j (k )Dj ç ç ÷ ÷ =
çè q ø ÷
è ø
å
j=k
f j (k ) Dj ç ç ÷ ÷
dt çè è q ø ÷ø
(B.7)

Given the definition of the function Dk, we have

æ
( ) e( ö
b j
æ æ t öb ö d æ æ t öb ö d ç ( t/q )
b
d ææ t ö ö
¥ ¥ ¥
÷
å
d b
Dk ç ç ÷ ÷ =
dt çè è q ø ÷ø dt åj=k
Pj ç ç ÷ ÷ =
çè q ø ÷
è ø
åj=k
Pj ç ç ÷ ÷ =
dt çè è q ø ÷ø j=k
dt çç j!
- t/q )
÷
÷
è ø
¥ jb
d æ ( t/q ) -(t/q)b ö
¥ æ ( jb/t ) ( t/q ) jb ( b/t ) ( t/q )b ( t/q ) jb ö - t/q b
= å
j=k
ç
dt ç j !
è
e ÷=
÷
ø
åj=k
ç
ç
è
j!
-
j!
÷e ( )
÷
ø

b
¥ æ ( t/q ) jb ( t/q )( j +1)b ö -(t/q)b b ( t/q )kb -(t/q)b
=
t å
j=k
ç
ç ( j - 1)!
è
-
j!
÷e
÷
ø
=
t (k - 1)!
e (B.8)

Substituting this result into Equations B.6 and B.7, we obtain

b ( t/q ) -(t/q)b
¥ kb

mFT (t) =
k =1
å
c(k )
t (k - 1)!
e (B.9)

and

b ( t/q ) -(t/q)b
¥ jb

fT( k ) (t) = åj=k


f j (k )
t ( j - 1)!
e (B.10)

Thus, all of the interesting probability measures can be computed using the
coefficients defined by Lomnicki.
To obtain those coefficients, we start by computing a ratio of gamma
functions:

G(kb + 1)
g( k ) = (B.11)
G(k + 1)
Appendix B: Numerical Evaluation of the Weibull Renewal Functions 349

Then, we apply the recursive definition:

b0 (k ) = g(k )
k -1
(B.12)
b j +1(k ) = å b (i )g( k - i )
i= j
j

for which j ≥ 0 and k ≥ j + 1. We use the resulting values to obtain


k
æ s ö b j (i )
a j (k ) = å (-1)
i= j
j+i
ç ÷
è i ø g(i )
(B.13)

and then,

f j ( j) = a j ( j)
k k -1
f j (k ) = åi= j
ai (k ) - å a (k - 1)
i= j
i
(B.14)

where again k ≥ j + 1. These are the coefficients we use to compute the con-
volutions on the distribution and density functions. Finally, we obtain the
coefficients for the renewal function and the renewal density:
k
c(k ) = å f (k )
j =1
j (B.15)

To obtain a sense of the computations involved and the likely accuracy of the
finite sum of terms, consider a set of example calculations. Suppose β = 1.5
and since the scale is arbitrary, take θ = 1. Using these values, the first 16 of
the gamma function–based quantities are (Table B.1)

TABLE B.1
Values of γ(k) for β = 1.5
k γ k γ
0 1.0000 8 11880.0000
1 1.3293 9 63636.2377
2 3.0000 10 360360.0000
3 8.7238 11 2.1453 × 106
4 30.0000 12 13.3661 × 106
5 116.9534 13 86.8191 × 106
6 504.0000 14 586.0512 × 106
7 2360.9966 15 4099.8766 × 106
350 Appendix B: Numerical Evaluation of the Weibull Renewal Functions

Then, using the quantities, γ(k), we use Equation B.12 to obtain the bj(k)
(Table B.2):

TABLE B.2
Values of the Quantities bj(k) for β = 1.5
k\j 0 1 2 3 4
0 1.0000
1 1.3293 1.3293
2 3.0000 4.7672 1.7671
3 8.7238 16.6998 10.3251 2.3491
4 30.0000 62.1938 48.0981 19.0271 3.1228
5 116.9534 249.0566 214.2441 110.3305 32.3409
6 504.0000 1071.0464 961.8336 572.1867 224.2416
7 2360.9966 4926.1199 4442.0375 2857.3636 1328.0853
8 11880.0000 24121.6916 21294.4744 14200.6517 7323.0231
9 63636.2377 125198.1354 106397.8593 71453.2323 39160.3981
10 360360.0000 685940.2457 555031.7309 367598.2560 207719.7852
11 2.1453 × 106 3.9521 × 106 3.0421 × 106 1.9452 × 106 1.1080 × 106
12 13.3661 × 106 23.8614 × 106 17.2026 × 106 10.6275 × 106 5.9966 × 106
13 86.8191 × 106 150.4915 × 106 102.0784 × 106 60.0923 × 106 33.1335 × 106
14 586.0512 × 106 988.5588 × 106 631.0661 × 106 352.1645 × 106 187.7382 × 106
15 4099.8766 × 106 6745.3736 × 106 4058.6121 × 106 2140.5447 × 106 1094.3582 × 106
k\j 5 6 7 8 9
5 4.1513
6 52.3605 5.5185
7 422.3589 82.0588 7.3359
8 2814.0220 754.7551 125.6394 9.7519
9 17010.7346 5589.1903 1297.6446 189.0252 12.9636
10 97696.0360 36796.0159 10575.6191 2165.9264 280.5346
11 547237.1877 226339.1912 75373.4950 19267.6607 3531.4026
12 3.0392 × 106 1.3390 × 106 495049.8362 147871.4150 34052.6332
13 16.9205 × 106 7.7593 × 106 3.0903 × 106 1.0338 × 106 280080.8014
14 95.1867 × 106 44.6066 × 106 18.7034 × 106 6.8004 × 106 2.0780 × 106
15 544.2424 × 106 2.567160 × 106 111.2747 × 106 4.29887 × 106 14.3811 × 106
k\j 10 11 12 13 14 15
10 17.2330
11 411.8167 22.9086
12 5649.1317 599.1438 30.4533
13 58697.9832 8895.4067 865.1917 40.4828
14 515220.0330 99086.5916 13822.3045 1241.4942 53.8154
15 4.0449 × 106 924648.5878 164320.0939 21235.7913 1771.8168 71.5390
Appendix B: Numerical Evaluation of the Weibull Renewal Functions 351

Next, the analysis of Equation B.13 yields the following data (Tables B.3
and B.4):

TABLE B.3
Values of the Quantities aj(k) for β = 1.5
k\j 0 1 2 3 4 5 6 7 8
0 1.0000
1 0.0000 1.0000
2 0.0000 0.4110 0.5890
3 0.0000 0.1471 0.5836 0.2693
4 0.0000 0.0497 0.4033 0.4429 0.1041
5 0.0000 0.0163 0.2393 0.4650 0.2439 0.0355
6 0.0000 0.0052 0.1306 0.3970 0.3472 0.1091 0.0109
7 0.0000 0.0017 0.0677 0.3005 0.3881 0.1971 0.0419 0.0031
8 0.0000 0.0005 0.0339 0.2102 0.3752 0.2731 0.0921 0.0143 0.0008
9 0.0000 1.60 × 10−4 0.0165 0.1392 0.3288 0.3211 0.1525 0.0371 0.0044
10 0.0000 4.96 × 10−5 0.0079 0.0885 0.2686 0.3375 0.2114 0.0715 0.0132
11 0.0000 1.52 × 10−5 0.0037 0.0546 0.2081 0.3267 0.2588 0.1142 0.0292
12 0.0000 4.66 × 10−6 0.0017 0.0328 0.1547 0.2971 0.2892 0.1597 0.0528
13 0.0000 1.42 × 10−6 0.0008 0.0194 0.1113 0.2570 0.3011 0.2021 0.0830
14 0.0000 4.32 × 10−7 0.0004 0.0112 0.0779 0.2137 0.2963 0.2367 0.1173
15 0.0000 1.31 × 10−7 0.0002 0.0064 0.0534 0.1719 0.2785 0.2605 0.1525
k\j 9 10 11 12 13 14 15
9 0.0002
10 0.0013 4.78 × 10−5
11 0.0043 0.0003 1.07 × 10−5
12 0.0106 0.0013 8.33 × 10−5 2.28 × 10−6
13 0.0215 0.0035 0.0004 1.97 × 10−5 4.66 × 10−7
14 0.0375 0.0079 0.0011 9.14 × 10−5 4.41 × 10−6 9.18 × 10−8
15 0.0586 0.0151 0.0026 0.0003 2.23 × 10−5 9.45 × 10−7 1.74 × 10−8
352 Appendix B: Numerical Evaluation of the Weibull Renewal Functions

TABLE B.4
Values of the Quantities ϕj(k) for β = 1.5
k\j 1 2 3 4 5 6 7 8 9
0
1 1.0000
2 0.0000 0.5890
3 0.0000 0.2638 0.2693
4 0.0000 0.0974 0.2776 0.1041
5 0.0000 0.0334 0.1974 0.1753 0.0355
6 0.0000 0.0111 0.1197 0.1878 0.0845 0.0109
7 0.0000 0.0036 0.0665 0.1630 0.1220 0.0340 0.0031
8 0.0000 0.0011 0.0350 0.1252 0.1382 0.0622 0.0120 0.0008
9 0.0000 0.0003 0.0177 0.0887 0.1351 0.0870 0.0266 0.0038 0.0002
10 0.0000 0.0001 0.0087 0.0594 0.1196 0.1033 0.0444 0.0099 0.0011
11 0.0000 3.43 × 10−5 0.0042 0.0382 0.0987 0.1095 0.0620 0.0193 0.0033
12 0.0000 1.06 × 10−5 0.0020 0.0237 0.0771 0.1068 0.0764 0.0310 0.0074
13 0.0000 3.24 × 10−6 0.0009 0.0144 0.0578 0.0978 0.0860 0.0436 0.0134
14 0.0000 9.89 × 10−7 0.0004 0.0086 0.0419 0.0853 0.0901 0.0555 0.0212
15 0.0000 3.01 × 10−7 0.0002 0.0050 0.0296 0.0714 0.0891 0.0653 0.0300
k\j 10 11 12 13 14 15
10 1.07 × 10−5
11 0.0002 1.07 × 10−5
12 0.0010 7.49 × 10−5 2.28 × 10−6
13 0.0025 0.0003 1.78 × 10−5 4.66 × 10−7
14 0.0051 0.0008 7.57 × 10−5 4.04 × 10−6 9.18 × 10−8
15 0.0090 0.0018 0.0002 1.88 × 10−5 8.71 × 10−7 1.74 × 10−8

Finally, we obtain the values for the coefficients c(k) (Table B.5):

TABLE B.5
Values of c(k) for β = 1.5
k c(k) k c(k)
1 1.0000 9 0.3595
2 0.5890 10 0.3466
3 0.5331 11 0.3354
4 0.4792 12 0.3256
5 0.4417 13 0.3168
6 0.4140 14 0.3089
7 0.3923 15 0.3017
8 0.3744
Appendix C: Laplace Transform
for the Key Renewal Theorem

In Chapter 9, the key renewal theorem is presented and its Laplace transform
is stated to be
æt ö
ç
è0
ò
MF*T (s) = FT* (s) + L ç MFT (t - u) fT (u)du ÷ = FT* (s) + MF*T (s) fT* (s)
÷
ø
(C.1)

To see that this is correct, start with the general definition of the transform:
¥

ò
L ( FT (t) ) = e - st FT (t)dt = FT* (s)
0
(C.2)

Then, we apply the definition to the renewal function and the transform of
the integral term is
æt ö ¥ t

ò
L ç MFT (t - u) fT (u)du ÷ = e
ò ò
- st
MFT (t - u) fT (u)dudt
ç ÷
è0 ø 0 0

¥ t ¥
=
òò
0 0
e - s( t - u ) åF
n =1
( n)
T (t - u) e - su fT (u)dudt

¥ t ¥

ò òåe
- s( t - u ) ( n )
= T F (t - u) e - su fT (u)dudt
0 0 n =1

¥ t ¥

ò òåe
- s( t - u )
= FT( n ) (t - u) e - su fT (u)dtdu
0 0 n =1

¥ ¥
æ¥ ö
= åò ò ç e - s(t - u ) FT( n ) (t - u)dt ÷ e - su fT (u)du
ç
0 è u
÷
n =1 ø
¥ ¥ ¥ ¥

= åò
n =1 0
F * (s)e - su fT (u)du =
( n)
T å n =1
( n)
T
ò
F * (s) e - su fT (u)du
0

¥ ¥
= åF
n =1
( n)
T
* (s) fT* (s) = fT* (s)
åF
n =1
( n)
T
* (s)

= MF*T (s) fT* (s) (C.3)

353
Appendix D: Probability Tables

TABLE D.1
Standard Normal Cumulative Probabilities
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8079 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8728 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9648 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9712 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9773 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9983 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

355
356 Appendix D: Probability Tables

TABLE D.2
Student’s t Distribution
ν\γ 0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990 0.995 0.999
1 0.158 0.325 0.51 0.727 1.000 1.376 1.963 3.078 6.314 12.71 31.82 63.66 636.6
2 0.142 0.289 0.445 0.617 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 31.6
3 0.137 0.277 0.424 0.584 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 12.92
4 0.134 0.271 0.414 0.569 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 8.610
5 0.132 0.267 0.408 0.559 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 6.869
6 0.131 0.265 0.404 0.553 0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 5.959
7 0.130 0.263 0.402 0.549 0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.499 5.408
8 0.130 0.262 0.399 0.546 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 5.041
9 0.129 0.261 0.398 0.543 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.781
10 0.129 0.260 0.397 0.542 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.587
11 0.129 0.260 0.396 0.540 0.697 0.876 1.088 1.363 1.796 2.201 2.718 3.106 4.437
12 0.128 0.259 0.395 0.539 0.695 0.873 1.083 1.356 1.782 2.179 2.681 3.055 4.318
13 0.128 0.259 0.394 0.538 0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 4.221
14 0.128 0.258 0.393 0.537 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 4.140
15 0.128 0.258 0.393 0.536 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 4.073
16 0.128 0.258 0.392 0.535 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 4.015
17 0.128 0.257 0.392 0.534 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.965
18 0.127 0.257 0.392 0.534 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.922
19 0.127 0.257 0.391 0.533 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.883
20 0.127 0.257 0.391 0.533 0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.850
21 0.127 0.257 0.391 0.532 0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.819
22 0.127 0.256 0.390 0.532 0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.792
23 0.127 0.256 0.390 0.532 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.768
24 0.127 0.256 0.390 0.531 0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.745
25 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.725
26 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.707
27 0.127 0.256 0.389 0.531 0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.689
28 0.127 0.256 0.389 0.530 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.674
29 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.66
30 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.646
Appendix D: Probability Tables 357

TABLE D.3
Table of Chi-Square Coordinates
ν\γ 0.005 0.010 0.025 0.050 0.100 0.900 0.950 0.975 0.990 0.995
1 0.00+ 0.00+ 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.648 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169
References

1. Schlager, N. (1994). When Technology Fails. Detroit, MI: Gale Research, Inc.
2. NASA. (1986). Report of the presidential commission on the space shuttle chal-
lenger accident. NASA Report No. S677. Washington, DC: US Government
Printing Office.
3. NASA. (2003). Columbia accident investigation board report. Washington, DC:
US Government Printing Office.
4. Power System Outage Task Force, North America Electric Reliability Council.
(2004). Final report on the August 14, 2003 blackout in the US and Canada:
Causes and recommendations. Princeton, NJ: NERC.
5. Ostrower. J., Pasztor, A. (2013). Boeing tries to defuse fears about dreamliner.
The Wall Street Journal, January 9, 2013.
6. Blanchard, B. S. (1992). Logistics Engineering and Management. Englewood Cliffs,
NJ: Prentice Hall.
7. Thompkins, J. A. (1989). Winning Manufacturing. New York: McGraw Hill.
8. Shetty, Y. K., Buehler, V. M. (eds.) (1988). Competing through Productivity and
Quality. Cambridge, MA: Productivity Press.
9. Nachlas, J. A., Cassady, C. R. (1999). Preventive maintenance study: A key
component in engineering education to enhance industrial productivity and
competitiveness. European Journal of Engineering Education, 24:299–309.
10. Dougherty, Jr., E. M., Fragola, J. R. (1988). Human Reliability Analysis: A Systems
Engineering Approach with Nuclear Power Plant Applications. New York: John
Wiley & Sons.
11. Musa, J. D., Iannino, A., Okumoto, K. (1987). Software Reliability: Measurement,
Prediction, Application. New York: McGraw Hill.
12. Barlow, R. E., Proschan, F. (1975). Statistical Theory of Reliability and Life Testing:
Probability Models. New York: Holt, Reinhart, and Winston.
13. Chiang, D. T., Niu, S. (1981). Reliability of consecutive k-out-of-n:F systems.
IEEE Transactions on Reliability, 30:65–67.
14. Natvig, B. (1982). Two suggestions of how to define a multistate coherent sys-
tem. Advances in Applied Probability, 14:434–455.
15. Barlow, R. E., Wu, A. S. (1978). Coherent systems with multistate components.
Mathematics of Operations Research, 4:275–281.
16. Lambiris, M., Papastavridis, S. (1985). Exact reliability formulas for linear
and circular consecutive k-out-of-n:F related systems. IEEE Transactions on
Reliability, 34:124–126.
17. Birnbaum, Z. W. (1969). On the importance of different components in a multi-
component system. In Multivariate Analysis, P. R. Krishnaiah (ed.). San Diego,
CA: Academic, pp. 581–592.
18. Fussell, J. B. (1975). How to hand calculate system safety and reliability charac-
teristics. IEEE Transactions on Reliability, 24:169–174.
19. Majety, S. R., Dawande, M., Rajgopal, J. (1999). Optimal reliability allocation with
discrete cost-reliability data for components. Operations Research, 47:899–906.

359
360 References

20. Rice, W. F., Cassady, C. R., Wise, T. R. (1999). Simplifying the solution of redun-
dancy allocation problems. Proceedings of the Annual Reliability and Maintainability
Symposium, pp. 190–194. Piscataway, NS: Institute of Electrical and Electronics
Engineers.
21. Lloyd, D. K., Lipow, M. (1962). Reliability: Management, Methods and Mathematics.
Englewood Cliffs, NJ: Prentice Hall.
22. Wong, K. L., Lindstrom, D. L. (1988). Off the bathtub onto the rollercoaster curve.
Proceedings of the Annual Reliability and Maintainability Symposium, pp. 356–363.
Piscataway, NS: Institute of Electrical and Electronics Engineers.
23. Barlow, R. E., Marshall, A. W., Proschan, F. (1963). Properties of probabil-
ity distribution with monotone hazard rate. Annals of Mathematical Statistics,
34:375–389.
24. Weibull, W. (1951). A statistical distribution function of wide applicability.
Journal of Applied Mechanics, 18:293–297.
25. Gumbel, E. J. (1958). Statistics of Extremes. New York: Columbia University Press.
26. Fisher, R. A., Tippet, L. H. C. (1928). Limiting forms of the frequency distribu-
tion of the largest or smallest member of a sample. Proceedings of the Cambridge
Philosophical Society, XXIV:180–190.
27. Birnbaum, Z. W. (1958). A statistical model for life length of materials. Journal of
the American Statistical Association, 53:151–160.
28. Birnbaum, Z. W., Saunders, S. C. (1969). A new family of life distributions.
Journal of Applied Probability. 6:319–327.
29. Jordan, C. W. (1967). Life Contingencies. Chicago, IL: Society of Actuaries.
30. Hjorth, U. (1980). A reliability distribution with increasing, decreasing and
bathtub shaped failure rates. Technometrics, 22:99–107.
31. Nachlas, J. A., Beqari, E. (2016). Toward a unified failure theory with implica-
tions for condition based maintenance. Proceedings of the Annual Reliability and
Maintainability Symposium, pp. 374–377. Piscataway, NS: Institute of Electrical
and Electronics Engineers.
32. Kapur, K. C., Lamberson, L. R. (1977). Reliability in Engineering Design. New York:
John Wiley & Sons.
33. Gertsbakh, I. H., Kordonskiy, K. H. (1969). Models of Failure. Berlin, Germany:
Springer-Verlag.
34. Karlin, S. (1964). Total positivity, absorption probabilities and applications.
Transactions of the American Mathematics Society, III:33–107.
35. Nachlas, J. A. (1988). A stochastic model of component failure. Proceedings of the
Sixth International Conference on Reliability and Maintainability, pp. 487–491. Paris,
France: Centre National d’Etudes Spatiales.
36. Miner, M. A. (1945). Cumulative damage in fatigue. Journal of Applied Mechanics,
12:A159–A164.
37. Arrhenius, S. Z. (1989). Physical Chemistry, 4:120.
38. Krausz, A. S., Eyring, H. B. (1975). Deformation Kinetics. New York. John Wiley &
Sons.
39. Stevenson, J. L., Nachlas, J. A. (1990). Microelectronics reliability predictions
derived from component defect densities. Proceedings of the Annual Reliability and
Maintainability Symposium, pp. 366–371. Piscataway, NS: Institute of Electrical
and Electronics Engineers.
40. Kumar, D., Klefsjo, B. (1994). Proportional hazards model: A review. Reliability
Engineering and System Safety, 44:177–188.
References 361

41. Cox, D. R., Miller, H. D. (1965). The Theory of Stochastic Processes. London, U.K.:
Chapman & Hall.
42. Kharoufeh, J. P., Cox, S. M. (2005). Stochastic models for degradation based reli-
ability. IIE Transactions, 37(6):533–542.
43. Esary, J. D., Marshall, A. W., Proschan, F. (1973). Shock models and wear pro-
cesses. Annals of Probability, 1:627–649.
44. Lemoine, A. J., Wenocur, M. L. (1985). On failure modeling. Naval Research
Logistics Quarterly, 32:497–508.
45. Cox, D. R., Oakes, D. (1984). Analysis of Survival Data. New York: Chapman & Hall.
46. Nachlas, J. A. (1986). A general model for age acceleration during thermal
cycling. Quality and Reliability Engineering International, 2:3–6.
47. Wong, K. (1989). Off the bathtub curve onto the roller coaster curve. Quality and
Reliability International, 5(1):29–36.
48. Vassiliou, P., Mettas, A. (2004). Understanding accelerated life testing analy-
sis. Topics in Reliability: Tutorial Notes. Annual Reliability and Maintainability
Symposium. Piscataway, NS: Institute of Electrical and Electronics Engineers.
49. Cassady, C. R., Nachlas, J. A. (2010). A general framework for modeling equip-
ment aging. Proceedings of the Annual Reliability and Maintainability Symposium,
pp. 374–377. Piscataway, NS: Institute of Electrical and Electronics Engineers.
50. Kaplan, E. I., Meier, P. (1958). Nonparametric estimation from incomplete obser-
vations. Journal of the American Statistical Association, 38(3):457–481.
51. Greenwood, M. (1926). The natural duration of cancer. Reports on Public Health
and Medical Subjects. Her Majesty’s Stationary Office, London, U.K., pp. 1–26.
52. Meeker, W. Q., Escobar, L. A. (1998). Statistical Methods for Reliability Data.
New York: John Wiley & Sons.
53. Military Handbook 781. (1996). Reliability Test Methods, Plans and Environments
for Engineering, Development Qualification and Production. Washington, DC: U.S.
Department of Defense.
54. Barlow, R. E., Campo, R. (1975). Total time on test processes and applications to
failure data analysis. In Reliability and Fault Tree Analysis, R. E. Harlow, J. Fussel,
N. D. Singpurwalla, eds. Philadelphia, PA: SIAM, pp. 451–481.
55. Klefsjo, B. (1982). On aging properties and total time on test transforms.
Scandinavian Journal of Statistics, 9:37–41.
56. Nelson, W. (1972). Theory and applications of hazard plotting for censored fail-
ure data. Technometrics, 14:945–966.
57. Arrhenius, S. A. (1889). Über die Dissociationswärme und den Einfluss
der Temperatur auf den Dissociationsgrad der Elektrolyte. Zeitschrift für
Physikalische Chemie, 4:96–116.
58. Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, 2nd edn.
Hoboken, NJ: John Wiley & Sons.
59. Abramowitz, M., Stegun, I. A. (1965). Handbook of Mathematical Functions.
New York: Dover Publications.
60. Nachlas, J. A., Kumar, A. (1993). Reliability estimation using doubly censored
field data. IEEE Transactions on Reliability, 42:268–279.
61. Cox, D. R. (1962). Renewal Theory. London, U.K.: Methuen.
62. Nelson, W. (1980). Accelerated life testing: Step-stress models and data analy-
sis. IEEE Transactions on Reliability, 29:103–108.
63. Nelson, W. (1990). Accelerated Testing: Statistical Models, Test Plans and Data
Analysis. New York: John Wiley & Sons.
362 References

64. Feller, W. (1966). An Introduction to Probability Theory and Its Applications. II.
New York: John Wiley & Sons.
65. Lomnicki, Z. A. (1966). A note on the Weibull renewal process. Biometrika, 53:375–381.
66. Barlow, R. E., Hunter, L. C. (1960). Optimum preventive maintenance policies.
Operations Research, 8:90–100.
67. Brown, M., Proschan, F. (1983). Imperfect repair. Journal of Applied Probability,
20:851–859.
68. Block, H. W., Borges, W. S., Savits, T. H. (1985). Age dependent minimal repair.
Journal of Applied Probability, 22:370–385.
69. Kijima, M. (1989). Some results for repairable systems with general repair.
Journal of Applied Probability, 26:89–102.
70. Kijima, M., Morimura, H., Suzuki, Y. (1988). Periodical replacement problem
without assuming minimal repair. European Journal of Operations Research,
37:194–203.
71. Wang, H., Pham, H. (1996). Optimal maintenance policies for several imperfect
repair models. International Journal of Systems Science, 27:543–549.
72. Wang, H., Pham, H. (1996). A quasi renewal process and its applications in
imperfect maintenance. International Journal of Systems Science, 27:1055–1062.
73. Lam, Y. (1988). Geometric process and replacement problem. Acta Mathematica
Applicata Sinica, 4:366–377.
74. Lam, Y. (1988). A note on the optimal replacement problem. Advances in Applied
Probability, 20:479–482.
75. Finkelstein, M. S. (1993). A scale model of general repair. Microelectronics
Reliability, 33:41–44.
76. Cassady, C. R., Nachlas, J. A. (1994). The frequency distribution of availability.
Proceedings of the Annual Reliability and Maintainability Symposium, pp. 278–282.
Piscataway, NS: Institute of Electrical and Electronics Engineers.
77. Iyer, S. (1992). Availability results for imperfect repair. Sankhaya: The Indian
Journal of Statistics, 54(B):249–256.
78. Rehmert, I. J., Nachlas, J. A. (2009). Availability analysis for the quasi-renewal
process. IEEE Transactions on Man, Systems and Cybernetics, Part A, 39(1):272–280.
79. Birolini, A. (1985). On the Use of Stochastic Processes in Modeling Reliability
Problems. Lecture Notes in Economics and Mathematical Systems, 252. Berlin,
Germany: Springer-Verlag.
80. Ross, S. M. (1988). A First Course in Probability. New York: Macmillan.
81. Murdock, W. P., Nachlas, J. A. (2007). Availability under age replacement with
distinct service time distributions. Proceedings of the Annual Reliability and
Maintainability Symposium, pp. 333–338. Piscataway, NS: Institute of Electrical
and Electronics Engineers.
82. Degbotse, A., Nachlas, J. A. (2003). Use of nested renewals to model avail-
ability under opportunistic maintenance policies. Proceedings of the Annual
Reliability and Maintainability Symposium, pp. 344–350. Piscataway, NS: Institute
of Electrical and Electronics Engineers.
83. Nakagawa, T. (1986). Periodic and sequential preventive maintenance policies.
Journal of Applied Probability, 23:536–542.
84. Block, H. W., Borges, W. S., Savits, T. H. (1986). Preventive maintenance policies.
In Reliability And Quality Control, A. P. Basu, ed. Amsterdam, the Netherlands:
Elsevier Science Publishers, pp. 101–106.
References 363

85. Intiyot, B. (2007). Availability analysis for the quasi-renewal process with an age
dependent preventive maintenance policy. Doctoral Dissertation, Department
of Industrial & Systems Engineering, Virginia Tech, Blacksburg, VA.
86. Makis, V., Jardine, A. K. S. (1993). A note on optimal replacement policy under
general repair. European Journal of Operations Research, 69:75–82.
87. van Noortwijk, J. M., Cooke, R., Kok, M. (1995). A Bayesian failure model
based on isotropic deterioration. European Journal of Operations Research,
82:270–282.
88. Grall, A., Dieulle, L., Berenguer, C., Roussignol, M. (2002). Continuous time pre-
dictive maintenance scheduling for a deteriorating system. IEEE Transactions on
Reliability, 51:141–150.
89. Chhikara, R. S., Folks, J. L. (1989). The Inverse Gaussian Distribution: Theory,
Methodology and Applications. New York: Marcel Dekker.
90. Doksum, K. A., Hoyland, A. (1992). Models for variable-stress accelerated life
testing experiments based on Weiner processes and the inverse Gaussian dis-
tribution. Technometrics, 34:74–82.
91. Lu, C. J., Meeker, W. Q. (1993). Using degradation measures to estimate a time to
failure distribution. Technometrics, 35:161–174.
92. Lu, H., Kolarik, W. J., Lu, S. S. (2001). Real time performance reliability predic-
tion. IEEE Transactions on Reliability, 50:353–357.
93. Whitmore, G. A., Crowder, M. J., Lawless, J. F. (1998). Failure inference from a marker
process based on a bivariate Weiner model. Lifetime Data Analysis, 4:229–251.
94. Ascher, H., Feingold, H. (1984). Repairable Systems Reliability: Modeling, Inference,
Misconceptions and Their Causes. New York: Marcel Dekker.
95. Rigdon, S. E., Basu, A. P. (2000). Statistical Methods for the Reliability of Repairable
Systems. New York: John Wiley & Sons.
96. Duane, J. T. (1964). Learning curve approach to reliability monitoring. IEEE
Transactions on Aerospace, 2:563–566.
97. Crow, L. H. (1984). Methods for assessing reliability growth potential.
Proceedings of the Annual Reliability and Maintainability Symposium, pp. 484–489.
Piscataway, NS: Institute of Electrical and Electronics Engineers.
98. Military Handbook 189C. (2011). Reliability Growth Management. Washington,
DC: U. S. Department of Defense.
99. Benton, A. W., Crow, L. H. (1989). Integrated reliability growth testing.
Proceedings of the Annual Reliability and Maintainability Symposium, pp. 160–166.
Piscataway, NS: Institute of Electrical and Electronics Engineers.
100. Singpurwalla, N. D., Wilson, S. P. (1998). Failure models indexed by two scales.
Advances in Applied Probability, 30:1058–1072.
101. Yang, S. C., Nachlas, J. A. (2001). Bivariate reliability and maintenance planning
models. IEEE Transactions on Reliability, 50:26–35.
102. Eliashberg, J., Singpurwalla, N. D., Wilson, S. P. (1997). Calculating the reserve
for a time and usage indexed warranty. Management Science, 43:966–975.
103. Baggs, G. E., Nagagaja, H. N. (1996). Reliability properties of order statistics from
bivariate exponential distributions. Communications in Statistics, 12:611–631.
104. Hunter, J. (1974). Renewal theory in two dimensions: Basic results. Advances in
Applied Probability, 6:376–391.
105. Nachlas, J. A. (1992). Convenient construction of t-tests for quality evaluation.
Quality and Reliability Engineering International, 7:1–4.
Index

A Army material systems analysis activity


(AMSAA) model, 321
Age acceleration, 97, 166
Arrhenius model
accelerated life testing, 85
acceleration equation, 166
density functions, 86–87
age acceleration, 87–88
distribution functions, 86–87
chemical conversion reactions, 71
for electronic devices
hazard functions, 72
activation energy function, 89
reaction rate, 71–72
Arrhenius model, 87–88
Weibull life distribution, 72
effectiveness, 86
Availability analysis, 278
Eyring model, 89
availability measures
Nachlas model, 88
availability function, 222
power law model, 89
average availability, 220–221
representative thermal cycle, 88
limiting availability, 220
for mechanical devices
limiting average availability, 221
cumulative damage model, 90
point availability, 220
cumulative distribution
exponential case
function, 92
exponential life distribution, 223
deterioration process, 90
exponential repair time
instantaneous rate of aging, 91
distribution, 223
killing function, 92
limiting availability, 225
proportional hazards models, 91
limiting value, 224
rate of actuation, 90
Markov models
stress–strength models, 91
availability function, 243
stress testing, 90
birth–death process, 244
step stress strategies, 92–93
Chapman–Kolmogorov forward
stress screening, 85–86
differential equations, 240, 242
timescale compression, 86
expected first passage time to
Age replacement policy, 289
system failure, 244
age, 258, 262, 273
failure intensity, 244
availability function, 255–256
limiting availability, 243–245
availability measures, 253
parallel configuration, 241
convenient notation, 248–249
state transition diagram, 241
cost per cycle, 251
system availability, 242
expected cost function,
two-state transition diagram,
252–253
239–240
modified policy, 282
nonrenewal cases
operation, 248
cost models, 232–233
preventive vs. corrective
under imperfect repair,
maintenance, 253
233–235
renewal function, 254–255
quasi-renewal model,
renewal intervals, 253–255
235–240
repair time distribution, 254
numerical case, 225–227
Weibull life distribution, 255–256

365
366 Index

system-level availability Blackwell’s theorem, 181


availability bounds, 229 Block replacement policy
availability function, 227 availability function, 256–257
average availability, 230 convenient notation, 248–249
average system uptime per cycle, 231 delayed renewal process, 256
average time down per cycle, 231 device replacement, 255
component replacement rates, 231 Kijima model, 282
individual failure and repair operation, 248
times, 230 policy time, 250–251
limiting availabilities, 228, 230 with quasi-renewals, cost rate, 277
limiting average system renewal function, 256
availability, 232 total cost per unit time, 250
point availabilities, 228 Weibull life distribution, 257
series system, 228
system status, 227
C
Catastrophic equipment failures, 1
B
Chapman–Kolmogorov backward
Backward recurrence time, 176, 189, 266 differential equation, 74
Birnbaum–Saunders distribution, 52, 70–71 Chapman–Kolmogorov forward
Bivariate reliability differential equations, 240, 242
collapsible models, 326–327 Chi-square coordinates, 357
correlation models, 330–331 Coherent systems, 7, 32
device life, 326 Competing risk models
probability analysis crude life distribution, 81
bivariate (partial) differential device reliability function, 81
equation, 334 hazard function, 83
cumulative failure probability, joint survivor function, 80
331–332 life length of device, 80
failure and renewal models, marginal (net) survival function, 81
335–340 survival probabilities, 82
hazard function, 333 Conditional probability methods,
life distribution function, 334 degradation process
longevity vectors, 331–332 bivariate normal distribution, 300
marginal distributions, 334 conditional density, 300, 302
MIFR, 333–334 estimating equations, 302
moment-generating function, failure and survival probabilities, 301
334–335 failure time, 301
reliability function, 332 censored, 303
stochastic functions simulated, 303
analytical approach, 329 joint density, 301
bivariate device failure hazard logarithm of likelihood function, 302
function, 329 marker variable, 301–303
bivariate life distribution, 329–330 residual life density, 304
conditional density, 329 Wiener process, 300
definition of failure models, 327 Condition-based maintenance, see
marginal density, 328 Predictive maintenance
marginal distribution, 330 Constant failure rate (CFR)
probability distribution, 328 distribution, 44, 54
Index 367

Crow’s model, 320–321 survival function, 74


Cumulative distribution function system state change, 74
(CDF), 346 Weibull hazard function, 75
Cut vector, 15 Digamma function, see Psi function
Duane’s model, 320–321
D
E
Decreasing failure rate (DFR), 43–44,
183–184, 201–202 Electronic failure models
Defect model, 72–73 Arrhenius model, 71–72
Degradation processes, 287 defect model, 72–73
killing function, 75 Eyring model, 72
mean value and diffusion power law model, 72
functions, 75 Equivalent infinite series expansion, 186
mechanical and chemical, 76 Equivalent structures
observable algebraic effort, 27
density function, 294 computed values, 28
disadvantage, 297 cut vector, 15
distribution function, 295 inequality, 29
estimation equations, 295 lower bound, 28, 30
failure threshold, 294, 296 minimax bounds, 30–31
likelihood function, 295 minimum cut set, 15–17
regression model, 297 minimum paths and minimum
residual life length, 296 cuts, 26, 30
residual reliability vs. time, 297 minimum path set, 13
unobservable minimum path vector, 13
conditional probability methods, path vector, 12
300–304 reliability values, 26–27
marker variable, 297 system reliability function, 27
time series methods, 298–299 upper bounds, 28–29
Dependent components Wheatstone bridge
bivariate exponential survivor minimum cut equivalent
function, 325 structure, 16–17
joint survival probability, 325 minimum path equivalent
mutual interactions, 323 structure, 14–15
shared load system, 324 reliability block diagram, 12
shock models, 324–325 system status values, 12–13
Development process, 319 Erlang distribution, see Gamma
Diffusion process models distribution
degradation process, 294, 296–297 Estimates, reliability
discretized approximations, 75 based on order statistics, 98–99
Fokker–Planck equation, 74 and confidence intervals
gamma process, 73 beta distribution, 101
killing function, 92 binomial distribution, 100
killing process, 75 density function, 101
life distribution models, 76 life distribution, 102
Markov process, 74 number of survivors, 100
Poisson process, 73–74 point estimates, 99–101
state boundary conditions, 76–78 tolerance bounds, 107–109
368 Index

Kaplan–Meier estimates electronic failure models


continuous monitoring, Arrhenius model, 71–72
102–105 defect model, 72–73
periodic monitoring, 105–107 Eyring model, 72
Nelson cumulative hazard power law model, 72
estimation method, 122–123 excessive stress level, 164
Exponential distribution homogeneous, 311
bivariate, 334, 337, 339 mechanical failure models
conditional survival probability, 45 shock and cumulative damage,
confidence intervals, 139–141 64–71
constant hazard function, 45 stress–strength interference, 62–66
density function, 45 nonhomogeneous, 308, 320
life distribution, 46 principal models, 61
life lengths of components, 130 proportional hazards models,
likelihood function, 145–146 78–80
method of moments with censored Failure/survival probability, 98
data, 161 Fisher information matrix, 147–148, 150,
population mean, 136 153, 156, 160
renewal process, 177, 179–180 Fokker–Planck equation, 74
Weibull distribution, 159 Forward recurrence time, 176, 189
Eyring model, 72, 89 Full opportunistic age replacement
policy, 271
Full replacement warranties
F
expected profit function, 316–317
Failure and renewal models, bivariate expected profit model, 315–316
reliability fixed duration, 317
bivariate exponential distribution, 337 hazard function identity, 316
bivariate normal distribution, 338 key feature, 314
bivariate normal models, 340 optimal policy time, 317
Hunter’s bivariate exponential product generation cost, 315
distribution, 339 vs. pro rata warranties, 317–318
inverse transform, 338–339
joint distribution, 335, 340
G
key integral renewal equation,
336–337 Gamma distribution
Laplace transform, 337 advantage and disadvantage, 52
marginal distributions, 340 Cox’s result, 163
noninstantaneous repair cases, 338 cumulative damage model, 68
number of renewals, 335 cumulative intensity function,
renewal density, 336, 338 197–198
renewal function, 336–337 density function, 52, 199
renewal vector, 335–336 gamma function, 344–345
repair time density, 339 method of maximum likelihood,
time–frequency duality relation, 335 152–153, 155
Failure processes method of moments, 138,
bivariate, 330 142–143
competing risk models, 80–83 renewal process, 177
diffusion process models, system deterioration, 288
73–78 Gompertz distribution, 53
Index 369

H operating intervals, 206–207


random variable, 205, 209
Hessian matrix, 147, 160
residual life distribution, 283
Homogeneous Poisson process (HPP),
revised recursion, 209
198–199, 309
successive mean residual life
HPP, see Homogeneous Poisson process
lengths, 209
time and frequency domains, 205
I transition probability function, 207
virtual age stochastic process, 205
IFR, see Increasing failure rate
IFRA, see Increasing failure rate on
average distribution L
Increasing, decreasing, bathtub (IDB) Lognormal distribution, 51–52
distribution, 53 Lomnicki’s method, 186
Increasing failure rate (IFR), 43,
183–184, 250
Increasing failure rate on average M
(IFRA) distribution, 54 Magnuson–Moss Act, 314
definition, 44 Makeham distribution, 52–53
imperfect repair models, 201–202 Markov models
shock and cumulative damage availability function, 243
models, 67, 69 birth–death process, 244
Inverse power law model, 89 Chapman–Kolmogorov forward
differential equations, 240, 242
expected first passage time to system
K
failure, 244
Kaplan–Meier estimates failure intensity, 244
continuous monitoring, 102–105 limiting availability, 243–245
periodic monitoring, 105–107 parallel configuration, 241
Key integral renewal equation, 336–337 state transition diagram, 241
Key renewal equation, 337 system availability, 242
Key renewal theorem, 178–179, 188, two-state transition diagram,
219, 282 239–240
availability function, 222 Maximum likelihood method
Laplace transform, 353 binomial probability, 143–144
Kijima models confidence interval for quantile, 155
block replacement policy, 282 with data censoring, 159–160
cost per replacement cycle, 282 density function, 152, 156
density function, 206, 209 derivative expressions, 150
expectations on measures, 205 exponential distribution, 145
generalized renewal density, 282 extreme value distribution, 150–151
generalized renewal function, 283 Fisher information matrix, 147–148,
hazard function, 281 150, 153
Kijima I model, 203–204 gamma distributions, 157
Kijima II model, 204, 209 gamma function, 152
mean residual life under minimal Hessian matrix, 147
repair, 208, 210 joint distribution, 144
minimal repair as upper bounds, life test data, 156–157
208, 210 marginal distributions, 145
370 Index

maximum likelihood estimation for mean, 141


equation, 144 Nelson’s method, 140
normal distribution, 157–159 normal distribution, 140–141
partial derivatives, 152 for reliability, 140
1–α probability confidence intervals, sampling distribution,
149, 157 139–140, 142
psi function, 154 for scale parameter, 142
variance and covariance values, 150 for standard deviation, 142
variance–covariance matrix, for θ, 139
147–148, 154 for Weibull parameters, 140
variance term, 149 distribution moments, 135
Weibull distribution, 146–147, 157 estimation expressions
Mechanical failure models coefficient of variation, 137–138
shock and cumulative damage disadvantage, 138–139
models exponential distribution, 136
Birnbaum–Saunders distribution, gamma distributed life test data,
70–71 138–139
conceptual duality, 67 gamma distribution, 138
description, 64–65 normal distribution, 137
failure threshold, 68 numerical search values, 137
fatigue failure, Miner’s rule, 69 variance, 136
IFRA distribution, 67 Weibull distribution, 136, 138
life distribution model, 66 Miner’s rule, fatigue failure, 69
life length distribution, 66 Minimum cut set, 15–17
reliability function, 67–68 Minimum path set, 13
state-dependent arrival rate, 68 Minimum path vector, 13
survival function, 69 Modules of systems, 17–18
stress–strength interference model Multivariate increasing failure rate
basic concept, 63 (MIFR), 333–334
device reliability, 62
interference at time values, 64–65
N
random dispersion, 62
reliability function, 64–66 Nelson cumulative hazard estimation
time evolution of strength method, 122–123
distribution, 63–64 New better than used in expectation
Weibull distribution, 64 (NBUE), 182–184
Method of moments New better than used (NBU) life
with censored data distributions, 182, 184, 249
Cox’s result, 162–163 New worse than used in expectation
exponential distribution, 161 (NWUE) life distribution,
gamma distribution, 163 182–184
sample variance, 162–163 New worse than used (NWU) life
surrogate sample mean, 161 distribution, 182
Weibull distribution, 162 Nonhomogeneous Poisson process
confidence intervals (NHPP)
chi-square variate, 139 construction, 194
distribution parameter, 140 cumulative intensity function,
exponential distribution, 140 197–198
gamma distribution, 142–143 failure intensity function, 309, 320
Index 371

frequency probabilities, 197–198 distribution on time between


gamma distribution, 197–200 perfect repairs, 201
hazard function, 194–195 equivalent age measures, 202
imperfect PM models, 276 life and renewal time
mapping, 199 distributions, 201
minimal repair, 200 survivor function, 200
Weibull distribution, 199 Weibull distribution, 201–202
Nonparametric statistical methods minimal repair models
advantage, 95 cumulative intensity function,
data set notation and censoring, 197–199
96–97 density function, 199
disadvantage, 95 expectation of random variables,
estimates 195–196
based on order statistics, 98–99 failure intensity function, 194–195
and confidence intervals, 99–102 gamma distribution, 197–200
Kaplan–Meier estimates, hazard function, 197
102–107 inversion of cumulative rate
Nelson cumulative hazard function, 200
estimation method, 122–123 NHPP model, 194–195, 197–199
tolerance bounds nonstationary counting process, 194
construction, 107 Weibull distribution, 199
cumulative probability of nonstationary process, 193
failure, 108 Nonrenewal models, PM
hazard function, 109 cost rate function, 275
reliability value, 108–109 imperfect PM models
reliable life, 107 age-dependent imperfect repair
true cumulative failure model, 276
probability, 107 cost per unit time, 275–276
TTT transforms minimal repair cost, 276–277
cumulative failure probability, nonhomogeneous Poisson
109–111 process, 276
definition, 110 Kijima model
properties, 110–111 cost per replacement cycle, 282
scaled total time on test generalized renewal density, 282
transform, 111–122 generalized renewal function, 283
Nonrenewal and instantaneous repair hazard function, 281
equivalent age models residual life distribution, 283
device life distribution, 203 post maintenance failure
Kijima models, 203–210 distribution, 274
quasi-renewal process, 210–214 post PM equipment, 275
imperfect repair models quasi-renewal process
age-dependent imperfect repair, average availability, 278
202–203 cost rate model, 277
age-independent perfect repair distribution on kth operating
probabilities, 202 interval, 277
conditional intensity of downtime-based availability
occurrence of perfect function, 280–281
repair, 200 expected maintenance cost per
cumulative intensity function, 202 unit time, 278
372 Index

life distribution, 279–280 Opportunistic age replacement policies


quasi-renewal function, 277, 280 analysis
repair time distribution, 279 availability functions, 273–274
service time distribution, life and repair time
279–280 distributions, 272
uptime-based availability limiting availability, 271–273
function, 280 nested renewal, 272
time-and calendar-based PM renewal function, 272
policies, 274 replacement times, 273
Normal distribution Weibull life distributions, 273
bivariate, 300, 338 density function, 259–260
confidence intervals, 106, 140 failure model
maximum likelihood, 155 associated density function, 263
reliability values, 158 components failure
sampling distribution, probability, 262
141–142, 149 limiting availability, 265
cumulative normal probabilities, 344 residual life distribution, 263
density function, 49, 343 system availability, 264
distribution quantiles, 158 total operating time
hazard function, 50–51 distribution, 264
life length model, 51 full opportunistic age replacement
method of moments, 137 policy, 271
parameters, 49 life and replacement time
quasi-renewal process, 211–213 distributions, 259
second partial derivatives, 157 nested renewal process
standard normal distribution, 100, major interval, 258, 262
308, 343 minor interval, 259–261
standard normal probabilities, 49 operation, 258
Numerical approximations opportunistic failure replacement
gamma distribution, 344–345 policy, 265–268
normal distribution function, partial opportunistic age
343–344 replacement policy, 268–271
psi (digamma) function, system-level availability, 262
345–346 time-dependent availability
student’s t distribution function, 257
coordinates, 346 Opportunistic failure replacement
policy, 265–268
O
P
Observable degradation processes
density function, 294 Parametric statistical methods
disadvantage, 297 graphical methods
distribution function, 295 algebraic expression for slope, 130
estimation equations, 295 cumulative hazard function and
failure threshold, 294, 296 reliability function, 129
likelihood function, 295 linear regression, 130
regression model, 297 mean order statistic-based
residual life length, 296 estimation equation, 131
residual reliability vs. time, 297 numerical values, 133
Index 373

ordered failure data, 130–131 raw failure times, 166


regression solution, 133 three-level step stress
Weibull distribution, 131, 134 regimen, 164
Weibull failure data, 132, 134–135 Weibull distribution, 165–166
maximum likelihood method Partial opportunistic age replacement
binomial probability, 143–144 policy, 268–271
confidence interval for Path vector, 12
quantile, 155 PM, see Preventive maintenance
with data censoring, 159–160 Poissonian function, 347
density function, 152, 156 Poisson process
derivative expressions, 150 diffusion process models, 73–74
exponential distribution, 145 failure intensity, 180
extreme value distribution, 150–151 HPP, 198–199, 309
Fisher information matrix, NHPP
147–148, 150, 153 construction, 194
gamma distributions, 157 cumulative intensity function,
gamma function, 152 197–198
Hessian matrix, 147 failure intensity function, 309, 320
joint distribution, 144 frequency probabilities, 197–198
life test data, 156–157 gamma distribution, 197–200
marginal distributions, 145 hazard function, 194–195
maximum likelihood estimation imperfect PM models, 276
equation, 144 mapping, 199
normal distribution, 157–159 minimal repair, 200
partial derivatives, 152 Weibull distribution, 199
1–α probability confidence point process, 177
intervals, 149, 157 shock and cumulative damage
psi function, 154 models, 67–69, 90, 324
variance and covariance Power law model, 72, 89
values, 150 Predictive maintenance
variance–covariance matrix, continuous process monitoring
147–148, 154 marker variable, 294
variance term, 149 observable degradation processes,
Weibull distribution, 146–147, 157 294–297
method of moments unobservable degradation
with censored data, 161–163 processes, 297–304
confidence intervals, 139–143 inspection scheduling, 289–290
distribution moments, 135 instrumentation selection, 287
estimation expressions, 136–139 policy analysis
parametric reliability estimation average first cycle length, 293
methods, 129 cost function, 290
step stress testing discrete time process, 292
Arrhenius acceleration failure threshold, 290
equation, 166 gamma distribution, 291–292
equivalent data set, 165 minimum inspection interval, 291
erroneous inference, 164 next inspection time, 291
extrapolated failure time, 167 replacement threshold, 290
failure times, 165 stationary distribution, 292–293
overstress risk, 164 system deterioration, 288–289
374 Index

Preventive maintenance (PM), 287 Pro rata warranties


advantages, 2–3 automobile components, 314
availability, 247 characteristics, 314
definition, 247 discount factor, 319
nonrenewal models expected profit, 319
cost rate function, 275 vs. full replacement warranties,
imperfect PM models, 275–277 317–318
Kijima model, 281–283 repurchase price, 318
post maintenance failure repurchase probability, 318
distribution, 274 residual value of failed unit, 318
post PM equipment, 275 Psi function
quasi-renewal process, 277–281 definition, 345
time-and calendar-based PM maximum likelihood, 152–154
policies, 274 numerical approximation, 346
replacement policies series expansion, 345
age replacement, 248, 251–255, 289
block replacement, 248, 250,
Q
255–257
convenient notation, 248–249 Quasi-renewal process
cost function, 252–253 availability analysis
cost models, 248, 250–251 distributions on repair times, 235
cost per cycle, 251 downtime-based
expected cost per unit time, 252 approximation, 238
field failure reduction, 250 downtime-based form, 237
IFR life distribution, 250 exponential case values, 238, 240
NBU life distributions, 249, 251 Laplace transform, 236–237
opportunistic age replacement, operating intervals, 235
257–274 point availability, 238–239
optimal block replacement uptime-based approximation, 238
intervals, 251–252 uptime-based function, 237
random variable, 249 average availability, 278
renewal density, 251 cost rate model, 277
Weibull life distribution, 251 distribution on kth operating
resource commitment, 247 interval, 277
safety and industrial productivity, 4 downtime-based availability
Probability elements, 23 function, 280–281
Prognostics, see Predictive maintenance expected maintenance cost per unit
Proportional hazards models, 326 time, 278
advantages, 79 geometric process, 210
age acceleration, 91 intensity function, 212
base hazard function, 79 life distribution, 279–280
covariates/explanatory variables, 79 normal density functions, 211
cumulative hazard function, 79 normal distribution functions,
environment and failure process 211–212
relationship, 78 normal life distribution, 213–214
hazard rate enhancement, 86 postrepair device state, 203
reliability function, 79 quasi-renewal function, 212, 277, 280
satellite-based electronic device, repair time distribution, 279
79–80 sequence of operating intervals, 210
Index 375

service time distribution, 279–280 gamma distribution, 52


time frequency relationship, 211 Gompertz distribution, 53
uptime-based availability IDB distribution, 53
function, 280 IFRA distribution, 44
IFR distribution, 43
lognormal distribution, 51–52
R
Makeham distribution, 52–53
Reliability engineering normal distribution, 49–51
Boeing 787 aircraft, lithium-ion Weibull distribution, 46–49
battery fire, 2 life length, 39, 41
1986 Challenger accident, 4 survivor (reliability) function, 40
definition, 1 system-level models
2002 Discovery accident, 4 density function, 54, 56
economic implications, 2 distribution, 57
failure features, 3 hazard function, 55–56
human and software reliability, 3–4 IFRA, 54
in-service failure, 2 reliability function, 54
life cycle costs, 2 standby redundant system, 57
potential productivity gains, 3 Renewal and instantaneous repair
preventive maintenance, 2–4 life distributions classification
reliability analysis, 4 bounds on renewal measures,
space shuttle accidents, 4 183–186
Three Mile Island accident vs. light life lengths, 182
bulb failure, 3 NBU, 182
Reliability growth NBUE, 182–184
AMSAA model, 321 nested ordering, 183
Crow’s model, 320–321 NWU, 182, 184
cumulative mean time before NWUE, 182–184
failure, 320 operating interval length, 181
embryonic models, 320 preventive and corrective
failure intensity function, 322–323 maintenance, 173
learning curve model, 320 renewal process
TAAF process, 320, 322 backward recurrence time, 176
test data, 322 Blackwell’s theorem, 181
Reliability over time definition, 174
bathtub curve, 42 density function, 177
cumulative hazard function, 41 exponential distribution, 177, 180
density function, 40 forward recurrence time, 176
distribution function, 40 gamma distribution, 177–178
failure rate, 40 graphical representation, 175
functions of time, 39 hazard function, 179
hazard function, 40, 42 key renewal theorem, 178
infant mortality period, 42 k-fold convolution, 175
life distributions Laplace transform, 177, 180
Birnbaum–Saunders modified/delayed renewal
distribution, 52 process, 175
CFR distribution, 44 nonnegative random
DFR distribution, 43–44 variables, 174
exponential distribution, 45–46 ordinary renewal process, 175
376 Index

point process, 174–175 Repairable systems


Poisson process, 177, 180 definition, 173
probability distribution, 174, 176 nonrenewal and instantaneous
rate of failure, 179 repair
renewal density, 179–180 equivalent age models, 203–214
renewal function, 178, 180 imperfect repair models, 200–203
Weibull life distribution, 181 minimal repair models, 194–200
residual life distribution nonstationary process, 193
backward recurrence time, 189 renewal and instantaneous repair
definition, 186 life distributions classification,
forward recurrence time, 189 181–186
key renewal theorem, 188 preventive and corrective
limiting forms, 188–189 maintenance, 173
random variable value, 187 renewal processes, 174–181
renewal density, 189 residual life distribution, 186–189
Renewal density, 180, 189, 222, 336, 338 repair duration
availability function, age expected number of failures, 219
replacement policy, expected number of repair
255–256 completions, 219
Kijima model, 282 failure times, 218
major intervals, 262 lengths of repair intervals, 219
minor intervals, 261, 264 operating times, 217
renewal function, 179, 251 representative sample path,
Renewal function, 219, 272, 283, 217–218
336–337, 353 total duration, 217
age replacement policy, 254–256 statistical analysis
bivariate, 335 data from multiple identical
definition, 178 systems, 310–313
failure time, 222 data from single system, 307–310
Laplace transform, 338
life distributions, 180
S
product life distribution
function, 317 Scaled total time on test transform
quasi-renewal function, 212, 236, censored data sets, 118–122
277, 280 complete data sets, 113–118
renewal density, 179, 251 finite mean, 111
Weibull hazard functions, 112–113
equivalent infinite series scaling constant, 112
expansion, 186 Weibull distribution, 112
gamma function–based Standard normal cumulative
quantities, 349–352 probabilities, 355
infinite sum representation, 347 Statistical analysis, repairable
k-fold convolution, 347 system data
recursive definition, 349 from multiple identical systems
renewal and convolution confidence interval, 311
distribution functions, 348 failure intensity function, 311–312
renewal function and the renewal homogeneous failure times,
density coefficients, 349 311–312
values for coefficients, 352 likelihood function, 311–312
Index 377

nonhomogeneous failure Student’s t distribution


times, 313 coordinates, 346
number of degrees of probability, 356
freedom, 313 System-level availability
number of failures, 310–311 availability bounds, 229
from single system availability function, 227
cumulative failure intensity, 309 average availability, 230
distribution on order average system uptime per
statistics, 309 cycle, 231
failure intensity function, average time down per cycle, 231
307–309 component replacement rates, 231
homogeneous and individual failure and repair
nonhomogeneous failure times, 230
process, 308–309 limiting availabilities, 228, 230
joint density, 310 limiting average system
NHPP, 309 availability, 232
(1–α) probability confidence opportunistic age replacement
bound, 309 policy, 262
(1–α) probability confidence point availabilities, 228
interval, 310 series system, 228
standard normal system status, 227
distribution, 308 System structures
Status functions equivalent structures
coherent systems, 7 algebraic effort, 27
component status variables, 6 computed values, 28
component status vector, 6 cut vector, 15
functional status of device, 5 inequality, 29
and system structures lower bound, 28, 30
equivalent structures, 12–17 minimax bounds, 30–31
k-out-of-n system, 10–12 minimum cut set, 15–17
multistate components, minimum paths and minimum
18–19 cuts, 14–17, 26, 30
parallel system, 8–10 minimum path set, 13
series systems, 7–8 minimum path vector, 13
Step stress testing, parametric statistical path vector, 12
methods reliability values, 26–27
Arrhenius acceleration system reliability function, 27
equation, 166 upper bounds, 28–29
equivalent data set, 165 k-out-of-n system, 10–12,
erroneous inference, 164 25–26
extrapolated failure time, 167 modules, 31–32
failure times, 165 multistate components, 18–19
overstress risk, 164 parallel system, 8–10
raw failure times, 166 parallel systems, 25
three-level step stress regimen, 164 reliability
Weibull distribution, 165–166 allocation, 35–36
Stress–strength models definition, 5
age acceleration, 91 importance, 32–35
interference model, 62–64, 67 series systems, 7–8, 24–25
378 Index

T Magnuson–Moss Act, 314


pro rata warranties
Test-analyze-and-fix (TAAF) process,
automobile components, 314
320, 322
characteristics, 314
Test unit status, Kaplan–Meier estimates
discount factor, 319
continuous monitoring, 102–105
expected profit, 319
periodic monitoring, 105–107
vs. full replacement warranties,
Time series methods, degradation
317–318
process, 298–299
repurchase price, 318
Total time on test (TTT) transforms
repurchase probability, 318
cumulative failure probability, 109–111
residual value of failed unit, 318
definition, 110
Weibull distribution, 134
properties, 110–111
Cox’s result, 162
scaled TTT transform
extreme distribution, 49
censored data sets, 118–122
extreme value argument, 48
complete data sets, 113–118
failure times, 48
finite mean, 111
function, 46
hazard functions, 112–113
gamma distribution, 344
scaling constant, 112
hazard functions, 47–49
Weibull distribution, 112
life lengths, 184, 347
TTT transforms, see Total time on test
maximum likelihood function,
transforms
146–147, 152–153, 156,
159–160
U mean and hazard function, 112
Unobservable degradation processes method of moments, 136, 138,
conditional probability methods, 141–142, 165
300–304 Rayleigh type, 53
marker variable, 297 reliability function, interference
time series methods, 298–299 model, 64–66
renewal function, 347
three-parameter Weibull life
V
distribution, 46–47
Variance–covariance matrix, 147–148, two-parameter Weibull distribution
154, 156, 160 function, 47
Weibull renewal functions
equivalent infinite series
W
expansion, 186
Warranties gamma function–based quantities,
full replacement warranties 349–352
expected profit function, 316–317 infinite sum representation, 347
expected profit model, 315–316 k-fold convolution, 347
fixed duration, 317 recursive definition, 349
hazard function identity, 316 renewal and convolution distribution
key feature, 314 functions, 348
optimal policy time, 317 renewal function and the renewal
product generation cost, 315 density coefficients, 349
vs. pro rata warranties, 317–318 values for coefficients, 352

You might also like