u.s. Nuclear Regulatory
Commission
NUREG0492
Fault Tree Handbook
Date Published: January 1981
W. E. Vesely, U.S. Nuclear Regulatory Commission
F. F. Goldberg, U.S. Nuclear Regulatory Commission
N. H. Roberts, University of Washington
D. F. Haasl, Institute of System Sciences, Inc.
Systems and Reliability Research
Office of I\luclear Regulatory Research
U.S. Nuclear Regulatory Commission
Washington, D.C. 20555
For sale by the U.S. Government Printing Office
Superintendent of Documents, Mail Stop: SSOP, Washington, DC 204029328
NUREG0492
Available from
GPO Sales Program
Division.of Technical Information and Document Control
U.S. Nuclear Regulatory Commission
Washington, DC 20555
Printed copy price: $5.50
and
National Technical Information Service
Springfield, VA 22161
TABLE OF CONTENTS
Introduction vii
I. Basic Concepts of System Analysis 11
1. The Purpose of System Analysis 11
2. Defmition of a System 13
3. Analytical Approaches 17
4. Perils and Pitfalls 19
II. Overview ofInductive Methods. . . . . . . . . . . . . . . . . . . . . . . . . . . .. III
1. Introduction..................................... III
2. The "Parts Count" Approach ..... . . . . . . . . . . . . . . . . . . . .. I I ~ l
3. Failure Mode and Effect Analysis (FMEA) 112
4. Failure Mode Effect and Criticality Analysis (FMECA) 114
5. Preliminary Hazard Analysis (PHA) .,. . . . . . . . . . . . . . . . . . .. 114
6. Fault ffazard Analysis (FHA) " 115
7. Double Failure Matrix (DFM) . . . . . . . . . . . . . . . . . . . . . . . . .. 115
8. Success Path Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 1110
9. Conclusions ;...................... 1112
III. Fault Tree AnalysisBasic Concepts " IDl
l. Orientation...................................... IIII
2. Failure vs. Success Models 1111
3. The Undesired Event Concept . . . . . . . . . . . . . . . . . . . . . . . . .. llI3
4. Summary....................................... ID4
IV. The Basic Elements of a Fault Tree Nl
1. The Fault Tree Model " . . . . . . . . . . . . . . . . . . . . . . . . . . . .. Nl
2. SymbologyThe Building Blocks of the Fault Tree Nl
V. Fault Tree Construction Fundamentals. . . . . . . . . . . . . . . . . . . . . . .. VI
1. Faults vs. Failures " VI
2. Fault Occurrence vs. Fault Existence " VI
3. Passive vs. Active Components V2
4. Component Fault Categories: Primary, Secondary, and Command.. V3
5. Failure Mechanism, Failure Mode, and Failure Effect V3
6. The "Immediate Cause" Concept V6
7. Basic Rules for Fault Tree Construction V8
iii
iv TABLE OF CONTENTS
VI. Probability TheoryThe Mathematical Description of Events VII
L Introduction VII
2. Random Experiments and Outcomes of Random Experiments VII
3. The Relative Frequency Defmition of Probability VI3
4. Algebraic Operations with Probabilities VI3
S. Combinatorial Analysis VI8
6. Set Theory: Application to the Mathematical Treatment
of Events VIII
7. Symbolism ' VII6
8. Additional Set Concepts VII7
9. Bayes'Theorem VII9
VII. Boolean Algebra and Application to Fault Tree Analysis VIII
1. Rules of Boolean Algebra VIII
2. Application to Fault Tree Analysis VII4
3. Shannon's Method for Expressing Boolean Functions in
Standardized Forms VIII2
4. Determining the Mitltmal Cut Sets or Minimal Path Sets of a
Fault Tree VIIIS
VIII. The Pressure Tank Example VIIII
1. System Defmition and Fault Tree Construction VIIII
2. Fault Tree Evaluation (Minimal Cut Sets) VIIII2
IX. The Three Motor Example lXI
1. System Defmition and Fault Tree Construction IXI
2. Fault Tree Evaluation (Minimal Cut Sets) '" IX7
X. Probabilistic and Statistical Analyses XI
1. Introduction XI
2. The Binomial Distribution XI
3. The Cumulative Distribution Function X7
4. The Probability Density Function X9
S. Distribution Parameters and Moments XIO
6. Limiting Forms of the Binomial: Normal, Poisson XIS
7. Application of the Poisson Distribution to System Failures
The SoCalled Exponential Distribution XI9
8. The Failure Rate Function X22
9. An Application Involving the TimetoFailure Distribution X2S
10. Statistical Estimation X26
11. Random Samples X27
12. Sampling Distributions ' X27
13. Point EstimatesGeneral X28
TABLE OF CONTENTS
14. Point EstimatesMaximum likelihood X30
15. Interval Estimators . . . . . . . . . . . . . . . . . . . . . " X35
16. Bayesian Analyses X39
XI. Fault Tree Evaluation Techniques " XII
1. Introduction..................................... XII
2. Qualitative Evaluations XI2
3. Quantitative Evaluations XI?
XII. Fault Tree Evaluation Computer Codes XIIl
1. Overview of Available Codes " XIIl
2. Computer Codes for Qualitative Analyses of Fault Trees XII2
3. Computer Codes for Quantitative Analyses of Fault Trees XII6
4. Direct Evaluation Codes XII8
5. PLMOD: A Dual Purpose Code XIIII
6. Common Cause Failure Analysis Codes XII12
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . .. BIBl
INTRODUCTION
Since 1975, a short course entitled "System Safety and Reliability Analysis" has
been presented to over 200 NRC personnel and contractors. The course has been
taught jointly by David F. Haasl, Institute of System Sciences, Professor Norman H.
Roberts, University of Washington, and 'members of the Probabilistic Analysis Staff,
NRC, as part of a risk assessment training program sponsored by the Probabilistic
Analysis Staff.
This handbook has been developed not only to serve as text for the System Safety
and Reliability Course, but also to make available to others a set of otherwise
undocumented material on fault tree construction and evaluation. The publication of
this handbook is in accordance with the recommendations of the Risk Assessment
Review Group Report (NUREG/CR.{)400) in which it was stated that the fault/event
tree methodology both can and should be used more widely by the NRC. It is hoped
that this document will help to codify and systematize the fault tree approach to
systems analysis.
vii
TIME AXIS •
CHAPTER I  BASIC CONCEPTS OF SYSTEM ANALYSIS
1. The Purpose of System Analysis
The principal concern of this book is the fault tree technique, which is a
systematic method for acquiring information about a system.* The information so
gained can be used in making decisions, and therefore, before we even define system
analysis, we will undertake a brief examination of the decisionmaking process.
Decisionmaking is a very complex process, and we will highlight only certain aspects
which help to put a system analysis in proper context.
Presumably, any decision that we do make is based on our present knowledge
about the situation at hand. This knowledge comes partly from our direct experience
with the relevant situation or from related experience with similar situations. Our
knowledge may be increased by appropriate tests and proper analyses of the
resultsthat is, by experimentation. To some extent our knowledge may be based on
conjecture :Jd this will be conditioned by our degree of optimism or pessimism. For
example, w ~ may be convinced that "all is for the best in this best of all possible
worlds." Or, conversely, we may believe in Murphy's Law: "If anything can go
wrong, it will go wrong." Thus, knowledge may be obtained in several ways, but in
the vast majority of cases, it will not be possible to acquire all the relevant
information, so that it is almost never possible to eliminate all elements of
uncertainty.
It is possible to postulate an imaginary world in which no decisions are made until
all the relevant information is assembled. This is a far cry from the everyday world in
which decisions are forced on us by time, and not by the degree of completeness of
our knowledge,. We all have deadlines to meet. Furthermore, because it is generally
impossible to have all the relevant data at the time the decision must be made, we
simply cannot know all the consequences of electing to take a particular course of
action. Figure 11 provides a schematic representation of these considerations.
DIRECT
INFORMATION
I
ACQUISITION
f DECISION
INDIRECT
INFORMATION

I
ACQUISITION
I
I
TIME AT WHICH ~
DECISION MUST J
BE MADE
Figure 11. Schematic Representation of the Decisionmaking Process
*There are other methods for performing this function. Some of these are discussed briefly
in Chapter II.
11
12 FAULT TREE HANDBOOK
The existence of the time constraint on the decisionmaking process leads us to
make a distinction between good decisions and correct decisions. We can classify a
decision as good or bad whenever we have the advantage of retrospect. I make a
decision to buy 1000 shares of XYZ Corporation. Six months later, I find that the
stock has risen 20 points. My original decision can now be classified as good. If,
however, the stock has plummeted 20 points in the interim, I would have to
conclude that my original decision was bad. Nevertheless, that original decision could
very well have been correct if all the information available at the time had indicated a
rosy future for XYZ Corporation.
We are concerned here with making correct decisions. To do this we require:
(1) The identification of that information' (or those data) that would be pertinent
to the anticipated rlecision.
(2) A systematic program for the acquisition of this pertinent information.
(3) A rational assessment or analysis of the data so acquired.
There are perhaps as many definitions of system analysis as there are people
working and writing in the field. The authors of this book, after long thought, some
controversy, and considerable experience, have chosen the following defmition:
System analysis is a directed process for the orderly and timely acquisition and
investigation of specific system information pertinent to a given decision.
According to this definition, the primary function of the system analysis is the
acquisition of information and not the generation of a system model. Our emphasis
(at least initially) will be on the process (Le., the acquisition of information) and not
on the product (Le., the system model). This emphasis is necessary because, in the
absence of a directed, manageable, and disciplined process, the corresponding system
model will not usually be a very fruitful one.
We' must decide what information is relevant to a given decision before the data
gathering activity starts. What information is essential? What information is
desirable? This may appear perfectly obvious, but it is astonishing on how many
occasions this rationale is not followed. The sort of thing that may happen is
illustrated in Figure 12.
/'
" /'
...
........
Figure 12. Data Gathering Gone Awry
BASIC CONCEPTS OF SYSTEM ANALYSIS
13
The large circle represents the information that will be essential for a correct decision
to be made at some future time. Professor Jones, who is well funded and who is a
"Very Senior Person," is an expert in subarea A. He commences to investigate this
area, and his investigation leads him to some fascinating unanswered questions
indicated in the sketch by A
1
. Investigation of A
1
leads to A
2
, and so on. Notice,
however, that Professor Jones' efforts are causing him to depart more and more from
the area of essential data. Laboratory Alpha is in an excellent position to study
subarea B. These investigations lead to B
1
and B
2
and so on, and the same thing is
happening. When the time for decision arrives, all the essential information is not
available despite the fact that the efforts expended would have been able to provide
the necessary data if they had been properly directed.
The nature of the decisionmaking process is shown in Figure 13. Block A
represents certified reality. Now actual reality is pretty much of a "closed book," but
by experimentation and investigation (observations of Nature) we may slowly
construct a perception of reality. This is our system model shown as block B. Next,
this model is analyzed to produce conclusions (block C) on which our decision will
be based. So our decision is a direct outcome of our model and if our model is
grossly in error, so also will be our decision. Clearly, then, in this process, the greatest
emphasis should be placed on assuring that the system model provides as accurate a
representation of reality as possible.
z z
0 0
l I
~ ~
c.:;J c.:;J
Z
l I 0
A
CI)
B
CI)
C
(/)
w w
> >
u
z z
w
0
TRUTH MODEL
CONCLUSIONS
REALITY OUR PERCEPTION
OF REALITY
BASIS FOR
DECISION
Figure 13. Relationship Between Reality, System
Model, and Decision Process
2. Defmition of a System
We have given a defmition of the process of system analysis. Our next task is to
devise a suitable definition for the word "system." In common parlance we speak of
"the solar system," "a system of government," or a "communication system," and in
so doing we imply some sort of organization existing among various elements that
14 FAULT TREE HANDBOOK
mutually interact in ways that mayor may not be well defmed. It seems
to establish the following defInition:
A system is a deterministic entity comprising an interacting collection of
discrete elements.
From a practical this is not very useful in particular we
must specify what aspects of system performance are of immediate concern. A
system performs certain functions and the selection or" particular performance
aspects will dictate what kind of analysis is to be conducted. For instance: are we
interested in whether the system accomplishes some task successfully; are we
interested in whether the system fails in some hazardous way; or are we interested in
whether the system will prove more costly than originally anticipated? It could well
be that the correct system analyses in these three cases will be based on different
system defmtions.
The word "deterministic" in the defInition implies that the system in question be
identifiable. It is completely futile to attempt an analysis of something that cannot
be clearly identified. The poet Dante treated the Inferno as a system and divided it
up into a number of harrowing levels but, from a practical such a system
is not susceptible to identification as would be, for example, the plumbing system in
my home. Furthermore, a system must have some purposeit must do. something.
Transportation systems, circulating hot water piping systems, local school systems all
have defmite purposes and do not exist simply as figments of the imagination.
The discrete elements of the definition must also, of course, be identifiable; for
the individual submarines in the Pacific Ocean Submarine Flotilla.
Note that the discrete elements themselves may be regarded as systems. Thus, a
submarine consists of a propulsion system, a navigation system, a hull system, a
piping system, and so forth; each of in may be further broken down into
subsystems and subsubsystems, etc.
Note also from the defmition that a system is made up of partsrbr subsystems that
interact. This interaction•. which may be very complex indeed, generally insures that
a system is not simply equal to the sum of its a point that will be continually
emphasized throughout this book. if the physical nature of any part
changesfor example by failurethe system itself also changes. This is an important
point because, should design changes be made as a result of a system the
new system so resulting will have to be subjected to an analysis of its own.
for example, a fourengine aircraft. Suppose one engine fails. We now have a new
system quite different from the original one. For one thing, the landing characteris
tics have changed rather drastically. Suppose two engines fail. We now have six
different possible systems depending on which two engines are out of commission.
Perhaps the most vital decision that must be made in system defmition is how to
put external boundaries on the system. Consider the telephone sitting on the desk. Is
it sufficient to define the system simply as the instrument itself (earpiece, cord and
or should the line running to the jack in the wall be included? Should the
jack itself be included? What about the external lines to the telephone pole? What
about the junction box on the pole? What about the vast complex of lines, switching
that comprise the telephone system in the local area.. the the
BASIC CONCEPTS OF SYSTEM ANALYSIS IS
world? Clearly some external boundary to the system must be established and this
decision will have to be made partially on the basis of what aspect of system
performance is of concern. If the immediate problem is that the bell is not loud
enough to attract my attention when I am in a remote part of the house, the external
system boundary will be fairly closed in. If the problem involves static on the line,
the external boundary will be much further out.
It is also important in system definition to establish a limit of resolution. In the
telephone example, do I wish to extend my analysis to the individual components
(screws, transmitter, etc.) of which the instrument is composed? Is it necessary to
descend to the molecular level, the atomic level, the nuclear level? Here again, a
decision can be made partially on the basis of the system aspect of interest.
What we have said so far can be represented as in Figure 14. The dotted line
separates the system from the environment in which it is embedded. Thus, this
dotted line constitutes an external boundary. It is sometimes useful to divide the
system into a number of subsystems, A, B, C, etc. There may be several motivations
for doing this; most of them will be discussed in due course. Observe also that one of
the subsystems, F, has been broken down, for purposes of the analysis, into its
smallest subsubsystems. This constitutes a choice of an internal boundary for the
system. The smallest subsubsystems, a, b, c, etc., are the "discrete elements"
mentioned in the general definition of a system.
SMALLEST
SUBSUBSYSTEM
(LIMIT OF
RESOLUTION)
~ ENVIRONMENT
q
k
e
F
p
d
o
c
 ..., EXTERNAL
i,BOUNDARY
I
I
I
I
I
h
b
n
E
a
9
m
SYSTEM
C
D
B
A
TYPICAL
SUBSYSTEMS
I
I
I
I
L ~
Figure 14. System Definition: External and Internal Boundaries
The choice of the appropriate system boundaries in particular cases is a matter of
vital importance, for the choice of external boundaries determines the comprehen
siveness of the analysis, whereas the choice of a limit of resolution limits the detail of
the analysis. A few further facets of this problem will be discussed briefly now, and
will be emphasized throughout the book, especially in the applications.
The system boundaries we have discussed so far have been physical boundaries. It
is also possible, and indeed necessary in many cases, to set up temporal or timelike
boundaries on a system. Consider a man who adopts the policy of habitually trading
his present car in for a new one every two years. In this example, the system is the
16 FAULT TREE HANDBOOK
car and the system aspect of interest is the maintenance policy. It is clear that, under
the restriction of a twoyear temporal boundary, the maintenance policy adopted
will be one thing; it will be quite a different thing if the man intends to run the car
for as long as possible. In some applications, the system's physical boundaries might
actually be functions of time. An example of this would be a system whose temporal
boundaries denote different operating phases or different design modifications. After
each phase change or design modification, the physical boundaries are subject to
review and possible alteration.
The system analyst must also ask the question, "Are the chosen system
boundaries feasible and are they valid in view of the goal of the analysis?" To reach
certain conclusions about a system, it may be desirable to include a large system
"volume" within the external boundaries. This may call for an extensive,
timeconsuming analysis. If the money, time and staff available are inadequate for
this task and more efficient analysis approaches are not possible, then the external
boundaries must be "moved in" and the amount of information expected to result
from the analysis must be reduced. If I am concerned about my TV reception, it
might be desirable to include the state of the ionosphere in my analysis, but this
would surely be infeasible. I would be better advised to reduce the goals of my
analysis to a determination of the optimum orientation of mv roof antenna.
\ .'
The limit of resolution (internal boundary) can also be established' from
consideration,s of feasibility (fortunately!) and from the goal of the analysis. It is
possible to conduct a worthwhile study of the reliability of a population of TV sets
without being concerned about what is going on at the microscopic and
submicroscopic levels. If the system failure probability is to be calculated, then the
limit of resolution should cover component failures for which data are obtainable. At
any rate, once the limit of resolution has been chosen (and thus the "discrete
elements" defined), it is interactions at this level with which we are concerned; we
assume no knowledge and are not concerned about interactions taking place at lower
levels.
We now see that the external boundaries serve to delineate system outputs (effects
of the system on its environment) and system inputs (effects of the environment on
the system); the limits of resolution serve to define the "discrete elements" of the
system and to establish the basic interactions within the system.
The reader with a technical background will recognize that our definition of a
system and its boundaries is analogous to a similar process involved in classical
thermodynamics, in which an actual physical boundary or an imaginary one is used
to segregate a definite quantity of matter ("control mass") or a definite volume
("control volume"). The inputs and outputs of the system are then identified by the
amounts of energy or mass passing into or out of the bounded region. A system that
does not exchange mass with its environment is termed a "closed system" and a
closed system that does not exchange energy with its surroundings is termed an
"adiabatic system" or an "isolated system." A student who has struggled with
thermodynamic problems, particularly flow problems, will have been impressed with
the importance of establishing appropriate system boundaries before attempting a
solution of the problem.
A good deal of thought must be expended in the proper assignment of system
boundaries and limits of resolution. Optimally speaking, the system boundaries and
BASIC CONCEPTS OF SYSTEM ANALYSIS 17
limits of resolution should be defined before the analysis begins and should be
adhered to while the analysis is carried out. However, in practical situations the
boundaries or limits of resolution may need to be changed because of information
gained during the analysis. For example, it may be found that system schematics are
not available in as detailed a form as originally conceived. The system boundaries and
limits of resolution, and any modifications, must be clearly defined in any analysis,
and should be a chief section in any report that is issued.
",.. ...........
/' 10
4
.........
/
/
,.
\ BOUNDARY
I \
t J SYSTEM BOUI\IDARY
\ I
\. I
" ./'
'"
Figure 15. Effect of System Boundaries on Event Probabilities
To illustrate another facet of the bounding and resolution problem, consider
Figure 15. The solid inner circle represents our system boundary inside of which we
are considering events whose probabilities of occurrence are, say, of order 10 3 or
greater. If the system boundaries were extended (dotted circle) we would include, in
addition, events whose probabilities of occurrence were, say, of order 104 or
greater. By designing twofold redundancy into our restricted system (solid circle) we
could reduce event probabilities there to the order of (10
3
)(10
3
) =10
6
but then
the probabilities of events that we are ignoring become overriding, and we are
suffering under the delusion that our system is two orders of magnitude safer or
more reliable than it actually is. When due consideration is not devoted to matters
such as this, the naive reliability calculator will often produce such absurd numbers
as 1016 or 10 18. The low numbers simply say that the system is not going to fail
by the ways considered but instead is going to fail at a much higher probability in a
way not considered.
3. Analytical Approaches
Because we are concerned, in this volume, with certain formal processes or
models, it should come 'as no great suprise that these processes or models can be
categorized in exactly the same way as are the processes of thought employed in the
human decisionmaking process. There are two generic analytical methods by means
of which conclusions are reached in the human sphere: induction and deduction. It is
necessary at this point to discuss the respective characteristics of these approaches. .
1·8
Inductive Approaches
FAULT TREE HANDBOOK
Induction constitutes reasoning from individual cases to a general conclusion. If,
in the consideration of a certain system, we postulate a particular fault or initiating
condition and attempt to ascertain the effect of that fault or condition on system
operation, we are constructing an inductive system analysis. Thus, we might inquire
into how the loss of some specified control surface affects the flight of an airplane or
into how the elimination of some item in the budget affects the overall operation of
a school district. We might also inquire how the noninsertion of given control rods
affects a scram system's performance or how a given initiating event, such as a pipe
rupture, affects plant safety.
Many approaches to inductive system analysis have been developed and we shall
devote Chapter II to a discussion of the most important among them. Examples of
this method are: Preliminary Hazards Analysis (PHA), Failure Mode and Effect
Analysis (FMEA), Failure Mode Effect and Criticality Analysis (FMECA), Fault
Hazard Analysis (FHA), and Event Tree Analysis.
To repeatin an inductive approach, we assume some possible component
condition or initiating event and try to determine the corresponding effect on the
overall system.
Deductive Approaches
Deduction constitutes reasoning from the general to the specific. In a deductive
system analysis, we postulate that the system itself has failed in a certain way, and
we attempt to find out what modes of system/component* behavior contribute to
this failure. In common parlance we might refer to this approach as a "Sherlock
Holmesian" approach. Holmes, faced with given evidence, has the task of
reconstructing the events leading up to the crime. Indeed, all successful detectives are
experts in deductive analysis.
Typical of deductive analyses in real life are accident investigations: What chain of
events caused the sinking of an "unsinkable" ship such as the Titanic on its maiden
voyage? What failure processes, instrumental and/or human, contributed to the crash
of a commercial airliner into a mountainside?
The principal subject of this book, Fault Tree Analysis, is an example of deductive
system analysis. In this technique, some specific system state, which is generally a
failure state, is postulated, and chains of more basic faults contributing to this
undesired event are built up in a systematic way. The broad principles of Fault Tree
Analysis, as well as details relating to the applications and evaluation of Fault Trees,
are given in later chapters.
In summary, inductive methods are applied to determine what system states
(usually failed states) are possible; deductive methods are applied to determine how a
given system state (usually a failed state) can occur.
*A component can be a subsystem, a subsubsystem, and subsubsubsystem, etc. Use of the
word "component" often avoids an undesirable proliferation of "subs."
BASIC CONCEPTS OF SYSTEM ANALYSIS
4. Perils and PitfaIls
19
In the study of systems there are dangerous reefs which circumscribe the course
which the analyst must steer. Most of these problem areas assume the role of
interfaces: subsystem interfaceS and disciplinarx interfaces.
Subsystem Interfaces
Generally, a system is a complex of subsystems manufactured by several different
subcontractors or organizational elements. Each subcontractor or organizational
element takes appropriate steps to assure the quality of his own product. The trouble
is that when the subsystems are put together to form the overall system, failure
modes may appear that are not at all obvious when viewed from the standpoint of
the separate component parts.
It is important that the same fault defmitions be used in analyses which are to be
integrated, and it is important that system boundaries and limits of resolution be
clearly stated so that any potential hidden faults or inconsistencies will be identified.
The same event symbols should be used if the integrated system is to be evaluated or
quantified. Interface problems often lie in control systems and it is best not to split
any control system into "pieces." Systems which have control system interfaces (e.g.,
a spray system having an injection signal input) can be analyzed with appropriate
"places" left for the control analysis which is furnished as one entity. These
"places," or transfers, will be described later in the fault tree analysis discussions.
Disciplinary Interfaces
Difficulties frequently arise because of the differing viewpoints held by people in
different disciplines or in different areas of employment. The circuit designer regards
his black box as a thing of beauty and a joy forever, a brainchild of his own creation.
He handles it gently and reverently. The user, on the other hand, may show no such
reverence. He drops it, kicks it and swears at it with gay abandon.
One of the authors, as a mere youth, was employed as a marine draftsman. The
principal shop project was drawing up plans for minesweepers. The draftsmen were
divided into groups. There was a hull section, a wiring section, a plumbing section,
etc., each section working happily within its own technical area. When an attempt
was made to draw up a composite for one of the compartments (the gyro room, in
this case), it was found that hull features and plumbing fixtures were frequently
incompatible, that wiring and plumbing often conflicted with vent ducts, and indeed,
that the gyro room door could not be properly opened because of the placement of a
drain from the head on the upper deck. This exercise demonstrated the unmistakable
need for system integration.
Other conflicts can readily be brought to mind: the engineering supervisor who
expects quantitative results from his mathematical section, but gets only beautiful
existence proofs; the safety coordinator who encumbers the system with so many
safety devices that the reliability people have trouble getting the system to work at
all; and so on.
110 FAULT TREE HANDBOOK
As an example of an interface between operational and maintenance personnel,
consider a system that is shut down for an online maintenance check for 5 minutes
every month, and suppose that the probability of system failure due to hardware
failure is 10 6 per month. Then, on a monthly basis, the total probability that the
system will be unavailable is the sum of its unavailability due to hardware failures
and its unavailability due to the maintenance policy or:
where
106
720
1/12
=system unavailability due to hardware failure per month
= number of hours in a month
=hours required for maintenance check per month
Note that the probability of system unavailability due to our maintenance policy
(only 5 minutes downtime per month) is greater by two orders of magnitude than
the probability that the system will be down because of hardware failure. In this case
the best maintenance policy may be none at all!
The system analyst (system integrater) must be unbiased enough and knowledge
able enough to recognize interface problem areas when they occurand they will
occur.
CHAPTER II  OVERVIEW OF INDUCTIVE METHODS
1. Introduction
In the last chapter we defmed the two approaches to system analysis: inductive
and deductive. The deductive approach is Fault Tree Analysisthe main topic of this
book. This chapter}s devoted to a discussion of inductive methods.
We have felt it necessary to devote a full chapter to inductIve methods for two
reasons. First of all, these techniques provide a useful and illuminating comparison. to
Fault Tree Analysis. Second, in many systems (probably the vast majority) for which
the expenses and refinements of Fault Tree Analysis are not warranted, the inductive
methods provide a valid and systematic way to identify and correct undesirable or
hazardous conditions. For this reason, it is especially important for the fault tree
analyst to be conversant with these alternative procedures.
In everyday language the inductive techniques provide answers to the generic
question, "What happens if?" More formally, the process consists of assuming a
particular state of existence of a component or components and analyzing to
determine the effect of that condition on the system. In safety and reliability studies
the "state of existence" is a fault. This may not necessarily be true in other areas.
For systems that exhibit any degree of complexity (Le., for most systems),
attempts to identify all possible system hazards or all possible component failure
modesboth singly and in combination:become simply impossible. For this reason
the inductive techniques that we are going to discuss are generally circumscribed by
considerations of time, money and manpower. Exhaustiveness in the analysis is a
luxury that we cannot afford.
2. The "Parts Count" Approach
Probably the simplest and most conservative (Le., pessimistic) assumption we can
make about a system is that any single component failure will produce complete
system failure. Under this assumption, obtaining an upper bound on the probability
of system failure is especially straightforward. We simply list all the components
along with their estimated probabilities of failure. The individual component
probabilities are then added and this sum provides an upper bound on the probability
of system failure. This process is represented below:
Component
A
B
Failure Probability
fA
where F, the failure probability for the system, is equal to fA+f
B
+...
nl
112 FAULT TREE HANDBOOK
The failure probabilities can be failure rates, unreliabilities, or unavailabilities
depending on the particular application (these more specific terms will be covered
later).
For a particular system, the Parts Count technique can provide a very pessimistic
estimate of the system failure probability and the degree of pessimism is generally
not quantifiable. The "Parts Count" technique is conservative because if critical
components exist, they often appear redundantly, so that no single failure is actually
catastrophic for the system. Furthermore, a component can often depart from its
normal operating mode in several different ways and these failure modes will not, in
general, all have an equally deleterious effect on system operation. Nevertheless, let
us see what results the Parts Count approach yields for the simple parallel
configuration oftwo amplifiers shown in Figure III.
A
B
Figure III. A System of Two Amplifiers in Parallel
Suppose the probability of failure of amplifier A is 1 x 10
3
and the probability
of failure of amplifier Bis 1 x 10 3, i.e., fA =I x 10
3
and f
B
=I X 10
3
. Because
the parallel configuration implies that system failure would occur only if both
amplifiers fail, and assuming independence of the two amplifiers, the probability of
system failure is 1 x 10
3
x 1 x 10
3
=1 x 10
6
. By the parts count method, the
component probabilities are simply summed and hence the "parts count system
failure probability" is 1 x 10
3
+ 1 x 10
3
=2 x 10
3
which is considerably higher
than 1 x 10
6
.
The parts count method thus can give results which are conservative by orders of
magnitude if the system is redundant. When the system does have single failures, then
the parts count method can give reasonably accurate results. Because all the
components are treated as single failures (any single component failure causes system
failure), any dependencies among the failures are covered, i.e., the parts count
method covers multiple component failures due to a common cause. * Finally, the
parts count method can also be used in sensitivity studies; if the system or subsystem
failure probability does not impact or does not contribute using the parts count
method, then it will not impact or contribute using more refined analyses.
3. Failure Mode and Effect Analysis (FMEA)
Inasmuch as the Parts Count approach is very simplistic and can give very
conservative results, other more detailed techniques have been devised. We first
*Common cause failures will be discussed in subsequent chapters.
OVERVIEW OF INDUCTNE METHODS
113
discuss Failure Mode and Effect Analysis and return for a closer look at the system
shown in Figure III.
We recognize that amplifiers can fail in several ways and our first task is the
identification of these various failure modes. The two principal ones are "open" and
"short" but suppose that our analysis has also detected 28 other modes (e.g., weak
signal, intermittent ground, etc.). A short of any amplifier is one of the more critical
failure modes inasmuch as it will always cause a failure of the system. We now
describe a table containing the following information:
(1) Component designation
(2) Failure probability (failure rates or unavailabilities are some of the specific
characteristics used)
(3) Component failure modes
(4) Percent of total failures attributable to each mode
(5) Effects on overall system, classified into various categories (the two simplest
categories are "critical" and "noncritical").
The result for our redundant amplifier system might be as in Table III.
Table III. Redundant Amplifier Analysis
1 2 3 4 5
Failure % Failures Effects
Component Probability Failure Mode by Mode Critical NonCritical
A
1x10·
3
Open 90 X
Short 5 X
(5x10
5
)
Other 5 X
15x10
5
)
B
1x10
3
Open 90 X
Short 5 X
15x10
5
)
Other 5 X
15x10
5
)
Based on prior experience with this type of amplifier, we estimate that 90% of
amplifier failures can be attributed to the "open" mode, 5% of them to the "short"
mode, and the balance of 5% to the "other" modes. We know that whenever either
amplifier fails shorted, the system fails so we put X's in the "Critical" column for
these modes; "Critical" thus means that the single failure causes system failure. On
the other hand, when either amplifier fails open, there is no effect on the system
from the single failure because of the parallel configuration. What is the criticality of
the other 28 failure modes? In this example we have been conservative and we are
considering them all as critical, i.e., the occurrence of anyone causes system failure.
The numbers shown in the Critical column are obtained from multiplying the
appropriate percentage in Column 4 by 10
3
from Column 2.
Based on the table, we can now more realistically calculate the probability of
system failure from single causes, considering now only those failure modes which
are critical. Adding up the critical column, Column 5, we obtain probability of
114 FAULT TREE HANDBOOK
system failure = 5 x 10 5 + 5 x 10 5 + 5 x 10 5 + 5 x 105 = 2 x 104. This is
a less conservative result compared to 2 x 10 3 obtained from the parts count
method where the critiCal failure modes were not separated. The difference between
the two system results can be large, i.e., an order of magnitude or more, as in our
example, if the critical failure modes are a small percentage of the total failure modes
(e.g., 10% or less).
In FMEA (and its variants) we can identify, with reasonable certainty, those
component. failures having "noncritical" effects, but the number of possible
component failure modes that can realistically be considered is limited. Conservatism
dictates that unspecified failure modes and questionable effects be deemed "critical"
(as in the previous example). The objectives of the analysis are to identify single
failure modes and to quantify these modes; the analysis needs be no more elaborate
than is necessary for these objectives.
4. Failure Mode Effect and Criticality Analysis (FMECA)
Failure Mode Effect and Criticality Analysis (FMECA), is essentially similar to a
Failure Mode and Effects Analysis in which the criticality of the failure is analyzed in
greater detail, and assurances and controls are described for limiting the likelihood of
such failures. Although FMECA is not an optimal method for detecting hazards, it is
frequently used in the course of a system safety analysis. The four fundamental
facets of such an approach are (1) Fault Identification, (2) Potential Effects of the
Fault, (3) Existing or Projected Compensation and/or Control, and (4) Summary of
Findings. These four facets generally appear as column headings in an FMECA
layout. Column 1 identifies the possible hazardous condition. Column 2 explains
why this condition is a problem. Column 3 describes what has been done to
compensate for or to control the condition. Finally, Column 4 states whether the
situation is under control or whether further steps should be taken.
 .
At this point the reader should be warned of a most hazardous pitfall that is
present to a greater or lesser extent in all these inductive techniques: the potential of
mistaking form for substance. If the p r o j e ~ t becomes simply a matter of fIlling out
forms instead of conducting a proper analysis, the exercise will be completely futile.
For this reason it' might be better for the analyst not to restrict himself to any
prepared formalism. Another point: if the system is at all complex, it is foolhardy for
a single analyst to imagine that he alone can conduct a correct and comprehensive
survey of all system faults and their effects on the system. These techniques call for a
wellcoordinated team approach.
5. Preliminary Hazard Analysis (PHA)
The techniques described so far have been, for the most part, system oriented, i.e.,
the effects are faults on the system operation. The subject of this section Preliminary
Hazard Analysis (PHA), is a method for assessing the potential hazards posed, to
plant personnel and other humans, by the system.
The objectives of a PHA are to identify potential hazardous conditions inherent
within the system and to determine the significance or criticality of potential
accidents that might arise. A PHA study shol,lld be conducted as early in the product
development stage as possible. This will permit the early development of design and
procedural safety requirements for controlling these hazardous conditions, thus
eliminating costly design changes later on.
OVERVIEW OF INDUCTNE METHODS 115
The first step in a PHA is to identify potentially hazardous elements or
components within the system. This process is facilitated by engineering experience,
the exercise of engineering judgment, and the use of numerous checklists that have
been developed from time to time. The second step in a PHAis the identification of
those events that could possibly transform specific hazardous conditions into
potential accidents. Then the seriousness of these potential accidents is assessed to
determine whether preventive measures should be taken.
Various columnar formats have been developed to facilitate the PHA process.
Perhaps the simplest goes something like this:
Column (1 )Component/subsystem and hazard modes
Column (2)Possible effects
Column (3)Compensation and control
Column (4)Findings and remarks
6. Fault Hazard Analysis (FHA)
Another method, Fault Hazard Analysis (FHA), was developed as a special
purpose tool for use on projects involving many organizations, one of whom is
supposed to act as integrator. This technique is especially valuable for detecting
faults that cross organizational interfaces. It was first used to good purpose in the
Minuteman III program.
A typical FHA form uses several columns as follows:
Column (1 )Component identification
Column (2)Failure probability
Column (3)Failure modes (identify all possible modes)
Column (4)Percent failures by mode
Column (5)Effect of failure (traced up to some relevant interface)
Column (6)Identification of upstream component that could command or
initiate the fault in question
Column (7)Factors that could cause secondary failures (including threshold
levels). This column should contain a listing of those operational or environmental
variables to which the component is sensitive.
Column (8)Remarks
The FHA is generally like an FMEA or FMECA with the addition of the extra
information given in Columns 6 and 7.
As will become apparent in later chapters, Columns 6 and 7 have special
significance for the fault tree analyst.
7. Double Failure Matrix (DFM)
The previous techniques concerned themselves with the effects of single failures.
An inductive technique that also considers the effects of double failures is the
Double Failure Matrix (DFM); its use is feasible only for relatively noncompiex
systems. In order to illustrate its use, we must first discuss various ways in which
faults may be categorized. A basic categorization which is related to that given in
MIL STn 882 and modified for these discussions is as shown in Table 112.
116 FAULT TREE HANDBOOK
Table 112. Fault Categories and Corresponding System Effects
Fault Category Effect on System
I
Negligible
II
Marginal
III
Critical
IV Catastroph ic
It is desirable to give more complete definitions of the system effects:
(I) Negligibleloss of function that has no effect on system.
(II) Marginalthis fault will degrade the system to some extent but will not cause
the system to be unavailable; for example, the loss of one of two redundant pumps,
either of which can perform a required function.
(III) Criticalthis fault will completely degrade system performance; for exam
ple, the loss of a component which renders a safety system unavailable.
(IV) Catastrophicthis fault will produce severe consequences which can involve
injuries or fatalities; for example, catastrophic pressure vessel failure.
The categorization will depend on the conditions assumed to exist previously, and
the categorizations can change as the assumed conditions change. For example, if one
pump is assumed failed, then the failure of a second redundant pump is a critical
failure.
The above crude categorizations can be refined in many ways. For example, on
the NERVA project, six fault categories were defined as shown in Table 113.
Table II3. Fault Categories for NERVA Project
Fault Category Effect on System
I
Negligible
IIA
A second fault event causes a transition into Category III
(Critical)
liB
A second fault event causes a transition into Category IV
(Catastrophic)
IIC
A system safety problem whose effect depends upon the
situation (e.g., the failure of all backup onsite power sources,
which is no problem as long as primary, offsite power service
remains on)
III
A critical failure and mission must be aborted
IV A catastrophic failure
OVERVIEW OF INDUCTIVE METHODS 117
To illustrate the concept of DFM, consider the simple subsystem shown in Figure
112. In this figure, the block valves can operate only as either fully open or fully
closed, whereas the control valves are proportional valves which may be partially
open or partially closed.
BLOCK VALVE A CONTROL VALVE A
BVA
BVB
eVA
CVB
BLOCK VALVE B CONTROL VALVE B
Figure 112. Fuel System Schematic
Let us define two fault states for this system and categorize them as follows:
Fault State
No flow when needed
Flow cannot be shut off
Category
IV
III
We now proceed to consider all possible component failures and their fault
categories. For instance, if Block Valve A (BVA) is failed open we have Category IIA
because, if Control Valve A (CVA) is also failed open, we cascade into Category III.
If BVA is failed closed we have Category lIB because, if either BVB or CVB is also
failed closed, we cascade into Category IV. This type of analysis is conveniently
systematized in the Double Failure Matrix shown in Table II4. _
For illustrative purposes we have filled in the entire matrix; for a firstorder
analysis we would be concerned only with the I1;1ain diagonal terms, to wit, the single
failure states. Note that if BVA is failed open, there is only one way in which a
second failure can cascade us into Category III; namely, CVA must be failed open
too. In contrast, if BVA is failed closed, we can cascade into Category IV if either
BVB or CVB is also failed closed which is why "Two Ways" is given in Table II4.
Similar considerations apply to the single failures of CVA, BVB and CVB and this
important additional information has been displayed in the principal diagonal cells of
the matrix.
Now concentrating only on single failures, we can conduct a hazard category
count as the following table shows:
Hazard Category
IIA
lIB
Number of Ways
of Occurring
4
8
118
FAULT TREE HANDBOOK
Table 114. Fuel System Double Failure Matrix
BVA CVA BVB CVB
Open Closed Open Closed Open Closed Open Closed
(One Way)
X
III liB IIA
IIA or
IIA IIA Open
IIA liB
BVA
X
(Two Ways)
liB liB
IIA or
IV
IIA or
l IV Closed
liB liB liB
liB
(One Wayl
X
IIA
IIA or
IIA
IIA or
Open III
IIA liB liB
CVA
X
Closed liB liB
(Two Ways)
IIA IV
IIA or
IV
liB liB
IIA or
IIA
IIA or (One Wayl
X
III liB Open IIA
liB liB IIA
BVB
X
IIA or IIA or (Two Ways)
Closed
liB
IV
liB
IV
liB
liB liB
IIA or
IIA
IIA or
III liB
(One Way)
IX
Open IIA
liB liB IIA
CVB
X
Closed
IIA or IIA or
IV liB liB
(Two Waysl
liB
IV
liB liB
How can this information be used? One application would be a description and
subsequent review of how these hazard categories are controlled or are insured
against. Another application would be a comparison between the configuration of
valves shown in Figure II2 and an alternative design, for instance the configuration
shown in Figure 113.
FUEL
SUPPLY
BVA
BVB
eVA
eVB
MOTOR
Figure 113. Alternative Fuel System Schematic
For brevity let us refer to the system of Figure II2 as "Configuration I" and that
of Figure II3 as "Configuration II." For Configuration II we naturally define the
same system fault states as for Configuration I; namely, "no flow when needed" is
OVERVIEW OF INDUCTIVE METHODS
119
Category IV and "flow cannot be shut off' is Category III. We can now pose the
following question: "Which configuration is the more desirable with respect to the
relative number of occurrences of the various hazard categories that we have
defined?" The appropriate Double Failure Matrix for Configuration II is shown in
Table 115.
Table 115. Alternative Fuel System Double Failure Matrix
BVA CVA BVB CVB BVX
Open Closed Open Closed Open Closed Open Closed o C
Open
IIA
(One Way)
BVA
liB
Closed
(One Wayl
Open
IIA
(One Way)
CVA
liB
Closed
(One Way)
I
Open
IIA
(One Way)
BVB
liB
Closed
(One Way)
Open
IIA
(One Way)
CVB
liB
Closed
(One Way)
Open I
BVX
Closed I
In this case we have filled in only the principal diagonal cells which correspond to
single failure states. We see that if BVX is failed closed, Configuration II becomes
essentially identical to Configuration I, and if BVX is failed open, we have a pipe
connecting the two main flow channels.
Now, concentrating only on the single failure states, we can conduct another
hazard category count for Configuration II. The results are shown in the following
table:
Hazard Category
IIA
, lIB
Number of Ways
of Occurring
4
4
1110
FAULT TREE HANDBOOK
Comparing the two configurations, we see that they are the same from the
standpoint of cascading into Category III but that Configuration II has approxi
mately onehalf as many ways to cascade into Category IV. Therefore, using this
criterion, Configuration II is the better design. Where differences are not as obvious,
more formal analysis approaches may also be used for additional information (these
approaches will be discussed in the later sections).
8. Success Path Models
We have been and will be discussing failures. Instead of working in "failure space"
we can eqUivalently work in "success space." We give a brief example of the
equivalence and then return to our failure space approach.
Consider the configuration of two valves in parallel shown in Figure 114. This
system may be analyzed either by a consideration of single failures (the probabilities
of multiple failures are deemed negligible) or by a consideration of "success paths."
Let us take up the former first. ,
System requirements are as follows:
(1) The operation involves two phases;
(2) At least one valve must open for each phase;
(3) Both valves must be closed at the end of each phase.
The two relevant component failure modes are: valve fails to open on demand,
and valve fails to close on demand. For purposes of the analysis, let us assume the
following numbers:
P (valve does not open) = 1 x 10
4
for each phase
P (valve does not close) = 2 x 10
4
for each phase
where the symbol "P" denotes probability. The valves are assumed to be identical.
1
Figure 114. Redundant Configuration of 2 Valves
The single failure analysis of the system can be tabulated as in Table II6.
OVERVIEWOF INDUCfIVE METHODS
Table 116. Single Failure Analysis for Redundant Valve Configuration
011
FAILURE FAILURE PROBABILITY OF
COMPONENT MODE EFFECT OCCURRENCE (F)
Valve # 1 Failure to open

Failure to close System 4 x 10
4
(either phase)
failure
Valve # 2 Failure to open

Failure to close System 4 x 10
4
(either phase)
failure
The system failure probability = 8 x 104.
Now let us see whether we can duplicate this result by considering the possible
successes. There are three identifiable success paths which we can specify botp.
verbally and schematically. If R
o
denotes "valve i opens successfully," and R
c
denotes "valve i closes successfully," and P (Path i) denotes the success probability
associated withthe ith success path, we have the following:
Path 1: Both valves function properly for both cycles.
P(path 1) = (RoR
e
J
4
Path 2: One valve fails to open on the first cycle but the other valve functions
properly for both cycles.
P(path 2) = 2(1  Ro)(RoRc)2
Path 3: One valve fails to open on the second cycle but the other valve
functions properly for both cycles.
P(path 3) = 2(1  RoXR
o
Rc)3
ll12
FAULT TREE HANDBOOK
Numerically, system reliability is given by
RSYSTEM = (RoRd
4
+2(1 Ro)(RoRc)2 +2(1 Ro)(RoRc)3
=0.99880027 + 0.00019988 +0.00019982
=0.99919997 '" 1  8 x 10
4
which is essentially the same result as before but it can be seen that the failure
approach is considerably less laborious.
9. Conclusions
Although the various inductive methods that we have discussed can be elaborated
to almost any desirable extent, in actual practice they generally play the role of
"overview" methods and, on many occasions, this is all that is necessary. For any
reasonably complex system, the identification of all component failure modes will be
a laborious, and probably unnecessary, process. Worse yet, the identification of all
possible combinations of component failure modes will be a truly Herculean task. In
general, it is a waste of time to bother with failure effects (single or in combination)
that have little or no effect on system operation or whose probabilities of occurrence
are entirely negligible. Thus, in all of these analyses the consequences of a certain
event must be balanced against its likelihood of occurrence.
CHAPTER III  FAULT TREE ANALYSIS
BASIC CONCEPTS
1. Orientation
In Chapter I we introduced two approaches to system analysis: inductive and
deductive. Chapter II described the major inductive methods. Chapter III presents
the basic concepts and definitions necessary for an understanding of the deductive
Fault Tree Analysis approach, which is the subject of the remainder of this text.
2. Failure vs. Success Models
The operation of a system can be considered from two standpoints: we can
enumerate various ways for system success, or we can enumerate various ways for
system failure. We have already seen an example of this in Chapter II, section 8.
Figure 1111 depicts the Failure/Success space concept.
SUCCESS SPACE
MAXIMUM
AI\ITlCIPATED
SUCCESS
MINIMUM
ANTICIPATED
SUCCESS
MINIMUM
ACCEPTABLE
SUCCESS
TOTAL
SUCCESS
________+ + 1
COMPLETE
FAILURE
MAXIMUM
TOLERABLE
FAILURE
MAXIMUM
ANTICIPATED
FAILURE
MINIMUM
ANTICIPATED
FAILURE
FAILURE SPACE
Figure IIII. The Failure SpaceSuccess Space Concept
It is interesting to note that certain identifiable points in success space coincide
with certain analogous points in failure space. Thus, for instance, "maximum
anticipated success" in success space can be thought of as coinciding with "minimum
anticipated failure" in failure space. Although our first inclination might be to select
the optimistic view of our systemsuccessrather than the pessimistic onefailure,
we shall see that this is not necessarily the most advantageous one.
From an analytical standpoint, there are several overriding advantages that accrue
to the failure space standpoint. First of all, it is generally easier to attain concurrence
on what constitutes failure than it is to agree on what constitutes success. We may
desire an airplane that flies high, travels far without refueling, moves fast and carries
a big load. When the final version of this aircraft rolls off the production line, some
of these features may have been compromised in the course of making the usual
III1
III2
FAULT TREE HANDBOOK
tradeoffs. Whether the vehicle is a "success" or not may very well be a matter of
controversy. On the other hand, if the airplane crashes in flames, there will be little
argument that this event constitutes system failure.
"Success" tends to be associated with the efficiency of a system, the amount of
output, the degree of usefulness, and production and marketing features. These
characteristics are describable by continuous variables which are not easily modeled
in terms of simple discrete events, such as "valve does not open" which characterizes
the failure space (partial failures, Le., a valve opens partially, are also difficult events
to model because of their continuous possibilities). Thus, the event "failure," in
particular, "complete failure," is generally easy to define, whereas the event,
"success," may be much more difficult to tie down. This fact makes the use of
failure space in analysis much more valuable than the use of success space.
Another point in favor of the use of failure space is that, although theoretically
the number of ways in which a system can fail and the number of ways in which a
system can 'succeed are both infinite, from a practical standpoint there are generally
more ways to success than there are to failure. Thus, purely from a practical point of
view, the size of the population in failure space is less than the size of the population
in success space. In analysis, therefore, it is generally more efficient to make
calculations on the basis of failure space.
We have been discussing why it is more advantageous for the analyst to work in
failure space as opposed to success space. Actually all that is necessary is to
demonstrate that consideration of failure space allows the analyst to get his job done,
and this, indeed, has been shown many times in the past. The drawing of tree
diagrams for a complex system is an expensive and timeconsuming operation. When
failures are considered, it may be necessary to construct only one or two system
models such as fault trees, which cover all the significant failure modes. When
successes are considered, it may become necessary to construct several hundred
system models covering various definitions of success. A good example of the
parsimony of events characteristic of failure space is the Minuteman missile analysis.
Only three fault trees were drawn corresponding to the three undesired events:
inadvertent programmed launch, accidental motor ignition, and fault launch. It was
found that careful analysis of just these three events involved a complete overview of
the whole complex system.
To help fix our ideas, it may be helpful to subject some everyday occurrence (a
man driving to his office) to analysis in failure space (see Figure 1112).
The "mission" to which Figure 1112 refers is the transport of Mr. X by
automobile from his home to his office. The desired arrival time is 8:30, but the
mission will be considered marginally successful if Mr. X arrives at his office by 9:00.
Below "minimum anticipated failure" lie a number of possible incidents that
constitute minor annoyances, but which do not prevent Mr. X from arriving at the
desired time. Arrival at 9:00 is labeled "maximum anticipated failure." Between this
point and "minimum anticipated failure" lie a number of occurrences that cause Mr.
X's arrival time to be delayed half an hour or less. It is perhaps reasonable to let the
point "maximum tolerable failure" coincide with some accident that causes some
damage to the car and considerable delay but no personal injury. Above this point lie
incidents of increasing seriousness terminating in the ultimate catastrophe of death.
FAULT TREE ANALYSIS  CONCEPTS 1113
ARRIVES AT 8:30
(NO DIFFICULTIES WHATSOEVER)
WINDSHIELD WIPERS INOPERATIVE
(CLEAR WEATHER)
LOST HUBCAP
WINDSHIELD WIPERS INOPERATIVE
(LIGHT RAIN)
TRAFFIC CONGESTION
WINDSHIELD WIPERS INOPERATIVE
(HEAVY RAIN)
TRAFFIC JAM
ACCIDENT
(DEATH OR CRIPPLING INJURY)
FLAT TIRE
MINOR ACCIDENT
MINIMUM ANTICIPATED AT 8AS
TOTAL
MAXIMUM ANTICIPATED AT 9:00
MAXIMUM TOLERABLE FAILURE
COMPLETE FAILURE
Figure III2. Use of Failure Space in Transport Example
Note that an event such as "windshield wipers inoperative" will be positioned along
the line according to the nature of the environment at that time.
A chart such as Figure 1112 might also be used to pinpoint events in, for example,
the production of a commercial airliner. The point "minimum anticipated failure"
would correspond to the attainment of all specifications and points below that would
indicate that some of the specifications have been more than met. The point
"maximum anticipated failure" would correspond to some tradeoff point at which
all specifications had not been met but the discrepancies were not serious enough to
degrade the saleability of the airplane in a material way. The point "maximum
tolerable failure" corresponds to the survival point of the company building the
aircraft. Above that point, only intolerable catastrophes occur. Generally speaking,
Fault Tree Analysis addresses itself to the identification and assessment of just such
catastrophic occurrences and complete failures.
3. The Undesired Event Concept
Fault tree analysis is a deductive failure analysis which focuses on one particular
undesired event and which provides a method for determining causes of this event.
The undesired event constitutes the top event in a fault tree diagram constructed for
the system, and generally consists of a complete,or catastrophic failure as mentioned
above. Careful choice of the top event is important to the success of the analysis. If it
is too general, the analysis become unmanageable; if it is too specific, the analysis
does not provide a sufficiently broad view of the system. Fault tree analysis can be
an expensive and timeconsuming exercise and its cost must be measured against the
cost associated with the occurrence of the relevant undesired event.
1114 FAULT TREE HANDBOOK
We now give some examples of top events that might be suitable for beginning a
fault tree analysis:
(a) Catastrophic failure of a submarine while the submarine is submerged. In the
analysis we might separate "failure under hostile attack" from "failure under routine
operation."
(b) Crash of commercial airliner with loss of several hundred lives.
(c) No spray when demanded from the containment spray injection system in a
nuclear reactor.
(d) Premature fullscale yield of a nuclear warhead.
(e) Loss of spacecraft and astronauts in the space exploration program.
(f) Automobile does not start when ignition key is turned.
4. Summary
In this chapter we have discussed the "failure space" and "undesired event"
concepts which underlie the fault tree approach. In the next chapter we will define
Fault Tree Analysis and proceed to a careful definition of the gates and fault events
which constitute the building blocks of a fault tree.
CHAPTER IV  THE BASIC ELEMENTS OF A
FAULT TREE
I . The Fault Tree Model
A fault tree analysis can be simply described as an analytical technique, whereby
an undesired state of the system is specified (usually a state that is critical from a
safety standpoint), and the system is then analyzed in the context of its environment
and operation to find all credible ways in which the undesired event can occur. The
fault tree itself is a graphic model of the various parallel and sequential combinations
of faults that will result in the occurrence of the predefined undesired event. The
faults can be events that are associated with component hardware failures, human
errors, or any other pertinent events which can lead to the undesired event. A fault
tree thus depicts the logical interrelationships of basic events that lead to the
undesired eventwhich is the top event of the fault tree.
It is important to understand that a fault tree is not a model of all possible system
failures or all possible causes for system failure. A fault tree is tailored to its top
event which corresponds to some particular system failure mode, and the fault tree
thus includes only those faults that contribute to this top event. Moreover, these
faults are not exhaustivethey cover only the most credible faults as assessed by the
analyst.
It is also important to point out that a fault tree is not in itself a quantitative
model. It is a qualitative model that can be evaluated quantitatively and often is. This
qualitative aspect, of course, is true of virtually all varieties of system models. The
fact that a fault tree is a particularly convenient model to quantify does not change
the qualitative nature of the model itself.
A fault tree is a complex of entities known as "gates" which serve to permit or
inhibit the passage of fault logic up the tree. The gates show the relationships of
events needed for the occurrence of a "higher" event. The "higher" event is the
"output" of the gate; the "lower" events are the "inputs" to the gate. The gate
symbol denotes the type of relationship of the input events required for the output
event. Thus, gates are somewhat analogous to switches in an electrical circuit or two
valves in a piping layout. Figure IVI shows a typical fault tree.
2. SymbologyThe Building Blocks of the Fault Tree
A typical fault tree is composed of a number of symbols which are described in
detail in the remaining sections of this chapter and are summarized for the reader's
convenience in Table IVI.
PRIMARY EVENTS
The primary events of a fault tree are those events, which, for one reason or
another, have not been further developed. These are the events for which
IVI
IV2
TEST SIGNAL
REMAINS ON
KJCOll FOR
t"60SEC
FAULT TREE HANDBOOK
Figure IVI. Typical Fault Tree
probabilities will have to be provided if the fault tree is to be used for computing the
probability of the top event. There are four types of primary events. These are:
The Basic Event
o
The circle describes a basic initiating fault event that requires no further
development. In other words, the ci.rc1e signifies that the appropriate limit of
resolution has been reached.
BASIC ELEMENTS OF AFAULT TREE
Table IVI. Fault Tree Symbols
IV3
o
o
<>
o
o
PRIMARY EVENT SYMBOLS
BASIC EVENT  A basic initiating fault requiring no further develop
ment
CONDITIONING EVENT  Specific conditions or restrictions that
apply to any logic gate {used primarily with PR lOR lTV AND and
INHIBIT gates}
UNDEVELOPED EVENT  An event which is not further developed
either because it is of insufficient consequence or because infor
mation is unavailable
EXTERNAL EVENT  An event which is normally expected to occur
INTERMEDIATE EVENT SYMBOLS
INTERMEDIATE EVENT  A fault event that occurs because of one
or more antecedent causes acting through logic gates
GATE SYMBOLS
AND  Output fault occurs if all of the input faults occur
OR  Output fault occurs if at least one of the input faults occurs
EXCLUSIVE OR  Output fault occurs if exactly one of the input
faults occurs
PRIORITY AND  Output fault occurs if all of the input faults occur
in a specific sequence {the sequence is represented by a CONDI
TIONING EVENT drawn to the right of the gate}
INHIBIT  Output fault occurs if the (single) input fault occurs in the
presence of an enabling condition {the enabling condition is
represented by a CONDITIONING EVENT drawn to the right of
the gate}
TRANSFER SYMBOLS
TRANSFER IN  Indicates that the tree is developed further at the
occurrence of the corresponding TRANSFER OUT (e.g., on
another page)
TRANSFER OUT  Indicates that this portion of the tree must be
attached at the corresponding TRANSF ER IN
IV4
The Undeveloped Event
FAULT TREE HANDBOOK
The diamond describes a specific fault event that is not further developed, either
because the event is of insufficient consequence or because information relevant to
the event is unavailable.
The Conditioning Event
0
The ellipse is used to record any conditions or restrictions that apply to any logic
gate. It is used primarily with the INHIBIT and PRIORITY ANDgates.
The External Event
c
The house is used to signify an event that is normally expected to occur: e.g., a
phase change in a dynamic system. Thus, the house symbol displays events that are
not, of themselves, faults.
INTERMEDIATE EVENTS
o
An intermediate event is a fault event which occurs because of one or more
antecedent causes acting through logic gates. All intermediate events are symbolized
by rectangles.
GATES
There are two basic types of fault tree gates: the ORgate and the ANDgate. All
other gates are really special cases of these two basic types. With one exception, gates
are symbolized by a shield with a flat or curved base.
The ORGate
The ORgate is used to show that the output event occurs only if one or more of
the input events occur. There may be any number of input events to an ORgate.
BASIC ELEMENTS OF AFAULT TREE NS
Figure IV2 shows a typical twoinput ORgate with input events A and Band output
event Q. Event Q occurs if A occurs, Boccurs, or both A and Boccur.
OUTPUT a
INPUT A INPUT B
Figure IV2. The ORGate
It is important to understand that causality never passes through an ORgate. That
is, for an ORgate, the input faults are never the causes of the output fault. Inputs to
an ORgate are identical to the output but are more specifically defined as to cause.
Figure IV3 helps to clarify this point.
VALVE IS
FAILED
CLOSED
VALVE IS
CLOSED DUE
TO HARDWARE
FAILURE
VALVE IS
CLOSED DUE
TO HUMAN
ERROR
VALVE IS
CLOSED DUE
TO TESTING
Figure IV3. Specific Example of the ORGate
Note that the subevents in Figure N3 can be further developed; for instance, see
Figure N4.
IV6
VALVE IS
NOT OPENED
FROM LAST
TEST
FAULT TREE HANDBOOK
VALVE IS
CLOSED DUE
TO HUMAN
ERROR
VALVE IS
INADVERTENTl Y
CLOSED DURING
MAINTENANCE
However, the event
Figure IV4. ORGate for Human Error
VALVE IS
INADVERTENTLY
CLOSED DURING
MAINTENANCE
is still a restatement of the output event of the first ORgate
VALVE IS
FAILED
CLOSED
with regard to a specific cause.
One way to detect improperly drawn fault trees is to look for cases in which
causality passes through an ORgate. This is an indication of a missing ANDgate (see
following definition) and is a sign of the use of improper logic in the conduct of the
analysis.
The ANDGate
The ANDgate is used to show that the output fault occurs only if all the input
faults occur. There may be any number of input faults to an ANDgate. Figure IVS
shows a typical twoinput ANDgate with input events A and B, and output event Q.
Event Q occurs only if events A and Bboth occur.
BASIC ELEMENTS OF A FAULT TREE
OUTPUT Q
INPUT A INPUT B
IV?
Figure IVS. The ANDGate
In contrast to the ORgate the ANDgate does specify a causal relationship
between the inputs and the output, Le., the input faults collectively represent the
cause of the output fault. The ANDgate implies nothing whatsoever about the
antecedents of the input faults. An example of an ANDgate is shown in Figure IV6.
A failure of both diesel generators and of the battery will result in a failure of all
onsite DC power.
ALL ONSITE
DC POWER IS
FAILED
1
~ I I
DIESEL DIESEL
BATTERY
GENERATOR 1 GENERATOR 2
IS FAILED
IS FAILED IS FAILED
Figure IV6. Specific Example of an ANDGate
When describing the events input to an ANDgate, any dependencies must be
incorporated in the event definitions if the dependencies affect the system logic.
Dependencies generally exist when the failure "changes" the system. For example,
when the first failure occurs (e.g., input A of Figure IVS), the system may
automatically switch in a standby unit. The second failure, input B of Figure IVS, is
IV8 FAULT TREE HANDBOOK
now analyzed with the standby unit assumed to be in place. In this case, input Bof
Figure IV5 would be more precisely defined as "input B given the occurrence of A."
 .
The variant of the ANDgate shown in Figure IV? explicitly shows the
dependencies and is useful for those situations when the occurrence of one of the
faults alters the operating modes and/or stress levels in the system in a manner
affecting the occurrence mechanism of the other fault.
Q OCCURS
A OCCURS
8 OCCURS GIVEN
THE OCCURENCE
OF A
BOCCURS
A OCCURS GIVEN
THE OCCURENCE
OF B
Figure IV? ANDGate Relationship with Dependency Explicitly Shown
That is, the subtree describing the mechanisms or antecedent causes of the event
I
A OCCURS
I
will be different from the subtree describing the mechanisms for the event.
I
A OCCURS
GIVEN THE
OCCURRENCE
OF B
I
For multiple inputs to an ANDgate with dependencies affecting system logic among
the input events, the "givens" must incorporate all preceding events.
The INHIBITGate
BASIC ELEMENTS OF AFAULT TREE IV9
The INHIBITgate, represented by the hexagon, is a special case of the ANDgate.
The output is caused by a single input, but some qualifying condition must be
satisfied before the input can produce the output. The condition that must exist is
the conditional input. A description of this conditional input is spelled out within an
ellipse drawn to the right of the gate. Figure IV8 shows a typical INHIBITgate with
input A, conditional input B and output Q. Event Q occurs only if input A occurs
under the condition specified by input B.
OUTPUT a
INPUT A
Figure IV8. The INHIBITGate
To clarify this concept, two examples are given below and are illustrated in Figure
IV9.
(a) Many chemical reactions go to completion only in the presence of a catalyst.
The catalyst does not participate in the reaction, but its presence is necessary.
(b) If a frozen gasoline line constitutes an event of interest, such an event can
occur only when the temperature T is less than Tcritical' the temperature at which
the gasoline freezes. In this case the output event would be "frozen gasoline line,"
the input event would be "existence of low temperature," and the conditional input
would be "T <Tcritical'"
CHEMICAL
REACTION
GOES TO
COMPLETION
ALL
REAGENTS
PRESENT
FROZEN
GASOLINE
LINE
EXISTENCE
OFLOW
TEMPERATURE T
Figure IV9. Examples of the INHIBITGate
IVIO FAULT TREE HANDBOOK
Occasionally, especially in the investigation of secondary failures (see Chapter V),
another type of INHIBITgate depicted in Figure IVlO is used.
OUTPUT Q
CONDITION A
Figure IVIO. An Alternative Type of INHIBITGate
In Figure IVlO, condition A is the necessary, but not always sufficient, single
cause of output Q; i.e., for Q to occur we must have A, but just because A occurs it
does not mean that Q follows inevitably. The portion of time Q occurs when A
occurs is given in the conditional input ellipse.
The gates we have described above are the ones most commonly used and are now
standard in the field of fault tree analysis. However, a few other special purpose gates
are sometimes encountered.
.The EXCLUSIVE ORGate
The EXCLUSIVE ORgate is a special case of the ORgate in which the output
event occurs only if exactly one of the input events occur. Figure IVII shows two
alternative ways of depicting a typical EXCLUSIVE ORgate with two inputs.
A
o
OR
A
o
Figure IVII. The EXCLUSIVE ORGate
BASIC ELEMENTS OF AFAULT TREE
IVII
The exclusive OR differs from the usual or inclusive OR in that the situation
where both input events occur is precluded. Thus, the output event Qoccurs if A
occurs or B occurs, but not if both A and B occur. As we will see later in Chapter
VII, the quantitative difference between the inclusive and exclusive ORgates is
generally so insignificant that the distinction is not usually necessary. In those special
instances where the distinction is significant, it can be accounted for in the
quantification phase.
The PRIORITY ANDGate
The PRIORITY ANDgate is a special case of the ANDgate in which the output
event occurs only if all input events occur in a specified ordered sequence. The
sequence is usually shown inside an ellipse drawn to the right of the gate. In practice,
the necessity of having a specific sequence is not usually encountered. Figure IV·12
shows two alternative ways of depicting a typical PRIORITY ANDgate.
A
Q
OR
A
Q
Figure IVI2. The PRIORITY ANDGate
In Figure IVII, the output event Q occurs only if both input events A and B
occur with A occurring before B.
TRANSFER SYMBOLS
TRAN",,"N L
~ TRANSFER OUT
The triangles are introduced as transfer symbols and are used as a matter of
convenience to avoid extensive duplication in a fault tree. A line from the apex of
the triangle denotes a "transfer in," and a line from the side, a "transfer out." A
"transfer in" attached to a gate will link to its corresponding "transfer out." This
"transfer out," perhaps on another sheet of paper, will contain a further portion of
the tree describing input to the gate.
CHAPTER V  FAULT TREE CONSTRUCTION FUNDAMENTALS
In Chapter IV we defmed and discussed the symbols which are the building
blocks of a fault tree. In this chapter we cover the concepts which are needed for the
proper selection and defmition of the fault tree events, and thus, for the construction
of the fault tree.
1. Faults VS. Failures
We fIrst make a distinction between the rather specific word "failure" and the
more general word "fault." Consider a relay. If the relay closes properly when a
voltage is impressed across its terminals, we call this a relay "success." If, however,
the relay fails to close under these circumstances, we call this a relay "failure."
Another possibility is that the relay closes at the wrong time due to the improper
functioning of some upstream component. This is clearly not a relay failure;
however, untimely relay operation may well cause the entire circuit to enter an
unsatisfactory state. We shall call an occurrence like this a "fault" so that, generally
speaking, all failures are faults but not all faults are failures. Failures are basic
abnormal occurrences, whereas faults are "higher order" events.
Consider next a bridge that is supposed to open occasionally to allow the passage
of marine traffIc. Suddenly, without warning, one leaf of the bridge flips up a few
feet. This is not a bridge failure because it is supposed to open on command and it
does. However, the event is a fault because the bridge mechanism responded to an
untimely command issued by the bridge attendant. Thus, the attendant is part of this
"system," and it was his untimely action that caused the fault.
In one of the earliest battles of the American Civil War, General Beauregard sent a
message to one of his offIcers via mounted messenger #1. Some time later, the overall
situation having changed, he sent out an amended message via mounted messenger
#2. Still later, a further amended message was sent out via mounted messenger #3.
All messengers arrived but not in the proper sequence. No failure occurred, but such
an event could well have a deleterious effect on the progress of the battleand, in
this case, it did. Again, we have a fault event but not a failure event.
The proper defInition of a fault requires a specification of not only what the
undesirable component state is but also when it occurs. These "what" and "when"
specifications should be part of the event descriptions which are entered into the
fault tree.
2. Fault Occurrence vs. Fault Existence
In our discussion of the several varieties of fault tree gates, we have spoken of the
occurrence of one or more of a set of faults or the occurrence of all of a set of faults.
A fault may be repairable or not, depending on the nature of the system. Under
conditions of no repair, a fault that occurs will continue to exist. In a repairable
system a distinction must be made between the occurrence of a fault and its
existence. Actually this distinction is of importance only in fault tree quantification
(to be discussed in a later chapter). From the standpoint of constructing a fault tree
VI
V2
FAULT TREE HANDBOOK
we need concern ourselves only with the phenomenon of occurrence. This is
tantamount to considering all systems as nonrepairable.
3. Passive vs. Active Components
In most cases it is convenient to separate components into two types: passive and
active (also called quasistatic and dynamic). A passive component contributes in a
more or less static manner to the functioning of the system. Such a component may
act as a transmitter of energy from place to place (e.g., a wire or busbar carrying
current or a steam line transmitting heat energy), or it may act as a transmitter of
loads (e.g., a structural member). To assess the operation of a passive component, we
perform such tests as stress analysis, heat transfer studies, etc. Further examples of
passive components are: pipes, bearings, journals, welds, and so forth.
An active component contributes in a more dynamic manner to the functioning of
its parent system by modifying system behavior in some way. A valve which opens
and closes, for example, modifies the system's fluid flow, and a switch has a similar
effect on the current in an electrical circuit. To assess the operation of an active
component, we perform parametric studies of operating characteristics and studies of
functional interrelationships. Examples of active components are: relays, resistors,
pumps, and so forth.
A passive component can be considered as the transmitter of a "signal." The
physical nature of this "signal" may exhibit considerable variety; for example, it may
be a current or force. A passive component may also be thought of as the
"mechanism" (e.g., a wire) whereby the output of one active component becomes the
input to a second active component. The failure of a passive component will result in
the nontransmission (or, perhaps, partial transmission) of its "signal."
In contrast, an active component originates or modifies a signal. Generally, such a
component requires an input signal or trigger for its output signal. In such cases the
active component acts as a "transfer function," a term widely used in electrical and
mathematical studies. If an active component fails, there may be no output signal or
there may be an incorrect output signal.
As an example, consider a postman (passive component) who transmits a signal
(letter) from one active component (sender) to another (receiver). The receiver will
then respond in some way (provide an output) as a result of the message (input) that
he has received.
From a numerical reliability standpoint, the important difference between failures
of active components and failures of passive components is the difference in failure
rate values. As shown in WASH1400 [39], active component failures in general have
failure rates above 1x 10
4
per demand (or above 3 x 10
7
per hour) and passive
component failures in general have failure rates below these values. In fact, the
difference in reliability between the two types of components is, quite commonly,
two to three orders of magnitude.
In the above, the defmitions of active components and passive components apply
to the main function performed by the component; and failures of the active
component (or failures of the passive component) apply to the failure of that main
function. (We could have, for example, "passive" failure modes of active compo
nents, e.g., valve rupture, if we attempted to classify specific failure modes according
to the "active" or "passive" defmition.)
FAULT TREE CONSTRUCTION FUNDAMENTALS
4. Component Fault Categories: Primary, Secondary, and Command
V3
It is also useful for the fault tree analyst to classify faults into three categories:
primary, secondary and command. A primary fault is any fault of a component that
occurs in an environment for which the component is qualified; e.g., a pressure tank,
designed to withstand pressures up to and including a pressure Po, ruptures at some
pressure p ,,;;;; Po because of a defective weld.
A secondary fault is any fault of a component that occurs in an environment for
which it has not been qualified. In other words, the component fails in a situation
which exceeds the conditions for which it was designed; e.g., a pressure tank,
designed to withstand pressure up to and including a pressure Po, ruptures under a
pressure p > PO'
Because primary and secondary faults are generally component failures, they are
usually called primary and secondary failures. A command fault in contrast, involves
the proper operation of a component but at the wrong time or in the wrong place;
e.g., an arming device in a warhead train closes too soon because of a premature or
otherwise erroneous signal origination from some upstream device.
5. Failure Mechanism, Failure Mode, and Failure Effect
The definitions of system, subsystem, and component are relative, and depend
upon the context of the analysis. We may say that a "system" is the overall structure
being considered, which in turn consists of subordinate structures called "sub
systems," which in turn are made up of basic building blocks called "components."
For example, in a pressurized water reactor (PWR), the Spray Injection System
may consist of two redundant refueling spray subsystems which deliver water from
the refueling water storage tank to the containment. Each of these subsystems in
turn may consist of arrangements of valves, pumps, etc., which are the components.
In a particular analysis, defmitions of system, subsystem, and component are
generally made for convenience in order to give hierarchy and boundaries to the
problem.
In constructing a fault tree, the basic concepts of failure effects, failure modes,
and failure mechanisms are important in determining the proper interrelationships
among the events. When we speak of failure effects, we are concerned about why the
particular failure is of interest, Le., what are its effects (if any) on the system. When
we detail the failure modes, we are specifying exactly what aspects of component
failure are of concern. When we list failure mechanisms, we are considering how a
particular failure mode can occur and also, perhaps, what are the corresponding
likelihoods of occurrence. Thus, failure mechanisms produce failure modes which, in
turn, have certain effects on system operation.
To illustrate these concepts consider a system that controls the flow of fuel to an
engine. See Table VI. The subsystem of interest consists of a valve and a valve
actuator. We can classify various events which can occur as viewed from the system,
subsystem, or component level. Some of the events are given in the lefthand column
of the table below. For example, "valve unable to open" is a mechanism of
subsystem failure, a mode of valve failure, and an effect of actuator failure.
V4 FAULT TREE HANDBOOK
Table VI. Fuel Flow System Failure Analysis
Description of Event System Subsystem Valve Actuator
No flow from
subsystem Mechanism Mode Effect
when required
Valve unable Mechanism Mode Effect
to open
Binding of Mechanism Mode
actuator stem
Corrosion of Mechanism
actuator stem
To make the mechanismmodeeffect distinction clearer, consider a simple system
of a doorbell and its associated circuitry from the standpoints of the systems man, the
subsystems man, and the component designer. The system is shown schematically in
Figure VI.
BELL
R
. PUSHBUTTON
I
. ~ •
0"APPE
SWITCH
T , T .

BATTERY
••
SOLENOID
I
Figure VI. Doorbell and Associated Circuitry
From the viewpoint of the systems man, the system failure modes are:
(1) Doorbell fails to ring when thumb pushes button.
(2) Doorbell inadvertently rings when button is not pushed.
(3) Doorbell fails to stop ringing when push button is released.
If the systems man now sat down and made a list of the failure mechanisms
causing his failure modes, he would generate a list that corresponded to the failure
modes of the subystems man who actually procures the switch, bellsolenoid unit,
battery, and wires. These are:
(l) Switch  (a) fails to make contact (including an inadvertent open)
(b) fails to break contact
(c) inadvertently closes
(2) Bellsolenoid unitfails to ring when power is applied (includes failure to
continue ringing with power applied)
(3) Batterylow voltage condition
(4) Wireopen circuit or short circuit.
FAULT TREE CONSTRUCTION FUNDAMENTALS vs
It is emphasized again that this last list constitutes failure mechanisms for the
systems man and failure modes for the subsystems man. It is also a list of failure
effects from the standpoint of the component designer. Let us try to imagine what
sort of list the component designer would make. See Table V2.
Table V2. Doorbell Failure Analysis
Failure Effect Failure Mode Mechanism
Switch fails to
•
Contacts broken
•
Mechanical shock
make contact
•
High contact
•
Corrosion
resistance
Bellsolenoid unit
•
Clapper broken or
•
Shock
fails to ring not attached
•
Corrosion
•
Clapper stuck
•
Open circuit
•
Solenoid link in solenoid
broken or stuck
•
Short circuit
•
Insufficient magneto in solenoid
motive force
Low voltage
•
No electrolyte
•
Leak in casing
from battery
•
Positive pole
•
Shock
broken
From the table we see that the system failure modes constitute the various types
of system failure. In fault tree terminology these are the "top events" that the
system analyst can consider. He will select one of these top events and investigate the
immediate causes for its occurrence. These immediate causes will be the immediate
failure mechanisms for the particular system failure chosen, and will constitute
failures of certain subsystems. These latter failures will be failure modes for the
subsystems man and will make up the second level of our fault tree. We proceed, step
by step, in this "immediate cause" manner until we reach the component failures.
These components are the basic causes dermed by the limit of resolution of our tree.
If we consider things from the component designer's point of view, all of the
subsystem and system failures higher in the tree represent failure effectsthat is,
they represent the results of particular component failures. The component
designer's failure modes would be the component failures themselves. If the
component designer were to construct a fault tree, anyone of these component
failures could constitute a suitable top event. In other words, the designer's "system"
is the component itself. The lower levels of the designer's fault tree would consist of
the mechanisms or causes for the component failure. These would include quality
control effects, environmental effects, etc., and in many cases would be beneath the
limit of resolution of the system man's fault tree.
V ~ FAULT TREE HANDBOOK
6. The "Immediate Cause" Concept
Now getting back to the system analyst, he first, as we have seen, defmes his
system (i.e., determines, its boundary) and then selects a particular system failure
mode for further analysis. The latter constitutes the top event of the system analyst's
fault tree. He next determines the immediate, necessary, and sufficient causes for the
occurrence of this top event. It should be noted that these are not the basic causes of
the event but the immediate causes or immediate mechanisms for the event. This is
an extremely important point which will be clarified and illustrated in later
examples.
The immediate, necessary, and sufficient causes of the top event are now treated
as subtop events and we proceed to determine their immediate, necessary, and
sufficient causes. In so doing, we place ourselves in the position of the subsystems
man for whom our failure mechanisms are the failure modes; that is, our subtop
events correspond to the top events in the subsystem man's fault tree.
In this way we proceed down the tree continually transferring our point of view
from mechanism to mode, and continually approaching fmer resolution in our
mechanisms and modes, until ultimately, we reach the limit of resolution of our tree.
This limit consists of basic component failures of one sort or another. Our tree is
now complete.
As an example of the application of the "immediate cause" concept, consider the
simple system in Figure V2.
B
A
c
o
E
Figure V2. System Illustrating "Immediate Cause" Concept
This system is supposed to operate in the following way: a signal to A triggers an
output from A which provides inputs to B and C. Band C then pass a signal to 0
which finally passes a signal to E. A, B, C and 0 are dynamic subsystems.
Furthermore, subsystem 0 needs an input signal from either B or C or both to trigger
its output to E. We thus have redundancy in this portion of our system.
The system of Figure V2 can be interpreted quite generally. For example, it
\ could represent an electrical system in which the subsystems are analog modules
(e.g., comparators, amplifiers, etc.); it could be a piping system in which A, B, C and
o are valves; or it could represent a portion of the "chain of command" in a
corporation.
FAULT TREE CONSTRUCTION FUNDAMENTALS Y7
Let us choose for our top event the possible outcome "no signal to E," and let us
agree that in our analysis we shall neglect the transmitting devices (passive
components) which pass the signals from one subsystem to another. This is
tantamount to assigning a zero failure probability to the wires, pipes, or command
links.
We now proceed to a stepbystep analysis of the top event. The immediate cause
of the event, "no signal to E," is "no output from D." The analyst should strongly
resist the temptation to list the event, "no input to D" as the immediate cause of "no
signal to E." In the determination of immediate causes, one step should be taken at a
time. The "immediate cause" concept is sometimes called the "Think Small" Rule
because of the methodical, onestepatatime approach.
We have now identified the subtop event, "no output from D," and it is next
necessary to determine its immediate cause or causes. There are two possibilities:
(1) "There is an input to D but no output from D."
(2) "There is no input to D."
Therefore, om subtop event, "no output from D," can arise from the union of the
two events, I or 2. The reader should note that if we had taken more than one step
and had identified (improperly) the cause of "no input to D," then event 1 above
would have been missed. In fact, the motivation for considering immediate causes is
now clear: it provides assurance that no fault event in the sequence is overlooked.
We are now ready to seek out the immediate causes for our new mode failures,
events 1 and 2. If our limit of resolution is the subsystem level, then event 1 (which
can be rephrased, "D fails to perform its proper function due to some fault internal
to D") is not analyzed further and constitutes a basic input to our tree. With respect
to event 2, its immediate, necessary and sufficient cause is "no output from B and no
output from C," which appears as an intersection of two events, Le.,
1 = 3 and 4
where
3 = "no output from B," and
4 = "no output from C."
As a matter of terminology, it is convenient to refer to events as "faults" if they
are analyzed further (e.g., event 2). However, an event such as 1 which represents a
basic tree input and is not analyzed further is referred to as a "failure." This
terminology is also fairly consistent with the mechanistic definitions of "fault" and
"failure" given previously.
We must now continue the analysis by focusing our attention on events 3 and 4.
As far as 3 is concerned, we have
3 = 5 or 6
where
5 = "input to B but no output from B," and
6 = "no input to B." ...
We readily identify 5 as a failure (basic tree input). Event 6 is a fault which can be
analyzed further. We deal with event 4 in an analogous way.
The further steps in the analysis of this system can now be easily supplied by the
reader. The analysis will be terminated when all the relevant basic tree inputs have
been identified. In this connection, the event "no input to A" is also considered to
be a basic tree input.
Our analysis of the top event ("no input to E") consequently produced a linkage
of fault events connected by "and" and "or" logic. The framework (or system
V8
FAULT TREE HANDBOOK
model) on which this linkage is "hung" is the fault tree. The next section provides
the necessary details for connecting the fault event linkage to its framework (fault
tree).
At this juncture the reader may be interested in conducting a short analysis of his
own with reference to the doorbell circuit previously given in Figure VI. A suitable
top event would be "doorbell fails to ring when finger pushes button." The analysis
would commence with the statement:
DOORBELL
FAILS TO RING
BATTERY
DISCHARGED
OR
SOLENOID
NOT ACTIVATED
in which the event "battery discharged" represents a failure or basic tree input.
7. Basic Rules for Fault Tree Construction
The construction of fault trees is a process that has evolved gradually over a
period of about 15 years. In the beginning it was thought of as an art, but is was soon
realized that successful trees were all drawn in accordance with a set of basic rules.
Observance of these rules helps to ensure successful fault trees so that the process is
now less of an art and more of a science. We shall now examine the basic rules for
successful fault tree analysis.
Consider Figure V3. It is a simple fault tree or perhaps a part of a larger fault
tree. Note that none of the failure events have been ''written in"; they have been
designated just Q, A, B, C, D.
Q
A
B
c
Figure V3. A Simple Fault Tree
D
FAULT TREE CONSTRUCTION FUNDAMENTALS
V9
Now, when we have a specific problem in hand, it becomes necessary to describe
exactly what such events such as Q, A, B, C, D actually are, and the proper procedure
for doing this constitutes Ground Rule I:
Write the statements that are entered in the event boxes as faults; state
precisely what the fault is and when it occurs.
The "whatcondition" describes the relevant railed (or operating) state of the
component. The "whencondition" describes the condition of the systemwith
respect to the component of interestwhich makes that particular state of
existence of the component a fault.
Note that Ground Rule I may frequently require a fairly verbose statement. So
be it. The analyst is cautioned not to be afraid of wordy statements. Do not tailor
the length of your statement to the size of the box that you have drawn. If
necessary, make the box bigger! It is permissible to abbreviate words but resist the
temptation to abbreviate ideas. Examples of fault statements are:
(l) Normally closed relay contacts fail to open when EMF is applied to coil.
(2) Motor fails to start when power is applied.
The next step in the procedure is to examine each boxed statement and ask the
question: "Can this fault consist of a component failure?" This question and its
answer leads us to Ground Rule II:
If the answer to the question, "Can this fault consist of a component
failure?" is "Yes," classify the event as a "stateofcomponent fault." If
the answer is "No," classify the event as a "stateofsystem fault."
If the fault event is classified as "stateofcomponent," add an ORgate below the
event and look for primary, secondary and command modes. If the fault event is
classified as "stateofsystem," look for the minimum necessary and sufficient
immediate cause or causes. A "stateofsystem" fault event may require an ANDgate,
an ORgate, an INHIBITgate, or possibly no gate at all. As a general rule, when
energy originates from a point outside the component, the event may be classified as
"stateofsystem."
To illustrate Ground Rule II, consider the simple motorswitchbattery circuit
depicted in Figure V4.
SWITCH
I.
L
0
BATTERY
MOTOR
LOAD
Figure V4. Simple MotorSwitchBattery System
vtO FAULT TREE HANDBOOK
The system can exist in two states: operating and standby. The following faults
can be identified and classified using Ground Rule II:
OPERATING STATE
SYSTEM FAULT
Switch fails to close when thumb pressure is
applied.
Switch inadvertently opens when thumb
pressure is applied.
Motor fails to start when power is applied
to its terminals.
Motor ceases to run with power applied to
terminals.
STANDBY STATE
SYSTEM FAULT
Switch inadvertently closes with no thumb
pressure applied.
Motor inadvertently starts.
CLASSIFICATION
State of component
State of component
State of component
State of component
CLASSIFICATION
State of component
State of system
In addition to the above ground rules, there are a number of other pro
cedural statements that have been developed over the years. The first of these
is the No Miracles Rule:
If the nonnal functioning of a component propagates a fault se
quence, then it is assumed that the component functions nonnally.
We might find, in the course of a system analysis, that the propagation of a
particular fault sequence could be blocked by the miraculous and totally unexpected
failure of some component. The correct assumption to make is that the component
functions normally, thus allowing the passage of the fault sequence in question.
However, if the normal functioning of a component acts to block the propagation of
a fault sequence, then that normal functioning must be defeated by faults if the fault
sequence is to continue up the tree. Another way of stating this is to say that, if an
AND situation exists in the system, the model must take it into account.
Two other procedural statements address the dangers of not being methodical and
attempting to shortcut the analysis process. The first is the CompletetheGate Rule:
All inputs to a particular gate should be completely defined before
further analysis of anyone of them is undertaken.
FAULT TREE CONSTRUCTION FUNDAMENTALS
The second is the No GatetoGate Rule:
Gate inputs should be properly defined fault events, and gates should not
be directly connected to other gates.
Yll
The CompletetheGate Rule states that the fault tree should be developed in
levels, and each level should be completed before any consideration is given to a
lower level. With regard to the NoGatetoGate Rule, a "shortcut" fault tree is
shown in Figure V5.
Q
A B
Figure V5. A ShortCut Fault Tree
The "gatetogate" connection is indicative of sloppy analysis. The "gatetogate"
short cutting may be all right if a quantitative evaluation is being performed and the
fault tree is being summarized. However, when the tree is actually being constructed,
the gatetogate shortcuts may lead to confusion and may demonstrate that the
analyst has an incomplete understanding of the system. A fault tree can be successful
only if the analyst has a clear and complete understanding of the system to be
modeled.
CHAPTER VI  PROBABILITY THEORY:
THE MATHEMATICAL DESCRIPTION OF'EVENTS
I. Introduction
Having completed our discussion of fault tree fundamentals, we are almost ready
to begin some actual fault tree construction examples. However, because the
examples we will be discussing in Chapters VIII and IX include not only construction
but evaluation of the trees, we must first digress in Chapters VI and VII to cover
some basic mathematical concepts which underlie that evaluation.
Chapter VI addresses the basic mathematical technique involved in the quantita
tive assessment of fault trees: probability theory. Probability theory is basic to fault
tree analysis because it provides an analytical treatment of events, and events are the
fundamental components of fault trees. The topics of probability theory which we
shall consider include the concepts of outcome collections and relative frequencies,
the algebra of probabilities, combinatorial analysis, and some set theory. We begin
with the concept of outcome collections which can be conveniently described in
terms of a random experiment and its outcomes.
2. Random Experiments and Outcomes of Random Experiments
A random experiment IS defined to be any observation or series of observations in
which the possible result or results are nondeterministic. A result is deterministic if
it always occurs as the outcome of an observation; a result is nondeterministic ifit is
only one of a number of possibilities that may occur. Thus, if we toss an ordinary
coin and our purpose is to determine whether the coin will land Heads or Tails, we
are performing a random experiment, which is the flipping of the coin. If our coin is
"phony," however, and possesses, say, two Tails, we are not performing a random
experiment in the sense of the definition because the outcome, Tails, may be
expected on every toss. Similarly, the throw of a die constitutes a random
experiment unless the die is loaded to give exactly the same outcome on every trial.
The term "random experiment" is a very general one and the reader will be able to
supply innumerable examples, such as: the measurement of stiffness of a certain
spring, the time to failure of a motor, the measurement of the amount of iron in a
meteorite, etc. It is clear that many nontrivial observations of our environment
constitute random experiments.
A random experiment may be characterized by itemizing all its possible outcomes.
This is easily done in simple cases when the number of outcomes is not great, but can
also be done, at least in theory, when the number of outcomes is very great, perhaps
infinite. The itemization of the outcomes of a random experiment is known,
mathematically, as an outcome space but we believe that the term outcome
collection is somewhat more descriptive. The notation {E
1
, E
2
, ... , En} will be used
to denote the outcome collection of possible events E
1
, E
2
, ... ,En' The concept of
an outcome collection will now be illustrated by means of several simple examples.
VII
VI2 FAULT TREE HANDBOOK
(a) Random experimentone toss ofa fair coin.
Objectto determine whether coin lands Heads (H) or Tails (T).
Outcome collection: {T, H}.
Note that if there is present danger of the coin's disappearing into the depths of a
nearby crevasse, then this latter outcome (or event) should be added to the outcome
collection.
(b) Random experimenttoss of a coin onto a ruled surface.
Objectto determine the coordinates of the coin when it has come to rest.
Outcome collection: {(xl' YI)' (x2' Y2), (x3' Y3)""} where (Xl' YI)are the
Cartesian coordinates which locate the coin on the ruled surface. Note that this
outcome collection has an infinite number of elements.
(c) Random experimentstart of a diesel.
Objectto determine whether the diesel starts (S) or fails to start (F).
Outcome collection: {S, F }.
(d) Random experimentthe throw of a die.
Objectto determine what number is uppermost when the die comes to rest.
OlitCOme collection: {I, 2,3,4,5,6}.
(e) Random experimentthe operation of a diesel which has successfully started.
Objectto determine the time of failure (t) to the nearest hour.
Outcome collection: {t
l
, t
2
, ... }.
(f) Random experimentan attempt to close a valve.
Objectto determine if valve closes (C) or remains open (0).
Outcome collection: {C, O}.
If, in addition, we were to consider partial failure modes, the outcome collection
would contain such further events as "Valve closes less than halfway," "Valve closes,
then opens," etc.
(g) Random experimentthe operation of a system for some prescribed length of
time 1. The system includes two critical components, A and B, and will fail if either
or both of these fail. (A and Bare thus single failures.)
ObjectTo determine if (and how) the system fails by time t.
Outcome collection: {System does not fail,
System fails due to failure of A alone,
System fails due to failure of Balone,
System fails due to joint failure of A and B }.
Note that the outcome collection does not include the time of failure because we
are observing only whether the system fails or succeeds in some given time.
(k) Random experimentoperation of two parallel systems. If the two systems are
designated A and B, respectively, we can define F
i
=the event that system i fails and
0i =the event that system i does not fail (i =A, B).
Objectto determine the state of the two systems after a given time interval.
Outcome collection: {(FA,OB)' (OA,F
B
), (OA,OB)' (FA,F
B
)}·
In this example only the event (FA ,F
B
) constitutes overall system failure.
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI3
3. The Relative Frequency Definition of Probability
Consider some general random experiment with the outcome collection, E
1
,E
2
,E
3
,
... , En' Suppose we repeat the experiment N times and watch for the occurrence
of some specific outcome, say, El' After N repetitions or trials we find, by actual
count, that outcome E
1
has happened N1 times. We then set up the ratio,
which represents the relative frequency of occurrence of E
1
in exactly N repetitions
of this particular random experiment. The question now arises: does this ratio
approach some definite limit as N becomes very large ( N ~ ) ? If it does, we call the
limit the probability associated with event E
1
, in symbols, P(E
1
). Thus
peE ) = lim (N
1
)
1 N ~ o o N
Some obvious properties ofP(E
1
) arise from this definition:
0< P(E
1
) < 1
If P(E
1
) = I, E
1
is certain to occur.
IfP(E
1
) = 0, E
1
is impossible.
(VII)
A more formal definition of probability involves set theory; however, the
"empirical" definition given by Equation (VIl) is adequate here.
4. Algebraic Operations with Probabilities
Consider a random experiment and designate two of its possible outcomes as A
and B. Suppose that A and Bare mutually exclusive. This simply means that A and B
cannot both happen on a single trial of the experiment. For instance, we expect to
get either Heads or Tails as a result of a coin toss. We could not possibly get both
Heads and Tails on a single toss. If events A and B are mutually exclusive, we can
write down an expression for the probability that either A or B occurs:
peA or B) = peA) +PCB) (VI2)
This relation is sometimes referred to as the "addition rule for probabilities" and is
applicable to events that are mutually exclusive. This formula can be readily
extended to any number of mutually exclusive events A, B, C, D, E, ....
P(A or B or C or D or E) = peA) +PCB) +P(e) +P(D) +P(E) (VI3)
VI4 FAULT TREE HANDBOOK
For events which are not mutually exclusive a more general formula must be used.
For example, suppose that the random experiment is the toss of a single die and let
us define two events as follows:
A"the number 2 turns up"
B"an even number turns up"
Clearly these events are not mutually exclusive because if the result of the toss is 2,
both A and Bhave occurred. The general expression for peA or B) is now
peA or B) = peA) + PCB)  peA and B) (VI4)
If A and B are mutually exclusive, peA and B) = 0 and (VI4) reduces to (VI2). The
reader should also note that (VI2) always gives an upper bound to the true
probability (VI4) when the events are not mutually exclusive. Now, if we return to
our single die problem and define A and B as above, we can calculate peA or B)
numerically:
peA or B) = 1/6 + 1/2  1/6 = 1/2
Equation (IV4) can be extended to any number of events. For example, for 3
events A, B, C
peA or B or C) = peA) + PCB) + P(C)  peA and B)  peA and C)
 PCB and C) + peA and B and C)
For n events E
l
, E
2
, ..., En' the general formula can be expressed as:
n nl n
peEl or E
2
or ... or En) = L: P(E
i
)  L: L P(E
i
and E
j
)
i=1 i=1 j=i+l
(VI5)
n2 nl n
+ L L: L: P(E
i
and E
j
and E
k
) ...
i=1 j=i+1 k=j+1
(VI6)
where " ~ " is the summation sign.
If we ignore the possibility of any two or more of the events E
i
occurring
simultaneously, equation (VI6) reduces to
n
peEl or E
2
or ... or En) = L: P(E
i
)
i=1
(VI?)
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI5
Equation (VI7) is the socalled "rare event approximation" and is accurate to within
about ten percent of the true probability when P(E
j
) <0.1. Furthermore, any error
made is on the conservative side, in that the true probability is slightly lower than
that given by equation (VI7). The rare event approximation plays an important role
in fault tree quantification and is discussed further in Chapter XI.
Consider now two events A and B that are mutually independent. This means that
in the course of several repetitions of the experiment, the occurrence (or
nonoccurrence) of A has no influence on the subsequent occurrence (or non·
occurrence) of B and vice versa. If a wellbalanced coin is tossed randomly, the
occurrence of Heads on the first toss should not cause the probability of Tails on the
second toss to be any different from 1/2. So the results of successive tosses of a coin
are considered to be mutually independent outcomes. Also, if two components are
operating in parallel and are isolated from one another, then the failure of one does
not affect the failure of the other. In this case the failures of the components are
independent events. If A and B are two mutually independent events, then we can
write,
peA and B) =peA) PCB) (VI8)
This is often called the "multiplication rule for probabilities" and its extension to
more than two events is obvious.
peA and Band C and D) =peA) PCB) P(e) P(D) (VI·9)
Very often, we encounter events that are not mutually independent, that is, they
are mutually interdependent. For instance, the overheating of a resistor in an
electronic circuit may very well change the failure probability of a nearby transistor
or of related circuitry. The probability of rain on Tuesday will most likely be
influenced by the weather conditions prevailing on Monday. In order to treat events
of this nature, we introduce the concept of conditional probability, and we need a
new symbol: P(BIA) which is the probability of B, given A has already occurred. The
probability of A and B both occurring then becomes
peA and B) =peA) P(BIA) =PCB) P(AIB). (VIlO)
If A and B are mutually independent, then P(A/B) =peA) and PCBIA) =PCB) and
(VI·IO) reduces to (VI8). Thus Equation (VI·IO) constitutes a general expression for
the probability of the joint occurrence of two events A and B.
For three events A, B, C, we have
peA and B and C) == peA) P(B/A) P(CIA and B) (VI·II)
where P(CIA and B) is the probability of C occurring given A and B have already
occurred. For n events E
l
' E
2
, ..• En'
peEl and E
2
and ... En) =peEl) P(E
2
IE
l
) P(E
3
IE
I
and E
2
)
... PeEn lEI and E
2
... and EnI)
(VI·12)
VI6 FAULT TREE HANDBOOK
An example of the use of the formulae in this section may be helpful. Consider
the following random experiment: we select one card at random from a wellshuffled
regulation deck of 52. We note its face value (e.g., "seven," "Queen," etc.) and lay it
aside (i.e., we do not put it back in the deckthis is known as "sampling without
replacement"). We then choose a second card at random from the deck of 51 and
note its face value. We now calculate the probability of getting at least one Ace in the
two draws. There are three mutually exclusive possibilities:
(Ace on first draw then nonAce on second draw) or
(nonAce on first draw then Ace on second draw) or
(Ace on first draw then Ace again on second draw).
Expressed in mathematical language, this becomes much more succinct:
P (at least one Ace in two draws) =peA)
=peAl and A
2
) +peAl and A
2
) +peAl and A
2
)
= peAl) P(A
2
I
A
I
) +peAl) P(A
2
IA
I
) +peAl) P(A
2
IA
I
)
in which the subscripts refer to the order of the draw, A stands for "Ace" and A
stands for "nonAce." We can evaluate this expression numerically as follows:
(
4)(48) (48)(4) (4 )(3) 396 33
peA) = 52 51 + 52 51 + 52 51 =(52)(51)::: 221 ::: 0.149
Let us now calculate the probability of getting an Ace and a King in two draws.
peA and K) = peAl) P(K
2
IA
l
) +P(K
l
) P(A2IKl)
=Gi)(5
4
1) + ( 5 ~ ) (  5 4 1 ) =( 5 2 ~ ~ 5 1 ) =6:3 =0.012
We now note an important point. If .the events "getting an Ace" and "getting a
King" were independent, we would have peA and K) =peA) P(K) =(0.149)2 =0.022,
but we have just shown that peA and K) =0.012 =1= 0.022. Therefore, the events in
question are not independent. At this point the reader is invited to present an
argument that the events in question would have been independent if the first card
drawn had been replaced in the deck and the deck had been shuffled before the
second draw.
A result with important applications to fault tree analysis is the calculation of
the probability of occurrence of at least one of a set of mutually independent events.
Consider the set of mutually independent events,
and define the event E
l
as the nonoccurrence of E
l
, the event E
2
as the
nonoccurrence of E
2
, etc. Because a particular event must either occur or not occur,
we have
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI7
P(E
i
) + P(E
i
) =1 (VI. B)
P(E
i
) =1  P(E
i
) (VII4)
Now there are two possibilities with regard to the events El' E
2
, ... , En: at least
one E
i
occurs or none of the E
i
occur. Therefore,
P (at least one E
i
occurs)
=1  P (no E
i
occurs)
=1  P(E
l
and E
2
... and En)
We know that the E's are mutually independent and, for this reason, it is perhaps
intuitively obvious that E's must also be mutually independent. As a matter of fact,
this result can be easily proven. Therefore,
(VIIS)
But from Equation (VII4) this is the same as
(VII 6)
so that our final result is
P(E
1
or E
2
or E
3
or ... or En) = 1  {[I P(E
1
)] [1  P(E
2
)]
[1  P(E
3
)] ... [1 P(E
n
)]} (VII?)
In the especially simple case where P(E
1
) =P(E
2
) =... =P(E
n
) =p, the righthand
side of (VII?) reduced to 1  (1 p)n .
Our general result (VII?) finds application in fault tree evaluation. For example,
consider a system in which system failure occurs if anyone of the events E
1
,
E
2
, ... , En occurs, it being assumed that these events are mutually independent.
The probability of system failure is then given by (VI17). For example, the events
E
1
, ... , En may be failures of critical components ·of the system, where each
component failure will cause system failure. If the component failures are
independent, then Equation (VI17) is applicable. In the general case, the events E
1
,
E
2
, etc., represent the modes by which system failure (top event of the fault tree)
can occur. These modes are termed the minimal cut sets of the fault tree and if they
are independent, Le., no minimal cut sets have common component failures, then
Equation (VII?) applies. We shall discuss minimal cut sets in considerable detail in
later chapters.
To conclude this section we return to the concept of an outcome collection
because we are now in a position to define it in more detail. The elements of an
outcome collection possess several important characteristics:
(a) The elements of an outcome collection are all mutually exclusive.
VI8 FAULT TREE HANDBOOK
(b) The elements of an outcome collection are collectively exhaustive. This means
that we have included in the outcome collection every conceivable result of the
experiment.
(c) The elements of the outcome space may be characterized as being continuous
or discrete; e.g., the timestofailure for some systems are continuous events and the
52 possible cards when a card is drawn are discrete events.
5. Combinatorial Analysis
We now discuss combinatorial analysis because it allows us to easily evaluate the
probabilities of various combinations of events, such as failures of redundant
systems. As an introduction, we review the difference between "combination" and
"permutation." Consider a collection or set of four entities: {A, B, C, D}. Suppose we
make a random choice of three items from the four available. For example, we may
end up with the elements ABD. This group of three may be rearranged or permuted
in six different ways: ABD, ADB, BAD, BDA, DAB, DBA. Thus, there are six
rearrangements or permutations corresponding to the single combination ABD.
Briefly, when we talk about permutations, we are concerned with order; when we
talk about combinations, we are not concerned with order. Whether order is
important to us or not will depend on the specific nature of the problem. For a
certain redundant system to fail we may need only some particular number of
component failures without regard to the order in which they occur. For example, in
a twooutofthree logic system we may need any twooutofthree component
failures to induce system failure. In this case we are concerned with combinations of
events. However, when a particular sequence of events must occur, we are generally
concerned with permutations. For example, failure of containment spray injection
necessarily implies failure of containment spray recirculation and hence here we are
concerned with order, Le., injection, then recirculation.
We now seek a few simple rules that will enable us to count numbers of
combinations or permutations as desired. Consider the problem of choosing (at
random) a sample of size r from a population of size n. This process can be
conducted in two ways: (a) with replacement, and (b) without replacement. If we
sample with replacement, we put each element drawn back into the population after
recording its characteristics of interest. If we sample without replacement, we lay
each item aside after "measuring" it. The reader will recall that this concept was
mentioned briefly in our example involving the draw of two cards from a desk.
If we sample with replacement, how many different samples of size r are possible?
We have n choices for the first item, n choices for the second item, n choices for the
third item, and so on until a sample of size r is completed. Thus, under a replacement
policy there are n
r
possible samples of size r. Observe that under a replacement
policy duplication of items in the sample is possible.
If we sample without replacement, we have n choices for the first item, (nl)
choices for the second item, (n2) choices for the third item, (nr+1) choices for the
r
th
item. Thus, the total number of different samples of size r that can be drawn
from a population of size n without replacement is
(n) (nl) (n2) ... (n r+l) == (n)r (VII 8)
As shown in Equation (VI18), the special symbol (n)r is used for this product. The
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI9
symbol (n)r may be written in a more familiar way by using the following algebraic
device:
(n) (nl) (n2) ... (nr+l) (nr) (nrl) ... 3·2·1
(n)r= (nr)(nrl) ... 3.2.1
(VI19)
using the definition of the factorial numbers.
Equation (VI19) constitutes one of the fundamental relationships that we have
been seeking. It gives us the number of permutations of n things taken r at a time.
n!
When r =n, we get (n)n =O! = n! because O! == 1. Thus the number of ways of
rearranging n items among themselves is just n!.
If we are interested in combinations and not permutations, we do not desire to
count the r! ways in which the r items of the sample can be rearranged among
themselves. Thus the number of combinations of n things taken r at a time is given
by
n! _ (n)
(nr)! r! = r
(VI·20)
where ( ~ ) is the symbol used to denote this quantity. If we choose at random three
elements from the population {A, B, C, D}, we have 1~ 3 ! = 4 possible combinations
and they are ABC, ABD, ACD, and BCD. Each of these can be permuted in 6 ways so
that there should be a total of 24 permutations and this agrees with
n! 4!
(
_ )1 = 1' = 4 x 3 x 2 x 1 = 24
n r. .
Consider now a population of n items, p of which are all alike, q of which are all
alike, and r of which are all alike, with (p+q+r) = n. For example, we might have the
quantities p resistors from one manufacturer, q from the second manufacturer, and r
from a third. We can distinguish among manufacturers, but resistors from the same
manufacturer are treated as being indistinguishable. Thus, there will be some
permutations among the n! possible rearrangements that cannot be differentiated.
Specifically, rearrangements of the Manufacturer 1 resistors among themselves will
lead to no new distinguishable arrangements and similarly for Manufacturer 2 and
Manufacturer 3. Thus, the total number of distinct arrangements is given by
n!
p! q! r!
(VI21 )
In general, for n items, n1 of which are alike, n2 of which are alike, ... , and nk
of which are alike (n 1 +n2 + ... +nk = n), the number of distinct arrangements is
given by
VItO
n!
FAULT TREE HANDBOOK
(VI22)
An example will show how these expressions are used in the solution of a
problem. I.et us try to calculate the probability of being dealt a full house
(3ofakind and a pair) in ordinary 5card poker.
The total number of possible 5card poker hands is simply the number of
combinations of 52 things taken 5 at a time or
(
52) 52!
5 =47! 5!= N
pH
where the subscript PH stands for "poker hands." If we can find out how many of
N
these contain a full house N
FH
, then the ratio N
FH
will give us the probability of
PH
being dealt a full house. First let us determine the number NFH. We can write a full
house symbolically as XXXYY, where X and Y stand for any two of the 13card
categories 2, 3,4, ... , J, Q, K, A. In how many ways can the categories X, Y be
chosen? Clearly we have 13 possible choices for X and then 12 possible choices for Y
so that the product (13)(12) represents the total number of ways of selecting the
categories X and Y. Now there are 4 suits and the triplet XXX must be constituted in
some way from the 4 available X's. The number of ways of accomplishing this is (j)
=4. Similarly, the pair YY must be constituted in some way from the 4 available Y's
and the number of ways of doing this is (i) = 6. Consequently N
FH
=
(13)(12)(4)(6) and
_ NFH _ (13)(12)(4)(6) _ 6 . 1
P(Full House)  ~  (52)  4168 or approxImately 700
PH 5
For an example more pertinent to reliability analysis, consider a system consisting
of n similar components; the system fails if m out of n components fail. For
example, the system may consist of 3 sensors where 2 or more sensor failures are
required for system failure (2outof3 logic). The number of ways the system can fail
if m components fail is ( ~ ) , or the number of combinations of n items taken m at a
time. If any component has a probability p of failing, the probability of anyone
combination leading to system failure is
Thus the probability of system failure from m components failing is
In addition to m components failing, the system can also fail if m+I components
fail or if m+2 components fail, etc., up to all n components failing. The number of
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VIII
ways the system can fail from k component failures is ( ~ ) where k = m+I, m+2, ... ,
k. The probability for a particular combination of k failures is
k = m+1, m+2, ... , n.
Hence, the probability of system failure from k components failing is
k=m+l,m+2, ... ,n.
To obtain the total system failure probability we add up the probabilities for m
components failing, m+1 components failing, etc., and hence the total system failure
probability is
which can also be written as
n
L ( ~ ) pk (l_p)nk
k=m
The probabilities are examples of the binomial distribution which we shall discuss
later in more detail.
6. Set Theory: Application to the Mathematical Treatment of Events
As seen in the previous section, combinatorial analysis allows us to determine the
number of combinations pertinent to an event of interest. Set theory is a more
general approach which allows us to "organize" the outcome events of an experiment
to determine the appropriate probabilities. In the most general sense, a set is a
collection of items having some recognizable features in common so that they may
be distinguished from other things of different species. Examples are: prime
numbers, relays, scram systems, solutions of Bessel's equation, etc. Our application
of set theory involves a considerable particularization. The items of immediate
interest to us are the outcome events of random experiments and our development of
the elementary notions of set theory will be restricted to the event concept.
We can think of an event as a collection of elements. Consider, for example, the
following possible events of interest associated with the toss of a die:
Athe number 2 turns up
Bthe result is an even number
Cthe result is less than 4
Dsome number turns up
Ethe result is divisible by 7.
VI12 FAULT TREE HANDBOOK
Each of these events can be considered as a particular set whose elements are drawn
from the basic outcome collection of the experiment: {I, 2, 3, 4,5,6}.
We have:
A ={2}
B={2, 4, 6}
C={I,2,3}
D= {I,2,3,4,5,6}
E = ¢ (the null, void, or empty set),
where braces, "{ }," are used to denote a particular set and the quantities within the
braces are the elements of that set.
Event A can be represented as a set having a single elementthe number 2. Both B
and C can be represented as 3element sets. Event D contains all the possible results
of the experiment. It thus coincides with the outcome collection. Any such set that
contains all the outcomes of an experiment is referred to as the universal set and is
generally denoted by the symbol n or I (also sometimes by the number 1 when the
notation is informal). E is an impossible event and can be represented by a set
containing no elements at all, the socalled null set symbolized by ¢.
Returning to our diethrowing example, we note that the element "1" belongs to
C and D but not to A or B. This fact is symbolized as follows:
IEC, lED, ItfA, ItfB.
where the symbol "E" means "is an element of' and the symbol tf means "is not an
element of."
We also note that the elements of A, B, C are contained in set D. We call A, B, C
subsets of D and write ACD, BCD, CCD. Observe also that A is a subset of both B
and C. If X and Yare two sets such that X is a subset of Y, i.e., XCV and Y is a
subset of X, i.e., YCX, then X and Yare equal (i.e., they are the same set).
As another example, consider the time of failure t of a diesel (in hours) and
consider the sets
A = {t=O}
B ={t
i
, 0 <t :s;;; 1 }
C= {ti,t>I}
The failure to start of the diesel is represented by A, i.e., "zero failure time." B
represents times of failure which are greater than zero hours (i.e., the diesel started)
and less than or equal to one hour. C represents time of failure greater than 1 hour.
Each of these sets, i.e., events, could be associated with different consequences if an
abnormal situation existed (i.e., loss of offsite power).
There exists a graphical procedure that permits a simple visualization of the set
theoretic concept which is known as a Venn diagram. The universal set is
represented by some geometrical shape (usually a rectangle) and any subsets (events)
of interest are shown inside. Figure VII represents a Venn diagram for the previous
toss of a die example.
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI·13
D
Figure VII. Venn Diagram Representation of Sets
Operations on sets (events) can be defined with the help of Venn diagrams. The
operation of union is portrayed in Figure VI2.

. .......

I
"
,
..... ,
y
fx

I .
I
I
"
I
'
,
,
.......
"
,
"'
'"
~

Figure VI2. The Operation of Union
The union of two sets of X, Y is the set that contains all elements that are either in X
or in Y or in both, is written as XUY, and is indicated by the shaded area in Figure
VI2. Returning to the die example, the union of B, Cis written,
BUC= {1,2,3,4,6}.
Note that the word "or" translates into the symbol "U."
The operation of intersection is portrayed in Figure VI3. The intersection of two
sets X,Y is the set that contains all elements that are common to X and Y, is written
as xny, and is indicated by the shaded area in Figure VI3. In the die Example
EnC ={2 } =A. Note that the word "and" translates into the symbol "n."
The operation of complementation is portrayed in Figure VI4. The complement
of a set X is the set that contains all elements that are not in X, is written X(or X'),
and is indicated by the shaded area in Figure VI4. For the die example, the
complement of the set BUC is (BUC)' = (BUC) = {5 }.
VI14
FAULT TREE HANDBOOK
Q
Figure VI3. The Operation of Intersection
x
Figure VI4. The Operation of Complementation
There is another operation (unnamed) that is sometimes defined, but it is not
independent of the operations we have already given. The operation is illustrated in
Figure VI5. If we remove from set Yall elements that are common to both X and Y,
we are left with the shaded area indicated in Figure VI5. This process is occasionally
written (YX) but the reader can readily see that
(YX) =ynx'
Q
~   Y    I
Figure VI5. The Operation (YX)
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VIIS
so that the symbol (VX) is not needed and hence will not be further used.
As an example of the use of the settheoretic approach, consider a simple system
consisting of three components A, B, C. Let us use the symbols A, B, C not only to
designate the components themselves but also the events "component success" for A,
B, C, respectively. Thus, by A, 13, C, we shall understand the events "component
failure" for A, B, C, respectively. The symbol ABC will consequently be used to
represent the event, "A operates successfully, Bfails, and C operates successfully."
Because there are 3 components and 2 modes of operation for each component
(success/failure), combinatorial analysis tells us that there are 2
3
= 8 combinations
which give all system modes of failure or operation. Thus, the universal set (or
outcome collection) is given by:
n = {ABC, ABC, ABC, ABC, ABC, ABC, ABC ABC}.
Suppose that we have determined (perhaps by fault tree analysis) that our system
will fail if any two or more of its components fail. Then the events corresponding to
system failure are:
51 = ABC
52 = ABC
53 = ABC
54 = ABC
and the event, "system failure," S, can be represented by the subset
5 = 51 U 52 U53 U 54 = {ABC, ABC, ABC, ABC}. We have enumerated, in an
exhaustive manner, all the ways the system can fail. This information can be used in
various applications. For example, knowing the probability of component failures,
we can calculate the probability of system failure. In the same way we have done
above, we may take intersections and unions of any of the basic elements (simple
events) of the universal set to generate still other events which can be represented as
sets consisting of particular elements.
This approach of enumerating the outcome events is sometimes used in the
inductive methods of system analysis (according to which we enumerate all possible
combinations of component operation or nonoperation and determine the effects of
each possibility on system behavior). The socalled "matrix approaches" are of this
nature. If our system is relatively simple or if the "components" are relatively gross
(e.g., subsystems), these inductive approaches can be efficient, in that the number of
combinations in the universal set will be fairly small. We can also use an inductive
approach to determine which combination has the most serious consequences. This
latter event can then be more fully analyzed by means of fault tree analysis.
Using the set theory concepts we have developed, we can now translate
probability equations into set theoretic terms. For example,
peA or B) = peA) +PCB)  peA and B) becomes
P(AUB) = peA) +PCB)  P(A(lB). (VI23)
VI16
FAULT TREE HANDBOOK
The equation peA and B) =P(AIB) PCB) =P(BIA) P(A) becomes
p(AnB) = P(AIB) PCB) = P(BIA) peA). (VI24)
We have introduced a new mathematical entity, the set (or, more particularly, the
event). An algebra based on the defined operations of union, intersection, and
complementation is called a Boolean algebra. By using the basic operations of union,
intersection, and complementation, Boolean algebra allows us to express events in
terms of other basic events. In our fault tree applications, system failure can be
expressed in terms of the basic component failures by translating the fault tree to
equivalent Boolean equations. We can manipulate these equations to obtain the
combinations of component failures that will cause system failure (i.e., the minimal
cut sets) and we calculate the probability of system failure in terms of the
probabilities of component failures. We shall further pursue these topics in later
sections.
7. Symbolism
Boolean algebra, which is the algebra of events, deals with event operations which
are represented by various symbols. Unfortunately, set theoretic symbolism is not
uniform; the symbology differs among the fields of mathematics, logic, and
engineering as follows:
Operation Probability Mathematics Logic Engineering
Union of A and B AorB AUB AvB A+B
Intersection of
AandB A and B AnB AI\B A·B or AB
Complement of A not A A' or A A A' or A
The symbols used in mathematics and in logic are very similar. The logical
symbols are the older; in fact the symbol "v" is an abbreviation for the Latin word
"vel" which means "or." It is unfortunate that engineering has adopted "+" for "u"
and an implied multiplication for "n." This procedure "overworks" the symbols "+"
and ".". As an example of the confusion that might arise when "+" is used for U,
consider the expression
P(AUB) = peA) + PCB)  p(AnB).
If "+" is written for "u" on the lefthand side, we have an equation with "+"
meaning one thing on the left and a totally different thing on the right.
Despite these difficulties and confusing elements in the symbology, the
engineering symbology is now quite widespread in the engineering literature and any
expectation of a return to mathematical or logic symbols seems futile. In fault tree
analysis, use of the engineering notation is widespread, and, as a matter of fact, we
.shall use it later in this book. If, however, the reader is unacquainted with event
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI17
algebra, it is strongly recommended that he use the proper mathematical symbols
until attaining familiarity with this type of algebra. TItis will serve as a reminder that
set algebraic operations are not to be confused with the operations of ordinary
algebra where numbers, and not events, are manipulated.
8. Additional Set Concepts
We shall now proceed further with certain other set concepts which will illustrate
the difference between simple events and compound events. This will be useful
groundwork for some of the fault tree concepts to follow and will also lead to a more
rigorous definition of "probability."
Consider again the throw of a single die. The event A ={2} is a simple event; in
fact it constitutes an element of the outcome collection. In contrast, the events
B ={2, 4,6 } and C ={l, 2, 3 } are compound events. They do not constitute, per se,
elements of the outcome collection, even though they are made up of elements of
the outcome collection. B and C have an element in common; therefore, their
intersection is nonempty (i.e., they are not "disjoint" subsets or, in probability
language, they are not mutually exclusive). The elements of outcome collections are,
by definition, all mutually exclusive and, thus, all mutually disjoint.
Now compound events (like B and C) are generally the ones that are of
predominant interest in the real world and it is necessary, because they are not
included in the outcome collection, to define a mathematical entity that does include
them. Such a mathematical entity is called a class. A class is a set whose elements are
themselves sets and these elements are generated by enumerating every possible
combination of the members of the original outcome collection.
As an example, consider the 4element outcome collection S ={I, 3,5, 7}. If we
list every possible combination of these 4elements we shall generate the class ~
defined on the original set S as follows:
~ = {1}, {3}, {5}, {7}, {1,3}, {1,5},
{l, 7}, {3, 5}, {3, 7}, {5, 7}, {I, 3, 5},
{1, 5, 7 }, {l, 3, 7 }, {3, 5, 7 }, {I, 3, 5, 7 }, {¢}.
Notice that the null set ¢ is considered an element of the class to provide a
mathematical description of the "impossible event." If we count the number of
elements in the class §., we fmd 16 which is 2
4
where 4 is the number of elements in
the original set S. In general, if the original set has n elements, the corresponding
class* will have 2
n
elements.
The utility of the class concept is simply that the class will contain as elements,
every conceivable result (both simple and compound) of the experiment. Thus, in the
die toss experiments, S will have 6 elements and S. will have 2
6
=64 elements, two of
which will be B= {2, 4, 6} and C = {1, 2,3 }. In the throw of 2 dice, S will contain
36 elements and .§ will have 2
36
(a number in excess of 1010) elements. Embedded
somewhere in this enormous number of subsets we find the compound event
"sum =7" which can be represented in the following way:
E ={(1, 6), (2,5), (3, 4), (4, 3), (5,2), (6, 1)}.
*When certain appropriate mathematical restrictions are imposed, a class is often referred to,
in more advanced texts, as a Borel Field.
VII 8 FAULT TREE HANDBOOK
Consider again the simple threecomponent system A, B, C, where system failure
consists of any two or more components failing. In that example, the universal set
comprised eight elements, and these eight elements gave all modes of system failure
or operation. The class based on this set would contain 2
8
= 256 elements and would
include such events as:
"A operates properly": {ABC, ABC, ABC, ABC}
"Both Band C fail": {ABC, ABC}
"Five components fail": </>
"System fails": {ABC, ABC, ABC, ABC}
The reader should bear in mind that the elements of classes are sets. Thus, the event
ABC is an element of the original universal set, whereas {ABC} is a set containing
the single element ABC and is an element of the class generated from the universal
set. The utility of the class concept is that it enables us to treat compound events in a
formal manner simply because all possible compound events are included in the class.
Perhaps the most useful feature of the class concept is that it provides us with a
basis for establishing a proper mathematical definition of the probability function.
The set theoretic definition of the probability function is shown schematically in
Figure VI6.
s
o
Figure VI6. Set Theoretic Definition of Probability
PIE)
1
In Figure VI6, the box labeled "s" represents the outcome collection for some
random experiment. It could, for instance, represent the totality of the 36 possible
outcomes in the twodice experiment. The circle labeled " ~ " represents the class
generated from set S by enumerating all combinations of the elements of S. In the
twodice example the class § possesses an enormous number « 10
1
0) of elements.
These elements represent every conceivable outcome (simple and compound) of the
experiment. Specifically, the event E = "sum of 7" is a member of .§. It is shown
schematically in Figure VI6. Next the axis of real numbers between 0 and 1 is
drawn. A function can now be defined that "maps" event E into some position on
this axis. This function is the probability function peE).
The concept of mapping may be unfamiliar. For our purpose a mapping may be
considered simply as a functional relationship. For instance the relation y =x
2
maps
all numbers x into a parabola (x =±1, Y=+1; x =±2, y = +4; etc.). The relation y =x
maps all numbers x into a straight line making a 45° angle with the yaxis. In these
examples one range of numbers is mapped into another range of numbers and we
speak of pointfunctions. The function peE) is somewhat more complicated; it maps
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI19
a set onto a range of numbers, and we speak of a set function instead of a point
function although the underlying concept is the same. In unsophisticated terms,
however, a probability function simply assigns one unique number, a probability, to
each event.
Two things should be noted. One is that probability has now been defined
without making use of the limit of a ratio. The second thing is that this definition
doesn't tell us how to calculate probability; rather, it delineates the mathematical
nature of the probability functions. If E is the event "sum = ? ," we already know
how to calculate its probability assuming that all 36 outcomes in the outcome
collection are equally likely:
6 I
P(E)==
36 6
Of course this is a particularly simple example. In other cases we may have to
investigate the physical nature of the problem in order to develop the probabilities of
events of interest.
9. Bayes' Theorem
The formula of Bayes plays an important and interesting role in the structure of
probability theory, and it is of particular significance for us because it illustrates a
way of thinking that is characteristic of fault tree analysis. We shall first develop the
formula using a set theoretic approach, and then discuss the meaning of the result.
Figure VI·? portrays the "partitioning" of the universal set on into subsets Ai'
A
2
, A
3
, A
4
, A
5
·
B
Figure VI7. Partition of the Universal Set
The A's have the following characteristics:
i=5
Ai UA
2
UA
3
UA
4
UA
5
= U~ =on
i=1
(VI25)
VJ20
FAULT TREE HANDBOOK
(VI26)
Any set of A's having the properties of Equation (VI·2S) is said to constitute a
partition of the universal set. Also shown in Figure VI? is another subset B. The
reader can show (by shading the appropriate regions in the Venn diagram) that
(Actually BnA
1
= ¢ but ¢UX = X, where X is any set.) The expression for B above
can be written in a more mathematically succinct form:
i=5
B= UBnA
i
i=1
in which the large union symbol implies a succession of unions just as the symbol ~
implies a succession of sums. Similarly, the large intersection symbol implies a
succession of intersections just as the symbol nimplies a succession of products. We
shall return to (VI·26) in a moment.
Consider now the probability equation for an intersection,
p(AnB) = P(AIB) P(B) = P(BIA) P(A).
This is true for any arbitrary events A, B. In particular it will be true for Band any
one of the A's in Figure VI? Thus, we can write
(VI27)
or
(VI·28)
We can now write P(B) in a different way by using Equation (VI26).
{
iS }
P(B) = P UBnA
i
1=1
i=S i=5
= L p(BnA
i
) = L P(BIA
i
) P(A
i
)
i=1 i=1
which can be done because the events (BnA
i
) are mutually exclusive. If we substitute
this expression for P(B) into (VI·28), we obtain
P(BIA
k
) P(A
k
)
P(A IB)  
k  ~ P(BiA.) P(A.)
. 1 1
1
(VI·29)
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI·2!
This is Bayes' theorem. The equation is valid in general for any number of events ~ ,
A
2
, •.. , An which are exhaustive and are mutually exclusive (see Equation (VI2S)).
The summation then extends from i =1 to i =n instead of i =1 to i =S.
We now discuss some of the meanings of Equation (VI29). Suppose that some
event B has been observed and that we can make a complete list of the mutually
exclusive causes of the event B. These causes are just the A's. Notice, however, the
restrictions on the A's given by the relations (VI2S). * Now, having observed B, we
may be interested in seeking the probability that B was caused by the event A
k
. This
is what (VI29) allows us to compute, if we can evaluate all the terms of the
righthand side.
The Bayseian approach is deductive: given a system event, what is the probability
of one of its causative factors? This is to be contrasted with the inductive approach:
given some particular malfunction, how will the system as a whole behave? The use
of Bayes' formula will now be illustrated by a simple example for which it is
particularly easy to enumerate the A's.
Suppose that we have three shipping cartons labeled I, II, III. The cartons are all
alike in size, shape, and general appearance and they contain various numbers of
resistors from companies X, Y, Z as shown in Figure VJ8.
3X
9 ITEMS
4Y 2Z
1X
6 ITEMS
2Y
II
3Z 2X
9 ITEMS
3Y
III
4Z
Figure VI8. Illustration of the Use of Bayes' Formula
A random experiment is conducted as follows: First, one of the cartons is chosen at
random. Then two resistors are chosen from the selected box. When they are
examined, it is found that both items are from Company Z. This latter event we
identify with event B in our general development of Bayes' rule. The "causes" of B
are readily identified: either carton I was chosen or carton II was chosen or carton III
was chosen. Thus,
Al = choice of carton I
A
2
= choice of carton II
A
3
= choice of carton III
Now we should be able to work out the probability that, given event B, it was carton
I that was originally chosen.
*The restrictions on the A
j
given by (VI2S) are equivalent to the restrictions: ~ P(A
j
) = I and
p(AjnA
j
) = O;i *j.
VI22 FAULT TREE HANDBOOK
It would appear natural enough to set
because the boxes are all alike and a random selection among them is made. We now
need to evaluate the terms P(BIA
I
), P(BIA
2
) and P(BIA
3
). This is easily done from
Figure VIB.
P(BIAI) =( ~ ) ( ~ ) = 3
1
6
P(BIA
2
) =( ~ ) (;) =+
P(BIA
3
) = ( ~ ) ( ~ ) = t·
When these numbers are substituted into Bayes' formula we obtain
In a similar way we can calculate
36 30
P(A
2
I
B
) = 7i and P(A
3
IB) =71'
Thus, if event B is actually observed, the chances are about 5050 that carton II was
originally chosen.
As another example, refer once more to our simple system made up of three
components. We have already determined that the system can fail in anyone of the
four modes, Sl' S2' S3' S4' If the system fails and we wish to know the probability
that its failure mode was S3' we can compute:
_ _ P(SIS
3
) P(S3)
P(S3
IS
) = P(SIS
I
) P(SI) +P(SIS2) P(S2) +P(SIS
3
) P(S3) +P(SIS
4
) P(S4) .
This can be written in a simpler form because the system will surely fail if anyone of
the events Sl' S2' S3' or S4 occurs.
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI23
  P(S3)
P(S3
IS
) = peSt) +P(S2) +P(S3) +P(S4)
the quantities P(Si) can be estimated from reliability data. quantity
P(SiIS) is sometimes called the "importance" of system failure cause Si. Bayes'
Theorem is sometimes applied to obtain optimal repair schemes and to determine the
most likely contributors to the system failure (i.e., which of the P(SiIS) is the
largest).
CHAPTER VII  BOOLEAN ALGEBRA AND APPLICATION
TO FAULT TREE ANALYSIS
I. Rules of Boolean Algebra
In the previous chapter we developed the elementary theory of sets as applied,
specifically, to events (outcomes of random experiments). In this chapter we are
going to further develop the algebra of events, called Boolean algebra, with particular
application to fault trees. Boolean algebra is especially important in situations
involving a dichotomy: switches are either open or closed, valves are either open or
closed, events either occur or they do not occur.
The Boolean techniques discussed in this chapter have immediate practical
importance in relation to fault trees. A fault tree can be thought of as a pictorial
representation of those Boolean relationships among fault events that cause the top
event to occur. In fact, a fault tree can always be translated into an entirely
equivalent set of Boolean equations. Thus an understanding of the rules of Boolean
algebra contributes materially toward the construction and simplification of fault
trees. Once a fault tree has been drawn, it can be evaluated to yield its qualitative and
quantitative characteristics. These characteristics cannot be obtained from the fault
tree per se, but they can be obtained from the equivalent Boolean equations. In this
evaluation pr0cess we use the algebraic reduction techniques discussed in this
chapter.
We present the rules of Boolean algebra in Table VIII along with a short
discussion of each rule. The reader is urged to check the validity of each rule by
recourse to Venn diagrams. Those readers who are mathematically inclined will
detect that the rules, as stated, do not constitute a minimal necessary and sufficient
set. Here and elsewhere, the authors have sometimes sacrificed mathematical elegance
in favor of presenting things in a form that is more useful and understandable for the
practical system analyst.
According to (la) and (lb), the union and intersection operations are commutative.
In other words, the commutative laws permit us to interchange the events X, Y with
regard to an "AND" operation. It is important to remember that there are
mathematical entities that do not commute; e.g. the vector cross product and
matrices in general.
Relations (2a) and (2b) are similar to the associative laws of ordinary
algebra: a(bc) = (ab)c and a + (b+c) = (a+b) + c. If we have a series of "OR"
operations or a series of "AND" operations, the associative laws permit us to group
the events any way we like.
The distributive laws (3a) and (3b) provide the valid manipulatory procedure
whenever we have a combination of an "AND" operation with an "OR" operation. If
we go from left to right in the equations, we are simply reducing the lefthand
expression to an unfactored form. In (3a), for example, we operate with X on Y and
on Z to obtain the righthand expression. If we go from right to left in the equations,
we are simply factoring the expression. For instance, in (3b) we factor out X to
obtain the lefthand side. Although (3a) is analogous to the distributive law in
ordinary algebra, (3b) has no such analog.
VIII
VD2
FAULT TREE HANDBOOK
Table VII2. Rules of Boolean Algebra
Mathematical
Symbolism
(1a) xnv = vnx
(1b) X u V = VU X
(2a) X n (V n z) = (X n V) n Z
(2b) X u (V u z) + (X u V) u Z
(3a) xn(vuz) = (xnv)u(xnz)
(3b) X U(V nz) = (X UV) niX UZ)
(4a) X nx = X
(4b) X ux = X
(5a) X n (X u VI = X
(5b) Xu (X n V) = X
(6a) X nx' = </J
(6b) X u X' = H = 1*
(6c) (X')' = X
(7a) (X n V)' = X' u V'
(1b) (X u V)' = X' n V'
(Ia) ¢nx = 1>
(Ib) <;)UX = X
(I,e) s ~ n x = X
(Id) nUx = H
(Ie) <;)' " n
(Bf) n' = <;)
(9a) X U (X' n V) = X u V
(9b) X; n (X u V') = X' n V' = (X u V)'
Engineering
Symbolism
X'V = V'X
X+V = V+X
x. (V • Z) = (X • V) • Z
X(VZ) = (XV)Z
X + (V + Z) " (X + V) +Z
X • (V + Z) = X • V + X • z
XIV +Z) = XV +XZ
X + V • Z = (X + V) • (X + Z)
X· X = X
X+X = X
x . (X +V) = X
x+x,v=x
x· X' =¢
X+X' " n = I
(X')' = X
(X • V)' = x' + V'
(X+V)· = X"V'
9' x "¢
rp+ x " x
n·x " x
n+x = n
t/J' = 52
n' "¢
x+x'· V = X +V
x' • (X + V') = X' . V' = (X + V)'
Designation
Commutative Law
Associative Law
Distributive Law
Idempotent Law
Law of Absorption
Complementation
de Morgan's Theorem
Operations with q, and n
These relationships are
unnamed but are fre
quently useful in the
reduction process.
*The symbol I is often used instead of n to designate the Universal Set. In engineering notation S2 is often replaced by 1
and ¢by O.
The idempotent laws (4a) and (4b) allow us to "cancel out" any redundancies of
the same event.
The laws of absorption (Sa) and (Sb) can easily be validated by reference to an
appropriate Venn diagram. With respect to (Sa), we can also argue in the following
way. Whenever the occurrence of X automatically implies the occurrence ofY, then
X is said to be a subset ofY. We can symbolize this situation as XCY or X+Y. In this
case X+¥= Y and XY= X. In (Sa), if X occurs then (X+Y) has also occurred and
XC(X+Y); therefore X(X+Y) = X. We can develop a similar argument in the case of
(Sb).
De Morgan's theorems (7a) and (7b) provide the general rules for removing primes
on brackets. Suppose that X represents the failure of some component. Then X'
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII3
represents the nonfailure or successful operation of that component. In this light
(7a) simply states that for the double failure of X and Y not to occur, either X must
not fail or Y must not fail.
As an application of the use of these rules, let us try to simplify the expression
(A+B) 0 (A+C) 0 (D+B) 0 (D+C).
We can apply (3b) to (A+B) 0 (A+C) obtaining
(A+B) 0 (A+C) =A+(BC).
Likewise,
(D+B)o(D+C) =D+(BoC).
We thus have as an intermediate result
(A+B)
0
(A+C)
0
(D+B)o (D+C) =(A+BoC)o(D+BoC).
Ifwe now let E represent the event BoC, we have
(A+BC)o(D+BC) =(A+E)
0
(D+E) =(E+A)o(E+D).
Another application of (3b) yields
(E+A)o(E+D) = E+AoD = BoC + AoD.
We, therefore, have as our final result
(A+B)o(A+C)o(D+B)o(D+C) =BoC + AoD.
The original expression has been substantially simplified for purposes of evaluation.
There are more general rules for simplifying Boolean functions and we will discuss
these later in this chapter. For the moment we are concerned with what can be
accomplished by more or less unsystematic manipulation of the algebra. A few
examples of this procedure are now given. The reader should work carefully through
these illustrations and ascertain at each step which of the rules, (1 a)  (9b), are being
used.
Example IShow that
[(Ao B) + (Ao B') + (A' 0 B')]' =A' 0 B.
This example can be worked either by (a) removing the outermost prime as a first
step or by (b) manipulating the terms inside the large brackets and removing the
outermost prime as a last step. In either case the removal of primes is accomplished
by using (7a) or (7b).
(a) [(AoB) + (AoB') + (A'oB')]'
=(AoB)' 0 (AoB')' 0 (A'oB')'
= (A'+B') 0 (A'+B) 0 (A+B)
=A' + (B'oB) 0 (A+B)
= (A'+¢) 0 (A+B)
=A' 0 (A+B)
=(A' 0 A) +(A' 0 B)
= ¢ + (A'oB)
=A' 0 B.
VII4
(b) [(AoB)+(AoB')+(A'oB')]'
= [(Ao(B+B') + (A' 0 B')]'
= [A·n + (A'oB'»)'.
= [A +(A' 0 B')]'
= [(A+A') 0 (A+B')]'
= [n • (A+B')]'
= [A+B']' = A'· B.
Example 2Show that
(A' 0 BoC')'  (A0 B' ·C')' = C + [(A' 0 B') + (Bo A)]
(A' 0 BC')' 0 (Ao B' c')'
= (A+B'+C) 0 (A'+B+C)
= C + [(A+B') 0 (A'+B)]
= C + [(A+B') 0 A' + (A+B') 0 B]
= C + [(A'· A) + (A'· B') + (Bo B') + (Bo A)]
= C + [p + (A' 0 B') + ¢ + (Bo A)]
= C + [(A'oB'.) + (BA)].
Example 3Show that
[(Xo Y) + (Ao BoC)] 0 [(Xo Y) + (A'+B'+C')] = Xo Y
FAULT TREE HANDBOOK
Applying de Morgan's theorem to the second tenn inside the second bracket, we have
[(X0 Y) +(A0 B0 C)] 0 [(X0 Y) +(A· B· C)'].
Now let L= XoY, M= (ABoC), and we have
(L+M)· (L+M') = L 0 (M+M') = L + n = L= X·Y
and the original statement is proved. Notice in this example that A,B,C are
completely redundant.
2. Application to Fault Tree Analysis
In this section we shall relate the Boolean methodology to fault trees. A fault tree,
as we now know, is a logic diagram depicting certain events that must occur in order
for other events to occur. The events are termed "faults" if they are initiated by
other events and are tenned "failures" if they are the basic initiating events. The
fault tree interrelates events (faults to faults or faults to failures) and certain symbols
are used to depict the various relationships (see Chapter IV). The basic symbol is the
"gate" and each gate has inputs and an output as shown in Figure VIII.
The gate output is the "higher" fault event under consideration and the gate
inputs are the more basic ("lower") fault (or failure) events which relate to the
output. When we draw a fault tree, we proceed from the "higher" faults to the more
basic faults (i.e., from output to inputs). In this process (Chapter V), certain
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII5.
I
OUTPUT
I
GATE
,.. 1 11 .....
INPUT INPUT
Figure VIII. The Gate Function in a Fault Tree
INPUT
techniques are used to determine which category of gate is appropriate. The two
basic gate categories are the ORgate and the ANDgate. Because these gates relate
events in exactly the same way as the Boolean operations that we have just discussed,
there is a onetoone correspondence between the Boolean algebraic representation
and the fault tree representation.
The ORGate
The fault tree symbol CJ is an ORgate which represents the union of the
events attached to the gate. Anyone or more of the input events must occur to cause
the event above the gate to occur. The ORgate is equivalent to the Boolean symbol
"+." For example, the ORgate with two input events, as shown in Figure VII2, is
equivalent to the Boolean expression, Q=A+B. Either of the events A or B, or both
must occur in order for Q to occur. Because of its equivalence to the Boolean union
operation denoted by the symbol "+," the ORgate is sometimes drawn with a "+"
inside the gate symbol as in Figure VII2. For n input events attached to the ORgate,
the equivalent Boolean expression is Q = Al + A
2
+ A
3
+ ... + An'
In the terms of probability, from Equation (VI4):
P(Q) = P(A) + P(B) p(AnB)
or
= P(A) + P(B)P(A)P(BIA)
(VIII)
VII6
A
FAULT TREE HANDBOOK
Q
B
Figure VII2. A TwoInput ORGate
From our discussion of probabilities in Chapter VI, we can make the following
observations about Equation (VIII):
• If A and B are mutually exclusive events, then p(AnB) =aand
P(Q) = peA) +PCB);
• If A and B are independent events, then P(BIA) =PCB) and
P(Q) =peA) +PCB)  peA) PCB);
• If event B is completely dependent on event A, that is, whenever A
occurs, B also occurs, then P(BIA) =1 and
P(Q) = peA) +PCB)  peA)
=PCB);
• The approximation P(Q) =:: peA) + PCB) is, in all cases, a conservative
estimate for the probability of the output event Q, i.e.,
peA) +PCB) ~ peA) +PCB)  p(AnB) for all A, B;
• If A and B are independent, low probability events (say peA),
PCB) < 10 1), then p(AnB) is small compared with peA) +PCB) so
that peA) +PCB) is a very accurate approximation of P(Q). The reader
will remember peA) +PCB) as the "rare event approximation"
discussed in Chapter VI.
These observations, especially the last two, will become very important when we
discuss quantification in Chapter XI.
In Chapter IV, we briefly discussed the EXCLUSIVE ORgate. As the reader
should remember, the output event Q of an EXCLUSIVE ORgate with two input
events A and B occurs if event A occurs or event B occurs, but not both. The
probability expression for the output event Q of an EXCLUSIVE ORgate is:
P(Q)EXCLUSIVE OR= peA) + PCB)  2P(AnB)
(VII2)
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII7
Comparing Equations (VIII) and (VII2), we observe that if A and Bare
independent low probability component failures, the difference in probability
between the two expressions is negligible. This is why the distinction between the
inclusive and exclusive ORgates is generally not necessary in Fault Tree Analysis
where we are often dealing with independent, low probability component failures. It
may sometimes, however, be useful to make the distinction in the special case where
the exclusive OR logic is truly required, and in addition where there is a strong
dependency between the input events and the failure probabilities are high. In this
latter case, the intersection term may be large enough to significantly effect the
result. In conclusion, it should be observed that in any case, the error which is made
by using the inclusive rather than the exclusive ORgate biases the answer on the
conservative side because the inclusive OR has the higher probability. In the
remainder of this text, unless otherwise noted, all references to the ORgate should
be interpreted as the inclusive variety.
Figure VII3 shows a realistic example of an ORgate for a fault condition of a set
of normally closed contacts.
NORMALLY CLOSED
RELAY CONTACTS
FAIL TO OPEN
RELAY COIL
NOT
DE·ENERGIZED
RELAY CONTACTS
FAIL
CLOSED
Figure VII3. A Specific TwoInput ORGate
Tllis ORgate is equivalent to the Boolean expression
RELAY CONTACTS
FAIL TO OPEN
RELAY COIL
NOT
DE·ENERGIZED +
CONTACTS FAIL
CLOSED
Instead of explicitly describing the events, a unique symbol (Q, A
2
, etc.) is usually
associated with each event as shown in Figure VII2. Therefore, if the event "relay
contacts fail to open" is labeled "Q," "relay coil not deenergized" is labeled "A,"
VII'8 FAULT TREE HANDBOOK
and "contacts fail closed" is labeled "B," we can represent the ORgate of Figure
VII3 by the Boolean equation Q=A+B.
An ORgate is merely a reexpression of the event above the gate in terms of the
more elementary input events. The event above the gate encompasses all of these
more elementary events; if anyone or more of these elementary events occurs, then
B occurs. This interpretation is quite important in that it characterizes an ORgate
and differentiates it from an ANDgate. The input events to an ORgate do not cause
the event above the gate; they are simply reexpressions of the event above the gate.
We have discussed this topic before, but we feel it is so important to fault tree
analysis that we cover it again, having now gone through the Boolean algebra
concepts.
Consider two switches in series as shown in Figure VII4. The points A and Bare
points on the wire. If wire failures are ignored then the fault tree representation of
the event, "No Current to Point B" is shown in Figure VII5.
SWITCH 1 SWITCH 2
_ I    " ' ~ •        e ~ +
Figure VII4. Two Switches in Series
NO CURRENT
TO POINT B
SWITCH 1
IS OPEN
SWITCH 2
IS OPEN
NO CURRENT
TO POINT A
Figure VII5. A Specific ThreeInput ORGate
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII9
If the events are denoted by the symbols given below, then the Boolean
representation is B = Al +A
2
+A
3
.
NO CURRENT
TO POINT B = EVENT B
SWITCH 1
IS OPEN = EVENT A1
SWITCH 2
IS OPEN = EVENT A2
NO CURRENT
TO POINT A = EVENT A3
The event B occurs if Al or A
2
or A
3
occurs. Event B is merely a reexpression of
events AI' A
2
, A
3
. We have classified the particular events AI' A
2
, A
3
as belonging
to the general event B.
The ANDGate
The fault tree symbol 0 is an ANDgate which represents the intersection
of the events attached to the gate. The ANDgate is equivalent to the Boolean symbol
".". All of the input events attached to the ANDgate must exist in order for the
event above the gate to occur. For two events attached to the ANDgate, the
equivalent Boolean expression is Q=A· B, as shown in Figure VII6. Because of its
equivalence to the Boolean intersection operation denoted by the symbol ".", that
symbol is sometimes included inside the ANDgate as in Figure VII6. For n input
events to an ANDgate, the equivalent Boolean expression is
In this case, event Q will occur if and only if all the Ai occur. In terms of probability,
from Equation (VIlO):
P(Q) = P(A)P(BIA) = P(B)P(AIB). (VII3)
From our discussion of probability in Chapter VI, we can make the folloWing
observations about Equation (VII3):
• If A and B are independent events, then P(BIA) = PCB), P(AIB) = peA),
and P(Q) = peA) PCB);
• If A and B are not independent events, then P(Q) may be significantly
greater than P(A)P(B). For example, in the extreme case where B
depends completely on A, that is, whenever A occurs, B also occurs,
tbn P(BIA) = 1 and P(Q) = peA).
VIItO FAULT TREE HANDBOOK
Q
A
•
I I
I I
A B
Figure VII6. A TwoInput ANDGate
Again, the reader should bear these observations in mind for future reference when
we discuss fault tree quantification in later chapters.
The events attached to the ANDgate are the causes of the event above the gate.
Event Q is caused only if every one of the input events occurs. This causal
relationship is what differentiates an ANDgate from an ORgate. If the event above
the gate occurs when anyone of the input events occurs, then the gate is an ORgate
and the event is merely a restatement of the input events. If the e v ~ n t above the gate
occurs only when combinations of more elementary events occur, then the gate is an
ANDgate and the inputs constitute the cause of the event above the gate.
We conclude this section with an example showing how Boolean algebra can be
used to restructure a fault tree. Consider the equation D = A  (B+C). The
corresponding fault tree structure is shown in Figure VII?
Now according to Rule 3a, event D can also be expressed as D =(A B) + (AC).
The fault tree structure for this equivalent expression for D is shown in Figure VII8.
The two fault tree structures in Figures VII? and VII8 may appear to be
different; however, they are equivalent. Thus, there is not one "correct" fault tree
for a problem but many correct forms which are equivalent to one another. The rules
of Boolean algebra can thus be applied to restructure the tree to a simpler, equivalent
form for ease of understanding or for simplifying the evaluation of the tree. Later,
we shall apply the rules of Boolean algebra to obtain one form of the fault tree,
called the minimal cut set form, which allows quantitative and qualitative evaluations
to be performed in a straightforward manner.
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VIIII
o
A
B
BORC
C
Figure VII7. Fault Tree Structure for D = A • (B+C)
o
AANDB AANDC
A B A
c
Figure VII8. Equivalent Form for the Fault Tree of Figure VII7
VII12 FAULT TREE HANDBOOK
3. Shannon's Method for Expressing Boolean Functions
in Standardized Fonns
In the previous sections we have discussed how Boolean functions may be
expressed in a number of ways by applying the algebraic rules presented earlier in
this chapter. In this section we shall discuss Shannon's method for expanding
Boolean functions. This method is a general expansion technique applicable to any
Boolean function. According to Shannon's method, a Boolean function of n variables
can be expanded about one, two, ..., or all n of the variables. When the expansion is
"complete" (Le., when it is taken about all n of the variables), the result is referred
to as an "expansion in minterrns." The latter constitutes a standard or canonical
form consisting of all combinations of occurrences and nonoccurrences of the events
of interest.
First some preliminaries need to be reviewed. A Boolean variable is a twovalued
variable. For instance if E designates some event of interest, then E=l indicates that
the event has occurred and E=O indicates that the event has not occurred. For this
reason, theorems in Boolean algebra are much more easily proved than theorems in
ordinary algebra in which the variable may take on an infinity of values.
Consider a function of the n Boolean variables Xl' X
2
, X
3
, ... , X
n
:
f(X
I
, X
2
, X
3
, ... , Xn)'
The Boolean function can take on only two values: 1 (occurs) and 0 (does not
occur). This function may be expanded about one of its arguments (say Xl) in the
following way:
(VII4)
In equation (VII4), because we are dealing with events, we must remember that the
dots (.) and pluses (+) represent the intersection and union operations. The
symbolism f(1, X
2
,· .. , Xn) indicates that 1 has been substituted for Xl' We
may simplify f(1, X
2
... , Xn) using the rules of Boolean algebra. For example
if f(X
I
, X
2
) = Xl • X
2
, then f(1, X
2
) = 1 • X
2
= X
2
. The reader can readily
show that equation (VII4) is correct by considering the only two possibilities,
Xl = 1 and Xl = 0, Le., Xl occurs or does not occur.
Equation (VII4) can be extended to the expansion of a Boolean function about
two, three, or, indeed, all of its arguments. For instance, the extension of equation
(VII4) to the expansion about the two variables Xl and X
2
is:
f(XI ' X
2
, X3' ... , ~ ) = [XI • X
2
• f(1, 1, X3' . . . , X
n
)]
+ Xl • X
'
2
• f(1, 0, X
3
,· .. , Xn)] +X'I· X
2
• f(O, 1, X
3
,· .. , ~ ) ]
+X' I • X' 2 • f(O, 0, X3' ... , X
n
)]
(VII5)
The reader should note that equation (VII5) can be obtained by simply expanding
f{1, X
2
, ... , Xn) and f(O, X
2
, ..• , Xn) about X
2
in the manner of equation (VII4).
When the expression is carried out about all the variables Xl' X
2
, X
3
,· .. , Xn' the
Boolean expressions lying outside the functional representations are called minterms
which consist of combinations of certain X's occurring and others not occurring. In
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII13
the complete expansion there will be 2
n
minterms consisting of all combinations of
the possible occurrences and nonoccurrences of the X variables. Each minterm
expression will have as a "coefficient" the function f evaluate(f at the appropriate 1's
and D's corresponding to the occurrences and nonoccurrences of X.
In addition to the minterm expansion, the complementary maxterm expansion
can be obtained for any Boolean function. The maxterm expansion is simply
obtained from the minterm expansion by interchanging' for +and afor 1, and vice
versa. Thus from equation (VII4), the maxterm expansion in one variable is
f(XI ' X
2
' ... , ~ ) =[XI + f(0, XI ... , ~ ) ] . [X' I +f( 1, Xx' ... , ~ ) ]
(VII6)
Analogous forms are obtained for higher order expansions. Instead of obtaining a
sum (union) of combinations (intersections) as in the minterm expansion, we obtain
a combination (intersection) of sums (unions) for the maxterm expansion.
In the minterm (or maxterm) expansion, each combination term is disjoint* from
all the others. Thus in equation (VIIS) the four terms
X
I
'X
2
• f(I, 1, X 3 " " ' ~ )
X
I
'X'2 • f(l, 0, X 3 " " ' ~ )
X'I •X
2
• f(O, 1, X
3
' , X
n
)
X'I •X' 2 • f(O, 0, X
3
' , X
n
)
are disjoint from one another. This characteristic is generally true for any order
expansion and it can be exploited in quantifying fault trees whenever Shannon's
expansion is used. Another reason for representing a Boolean function in either
minterms or maxterms is that these expressions are unique for any given function.
Such an expansion, then, provides a general technique for determining whether two
Boolean expressions are equal, because if they are, they will have identical minterm
(or maxterm) forms.
Shannon's expansion will now be illustrated for the two 3variable functions:
(a) f(X,Y,Z) =(X·y) +(X'·Z) +(Y·Z)
(b) f(X,Y,Z) = (X·y) +(X'·Z).
The reader can show, in a few simple algebraic steps, that the functions in (a) and (b)
are indeed equal but we wish to use (a) and (b) to exemplify the general process
involved in using Shannon's expansion. We shall start with (a) and expand it using the
minterm expansion.
(X· Y) +(X' .Z) + (y. Z)
=[X·Y·Z·f(I, 1, 1)] + [X·Y·Z
I
·f(1, 1,0)] + [X·Y'·Z·f(1, 0,1)]
+ [X·Y·Z'·f(1, 0, 0)] + [X"Y'Z'f(O, 1, 1)] + [XI·Y·Z'·f(O, 1,0)]
+ [X'.Y'.Z'f(O, 0,1)] + [X'·Y'·Z'·f(O, 0, 0)]. (VII7)
*Two events are disjoint if their intersection is the null set, or equivalently, two events are
disjoint if the probability of both events is zero.
VII14
FAULT TREE HANDBOOK
Note that this expression is valid for any 3variable Boolean function. Now f(1, I, 1),
f(1, 1, 0), etc., can be readily evaluated by making the appropriate substitutions in
the original functional form as follows:
f(1,I,I)=(11)+(OI)+(11)= 1+0+1 = I
f(1,1,0)=(11)+(O0)+(10)= 1+0+0= 1
f(1 ,0,1) = (1,0) + (0·1) + (0·1) = 0+0+0 = °
f(l ,0,0) = (10) + (00) + (00) = 0+0+0 = °
f(O,I,I) = (01) + (11) + (11) = 0+1+1 = 1
f(O,1 ,0) = (01) + (10) + (10) = 0+0+0 = °
f(O,O,I) = (00) + (11) + (01) = 0+1 +0 = 1
f(O,O,O) = (00) + (10) + (00) = 0+0+0 = °
When these values are substituted into the expanded form of (a) above, the result is
the unique minterm expansion of (a):
(XY) +(X' Z) +. (YZ) =
(XYZ) + (XYZ') + (X'YZ) + (X'Y'Z).
in which all of the terms in parentheses are disjoint.
It now remains to evaluate f(I,I,l), f(1,1 ,0), etc., for expression (b).
f(1,I,I)={lI)+(OI)= 1+0= 1
f{l,I,O)={lI)+(OO)= 1+0= 1
f(1,O,l) = (10) + (01) = 0+0 = °
f(1 ,0,0) = (10) + (0,0) = 0+0 = °
f(O,I,I) = (0,1) + (lI) = 0+1 = 1
f(O,I,O) = (0,1) + (10) = 0+0 = °
f(O,O,I)=(O'O)+(II)=O+1 = 1
f(O,O,O) = (0,0) + (1'0) = 0+0 = °
It is now apparent that (b) has exactly the same minterm expansion as (a) and there
fore we have proved that (X· Y) + (X' •Z) + (YZ) = (X Y) + (X' Z).
It is of interest to view the minterms of equation (VII?) with the help of Venn
diagrams. This is shown in Figure VII9. The reader should note from Figure VII9
two important properties of the minterms: They are all mutually disjoint and their
collective union yields the universal set.
The maxtermexpansion of a 3variable Boolean function can be obtained from an
extension of equation (VII6) as follows:
f(X, Y, Z)
= [X+Y+Z+f(O,O,O)]  [X+Y+Z'+f(O,O,I)] • [X+Y'+Z+f(O,I,O)]
• [X+Y'+Z'+f(O,1 ,1)] . [X'+Y+Z+f{1,O,O)]  [X'+Y+Z' +f(1,O,1)]
• [X'+Y'+Z+f{l ,1,0)]  [X'+Y'+Z'+f(1,1 ,1)]
(VII8)
The reader should now be able to show that the maxterm expansions of (a) and (b)
are identical.
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VIIIS
X·V·z
X' • V • Z
X • V • Z'
X' • V • Z'
X • V' • Z
X' • V' • Z
X • V' • Z'
X' • V' • Z'
Figure VII9. Venn Diagram Representation of the Minterms of a
3Variable Boolean Function
4. Determining the Minimal Cut Sets or Minimal Path Sets
of a Fault Tree
One of the main purposes of representing a fault tree in terms of Boolean
equations is that these equations can then be used to determine the fault tree's
associated "minimal cut sets" and "minimal path sets." The minimal cut sets define
the "failure modes" of the top event and are usually obtained when a fault tree is
evaluated. Once the minimal cut sets are obtained, the quantification of the fault tree
is more or less straightforward. The minimal path sets are essentially the
complements of the minimal cut sets and define the "success modes" by which the
top event will not occur. The minimal path sets are often not obtained in a fault tree
evaluation; however, they can be useful in particular problems.
Minimal Cut Sets
We can formally define a minimal cut set as follows: a minimal cut set is a smallest
combination of component failures which, if they all occur, will cause the top event
to occur.
By the definition, a minimal cut set is thus a combination (intersection) of
primary events sufficient for the top event. The combination is a "smallest"
combination in that all the failures are needed for the top event to occur; if one of
the failures in the cut set does not occur, then the top event will not occur (by this
combination).
Any fault tree will consist of a finite number of minimal cut sets, which are
unique for that top event. The onecomponent minimal cut sets, if there are any,
represent those single failures which will cause the top event to occur. The
twocomponent minimal cut sets represent the double failures which together will
VIII 6
FAULT TREE HANDBOOK
cause the top event to occur. For an ncomponent minimal cut set, all n components
in the cut set must fail in order for the top event to occur.
The minimal cut set expression for the top event can be written in the general
form,
where T is the top event and M
1
are the minimal cut sets. Each minimal cut set
consists of a combination of specific component failures, and hence the general
ncomponent minimal cut can be expressed as
where Xl' X
2
, etc., are basic component failures on the tree. An example of a top
event expression is
T=A+BC
where A, B, and C are component failures. This top event has a onecomponent mini
mal cut set (A) and a twocomponent minimal cut set (BC). The minimal cut sets
are unique for a top event and are independent of the different equivalent forms the
same fault tree may have.
To determine the minimal cut sets of a fault tree, the tree is first translated to its
equivalent Boolean equations and then either the "topdown" or "bottomup"
substitution method is used. The methods are straightforward and they involve
substituting and expanding Boolean expressions. Two Boolean laws, the distributive
law and the law of absorption, are used to remove the redundancies.
Consider the simple fault tree shown in Figure VIItO; the equivalent Boolean
equations are shown below the tree.
Figure VIIto. Example Fault Tree
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VIII 7
T =E
1
·E
2
E
1
=A+E
3
E
3
= B+C
E
2
= C+£4
E
4
=A'B
We will first perform the topdown substitution. We start with the top event equation
and substitute and expand until the minimal cut set expression for the top event is
obtained. Substituting for E
1
and E
2
and expanding we have:
T =(A+E
3
) • (C+E
4
)
=(A' C) + (E
3
'C) + (E
4
' A) + (E
3
•£4)
Substituting for E
3
:
T =A· C +(B+C) • C + E
4
•A + (B+C) • E
4
=A·C + B·C + C'C + E
4
·A + E
4
·B + E
4
·C.
By the idempotent law, C'C =C, so we have:
But A'C +B'C +C +E
4
• C = C by the law of absorption. Therefore,
T =C+ E
4
• A+E
4
• B.
Finally, substituting for E
4
and applying the law of absorption twice
T =C + (A· B)' A + (A' B)' B
=C+A'B.
The minimal cut sets of the top event are thus C and A· B, one single component
minimal cut set and one double component minimal cut set. The fault tree can thus
be represented as shown in Figure VII·II which is equivalent to the original tree
(both trees have the same minimal cut sets).
Figure VIIII. Fault Tree Equivalent of Figure VIII 0
VIII 8 FAULT TREE HANDBOOK
The bottomup method uses the same substitution and expansion techniques,
except that now the operation begins at the bottom of the tree and proceeds upward.
Equations containing only basic failures are successively substituted for higher faults.
The bottomup approach can be more laborious and timeconsuming; however, the
minimal cut sets are now obtained for every intermediate fault as well as the top
event.
Consider again our example tree (we repeat the equivalent Boolean equations for
the reader's convenience).
T = E
1
E
2
E
1
=A+E
3
E
3
= B+C
E
2
= C+E
4
E
4
=AB
Because E
4
has only basic failures, we substitute into E
2
to obtain
E
2
=C+AB.
The minimal cut sets of £2 are thus C and A B. E
3
is already in reduced form having
minimal cut sets Band C. Substituting into £1' we obtain E
1
= A+B+C so E
1
has
three minimal cut sets A, B, and C. Finally, substituting the expressions for E
1
and
E
2
into the equation for T, expanding and applying the absorption law, we have
T=(A+B+C)· (C+AB)
= A'C + AA'B + B·C + B'A'B + C'C + C'A'B
= A C + A· B + B C + A' B + C + A B C
=C +AB.
The minimal cut sets of the top event are thus again C and A B.
As a very simple example, suppose we have the pumping system shown in Figure
VIII 2.
PUMP 1
INEXHAUSTABLE
WATER
SOURCE
PUMP 2
V A ~ E V
REACTOR
Figure VII12. Water Pumping System
Assume that out undesired event is "no flow of water to reactor." Ignoring the
contribution of the pipes, we can model this system by fault tree of Figure VIIII
where:
T = "no flow of water to reactor"
C = "valve V fails closed"
A = "pump 1 fails to run"
B="pump 2 fails to run"
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII19
We have just shown that the minimal cut sets of this tree are C and A· B. This tells
us that our undesired event "no flow of water to tank" will occur if either valve V
fails closed or both pumps fail to run. In this simple case, the cut sets do not really
provide any insights that are not already quite obvious from the system diagram. In
more complex systems, however, where the system failure modes are not so obvious,
the minimal cut set computation provides the analyst with a thorough and systematic
method for identifying the basic combinations of component failures which can
cause an undesired event.
For smaller fault trees, the determination of the minimal cut sets, using either the
topdown or bottomup method, can be done by hand. For larger trees, various
computer algorithms and codes for fault tree evaluation are available. These are
discussed in Chapter XII.
Minimal Path Sets and Dual Fault Trees
The top event of a fault tree represents system failure. This event is of great
interest from the point of view of system safety. From the point of view of reliability
we would be more concerned with the prevention of the top event. Now we know
that the top event of a fault tree may be represented by a Boolean equation, and
because this equation may be complemented, there is also a Boolean equation for the
complement (i.e., nonoccurrence) of the top event. This complemented equation, in
tum, corresponds to a tree which is the complement of the original tree. This
complemented tree, called the dual of the original fault tree, may be obtained direct
ly from the original tree by complementing all the events and substituting ORgates
for ANDgates and vice versa. In either case, whether we complement the top event
equation or the tree itself, we are applying de Morgan's theorem given in our Table
of Boolean Algebra Laws. The minimal cut sets of the dual tree are the socalled
"minimal path sets" of the original tree, where a minimal path set is a smallest
combination (intersection) of primary events whose nonoccurrence assures the
nonoccurrence of the top event.
The combination is a smallest combination in that all the primary event
nonoccurrences are needed for the top event to not occur; if anyone of the events
occurs then the top event can occur. The minimal path set expression for the top
event T can be written as
T
'
=PI +P2 +... +P
k
where T' denotes the complement (nonoccurrence) of T. The terms PI' P2' ... , Px
are the minimal path sets of the fault tree. Each path set can be written as
Pi = X' 1 • X' 2 • . . . • X'
m
where the Xi are the basic events in the fault tree and the X'i are the complements.
We can find the minimal path sets of a given tree by forming its dual and then
using either the topdown or bottomup method to find its minimal cut sets. These
cut sets are the minimal path sets of the original tree which we desire.
Alternatively, if the minimal cut sets of the tree have already been determined, we
can take the complement of the minimal cu t set equation and obtain the minimal
VU·lO FAULT TREE HANDBOOK
path sets directly. For our sample tree, we obtained the following minimal cut
expression in the previous section,
T= C+ AB.
Taking the complement
T
'
=(C + AB)'
=C'(AB)'
using de Morgan's theorem. Applying de Morgan's theorem to the term (AB)'
T
'
="C'(A'+B
'
)
and using the distributive law (i.e., expanding)
T' ="C
'
A' +c
'
B'
Therefore, the minimal path sets of the tree are C
'
A' and C' B
'
. In terms of our
pumping system of Figure VII12, this tells us that we can prevent the undesired
event and assure system success if either
(1) Valve V is open and pump 1 is running, or
(2) Valve V is open and pump 2 is running.
CHAPTER VIII  THE PRESSURE TANK EXAMPLE
1. System Defmition and Fault Tree Construction
In this and the next chapter, we are going to defme undesired events for two
simple systems. The reader will then be shown how the corresponding fault trees are
developed stepbystep with the help of the rules described in Chapter V. From the
trees so constructed, some obvious conclusions will be drawn, but detailed evaluation
procedures will be postponed until Chapter XI.
Consider now Figure VIIII which shows a pressure tank  pumpmotor device and
its associated control system. First we present details of system operation. The
operational modes are given in Figure VIII2.
SWITCH
Sl
FROM RESERVOIR
PRESSURE
TANK
Figure VIIII. Pressure Tank System
The function of the control system is to regulate the operation of the pump. The
latter pumps fluid from an infmitely large reservoir into the tank. We shall assume
that it takes 60 seconds to pressurize the tank. The pressure switch has contacts
which are closed when the tank is empty. When the threshold pressure has been
reached, the pressure switch contacts open, deenergizing the coil of relay K2 so that
relay K2 contacts open, removing power from the pump, causing the pump motor to
cease operation. The tank is fitted with an outlet valve that drains the entire tank in
an essentially negligible time; the outlet valve, however, is not a pressure relief valve.
When the tank is empty, the pressure switch contacts close, and the cycle is repeated.
Initially the system is considered to be in its dormant mode: switch 81 contacts
open, relay Kl contacts open, and relay K2 contacts open; i.e., the control system is
VIlll
TRANSITION TO PUMPING
RELAY ''2  ENERGIZED
IC10SED)
TIMER REl  STARTS TIMING
PRESSURE SW  CONTACTS CLOSE
PUMP STARTS
PUMPING MODE
READY MODE
DORMANT MODE
S1  CONTACTS OPEN
S1  CONTACTS OPEN
SN SI  CONTACTS OPEN STARTUP TRANSITION RELAY '1  CONTACTS CLOSED TRANSITION TO READY
RELAY "1  CONTACTS CLOSED
RELAY 1  CONTACTS OPEN RElAY , ~  CONTACTS CLOSEO
RELAY ''2  CONTACTS OPEN
RELAY , ~  CONTACTS OPEN TIMER REt  CONTACTS CLOSED RELAY '"2  DEENERGIZED
TIMER REL  CONTACTS CLOSED
TIMER REL  CONTACTS CLOSEO
S1  MOMENTARILY
& TIMING (OPEN)
PRESSURE SW  CONTACTS OPEN
PRESSURE SW  CONTACTS CLOSED
CLOSED
PRESSURE SW  CONTACTS CLOSED TIMER REL  RESETS TO ZERO
& MONITORING
RELAY '1  ENERGIZED &
& MONITORING TIME
LATCHED
PRESSURE SW  CONTACTS OPEN
RELAY ''2  ENERGIZED
(CLOSED) PUMP STOPS
TIMER REL  STARTS TIMING
PRESSURE SW  MONITORING
PRESSURE
PUMP STARTS
RELAY '1  CONTACTS OPEN
RELAY ''2  CONTACTS OPEN
TIMER REL  TIMES OUT &
MOMENTARIL Y
OPENS
EMERGENCY
PRESSURE SW  FAILED CLOSED
SHUTDOWN PUMP STOPS
TRANSITION
(ASSUME PRESSURE
SWITCHINGUP)
EMERGENCY SHUTDOWN
S1  CONTACTS OPEN
RELAY "1  CONTACTS OPEN
RELAY'2  CONTACTS OPEN
TIMER REL  CONTACTS CLOSED
PRESSURE SW  CONTACTS CLOSED
Figure VIII2. Pressure Tank Example
PRESSURE TANK EXAMPLE
VIII3
deenergized. In this deenergized state the contacts of the timer relay are closed. We
will also assume that the tank is empty and the pressure switch contacts are therefore
closed.
System operation is started by momentarily depressing switch S1. This applies
power to the coil of relay Kl, thus closing relay KI contacts. Relay Kl is now
electrically selflatched. The closure of relay Kl contacts allows power to be applied
to the coil of relay K2, whose contacts close to start up the pump motor.
The timer relay has been provided to allow emergency shutdown in the event that
the pressure switch fails closed. Initially the timer relay contacts are closed and the
timer relay coil is deenergized. Power is applied to the timer coil as soon as relay Kl
contacts are closed. This starts a clock in the timer. If the clock registers 60 seconds
of continuous power application to the timer relay coil, the timer relay contacts
open (and latch in that position), breaking the circuit to the Kl relay coil (previously
latched closed) and thus producing system shutdown. In normal operation, when the
pressure switch contacts open (and consequently relay K2 contacts open), the timer
resets to 0 seconds.
For our undesired event let us take:
RUPTURE OF
PRESSURE TANK
AFTER THE START
OF PUMPTNG
It will simplify things considerably if we agree to neglect plumbing and wiring
failures and also all secondary failures except, of course, the one of principal interest:
"tank rupture after the start of pumping."
The reader may object that a system that includes an infinitely large reservoir and
an outlet valve that drains the tank in a negligible time is unrealisticand now we are
suggesting the neglect of plumbing and wiring faults that might contribute to the
occurrence of the top event. The point is that with this simplified system we can
illustrate most of the important steps in fault tree construction. In a more complex
system the reader might tend to lose sight of the overall system and become too
involved in the details.
First we check to make sure that our top event is written as a fault and that it
specifies a "what" and a "when." Next we apply our test question: "Can this fault
consist of a component failure?" Because the answer is "Yes," we immediately add
an ORgate beneath the top event and consider primary, secondary, and command
modes.* Our tree has now developed to the point shown in Figure VIII3.
In this problem we shall establish our limit of resolution at the "component
failure level." By "component" we shall mean those items specifically named in
Figure VIIII. Thus the primary failure of the tank (e.g., a fatigue failure of the tank
wall) is already at the limit of resolution and is shown in a circle. Whether or not we
include the statement in the diamond is moot. We could just assume at the beginning
that the tank was an appropriate one for the operating pressures involved. At any
rate, we choose not to trace this fault any further.
*In this case, however, there is no command mode.
VIII4 FAULT TREE HANDBOOK
RUPTURE OF PRESSURE
TANK AFTER THE
START OF PUMPING
TANK RUPTURE
(SECONDARY FAILURE)
Figure VIII3. Fault Tree Construction  Step 1
Thus our attention is now directed to the secondary failure of the tank. The
reader will remember from Chapter V, that in contrast to a primary failure, which is
the failure of a component in an environment .for which it is qualified, a secondary
failure is the failure of a component in an environment for which it is not qualified.
Because the secondary failure of the tank can consist of a component failure, we
introduce another ORgate and our tree assumes the form shown in Figure VIII4.
Here again we indicate, in a diamond, a set of conditions whose causes we choose
not to seek. Notice that the fault spelled out in the rectangle is a specific case of the
top event with a more detailed description as to cause.
Now it might happen that our tank could miraculously withstand continuous
pumping for t > 60 seconds but an application of our "No Miracles" rule constrains
us to the statement that the tank will always rupture under these conditions. We can
indicate this on the fault tree by using an Inhibit gate whose input is "continuous
pump operation for t > 60 seconds" (see Figure VIII5).
Can the input event to the Inhibit gate consist of a component failure? No, the
pump is simply operating and pump operation for any length of time cannot consist
of a component failure. Therefore this fault event must be classified "stateof
system." We now recall the rules of Chapter V. Below a stateofsystem fault we can
have an ORgate, an ANDgate, or no gate at all. Fwthermore, we look for the
minimum, immediate, necessary and sufficient cause or causes. In this case the
immediate cause is "motor runs for t > 60 seconds," a stateofsystem fault. Its
immediate cause is "power applied to motor for t > 60 seconds," a stateofsystem
fault. The immediate cause of the latter event is "K2 relay contacts remain closed for
t> 60 seconds." We have now added the string of events shown in F i g w ~ VIII6.
In this case, nothing is lost by jumping from "pump operates continuously for
t > 60 seconds" directly to "K.2 relay contacts remain closed for t > 60 seconds."
There is, however, no harm done in detailing the intermediate causes and, as a matter
of fact, the opportunity for error is lessened thereby.
PRESSURE TANK EXAMPLE
TANK RUPTURE
(SECONDARY FAILURE)
TANK RUPTURES DUE TO
INTERNAL OVER·PRES·
SURE CAUSED BY CON·
TlNUOUS PUMP OPERA·
TlON FOR t> 60 SEC
VIIIS
Figure VIII4. Fault Tree Construction  Step 2
TANK RUPTURES DUE
TO INTERNAL OVER·
PRESSURE CAUSED BY
CONTINUOUS PUMP
OPERATION FOR
t> 60 SEC
PUMP OPERATES
CONTINUOUSLY
FORt>60SEC
Figure VIII5. Fault Tree Construction  Step 3
VIII6 FAULT TREE HANDBOOK
PUMP OPERATES
CONTINUOUSLY
FOR t> 60 SEC
MOTOR RUNS
FOR t > 60 SEC
POWER APPlIEO
TO MOTOR
FOR t> 60 SEC
K2 RElAY CONTACTS
REMAIN CLOSEO
FORt>60SEC
Figure VIII6. Fault Tree Construction  Step 4
We have now to consider the fault event, "K2 relay contacts closed for t > 60
seconds." Can this consist of a component failure? Yes, the contacts could jam, weld,
or corrode shut. We thus draw an ORgate and add primary, secondary, and
command modes as shown in Figure VIII7.
The event of interest here is the conunand mode event described in the rectan'gIe.
Recall that a command fault involves the proper operation of a component, but in
the wrong place or at the wrong time because of an erroneous command of signal
from another component. In this case, the erroneous signal is the application of EMF
to the relay coil for more than 60 seconds. This stateofsystem fault can be analyzed
as.shown in Figure VIII8.
Notice that both input events to the ANDgate in Figure VIII8 are written as
faults. In fact, as we know, all events that are linked together on a fault tree should
PRESSURE TANK EXAMPLE
K2 RELAY CONTACTS
REMAIN CLOSED
FOR t> 60SEC
VIII7
EMF APPLIED TO
K2 RELAY COIL
FORt>60SEC
Figure VIII7. Fault Tree Construction  Step 5
EMF APPLIED TO
K2 RELAY COIL
FORt>60SEC
A
•
I
I
EMF REMAINS ON
PRESSURE SWITCH
PRESSURE SWITCH
CONTACTS CLOSED CONTACTS WHEN
FORt>60SEC
PRESSURE SWITCH
CLOSED FOR t> 60 SEC
Figure VIII8. Fault Tree Construction  Step 6
be written as faults except, perhaps, those statements that are added as simply
remarks (e.g., statements in ellipses). The pressure switch contacts being closed is not
a fault per se, but when they are closed for greater than 60 seconds, that is a fault.
Likewise the fact that an EMF is applied to the pressure switch contacts is not itself a
fault. Notice that the condition that makes this event a fault is framed in terms of
the other input event to the ANDgate.
The fault event, "pressure switch contacts closed for t > 60 seconds," can consist
of a component failure, so both input events in Figure VIII8 are followed by
ORgates. We analyze them separately, starting with the lefthand event (see Figure
VIII9.
VIII8 FAULT TREE HANDBOOK
PRESSURE SWITCH
CONTACTS CLOSEO
FOR t> 60SEC
Figure VIII9. Fault Tree Construction  Step 7
We see that this, leg of the tree has reached its terminus (all input events are either
circles or diamonds) unless, for some reason, we wish to pursue the event in the
lefthand diamond somewhat further (e.g., ruptured diaphragm, etc.).
We now analyze the righthand event in Figure VIII8 as shown in Figure VIIIlO.
Both the input events in Figure VIIIlO are stateofcomponent faults. The
lefthand one is the more easily analyzed as shown in Figure VIIIII.
EMF REMAINS ON PRES·
SURE SWITCH CONTACTS
WHEN PRESSURE SWITCH
CONTACTS CLOSED FOR
t> 60SEC
EMF THRU K1 RelAY
CONTACTS WHEN PRES·
SURE SWITCH CONTACTS
CLOSED FOR t> 60 SEC
EMF THRU S1 SWITCH
CONTACTS WHEN PRES·
SURE SWITCH CONTACTS
CLOSED FOR t> 60 SEC
Figure VIIIIO. Fault Tree Construction  Step 8
PRESSURE TANK EXAMPLE
EMF THRU S1 SWITCH
CONTACTS WHEN PRES
SURE SWITCH CONTACTS
CLOSED FOR t> 60 SEC
VIII9
Figure VIIIII. Fault Tree Construction  Step 9
Here we have reached another tree terminus. The analysis of the remaining input
event in Figure VIIIIO is shown in Figure VIII12. The reader will note from this
latter figure that we have fma1ly worked our wayin this stepbystep fashiondown
to timer relay faults. Finally, we show the complete fault tree for the pressure tank
example in Figure VIIII 3.
Actually, the fault tree of Figure VIII13 (fould be considered too complete.
Because the only secondary failure that we developed was the rupture of the pressure
tank due to overpumping, other secondary failures (the dotted diamonds) could
simply be omitted from the diagram. Further simplifications can also be made,
leading to the basic fault tree of Figure VIII14, where the circles represent primary
failures as shown in the legend and the fault events EI, E2 etc. are defined as
follows:
The E's are fault events.
El Pressure tank rupture (top event).
E2 Pressure tank rupture due to internal overpressure from pump
operation for t > 60 seconds which is equivalent to K2 relay contacts
closed for t > 60 seconds.
E3 EMF on K2 relay coil for t > 60 seconds.
E4 EMF remains on pressure switch contacts when pressure switch
contacts have been closed for t > 60 seconds.
E5 EMF through KI relay contacts when pressure switch contacts have
been closed for t > 60 seconds which is equivalent to timer relay
contacts failing to open when pressure switch contacts have been closed
for t > 60 seconds.
VIIItO FAULT TREE HANDBOOK
EMF THRU Kl RELAY
CONTACTS WHEN PRES
SURE SWITCH CONTACTS
CLOSED FOR t> 60 SEC
EMF NOT REMOVED FROM
Kl RELAY COIL WHEN
PRESSURE SWITCH
CONTACTS CLOSED
FOR t> 60SEC
TIMER RELAY CONTACTS
FAIL TO OPEN WHEN
PRESSURE SWITCH
CONTACTS CLOSED
FORt>60SEC
Figure VIII12. Fault Tree Construction  Final Step
TANK
RUPTURES ~ U E TO
IMPROPER SELECTION
OF INSTALLATION
(WRONG TANK)
/" .........
./ ""
/" PRESSURE SWITCH ......... '>
<....... (SECONDARY FAILURE) /"
......... /"
......... /"
......."
EXCESS
PRESSURE NOT
SENSED BY PRESSURE
ACTUATED
SWITCH
~ .
..
..
VIII12
2. Fault Tree Evaluation (Minimal Cut Sets)
FAULT TREE HANDBOOK
Fault tree evaluation in general will be taken up in Chapter XI. It is useful at this
point, however, to evaluate the fault tree so that we can assess, in a gross way, the
strengths and particularly the weaknesses of the pressure tank control circuit.
From the basic fault tree of Figure VIII14 we can express the top event as a
Boolean function of the primary input events using the method explained in Chapter
VII. This is accomplis)led by starting at the top of the tree and working down:
EI = T+E2
= T + (K2 +E3)
= T + K2 +(S • E4)
= T + K2 + S • (Sl + E5)
= T + K2 + (S • Sl) + (S • E5)
= T + K2 + (S,' Sl) + S • (KI + R)
= T + K2 + (S • Sl) + (S . KI) + (S • R)
This expression of the top event in terms of the basic inputs to the tree is the
Boolean algebraic equivalent of the tree itself. EI appears as the union of various
combinations (intersections) of basic events and is the minimal cut expression for the
top event. In our example, we have found five minimal cut setstwo singles and
three doubles:
K2
T
S 0 Sl
So KI
So R
Each of these defines an event or series of events whose existence or joint existence
will initiate the top event of the tree.
We are now in a position to make, first, a qualitative assessment of our results and
then, armed with some data, a gross quantitative assessment. Qualitatively, the
leading contributor to the top event is the single relay K2 because it represents a
primary failure of an active component. Therefore, the safety of our system would
be considerably enhanced by substituting a pair of relays in parallel for the single
relay K2. Actually, however, our system contains a much more serious design error:
We are monitoring the controls instead of the parameter of interest (pressure in this
case). It should be just the other way around! Thus, the most obvious way to
improve the system would be to install a pressure relief valve on the tank and remove
the timer.
The next basic event, in order of importance, is T, the primary failure of the
pressure tank itself. Because the tank is a passive component, recall from Chapter V
that the probability of the event T should be less (by an order of magnitude or so)
than the probability of event K2. Of less importance are the three double component
cut sets SoSI, SoKI, and SoK2, although we do note that the failure of the pressure
switch contributes to each of them.
To make a quantitative assessment of our results we need estimates of failure
probabilities for our components. Table VIIII gives values for the failure
probabilities for the components in the system. The calculations involved in getting
these numbers will be discussed later; at this time we simply assume the values as
being "data."
PRESSURE TANK EXAMPLE
VBI13
E1
E5
E4
E2
TOP EVENT
INTERMEDIATE FAULT EVENTS
PRIMARY FAILURE OF TIMER RELAY
PRIMARY FAILURE OF PRESSURE SWITCH
PRIMARY FAILURE OF SWITCH S1
PRIMARY FAILURE OF RELAY K1
PRIMARY FAILURE OF RELAY K2
PRIMARY FAILURE OF PRESSURE TANK
LEGEND: FAULTS
E3
E1
E2.E3.E4.E5
R
S
S1
K1
K2
T
Figure VIII14. Basic (Reduced) Fault Tree
for Pressure Tank Example
vm14
FAULT TREE HANDBOOK
Table VIIII. Failure Probabilities for Pressure Tank Example
COMPONENT
Pressure Tank
Relay K2
Pressure Switch
Relay KI
Timer Relay
Switch Sl
SYMBOL
T
K2
S
KI
R
Sl
FAlLURE PROBABILITY
5 x 10
6
3 x 10
5
I X 10
4
3 x 10
5
I x 10
4
3 x 10
5
Because a minimal cut is an intersection of events, the probabilities associated
with our five minimal cut sets are obtained by multiplying the appropriate
component failure probabilities (assuming independence of failures);
P[T] = 5 x 10
6
P[K2] = 3 x 1 O ~ 5
P[S ,. KI] = (1 x 10
4
) (3 x 10
5
) =3 x 10
9
P[S • R] = (1 x 10
4
) (1 x 10
4
) = I x 10
8
P[S • Sl] = (1 x 10
4
) (3 x 10
5
) = 3 x 10
9
We now wisht<> estimate the probability of the top event, E
1
. Observing that the
top event probability is given by the probability of the union of the minimal cut sets,
and that the probability of each individual minimal cut set is low, we conclude that
the rare event approximation of equation (VI7) is applicable. We therefore simply
sum the minimal cut set probabilities and obtain:
P(E
1
) ~ 3.4 x 10
5
The relative quantitative importance of the various cut sets can be obtained by
taking the ratio of the minimal cut set probability to the total system probability:
Cut Set
T
K2
S • KI I
S·R
S • Sl
Importance
14%
86%
Less than 0.1%
The pressure tank example has been provided on numerous occasions as a
workshop exercise for students learning the basics of fault tree construction. The
most frequent analytical error made by these students is the tendency to leap
directly from the event of rupture to the pressure switch. When this is done the single
failure minimal cut set K2, i.e., the primary failure pf K2, is missed completely. This
illustrates the importance of applying the rules described in Chapter V.
One designrelated principle that arises out of this example is that a system should
be designed so that ANDgates appear as close to the treetop as possible in an effort
to eliminate single event cut sets. The family automobile is a good example of a
system which is not of this type, and the reader can amuse himself by making a
lengthy list of single events that will immobilize his car.
CHAPTER IX  THE THREE MOTOR EXAMPLE
1. System Definition and Fault Tree Construction
In this chapter we discuss a somewhat more complicated example of fault tree
construction and evaluation. * Figure IXI displays a power distribution box and
Figure IX2 gives the system modus operandi. With contacts KTI, KT2, and, KT3
normally closed, a momentary depression of pushbutton SI applies power from
Bi;lttery I to the coils of cutthroat relays KI and K2. Thereupon KI and K2 close
and remain electrically latched.
TEST SIGNAL
L.@J, O",,"".
,...
I
I •
• I
I
I
,
I
I
KT3
I I
I •
I I
, I'"
_
BATTERY 2 KT
3
KT2 KTl
.0'1. A.
I
I
I
.
...,&,. ..... ......
BATTERY 1
Figure IXI. Power Distribution Box Fault Tree Example
Next, a 60second test signal is impressed through K3, the purpose being to check
the proper operation of Motors 1,2, and 3. Once K3 has closed, power from Battery
I is applied to the coils of relays K4 and KS. The closure of K4 starts Motor I. The
closure of KS applies power from Battery 2 to the coil of K6 and also starts Motor 2.
Finally, the closure of K6 applies power from Battery I to the coil of K7. Closure of
K7 starts Motor 3.
After an interval of 60 seconds, K3 is supposed to open, shutting down the
operation of all three motors. Should K3 fail closed after the expiration of 60
seconds, all three timers (KTI, KT2, KT3) open, deenergizing the coil of KI, thus
shutting down system operation. Suppose K3 opens properly at the end of 60
seconds, but K4 fails closed. In that case KTI opens to deenergize KI and Motor I
stops. KT2 and KT3 act Similarly to stop Motor 2 or Motor 3 should either KS or K7
fail closed.
*For an actual example of a fault tree analysis applied to a major nuclear safety system see
U.S. Nuclear Regulatory Commission, "Reactor Safety StudyAn Assessment of Accident Risks
in U.S. Commercial Nuclear Power Plants," WASH1400 (NUREG75/014), Appendix II, "Fault
Trees," Section 2, "Fault Tree Analysis."
IXl
TRANSITION TO STANDBY MODE
K3
I
KT1
}
TIMER
K4 CONTACT KT2
RESET
K5 TRANSFER KT3
KG OPEN
K7
DORMANT MODE STANDBY MODE OPERATE MODE
Sl
Sl
CONTACTS
Sl
CONTACTS
K1 OPEN OPEN
K2
K1
}
CONTACTS K1
I
K'3
CONTACTS
TRANSITION TO· . K2 CLOSED TRANSITION TO,
K2
K4
OPEN
STANDBY MOOt"·
K3
)
OPERATE MODE
K3
K5
K4
K4
CONTACTS
Sl  MOMENTARILY
CONTACTS CLOSED KG
K5 K3 K5
CLOSED
OPEN K7
KG K4
ICONTACTS
KG
KT1
j
K1 } CLOSED AND
K7 K5 TRANSFER K7
CONTACTS
KT2 K2 ELECTRICAllY
KT1
t
KG CLOSED KT1
}
CONTACTS
CLOSED
KT3 LATCHED
KT2
CONTACTS
K7 KT2 CLOSED
KT3 J
CLOSED
KT1
}
KT3 & TIMING
KT2
START
KT3
TIMING
1. GIVEN A PRIMARY FAILURE OF K4, K5. OR
SHUTDOWN MODE
K7, THE SINGLE TIMER TRANSFERS OPEN
CONTACTS
AND DEENERGIZES Kl COIL. S1
OPEN
2. GIVEN A PRIMARY FAILURE OF K3. ALL K1
I
TRANSITION TO
TIMERS TRANSFER OPEN TO DEENERGIZE K2
EMERGENCY
K1 COIL. K3
SHUTDOWN
K4
CONTACTS
K5
OPEN
K6
K7
KT1
}
CONTACTS OF
KT2 ONE OR MORE
KT3 OPEN
Figure IX2. Power Distribution Box Component Status Diagram

>.<
~
THREE MOTOR EXAMPLE
OVERRUN OF ANY
MOTOR AFTER
TEST IS INITIATED
IX3
EMF APPLIED
TO MOTOR 1
FOR t> 60 SEC
EMF APPLIED
TO MOTOR 2
FORt>60SEC
EMF APPLIED
TO MOTOR 3
FOR t> 60 SEC
Figure IX3. "Treetop" for Motor Overrun Problem
Our main concern in this problem is the "overrun" (t > 60 seconds) of anyone of
the motors after test initiation. Thus our "tree top" appears as in Figure IX3.
Each of the three inputs to the ORgate must be analyzed separately. We propose
to treat the overrun of Motor 2 first. To avoid a clutter of trees on a single sheet,
"transfer out" symbols, as shown, are applied to the fault events relevant to Motor I
and Motor 3.
At this point, we have to decide how exhaustive our analysis is going to be. The
simplest approach would be to restrict ourselves to the failures of relays and
switches, and we shall do this first. A fuller analysis (which we shall hold in abeyance
for awhile) would also include possible wiring defects. Limiting the analysis to the
former problem for the moment, we are faced with the top event:
EMF APPLIED
TO MOTOR 2
FORt>60SEC
Because this is a stateofsystem fault, we look for immediate necessary and
sufficient causes. We see from Figure IXI that two fault events have to exist to cause
the occurrence of the event of interest:
(I) K5 relay contacts remain closed for t > 60 seconds.
(2) K2 relay contacts fail to open when K5 relay contacts have been closed for
t > 60 seconds.
IX4
FAULT TREE HANDBOOK
These two fault events are shown as inputs to the 2nd level ANDgate in Figure IX4
which displays the completed tree for overrun of Motor 2. Notice that the failures of
KS or K2 contacts to open are not, of themselves, faults; they become faults,
however, when the time restriction is specified.
We now direct our attention to the fault event:
K5 RELAY CONTACTS
REMAIN CLOSED
FOR t>60 SECS.
Figure IX4. Fault Tree for Overrun of Motor 2 (Relay Logic Only)
Because this is a stateofcomponent fault, it demands an ORgate with primary,
secondary, and command inputs. To Simplify the problem, let us not pursue
secondary failures. Thus, Figure IX4 shows only primary and command inputs. The
primary failure of KS relay is already a basic tree input (a limit of resolution); we
tum our attention therefore, to the command input which is
THREE MOTOR EXAMPLE
EMF REMAINS
ON K5 COIL
FOR t> 60 SEC
IXS
We note that this event is a stateofsystem fault. From Figure IXl we fmd that this
fault event will occur when both K3 contacts and K1 contacts fail to open with the
time restriction t > 60 seconds. These two events are shown in Figure IX4 as inputs
to the 4th level ANDgate on the left. The fault event relevant to relay K3 is a
stateofcomponent fault which demands an ORgate followed by primary and
command inputs. In this case, the command input is shown in a diamond because its
cause (or causes) lie "outside" of our defined system. This leg of the tree is now
complete.
Returning now to the fault event relevant to Kl relay, we realize that it will be
the "top event" of a fairly substantial subsidiary tree of its own. We therefore
"transfer out" to another sheet of paper (see Figure IX5). As a matter of fact, this
strategy proves to be particularly useful inasmuch as this same fault event recurs in
the analysis of the overruns of Motor I and Motor 3. The task of studying the details
of the subsidiary fault tree of Figure IX5 is left to the reader.
Referring once more to Figure IX4, we now have to consider the righthand input
to the 2nd level ANDgate. This fault event is
K2 RELAY CONTACTS
FAil TO OPEN WHEN
K5 RELAY COIHACTS
CLOSED FOR t> 60 SEC
This is a stateofcomponent fault whose command input is
EMF NOT REMOVED
FROM K2 RELAY COIL
WHEN K5 RELAY
CONTACTS CLOSED
FORt>60SEC
a fault. This is really a single input event except that we have also
chosen to consider the possibility (highly improbable) of an EMF on the K2 coil
through the S1, KT1, KT2 and KT3 contacts. This latter event appears in a diamond
in Figure IX4.
The details of the rest of the right leg of the tree are left to the reader. The basic
inputs are primary failures of Kl relay, S1 switch and KT2 timer contacts.
IX6 FAULT TREE HANDBOOK
Kl RELAY CONTACTS
FAIL TO OPEN WHEN
KJ CONTACTS CLOSED
FORt>60SEC
EMF NOT REMOVED
FROM Kl RELAY
COIL WHEN K3
CONTACTS CLOSED
FOR t / 60 SEC
EMF TO Kl COIL
THRU TIMER
CIRCUIT WHEN
K3 CONTACTS CLOSED
FOR t> 60 SEC
EMF TO K1 COIL
THRU S1 CONTACTS
WHEN K3 CONTACTS
CLOSED FOR t> 60 SEC
KT3 TIME RCONTACTS
FAIL TO OPEN
.. WtiEI\I K3 CONTACTS
CLOSED FOR t '> 60 SEC
KT2 TIMER CONTACTS
FAIL TO OPEN
WHEN K3 CONTACTS
CLOSED FOR t> 60 SEC
KT1 TIMER CONTACTS
FAIL TO OPEN
WHEN K3 CONTACTS
CLOSED FOR t> 60 SEC
Figure IX5. Analysis of Fault Event Relevant to Kl Relay
THREE MOTOR EXAMPLE IX'
The fault trees relevant to overruns of Motor I and Motor 3 are shown in Figures
IX6 and IX7. A more exhaustive study of the overrun problem for Motor 2
(including wiring defects) is shown in Figures IX8, and IX9. The reader may now
care to try his hand at the analysis of a different top event: Motor I fails to start in
presence of test signal. A solution for Motor I is displayed in Figure IXlO.
2. Fault Tree Evaluation (Minimal Cut Sets)
We now turn to the qualitative evaluation of the fault tree of Figure IX4 and
determine the minimal cut sets for the overrun of Motor 2. For ease of presentation
we will deal with a reduced version of the fault tree of Figure IX4 where the
diamonds and houses are removed. The reduced tree is shown in Figure IXII.
In Figure IXII the circles represent primary failures of the components denoted,
and the fault events E
1
, E
2
, etc. are defined as follows:
E
1
EMF applied to Motor 2 for t > 60 seconds.
E
2
KS relay contacts remain closed for t > 60 seconds.
E
3
K2 relay contacts fail to open when KS contacts have been closed
to t > 60 seconds.
E
4
EMF remains on KS coil for t > 60 seconds.
E
s
KI relay contacts fail to open when KS contacts have been closed for
t > 60 seconds.
E
6
KI relay contacts fail to open when K3 contacts have been closed for
t > 60 seconds.
E
7
EMF not removed from KI relay coil when K3 contacts have been
closed for t > 60 seconds.
The reader should note that events E
s
and E
6
, although similar, are not the same
because the "when" is different. Thus, the tendency to transfer the fault event E
s
to
the subsidiary tree of Figure IXS must be strongly resisted.
In the equations below each fault event is expressed in terms of its equivalent
Boolean equation. Engineering notation is used.
E
1
= E
2
• E
3
E
2
= K5 +E
4
E
3
= K2 +E
s
E
4
= K3 • E
6
E
s
= KI + KT2 +SI
E
6
= KI + E
7
+Sl
E
7
= KTI • KT2 • KT3
Next, starting from the bottom of the tree, each fault event is expressed in terms of
the basic tree inputs.
E
7
= KT1 • KT2' KT3
E
6
=Kl +(KTI • KT2 • KT3) +SI
E
s
= Kl + KT2 + Sl
IX8
FAULT TREE HANDBOOK
E
4
= K3 • (Kl + KTI • KT2 • KT3 + SI)
=K3' Kl +K3· (KTI' KT2' KT3)+K3' SI
E
3
= K2 + Kl + KT2 + SI
E
2
=K5 + K3 • Kl + K5 + K3 • (KTI • KT2 • KT3) + K5 + K3 • SI
E
1
= [K5 + K3 • Kl + K5 + K3 • (KTI • KT2 • KT3) + K5 + K3 • S1]
• [K2+Kl+KT2+S1]
The Boolean expression for E
1
simplifies to:
E
1
= {K5 + (K3 • Kl) + [K3 • (KTI • KT2 • KT3)] + (K3 'SI)}
• {K2 + K1 + KT2 +Sl}
Expanding this expression using the distributive law, the minimal cut sets (all
redundancies eliminated) are:
K5' K2
K5 • Kl
K5 • KT2
K5 • SI
K3' Kl
K3 • SI
K3 • KTI • KT2 • KT3
Note, for instance, that because (K3' K1) is minimal from the relationship
(K3 • K1 • K1) = (K3 • KI), then (K3 • KI • K2), for example, is not minimal.
All the minimal cut sets above are "doubles" except the last one which appears to
be a "quadruple." If all three timers, however, are identical, they might be likely
candidates for a socalled "common case" failure. In this case, our ostensible
quadruple minimal cut set becomes essentially a "double." In later sections, we will
further discuss common cause failures.
The reader should now try his hand at identifying the minimal cut sets in the
other two problems: overrun of Motor 1 and overrun of Motor 3. Following the
procedure detailed for Motor 2 he should be able to check the following solutions:
MINIMAL CUT SETS
(Overrun of Motor 1)
KI • K4
K4·51
K4' KTI
K1 • K3
K3 • 51
K3 . KT1 • KT2 • KT3
MINIMAL CUT SETS
(Overrun of Motor 3)
Kl • K7
K2 • K7
K7' KT3
K7' Sl
Kl • K3
K3 • 51
K3 • KT1 • KT2 • KT3
THREE MOTOR EXAMPLE
IX·9
EMF APPLIED
TO MOTOR 1
FOR t> 60 SEC
K4 RELAY CONTACTS
REMAIN CLOSED
FOR t >60SEC
Kl RELAY CONTACTS
FAil TO OPEN
WHEN K4 CONTACTS
CLOSED FOR t> 60 SEC
EMF REMAINS
ON K4 COIL
FOR t > 60 SEC
EMF NOT REMOVED
FROM Kl RELAY
COIL WHEN K4
CONTACTS CLOSED
FORt>60SEC
K3 RELAY CONTACTS
REMAIN CLOSED
FOR t >60SfC
Kl RELAY CONTACTS
FAIL TO OPEN
WHEN K3 CONTACTS
CLOSED FOR t> 60 SEC
EMF TO K1 COIL
THRU Sl CONTACTS
WHEN Kl CONTACTS
CLOSED FOR t> 60 SEC
EMF TO K1 COIL
THRU TIMER
CIRCUIT WHEN K4
CONTACTS CLOSED 
FORt>60SEC
KTl TIMER
DOES NOT "TIME OUT"
DUE TO IMPROPER
INSTALLATION OR
SETTING
K3
TIMER
RESET
KTl TIMER CONTACTS
fAIL TO OPEN
WHEN K4 CONTACTS
CLOSED FOR t > 60 SEC
KT2
TIMER
RESET
Figure IX6. Fault Tree for Overrun of Motor 1 (Relay Logic Only)
IXtO
k3 RELAV
CONTACTS
FAIL TO
OPEN
TEST
SIGNAL REMAINS
ON K3 COIL FOR
t 60SEC
KT1
TIMER
RESET
Figure IX7. Fault Tree for Overrun of Motor 3 (Relay Logic Only)
FAULT TREE HANDBOOK
FAULTS
TO+ EMF IN
K2, K5& K7
WIRE CIRCUITS
+EMF
TO K2COIL
THRU 51. KT1. KT2 &
KTJ CIRCUITS
+ EMF
FROM
BATTERY 1
APPLIED
TO K1
CONTACTS
KTl
AND
K13
TIMER
RESET
_EMF
FROM
BATTERY 1
APPLIED
TO S1
CONTACTS
GROUID
AVAILAILE
TOIIOTOR
1
TERilIIAL
TEST
SIGNAL REMAINS
ON K3COIl
FORI>6DSEC
K3 RELAY
CONTACTS
FAil TO
OPEN
RESET
SIGNAL
INADVERTENTl Y
APPLIED OR NOT
REMOVED FROM
SWITCHSI
SISWITCH
INAOVERTENTl Y
CLOSES OR
FAILS TO
OPEN
KT2TIMER
DOES NOT "TIME OUT"
DUE TO IMPROPER
INSTALLATION OR
SETTING
KT2
TIMER
CONTACTS
FAIL TO
OPEN
JX12
.. EMF
FROM
BATTERY 1
AVAILABLE
TOS1
CONTACTS
EMF TO K1 COIL
THRU TIMER
CIRCUIT WHEN K3
CONTACTS CLOSED
FOR t >&OSEC
FAULTS
TO .. EMF IN
K1 8. SI TIMER
CIRCUITRY
EMF TO KI COIL
THRU SI CONTACTS
WHEN K3 CONTACTS
CLOSED FOR t> &0 SEC
FAULT TREE HANDBOOK
RESET
SIGNAL
INADVERTENTLY APPLIED
OR NOT
REMOVED FROM
SWITCH S1
.. EMF
FROM
BATTERY 1
AVAILABLE
TO K1
CONTACTS
KT3TIMER
DOES NOT TIME OUT
DUE TO IMPROPER
INSTAllATION OR
SETTING
KT2T1MER
DOES NOT TIME OUT
DUE TO IMPROPER
INSTALLATION OR
SETTING
KT1 TIMER
DOES NOT TIME OUT
DUE TO IMPROPER
INSTALLATION DR
SETTING
Figure IX9. Subtree for Transfer "T" in Motor Overrun Problem
WIRE
FAULTS IN
K3 RELAY &COMP
CIRCUITRY
WIRE
FAULTS IN
K1 RELAY LATCHING
CIRCUIT
IX14 FAULT TREE HANDBOOK
E1
E2 E3
E7
Figure IXII. Basic Fault Tree for Overrun of Motor 2
CHAPTER X  PROBABILISTIC AND STATISTICAl
ANALYSES
1. Introduction
In this chapter we shall attempt to present those basic elements that are necessary
for understanding the probabilistic and statistical concepts associated with the fault
tree. This material will also serve as the basis for future chapters on the
quantification of fault trees. The reader who feels wellversed in statistics and
probability may prefer to skip this chapter and proceed directly to Chapter. XI. He
may fmd it convenient at a later time to review this chapter as the need arises.
We shall start with a discussion of probability distribution theory. To introduce
this discussion we shall first look at the binomial distribution and shall then proceed
to distribution theory in general, with emphasis on some specific distributions that
are of special interest to the systems analyst. We shall then attack the basics of
statistical estimation.
The method of presentation, although perhaps not in the best tradition of
mathematical statistics, has been developed by one of the authors in the course of
teaching statistics to engineering students and engineers in the field. In this process
mathematical rigor has sometimes been sacrificed to expedite the explication of basic
ideas.
2. The Binomial Distribution
Suppose that we have 4 similar systems that are all tested for a specific operating
time. At the end of each test let us further suppose that the test results allow us to
categorize each run unambiguously as "success" or "failure." If the probability of
success on each run is designated by p (and thus the probability of failure is (1p)),
what is the probability of x =0, 1, 2, 3,4 successes out of the four runs or trials?
The outcome collection for this experiment can be easily written down (subscripts
'denote run number, and " S ' ~ and "F" stand for success and failure):
SlS2
S
3
F
4
SlS2
F
3
S
4
SlF
2
S
3
S
4
SlS2
F
3
F
4
SlF
2
S
3
F
4
F
1
S
2
S
3
F
4
F
1
S
2
F
3
S
4
SlF
2
F
3
S
4
F
1
F
2
S
3
S
4
F
1
F
2
F
3
S
4
F
1
F
2
S
3
F
4
F
1
S
2
F
3
F
4
A symbol such as Sl F
2
S
3
F4 means, "first run successful, second run a failure, third
run successful, fourth run a failure." The probability of this particular outcome is
. p • (lp) • p • (lp) =p2 • (l_p)2. Note that in 4 trials, it is possible to have 4
XI
X2 FAULT TREE HANDBOOK
successes (x =4) in only 1 way, 3 successes (x =3) in 4 ways, 2 success (x = 2) in 6
ways, 1 success (x = 1) in 4 ways, and no successes (x = 0) in only 1 way. In general
for n trials the number of ways of achieving x successes is
(
n) n'
x x! (nx)!
which is the number of combinations of n items taken x at a time. Note also that
there are a total of 16 = 2
4
outcomes. If there were n trials and each trial could be
classified as "success" or "failure," there would be 2
n
outcomes.
Now let us classify these results in a somewhat different way as follows:
No. of Successes No. of Ways Probability
x=O 1
1pO(lp)4
x=I 4
4
p
l(lp)3
x=2 6
6
p
2(1_p)2
x=3 4
4
p
3(l_p)1
x=4 1
I
p
4(l_p)O
Consider the terms in the last column. For example, 4p3(lp) represents the
probability of having exactly 3 successes in 4 trials. Three successes in 4 trials can
occur in 4 ways (thus the coefficient 4) and if we have had 3 successes, we must have
had also 1 failure. The probability of a particular outcome of three successes and one
failure is p3(lp) and because there are 4 possible outcomes in which .three successes
are obtained the total probability is 4p3(l_p). These expressions in the last column
represent individual terms in the binomial distribution, the general form of which is
written as follows:
If the probability of success on any trial is p, then
P[exactly x successes in n trials] = ( ~ ) pX (l_p)nx
=b(x;n,p)
(XI)
in which b (x; n, p) represents the binomial distribution in what is termed probability
density form. (The probability density form is discussed further in succeeding
sections.)* If the reader will set n = 4 and let x range through the values of 0 to 4, he
will see that the individual terms in the probability column of our previous example
are generated by equation (XI).
*For a discrete variable such as x, the probability density is also sometimes called the
probability mass function.
PROBABILISTIC AND STATISTICAL ANALYSES X3
Questions regarding the probabilities of having at most x successes in n trials and
at least x successes in n trials can be answered by summing the appropriate individual
terms. Thus:
x
Plat most x successes in n trials] = L ( ~ ) pS(1_p)ns == B(x; n,p) (X2)
s=O
n
Plat least x successes in n trials] = L ( ~ ) pS(1p)ns
s=x
(X3)
where B (x; n,p) is the binomial distribution in cumulative distribution form (the
cumulative distribution form is discussed further in upcoming sections). At this stage
we may simply interpret the cumulative binomial as giving the probability that the
number of successes is less than or equal to some value. Thus, returning to our
example, the probability of having at least 2 successes in 4 trials is,
The binomial distribution is extensively tabulated, often in the form of equation
(X2) but also occasionally in the form of equation (X3) and sometimes in the form
of equation (XI). See for example References [1], [33], and [41].
The statistical average of the binomial distribution is np and its variance is
np(1p). The average is a measure of the location of the distribution and the variance
is a measure of its dispersion, or spread. These terms are discussed further in
subsequent sections.
We have made a number of assumptions (tacitly so far) in our own use of the
binomial. It is of the utmost importance to list these assumptions explicitly:
(1) Each trial has only one of two possible outcomes. We have called these
"Success"/"Failure" but they could just as well be called anything else (e.g.,
'defective/nondefective).
(2) There are exactly n random trials and n is numerically specified.
(3) All n trials are mutually independent.
(4) The probability of "success" (or whatever you want to call it) on any trial is
designated by some letter such as p, and p remains constant throughout the sequence
of trials.
It is important for the reader to realize that if a problem comes up and one or
more of these assumptions is violated, then use of the binomial distribution will be
questionable unless the effect of the violation is investigated. All distributions and, as
a matter of fact, all mathematical formulae are characterized by underlying
assumptions and restrictions, and a valid use of these distributions or formulae
involves a knowledge of what the associated assumptions and restrictions are.
X4 FAULT TREE HANDBOOK
Let us now return to our list of assumptions and consider them in more detail.
What can be done in case one or more of these restrictions happens not to be true?
We will look at only a few cases to indicate the possible violations to the
assumptions.
(1) What if each trial has more than 2 possible outcomes? For instance, in certain
testing methods, 3 decisions are possible after each trial: accept the lot, reject the
lot, continue testing. If we are drawing single chips from a container holding a
mixture of white, green, red, blue, and yellow chips, then each trial has 5 possible
outcomes. This case does not pose a serious problem, for, instead of using the
binomial distribution, we simply use its generalization, which is known as the
multinomial distribution and which is adequately discussed in many statistics texts
(see, e.g., reference [52]). When applicable, we might also lump the outcomes
into "success" and "failure" and use the binomial on this more gross categorization.
(The probability of "success" would be the sum of the probabilities of the outcomes
categorized as "success.")
(2) Now suppose that the number of trials, n, is not known, but only the number
of successes. For example, we may agree to throw a single die until the number "5"
comes up. It is not known, a priori, how many tosses will be necessary. Or we may
agree to test similar relays until one fails. Again the number of relays tested is not
known. Under such conditions (i.e., whenever a number of trials are performed until
a preassigned number of successes are obtained) we cannot use the binomial
distribution, but another distribution closely related to the binomial and called the
negative binomial distribution is available (see reference [13]). The negative
binomial 1)(x; k,p) gives the probability that the k
th
success occurs on the x
th
trial
and is written,
~ ( x ; b,p)= ( ~ = D pk (l_p)xk (X4)
(3) If the outcomes of the n trials are mutually interdependent (i.e., the (x+l)th
outcome depends in some way on the x
th
outcome and possibly on preceding
outcomes as well), a number of difficulties arise. Some sort of conditional
probability representation is required. The probability of a specific sequence of
outcomes would depend on the order of occurrence ("serial outcome space") and
each different sequence would have a different probability. For example, it would
generally be invalid to use the binomial distribution to estimate the probability that
the daily weather be rainy or not rainy because weather patterns often persist for
days or even weeks at a time and what happens on Wednesday is dependent on what
has happened on the preceding Tuesday. If independence is in doubt, there are
statistical tests to check independence before the binomial is used (see reference
[11]).
(4) If the probability of "success" (or whatever) changes throughout the series of
trials, where samples are chosen from a fIXed population without replacement, the
socalled hypergeometric distribution can be employed (see reference [52]). This
distribution is:
(X5)
PROBABILISTIC AND STATISTICAL ANALYSES
where a = number of items exhibiting characteristic A in the population
b = number of items exhibiting characteristic Bin the population
a+b = N = population size or lot size
n = size of sample drawn from the population
x = number of items exhibiting characteristic A in the sample.
xs
For example, characteristic A may be "defective" and characteristic B, "non
defective." Here h (x; n, a, b) gives the probability that exactly x of the n sample
items exhibit characteristic A.
Use of the hypergeometric distribution is necessary when sampling of a small
population is carried out without replacement. ("Small" means that N and n in
equation (X5) are comparable in size.) For instance, if we receive a shipment of 50
transistors, 10 of which are defective, the initial proportion of inoperative ones is
1/5, but this proportion will change as we withdraw a sample of, say, 20 without
replacement.
As the reader will observe from equation (X5), use of the hypergeometric
frequently entails complicated arithmetical calculations involving factorial numbers,
and for this reason, the binomial is often used in these cases to give approximate
results. The binomial distribution provides a good approximation whenever N ;;;. IOn
where N is the population (or lot or batch) size and n is the sample size.* In this case
a/n is taken as the approximate value of p.
As a specific example of the use of the binomial distribution, consider the
following problem: ABC Corporation mass produces resistors of a certain type. Past
experience indicates that one out of a hundred of these resistors is defective.
Therefore, the probability of obtaining a defective resistor in a sample is p = 0.01. If
a sample of 10 resistors is selected randomly from the production line, what is the
probability of its containing exactly one defective resistor? We have:
x= l,n= 1O,p=0.01 and
b(x = 1; n = 10, p = 0.01) = (\0) (0.01) (0.99)9
If tables of the cumulative binomial distribution are available, this can be most
readily evaluated by fmding B(I)  B(O) because
B(1) = prO or 1 defective resistors], and
B(O) = P [exactly 0 defective resistors] .
B(I)  B(O) = 0.9957  0.9044 =0.0913
By making calculations similar to the foregoing we can now draw a distribution
function giving the probability of finding x = 0, 1,2,3, ... , 10 defective resistors in
the sample of 10. This distribution is shown in Figure XI. The continuous curve in
Figure XI is not really legitimate because the binomial distribution is inherently
discrete, but this kind of continuous interpolation serves to make the general shape
of the distribution visible.
*Some authors recommend a somewhat less conservative rule of thumb of N ;;;. 8n.
X6
FAULT TREE HANDBOOK
P(O) = 0.9044
P(1) = 0.0913
P(2) = 0.0042
P(3 OR MORE) ;; 0
1.0
0.9
U)
:i:_
WCl
1
u..
wo
:::w
1...1
0.5
uCl.
w:lE
u..e:t
wu)
°2
x
~
0.1
0 1 2 3 4 5
NUMBER OF DEFECTIVE ITEMS
Figure XI. Distribution of x Defective Items in a Sample of 10
when p = 0.01
In safety and reliability evaluations, the binorriial distribution is applicable to
systems featuring redundancy if each redundant component operates independently
and if each redundant component has the same (or approximately the same)
probability of failure. For example, suppose we have n redundant components and, if
more than x of these fail, the system fails. Let p be the probability of component
failure in some designated operating period. Tbe probability that the system will not
fail is then the probability that x or fewer components fail and this is just the
binomial cumulative probability B(x; n, p).
Alternatively, consider a situation in which n events are possible and, if more than
x of these events occur, we have an accident. If the n events are independent and if
they all have approximately the same probability of occurring, the binomial
distribution is again applicable. In general, the binomial distribution may be used
when we have n repetitions of some event ("n trials") and we desire information
about the probability of x occurrences of an outcome, fewer than x occurrences of
an outcome, or more than x occurrences of an outcome, always presuming that the
assumptions previously listed are satisfied. The "n trials" can be n components, n
years, n systems, n human actions, or any other appropriate quantity.
We shall return to the binomial distribution shortly because two of its limiting
forms are of special interest to us. For the moment, however, we will discuss
distributions and distribution parameters in general.
PROBABILISTIC AND STATISTICAL ANALYSES
3. The Cumulative Distribution Function
X7
Let us use the symbol X to designate the possible results of a random experiment.
X is usually referred to as a random variable* and it may take on values that are
either discrete (e.g., the number of defective items in a lot) or continuous (e.g., the
heights or weights of a population of men). Actually, the latter category of values is
also strictly discrete because the measuring apparatus employed will have some limit
of resolution. It is convenient mathematically, however, to consider such values as
representing a continuous variable. It will be convenient to use the corresponding
lower case letter x to designate a specific value of the random variable.
The fundamental formula that we are about to present will be given for the
continuous case with detours here and there to point out differences between the
continuous and discrete cases whenever it seems important to do so. In general, it is
operationally a matter of replacing integral signs with summation signs. The
cumulative distribution function F(x) is defined as the probability that the random
variable Xassumes values less than or equal to the specific value x.
F(x) = P[X ~ xl (X6)
According to equation (X6), F(x) is a probability and thus must assume values
only between (and including) zero and one:
o~ F(x) ~ 1
If Xranges from 00 to +00, then
F(oo) = 0
F(+oo) = I
If X has a more restricted range, Xl <X <xu' then
F(XI) = 0
F(x
u
) = 1
An important property of the cumulative distribution function is that F(x) never
decreases in the direction of increasing x. F(x) is a nondecreasing function although
not necessarily monotonically so, in the strict mathematical sense. This can be stated
more succinctly as follows:
One further important property of F(x) is stated in equation (X?) below. **
(X?)
*Random variables will be denoted in this text by boldface type.
** For discrete variables the formula is P[ x1 < X " x2] = F(x
2
)  F(x
l
)·
X8
1           
o
CONTINUOUS VARIABLE CASE
1
o
FAULT TREE HANDBOOK
DISCRETE VARIABLE CASE
Figure X2. Typical Shapes for F(x)
The binomial cumulative distribution B(x; n, p) encountered in section 2 is one
specific example of F(x). Typical shapes for F(x) for a continuous variable and for a
discrete variable are shown in Figure X2.
The properties of the cumulative distribution function that we have presented in
the above formulae are generally valid for continuous and discrete random variables.
As an example of a random variable and its corresponding cumulative distribution,
consider a random experiment in which we observe times to failure of a single
component. Whenever the component fails, we repair it, set t = 0 and note the new
time to failure (see sketch below).
FAILURE 1 FAILURE 2 FAILURE 3 FAILURE 4
FAI LURE 5
Let us assume that repair ("renewal") does not alter the component, Le., in every
instance its repair state coincides with its initial operating state. The random variable
of interest, T, is the time to failure from renewal or repair. We represent a specific
value of T by the symbol t
i
. The cumulative distribution F(t) for any t thus gives the
probability that the time to failure will be less than or equal to t.
As another example consider a random experiment in which we are performing
repeated measurements on some item. The random variable X represents the
"measurement outcome" in general and Xi represents some specific measurement
value. The cumulative distribution F(x) gives the probability that the measurement
value is less than or equal to x. We could estimate values of F(x) from Fest ( ~ ) where
in which n gives the total number of measurements and ni is the number of
measurements in which Xassumes values less than or equal to Xi' As n gets larger and
larger Fest (Xi) will approach more and more closely the true value F ( ~ ) . In
application, the cumulative distribution function must either be determined from
theoretical considerations or be estimated by statistical methods.
PROBABILISTIC AND STATISTICAL ANALYSES
4. The Probability Density Function
X9
For a continuous random variable, the probability density function (pdf), f(x), is
obtained from F(x) by a process of differentiation: *
An equivalent statement is,
d
f(x) = dx F(x)
F(x) =LX f(y)dy
(X8)
(X9)
Because f(x) is defined as the slope of a nondecreasing function, we must have
f(x) ~ a (XI 0)"
When the pdf is integrated over the entire range of its argument, the result is
unity.
f: f(x)dx= I
This property of f(x) permits us to treat areas under f(x) as probabilities.
The fundamental meaning ofthe pdf is stated in equation (XI 2).
f(x)dx = P[x <X<x +dx]
(XII)
(X12)
Our previous equation (X?) can now be stated in another, especially useful, form:
(Xl3)
Typical shapes for f(x) are illustrated in Figure X3 in which (a) represents a
symmetrical distribution, (b) a distribution skewed to the right, and (c) a distribution
skewed to the left. (In the figures x increases as we move to the right).
In the case of a continuous variable, probabilities must be expressed in terms of
intervals. This is because the probability associated with some specific value x is
always zero because there are an infinite number of values of X in any range. Thus
f(x)dx is the probability that the quantity of interest will lie in the interval between
x and x + dx. Of course, the interval length dx may be made as small as we please.
The quantity f(x) itself is therefore the probability per unit interval. In the case of
*We assume F(x) is wellbehaved and allows differentiation.
XtO
(a) (b)
FAULT TREE HANDBOOK
(e)
Figure X3. Typical Shapes for f(x)
discrete variables we replace integral signs by summation signs and sum over the
individual probabilities of the xvalues in the range of interest. Equation (Xl3), for
example, then becomes applicable for a discrete variable X.
With regard to our previous time to failure example, f(t)dt gives the probability
that the component will fail in the time interval between t and t +dt. In the
measurement example f(x)dx gives the probability that the measurement outcome
will lie between x and x +dx. Empirically, if we consider a large number of
measurements, f(x)dx could be estimated by
n
where n is the total number of measurements and is the number of
measurements for which X lies between x and x +
5. Distribution Parameters and Moments
The characteristics of particular probability density functions are described by
distribution parameters. One variety of parameter serves to locate the distribution
along the horizontal axis. For this reason such a parameter is called a location
parameter.
The most common location parameter is the statistical average. Other location
parameters commonly employed are: the median (50% of the area under the.
probability density curve lies to the left of the median; the other 50%, to the right),
the mode, which locates the "peak" or maximum of the probability curve (there may
be no "peak" at all or there may be more than one as in bimodal or trimodal
distributions); the midrange, which is simply the average of the minimum and
maximum values when the variable has a limited range, and others of lesser
importance. For an illustration of these concepts refer to Figure X4.
In (a) the median is indicated by x.50' From the definition of the median, 50% of
the time the outcome will be less than or equal to x.5 0' and 50% of the time it will
be greater than x.50' Therefore p(X.s;;; x.50) = .50 and, in terms of the cumulative
distribution, F(x.50) =.50. The median is a particular case of the general
apercentile, xo:' defined such that F(xo:) =a. For example, the 90% percentile is
such that F(x.90) = .90 and 90% of the time the outcome value x will be less than or
equal to x.90'
PROBABILISTIC AND STATISTICAL ANALYSES Xll
><0.50
(AI MEDIAN (SI MODE
x + X
(C) MIO·RANGE· X = 1 2
. mr
2
Figure X4. The Median, Mode, and MidRange
In (b) of Figure X4, the mode is indicated by x
m
and gives the most probable
outcome value. In (c), we see how the midrange is calculated from the two extreme
values xl and x2'
The average is also termed the mean, or the expected value. If we repeat a random
experiment many times and average our outcome values, this empirical average will
approximate the true average and will approach the true average more and more
closely as the number of repetitions is increased. (We assume the distribution has an
average and is such that the empirical average converges to the population average.)
In the case of a symmetrical distribution as shown in (a) of Figure X3, the mean,
median, and mode all coincide. For skewed distributions, as in Figure X3 (c), the
median will fall between the mode and the mean. In Figure XS we see two
symmetrical distributions with the same values of mean, median, and mode. They
are, however, strikingly different from the standpoint of how the values are clustered
about the central position. Parameters used to describe this aspect of a distribution
are known as dispersion parameters. Of these, the most familiar are the variance and
the square root of the variance or standard deviation. Other dispersion parameters,
less frequently employed, are the median absolute deviation and the range between a
lower bound value and an upper bound value. We will calculate that variance of the
distribution in a later section.
There are, in fact, many other types of distribution parameters, but those we have
mentioned are some of the most basic ones. It is essential that we become familiar
with the general methods for calculating distribution parameters once the functional
Figure X5. Two Symmetrical Distributions with
Different Dispersion Parameters
X12 FAULT TREE HANDBOOK
form of the pdf is known. Some of these general methods entail calculating the
moments of the distribution and are of great importance in theoretical statistics.
Distribution moments may be calculated about any specified point but we shall
restrict ourselves to the discussion of (a) moments about the origin, and (b) moments
about the mean.
(a) Moments About the Origin
The first moment about the origin is defmed as
J.ll =I:x f(x)dx
(XI 4)
This gives us the mean or expected value of X, written E[X]. We will use the symbol
J.l for the mean for the sake of brevity although actually E[X] = J.l.
The second moment about the origin is defined as
J.l2 =J: x
2
f(x)dx
and gives us the expected value of X2, E[X
2
] .
In general, the nth moment about the origin is defmed as
J . l ~ =foo
oo
x
n
f(x)dx
(XIS)
(X16)
and gives us the expected value of xn, E[xn] .
If Y =g(X) is any function of X, and X is distributed according to the pdf f(x),
the expected value of g(X) may be obtained from
E[Y] =E[g(X)] =I:g(x) f(x)dx
(b) Moments About the Mean
The first moment about the mean is defined as
Because J.ll is always and invariably equal to 0, it is not of great utility.
(XI?)
(XI 8)
PROBABILISTIC AND STATISTICAL ANALYSES
The second moment about the mean is defined as
P.2 =I:(xp.)2 f(x)dx.
XI3
(X19)
This gives us the variance a
2
or E[(Xp.)2J. In general, the nth moment about the
mean is defined as
P.n =[00(xp.)n f(x)dx
and gives us E[(Xp.)nJ.
There is a useful relationship between P.2, J.l2, and J.ll ' namely,
(X20)
(X21)
Equation (X21) permits us to calculate the variance by evaluating the integral in
(XIS) rather than the integral in (X19) which is more complicated algebraically.
Equation (X21) is easily proved as follows:
J.l2 =1: (x _J.l)2 f(x)dx
In the case of a discrete random variable, the first moment about the origin is
written
n
p. = p.i = L.: xi P(xi)
i=l
(X22)
where P(xi) is the probability associated with the value xi" The commonly used
formula for fmding the average of n values,
n
 1 L.:
x = x·
n l'
i=l
is a special case of equation (X22) which can, be used whenever all the individual
values are treated as having the same probability or "weight," namely lIn. Thus, in
the case of a single die we have
X·14 FAULT TREE HANDBOOK
Il = Il' = I +2 +3 ~ 4 +5 +6 = 2i = 3.5
so that the expected value is 3.5 despite the fact that such an outcome is impossible
in practice.
Also, if the random variable is discrete, the second moment about the mean (the
variance) assumes the form:
n
112 = L(xi  1l)2 P(Xi) (X23)
i=l
In case all the xi have the same "weight"..!., equation (X23) reduces to a sampling
n
formula for computing the variance of a sample of n readings, *
(X24)
a
We conclude this section with a simple example of the use of distribution
moments. Consider the rectangular pdf shown in Figure X6 in which any value
between a and b is equally likely. Because any value is equally likely, f(x) = f
o
, a
constant. The area under a pdf must integrate to I and hence we have
1
Area = f
o
(ba) = 1 so that f
O
= ba .
f(x)
f
o
~       ......,
L...........l...... x
b
Figure X6. The Rectangular Distribution
*(X23) provides a biased estimate of the population variance a
2
. The bias is eliminated by
multiplying (X23) by the factor n/(nI), "Bessel's correction." For large n the difference is
slight. The question of bias is discussed later in this chapter.
PROBABILISTIC AND STATISTICAL ANALYSES
The mean of this distribution is given by
, I
b
1
J.I. = J.l.I = E(X) = x  dx
ba
a
=_1_ rx2Jb = b
2
 a
2
= (b  a)(b +a) = b +a .
ba L2 a 2(b  a) 2(b  a) 2
The variance of the distribution is given by
Var =a
2
=112 =112  CIlJ)2 =[bx
2
dx  r
=  (b;a)
2
= (b
3
; a
3
) _ (b:a) 2
= b
2
 2ab + a
2
= (ba)2
12 12
6. Limiting Forms of the Binomial: Normal, Poisson
XIS
Several important distributions can be obtained as limiting forms of the binomial
distribution. For example, consider
lim pX (1  p)nx] .
p fixed
n + ex>
We read the above as the limit when p is ftxed and n goes to infmity. Omitting the
mathematical details, this process leads to the famous normal or Gaussian*
distribution
f(x;ll, a)= a exp [HX:IlY]
(X25)
*We suspect that the normal distribution is sometimes called Gaussian in commemoration of
the fact that it was one of the few things Gauss didn't discover! Actually it was first investigated
by de Moivre.
X16 FAULT TREE HANDBOOK
where IJ. and a are the mean and standard deviation respectively. The normal
distribution is extensively tabulated but not in the form (X25). A direct tabulation
of (X25) would require an extensive coverage of values of the parameters IJ. and (J
and would be impossibly unwieldly. It is possible, however, to fInd a transformation
which has the effect of standardizing IJ. to 0 and a to I. This transformation is
xIJ.
z=
(J
and the corresponding distribution in terms of z is
fez) =_I_
e
z2/2
V2i
(X26)
(X27)
which is known as the standardized normal distribution and which forms the basis
for all tabulations.
The reader should note that the passage from equation (X25) to equation (X27)
via equation (X26) is not a matter of simple substitution. The transformation from
one distribution to another requires, among other things, a factor known as the
Jacobian of the transformation (see reference [25]). In the present case the
Jacobian is just a, which neatly cancels the a in the original coeffIcient 1/(...fI/i u). A
graph of fez) is shown in Figure X7.
f(z)
a
.,
.. a
t
I
z
z = 1 z =0 z = +1
z,
(x = J.l u) (x = J.l) (x = J.l + a)
Figure X7. The Standardized Normal Distribution
Several characteristics of the standardized normal distribution are tabulated. Of
primary interest to us is the area under the curve included between two points on the
horizontal axis. The reader should remember that such an area can be treated as a
probability because the total area under the curve is unity. Suppose that an arbitrary
point zl is a point of interest (see Figure X7). Some tabulations record the area
under the curve from zl to +00 (shaded in the fIgure). Some tabulations record the
area from zl to 00 and still others, the area from zl to the origin. It is, of course,
PROBABILISTIC AND STATISTICAL ANALYSES X17
essential to ascertain what area is being tabulated before the tables can be used
intelligently.
For the normal distribution, in terms of the original variable X, 0 measures the
distance from the mean /l to the inflection point of the curve. For X, the area under
the pdf curve from /l 0 to /l + 0 is approximately 0.68 and the area from /l 20 to
/l + 20 is approximately 0.95.
It is assumed that the reader already has some familiarity with the normal
distribution and its tabulations. Nonetheless, a simple numerical example will be
given, which can be skipped by the experienced. The widths of slots in forgings are
normally distributed with mean /l = 0.9000 inch and standard deviation 0 = 0.0030
inch. If the specification limits (acceptance limits) are 0.9000 inch ± 0.0050, what
percentage of the total output will be rejected? The rejected forgings will be those
widths that fall into either one of the shaded regions indicated in the sketch.
f(x)
0.8950 0.900 0.9050
x
The value of z corresponding to x = 0.9050 is
0.9050  0.9000
z = 0.0030
1.67
From standard normal tables, the righthand tail area for P[Z ~ 1.67] is 0.0475. This
is the probability that X ~ 0.9050. By symmetry the lefthand tail area is also
0.0475. Then the area of both tails is 2(0.0475) = 0.0950. This is the probability of
fmding a slot width that is outside the specifications. Therefore, 9.5% of the total
production will be rejected.
This rejection rate is fairly high. If no change in the specifications is possible, we
could lower the rejection rate by controlling production more carefully so that 0 is
reduced. Suppose that our goal is a rejection rate of 1/1000 = 0.001. What is the
maximum allowable 0, say 0', that is consistent with such a rejection rate?
X18 FAULT TREE HANDBOOK
If the rejection rate is 0.001 the area in each tail (see sketch) must be
0.001/2 =0.0005. From tables, the value of z that cuts off a tail area of 0.0005 is
3.3. From z =(xJ.L)/a we find a' =(xJ.L)/z,
th
' =0.9050  O . ~ O O O =0.005 =0 00152' h
us a 3.3 3.3· mc .
Therefore, for a rejection rate of .001, the maximum allowable value of a is 0.00152
inch.
There are several reasons for us to be interested in the normal distribution. One is
that the mean of a large number of measurements tends to be normally distributed
regardless of the underlying distribution of each measurement (from the Central
Limit Theorem). Another is that the normal distribution provides a fairly good
statistical model of many wearout processes.
Another critically important distribution, from the standpoint of systems and
reliability analysis, is obtained as the following limiting form of the binomial
distribution:
lim { ( ~ ) pX (I_p)nx}
n + 00
n+O
(X28)
In equation (X28) the limit is taken in such a way that the product np remains
finite. The outcome of the limiting process is
(
np)X mX
__ enp = _ em
x! m!
(X29)
where m =np. (The mathematical details of the limiting process can be found in
many texts, e.g.: reference [32], pp. 4546.)
Equation (X29) gives the probability of exactly x occurrences of a rare event
(p ~ 0) in a large number of trials (n ~ (0). The expected number of occurrences of
PROBABILISTIC AND STATISTICAL ANALYSES X19
the event is np = m. The distribution given in (X29) is known as the Poisson
distribution. As can be shown by the method of moments (although there are easier
ways) that the mean and variance of the Poisson distribution are both numerically
equal to m.
The Poisson distribution provides a fairly good approximation to the binomial
even though p is not particularly small and n is not especially large. For example,
suppose that in a mass production process the probability of encountering a defective
unit is 0.1 (p = 0.1). What is the probability of finding exactly one defective unit in a
random lot of 10 (n = 10)? The exact result can be found by using the binomial
distribution:
b(1; 10,0.1)=0.3874
The Poisson approximation yields the result 0.3679 which does not differ greatly
from the exact value. If we increase the random lot size to 20 (n = 20), the
agreement is even closer:
binomial  0.2702
Poisson  0.2707
The Poisson distribution is important not only because it approximates the
binomial, but because it describes the behavior of many rare event occurrences,
regardless of their underlying physical process. The Poisson distribution also has
numerous applications in describing the occurrence of system (or component)
failures under steadystate conditions. We will now further discuss these system
applications.
7. Application of the Poisson Distribution to System FailuresThe
SoCalled Exponential Distribution
Assume we have a given system which is in steadystate, when it is not in the
process of being burnt in and is it not wearing out. Further, we will assume that
when the system fails, it is restored to a condition that is essentially as good as new
and that the repair time is negligible. Let the event of interest be system failure. We
are specifically interested in the probability of exactly °occurrences of system
failure. Thus, applying the Poisson distribution we are led to set x =0 in equation
(X29). The result is:
P[0 occurrences of system failure] =e
m
where m =expected number of system failures in a large number of "trials."
Now, as far as failures are concerned, the parameter of interest is time. We thus
seek a way of somehow expressing m in terms of time. It turns out that this can be
readily done.
Assume we have data for this system and, on the average, a failure occurs every 50
hours. We say the mean time to failure (0) is 50 hours. If we operate this system for
X20 FAULT TREE HANDBOOK
100 hours, we expect to experience 2 failures; that is, 100/50 =2. Using the symbol t
for operating time we have in general:
Expected number of failures in time t =t/O =At if A=I/O
But m = expected number of failures. Therefore,
P[D occurrences of system failure] =e
m
=e
t
/
8
=eXt.
Now the reliability of a system, R(t), is by defmition the probability of continuous
successful operation for a time t. Hence we have
(X3D)
The probability of system failure prior to time t is given by the cumulative
distribution function F(t). The system either fails prior to time to or it does not; we
have, therefore,
R(t) = e
Xt
= 1 F(t)
and
F(t) =1  e
xt
.
The pdf corresponding to equation (X31) can now easily be found.
f(t) =.! F(t) =..! (l  e
Xt
)
dt dt
f(t) =Ae
Xt
(X.31)
(X32)
The pdf in equation (X32) is generally referred to as the "exponential distribution
of timetofailure." Also, the expression (X3D) is sometimes referred to as simply the
exponential distribution.
The reliability, cumulative distribution, and pdf, given by equations (X3D),
(X31), and (X32), respectively, are widely used in system analysis and reliability.
The reason is clear. The exponential is an especially simple distribution. Only one
parameter (A, the failure rate or 0, the mean time to failure) must be determined
empirically. We must;however, be extremely cautious in applying equation (X3D) to
the calculation of system reliability for the follOWing reasons. We saw that equation
(X3D) arises from the Poisson distribution. The latter is obtained as a limiting form
of the binomial distribution. The use of the binomial distribution is restricted by a
number of assumptions that we took care to list earlier in this chapter. Some of these
assumptions have been modified in the limiting process, but one of them remains
untouched: the assumption that all "trials" are mutually independent. Translated
into failure language, a trial is an "opportunity to fail in some interval 0 f time."
PROBABILISTIC AND STATISTICAL ANALYSES X21
When system failures are repairable, then our interpretation of the independent
trial assumption is as follows: the probability of failure in some future interval is a
function of only the length of the interval and is independent of the number of past
failures. If the system is not able to be repaired, our interpretation of the probability
behavior must be modified to the following: given no prior failure, the probability of
failure in some future time interval is a function only of.the length of that interval.
We need the restriction "given no prior failure" because, for a nonrepairable
component, if we have had a failure at an earlier time, then the probability of the
failure existing at a later time is 1 and the probability of the failure occurring at a
later time is 0 because it has already occurred.
Another way of characterizing the failure processes is as follows. For the
exponential distribution, given no failure up to time t, the probability of failure in
the interval (t, t + At) is the same as the probability of failure in any interval of the
same length (given no failure has occurred up to that time). Specifically, it is the
same as the probability of failure in the interval (0, At). Thus, because we start
operation at t =0, we are "as good as new" at time t, which is another description of
the exponential.
If we start with the assumption that the probability of failure in a certain interval
is a function only of interval length, we can derive the exponential distribution from
that assumption alone. Consider, for example, a nonrepairable system that can exist
in one of two states: E
I
system operating, Eosystem failed. Let us define
PI (t) = probability that system is in state E
I
at time t
poet) =probability that system is in state EO at time t
and let us assume that at the start of time (t = 0) the system is in its operating state
E
I
·
Now PI (t + At) represents the probability that the system is in state E
1
at time
t + At. We can write
where, according to our initial assumption (probability of failure in a given interval is
a function only of interval length), the quantity ALit gives the probability that the
system makes a transition from state E
1
to state Eo in the time interval At and that A
is some constant (the failure rate). Consequently (1  AAt) gives the probability that
the system does not make a transition from E
I
to EO (i.e., does not fail) in time
interval At. Algebraic rearrangement yields the following difference equation:
If we allow the length of the time interval to approach zero (At*O), the limiting
form of the lefthand side of the equation is just, by definition, the derivative of
PI (t) with respect to t. Thus
. [PI(t+ At)PI(t)] d ,
~ ~ At = dtPI(t)=PI(t)=API(t)
X22 FAULT TREE HANDBOOK
in which the prime stands for differentiation with respect to time, a convention
originally proposed by Newton. We now have the differential equation,
Pi(t)=API(t).
This is easily integrated if we remember our boundary condition PI (t=O) = 1.
t t
[In PI (t)] 0 = [ At] 0
which is just the reliability of the system. Also, because Po(t) +PI (t) = 1, we have
poet) = 1  ei\t = I  R(t) = F(t).
The corresponding pdf is
f(t) = Aei\t
and we recognize the exponential distribution.
8. The Failure Rate Function
Recall from previous sections that
F(t) = P[failure occurs at some time prior to t]
and that
f(t)dt = P[failure occurs between t and t +dt].
We now define a conditional probability, A(t), called the failure rate function:
A(t)dt = P [failure occurs between t and t +dt I no prior failure]. (X33)
For any general distribution, there is an important relationship between the three
functions A(t), f(t), and" F(t);
_ f(t)
A(t)  I  F(t) .
(X34)
PROBABILISTIC AND STATISTICAL ANALYSES X23
The validity of equation (X34) is easily demonstrated as follows.
Let us designate the time at which failure occurs as T. T is a random variable
according to the defmition given in equation (X.33),
X(t)dt = P [t <T < t + dt It <T].
Let (t <T <t + dt) be denoted as event A and (t <T) be denoted as event B. We
remember from basic probability theory that in general,
P (A I B) =peA n B)
PCB) .
Therefore,
( )d
_ P [(t <T < t + dt)n(t < T)]
X t t  pet <T)
Now event A is just a special case of event B, Le., when A occurs then B
automatically occurs. In set theoretic notation, ACB (A is a subset of B) and, under
these circumstances, AnB = A. It follows that
X(t)dt= P [t<T<t+dt]=P[A] = f(t)dt.
P(t<T) P[B] IF(t)
Finally,
A(t)  f(t)
1  F(t) ,
which is just equation (X.34).
If A(t) is plotted against time for a general system, the curve shown in Figure X8
results. For obvious reasons this relationship between A(t) and t has become known
as the "bathtub curve."
A(t)
\I III
"""""'._t
Figure X8. Plot of A(t) vs. t for a General System
X24 FAULT TREE HANDBOOK
This curve can be divided into three parts which are labeled I, II, III in Figure X8.
Region I is termed the region of "infant mortality" where sometimes an underlying
distribution is difficult to determine. The distribution appropriate to this part of the
curve may depend quite critically on the nature of the system itself. Manufacturers
will frequently subject their product to a burnin period attempting to eliminate the
early failures before lots are shipped to the consumer. Region II corresponds to "a
constant failure rate function," and is the region of chance failures to which the
exponential distribution applies. Region III corresponds to a wearout process for
which the normal distribution often provides an adequate model. For an actual
system, the X(t)vs.t curve may be quite different from that depicted in Figure X8.
For example, the exponential Region II may be entirely missing or the burnin region
may be negligible.
Returning to the failure rate function equations, it is convenient to solve (X.34)
explicitly for both F(t) and f(t). This is accomplished by writing equation (X34) in
the form
X(t)dt =_ [F'(t) dt]
1  F(t)
where
F
'
( ) =d F(t)
t dt.
Integrating both sides of equation (X35) we have
t
LX(x) dx =In [1  F(t)],
o
which is equivalent to
1 F(t) = e x p ~ t X(X)dX}
and therefore F(t) =1  e x p ~ t X(X)dX].
If we differentiate equation (X36), we obtain
f(t) = X(t) e x p ~ t X(X)dx}
(X35)
(X36)
(X3?)
The reader should convince himself that if we put X(t) =X=constant in (X36)
and (X3?), we obtain
PROBABILISTIC AND STATISTICAL ANALYSES
F(t) = 1  e
xt
and f(t) = ~ £ X t ,
X2S
which is just the exponential distribution. For the exponential distribution, then, the
failure rate function is a constant (independent of t) and
A(t)dt = P [failure occurs between t and t + dt I no prior failure] = Adt.
If we choose to describe a component failure distribution with the exponential
distribution, we are assuming that we are on the constant, steady state portion of the
bathtub curve with no burnin or wearout occurring. Because of the constancy of
the failure rate function in this case, the exponential distribution is often referred to
as the "random failure distribution," i.e., the probability of future failure is
independent of previous successful operating time.
It is also valuable to note that if we use e
t
/ e to represent reliability, we are being
conservative even if wearout is occurring (but no bumin), i.e., R(t) ;;;. et/e where
R(t) is the actual reliability and 0 is the actual mean time to failure. This relation is
true for t ~ 8. (For a proof of this see Reference [15]).
Equations (X36) and (X37) may be used to investigate a wide variety of failure
rate models. For instance, if A(t) = kt (linearly increasing failure rate) we find that
R(t) = 1  F(t) = exp(kt
2
/2)
which is known as the Rayleigh distribution. An important distribution of
timestofailure, the Weibull distribution, is obtained by putting ACt) = Kt
m
(m>  1) whence
(
ktm+l)
f(t) = kt
m
exp  m+1
and
(
ktm+l)
R(t)=IF(t)=exp  m+1 .
The Weibull distribution is a twoparameter distribution, where k is known as the
scale parameter and m as the shape parameter. For m = 0 we have the exponential
distribution and as m increases a wearout behavior is modeled. When m increases to
2, f( t) approaches normality. When m is less than 0 but greater than 1, the burnin
portion of certain bathtub curves may be modeled. Thus, by changing the value of m,
we can use the Weibull distribution to embrace regions I, II, or III of the bathtub
curve. The reader can find a more detailed description of the Weibull distribution in
the literature of reliability (see reference [23] pp. 137138, [32], Appendix D, and
[36] , p. 190 et seq.).
9. An Application Involving TimeToFailure Distribution
The concept of a distribution of timestofailure is an extremely important one
and in order to impress this more thoroughly on the reader's mind, we consider the
follOWing example.
X26 FAULT TREE HANDBOOK
Similar components are being purchased from two manufacturers, A and B.
Manufacturer A claims a mean life of 100 hours (0 A = 100 hrs.) and states that the
distribution of timestofailure for his components is exponential. B also claims a
mean life. of 100 hours (OB = 100 hrs.) but states that the appropriate distribution of
timestofailure in his case is normal with a mean of 100 hours and a standard
deviation of 40 hours.
For both types of components let us attempt to compute the reliability for 10
hours of operation. First we consider the component from manufacturer A.
tie
RA(t)=e A
R
A
(10) = elO/lOO = e
O
.
1
= 0.905
Thus for Manufacturer A there is a 90.5% reliability.
Now we consider the components from manufacturer B. The distribution here is
normal; we need to find a value of the transformed variate z corresponding to t =10
hours.
tO
B
10100
z== =2.25
uB 40
This value of z cuts off a tail area of 0.01222 (from standard normal tables) and
represents the probability of failure prior to t = 10 hours. * Thus
RB(t = 10) = 1 0.01222 = 0.988
Thus for Manufacturer B the reliability is 98.8%.
From the above example we note the between R
A
and R
B
despite the
fact that 0A = 0B' This difference arises because of the difference in the failure
distributions. As t increases, eventually the exponential will give higher values of
reliability than the normal. For instance, for t = 100 hours, R
A
= 36.8% and
R
B
= 50.0% but for t = 200 hours, R
A
= 13.5% and R
B
= 0.62%. Somewhere
between t = 100 hours and t = 200 hours both distributions give the same value for
reliability. The reader may care to calculate the value oft that gives R
A
=R
B
.
10. Statistical Estimation
Suppose that we were engaged in a study of the heights of men between the ages
of 20 and 30 in the greater Los Angeles area. This is a very large population, and
although we might be eager and willing to measure every member of the population,
this would assuredly prove to be physically impossible.
By way of compromise, we take a random sample from the population. The
importance of the sample's being random will be discussed shortly. From the sample
we can estimate any sample parameters that may be of interest, such as the sample
mean, the sample median, the sample variance, and so forth. The problem simply
*This is not exactly correct because values of z less than 2.5 correspond to negative values of
time. However, the area in the tail corresponding to z =2.5 is extremely small and we ha.ve not
considered it necessary to make the small correction involved.
PROBABILISTIC AND STATISTICAL ANALYSES X27
this: how good are these sample statistics as estimates of corresponding population
quantities? As a matter of fact, have we any assurance that a sample mean is a better
estimate of the population mean than is, for instance, a sample median or a sample
midrange? To answer questions such as these, we have to know precisely what is
meant statistically by statements like:
"1i'a is a good estimator of population parameter 0"
"1J'b is a best estimator of population parameter 8"
"9
b
is a better estimator of ethan is '8'c"*
These matters will be taken up in Section 13. First we must discuss the importance
of choosing random samples, and then we must establish the concept of a sampling
distribution, specifically the sampling distribution of the mean.
11. Random Samples
A random sample can be defined as a sample in which every member of the
population has the same chance of being included. Most statistical calculations are
based on the assumption of randomness; conclusions drawn from a sample which is
thought to be random but which actually does not accurately reflect the
characteristics of the population, can, therefore, be seriously in error.
A classic case in which the assumption of randomness was invalid involved the
Literary Digest Poll of 1936. The poll attempted to sample public opinion on the
subject of whether Mr. Roosevelt or Mr. Landon would win election to the
presidency of the United States. The poll predicted a Landon victory whereas, in
fact, Roosevelt won with a popular plurality of 11,069,785 votes and an electoral
vote of 523 to 8. In this case, polling was carried out largely by means of telephone.
In the depressed economic state of the country in those days, most of the telephones
were owned by wealthy to moderately wealthy Republicans who intended to vote
for Landon. The conclusions drawn from this nonrandom sample were spectacularly
in error. Shortly thereafter the Literary Digest became defunct.
If you wish to select a random sample of electronic components from a large crate
that has just been delivered, it would be incorrect to select only from those
components at the top of the crate. In this case you might remain blissfully unaware
that the crate had been dropped in transit and that every component at the botton of
the crate had been smashed. Whenever sampling is performed, care must be taken to
ensure a random sample because usual estimation techniques are based on this
assumption. One simple method, for example, is to use random number tables to
select the specific samples. Other random sample approaches are described in
Reference [10].
12. Sampling Distributions
Suppose we take a random sample of size n from some population and compute a
n
sample mean xl' where Xl =~ L:: Xi' We could now take a second sample of size n
i=1
*The symbol (j is added to designate an estimator of some population parameter ().
X28 FAULT TREE HANDBOOK
and compute its mean x2. In a like manner we could generate other sample means,
x3' x3' x5' etc. We do not expect these sample means to all be the same. In fact,
these sample means are values of a random variable. Let the sample means in general
be represented by the symbol Xwhich denotes the random variable. The question
arises, how is X distributed? A partial answer is provided by socalled restricted
central limit theorem which states:
If X (the random variable of interest) is distributed normally with mean
J.1 and variance a
2
, then X is distributed normally with mean J.1X =J.1 and
variance (ax)2 =a
2
/n where n is the sample size.
This theorem is exactly true only for populations of infinite size; for finite
populations of size N and for samples of size n,
More important is the general central limit theorem which states that if X is
distributed with mean J.1 and variance a
2
but with distribution otherwise unknown,
the distribution of X is still very closely approximated by the normal distribution
with mean J.1 and variance a
2
In, at least for large n (e.g., n ~ 50).
Consequently, whenever we deal with sample means of large samples, we are
concerned with normal distributions. The variance of the sampling distribution of the
mean decreases as the sample size increases and this provides a justification for taking
as large a sample as possible. Note that the z transformation corresponding to an x
value is
xJ.1
z ='
alyn
Other estimators (e.g., median, range, variance, etc.) are characterized by their
corresponding sampling distributions, most of which can be found in advanced texts
on statistics, e.g., Reference [30]. For instance, for a normal distribution, the
variance estimate, s2, is a function of a value X
2
from the chi square distribution
through the relation
The chi square distribution has been extensively tabulated and is widely used in
decision criterion, in goodness of fit tests, and in hypothesis testing.
13. Point EstimatesGeneral
A single numerical value (such as i) calculated from a sample constitutes a point
estimate of some corresponding parameter. The associated random variable represent
ing the collection of values is called the sample estimator. To facilitate the discussion
PROBABILISTIC AND STATISTICAL ANALYSES
X29
let us use the symbol (J to designate the population parameter being estimated and
the symbols 'U'a' 'U'b' 'U'c' ... to designate various sample estimators for (J. For
instance, if (J were the population mean, then 'U'a could be the sample mean
estimator; 'U'b' the sample median estimator; 'U'c' the sample midrange estimator, and
so forth. The estimators 'U'a' '8'b' 'U'c' ... will all have sampling distributions. The
reader should note that the following characteristics of estimators relate to these
sampling distributions.
a) Unbiased Estimators
An estimator is said to be unbiased if its sampling distribution has a mean that
equals the population parameter being estimated. Thus, if 'U'a is an unbiased estimator
of the population mean Ji, then
From the properties of the expectation, we know that the sample mean Xis an
unbiased estimator of Ji because E(X) =Ji. On the other hand
is a biased estimator of 0
2
. If we multiply by n/(nl )Bessel's correctionwe have
n
S2 =_1_ ,,(X.  X)2
nl L..J 1 '
i=1
which is an unbiased estimator of 0
2
.
b) Minimum Mean Square Error Estimators
and Minimum Variance Estimators
The mean square error (MSE) of an estimator is defmed as
(X38)
The MSE thus is a measure of the amount an estimator ft deviates from the true value
(J. By adding and subtracting E(ft) inside the parenthesis in equation (X38), and
manipulating the result, we can always rewrite the MSE as
The first term on the righthand side is the variance of the estimator and the second
term is called the square of the bias of the estimator. If the estimator is unbiased
then E(11') = (J and
X30
MSE =E[ 8'  E(8)] 2.
FAULT TREE HANDBOOK
Hence the MSE for an unbiased estimator is simply the variance of the estimator.
If among several estimators 1f
a
, 1f
b
, 1f
c
' ... , one of them has the smallest MSE,
then that estimator is called the minimum mean square error (MMSE) estimator. If
among several estimators, all of which are unbiased, one has minimum variance, then
that estimator is called the minimum variance unbiased estimator and abbreviated
MVUE.
The choice of estimator depends on the situation. If we are going to use an
estimator in numerous applications then we generally want the estimator to be
unbiased, because on the average we want the estimator to equal the true value. If we
choose between two or more unbiased estimators, we usually choose the one with
minimum variance. In comparing the variance of two unbiased estimators 1f
l
and 1f
2
,
we say that the one with the smaller variance is relatively more efficient and in fact,
we use the ratio
var ('8'2)
var (1f
l
)
as one measure of the relative efficiency of estimator 1f
l
with respect to 1f
2
.
If, however, we are going to use an estimator only once or a few times, then a
(biased) MMSE estimator may be more efficient. In this case we are more interested
in minimizing the deviation from the true value than we are in the longrun unbiased
property.
c) Consistent Estimators
If1f
a
is a consistent estimator of 0, then
P[I1faOI<€J > (1<5) for all n>n'
where € and <5 are arbitrarily small positive numbers and n' is some integer. We can
interpret the above equation as saying that as the sample size n increases, the pdf of
the estimator concentrates about the true value of the parameter. When n becomes
very large then the probability that the estimator deviates from the true value goes to
zero. In this case we say that "1f
a
converges in probability to 8."
Properties (a), (b), and (c) are the principal characteristics that determine whether
an estimator is good or not. Further discussion of estimator considerations is given in
reference [24] .
14. Point EstimatorsMaximum Likelihood
There is one very important technique for calculating estimators that is called the
method of maximum likelihood. This method is widely used, for example, in
PROBABILISTIC AND STATISTICAL ANALYSES X31
computing parameter estimates in life testing. For large sample sizes (n+oo) under
rather general conditions, the maximum likelihood technique yields estimators which
are consistent and which are both MMSE and MVUE. Even for moderate or small
samples, the maximum likelihood technique yields estimates which are generally
usable. The technique is based on the supposition that the particular random sample
drawn from the population is the most likely one that could have been chosen. To
make this supposition reasonable, consider two examples.
Bridge players do not expect to pick up a hand containing all 13 spades. The
probability of such a hand is extremely small because it can arise in only one way.
The probability of a 13spade hand, however, is the same as that associated with any
hand that is preassigned card by card, because that hand can also arise in only one
way. The kind of hand you usually get may look something like this:
4 Spades
2 Hearts
4 Diamonds
3 Clubs
This hand can arise in an enormous number of waysto be precise,
(11) (1i) (11) (1i) = (13)4 (11)3 (0)2 (3) ways (over 10 billion). In fact, you will
get the distribution 4432 (regardless of suit) about 20% of the time. For most of
the rest of the time you will be getting hands that are very similar to this. The
maximum likelihood technique is based on the premise that the sample we have is a
most probable one, or is near the most probable one.
As a more physical illustration, assume that we had a special camera that
photographed the molecules in a box full of gas. When the photographs are
developed, not only do we see the actual, instantaneous locations of the molecules
but also their individual vector velocities. We could take photographs like this for
thousands of years and they would all look pretty much the same: a homogeneous
distribution of molecules in space going all directions. Even so, there is a positive
(although miniscule) probability that we shall fmd a picture shOWing all the particles
crowded into one corner of the box and all heading due north. If we were to apply
the maximum likelihood technique to one sample (a given photograph), then we
would be making the assumption that the sample was a probable one and not the
very unlikely one.
In general, the technique of maximum likelihood is founded on the assumption
that our sample is representative of the most likely one we could have withdrawn
from the populationalways with the proviso that we have gone to some pains to
assure that it is a random one.
Suppose that we are taking random samples from a population that is distributed'
according to the pdf f(x; 0) where 0 is some unknown population parameter that we
desire to estimate. Assume that our sample (size n) is xl' x2" .. , x
n
and that the
sample variables are independent. Using the pdf, we write down an expression that
gives us the probability associated with this particular sample and then apply the
condition that it be a maximum.
X32 FAULT TREE HANDBOOK
The probability that out fust reading will be xl in the interval dXI is simply
f(xI; O)dxI' The probability that our first reading will be xl in dX
1
and that our
second reading will be x2 in dX2 is
Following this line of reasoning, we can now write down an expression that gives us
the probability of our sample:
If we drop off the differentials we get an expression that is known as the
likelihood function:
n
Likelihood Function = f(xI; 0) f(x2; 0) ... f(Xn; 0) = IIf(xi; 8) (X40)
i=1
where the symbol n stands for a continued product. The likelihood function is no
longer equal to the probability of the sample but it represents a quantity that is
proportional to that probability.*
The next step is to take the natural logarithm ("In") of the likelihood function.
This is simply a matter of convenience because most of the pdfs we encounter are
exponential in form and it is easier to work with the natural logarithm than with the
function itself.
n n
In (Likelihood Function) = L(O) = In II f(xI; 0) = Lin f(Xi; 8) (X41)
i=1 i=1
Notice that this is a function only of 0 because all the Xi's (our sample values) are
known. We are now in a position to determine 0 for which L(O) is a maximum. We
do this by setting the derivative of L(O) with respect to () equal to zero and solving
for O.
d
d(} L(O) = O.
(X42)
Assuming a solution is obtainable (which it generally is), the result is written (}ML'
the maximum likelihood estimate of the unknown population parameter O.
We shall now consider some specific examples to indicate how the maximum
likelihood technique is applied in practice. Suppose that we are taking random
samples from a population whose distribution is normal with unknown mean p. and
known variance a
2
= 1.
*If the sample variables were not independent, then the likelihood would consist of a
multivariate distribution, and we would attempt to maximize that (see Reference [24]).
PROBABILISTIC AND STATISTICAL ANALYSES
1 [(X
p
)2]
f(x; p, a = 1) = V2i exp  2 .
We desire to make :1. maximum likelihood estimate of p.
The likelihood function is
= 1 .. exp [_1 (x.p)21.
(21r)n/2 2 1 J
Taking natural logarithms we have
n
L(p) =  In (21r) 
i=l
Applying the maximization condition,
dL(p) (1)
<i;" = "2 (2) L.,., (xCp)(l) =L.,., (xCp) = O.
i=l i=l
This yields
 nil = 0
1 ,.. ,
so
X33
n
_
PML =  L.,., Xi = x.
n i=l
Thus, the maximum likelihood estimate of p is just the ordinary arithmetic mean.
If more than one population parameter is to be estimated, the process is similar.
Suppose that in our previous example we knew neither p nor a
2
. Then our
fundamental pdf is
1 [ (X
p
)2]
f(x; p, a
2
) = exp  .
v0/ffa 2a
2
The likelihood function becomes
X34
Taking natural logarithms we have
FAULT TREE HANDBOOK
n
L(J.L, a
2
) = In (21T)  In a
2
 I: .
2a i=1
We now fmd and 3L{[3(a
2
)] and set the results equal to zero. The first
operation yields the same result as before, namely,
n
II: 
= Xi =x.
n i=l
The second operation yields
n
a
2
ML = I: (xc
x
)2 .
i=l
This is a biased estimate of a
2
. The bias may be removed by multiplying by the
quantity n/(nl). Then
n
2 __1 _)2
aunbiased  nl L....J Xi X •
i=1
There is little difference between and if n, the sample size, is large
(say n 30). The estimated value of the variance is often denoted by the symbol s2.
As a final example let us return to the exponential distribution and find the ML
estimate for 0, the mean life. .
f(t; 0) =0
1
e
t
/
8
(t> 0).
Assume we have observed n times of failure for n components tested. Then
and
Therefore,
and
so that
n
0ML = I: t
i
, again the simple arithmetic mean.
i=1
PROBABILISTIC AND STATISTICAL ANALYSES X3S
15. Interval Estimators
We have seen in the previous sections how we can make point estimates of
population parameters on the basis of one or more random samples drawn from the
population. We can, if we desire, adopt a different approach. This involves making an
assertion such as:
P [(1J'lower < 8 < ~ p p e r ) ] =71
where 8 is some unknown population parameter, '8tower and 1rupper are estimators
associated with a random sample and 71 is a probability value such as 0.99, 0.95,0.90,
etc. If, for instance, 71 =0.95 we refer to the interval
for particular values of'O'lower and lfupper as a 95% confidence interval. In this case
we are willing to accept a 5% probability (risk) that our assertion is not, in fact, true.
To help clarify the concept of a confidence interval we can look at the
situation in a geometrical way. Suppose we draw repeated samples (Xl' x2)
from a population, one of whose parameters, 8, we desire to bracket with a
confidence interval. We construct a threedimensional space with the vertical axis
corresponding to 8 and with the two horizontal axes corresponding to values of X I
and x2 (see Figure X9). The actual value of the population parameter 8 is marked on
the vertical axis and a horizontal plane is passed through this point. Now we take a
random sample (Xl' x2) from which we calculate the values 8
u
and 8
L
at, say, the
95% confidence level. The interval defined by 8
u
and 8L is plotted on the figure.
£u'
.
'e'
.:. L
e
I:::
X
2
x,
Figure X9. Geometrical Interpretation of the Concept
of a Confidence Interval
X36 FAULT TREE HANDBOOK
Next we take a second sample (xi, X2) from which we calculate the values 0u and
0i at the 95% level. This interval is plotted on the figure. A third sample (xi', x2)
yields the values 0u and 0L' etc. In this way we can generate a large family of
confidence intervals. The confidence intervals depend only on the sample values (xl'
x2), (xi, xi), etc., and hence we can calculate these intervals without knowledge of
the true value of 0. If the confidence intervals are all calculated on the basis of 95%
confidence and if we have a very large family of these intervals, then 95% of them
will cut the horizontal plane through °(and thus include 0) and 5% of them will not.
The process of taking a random sample and computing from it a confidence
interval is equivalent to the process of reaching into a bag containing thousands of
confidence intervals and grabbing one at random. If they are all 95% intervals, our
chance of choosing one that does indeed include ewill be 95%. In contrast, 5% of
the time we will be unlucky and select one that does not include e(like the interval
(Ou, 0D in Figure X9). If a risk of 5% is judged too high, we can go to 99% intervals,
for which the risk is only 1%. As we go to higher confidence levels (and lower risks)
the lengths of the intervals increase until for 100% confidence the interval includes
every conceivable value of °(I am 100% confident that the number of defective
items in a population of 10,000 is somewhere between 0 and 10,000.). For this
reason 100% confidence intervals are of little interest.
Now we shall give an example shOWing how the quantities 0L and 0u are
computed from a sample drawn from a normally distributed population with mean fJ.
and standard deviation a. For the example, we will assume that we wish to bracket fJ.
and that we know a (based on previous data and knowledge). If each sample is drawn
from a normal distribution then the sample mean Xhas a normal distribution with
mean fJ. and standard deviation a/Vii, where n is the sample size. Even if each sample
value is not drawn from a normal distribution, then by the Central Limit Theorem,
for sufficiently large n, X will still be approximately normal with mean fJ. and
standard deviation a/yn. The quantity Z will then be the standardized normal
random variable, where
XfJ.
Z=
a/yn
and where the distribution of Z is tabulated. Because the distribution of Z is
tabulated we can find, for any assigned probability 17, values wand w such that
P [  w < Z ~ w ] =17.
For example, for 17 =0.95, w =1.96. Substituting for Z in the above equation we
thus have
where again w is known for any given 17. (For example, we could substitute the values
0.95 for 17 and 1.96 for w.) .
We are now going to concentrate on the inequality on the lefthand side of the last
equation and attempt to convert it into the form
PROBABILISTIC AND STATISTICAL ANALYSES X37
Let us first multiply through by the factor a/yn thus converting the inequality to
[
a  a]
w vIi < XJ.l < +w v n ~ r
Now we subtract X from every term of the inequality
Next we multiply through by 1 remembering, in the process, to reverse the
directions of the inequalities.
[
a  a J
w+ X>J.l>w+ X
vn vn
which we can write in the form
[
 a  a J
Xw<p<X+w .
vn ..;n
Substituting a particular value xfor the random variable Xwe thus have a particular
interval,
The above inequality yields our desired confidence interval for p. In the case of the
population mean, then
_ a _ a
0L =x  w vn and 0u =x + w ...;n .
If a confidence coefficient 1] has been assigned, X, n, and ware known quantities.
The value of a is also assumed to be known. If we do not know the value of a then
we can, as described earlier for the normal distribution, estimate a from our sample
thus obtaining the quantity s. Now we can form the standardized value t, where
xJ.l
t=.
s/vn
Proceeding as we did with the Z variable, we can form the inequality
_ ts _ ts
x<p<x +.
..;n vn
X38·
FAULT TREE HANDBtJv.."
The value t, however, is not a value from the standard normal distribution. The
distribution was investigated in the early part of this century by W.S. Gossett and is
known as the t distribution; its properties are now extensively tabulated: So if a is
unknown, we find the value of t for given 'Y/ from t tables and not from normal
tables. The values of the t distribution depend on the sample size (degrees of
freedom). As it turns out, when the sample size is greater than 25 or 30 the t table
value is indistinguishable from the normal table value so that in that case, normal
tables may be used.
In terms of estimating the reliability or mean time to failure, onesided confidence
intervals are more commonly seen than twosided intervals. If the sampling
distribution is symmetric (equal tail areas on the ends), then a twosided interval may
easily be converted into a corresponding onesided interval; e.g., if
0.95 < R < 0.98
at the 95% confidence level, then
R>0.95
at the 97.5% confidence level.
For the exponential distribution, the mean time between failures, (J', was
estimated by the point estimate
nOML
where t
i
are the observed times of failure. It can be shown that Xl = 0 follows a
chi square distribution with 2n degrees of freedom. Letting X
2
(97.5, 2n) and
X
2
(2.5, 2n) be the chi square values corresponding to cumulative distribution values
of 97.5% and 2.5%, we have for a twosided 95% confidence interval
nOML
X
2
(2.5, 2n)< 0 <X
2
(97.5, 2n)
or, equivalently,
nOML . nOML
2 <0 < 2
X (97.5, 2n) X (2.5,2n)
Other intervals for different levels of confidence can be obtained from the tabulated
values of the chi square distribution. Confidence intervals for the failure rate A. = 110
can be easily obtained by inverting the above inequalities for (J.
For more involved experiments, the confidence intervals, as well as the maximum
likelihood point estimates, depend on the way the data are collected. In what is
called a type 1 test, for example, n components having the same failure rate are
operated for a prearranged time interval T. The number of components failing in this
time interval is then random. For a type 2 test, n components are operated until a
PROBABILISTIC AND STATISTICAL ANALYSES X39
preassigned number of components fail and this preassigned number may be less than
n.
The text by Mann, Schafer, and Singpurwalla (see reference [24]), cited earlier,
further discusses tests of type 1 and 2, giving associated point estimates and
confidence intervals. Testing with replacement is also discussed and confidence
intervals and point estimates are given for the Weibull and gamma distributions as
well as the exponential distribution under a variety of circumstances.
16. Bayesian Analyses
In the past discussions, we have been treating the parameters of the sampling
distributions as being fixed. In many applications, this is a questionable assumption.
In the Bayesian approach, the parameters of the sampling distribution are treated not
as fixed, but as random variables. With regard to the exponential distribution
f(x) =t ex/e, the mean time to failure (J is thus treated as itself being associated
with a probability distribution. Expressing the exponential in terms of the failure
rate A= l/e, we have f(x) = Ae
AX
• The failure rate is thus also associated with a
probability (because of the relation A= l/e, the distribution of Ais given
by the distribution for (J and vice versa). From here on A and () will denote the
random variables associated with the parameters Aand ().
Let the pdf for A be denoted by peA). The pdf peA) is known as the prior
distribution and it represents our prior knowledge of A before a given sample is
taken. Assume now that a given sample (t
1
, t
2
, ... , t
n
) of component failure times
is collected. We then talk about the posterior distribution of Awhich represents our
updated knowledge of the distribution of A with the additional sample data
incorporated.
The posterior distribution of A, whose pdf we shall denote by p(AID), is easily
obtained by applying Bayes theorem* (the symbol "D" denotes the data sample, e.g.,
(t l' t
2
, ... , t
n
)). Now Bayes theorem says
P(A/B) PCB)
P(BIA) = PCB)
B
Letting "A" denote the data sample "D," and "B" the event that the failure rate lies
between Aand A+dA, we have
exp [t At
i
] An peA)
1=1
p(AID)=
exp [t At
i
]' An peA) dA
1=1
*See Chapter VI, Section 9.
X40 FAULT TREE HANDBOOK
Here we have replaced the summation sign by the integral sign. Because the
denominator does not involve A(it is integrated out) we can write the above equation
as
p(AID) = K exp [t At
i
] An peA)
1=1
where K is treated as a normalizing constant. The distribution p(AID) is the posterior
distribution of Anow incorporating now both our prior knowledge and the observed
data sample.
Bayes theorem thus gives a formal way of updating information about the failure
rate A(i.e., going from peA) to p(AID». If a second sample D' were collected (say ti,
t
2
, ... , t ~ ) then the distribution of X could be updated to incorporate both sets of
data. If p(XID,D') represents the posterior distribution of Xbased on both sets of
data D and D' then we simply use the above equation with p(AID) now as our prior
giving
p(XID, D') = K exp [t At
i
] An p(AID)
1=1
Choices for the initial prior peA) as well as techniques for handling various kinds
of data are described in detail in various texts (see references [24] and [30]). In
Bayesian approaches, the pdfs obtained for the parameters (such as p(A/D» give
detailed information about the possible variability and uncertainty in these
parameters. We can obtain point values, such as the most likely value of Xor the
mean value of A. We can also obtain interval values, which are probability intervals
and are sometimes called Bayesian confidence intervals. For example, having
determined p(XID) we can then determine lower and upper 95% values of AL and AU'
such that there is a 95% probability that the failure rate lies between these values,
Le.,
f
AU
p(A/D) dA = 0.95.
XL
Other bounds and other point values can be obtained in the Bayesian approach
because the parameter distribution (e.g., p(XID» is entirely known and this
distribution represents our knowledge about the parameter. The Bayesian approach
has the advantage that engineering experience and general knowledge, as well as
"pure" statistical data, can be factored into the prior distribution (and hence
posterior distribution). Once distributions are obtained for each relevant component
parameter, such as the component failure rate, then the distribution is straight
forwardly obtained for any system parameter quantified in the fault tree, such as the
system unavailability, reliability, or mean time to failure. One must be very careful in
determining the priors which truly represent the analyst's knowledge and in
ascertaining the impact of different priors if they are all potentially applicable.
Bayesian approaches are further discussed in reference [24].
CHAPTER XI  FAULT TREE EVALUATION TECHNIQUES
1. Introduction
This chapter describes the techniques which form the bases for manual and
automated fault tree evaluations and discusses basic results obtained from these
evaluations. Once a fault tree is constructed it can be evaluated to obtain qualitative
and/or quantitative results. For simpler trees the evaluations can be performed
manually; for complex trees computer codes will be required. Chapter XII discusses
computer codes which are available for fault tree evaluations.
Two types of results are obtainable in a fault tree evaluation: qualitative results
and quantitative results. Qualitative results include: (a) the minimal cut sets of the
fault tree, (b) qualitative component importances, and (c) minimal cut sets
potentially susceptible to common cause (common mode) failures. As previously
discussed, the minimal cut sets give all the unique combinations of component
failures that cause system failure. The qualitative importances give a "qualitative
ranking" on each component with regard to its contribution to system failure. The
common cause/common mode evaluations identify those minimal cut sets consisting
of multiple components which, because of a common susceptibility, can all
potentially fail due to a single failure cause.
The quantitative results obtained from the evaluation include: (a) absolute
probabilities, (b) quantitative importances of components and minimal cut sets, and
(c) sensitivity and relative probability evaluations. The quantitative importances give
the percentage of time that system failure is caused by a particular minimal" cut set or
a particular component failure. The sensitivity and relative probability evaluations
determine the effects of changing maintenance and checking times, implementing
design modifications, and changing component reliabilities. Also included in the
sensitivity evaluations are error analyses to determine the effects of uncertainties in
failure rate data.
Listed below is a summary of the type of results obtained from a fault tree
evaluation. In the follOWing sections we discuss fault tree evaluation in more detail.
Classes of Results Obtained From a Fault Tree Evaluation:
c. Common cause potentials:
a. Minimal cut sets:
b. Qualitative importances:
Qualitative Results
Combinations of component failures causing
system failure
Qualitative rankings of contributions to
system failure
Minimal cut sets potentially susceptible to a
single failure cause
Quantitative Results
a. Numerical probabilities:
b. Quantitative importances:
Probabilities of system and cut set failures
Quantitative rankings of contributions to
system failure
XII
XI2
c. Sensitivity evaluations:
2. Qualitative Evaluations
FAULT TREE HANDBOOK
Effects of changes in models and data, error
determinations
For the qualitative evaluations, the minimal cut sets are obtained by Boolean
reductIon of the fault tree as previously described in Chapter VII, section 4.
Additional examples of minimal cut set determinations are given in this section to
further familiarize the reader with Boolean reduction techniques. The minimal cut
sets obtained are used not only in the subsequent qualitative evaluations but in all
the quantitative evaluations as well.
(a) Minimal Cut Set Determinations
Because the minimal cut sets form the bases for all types of evaluations considered
here, we shall briefly review the determination of minimal cut sets from the tree. In
general, as stated in Section 4 of Chapter VII, our goal is to obtain the top event in
the minimal cut set form
The minimal cut sets M
j
consists of combinations of primary failures (component
failures), e.g., M
j
= C
1
C
2
C
3
, which are the smallest combinations of primary failures
that cause system failure. The substitution can be either a topdown substitution or a
bottomup substitution as previously described. Most computer algorithms for
determining minimal cut sets are based on these principles. (Computer codes are
discussed in Chapter XII.)
Let us now consider the pressure tank fault tree of Figure XI·I. Figure XI} is
similar to the detailed pressure tank fault tree constructed in Chapter VIII and shown
in Figure VIII13. In Chapter VIII, minimal cut sets were determined for a reduced
version of the tree. We will determine here the minimal cut sets of the detailed tree as
a first type of qualitative and/or quantitative evaluation.
In the fault tree in Figure XI} we designate primary failures by PI' P
2
, ... in
circles; secondary failures by SI' S2 ... in diamonds; undeveloped events by E1> E
2
... in diamonds; and all higher fault events by G
I
, G
2
... except for the top event
which we designate T.
The set of Boolean equations equivalent to the fault tree of Figure XI} is:
T = PI +Sl +G
I
G
1
= E
1
+G
2
G
2
=P
Z
+Sz +03
G
3
= G
4
• G
s
G
4
= G
6
+G
7
G
s
= P
3
+S3 +E
3
G
6
= P4 +S4 +E
4
G
7
= P5 +Ss +Gg
Gg = P6 +S6 +E
6
.
Note that we have one equation for each gate in the tree.
FAULT TREE EVALUATION TECHNIQUES
XI3
Figure IXI. Pressure Tank Fault Tree
XI4 FAULT TREE HANDBOOK
We will use the bottomup procedure and write each gate equation in terms of
primary events (the P's, E's, and S's) by substituting for the G's. Using the
distributive law and the law of absorption we will then transform each gate equation
to minimal cut set form. G
S
is already in minimal cut set form. G
7
contains only the
higher fault G
s
and so substituting it becomes
which is now in minimal cut set form. Both G
6
and G
s
are already in minimal cut set
form. G
4
contains G
6
and G
7
, which are alieady in minimal cut set form, and thus
becomes
Substituting for G
4
and G
S
in the equation for G
3
we have
Using the distributive law we now express G
3
in its expanded form:
G
3
= (P4 • P
3
) + (P4 • 8
3
) + (P4 • E
3
)
+ (8
4
• P
3
) + (S4 '8
3
) + (8
4
• E
3
)
+(E
4
' P
3
)+(E
4
' 8
3
)+(E
4
• E
3
)
+(Ps' P
3
)+(ps' 8
3
)+(ps • E
3
)
+(Ss • P3) +(8
s
• S3) +(8
s
• E
3
)
+(P6· P
3
)+(P6' S3)+(P6 • E
3
)
+ (S6 • P
3
) + (8
6
• 8
3
) + (S6 • E
3
)
+ (E
6
• P
3
) + (E
6
• 8
3
) + (E6 • E
3
)·
G
1
= E
1
+ P
2
+ S2 + G
3
sothatT=P
1
+SI +E
1
+P
2
+S2 +G
3
·
Substituting the expanded form of G
3
into this last equation, we fmally have T in
correct minimal cut set form.
The top event, or system model, thus contains:
5 single component minimum cut sets, and
24 double component minimum cut sets.
FAULT TREE EVALUATION TECHNIQUES XIS
We note that if the S's denote secondary failures, then we do not really need to
explicitly show them. All we need to do is to redefine the primary failures P to
denote any type of cause and we can, if we wish, separate the causes in the
quantitative analyses. By deleting the S's in the fault tree, we delete all the cut sets
containing S. We now have
3 single component minimal cut sets, and
10 double component minimal cut sets.
With larger trees, the savings in number of minimal cut sets will be even greater if we
do not explicitly show the secondary failures.
(b) Qualitative Importances
After obtaining the minimal cut sets, some idea of failure importances can be ob
tained by ordering the minimal cut sets according to their size. The singlecomponent
minimal cut sets (if any) are listed first, then the doublecomponent minimal cut sets,
then the triple, etc. Computer codes usually list the minimal cut sets in this order.
Because required computer times can increase dramatically as the size of the mini
mal cut sets increase, it is often the practice to obtain only the single, double, and
perhaps triplecomponent minimal cut sets. As an additional calculation, higher order
minimal cut sets (quadruples, etc.) can also be obtained on a selective basis if they
show potential susceptibility to common cause failures (discussed further in the next
section).
Because the failure probabilities associated with the minimal cut sets often de
crease by orders of magnitude as the size of the cut set increases, the ranking accord
ing to size gives a gross indication of the importance of the minimal cut set. For
example, if individual component failure probabilities are of the order of 10 3, a
singlecomponent cut set probability will be o(the order of 10 3, and a double cut
set 10 6, a triple 10 9, etc. Component failure probabilities are in general different
and depend on testing intervals, downtimes, etc.; therefore the ranking of minimal
cut sets according to size gives only a general indication of importance.
The minimal cut set information can sometimes be used directly to check design
criteria. For example, if a design criterion states that no single component shall fail
the system, then this is equivalent to stating that the system shall contain no single
component minimal cut sets. The minimal cut sets can be checked to see if this
criterion is satisfied. Similar checks can be done on criteria that restrict specific types
of failures which may not individually fail a system (e.g., active components or
human errors).
(c) Common Cause Susceptibilities
The primary failures (component failures) on a fault tree do not necessarily have
to be independent. A single, more basic cause may result in multiple failures which
fail the system. For example, an operator may have miscalibrated all the sensors, or
one steam line break may cause all instrumentation to fail in a control panel.
Multiple failures which can fail the system and which can originate from a common
cause are termed common cause failures.
In evaluating a fault tree, we do not know which failures will be common cause
failures; however, we can indicate the susceptibility that compone!!t failures may
XI6 FAULT TREE HANDBOOK
have to a common initiating cause. Now by defmition, the top event ocCurs, i.e.,
system failure occurs, if all the primary failures in a minimal cut set occur. Therefore,
we are interested only in those common causes which can trigger all the primary
failures in a minimal cut set. A cause which does not trigger all the primary failures in
a minimal cut set will not by itself cause system failure.
To identify minimal cut sets which are susceptible to common cause failures we
can first define common cause categories, which are general areas that can cause
component dependence. Examples of common cause categories include manufac
turer, environment, energy sources (not explicitly shown in the tree), and humans.
The list below gives some example categories which might be considered in a
common cause susceptibility evaluation.
List of Common Cause Categories to be Evaluated
Manufacturer
Location
Seismic Susceptibility
Flood Susceptibility
Temperature
Humidity
Radiation
Wearout Susceptibility
Test Degradation
Maintenance Degradation
Operator Interactions
Energy Sources
Dirt or Contamination
For each common cause category, we then defme specific "elements." For
example, for the category "Manufacturer" the elements would be the particular
manufacturers involved which we might code as "Manufacturer I." "Manufacturer
2," etc. For the "Location" category, we might divide the plant into a given number
of physical locations which would be the elements. For the category "Seismic
Susceptibility," we might defme several sensitivity levels ranging from no sensitivity
to extreme sensitivity, and for more specificity we would defme acceleration ranges
where failure would likely occur.
Our next task in the common cause susceptibility evaluations involves component
coding. As part of the component name code or in associated component description
fields, for each component failure we denote the element of each category associated
with the component. The categories and elements can be indexed or keyed according
to any convenient coding system. For example, "MV2183" may denote manual
valve 2 ("MV2") having element 1 associated with category 1, having element 8
associated with category 2, and element 3 associated with category 3 ("183"). This
kind of naming can easily be coded on the fault tree and in any subsequent computer
input.
Having performed this coding, we can then identify the potentially susceptible
minimal cut sets among the collection of minimal cut sets determined for the fault
tree. The minimal cut sets which are potentially susceptible to common cause failures
FAULT TREE EVALUATION TECHNIQUES XI7
are those whose primary failures all have the same element of a given category.
Having identified the potentially susceptible minimal cut sets, we then need to finally
screen these cut sets to determine those which,may require further action. This final
screening may be based on past histories of common cause occurrences, some sort of
quantification analysis, and/or engineering judgement. This last step is the most
difficult and most time consuming. Chapter XII, section 6, discusses computer codes
which can perform the initial searches.
3. Quantitative Evaluations
Once the minimal cut sets are obtained, probability evaluations can be performed
if quantitative results are desired. The quantitative evaluations are most easily
performed in a sequential manner, first determining the component failure
probabilities, then the minimal cut set probabilities, and fmally the system, i.e., top
event, probability. Quantitative measures of the importance of each cut set and of
each component can also be obtained in this process.
If the failure rates are treated as random variables, then random variable
propagation techniques can be used to estimate the variabilities in system results
which result from the failure rate variations. We first discuss the usual "point
estimations" where one value is assigned to each failure rate and where one value is
obtained for each minimal cut set probability and for the system probability.
Afterwards, we will discuss random variable analyses.
(a) Component Failure Probability Models
By "component" we mean any basic primary event shown on the fault tree
(circle, diamond, etc.). For any component, we consider only failure probability
models for which either a constant failure rate per hour applies or a constant failure
rate per cycle applies. In using these constant failure rate models, we ignore any
timedependent effects such as component burnin and component wearout. The
constant failure rate models which we discuss are usually used for order of magnitude
results. Where timedependent effects such as bumin or wearout are important or
where precision better than, say, a factor of lOis required, then more sophisticated
models are required. These more sophisticated models include, for example, Weibull
and Gamma failure distribution modeling; the reader is referred to [12] and [17] for
discussions of these other models.
(b) Constant Failure Rate Per Hour Model: Probability Distributions
Consider first a component whose failures are modeled as having a constant failure
rate per hour. Let the constant failure rate per hour be denoted by A. When we use
the constant failure rate per hour model, which we shall simply call the Amodel,
then we are assuming that failure probabilities are directly related to component
exposure times. The longer the exposure time period, the higher the probability of
failure. The failure causes can be human error, test and maintenance, or
environmental, such as contamination, corrosion, etc. The Amodel is the model most
often used in fault tree quantifications.
For the Amodel, the resulting first failure probability distributions are exponen
tial distributions. We shall review the distribution properties to refresh the reader's
XI8 FAULT TREE HANDBOOK
memory. The probability F(t) that the component suffers its first failure within time
period t, given it is initially working, is:
F(t) = I  e
At
. (XII)
The quantity F(t) is the cumulative probability distribution as discussed in
section 3 of Chapter X. In reliability terminology, F(t) is called the component
unreliability. The complementary quantity to F(t) is the quantity I  F(t), which is
the probability of no failure in time period t (given the component is initially
working):
IF(t) =e
At
. (XI2)
In statistical terminology, the quantity I  F(t) is called the complementary
cumulative probability. In reliability terminology, I  F(t) is the component
reliability and is denoted by R(t):
R(t) =1 F(t). (XI3)
The density function denoted by f(t) is the derivative of F(t); the quantity f ( t ) ~ t ,
where the interval ~ t approaches zero, is the probability that the component does
not fail in time period t but then does fail In some small interval ~ t about 1. As part
of the defmition of f ( t ) ~ t , we again assume the component is initially working at the
beginning of the period. This "initially working" assumption is applied to all the
calculations, and we will not explicitly state it in the forthcoming discussions.
For the exponential distribution, the density function f(t) is:
f(t) =Ae At.
(XI4)
The constant failure rate per hour model is so named because a formal calculation
of the timedependent failure rate A(t) simply gives the constant value A. Estimates of
the constant failure rate A are given for a variety of components in various data
sources. The analyst needs to obtain a value of Afor each component failure on his
fault tree for which the constant failure rate model is applied. Table XII gives some
representative failure rates for various component failures; the data are taken from
WASHI 400 (reference [38]).
FAULT TREE EVALUATION TECHNIQUES
Table XII. Representative Failure Rates from WASH1400
XI9
...
Z
w.
a:
a:
Cl
::Ez
w
w
Z
;t ",::>
wCl o "'0
lllw
<t:li
Failure to Operate 3,1()"4/l'J lx 1()"4  lx '0.
3
CLUTCH
ELEC
Premature Open lx 10.6 HR lit 10 7 lx 10. 5
Failure to Open 3,1()"7/HR
3>10
8
3"0
6
CLUTCH
MECH
Failure to Operate
3,10.
4
/0 lx 10.
4
lx 10 3
SCRAM Failure to Insert
1x10·
4
,D 3,10' .3,10
4
RODS
(Single Rod I
Failure to Start
3, ,04
,
D
1,10.
4
1.. 10 J
ELECTRIC
MOTORS
Failure to Run lx 10'/HP. 3,10. 6
Failure to Run
lx10 3,HR 1,10
4
"'0.
2
(Extreme ENVIR 1
Failure to lx10
4
·0
3,10 '
],10
4
EnergIze
FI.hne NO Contact
3x:l0 7 HR 110.10
'
1.. 10
6
to Close
RELAYS
5hort AcrolS
1)( 10.8 M'R 1.10
9 ,,, 10 1
NO/NC Contact
Open NC Contact 1xl0 7 HA
3,'0
8
3,10
7
Limit: Failure to Operate 3,10
4
:D hlQ 4 1l( 10
3
Torque: Fill to OPER 1x 10
4
D
3,10 ,. 3.. 10 4
'"
w
:I:
Cl
U
0
t: Pressure Fill to OPEA
11( 10
4
0 3>10'· 3",10
4
::E
;t
w
'"
a:
::>
Manual, Fill to TRANS 1x 10" 'D 3.. 10
6
J, 10'
...J
Contacts Short hl0
8
1(R 11110 9 11(10
7
Failure to
lx 10
3
D
]J( 10 4 _
3,10
3
CIRCUIT
Operate
BREAKERS
",0·
6
/HR 3"0
7
3,'0
6 Prlm.ture
Transfer
Pr.mlture, Open lx 1O·
6
'HR 3x 10· 7311 iO·
6
FUSES
Filiure to Open
,,105/0 3,,0
6
3,10
5
Open 3, ,06/HR hl0
6
. hlO·
5
WIRES Short to GND
3,,0
7
/HR 3,,0
8
. 3,,0
6
Short to PI'!R
1,,0.8 HR ,"0·9_ 1x '0 7
Open CKT 1,10·
6
'HR 3"0 7 3111O·
6
TRANSFORMERS
Short ",0
6
. HR 3"0
7
 3"0
6
Falls to Function 3,10
6
/HR 3,,0
7
3, 10"
HI PWR Application
SOLID
Shorts "10
6
/HR
hlO·
7
h10
5
STATE
DEVICES
" 1O·
6
'HR
1x 10
7
lx 10.
5
Fails To Function
low PWR Application
Shorts "1O·
7
/HR
",08_ lx 10.6
...
Z
a: a:C
w wZ
;t t::>
wcl
g",::>5l
Failura To Start
"10
3
/0 3,,04 3,10
3
Pumps
Failur. To Run
3,,05/HR 3x 106·3, 104
Normal
Faitur. To Run
"103/HR "104."10
2
Extrlml ENV
To Operata
" 103/D
3x 10
4
.3, '0.
3
IPlug)
Val_: Failure To
1x 10
4
/0
3,10.5.3,10
4
MOV_ Remain Open
Ext.,...1 Leek Or Rupture
"loB/HR "10.
9
1,10
7
Val_
Fills To Operata
",0
3
/D
3,10
4
.3,10. 3
(SOV)
Fails To Operate
3,104
10 ",0
4
",0
3
Val_
(Plug)
lxl0
41
D 3,10.
5
3,10
4
(AOV)
Failure·To
Rlmlin Open
Ext.,...ll.kRupture
",0·
8
/HR 1xl0
9
1xl0
7
Failure To Open
" 10
4
/0
3,10 5 3,10 4
Val.,.
Rev«se leek
3,10' '/HR
1x'07 1x 10.
6
(ChIct<)
Ext.n11llukRupture
"10·
8
,HR lx 10
9
1x 10
7
Failure To Operate
3,10
5
/D ','0
5
",04
Va'"
(Vacuum)
Ruptur.
" 10·
8
/HR ",0
9
1x10
7
'"
w
cl
V.lves: Onfices,
0
" 1O·
8
'HR
lxl0
9
,,'0
7
IE
Met.s,
Rupture
w
(Tilt)
'"
Val_
Failure To
::>
Remain Open
" 10.
4
/0
3,10
5
3"0
4
...J
(Manual)
«
(Plug)
...
Fall To Open/D
",0 '/0
3,10.6 3,,0
5
Va'"
(Ralief)
Premature Open/HR
" 1O·
5
,HR
3,10.6 3,,0.
5
PipoV 3"
Rupture
lx 1O.
1O
/HR 3,'0
,2
3,10.
9
HI Quahly (SectIon)
Rupture
"1O·
9
/HR
3,10. 1' 3,10
8
Pipas< 3"
Gaskltl
L.k
3'l06/HR
"10
7
1x1O
4
Laak/Rupturo 3,lO·
7
/HR "10.
8
."10
5
E.:
Wakll Laak 3xl0
9
/HR
1,10·
lO
lxl0
7
Failure to
3,10
2
/0 "'0.
2
."10
'
Start Od'
0_(Com.,...a Plant)
(E...._
F8ilurlto
3,10
3
/HA
3,104.3,10.
2
Loechl Run
0_1 (Engi... Only)
Failure to Run
3x,04/HR 3,10
5
.3,10
3
Batt.ies
Pow. NO/Output
3,10
6
/HR "108."10
5
Supplies
"
FaUura to
"108IHR "10
7
."'0
5
InstrurMntatton
(Amplific8tion.
Annuciaton. Shift
3,,06/HR
3,108·3,104
T....... c.librltion.
Comb_)
XItO FAULT TREE HANDBOOK
Extreme precision is not required (and is not believed!) in a fault tree evaluation;
it is the order of magnitude size of the failure rate that is of concern, i.e., is the
failure rate 10 6 per hour or 105 per hour. To this "order of magnitude" precision,
detailed environments and detailed component specifications are often not important
in obtaining gross estimates of the failure rate. The analyst, however, should of
course use all the available information in obtaining as precise an estimate as he can
for Afor each component or basic event on his tree.
Because extreme precision is not required in a fault tree evaluation, the
exponential distribution can be approximated by its fust order term to simplify the
calculations. The cumulative exponential distribution, i.e., the component unreli
ability, is thus approximated as:
F(t) == At. (XIS)
The above approximation is accurate to within 5% for failure probabilities (F(t»
less than 0.1 and the slight error made is on the conservative side. Furthermore, the
approximation error is small compared to uncertainties in A.
The failure rate Aused in the Amodel can be either a standby failure rate or an
operating failure rate; data sources give both types of rates. If Ais a standby failure
rate, then the time period t used in Equation (XIS) should be the standby time t,
Le., the time period during which the component is "ready" for actual operation.
For these standby situations, F(t) is the probability that the failure will occur in
standby. If Ais an operating failure rate, then t is the actual operating time period
and F(t) is the probability that the failure will occur in operation. Many components
will have both a standby failure rate and an operating failure rate; for example, a
pump will have a standby failure rate when it is not operating and will have an
operating failure rate when it is operating. The analyst must ensure that the proper
failure rate is used with the proper time period. For a standby and an operating phase
the total failure probability is
where the subscript "s" denotes the standby phase and the subscript "0" the
operating phase. For small probabilities (less than, say, 0.1) we can thus simply sum
the failure probabilities.
(c) The Constant Failure Rate Per Hour Model: Reliability Characteris
tics
As stated in the previous section, the quantity R(t) = I  F(t) is the probability of
no failure in time t and is termed the component reliability. The component
unreliability F(t) is the probability of at least one failure in time period t. It is also
the probability of first failure in time t. These definitions of F(t) incorporate the
pOSSibility of more than one failure occurring if the component failures are repairable
(the reason for the phrase "at least one failure"). If the component failure is not
repairable, then at most one failure will occur.
FAULT TREE EVALUATION TECHNIQUES XIll
When we say failures are repairable, we mean that the component is repaired or
replaced when it fails. The repair or replacement need not begin immediately after
failure, and when begun will require some time to complete. The repair or
replacement operation can be characterized by the downtime of the component
denoted by d, which is the total period of time the component is down and
unavailable for operation. For a standby component d is the downtime during which
a demand on the component may occur. If the plant is shutdown sometime after the
failure occurs then d is only that period of online downtime in which the component
could still be "required to operate (to respond to, say, an accident).
The cumulative distribution for the downtime denoted by G(d) is defined as
follows:
G(d) = the probability that the downtime period will be less than d. (XI6)
The cumulative distribution G(d) is obtained from experience data on repairs/
replacements and completely defines the repair/replacement process for quantitative
evaluations.
Let q(t) be the component unavailability and be dermed as:
q(t) = the probability that the component is down at time t
and unable to operate if called on. (XI7)
1  q(t) is the component availability and is the probability that the component is
up and able to operate were it called on.
If component failures are not repairable, then the component will be down at
time t if and only if it has failed in time 1. Consequently, for nonrepairable failures,
where the component is up at t = 0, the unavailability q(t) is equal to the
unreliability F(t):
q(t) = F(t); nonrepairable failures. (XI8)
For the exponential distribution, the unavailability q(t) is thus simply given by
the approximation:
q(t) ~ At. (XI·9)
For nonrepairable failures, the constant failure rate Ais thus all that is needed to
calculate the basic component characteristics F(t) and q(t) which are used in a fault
tree evaluation.
For repairable failures, the component unavailability q(t) is not equal to the
unreliability and we need information on the repair process* to calculate q(t).
We will assume that the repair restores the component to a state where it is
essentially as good as new. This assumption is optimistic but is usually made. The
effects of test inefficiencies can be investigated by more sophisticated analyses. (For
other treatments see references [37] and [41].)
*From here on, we will speak only of "repair" but we will mean repair or replacement.
XI12
FAULT TREE H ~ B O O K
For repairable failures, we consider two cases: (1) when failures are monitored,
and (2) when failures are not detectable until a periodic surveillance test is
performed. For the monitored case, when a failure occurs an alarm, annunciator,
light, or some other signal alerts the operator. In this case, the unavailability q(t)
quickly reaches a constant asymptotic value qM which is given by:
q = XTD
M 
1+ XT
D
=:: XT
D
·
(XIlO)
(XIll)
The failure rate Xis the standby failure rate and the quantity TD is the average
online downtime obtained by statistically averaging the downtime distribution
(described by the cumulative distribution G(d». The downtime which is evaluated is
again the online downtime during which the system is up and the component may be
called on (for example, to respond to an accident situation). For simplified
evaluations, the downtimes can often be broken into several discrete values with
associated probabilities and a statistical average taken over the discrete values. The
approximation given by Equation (XI· 11) is conservative and is within 10% accuracy
for XT
D
< 0.1.
For components which are not monitored but which are periodically tested, any
failures occurring are not detectable until the test is performed. This is the situation,
for example, when surveillance tests are performed monthly; any failure which
occurred during the past month would be detected only when the test is performed.
(We assume perfect testing here, in that essentially 100% of the failure modes are
detected.)
For periodic tests performed at intervals of T, the unavailability rises from a low
of q(t = 0) =0 immediately after a test is performed to a high value of q(t =T) =
I  e?\ T :: XT immediately before the next test is performed. Because the
exponential can be approximated by a linear function (for XT < 0.1 say) the average
unavailability between tests is approximately XT/2. The average value is applicable
for fault tree evaluations if we assume that a demand on the component may occur
uniformly at any time in the interval.
If the component is found failed at a surveillance test, then it will remain down
during the necessary repair time. Considering this additional repair contribution, we
have the following expression for the total average unavailability qT for periodically
tested components:
(XI12)
In the above formula, Xis again the standby failure rate per hour and TR is the
average repair time obtained from downtime considerations. The repair time
evaluated is again the online repair time during which the component may be called
on to function. The subscript "R" is used on TR to denote that it is the average
repair time and not the total downtime which is the sum of the repair time plus the
undetected downtime from time of failure to time of detection.
FAULT TREE EVALUATION TECHNIQUES XII 3
In general, TR is small compared to T and the second term on the right hand side
of Equation (XII 2) is negligible; we thus have:
(XI13)
For repairable failures, the unavailability is thus given by qM or qT depending on
whether monitoring exists or periodic testing is performed with no monitoring
between tests. (If monitoring is performed, qM applies regardless of whether any
additional periodic testing is performed.) For each repairable component of the tree,
X and TD (monitored) or X, TR' and T (periodically tested) are required as data
inputs. Failure rate data sources supply X and operating specifications for the
component are sources for T
R
, T, and T
D
.
In addition to the component unavailability, there is one component reliability
characteristic which is of interest when an operational system is being evaluated. This
component characteristic is the component failure occurrence rate wet) and is
defined such that:
w(t),M =the probability that the component fails in time
t to t +.11. (XII4)
In the definition of wet), we are not given that the component has operated
without failure to time t as was the case for the failure rate definition X(t) (see
Section 8 of Chapter X). In fact, if the component is repairable, then it can have
failed many times previously; the quantity w(t).1t is the probability that it fails in
time t to t +.1t irrespective of history.
The occurrence rate w(t) is applicable for both nonrepairable and repairable
components. For both repairable and nonrepairable components, the expected
number of failures in some time interval t
1
to t
2
, denoted by n(t
1
, t
2
), is given by
the integral of wet) from t
1
to t
2
:
(XIIS)
For nonrepairable component failures, the component can only fail once.
Therefore, wet) is equal to the probability density function for first failure:
wet) =f(t)
=Xe Xt
(XII 6)
(XI·I?)
where Equation (XII?) is for the constant failure rate model (the Xmodel).
For time t small compared to I/X (such that Xt <0.1), eXt is approximately 1,
and hence Equation (XII?) becomes,
w(t):::: X, Xt <.1. (XII8)
XI·14
FAULt TREE HANDBOOK
For repairable failures, wet) can be a complex function of time; however, it
approaches A as time progresses, and this asymptotic value of Xis generally precise
enough for applications:
wet) ~ A. (XI19)
Hence for both nonrepairable and repairable failures, wet) = A is generally a
reasonable approximation. (Some of the computer codes discussed in the next
chapter can compute the time dependent values ofw(t).)
(d) Reliability Characteristics for the Constant Failure Rate Per Cycle
Model
Instead of modeling component failures as having a constant failure rate per hour,
we can use a constant failure per cycle model. In the constant failure rate per cycle
model, the component is assumed to have a constant probability of failing when it is
called on (i.e., when it is "cycled"). This probability of failing per cycle, which we
denote by p, is independent of any exposure time interval, such as the time interval
between the test or the time that the component has existed in standby.
The constant failure rate per cycle model, which we shall simply call the pmodel,
is applied when failures are inherent to the component and are not caused by
"external" mechanisms which are associated with exposure time. For cyclic failures,
the cycling of the component may actually cause the failure (because of stress, etc.).
For example, a component which is obtained from a manufacturer and immediately
placed in the field may be modeled as having a certain failure probability p due to
manufacturing defects. After preoperational testing (Le., burnin testing), many of
the inherent component failures would be detected and the failures might then be
best modeled by the Amodel, Le., the constant failure rate per hour model.
In past practice, the pmodel has been applied to relatively few components,
whereas the Xmodel, the constant failure rate per hour model, has been applied to
the majority of components. The analyst must decide which component model, the
pmodel or the Amodel, is most applicable for his analyses. Failure rate data sources
sometimes indicate the appropriate models; otherwise, the analyst must decide based
on knowledge of the failure causes and mechanisms.
The reliability characteristics for the pmodel are straightforward and are all based
on the one characterizing value p, which is the probability of failure per cycle, or per
demand. We again use the definition of component unreliability, F(t), and
component unavailability, q(t), as given by Equations (XI8) through (XIll). For n
demands in time t and assuming independent failures, the reliability (R
e
) and the
unavailability (qc) are given by:
or
1R ~ =q" ~ np, np <0.1.
(The subscript c in R
c
and qe denotes "cyclic" values.)
(XI20)
(YI./n
FAULT TREE EVALUATION TECHNIQUES XIIS
As noted in the above equations, the reliability and unavailability do not depend
explicitly on time but on the number of cycles (demands) occurring in that time. For
one demand (n = 1), we note that 1  R
c
,= qc = p. For each component on the fault
tree modeled by the pmodel, the user must obtain the appropriate value of p and the
number of demands (usually one).
(e) Minimal Cut Set Reliability Characteristics
Once the component reliability characteristics are obtained, the reliability
characteristics for the minimal cut sets can be evaluated. For a fault tree of a standby
system, such as a nuclear safety system, the characteristic of principal concern is the
minimal cut set unavailability denoted by Q:
Q(t) = the probability that all the components in the minimal
cut set are down at time t and unable to operate. (XI22)
Because a minimal cut set can be viewed as being a particular failure mode of the
system we can also define Q as:
Q(t) = the probability that the system is down at time t
due to the particular minimal.cut set. (XI23)
We can thus also call Q(t) the system unavailability due to a minimal cut set.
We can index each minimal cut set of the fault tree in any way we choose, and
then Qi(t) would be the unavailability for minimal cut set L To determine Qi(t) we
note that, by its definition, the minimal cut set is an intersection of the associated
component failures; the minimal cut set occurs if and only if all the component
failures occur. Assuming the component failures are independent, recall from
Chapter VII (Equation VII3) that the probability of an intersection (Le., an AND
gate) is simply the product of the component probabilities. Hence:
(XI24)
where ql (t), q2(t), etc., are the unavailabilities of the component in the particular
minimal cut set and ni is the number of components in the cut set. As an example in
applying Equation (XI24), if a minimal cut set has two components, having
respective unavailabilities of 1 x 10
2
and 1 x 10 3, then the cut set unavailability
is
The component unavailabilities are those given in the previous sections; any
combinations of component unavailabilities can be used (e.g., one component can be
periodically tested and the other can have a cyclic failure rate, etc.). If the
components in the minimal cut set are all repairable or cyclic, then constant values
can be used for the component unavailabilities, e.g., (Equation (XIlO) or (XIH)), in
which we ignore any timedependent transient behaviors. For this completely
XI16 FAULT TREE HANDBOOK
repairable or cyclic case and within our approximations, the minimal cut set
unavailability is then simply a constant and independent of time.
If the fault tree is of an operational system, then instead of the unavailability, the
number of system failures and the probability of no system failure are of primary
interest. A minimal cut set characteristic giving information on these reijability
related concerns and one which is easily calculable is the minimal cut set occurrence
rate denoted by Wet). The minimal cut set occurrence rate Wet) is defmed such that:
= the probability that the minimal cut set
failure occurs in time t to time t + (XI25)
The quantity is a small increment of time. The occurrence rate itself, Wet), is
thus a probability per unit time of the minimal cut set failure occurring. Because a
minimal cut set can be considered a system failure mode we can equivalently define
to be:
=the probability that a system failure occurs in time
t to t + by the particular minimal cut set. (XI26)
If we index all the minimal cut sets of the tree, then Wlt) refers to the occurrence
rate of minimal cut set i.
To calculate Wi(t), we use the basic definition of a minimal cut set and the
concept of an "occurrence." A minimal cut set failure occurs at time t to t + if all
the components except one are down at time t and the other component then fails at
time t to t + Assuming independence of the component failures Wj(t) is thus
given by:
(XI27)
where the q(t)'s are the component unavailabilities and the w(t)'s are the component
occurrence rates (Equation (XI. 14». The first term on the right hand side of
Equation (XI27) is the probability that all components except component 1 are
down at time t and then component I fails. The second term is the probability that
component 2 is the component that fails in time t to t + all other components
being already down, etc. The contribution from each of the failing in
time t to t + is summed over the nj components in the cut set to obtain the total
FAULT TREE EVALUATION TECHNIQUES
XII'
occurrence rate given by Equation (XI27). The At's cancel in Equation (XI27),
giving:
(XI28)
The minimal cut set occurrence rate Wlt) is strictly applicable when all the
components have a per hour failure rate (the Xmodel). Cyclic components (the
pmodel), as previously stated, do not have any explicit timeassociated behaviors.
The above equation can be applied to minimal cut sets having cyclic components, if
we use for the cyclic components qc(t) ~ n(t)p and wc(t) ~ pk(t), where p is the
cyclic component failure probability, net) is the (expected) number of demands in
time t, and k(t) is the probability of a demand occurring per unit time at time t. (The
quantity k(t)At is thus the probability that a demand occurs in time t to t + At.) The
quantities net) and k(t) must be obtained from operational considerations for the
component.
The expected number N
i
(t
1
,t2) of failures of minimal cut set i occurring during
some time period from t
1
to t
2
is:
(XI29)
If the components in the minimal cut set are all repairable, and constant values are
used for the component unavailabilities and occurrence rates (ignoring any
transients), then Wi(t) is a constant, Wi(t) =Wi' In this constant value case N
i
(t
1
,t
2
)
is simply equal to the time interval multiplied by the constant minimal cut set
occurrence rate:
(XI30)
Because the system fails every time the minimal cut set fails (by the minimal cut
set definition), N
i
(t
1
,t
2
) is the expected number of times the system fails in time
period t
1
to t
2
, due to minimal cut set i. When all components are nonrepairable
then N
i
(t
1
,t2) is also the probability that the minimal cut set fails in the time period
t
1
to t2 (the expected number being equal to the probability in this case). Even when
components are repairable, if the system failure probability over the time period of
interest is small (say, less than 0.1), then N
i
(t
1
,t2) is less than 1. Even though
Nlt
1
,t2) is strictly the expected number of failures, it is also a reasonably good
approximation for the probability of the minimal cut set failing in time period t
1
to
t2' This approximation in the repairable case is conservative (the true probability is
XII 8
FAULT TREE HANDBOOK
less than N
i
(t
1
,t2)) and fairly accurate, generally agreeing to within 10% of the true
probability for Ni(tl ,t2) < 0.1.
Using our previous nomenclature, the probability of minimal cut set failure is the
minimal cut set unreliability and hence, using the above reasoning, when Ni(t
1
,t
2
) <
0.1:
NiCt1 ,t2) ~ minimal cut set unreliability in time t
1
to t2' (XI31)
In terms of system failure, when N
i
(t
1
h) < 0.1, then N
i
(t
1
,t
2
) is thus also
approximately the probability of system failure in time t
1
to t2 due to minimal cut
set i. We can thus say, within our approximation, that N
i
(t
1
,t2) is the system
unreliability due to minimal cut set i. When some components are repairable, the
exact minimal cut set and system unreliabilities are generally difficult to' calculate
and N
i
(t1,t2) consequently provides a useful and generally accurate approximation
which is simple to calculate and good enough for most applications.
For a fault tree evaluation, the minimal cut set unavailabilities, Qi(t), and the
minimal cut set occurrence rates, Wi(t), provide comprehensive information on th.e
probabilistic behavior of the minimal cut sets. If a standby system, such as a nuclear
safety system, is being evaluated, then only the cut set unavailability Qi(t) is usually
calculated. The minimal cut set characteristics need to be calculated for all the
dominant minimal cut sets of the tree. For small numbers of minimal cut sets, the
characteristics can be calculated for all the cut sets of the fault tree. For fault trees
having very large numbers of minimal cut sets, the minimal cut set characteristics are
generally only computed for the lower order cut sets, e.g., singlecomponent and
doublecomponent cut sets. Because component failures are assumed to be
independent, the values of Qi(t) and Wi(t) for higher order cut sets (e.g., triples and
higher) are generally negligible compared to those for the lower order cut sets (singles
and doubles). The independence assumption represents an optimal condition and the
true cut set characteristics, for doubles and up, may be quite higher than the
calculated values of Qi(t) and Wi(t) because of dependencies among the component
failures. If the fault tree has only double cut sets and higher, then the calculated
values for Qi(t) and Wi(t) represent design capability numbers useful principally for
relative evaluations; the actual achieved values of Qi(t) and Wi(t) may be quite higher
and will be much more difficult to estimate. (See for example reference [12] for
further considerations.)
(D System (Top Event) Reliability Characteristics
Once the minimal cut set characteristics are obtained, the determination of the
system characteristics is quite straightforward. The system unavailability denoted by
Qit) (the subscript "s" denoting "system") is defmed as:
Qs(t) =the probability that the system is down at time t
and unable to operate if called on.
(XI32)
For a standby system, such as a nuclear safety system, Qs(t) is the most critical
system charaO{eristic. If the top event of the fault tree is not a system failure but
FAULT TREE EVALUATION TECHNIQUES XI19
some general event, then Qs(t) is the probability that the top event exists at time t
(having occurred earlier and persisting to time t).
Now, the system is down if and only if anyone or more of the minimal cut sets is
down. If we ignore the possibility of two or more minimal cut sets being
simultaneously down, the system unavailability Qs(t) can thus be approximated as
the sum of the minimal cut set unavailabilities Qi(t):
N
Qs(t) =::: I: Qi(t)
i=l
(XI33)
where ~ denotes a summation of Qi(t) over the N minimal cut sets considered in the
tree.
Equation (XI.33), the socalled "rare event approximation," was first introduced
in Chapter VI. It generally gives results agreeing within 10% of the true unavailability
for Qs(t) < 0.1. Furthermore, any error made is on the conservative side in that the
true unavailability is slightly lower than that computed by Equation (XI33).
Equation (XI33) is usually used in fault tree evaluations; it is simple to calculate,
and it can be truncated at any value of N to consider only those cut sets contributing
most to Qs(t). If the component failures are all repairable or cyclic and constant
values used for the component unavailabilities, then Qs(t) is independent of time and
is simply a constant value Qs'
For online operational systems, the system failure occurrence rate Ws(t) is of
interest and is defined such that:
W S < t ) ~ t =the probability that the system fails in time
t to t +~ t . (XI34)
The occurrence rate itself, Ws(t), is the probability per unit time of system failure
at time t. (For any general top event, Wit) is the probability per unit time that the
top event occurs at time 1.)
The system failure occurs if and only if anyone or more of the minimal cut sets
occurs. The system failure occurrence rate then, Ws(t), can be expressed as the sum
of the minimal cut set occurrence rates Wi(t):
N
Ws(t) = I: Wi(t).
i=l
(XI3S)
Equation (XI3S) is another application of the "rare event approximation." It is
quite accurate for low probability events because for such events, the probability of
two or more minimal cut sets occurring simultaneously is negligible. Equation
(XI·3S) is again simple to evaluate and it can be truncated so that only the N
dominant minimal cut contributors are considered.
XI20
FAULT TREE HANDBOOK
If we use Wset), the expected number of system failures N
s
(tl ,t2) in time tl to t2
is:
(XI36)
As a particular use of the above formula, the expected number of system failures
in time t, Ns(t), is:
t
Ns(O,t) = f Wit')dt'.
o
(XI·3?)
If the component failures are all repairable or cyclic and constant values used for
the component unavailabilities, then Ws(t} is siinpiy a constant W
S
' and N
s
(t
1
t2} is
simply W
s
times the interval t2  tl.
Using the same rationale as for the minimal cut set N
i
(t
1
,t
2
), for N
s
(tl,t
2
) less
than 0.1, Nitt ,t2) is also a reasonably accurate approximation for the probability of
system failure in time t 1 to t
2
, which is the system unreliability:
(XI38)
The quantity Ns(O,t) is thus a reasonably accurate approximation of the system
unreliability in time period t.
The system unavailability Qs(t), the system failure occurrence rate Ws(t), and the
expected number of system failures Ns(t
1
,t
2
) give comprehensive information on the
probabilistic description of system failure. In using these results, the reader must
keep in mind the assumptions and limitations of the calculations, particularly the
assumption of independence of component failure occurrences. As discussed for the
minimal cut set characteristics (section e), if the fault tree has only double cut sets
and higher, the system results as calculated here may be very much below the true
values due to dependencies among the component failures. When these dependencies
exist to the extent that they significantly increase failure probabilities, then Qs(t),
Ws(t) and N
s
(t
1
,t
s
) represent optimal designbased numbers which are useful for
relative evaluations but are not useful for absolute evaluations.
(g) Minimal Cut Set and Component Importances
As an additional evaluation we describe a quantitative technique for determining
the "importance" of each minimal cut set and each component failure. We define the
minimal cut set importance to be the fraction of system failure probability
contributed by a particular minimal cut set. We derme the component importance to
be the fraction of system failure probability contributed by .the particular
component failure. Different formulas can be used to calculate the importances (see
reference [21] for a discussion of the different approaches); for our discussion here
we will use one of the simplest methods of calculating the importances.
The minimal cut set importance and the component importance can be calculated
with regard to the system unavailability, Qit), or the system failure occurrence rate,
FAULT TREE EVALUATION TECHNIQUES
XI21
Ws(t). The rule in either case is the same: to calculate the minimal cut set
importance, we take the ratio of the minimal cut set characteristic over the system
characteristic. For the component importance, we sum the characteristic of all
minimal cut sets containing the component and divide by the system characteristic.
Let Ej(t) be the importance of minimal cut set i at time t, and let ek (t) be the
importance of component k at time t (we have simply indexed the cut sets and
components for ease of identification). With regard to system unavailability, then:
and:
= fraction of system unavailability contributed by minimal
cut set i
= fraction of system unavailability contributed by
failure of component k.
(XI39)
(XI40)
(XI4I)
(XI42)
The symbol ~ in equation (XI4I) denotes a sum of Qj(t) over all those minimal
cut sets containing component k as one of its components. Because the system can
be down if and only if one or more of the cut sets is down, the sum of Qj(t) in
Equation (XI4I) is the probability that the system is down due to component
failure k being one of the causes. In terms of conditional probabilities, Elt) is
approximately the probability that the system is down due to minimal cut set i, given
the system is down. The quantity ek(t) is approximately the probability that the
system is down due to component k being one of the causes, given the system is
down. (The quantities are approximate because intersections of minimal cut sets are
ignored, i.e., the rare event approximation is used.)
When all components of the fault tree are repairable or cyclic, and constant values
are used for the component unavailabilities then the importances Ej(t) and ek(t) are
constant and independent of time: Ej(t) =E
j
and ek(t) =ek' The minimal cut set and
component importances thus can be ranked from largest to smallest without regard
to the time considered.
With regard to the system failure occurrence rate, the minimal cut set importance
/, ~
Ej(t) and the component importance ek(t) are:
= fraction of system failure occurrences at
time t contributed by minimal cut set i
(XI43)
(XI44)
XI22
and:
FAULT TREE HANDBOOK
=Fraction of system failure occurrences at time t
in which component k is one of the contributors.
(XI45)
(XI46)
The reasoning for the above two formulas is the same as used previously. With
regard to ~ ( t ) , component k is defined to be one of the contributors to system
failure at time t if it is either down at time t or if it fails at time t. If constant values
are used for all the component characteristics then ti(t) and edt) are again simple
constants which can be ranked from highest to lowest without regard to time.
For the convenience of the reader, Tables XI2 and XI3 summarize all the
formulas which have been presented for the evaluation of a fault tree.
fable XI2. Summary of Equations for Reliability Characteristics
ReqUired Data Unavailability Occurrence Rate Unreliability
Components
Amodel
Nonrepairable A q(t) = 1  e
At
'" At, At <0.1 wIt) = Ae
At
'" A, At <0.1 F(t) = 1  e
At
'" At, At <0.1
Repairable, monitored A, TO
q(t) =
ATO '" ATO' ATO <0.1
wIt) = A(asymptotic) same as above
1 + ATO
Repairable, periodically
q(t) = AT + ATR '" AI. TR <0.1 T
wIt) = A(asymptotic) same as above
tested A, T, TR
2
2
pmodel
Cyclic Component p, nll). kIt) qlt) = 1  (1  p)n(t) '" n(tlp, nlt)p <0.1 wit) = pklt) F(t) = same as q(t)
n'
Minimal Cut Sets OJ(t) = III
qk1t) where "j = the Wilt) = qZ(t) q3(t) ... qnj(t) W,{t) F(t) '" Njltl. t2); NI (t,. t2) <0.1
k=l number of + qilt) q3lt) ... qnjlt) wzlt)
components in + ql(t) qZll) ... qnilt) W31t)
the i
th
minimal
cut set
+ ql It) q2(t) ... qni l(t) wnilt)
N
N
System 0slt) = L: Ojlt) where N = the WsW =.L: W·(t) FIt) '" N
s
(tl. t2); Ns(t,. tZ) <0.1
j= ,
number of
1=1 I
minimal cut sets
where A = component failure rate per hour (operating or standbY,as applicable)
TO = average downtime per failure in hours
T =test interval In hours
TR = average repair time per failure in hours
p = probability of cyclic component failure per demand
nIt) = expected number of demands in time t
kIt) = cyclic component demand rete per hour at time t
XI24 FAULT TREE HANDBOOK
Table XI3. Summary of Equations for Quanitative Importance
L Wilt)
kin i
System Unavailability
System Failure Occurrence Rate
ith Cut Set Importance
A Wj(t)
Ei(tJ =
Ws(t)
kth Component Importance
ek(tl
= L OJ(t)
kin i
°s(t)
~ k ( t l =

(h) Sensitivity Evaluations and Uncertainty Analyses
In the previous sections, the computation of point estimates for the unavailability
and failure occurrence rate of the top event of a fault tree were described. In this
section we briefly take up the question of how to evaluate the sensitivity of these
estimates to variations or uncertainty in the component data or models.
Sensitivity studies are performed to assess the impact of variations or changes to
the component data or to the fault tree model. It is particularly convenient to assess
effects of component data variations using the formulas presented previously in this
chapter because they explicitly contain component failure rates, test intervals, and
repair times as variables. In sensitivity analyses different values may be assigned these
variables to determine the differences in any result. For example, if T is a periodic
testing interval, then the effects on system unavailability with regard to different
testing intervals can be studied by varying the T's for the components. This can entail
as simple a calculation as redoing the computations with different T's or as complex
as employing dynamic programming. likewise, failure rates (A) can be changed to
determine the effects of upgrading or downgrading component reliabilities.
As a type of sensitivity study, scopingtype evaluations can also be performed by
using a high failure rate and a low failure rate for a particular event on the tree. If the
system unavailability does not change significantly, then the event is not important
and no more attention need be paid it. If the system unavailability does change
significantly, then more precise data must be obtained or the event must be further
developed to more basic causes. A wide spectrum of sensitivity analyses can be
performed, depending on the needs of the engineer.
In judging the significance of an effect, it is important that the analyst take into
account the precision of his data. For example, although a factor of 2 variation in the
system unavailability might be very significant when failure rates are known to 3
significant figures, the same factor of 2 variation would probably not be significant
when failure rates are known only to an order of magnitude.
As a type of sensitivity evaluation, formal error analyses can be performed to
determine the error spread in any fmal result due to possible data uncertainties or
variabilities. The error spread obtained for the results gives the uncertainty or
variability associated with the result. The error analyses employ statistical or
probabilistic techniques, which are independent of the fault tree evaluation
techniques per se; the discussion therefore, will be short.
FAULT TREE EVALUAnON TECHNIQUES XJ25
A variety of error analysis methods can be used, and we will briefly explain the
approach when data are treated as random variables. * For the random variable
approach, the method most adaptable to a fault tree evaluation is the Monte Carlo
simulation technique. The Monte Carlo method can accommodate general distribu
tions, general sizes of the errors, and dependencies.
In the Monte Carlo method, the fault tree evaluations are repeated a number of
times (each repetition is called a "trial") using different data values (e.g., A's and
TR's) for each calculation. The variation in the data values is "simulated" by
randomly sampling from probability distribution functions which describe the
variability in the data. The probability distributions can be Bayesian prior
distributions on the parameters A, TR' etc., or can be distributions representing
planttoplant variation in the failure rates and other data. Each trial calculation will
give one value for the system result of interest such as the system unavailability or
occurrence rate. The whole set of repeated calculations will give a set of system
results from which an error spread is determined (e.g., picking the 5% largest value
and 95% largest value to represent the 90% range for the result).
The above method is completely analogous to repeating an experiment many
times to determine the error in the experimental value. The final error spread on the
result is the estimate of the result variability arising from the variability in the failure
rates and other data treated as random variables. (Section 2b of Chapter XII
describes several fault treeoriented computer codes for performing Monte Carlo
simulation.)
*When failure rates and other data are treated as constants, with uncertainties arising from the
variability of the estimators, then classical confidence bound approaches should be used. The
reader is referred to Mann, Schafer, and Singpurwalla (reference (30]) for further details.
CHAPTER XII  FAULT TREE EVALUATION COMPUTER CODES
1. Overview of Available Codes
This chapter covers the computer code methodology currently available for fault
tree analysis. The codes discussed are divided into five groups. The numbers in
brackets refer to references in the bibliography.
Group 1. Codes for Qualitative Analysis
PREP [42] 1970
ELRAFT [35] 1971
MOCUS [11] 1972
TREEL and MICSUP [29] 1975
ALLCUTS [39] 1975
SETS [46] 1974
FTAP [43] 1978
Group 2. Codes for Quantitative Analysis
KlTTI [42] 1969, KITT2 [40] 1970
SAMPLE [38] 1975, MOCARS [25] 1977, et al.
FRANTIC J41] 1977
Group 3. Direct Evaluation Codes
ARMM [26] 1965
SAFTE [13] 1968
GO [14] 1968, GO "Fault Finder" 1977
NOTED [45] 1971
PATREC [19] 1974, PATRECMC [20] 1977
BAM [34] 1975, WAMBAM [22] 1976,.WAMCUT [9] 1978
Group 4. A DualPurpose Code
PLMOD [28] 1977
Group 5. Common Cause Failure Analysis Codes
COMCAN [3] 1976
BACKFIRE [5] 1977
SETS [47] 1977
Group 1 consists of codes that perform the qualitative evaluation of a fault tree
(i.e., codes that compute minimal cut and/or path sets). The codes in Group 2
perform quantitative (probabilistic) analysis based on the structural information
XIIl
XII2
FAULT TREE HANDBOOK
embodied in the cut sets. The codes in Group 3 are designed to perform direct
numerical evaluation of a fault tree without computing cut sets as a necessary
intermediate step; however, many of them will generate cut sets on request as a
nonintegral part of the analysis. PLMOD, a dual purpose code that can be used both
for the qualitative and quantitative analysis of a fault tree constitutes Group 4, and,
finally, Group 5 contains the codes developed for use in common cause analysis. The
five groups of codes are described in the succeeding sections of this chapter.
2. Computer Codes for Qualitative Analysis of Fault Trees
This section deals with codes that compute the minimal cut (and/or path) sets of a
fault tree. The computation of the minimal cut sets is often referred to as the
qualitative evaluation of the fault tree because the results are based solely on the
structure of the tree and are independent of the probabilities associated with the
basic events. In contrast, the probabilistic assessment is called the quantitative
evaluation of the fault tree.
The division between qualitative and quantitative aspects develops naturally
because the probabilistic analysis often involves repeated evaluation of the tree (i.e.,
at different time points, using a distribution of failure or repair rates to perform
sensitivity or error analysis). Thus, it is often most efficient to perform the
time<onsuming structural analysis once, save the results in some convenient form
(usually minimal cut sets) and then use these results to quantify the tree using
different sets of data, as required. Other advantages afforded by the computation of
the minimal cut sets are (1) the minimal cut sets themselves provide much useful
information to the analyst, even in the absence of any quantitative data, because
they indicate the minimal sets of components whose failure will cause the system to
fail; (2) non<ontributing cut sets (usually based on cut set size) can be discarded
prior to quantification, thus increasing computational efficiency and reducing data
requirements; (3) the ability to compare the minimal cut sets with the original tree
provides a valuable error check; and (4) cut sets are required as part of the input to
the common cause analysis codes.
One disadvantage of the minimal cut set codes is that the storage and computer
time required to process even mediumsize trees can become quite prohibitive. This is
because the number of cut sets can increase exponentially with the number of gates
and can easily reach the millions or even billions of terms (e.g., one example tree
with 299 basic events and 324 gates, had in excess of 64 million cut sets). The
problem is complicated further because a simple count of events and gates is a very
poor indicator of the expected number of minimal cut sets, and even the number of
minimal cut sets may not be a good prediction of reqUired processing time. Thus it
is often difficult to predict the storage requirements and run time for a given tree.
Several methods can be used to overcome or at least alleviate the problems
associated with obtaining the minimal cut sets. The most common is to eliminate,
during the processing, cut sets whose size (number of events) is greater than some
prespecified number n. This is often very effective for trees having low order cut sets
which dominate the high order cut sets. In WASH·1400 [38] for example, only the
single and double event cut sets were retained for the independent failure
computations; the higher order cut sets were analyzed only for common mode and
FAULT TREE EVALUATION COMPUTER CODES X1I3
common cause failure potentials. Another similar approach is to reduce the tree
based directly on cut set probability instead of order. This requires the input of
component failure probabilities at the outset, however. Disadvantages of using a tree
reduction process are that (1) it is impossible to determine the total failure
probability discarded and (2) analysis of dependencies such as events dependent on
common causes requires separate evaluation of the higher order cut sets.
Other techniques used in some codes are efficient "packed" and/or bitlevel
storage schemes, use of auxiliary storage media during cut set processing, and
automatic tree decomposition schemes. The latter seems to be a promising method,
and is discussed further in later sections (see sections on SETS, FTAP, and PLMOD).
In the remainder of this section, we discuss the individual qualitative analysis
codes. PREP, discussed in section lea), was the first cut set code. It is included
mainly for background; its algorithm has been superseded by improved methods.
ELRAFT, MOCUS, MICSUP, ALLCUTS, SETS, and FTAP discussed in sections
l(b)(g), all use variations on the "topdown" and "bottomup" methods described
earlier in Chapter vn, section 4. SETS is somewhat different from the other codes in
that it provides a very general and flexible tool for manipulating the fault tree in the
form of its corresponding Boolean equations.
(a) PREP
The PREP and KITI codes [40] [42], written in FORTRAN IV for the IBM 360
computer and released in 1970, were the first computerized fault tree evaluation
codes. PREP is a minimal cut set (or path set) generator, and KITTI and KITT2
perform time dependent fault tree analysis in the context of Kinetic Tree Theory,
using the results from PREP. The KIIT codes will be discussed in the quantification
section.
PREP consists of two parts: PREPTREBIL and PREPMINSET. TREBIL (for
"tree build") takes the user's input description of a fault tree and builds a
FORTRAN subroutine of the Boolean equations for the tree. MINSET then uses the
TREE subroutine produced by TREBIL to find the tree's minimal cut and/or path
sets.
PREPMINSET has two options for cut set generation: COMBO and FATE.
COMBO systematically fails all single basic events, pairs of basic events, groups of
three basic events, etc., to determine which combinations cause the top event of the
fault tree to occur. The user determines the maximal size of the cut sets to be
computed (for low probability events, such as those found in nuclear power plant
fault trees, doubles or triples usually suffice). FATE incorporates quantitative data
on the component's reliability to fmd minimal cut sets which are "most likely" to
occur. It does this by performing Monte Carlo simulation.
The main disadvantage of PREP is that COMBO requires a prohibitive amount of
computer time for large order cut sets of large fault trees, whereas FATE is not
guaranteed to find all the minimal sets. Also, the input to PREP is limited to AND
and OR gates, so NOT gates, both explicit and implicit (e.g., exclusive OR gates), are
prohibited; and special gates, such as koutofn gates, must be input in terms of their
basic AND and OR gate structure. The basic events are assumed to be independent;
unlimited replicated events are allowed; there is no way to generate cut sets for
intermediate gates; and there is no easy method to input replicated portions of the
tree. PREP allows a maximum of 2000 components and 2000 gates; minimal cut sets
found by COMBO are limited to maximum length of 10 components.
XII4
(b) ELRAFT
FAULT TREE HANDBOOK
The ELRAFT (Efficient Logic Reduction of Fault Trees) code [35] uses the
unique factorization property of the natural numbers to find the minimal cut sets of
a fault tree. Every integer greater than 1 can be expressed as a unique (except for
order) product of prime factors. In the ELRAFT code, each basic event is assigned a
unique prime number. The tree is processed from the bottom up, and cut sets for the
gates at successively higher levels are represented by the product of the numbers
associated with their input events. The major drawback of ELRAFT is that for large
trees, the product of the prime factors can soon exceed the capacity of the machine
to represent the number. Coded in FORTRAN IV for the CDC 6600, ELRAFT is
capable of finding minimal cut sets of up to six basic events for the top event and
other specified intermediate events.
(c) MOCUS
The MOCUS code [11] was written in 1972 to replace PREP as a minimal cut set
generator for the KITT codes. The socalled "Boolean indicated cut sets" (BICS) are
generated by successive substitution into the gate equations beginning with the top
event and working down the tree until all gates have been replaced by basic events. *
If the tree contains no replicated events, the BICS will be minimal; otherwise, the
nonminimal BICS must be discarded. The MOCUS algorithm may be used to find the
minimal cut or path sets for up to 20 gates in a given tree. The user may place an
upper limit on the length of cut sets found if desired. Other aspects of the MOCUS
code are identical to PREP. MOCUS is written in FORTRAN IV for the IBM 360
series computer.
(d) TREEL AND MICSUP
 _..
TREEL and MICSUP [29] are based on an idea similar to that used in MOCUS
except that instead of working from the top event down, MICSUP (Minimal Cut Set
UPward) starts with the lowest level gate basic inputs and works upward to the top
tree event. * TREEL is a preprocessor that checks the tree for errors and determines
in advance the maximum number and size of the Boolean indicated cut and path sets.
As a result of processing the tree from bottom to top, MICSUP has the advantage of
generating the BICS for each intermediate gate of the tree. Nonminimal BICS and
BICS of length greater than a userspecified limit can be discarded as they appear,
thus reducing the computer time and storage requirements. As with MOCUS, most
other aspects of the code are similar to PREP.
(e) ALLCUTS
Another code for finding minimal cut sets is ALLCUTS [39] developed by the
Atlantic Richfield Company. ALLCUTS uses a topdown algorithm, similar to that of
MOCUS. An auxiliary program BRANCH can be used to check the input and cross
reference the gates and input events, and a plot program KILMER can be used to
produce a Calcomp plot of the fault tree based on the fault tree input description
and conversational plotting instructions. ALLCUTS optionally allows input of basic
event probability data. If this data is input, ALLCUTS can compute the top event
probability, sort and print up to 1000 minimal cut sets in descending order of
probability, and select cut sets in specified probability ranges. ALLCUTS handles up
*The basic topdown and bottomup algorithms for minimal cut set generation are explained
in chapter VII, section 4.
FAULT TREE EVALUATION COMPUTER CODES
XII·!
to 175 basic events and 425 gate events; the current version of the code uses 110 K,
octal. ALLCUTS is written in FORTRAN IV and COMPASS (assembly language) for
the CDC 6600 computer.
(f) SETS
The Set Equation Transfonnation System [46], developed by Sandia Labora
tories, is a general program for the manipulation of Boolean equations which can be
applied to fault trees and used to find minimal cut or path sets. The advantages of
the SETS code are its generality and flexibility, one example of which is the ability
to dynamically manipulate the tree via SETS user programs. This capability gives the
user a great deal of control over the processing, a feature which can be especially
helpful when analyzing large trees. For example, a SETS user program may be
written to decompose the original tree and process it in stages without requiring any
changes to the original fault tree input description. A recently added feature enables
SETS to automatically identify the independent subtrees and select stages for
efficient processing of large trees. A packed, bitlevel storage scheme and use of
auxiliary storage are other SETS features aimed at efficient processing of large trees.
Unlike PREP, ELRAFT, MOCUS, ALLCUTS, and MICSUP, SETS can handle
complemented events, exclusive or gates, and special gates represented by any valid
Boolean expression defined by the user. It can be used to find the "prime
implicants" (a more general term than minimal cut sets which includes the possibility
of having both an event and its complement in a Boolean equation) of any
intermediate gate. Other useful features are free field input, the ability to easily input
replicated subtrees, the option of saving the cut sets or factored equation for any
event on a file for future use. The factored equation is a compact form of the
minimal cut set equation from which cut sets of any order can be generated for
enumeration purposes.
SETS allows tree reduction based on both cut set order and cut set probability. It
will also rank and print the minimal cut sets in order of descending probability
(assuming basic event probabilities have been input). SETS is written in FORTRAN
for the CDC 6600.
(g) FTAP
The Fault Tree Analysis Program [43] is a cut set generation code developed at
the University of California Berkeley Operation Research Center. FTAP is unique in
offering the user a choice of three processing methods: topdown, bottomup and the
"Nelson" method. The topdown and bottomup approaches are basically akin to the
methods used in MOCUS and MICSUP, respectively. The Nelson method is a prime
implicant algorithm which is applied to trees containing complement events and uses
a combination of topdown and bottomup techniques. FTAP is the only fault tree
code, other than SETS, which can compute the prime implicants.
FTAP uses two basic techniques to reduce the number of nonminimal cut sets
produced and thereby increase the code's efficiency. The first technique, used in the
bottomup and Nelson methods, is modular decomposition. This approach is quite
similar to that used in PL·MOD (see section 4) and somewhat similar to the SETS
algorithm for identifying and processing independent subtrees (see section 2(f)). The
XII6 FAULT TREE HANDBOOK
second technique, used in the topdown and Nelson methods, is called the "dual
algorithm" in the FTAP documentation [43] . The algorithm involves transformation
of a product of sums into a sum of products whose dual is then taken using a special
method. The author claims that the nonminimal sets appearing during construction
of the dual "will always be less than the number of such sets in [the original product
of sums] , usually many times less."
Other features of FTAP are the ability to reduce the tree based on cut set order or
probability, the ability to find either path sets or cut sets, direct input of symmetric
(koutofn) gates, and considerable flexibility and user control over processing and
output.
FTAP is written in FORTRAN and assembly language, and is available in versions
for the CDC 6600/7600 and IBM 360370 model computers.
3. Computer Codes for Quantitative Analysis of Fault Trees
This section deals with codes which perform quantitative evaluations of fault
trees. The input to these codes consists of two parts:
(1) the equation for the top event unavailability or unreliability (usually from the
minimal cut sets but can be obtained from other nonfault tree models such as block
diagrams or schematics)
(2) failure rate, test, and repair data for the components appearing in the
equation.
Given the above inputs, several types of quantitative results may be computed
including:
Numerical probabilities: probabilities of system and component failures;
Quantitative importances: quantitative rankings of contributions to system failure;
Sensitivity evaluations: effects of changes in models and data, error bounding;
The KlTT and FRANTIC codes, described in sections 2(a) and 2(c), respectively,
compute timeaveraged and timedependent point estimates for the system failure
probability. KITT also computes quantitative importances. SAMPLE and MOCARS.
described in section 2(b). compute a distribution and error bounds for system failure
probability based on uncertainty, error, or variation in the component failure
characteristics.
(a) The KITT codes
KITTI and KlTT2 [40] [42] perform time dependent fault tree quantification
based on the minimal cut or path set description of the tree. The codes can thus be
used in conjunction with any qualitative analysis code which generates the minimal
cut sets in terms of components (basic events) such as PREP. MOCUS, SETS, etc.
PREP and MOCUS generate the cut sets in a form which is directly usable as input to
the KlTT codes. Other required inputs are the component failure rates and repair
characteristics. Components are assumed to have exponential failure distributions.
Each component may have a constant repair time, an exponential repair distribution,
or may be nonrepairable. In addition, KlIT2 allows each component to have its own
unique time phases whereby its failure and repair data may vary from phase to phase.
FAULT TREE EVALUATION COMPUTER CODES
XII'
The KITT codes compute the following five probability characteristics for the
system failure (top event), each component, and each minimal cut or path set at
arbitrary time points specified by the user:
The probability of the failure existing at time t (the unavailability).
The probability of the failure not occurring to time t (the reliability).
The expected number of failures occurring to time t.
The failure rate per hour.
The occurrence rate per hour.
In addition to the above, the KITT codes rank the events in single and double
component cut sets by qualitative and quantitative importance. See Chapter XI for a
discussion of importance measurements.
(b) SAMPLE, MOCARS, et al.
Several codes have been written to compute the probability distribution of a
calculated system result (such as unavailability) when probability distributions are
assigned to the component failure rates to account for data variability. These codes
use Monte Carlo simulation in which the component failure rates are sampled from
input probability distributions. The sample values for the failure rates are then
combined by means of a system function given in a usersupplied FORTRAN
subroutine to determine the sample system results. After a number of these "trials,"
the different system values can be tabulated and the resulting empirical distribution
can be characterized. By this method, the effect on the system unavailability of
uncertainties or variations in the component failure rates can be assessed.
Typically, the usersupplied system unavailability function might be the minimal
cut set equation obtained from one of the qualitative analysis codes.
SAMPLE [38] was the Monte Carlo code used in WASH1400. SAMPLE allows
normal, lognormal, or logUniform distributions to be specified for the component
failure rates. The output distribution is presented in terms of estimated empirical
probability percentiles from which the estimated median and upper and lower
bounds can easily be read. The output also includes the estimated mean and standard
deviation of the distribution and a tabular histogram of the system density function.
Sample is written in FORTRAN IV.
MOCARS [25] is similar to principle and operation to SAMPLE, but allows a
larger variety of sampling distributions including exponential, normal, gamma, beta,
lognormal, binomial, Poisson, Weibull, and empirical distributions. It allows the
system unavailability function to be specified either as FORTRAN statements or in
terms of cut sets. Other options include microfJlm plotting using the Integrated
Graphics System (IGS) and the ability to perform a KolmogorovSmirnov goodness
offit test on the output distribution to see if it resembles a normal, lognormal, or
exponential function. MOCARS was written in FORTRAN to run on the INEL CDC
761973 operating system.
Other expanded versions of SAMPLE [4] have been written, but they are all quite
similar, and so we will not discuss them here.
XII8
(c) FRANTIC
FAULT TREE HANDBOOK
The FRANTIC (Formal Reliability Analysis including Normal Testing, Inspection
and Checking) code [41] computes the average and time dependent unavailability of
any general system model such as a fault tree or event tree, incorporating in detail
the effects of different periodic testing schemes. The program can be used to assess
the effects on system unavailability of test downtimes, repair times, test efficiency,
test bypass capabilities, testeaused failures, and different test staggerings. In addition
to periodically tested components, nonrepairable and monitored components as well
as human error and common cause contributions can also be modeled.
As in the SAMPLE code, the system model function is input in the form of
FORTRAN subroutine. For each component, the failure rate, and test and repair
characteristics must be provided. Exponential failure distributions are assumed.
Other input includes the time period for the calculations, and print and plot options.
Calcomp plots of the time dependent system unavailability function may be
produced.
A Monte Carlo version of the FRANTIC code [16] is available in which sampling
distributions may be input for the component failure rates. FRANTIC is written in
FORTRAN IV for the IBM 360370 series computers.
4. Direct Evaluation Codes
As their name implies, the direct evaluation codes quantify the system model in a
single step. Thus they do not produce cut sets as an integral part of the analysis and
they require probabilistic input for each component from the outset of the
processing. The output from these codes is generally in the form of point estimates
for the system unavailability or failure probability.
The GO and WAM·BAM codes, described in sections 3(c) and 3(0 respectively,
offer the advantage of allowing complement events and some modeling of
dependencies. GO also allows switches and time delays and models all system states
instead of a single fault event. Both GO and WAMBAM reduce storage requirements
by eliminating low probability paths at an intermediate stage of the processing and at
the same time keep track of the total of the discarded path probabilities.
Disadvantages are mainly connected with the inability to produce cut sets (many of
the direct evaluation codes have added the option to compute cut sets, but not as an
integral part of the analysis), and the necessity of inputting probabilities for all of the
components even though many may be insignificant contributors to system failure.
Also, a change in probabilities often requires a complete rerun.
(a) ARMM
The ARMM (Automatic Reliability Mathematical Model) code [26], developed by
North American Aviation for the U.S. Air Force and modified by Holmes and Narver
for application to nuclear power plant systems, was the first direct evaluation code.
ARMM models a reliability block diagram using a success path approach. The
component failure probabilities are determined using failure density functions
supplied for each component. The program is capable of handling Weibull
FAULT TREE EVALUATION COMPUTER CODES
XII9
(timedependent failure rate) density functions, dependent components, and mutu
ally exclusive failure modes. It is written in FORTRAN IV for the IBM 360
computer.
(b) SAFTE
The SAFTE (Systems Analysis by Fault Tree Evaluation) codes [13], SAFTE1,
SAFTE2, and SAFTE3, are Monte Carlo simulation programs using techniques
similar to the FATE option of PREP to generate random times to failure of
components in a fault tree. However, instead of computing the cut sets, the SAFTE
codes directly generate a distribution of times to failure for the system. The SAFTEI
code works as described above; the SAFTE2 code also includes the ability to sample
from normal repair distributions to get a time to repair for each component. In this
version, a failed component may be repaired (made as good as new) before system
failure, and then resume operation with a new random time to failure and time to
repair. In both SAFTEI and SAFTE2, the random times to failure are generated
from exponential failure distributions.
SAFTE3 computes the probability of system failure based on steady state repair,
using either direct or importance sampling techniques. The SAFTE codes are written
in FORTRAN IV for the IBM 360 computer.
(c) GO
The GO methodology [14], developed in the mid1960's by Kaman Sciences
Corporation, differs from the fault tree approach in that the normal operating
sequence is modeled and all possible system states are considered. The input model
used is called a GO chart and it resembles a schematic or flowchart made up of a set
of standardized operators which describe the logical operation and interconnection
of the system components. Some of the 16 GO operators are similar to fault tree
gates, but in addition to logic functions, time delays and switches can be modeled as
well as complementary event logic and mutually exclusive states. GO also provides a
simplified method for modeling repeated portions of the tree through the use of
"supertypes." In addition to specifying the types of operators and their interconnec
tions, the user also specifies the probabilities associated with the possible operational
modes of each component. This process is analogous to supplying failure probabil
ities for components in a fault tree; however, in the GO approach, probabilities are
given for states other than simple success or failure (e.g., the probability of
premature operation, or the probabilities of response over a series of time points are
supplied for some operators). The outputs from GO are the probabilities of
occurrence of individual output events or the joint probability density of degrees of
performance for several output events. The output events can include system success,
and various degrees of failure such as spurious or premature operation, delayed or
partial operation, and complete failure to operate. The effects of component repair
cannot be modeled.
The numerical evaluations are performed in a onestep process as the signals
(event probabilities) are traced through the model using a Markov chain (event tree)
approach to propagate the values. This means that a change in component
probabilities, such as for sensitivity studies, requires a complete reevaluation even
XII1O
FAULT TREE HANDBOOK
though the system structure remains unchanged. Because the probability tree can
easily become very large, GO has options for pruning the tree of branches with
probabilities lower than a selected value, and of deleting signals which are no longer
needed, while keeping track of the total probability of the discarded paths. GO also
includes a "Fault Finder" option to compute the 4th order cut sets for a selected
output event.
Because of the diversity and detail of the GO operators, and the necessity of
inchiding all system components, the modeling process for GO is somewhat more
complex than that for fault trees. However, it may be argued that because GO charts
are similar to the familar system schematics, the modeling process is easily learned by
designers and engineers. If the analyst wishes to use the fault tree model instead of
the GO chart, it is still possible to evaluate the tree using GO. In this case only the
subset of the GO operators analogous to the fault tree gates would be used and the
output would be a point estimate of the top event failure probability.
GO is written in FORTRAN for the CDC 7600.
(d) NOTED
NOTED [45], developed by the United Kingdom Atomic Energy Authority in
1971, is similar in concept to GO. However, instead of analyZing the system at a
series of discrete time points, NOTED produces a graph of the cumulative failure
probability as a continuous function of time at any of several points in the system.
Similarly, the behavior of the input components is described by continuous failure
distributions including exponential lognormal, normal, Weibull and forms including
repair times.
(e) PATREC
PATREC [19] was the first computer code to apply list processing techniques to
fault tree evaluation. Rather than generating cut sets, PATREC evaluates the tree
directly, using a pattern recognition algorithm implemented in the PL/l programm
ing language. A set of subtree patterns along with their corresponding probability
equations are stored in the computer code's library. The fault tree is then searched
for occurrences of the library patterns. Each recognized pattern is replaced by a
supercomponent with probability of occurrence equal to that stored in the library.
The whole tree is thus eventually reduced to a single leaf which corresponds to the
probability of failure of the total system.
PATREC can evaluate trees containing both an event and its complement; direct
input of koutofn gates is also supported. Cut sets can be generated, if desired, using
an algorithm similar to that of MOCUS, but they are not used in the evaluation of
the fault tree. PATREC's greatest limitation is in its handling of replicated events.
The pattern recognition scheme yields the correct probabilities only when no events
are replicated. Therefore, PATREC replaces a single fault tree having r different
replicated events by 2
r
fault trees with no replicated events. Even with approxima
tions which allow some of the 2
r
fault trees to be discarded, PATREC cannot
efficiently evaluate fault trees with more than about 20 replicated events.
FAULT TREE EVALUATION COMPUTER CODES XIIII
PATREC is capable of performing time dependent system unavailability analysis
where each basic event may have a failure distribution which is exponential, Weibull,
normal, or log normal. For the exponential failure case, the component can
optionally be assumed repairable with an exponential repair distribution. In addition,
the user can include a constant "failureondemand" (e.g., failure to start) probability
which is independent of time.
PATRECMC [20] is a Monte Carlo version of PATREC which can be used to
assess the effects of uncertainty in the component reliability parameters. The
operation of the code is similar to that of SAMPLE (see section 2(b) of this chapter)
except for the representation of the system function. In PATRECMC, a calculation
is first made to identify the patterns in the tree using the list processing methods
described earlier. The patterns are then stored in memory so that they can be
repeatedly evaluated during the Monte Carlo trials. Note that storage of the patterns
for subsequent reevaluation means that PATRECMC is not really a direct evaluation
code because the stored patterns are actually the result of an intermediate qualitative
analysis independent of the component probabilities. This distinction will come up
again in our discussion ofPLMOD (see section 4 of this chapter).
(f) WAMBAM
The WAMBAM codes [9] [22] [34] were developed at Science Applications
Inc. for EPRI beginning in 1975. The WAMBAM package actually consists of four
codes: WAM, WAMTAP, BAM, and WAMCUT. WAM and WAMTAP are input
preprocessors for the evaluation code BAM (Boolean Arithmetic Model). The WAM
preprocessor, like PREP, is designed to ease the input preparation process. It
generates the numeric input for BAM from the input description of the fault tree and
the event probabilities. At the user's option, the input to BAM can be saved and
subsequently modified by WAMTAP. WAMTAP allows the probability of single
components or groups of components to be changed in order to run sensitivity
studies or to include common cause contributions. WAMCUT can be used to
compute minimal cut sets, and the mean and variance of the probability of any gate.
It can also generate the input to a Monte Carlo code, SPASM, which computes a
distribution on gate probability.
The evaluation code, BAM, uses a combination of concepts from the GO
methodology and fault tree analysis. The GO computational scheme is used but the
operations are modeled as gates on a fault tree. Eight possible logical combinations of
2 events and their complements are included as allowable gates. In BAM, the
probability of the top event is computed by forming a truth table, each line of which
represents a product term (P term) event disjoint from all the other P terms. In terms
of the GO methology, the P terms are equivalent to paths in the GO event tree. The
output from BAM is a point probability of the top event. As mentioned earlier,
WAMTAP can be used to modity the input for BAM sensitivity studies. WAMBAM is
written in FORTRAN for the CDC 6600.
5. PLMOD: A Dual Purpose Code
PLMOD [28] is described separately in this section because, though it can
perform both a qualitative and quantitative fault tree analysis, it depends neither on
XIIl 2
FAULT TREE HANDBOOK
the standard cut set generation nor the direct evaluation techniques. Like the
PATRECMC code, it performs a qualitative analysis which does not rely on standard
cut set generation techniques, but which can be used repeatedly for quantification.
The PLMOD computer code works by ','modularizing" the fault tree directly
from a description of its component and gate diagram. Defmed in terms of a
reliability network diagram, a module is a group of components which behaves as a
supercomponent (i.e., it is completely sufficient to know the state of the
supercomponent, and not the state of the components which comprise it, to
determine the state of the system). In terms of a fault tree diagram, an intermediate
gate is a module to the top event tree if none of the basic events contained in the
gate domain (i.e., all branches below the gate) appear elsewhere in the fault tree.
Briefly, modularization implies that all the independent subtrees (i.e., modules or
subsystems) are identified, and the minimal cut sets are defined recursively in terms
of these modules. Or, to put this is a slightly different way, a modularized tree is one
which is equivalent to the original tree but in some sense "maximizes" the
decomposition of the tree into independent subtrees.
The concept and advantages of modularization have been known for some time
[2] and an algorithm for finding the finest modular decomposition of a fault tree
given its cut sets was described by Chatterjee [6] in 1975. The modularization
process used by PLMOD is, as mentioned earlier, unique in that it is applied not to
the cut sets, but directly to a description of the fault tree diagram, using the list
processing features of the PL/l programming language. The modularization process
used by PLMOD is somewhat complex, and will therefore not be discussed here. (A
complete description appears in reference [28]).
Features of PL·MOD are the ability to handle complemented events, direct input
of symmetric (koutofn) gates, free field input, and dynamic storage allocation. The
output from PLMOD includes the standard and modular minimal cut sets for the top
event and specified intermediate gates of the tree.
Some disadvantages of PLMOD are its machine dependence (PL/l is not
available in many computer systems) and the lack of familiarity with PL/l among
scientific users.
The quantitative capabilities of PLMOD include the computation of the
occurrence probability and importance (see Chapter XI) for the top event and all
other modules. PLMOD also has a Monte Carlo option for computing uncertainties
and timedependent unavailability evaluation option which can handle non
repairable, repairable (revealed fault), and periodically tested components.
6. Common Cause Failure Analysis Codes
Common cause failure analysis is becoming increasingly important in system
reliability and safety studies because it has been recognized that common cause
failures can often dominate the random hardware failures. Common cause failure
analysis attempts to identify the modes of system failure (Le., minimal cut sets)
which have the potential of being triggered by a single, more basic common cause;
the minimal cut sets which need to be identified are those with two or more events,
all of which are susceptible to a single common cause failure mechanism.
FAULT TREE EVALUATION COMPUTER CODES
(a) COMCAN
XII13
COMCAN f31, developed at INEL, was the fIrst program to perform common
cause failure analysis. The input to the program consists of two parts: (1) the
minimal cut sets of the fault tree to be analyzed, and (2) the common cause
susceptibility data for each basic event. The output from the program is a list of the
minimal cut sets which are common cause candidates.
,A minimal cut set may be identifIed as a common cause candidate by either of
two criteria. The frrst criterion requires that all the events in the cut set be
potentially affected by the same cause or condition. The second criterion requires
that all the events in the cut set share susceptibility to a common cause or condition
and in addition, all components implied by the basic events in the minimal cut set
must share a common physical location with respect to the common cause
susceptibility. Some typical common causes include: impact, vibration, pressure, grit,
stress, and temperature. The minimal cut sets and common cause susct;ptibility data
constitute the required inputs. Optional inputs are the component manufacturers,
location domain definitions for generic causes, the location of each component
implied by the basic events, and ran)(s of component susceptibility to each common
cause. The more input provided, the more refined will be the search for the common
cause candidates. The output options include the ability to print only the common
cause candidates with ranks ~ N, and to include similar type components as one of
criteria for common cause candidates.
COMCAN is written in FORTRAN N for the IBM 360 computer.
(b) BACKFIRE
The BACKFIRE code [5], published in May 1977, is an offshoot of COMCAN.
The required and optional inputs are almost the same except that BACKFIRE
permits more than one location to be specifIed for a component. This is useful for
piping and wiring which may pass through domain barriers. Like COMCAN,
BACKFIRE is written in FORTRAN IV for the IBM 360 computer.
(c) SETS
The SETS code, described in section 1(t), can also be used for common cause
analysis [47]. The analysis is conducted ~ . a manner similar to that of COMCAN, by
inputting generic cause susceptibilities for each basic event. Avariable transformation
incorporates the common cause susceptibilities into the Boolean equation for the top
or any intermediate gate of the fault tree, and a few simple manipulations allow the
user to display the cut sets which are the common cause candidates.
BIBLIOGRAPHY
1. Army Ordnance Corps, Tables of the Cumulative Binomial Probabilities, ORDP
2011, 1952.
2. Z. Birnbaum, "On the Importance of Different Components in a Multicompo
nent System in Multivariate AnalysisII," P.R. Krishnaiah, Editor, Academic
Press, New York, 1969.
3. G.R. Burdick, N.H. Marshall and J.R. Wilson, "COMCANA Computer Program
for Common Cause Analysis," Aerojet Nuclear Company, ANCR1314, May
1976.
4. J.J. Cairns and K.N. Fleming, "STADIC: A Computer Code for Combining
Probability Distributions," General Atomic Co., GAAI4055, March 1977.
5. C.L. Cate and J.B. Fussell, "BACKFIREA Computer Code for Common Cause
Failure Analysis," University of Tennessee, Knoxville, Tennessee, May 1977.
6. P. Chatterjee, "Modularization of Fault Trees: A Method to Reduce the Cost of
Analysis," Reliability and Fault Tree Analysis, SIAM, 1975, pp. 101126.
7. W.C. Cochran, Sampling Techniques, John Wiley and Sons, Inc., New York,
1977.
8. W.J. Conover, Practical Nonparametric Statistics, John Wiley and Sons, Inc.,
New York, 1971.
9. R.C. Erdmann, F.L. Leverenz, and H. Kirch, "WAMCUT, A Computer Code for
Fault Tree Evaluation," Science Applications, Inc., EPRI NP·803, June 1978.
10. W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I,
John Wiley and Sons, Inc., New York, 1957, 1968.
11. J.B. Fussell and W.E. Vesely, "A New Methodology for Obtaining Cut Sets for
Fault Trees," Trans, ANS, Vol. 15, p. 262,1972.
12. J.B. Fussell and G.R. Burdick, ed., Proceedings of the International Conference
on Nuclear Systems Reliability and Risk Assessment, Gatlinburg, Tennessee,
SIAM, 1975.
13. B.J. Garrick, "Principles of Unified Systems Safety Analysis," NucL Engr. and
Design, Vol. 13, No.2, pp. 245321, August 1970.
14. W.Y. Gateley, D.W. Stoddard and R.L. Williams, "GO, A Computer Program for
the Reliability Analysis of Complex Systems," Kaman Sciences Corporation,
Colorado Springs, Colorado, KN67704(R), April 1968.
BIBl
BIB2
FAULT TREE HANDBOOK
15. B.V. Gnedenko, Y.K. Belyayev, and A.D. Solovyev, Mathematical Methods of
Reliability Theory, Translation edited by R.E. Barlow, Academic Press, New
York 1969.
16. F.F. Goldberg and W.E. Vesely, "Time Dependent Unavailability Analysis of
Nuclear Safety Systems," presented at the National Conference on Reliability,
Nottingham, England, September 1977.
17. A.E. Green and A.J. Bourne, Reliability Technology, WileyInterscience,
London, 1972.
18. IEEE, Guide for General Principles of Reliability Analysis of Nuclear Power
Generating Protection System, IEEE Std. 3521975, 1975.
19.' B.V. Koen and A. Camino, "Reliability Calculations with a List Processing
Technique," IEEE Transactions on Reliability, Vol. B23, No. I, April 1974.
20. B.V. Koen, A. Camino, et al., "The State of the Art of PATREC: A Computer
Code for the Evaluation of Reliability and Availability of Complex Systems,"
presented at the National Reliability Conference, Nottingham, England, Sep
tember 1977.
21. H.E. Lambert, "Fault Trees for Decision Making in Systems Analysis," Lawrence
Livermore Laboratories, DCRL51829, 1975.
22. F.L. Leverenz and H. Kirch, "User's Guide for the WAMBAM Computer Code,"
Science Applications, Inc., EPRI 21725, January 1976.
23. D.K. lioyd, M. Lipow, Reliability: Management, Methods, and Mathematics,
PrenticeHall, Inc., Englewood Cliffs, New Jersey, 1962.
24. N.R. Mann, R.E. Schafer and N.D. Singpurwalla, Mathematical Methods for
Statistical Analysis of Reliability and Life Data, John Wiley and Sons, Inc., New
York,1974.
25. S.D. Matthews, "MOCARS: A Monte Carlo Simulation Code for Determining
the Distribution and Simulation Limits," TREE1138, July 1977.
26. C.W. McKnight, et al., "Automatic Reliability Mathematical Model," North
American Aviation, Inc., Downey, California, NA 66838, 1966.
27. National Bureau of Standards, Tables of the Binomial Probability Distribution,
App. Math. Series, Vol. 6, 1949.
28. J. Olmos and L. Wolf, "A Modular Approach to Fault Tree and Reliability
Analysis," Department of Nuclear Engineering, Massachusetts Institute of
Technology, MITNE209, August 1977.
BIBLIOGRAPHY BIB3
29. P.K. Pande, et a1., "Computerized Fault Tree Analysis: TREEL and MICSUP,"
Operational Research Center, University of California, Berkeley, ORC 753,
Apri11975.
30. c.R. Rao, Linear Statistical Inference and Its Applications, John Wiley and Sons,
Inc., New York, 1977.
31. N.H. Roberts, Mathematical Methods in Reliability Engineering, McGrawHill,
New York, 1964.
32. H.G. Romig, Binomial Tables, John Wiley and Sons, Inc., New York, 1953.
33. A. Rosenthal, "A Computer Scientist Looks at Reliability Computations,"
SIAM, 1975, pp. 133151.
34. E.T. Rumble, F.L. Leverenz, Jr. and R.C. Erdmann, "Generalized Fault Tree
Analysis for Reactor ~ a f e t y ," Electric Power Research Institute, Palo Alto,
California, EPRI 21722, Interim Report, June 1975.
35. S.N. Semanderes, "ELRAFT, A Computer Program for the Efficient Logic
Reduction Analysis of Fault Trees," IEEE Trans. on Nucl. Sci., Vol. NS18, No.
1, pp. 481487, February 1971.
36. M.L. Shooman, Probabilistic Reliability: An Engineering Approach, McGraw
Hill, New York, 1968.
37. C.P. Tsokos and LN. Shimi, ed., The Theory and Applications of Reliability with
Emphasis on Bayesian and Nonparametic Methods, Academic Press, New York,
1977.
38. U.S. Nuclear Regulatory Commission, "Reactor Safety StudyAn Assessment of
Accident Risks in U.S. Commercial Nuclear Power Plants," WASH1400
(NUREG75/014), October 1975.
39. W.J. Van Slyke and D.E. Griffing, "ALLCUTS, A Fast, Comprehensive Fault
Tree Analysis Code," Atlantic Richfield Hanford Company, Richland, Wash·
ington, ARH·ST·112, July 1975.
40. W.E. Vesely, "Analysis of Fault Trees by Kinetic Tree Theory," Idaho Nuclear
Corporation, Idaho Falls, Idaho, IN1330, October 1969.
41. W.E. Vesely and F.F. Goldberg, "FRANTIC  A Computer Code for Time·
Dependent Unavailability Analysis," U.S. Nuclear Regulatory Commission,
NUREG0193, October 1977.
42. W.E. Vesely and R.E. Narum, "PREP and KITT Computer Codes for the
Automatic Evaluation of a Fault Tree," Idaho Nuclear Corporation, Idaho Falls,
Idaho, IN1349, 1970.
BIB4 FAULT TREE HANDBOOK
43. RR Willie, "Computer Aided Fault Tree Analysis," Operations Research
Center, University of California, Berkeley, August 1978.
44. RL. Wine, Statistics for Scientists and Engineers, PrenticeHall, Englewood
Cliffs, New Jersey, 1964.
45. E.R Woodcock, "The Calculation of Reliability of Systems: The Program
NOTED," UKAEA Authority, Health and Safety Branch, Risley, Warrington,
Lancashire, England, AHSB(S) R 153, 1971.
46. R.B. Worrell, "A SETS Users' Manual for the Fault Tree Analyst," Sandia
Laboratories, Alburquerque, New Mexico, NUREG/CR0465, SAND 772051,
1978.
47. R.B. Worrell and D.W. Stack, "Common Cause Analysis Using SETS," Sandia
Laboratories, Alburquerque, New Mexico, SAND 771832: 191..:Z..,
'((U.S. GOVERNMENT PRINTING OFFICE: 1992318277/60129
NRC FORM 336
1777)
u.s. NUCLEAR REGULATORY
BI8LIOGRAPHIC DATA SHEET
1. REPORT NUMBER (A..i_dbY DOCI
NUREG0492
4. TITLE AND SUBTITLE (Add V,,*,_ No.. if _"..}
Fault Tree Handbook
David F. Haasl, Norman H. Roberts,
William E. Vesely, Francine F. Goldberg
9. PERFORMING ORGANIZATION NAME AND MAILING ADDRESS (Includt! Zip Codt!l
U.S. Nuclear Regulatory Commission
Office of Nuclear Regulatory Research
Division of Systems &Reliability Research
WashinQton, DC 20555
12. SPONSORING ORGANIZATION NAME AND MAILING ADDRESS ((ncludt! Zip Codt!J
2. (Le""" blank}
3. RECIPIENT'S ACCESSION NO.
"
5. DATE REPORT COMPLETED
MONTH I YEAR
March 1980
DATE REPORT ISSUED
MONT" I YEAR
January 1981
6. (Lewe blankl
10. PROJECT/TASK/WORK UNIT NO.
11. CONTRACT NO.
13_ TYPE OF REPORT
Handbook
15. SUPPLEMENTARY NOTES
16. ABSTRACT (200 woniiIN _nl
IPE RIOD COVE RED (Inclusive daresl
14. (Leave blankl
This handbook describes a methodology for reliability analysis of complex systems
such as those which comprise the engineered safety features of nuclear power
generating stations. After an initial overview of the available system analysis
approaches, the handbook focuses on a description of the deductive method known
as fault tree analysis. The following aspects of fault tree analysis are covered:
basic concepts for fault tree analysis; basic elements of a fault tree; fault
tree construction; probability, statistics, and Boolean algebra for the fault
, tree analyst; qualitative and quantitative fault tree evaluation techniques; and
computer codes for fault tree evaluation. Al$o discussed are several example
problems illustrating the basic concepts of fault tree construction and
evaluation.
17. KEY WORDS AND DOCUMENT ANALYSIS
Fault Trees, Systems Reliability Analysis,
Systems Analysis, Minimal Cut Sets
17b. IDENTIFIERS/OPENENDED TERMS
lB. AVAILABILITY STATEMENT
Unrestricted
NRC FORM 335 (7771
17a. DESCRIPTORS
19. SECURITY CLASS (7h,s reportl 21. NO. OF PAGES
•
20. SECURITY CLASS (This pagel 22. PRICE
S
NUREG0492
Fault Tree Handbook
Date Published: January 1981 W. E. Vesely, U.S. Nuclear Regulatory Commission F. F. Goldberg, U.S. Nuclear Regulatory Commission N. H. Roberts, University of Washington D. F. Haasl, Institute of System Sciences, Inc.
Systems and Reliability Research Office of I\luclear Regulatory Research U.S. Nuclear Regulatory Commission Washington, D.C. 20555
For sale by the U.S. Government Printing Office Superintendent of Documents, Mail Stop: SSOP, Washington, DC 204029328
Available from GPO Sales Program Division.of Technical Information and Document Control U.S. Nuclear Regulatory Commission Washington, DC 20555 Printed copy price: and National Technical Information Service Springfield, VA 22161 $5.50
TABLE OF CONTENTS
Introduction vii 11 11 13 17 19
I.
Basic Concepts of System Analysis
1. 2. 3. 4.
The Purpose of System Analysis Defmition of a System Analytical Approaches Perils and Pitfalls
II.
Overview ofInductive Methods. . . . . . . . . . . . . . . . . . . . . . . . . . . .. III 1. 2. 3. 4. 5. 6. 7. 8. 9. Introduction..................................... The "Parts Count" Approach ..... . . . . . . . . . . . . . . . . . . . .. Failure Mode and Effect Analysis (FMEA) Failure Mode Effect and Criticality Analysis (FMECA) Preliminary Hazard Analysis (PHA) .,. . . . . . . . . . . . . . . . . . .. Fault ffazard Analysis (FHA) " Double Failure Matrix (DFM) . . . . . . . . . . . . . . . . . . . . . . . . .. Success Path Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. Conclusions ;...................... " III
II~l
112 114 114 115 115 1110 1112 IDl IIII 1111 llI3 ID4 Nl
III.
Fault Tree AnalysisBasic Concepts
l.
2. 3. 4. IV.
Orientation...................................... Failure vs. Success Models The Undesired Event Concept . . . . . . . . . . . . . . . . . . . . . . . . .. Summary.......................................
The Basic Elements of a Fault Tree
1. 2.
The Fault Tree Model " . . . . . . . . . . . . . . . . . . . . . . . . . . . .. Nl SymbologyThe Building Blocks of the Fault Tree Nl
V.
Fault Tree Construction Fundamentals. . . . . . . . . . . . . . . . . . . . . . .. VI
1. 2. 3. 4. 5. 6. 7.
Faults vs. Failures " VI Fault Occurrence vs. Fault Existence " VI Passive vs. Active Components V2 Component Fault Categories: Primary, Secondary, and Command.. V3 Failure Mechanism, Failure Mode, and Failure Effect V3 The "Immediate Cause" Concept V6 Basic Rules for Fault Tree Construction V8
iii
Introduction Random Experiments and Outcomes of Random Experiments The Relative Frequency Defmition of Probability Algebraic Operations with Probabilities Combinatorial Analysis Set Theory: Application to the Mathematical Treatment of Events Symbolism ' Additional Set Concepts Bayes'Theorem VII VII VII VI3 VI3 VI8 VIII VII6 VII7 VII9 VIII VIII VII4 VIII2 VIIIS VIIII VIIII VIIII2 lXI IXI IX7 XI XI XI X7 X9 XIO XIS XI9 X22 X2S X26 X27 X27 X28 VII. 2. 2. 6. Probabilistic and Statistical Analyses 1. 3. 7. 11.iv VI. 4. 12. IX. 2. System Defmition and Fault Tree Construction Fault Tree Evaluation (Minimal Cut Sets) The Three Motor Example 1. 2. Poisson Application of the Poisson Distribution to System FailuresThe SoCalled Exponential Distribution The Failure Rate Function An Application Involving the TimetoFailure Distribution Statistical Estimation Random Samples Sampling Distributions ' Point EstimatesGeneral . 7. 10. 9. 3. 3. 8. 6. 13. Introduction The Binomial Distribution The Cumulative Distribution Function The Probability Density Function Distribution Parameters and Moments Limiting Forms of the Binomial: Normal. System Defmition and Fault Tree Construction Fault Tree Evaluation (Minimal Cut Sets) '" X. The Pressure Tank Example 1. TABLE OF CONTENTS Probability TheoryThe Mathematical Description of Events L 2. Rules of Boolean Algebra Application to Fault Tree Analysis Shannon's Method for Expressing Boolean Functions in Standardized Forms Determining the Mitltmal Cut Sets or Minimal Path Sets of a Fault Tree VIII. Boolean Algebra and Application to Fault Tree Analysis 1. S. S. 8. 4. 4. 9.
.. ... Point EstimatesMaximum likelihood Interval Estimators . .. ... . . BIBl . . . .. 6. . .. . .. . .. . . Introduction.. . .. Overview of Available Codes Computer Codes for Qualitative Analyses of Fault Trees Computer Codes for Quantitative Analyses of Fault Trees Direct Evaluation Codes PLMOD: A Dual Purpose Code Common Cause Failure Analysis Codes Bibliography ... .. 15. XII Qualitative Evaluations XI2 XI? Quantitative Evaluations XIIl " XIIl XII2 XII6 XII8 XIIII XII12 XII. . . . .. . . 16. .. . XI... 3. 5. 3. . .. 2. . ... .. ... . 2.. Bayesian Analyses X30 X35 X39 XII " " Fault Tree Evaluation Techniques 1. .. . Fault Tree Evaluation Computer Codes 1. . .. . . 4. . ...... .. . ... .TABLE OF CONTENTS 14.
{)400) in which it was stated that the fault/event tree methodology both can and should be used more widely by the NRC. Roberts. The publication of this handbook is in accordance with the recommendations of the Risk Assessment Review Group Report (NUREG/CR.INTRODUCTION Since 1975. The course has been taught jointly by David F. and 'members of the Probabilistic Analysis Staff. University of Washington. vii . NRC. It is hoped that this document will help to codify and systematize the fault tree approach to systems analysis. but also to make available to others a set of otherwise undocumented material on fault tree construction and evaluation. a short course entitled "System Safety and Reliability Analysis" has been presented to over 200 NRC personnel and contractors. as part of a risk assessment training program sponsored by the Probabilistic Analysis Staff. This handbook has been developed not only to serve as text for the System Safety and Reliability Course. Institute of System Sciences. Professor Norman H. Haasl.
11 . Decisionmaking is a very complex process.* The information so gained can be used in making decisions. but in the vast majority of cases. so that it is almost never possible to eliminate all elements of uncertainty. DIRECT INFORMATION I . For example. and therefore. we may believe in Murphy's Law: "If anything can go wrong. it will go wrong. Presumably. we simply cannot know all the consequences of electing to take a particular course of action. we will undertake a brief examination of the decisionmaking process. Schematic Representation of the Decisionmaking Process *There are other methods for performing this function. Furthermore. We all have deadlines to meet. which is a systematic method for acquiring information about a system. The Purpose of System Analysis The principal concern of this book is the fault tree technique. and we will highlight only certain aspects which help to put a system analysis in proper context. It is possible to postulate an imaginary world in which no decisions are made until all the relevant information is assembled. w~ may be convinced that "all is for the best in this best of all possible worlds. Figure 11 provides a schematic representation of these considerations." Thus..CHAPTER I .ACQUISITION fINDIRECT INFORMATION ACQUISITION DECISION I TIME AT WHICH DECISION MUST BE MADE I I ~ TIME AXIS • J Figure 11. by experimentation." Or. it will not be possible to acquire all the relevant information. before we even define system analysis. Our knowledge may be increased by appropriate tests and proper analyses of the resultsthat is. and not by the degree of completeness of our knowledge. any decision that we do make is based on our present knowledge abou t the situation at hand.. This is a far cry from the everyday world in which decisions are forced on us by time. because it is generally impossible to have all the relevant data at the time the decision must be made. To some extent our knowledge may be based on conjecture :Jd this will be conditioned by our degree of optimism or pessimism. conversely. Some of these are discussed briefly in Chapter II. knowledge may be obtained in several ways.BASIC CONCEPTS OF SYSTEM ANALYSIS 1. This knowledge comes partly from our direct experience with the relevant situation or from related experience with similar situations.
We are concerned here with making correct decisions. the stock has plummeted 20 points in the interim. We' must decide what information is relevant to a given decision before the data gathering activity starts. the system model). the corresponding system model will not usually be a very fruitful one. after long thought.. I would have to conclude that my original decision was bad. (2) A systematic program for the acquisition of this pertinent information.. and disciplined process. Nevertheless. To do this we require: (1) The identification of that information' (or those data) that would be pertinent to the anticipated rlecision... the acquisition of information) and not on the product (Le. What information is essential? What information is desirable? This may appear perfectly obvious. however. Six months later. The authors of this book.. the primary function of the system analysis is the acquisition of information and not the generation of a system model.12 FAULT TREE HANDBOOK The existence of the time constrain t on the decisionmaking process leads us to make a distinction between good decisions and correct decisions. According to this definition. but it is astonishing on how many occasions this rationale is not followed. some controversy. This emphasis is necessary because. My original decision can now be classified as good. in the absence of a directed. Data Gathering Gone Awry . The sort of thing that may happen is illustrated in Figure 12. Our emphasis (at least initially) will be on the process (Le. I find that the stock has risen 20 points. There are perhaps as many definitions of system analysis as there are people working and writing in the field. have chosen the following defmition: System analysis is a directed process for the orderly and timely acquisition and investigation of specific system information pertinent to a given decision. that original decision could very well have been correct if all the information available at the time had indicated a rosy future for XYZ Corporation... (3) A rational assessment or analysis of the data so acquired... /' " /' Figure 12.. and considerable experience. If. . I make a decision to buy 1000 shares of XYZ Corporation.. We can classify a decision as good or bad whenever we have the advantage of retrospect. . manageable.
and the same thing is happening. and his investigation leads him to some fascinating unanswered questions indicated in the sketch by A1 . Block A represents certified reality." is an expert in subarea A. 0 z ~ c. Notice. all the essential information is not available despite the fact that the efforts expended would have been able to provide the necessary data if they had been properly directed. He commences to investigate this area. however. Next." In common parlance we speak of "the solar system.BASIC CONCEPTS OF SYSTEM ANALYSIS 13 The large circle represents the information that will be essential for a correct decision to be made at some future time. then. These investigations lead to B1 and B2 and so on. and so on. Laboratory Alpha is in an excellent position to study subarea B. who is well funded and who is a "Very Senior Person. that Professor Jones' efforts are causing him to depart more and more from the area of essential data. and Decision Process 2." or a "communication system. in this process." "a system of government. The nature of the decisionmaking process is shown in Figure 13. the greatest emphasis should be placed on assuring that the system model provides as accurate a representation of reality as possible. When the time for decision arrives. Investigation of A1 leads to A2 . Defmition of a System We have given a defmition of the process of system analysis. this model is analyzed to produce conclusions (block C) on which our decision will be based.J A l w B ICI) > z MODEL > z CONCLUSIONS w C 0 w u TRUTH REALITY OUR PERCEPTION OF REALITY BASIS FOR DECISION Figure 13.:.J CI) 0 z Z 0 (/) l I~ c. Our next task is to devise a suitable definition for the word "system. System Model." but by experimentation and investigation (observations of Nature) we may slowly construct a perception of reality. So our decision is a direct outcome of our model and if our model is grossly in error. Now actual reality is pretty much of a "closed book. Relationship Between Reality.:. Professor Jones. so also will be our decision. Clearly. This is our system model shown as block B." and in so doing we imply some sort of organization existing among various elements that .
The poet Dante treated the Inferno as a system and divided it up into a number of harrowing levels but. and so forth. Note that the discrete elements themselves may be regarded as systems. circulating hot water piping systems. It seems reasonable~ then~ to establish the following defInition: A system is a deterministic entity comprising an interacting collection of discrete elements. a system must have some purposeit must do. Furthermore. Suppose one engine fails. This interaction •. Consider~ for example. The discrete elements of the definition must also. This is an important point because. a submarine consists of a propulsion system. For instance: are we interested in whether the system accomplishes some task successfully. the plumbing system in my home. the landing characteristics have changed rather drastically. For one thing. are we interested in whether the system fails in some hazardous way. each of these~ in turn~ may be further broken down into subsystems and subsubsystems. something. be identifiable. It is completely futile to attempt an analysis of something that cannot be clearly identified. etc. a piping system. Consider the telephone sitting on the desk. Note also from the defmition that a system is made up of partsrbr subsystems that interact. should design changes be made as a result of a system analysis~ the new system so resulting will have to be subjected to an analysis of its own. cord and cradle)~ or should the line running to the jack in the wall be included? Should the jack itself be included? What about the external lines to the telephone pole? What about the junction box on the pole? What about the vast complex of lines. Perhaps the most vital decision that must be made in system defmition is how to put external boundaries on the system. for instance~ the individual submarines in the Navy~s Pacific Ocean Submarine Flotilla.14 FAULT TREE HANDBOOK mutually interact in ways that mayor may not be well defmed. We now have six different possible systems depending on which two engines are out of commission. for example. a hull system. switching equipment~ etc. Suppose two engines fail. Is it sufficient to define the system simply as the instrument itself (earpiece. from a practical standpoint~ such a system is not susceptible to identification as would be. The word "deterministic" in the defInition implies that the system in question be identifiable. Furthermore~ if the physical nature of any part changesfor example by failurethe system itself also changes. local school systems all have defmite purposes and do not exist simply as figments of the imagination. the nation~ the ~ . of course. Thus. A system performs certain functions and the selection or" particular performance aspects will dictate what kind of analysis is to be conducted..~ that comprise the telephone system in the local area. a navigation system. We now have a new system quite different from the original one. which may be very complex indeed. a fourengine aircraft. generally insures that a system is not simply equal to the sum of its parts~ a point that will be continually emphasized throughout this book. or are we interested in whether the system will prove more costly than originally anticipated? It could well be that the correct system analyses in these three cases will be based on different system defmtions. Transportation systems. From a practical standpoint~ this is not very useful and~ in particular cases~ we must specify what aspects of system performance are of immediate concern.
b. This constitutes a choice of an internal boundary for the system. Observe also that one of the subsystems. the nuclear level? Here again. do I wish to extend my analysis to the individual components (screws. The system boundaries we have discussed so far have been physical boundaries. a decision can be made partially on the basis of the system aspect of interest. for purposes of the analysis. In this example. to set up temporal or timelike boundaries on a system. this dotted line constitutes an external boundary. ENVIRONMENT EXTERNAL i. etc. Consider a man who adopts the policy of habitually trading his present car in for a new one every two years. and will be emphasized throughout the book. System Definition: External and Internal Boundaries The choice of the appropriate system boundaries in particular cases is a matter of vital importance.. are the "discrete elements" mentioned in the general definition of a system. There may be several motivations for doing this. F. for the choice of external boundaries determines the comprehensiveness of the analysis. a. TYPICAL SUBSYSTEMS SYSTEM  ~ . into its smallest subsubsystems. the external system boundary will be fairly closed in. especially in the applications. B. transmitter. the atomic level. the system is the . It is also important in system definition to establish a limit of resolution. etc.. and indeed necessary in many cases.BOUNDARY A C E I I F B D a 9 m b c d e k I I I I SMALLEST SUBSUBSYSTEM (LIMIT OF RESOLUTION) h n o p q I I L I ~ Figure 14. whereas the choice of a limit of resolution limits the detail of the analysis. The smallest subsubsystems. C. the external boundary will be much further out.BASIC CONCEPTS OF SYSTEM ANALYSIS IS world? Clearly some external boundary to the system must be established and this decision will have to be made partially on the basis of what aspect of system performance is of concern. has been broken down. etc. Thus. most of them will be discussed in due course. If the immediate problem is that the bell is not loud enough to attract my attention when I am in a remote part of the house. In the telephone example. It is also possible.) of which the instrument is composed? Is it necessary to descend to the molecular level. The dotted line separates the system from the environment in which it is embedded. If the problem involves static on the line. A few further facets of this problem will be discussed briefly now. c.. A. What we have said so far can be represented as in Figure 14. It is sometimes useful to divide the system into a number of subsystems..
' The limit of resolution (internal boundary) can also be established' from consideration. particularly flow problems. I would be better advised to reduce the goals of my analysis to a determination of the optimum orientation of mv roof antenna. In some applications. An example of this would be a system whose temporal boundaries denote different operating phases or different design modifications. If the system failure probability is to be calculated. it might be desirable to include the state of the ionosphere in my analysis. it will be quite a different thing if the man intends to run the car for as long as possible. A system that does not exchange mass with its environment is termed a "closed system" and a closed system that does not exchange energy with its surroundings is termed an "adiabatic system" or an "isolated system.s of feasibility (fortunately!) and from the goal of the analysis. in which an actual physical boundary or an imaginary one is used to segregate a definite quantity of matter ("control mass") or a definite volume ("control volume"). After each phase change or design modification. the physical boundaries are subject to review and possible alteration.16 FAULT TREE HANDBOOK car and the system aspect of interest is the maintenance policy. The reader with a technical background will recognize that our definition of a system and its boundaries is analogous to a similar process involved in classical thermodynamics. timeconsuming analysis. it is interactions at this level with which we are concerned. . The system analyst must also ask the question. At any rate. but this would surely be infeasible. then the limit of resolution should cover component failures for which data are obtainable. It is possible to conduct a worthwhile study of the reliability of a population of TV sets without being concerned about what is going on at the microscopic and submicroscopic levels. under the restriction of a twoyear temporal boundary. time and staff available are inadequate for this task and more efficient analysis approaches are not possible. We now see that the external boundaries serve to delineate system outputs (effects of the system on its environment) and system inputs (effects of the environment on the system). The inputs and outputs of the system are then identified by the amounts of energy or mass passing into or out of the bounded region. This may call for an extensive. the system's physical boundaries might actually be functions of time. once the limit of resolution has been chosen (and thus the "discrete elements" defined). the maintenance policy adopted will be one thing. the system boundaries and \ ." A student who has struggled with thermodynamic problems. Optimally speaking. "Are the chosen system boundaries feasible and are they valid in view of the goal of the analysis?" To reach certain conclusions about a system. If I am concerned about my TV reception. will have been impressed with the importance of establishing appropriate system boundaries before attempting a solution of the problem. we assume no knowledge and are not concerned about interactions taking place at lower levels. A good deal of thought must be expended in the proper assignment of system boundaries and limits of resolution. it may be desirable to include a large system "volume" within the external boundaries. If the money. It is clear that. the limits of resolution serve to define the "discrete elements" of the system and to establish the basic interactions within the system. then the external boundaries must be "moved in" and the amount of information expected to result from the analysis must be reduced.
say. It is necessary at this point to discuss the respective characteristics of these approaches..3) = 106 but then the probabilities of events that we are ignoring become overriding. in addition. . of order 10 3 or greater.. in this volume. The solid inner circle represents our system boundary inside of which we are considering events whose probabilities of occurrence are. ./' Figure 15. ". When due consideration is not devoted to matters such as this. and should be a chief section in any report that is issued. Analytical Approaches Because we are concerned.. . in practical situations the boundaries or limits of resolution may need to be changed because of information gained during the analysis.. For example.SYSTEM BOUI\IDARY \.3)(10. and any modifications. /' 104 / / '~~EXTENDEDSYSTEM \ BOUNDARY . of order 10 4 or greater. There are two generic analytical methods by means of which conclusions are reached in the human sphere: induction and deduction...BASIC CONCEPTS OF SYSTEM ANALYSIS 17 limits of resolution should be defined before the analysis begins and should be adhered to while the analysis is carried out. The system boundaries and limits of resolution. . However. If the system boundaries were extended (dotted circle) we would include... The low numbers simply say that the system is not going to fail by the ways considered but instead is going to fail at a much higher probability in a way not considered... \ \ " '" I .. I I t J. must be clearly defined in any analysis..... 3.. events whose probabilities of occurrence were.. By designing twofold redundancy into our restricted system (solid circle) we could reduce event probabilities there to the order of (10.. with certain formal processes or models. consider Figure 15. it should come 'as no great suprise that these processes or models can be categorized in exactly the same way as are the processes of thought employed in the human decisionmaking process... say. Effect of System Boundaries on Event Probabilities To illustrate another facet of the bounding and resolution problem. and we are suffering under the delusion that our system is two orders of magnitude safer or more reliable than it actually is. it may be found that system schematics are not available in as detailed a form as originally conceived. the naive reliability calculator will often produce such absurd numbers as 10 16 or 10 18..
as well as details relating to the applications and evaluation of Fault Trees. Indeed. is an example of deductive system analysis. Typical of deductive analyses in real life are accident investigations: What chain of events caused the sinking of an "unsinkable" ship such as the Titanic on its maiden voyage? What failure processes. We might also inquire how the noninsertion of given control rods affects a scram system's performance or how a given initiating event. The broad principles of Fault Tree Analysis. which is generally a failure state. affects plant safety. we are constructing an inductive system analysis. such as a pipe rupture. has the task of reconstructing the events leading up to the crime.1·8 FAULT TREE HANDBOOK Inductive Approaches Induction constitutes reasoning from individual cases to a general conclusion. Fault Tree Analysis. contributed to the crash of a commercial airliner into a mountainside? The principal subject of this book. in the consideration of a certain system. Fault Hazard Analysis (FHA). we postulate that the system itself has failed in a certain way. we might inquire into how the loss of some specified control surface affects the flight of an airplane or into how the elimination of some item in the budget affects the overall operation of a school district." . etc. Holmes. In common parlance we might refer to this approach as a "Sherlock Holmesian" approach. instrumental and/or human. and we attempt to find out what modes of system/component* behavior contribute to this failure. some specific system state. In a deductive system analysis. we assume some possible component condition or initiating event and try to determine the corresponding effect on the overall system. is postulated. deductive methods are applied to determine how a given system state (usually a failed state) can occur. are given in later chapters. To repeatin an inductive approach. Thus. Failure Mode Effect and Criticality Analysis (FMECA). *A component can be a subsystem. Many approaches to inductive system analysis have been developed and we shall devote Chapter II to a discussion of the most important among them. a subsubsystem. all successful detectives are experts in deductive analysis. Examples of this method are: Preliminary Hazards Analysis (PHA). Deductive Approaches Deduction constitutes reasoning from the general to the specific. In this technique. Failure Mode and Effect Analysis (FMEA). If. Use of the word "component" often avoids an undesirable proliferation of "subs. and subsubsubsystem. and Event Tree Analysis. faced with given evidence. In summary. we postulate a particular fault or initiating condition and attempt to ascertain the effect of that fault or condition on system operation. inductive methods are applied to determine what system states (usually failed states) are possible. and chains of more basic faults contributing to this undesired event are built up in a systematic way.
. a plumbing section. Most of these problem areas assume the role of interfaces: subsystem interfaceS and disciplinarx interfaces. but gets only beautiful existence proofs. The user. One of the authors. and so on. failure modes may appear that are not at all obvious when viewed from the standpoint of the separate component parts. There was a hull section.BASIC CONCEPTS OF SYSTEM ANALYSIS 19 4. a system is a complex of subsystems manufactured by several different subcontractors or organizational elements. The principal shop project was drawing up plans for minesweepers. will be described later in the fault tree analysis discussions. and it is important that system boundaries and limits of resolution be clearly stated so that any potential hidden faults or inconsistencies will be identified. the safety coordinator who encumbers the system with so many safety devices that the reliability people have trouble getting the system to work at all. each section working happily within its own technical area. a brainchild of his own creation." or transfers. The same event symbols should be used if the integrated system is to be evaluated or quantified. Disciplinary Interfaces Difficulties frequently arise because of the differing viewpoints held by people in different disciplines or in different areas of employment. kicks it and swears at it with gay abandon. and indeed. a wiring section. This exercise demonstrated the unmistakable need for system integration. may show no such reverence. it was found that hull features and plumbing fixtures were frequently incompatible.." Systems which have control system interfaces (e. Each subcontractor or organizational element takes appropriate steps to assure the quality of his own product. Other conflicts can readily be brought to mind: the engineering supervisor who expects quantitative results from his mathematical section. that wiring and plumbing often conflicted with vent ducts. He drops it. These "places. in this case). It is important that the same fault defmitions be used in analyses which are to be integrated. a spray system having an injection signal input) can be analyzed with appropriate "places" left for the control analysis which is furnished as one entity. The draftsmen were divided into groups. on the other hand. The circuit designer regards his black box as a thing of beau ty and a joy forever. that the gyro room door could not be properly opened because of the placement of a drain from the head on the upper deck. The trouble is that when the subsystems are put together to form the overall system. He handles it gently and reverently. Interface problems often lie in control systems and it is best not to split any control system into "pieces. etc.g. . Subsystem Interfaces Generally. Perils and PitfaIls In the study of systems there are dangerous reefs which circumscribe the course which the analyst must steer. was employed as a marine draftsman. When an attempt was made to draw up a composite for one of the compartments (the gyro room. as a mere youth.
6 = system unavailability due to hardware failure per month 720 = number of hours in a month 1/12 = hours required for maintenance check per month Note that the probability of system unavailability due to our maintenance policy (only 5 minutes downtime per month) is greater by two orders of magnitude than the probability that the system will be down because of hardware failure. and suppose that the probability of system failure due to hardware failure is 10 6 per month. Then. the total probability that the system will be unavailable is the sum of its unavailability due to hardware failures and its unavailability due to the maintenance policy or: where 10. In this case the best maintenance policy may be none at all! The system analyst (system integrater) must be unbiased enough and knowledgeable enough to recognize interface problem areas when they occurand they will occur. consider a system that is shut down for an online maintenance check for 5 minutes every month. on a monthly basis. .110 FAULT TREE HANDBOOK As an example of an interface between operational and maintenance personnel.
For systems that exhibit any degree of complexity (Le. "What happens if?" More formally.OVERVIEW OF INDUCTIVE METHODS 1. in many systems (probably the vast majority) for which the expenses and refinements of Fault Tree Analysis are not warranted. For this reason. attempts to identify all possible system hazards or all possible component failure modesboth singly and in combination:become simply impossible. This chapter}s devoted to a discussion of inductive methods. the failure probability for the system. Introduction In the last chapter we defmed the two approaches to system analysis: inductive and deductive. The deductive approach is Fault Tree Analysisthe main topic of this book... Exhaustiveness in the analysis is a luxury that we cannot afford. Under this assumption.. money and manpower. This process is represented below: Component Failure Probability fA A B where F. these techniques provide a useful and illuminating comparison. In everyday language the inductive techniques provide answers to the generic question. nl . For this reason the inductive techniques that we are going to discuss are generally circumscribed by considerations of time. In safety and reliability studies the "state of existence" is a fault. The individual component probabilities are then added and this sum provides an upper bound on the probability of system failure. Second. pessimistic) assumption we can make about a system is that any single component failure will produce complete system failure. 2. obtaining an upper bound on the probability of system failure is especially straightforward. the inductive methods provide a valid and systematic way to identify and correct undesirable or hazardous conditions. the process consists of assuming a particular state of existence of a component or components and analyzing to determine the effect of that condition on the system.. for most systems). The "Parts Count" Approach Probably the simplest and most conservative (Le. We simply list all the components along with their estimated probabilities of failure. is equal to fA +fB +. We have felt it necessary to devote a full chapter to inductIve methods for two reasons. This may not necessarily be true in other areas.CHAPTER II . it is especially important for the fault tree analyst to be conversant with these alternative procedures. First of all. to Fault Tree Analysis.
. Nevertheless. For a particular system.3 which is considerably higher than 1 x 106 . A B Figure III. The "Parts Count" technique is conservative because if critical components exist. A System of Two Amplifiers in Parallel Suppose the probability of failure of amplifier A is 1 x 103 and the probability of failure of amplifier B is 1 x 10 3. . i. the parts count method covers multiple component failures due to a common cause. the probability of system failure is 1 x 10. then it will not impact or contribute using more refined analyses. 3. The parts count method thus can give results which are conservative by orders of magnitude if the system is redundant. Furthermore. so that no single failure is actually catastrophic for the system. then the parts count method can give reasonably accurate results.3 and fB = I X 10.e. When the system does have single failures.112 FAULT TREE HANDBOOK The failure probabilities can be failure rates. the Parts Count technique can provide a very pessimistic estimate of the system failure probability and the degree of pessimism is generally not quantifiable.3 = 2 x 10. the component probabilities are simply summed and hence the "parts count system failure probability" is 1 x 10. other more detailed techniques have been devised. By the parts count method.3 .e. the parts count method can also be used in sensitivity studies. any dependencies among the failures are covered.3 = 1 x 106 . i. and assuming independence of the two amplifiers.. all have an equally deleterious effect on system operation. Because all the components are treated as single failures (any single component failure causes system failure). unreliabilities.3 + 1 x 10. We first *Common cause failures will be discussed in subsequent chapters.3 x 1 x 10. Because the parallel configuration implies that system failure would occur only if both amplifiers fail. fA = I x 10. they often appear redundantly. if the system or subsystem failure probability does not impact or does not contribute using the parts count method. let us see what results the Parts Count approach yields for the simple parallel configuration oftwo amplifiers shown in Figure III. a component can often depart from its normal operating mode in several different ways and these failure modes will not. Failure Mode and Effect Analysis (FMEA) Inasmuch as the Parts Count approach is very simplistic and can give very conservative results. in general. * Finally. or unavailabilities depending on the particular application (these more specific terms will be covered later).
We now describe a table containing the following information: (1) Component designation (2) Failure probability (failure rates or unavailabilities are some of the specific characteristics used) (3) Component failure modes (4) Percent of total failures attributable to each mode (5) Effects on overall system..OVERVIEW OF INDUCTNE METHODS 113 discuss Failure Mode and Effect Analysis and return for a closer look at the system shown in Figure III. we obtain probability of .e. Based on the table. when either amplifier fails open.. 5% of them to the "short" mode. "Critical" thus means that the single failure causes system failure. weak signal. we estimate that 90% of amplifier failures can be attributed to the "open" mode. etc. Column 5. we can now more realistically calculate the probability of system failure from single causes. The two principal ones are "open" and "short" but suppose that our analysis has also detected 28 other modes (e. We know that whenever either amplifier fails shorted. What is the criticality of the other 28 failure modes? In this example we have been conservative and we are considering them all as critical. classified into various categories (the two simplest categories are "critical" and "noncritical"). The result for our redundant amplifier system might be as in Table III. Adding up the critical column.g. Table III. The numbers shown in the Critical column are obtained from multiplying the appropriate percentage in Column 4 by 10.3 from Column 2. considering now only those failure modes which are critical. We recognize that amplifiers can fail in several ways and our first task is the identification of these various failure modes. Redundant Amplifier Analysis 1 2 Failure Probability 3 4 % Failures by Mode 5 Effects NonCritical Component Failure Mode Critical A 1x10·3 Open Short Other 90 5 5 90 5 5 X X (5x10.5 ) X 15x105 ) X X 15x10. i. the system fails so we put X's in the "Critical" column for these modes.5 ) X 15x105 ) B 1x103 Open Short Other Based on prior experience with this type of amplifier. and the balance of 5% to the "other" modes. A short of any amplifier is one of the more critical failure modes inasmuch as it will always cause a failure of the system. the occurrence of anyone causes system failure.). On the other hand. there is no effect on the system from the single failure because of the parallel configuration. intermittent ground.
At this point the reader should be warned of a most hazardous pitfall that is present to a greater or lesser extent in all these inductive techniques: the potential of mistaking form for substance. system oriented. the effects are faults on the system operation. Although FMECA is not an optimal method for detecting hazards. but the number of possible component failure modes that can realistically be considered is limited.114 FAULT TREE HANDBOOK system failure = 5 x 10 5 + 5 x 10 5 + 5 x 10 5 + 5 x 10 5 = 2 x 10 4. Column 2 explains why this condition is a problem. as in our example. if the critical failure modes are a small percentage of the total failure modes (e. an order of magnitude or more. For this reason it' might be better for the analyst not to restrict himself to any prepared formalism. is essentially similar to a Failure Mode and Effects Analysis in which the criticality of the failure is analyzed in greater detail. failures having "noncritical" effects. This is a less conservative result compared to 2 x 10 3 obtained from the parts count method where the critiCal failure modes were not separated.g. to plant personnel and other humans. . the analysis needs be no more elaborate than is necessary for these objectives. thus eliminating costly design changes later on.lld be conducted as early in the product development stage as possible. (2) Potential Effects of the Fault. Preliminary Hazard Analysis (PHA) The techniques described so far have been. for the most part. The subject of this section Preliminary Hazard Analysis (PHA). The four fundamental facets of such an approach are (1) Fault Identification. is a method for assessing the potential hazards posed. and (4) Summary of Findings. A PHA study shol. 4. These techniques call for a wellcoordinated team approach.. . i. The difference between the two system results can be large. This will permit the early development of design and procedural safety requirements for controlling these hazardous conditions. the exercise will be completely futile. it is frequently used in the course of a system safety analysis. 5. it is foolhardy for a single analyst to imagine that he alone can conduct a correct and comprehensive survey of all system faults and their effects on the system. If the proje~t becomes simply a matter of fIlling out forms instead of conducting a proper analysis. The objectives of the analysis are to identify single failure modes and to quantify these modes. Finally. Failure Mode Effect and Criticality Analysis (FMECA) Failure Mode Effect and Criticality Analysis (FMECA). with reasonable certainty. Column 4 states whether the situation is under control or whether further steps should be taken. by the system. Column 3 describes what has been done to compensate for or to control the condition. i.. those component. 10% or less). (3) Existing or Projected Compensation and/or Control. These four facets generally appear as column headings in an FMECA layout. and assurances and controls are described for limiting the likelihood of such failures.e.e. Conservatism dictates that unspecified failure modes and questionable effects be deemed "critical" (as in the previous example).. In FMEA (and its variants) we can identify. Column 1 identifies the possible hazardous condition. The objectives of a PHA are to identify potential hazardous conditions inherent within the system and to determine the significance or criticality of potential accidents that might arise. Another point: if the system is at all complex.
was developed as a special purpose tool for use on projects involving many organizations. An inductive technique that also considers the effects of double failures is the Double Failure Matrix (DFM). In order to illustrate its use. As will become apparent in later chapters. and the use of numerous checklists that have been developed from time to time. one of whom is supposed to act as integrator. It was first used to good purpose in the Minuteman III program. the exercise of engineering judgment.OVERVIEW OF INDUCTNE METHODS 115 The first step in a PHA is to identify potentially hazardous elements or components within the system. The second step in a PHA is the identification of those events that could possibly transform specific hazardous conditions into potential accidents. we must first discuss various ways in which faults may be categorized. Fault Hazard Analysis (FHA) Another method. Various columnar formats have been developed to facilitate the PHA process. Fault Hazard Analysis (FHA). its use is feasible only for relatively noncompiex systems. This column should contain a listing of those operational or environmental variables to which the component is sensitive. A basic categorization which is related to that given in MIL STn 882 and modified for these discussions is as shown in Table 112. A typical FHA form uses several columns as follows: Column (1 )Component identification Column (2)Failure probability Column (3)Failure modes (identify all possible modes) Column (4)Percent failures by mode Column (5)Effect of failure (traced up to some relevant interface) Column (6)Identification of upstream component that could command or initiate the fault in question Column (7)Factors that could cause secondary failures (including threshold levels). Then the seriousness of these potential accidents is assessed to determine whether preventive measures should be taken. Perhaps the simplest goes something like this: Column (1 )Component/subsystem and hazard modes Column (2)Possible effects Column (3)Compensation and control Column (4)Findings and remarks 6. This process is facilitated by engineering experience. 7. Column (8)Remarks The FHA is generally like an FMEA or FMECA with the addition of the extra information given in Columns 6 and 7. Columns 6 and 7 have special significance for the fault tree analyst. This technique is especially valuable for detecting faults that cross organizational interfaces. . Double Failure Matrix (DFM) The previous techniques concerned themselves with the effects of single failures.
For example. The categorization will depend on the conditions assumed to exist previously. the loss of one of two redundant pumps. catastrophic pressure vessel failure. for example. for example. Fault Categories and Corresponding System Effects Fault Category I II III IV Effect on System Negligible Marginal Critical Catastroph ic It is desirable to give more complete definitions of the system effects: (I) Negligibleloss of function that has no effect on system. Table II3. if one pump is assumed failed. The above crude categorizations can be refined in many ways.116 FAULT TREE HANDBOOK Table 112.g. six fault categories were defined as shown in Table 113. either of which can perform a required function. (IV) Catastrophicthis fault will produce severe consequences which can involve injuries or fatalities. (II) Marginalthis fault will degrade the system to some extent but will not cause the system to be unavailable. Fault Categories for NERVA Project Fault Category I IIA Negligible A second fault event causes a transition into Category III (Critical) A second fault event causes a transition into Category IV (Catastroph ic) A system safety problem whose effect depends upon the situation (e. offsite power service remains on) A critical failure and mission must be aborted A catastrophic failure Effect on System liB IIC III IV . on the NERVA project. for example. For example. (III) Criticalthis fault will completely degrade system performance. and the categorizations can change as the assumed conditions change. the loss of a component which renders a safety system unavailable. which is no problem as long as primary. then the failure of a second redundant pump is a critical failure. the failure of all backup onsite power sources..
BVB and CVB and this important additional information has been displayed in the principal diagonal cells of the matrix. the single failure states. we can cascade into Category IV if either BVB or CVB is also failed closed which is why "Two Ways" is given in Table II4. if either BVB or CVB is also failed closed. if Block Valve A (BVA) is failed open we have Category IIA because. to wit. In contrast. For illustrative purposes we have filled in the entire matrix. For instance. we cascade into Category IV. In this figure.1ain diagonal terms. we cascade into Category III. CVA must be failed open too. if BVA is failed closed. Fuel System Schematic Let us define two fault states for this system and categorize them as follows: Fault State No flow when needed Flow cannot be shut off Category IV III We now proceed to consider all possible component failures and their fault categories. if Control Valve A (CVA) is also failed open. Similar considerations apply to the single failures of CVA. Note that if BVA is failed open. for a firstorder analysis we would be concerned only with the I1. This type of analysis is conveniently _ systematized in the Double Failure Matrix shown in Table II4. BLOCK VALVE A CONTROL VALVE A BVA eVA BVB CVB BLOCK VALVE B CONTROL VALVE B Figure 112. namely. Now concentrating only on single failures.OVERVIEW OF INDUCTIVE METHODS 117 To illustrate the concept of DFM. there is only one way in which a second failure can cascade us into Category III. If BVA is failed closed we have Category lIB because. consider the simple subsystem shown in Figure 112. the block valves can operate only as either fully open or fully closed. we can conduct a hazard category count as the following table shows: Number of Ways of Occurring Hazard Category IIA lIB 4 8 . whereas the control valves are proportional valves which may be partially open or partially closed.
namely.118 FAULT TREE HANDBOOK Table 114." For Configuration II we naturally define the same system fault states as for Configuration I. for instance the configuration shown in Figure 113. BVA FUEL SUPPLY eVA MOTOR BVB eVB Figure 113. Alternative Fuel System Schematic For brevity let us refer to the system of Figure II2 as "Configuration I" and that of Figure II3 as "Configuration II. Fuel System Double Failure Matrix BVA Open (One Way) IIA Closed Open CVA Closed Open BVB Closed IIA or liB Open CVB Closed Open BVA Closed X III liB IIA IIA or liB X (Two Ways) liB liB liB IIA or liB III liB IIA IIA IIA liB liB IIA or liB IIA IV IIA or liB IIA l IV Open CVA Closed (One Wayl IIA X IIA IIA or liB IIA IIA or liB X (Two Ways) liB IIA or liB IV IIA or liB IIA or liB IIA or liB IIA IV IIA or liB III IV Open BVB Closed (One Wayl IIA IV IIA or liB X III liB X (Two Ways) liB liB liB liB liB (One Way) IIA liB Open CVB Closed IIA IIA or liB IV IV X IX (Two Waysl liB How can this information be used? One application would be a description and subsequent review of how these hazard categories are controlled or are insured against. Another application would be a comparison between the configuration of valves shown in Figure II2 and an alternative design. "no flow when needed" is .
concentrating only on the single failure states. lIB 4 . Configuration II becomes essentially identical to Configuration I. and if BVX is failed open. we can conduct another hazard category count for Configuration II. Now. The results are shown in the following table: Number of Ways of Occurring 4 Hazard Category IIA . Table 115.OVERVIEW OF INDUCTIVE METHODS 119 Category IV and "flow cannot be shut off' is Category III. we have a pipe connecting the two main flow channels. We see that if BVX is failed closed. We can now pose the following question: "Which configuration is the more desirable with respect to the relative number of occurrences of the various hazard categories that we have defined?" The appropriate Double Failure Matrix for Configuration II is shown in Table 115. Alternative Fuel System Double Failure Matrix BVA Open Open BVA Closed liB (One Wayl IIA (One Way) liB (One Way) I CVA Closed Open Closed Open BVB Closed Open CVB Closed BVX o C IIA (One Way) Open CVA Closed Open BVB Closed IIA (One Way) liB (One Way) IIA (One Way) liB (One Way) I Open CVB Closed Open BVX Closed I In this case we have filled in only the principal diagonal cells which correspond to single failure states.
Instead of working in "failure space" we can eqUivalently work in "success space. we see that they are the same from the standpoint of cascading into Category III but that Configuration II has approximately onehalf as many ways to cascade into Category IV. Consider the configuration of two valves in parallel shown in Figure 114. (2) At least one valve must open for each phase. . (3) Both valves must be closed at the end of each phase." . Configuration II is the better design. Therefore. 8. This system may be analyzed either by a consideration of single failures (the probabilities of multiple failures are deemed negligible) or by a consideration of "success paths. Let us take up the former first. and valve fails to close on demand. For purposes of the analysis. using this criterion. Where differences are not as obvious. let us assume the following numbers: P (valve does not open) = 1 x 104 for each phase P (valve does not close) = 2 x 104 for each phase where the symbol "P" denotes probability." We give a brief example of the equivalence and then return to our failure space approach. more formal analysis approaches may also be used for additional information (these approaches will be discussed in the later sections). The two relevant component failure modes are: valve fails to open on demand. 1 Figure 114. Redundant Configuration of 2 Valves The single failure analysis of the system can be tabulated as in Table II6. The valves are assumed to be identical. System requirements are as follows: (1) The operation involves two phases.1110 FAULT TREE HANDBOOK Comparing the two configurations. Success Path Models We have been and will be discussing failures.
RoXRo Rc)3 . If R denotes "valve i opens successfully. Now let us see whether we can duplicate this result by considering the possible successes. P(path 3) = 2(1 .OVERVIEW OF INDUCfIVE METHODS 011 Table 116. verbally and schematically.Ro)(RoRc)2 Path 3: One valve fails to open on the second cycle but the other valve functions properly for both cycles. we have the following: o c Path 1: Both valves function properly for both cycles. There are three identifiable success paths which we can specify botp. Single Failure Analysis for Redundant Valve Configuration FAILURE MODE Failure to open Failure to close FAILURE EFFECT  COMPONENT Valve # 1 PROBABILITY OF OCCURRENCE (F) 4 x 104 (either phase) System failure  Valve # 2 Failure to open Failure to close System failure 4 x 104 (either phase) The system failure probability = 8 x 10 4." and R denotes "valve i closes successfully." and P (Path i) denotes the success probability associated withthe ith success path. P(path 2) = 2(1 . P(path 1) = (RoReJ4 Path 2: One valve fails to open on the first cycle but the other valve functions properly for both cycles.
Ro)(RoRc)2 + 2(1.99919997 '" 1 . In general.Ro)(RoRc)3 =0. .ll12 FAULT TREE HANDBOOK Numerically. in actual practice they generally play the role of "overview" methods and. 9. Worse yet. in all of these analyses the consequences of a certain event must be balanced against its likelihood of occurrence. on many occasions. system reliability is given by RSYSTEM = (Ro Rd 4 + 2(1. the identification of all possible combinations of component failure modes will be a truly Herculean task. the identification of all component failure modes will be a laborious. Conclusions Although the various inductive methods that we have discussed can be elaborated to almost any desirable extent.00019982 =0. For any reasonably complex system.00019988 + 0.8 x 104 which is essentially the same result as before but it can be seen that the failure approach is considerably less laborious.99880027 + 0. process. Thus. and probably unnecessary. this is all that is necessary. it is a waste of time to bother with failure effects (single or in combination) that have little or no effect on system operation or whose probabilities of occurrence are entirely negligible.
Although our first inclination might be to select the optimistic view of our systemsuccessrather than the pessimistic onefailure. We may desire an airplane that flies high. When the final version of this aircraft rolls off the production line.FAULT TREE ANALYSISBASIC CONCEPTS 1. for instance. some of these features may have been compromised in the course of making the usual III1 . Failure vs. Thus. which is the subject of the remainder of this text. First of all. it is generally easier to attain concurrence on what constitutes failure than it is to agree on what constitutes success. Chapter III presents the basic concepts and definitions necessary for an understanding of the deductive Fault Tree Analysis approach.CHAPTER III . or we can enumerate various ways for system failure. Orientation In Chapter I we introduced two approaches to system analysis: inductive and deductive. SUCCESS SPACE MINIMUM ACCEPTABLE SUCCESS MINIMUM ANTICIPATED SUCCESS MAXIMUM AI\ITlCIPATED SUCCESS _ _ _ _ _ _ _ _+COMPLETE FAILURE MAXIMUM TOLERABLE FAILURE MAXIMUM ANTICIPATED FAILURE +MINIMUM ANTICIPATED FAILURE TOTAL SUCCESS 1 FAILURE SPACE Figure IIII. we shall see that this is not necessarily the most advantageous one. "maximum anticipated success" in success space can be thought of as coinciding with "minimum anticipated failure" in failure space. 2. there are several overriding advantages that accrue to the failure space standpoint. moves fast and carries a big load. travels far without refueling. Success Models The operation of a system can be considered from two standpoints: we can enumerate various ways for system success. The Failure SpaceSuccess Space Concept It is interesting to note that certain identifiable points in success space coincide with certain analogous points in failure space. section 8. We have already seen an example of this in Chapter II. Figure 1111 depicts the Failure/Success space concept. Chapter II described the major inductive methods. From an analytical standpoint.
" in particular.III2 FAULT TREE HANDBOOK tradeoffs. We have been discussing why it is more advantageous for the analyst to work in failure space as opposed to success space.. Another point in favor of the use of failure space is that. it may be necessary to construct only one or two system models such as fault trees. In analysis. it may be helpful to subject some everyday occurrence (a man driving to his office) to analysis in failure space (see Figure 1112). there will be little argument that this event constitutes system failure. such as "valve does not open" which characterizes the failure space (partial failures. The "mission" to which Figure 1112 refers is the transport of Mr. whereas the event. Le. The desired arrival time is 8:30. Below "minimum anticipated failure" lie a number of possible incidents that constitute minor annoyances. A good example of the parsimony of events characteristic of failure space is the Minuteman missile analysis. When successes are considered. a valve opens partially. Only three fault trees were drawn corresponding to the three undesired events: inadvertent programmed launch. Whether the vehicle is a "success" or not may very well be a matter of controversy. . are also difficult events to model because of their continuous possibilities)." is generally easy to define. the degree of usefulness. "Success" tends to be associated with the efficiency of a system. Above this point lie incidents of increasing seriousness terminating in the ultimate catastrophe of death. Thus. X's arrival time to be delayed half an hour or less. When failures are considered. which cover all the significant failure modes. X arrives at his office by 9 :00." may be much more difficult to tie down. It was found that careful analysis of just these three events involved a complete overview of the whole complex system. has been shown many times in the past. the event "failure. To help fix our ideas. This fact makes the use of failure space in analysis much more valuable than the use of success space. On the other hand. "complete failure. therefore. the size of the population in failure space is less than the size of the population in success space. Thus. It is perhaps reasonable to let the point "maximum tolerable failure" coincide with some accident that causes some damage to the car and considerable delay but no personal injury. Actually all that is necessary is to demonstrate that consideration of failure space allows the analyst to get his job done. These characteristics are describable by continuous variables which are not easily modeled in terms of simple discrete events." Between this point and "minimum anticipated failure" lie a number of occurrences that cause Mr. accidental motor ignition. purely from a practical point of view. it may become necessary to construct several hundred system models covering various definitions of success. and this. from a practical standpoint there are generally more ways to success than there are to failure. X from arriving at the desired time. Arrival at 9:00 is labeled "maximum anticipated failure. indeed. although theoretically the number of ways in which a system can fail and the number of ways in which a system can 'succeed are both infinite. it is generally more efficient to make calculations on the basis of failure space. if the airplane crashes in flames. the amount of output. The drawing of tree diagrams for a complex system is an expensive and timeconsuming operation. and fault launch. "success. but the mission will be considered marginally successful if Mr. and production and marketing features. X by automobile from his home to his office. but which do not prevent Mr.
The point "minimum anticipated failure" would correspond to the attainment of all specifications and points below that would indicate that some of the specifications have been more than met. the analysis does not provide a sufficiently broad view of the system.FAULT TREE ANALYSIS .or catastrophic failure as mentioned above. The Undesired Event Concept Fault tree analysis is a deductive failure analysis which focuses on one particular undesired event and which provides a method for determining causes of this event.CONCEPTS 1113 COMPLETE FAILURE MAXIMUM TOLERABLE FAILURE ACCIDENT (DEATH OR CRIPPLING INJURY) MINOR ACCIDENT FLAT TIRE WINDSHIELD WIPERS INOPERATIVE (HEAVY RAIN) TRAFFIC JAM MAXIMUM ANTICIPATED FAILURE_~ARRIVES AT 9:00 WINDSHIELD WIPERS INOPERATIVE (LIGHT RAIN) TRAFFIC CONGESTION MINIMUM ANTICIPATED FAILURE_~ARRIVES AT 8AS LOST HUBCAP WINDSHIELD WIPERS INOPERATIVE (CLEAR WEATHER) TOTAL SUCCESS~ ARRIVES AT 8:30 (NO DIFFICULTIES WHATSOEVER) Figure III2. Fault Tree Analysis addresses itself to the identification and assessment of just such catastrophic occurrences and complete failures. The point "maximum tolerable failure" corresponds to the survival point of the company building the aircraft. for example. and generally consists of a complete. A chart such as Figure 1112 might also be used to pinpoint events in. the production of a commercial airliner. Careful choice of the top event is important to the success of the analysis. the analysis become unmanageable. Generally speaking. Above that point. The point "maximum anticipated failure" would correspond to some tradeoff point at which all specifications had not been met but the discrepancies were not serious enough to degrade the saleability of the airplane in a material way. . Use of Failure Space in Transport Example Note that an event such as "windshield wipers inoperative" will be positioned along the line according to the nature of the environment at that time. If it is too general. only intolerable catastrophes occur. 3. if it is too specific. Fault tree analysis can be an expensive and timeconsuming exercise and its cost must be measured against the cost associated with the occurrence of the relevant undesired event. The undesired event constitutes the top event in a fault tree diagram constructed for the system.
" (b) Crash of commercial airliner with loss of several hundred lives. (c) No spray when demanded from the containment spray injection system in a nuclear reactor. In the next chapter we will define Fault Tree Analysis and proceed to a careful definition of the gates and fault events which constitute the building blocks of a fault tree. Summary In this chapter we have discussed the "failure space" and "undesired event" concepts which underlie the fault tree approach. . In the analysis we might separate "failure under hostile attack" from "failure under routine operation. (f) Automobile does not start when ignition key is turned.1114 FAULT TREE HANDBOOK We now give some examples of top events that might be suitable for beginning a fault tree analysis: (a) Catastrophic failure of a submarine while the submarine is submerged. 4. (d) Premature fullscale yield of a nuclear warhead. (e) Loss of spacecraft and astronauts in the space exploration program.
Figure IVI shows a typical fault tree. Thus. which. PRIMARY EVENTS The primary events of a fault tree are those events. 2. It is important to understand that a fault tree is not a model of all possible system failures or all possible causes for system failure.CHAPTER IV . whereby an undesired state of the system is specified (usually a state that is critical from a safety standpoint). A fault tree is a complex of entities known as "gates" which serve to permit or inhibit the passage of fault logic up the tree. The gates show the relationships of events needed for the occurrence of a "higher" event. The gate symbol denotes the type of relationship of the input events required for the output event. These are the events for which IVI . or any other pertinent events which can lead to the undesired event.THE BASIC ELEMENTS OF A FAULT TREE I . The Fault Tree Model A fault tree analysis can be simply described as an analytical technique. is true of virtually all varieties of system models. It is a qualitative model that can be evaluated quantitatively and often is. SymbologyThe Building Blocks of the Fault Tree A typical fault tree is composed of a number of symbols which are described in detail in the remaining sections of this chapter and are summarized for the reader's convenience in Table IVI. human errors. the "lower" events are the "inputs" to the gate. Moreover. The fact that a fault tree is a particularly convenient model to quantify does not change the qualitative nature of the model itself. for one reason or another. It is also important to point out that a fault tree is not in itself a quantitative model. have not been further developed. The faults can be events that are associated with component hardware failures. The "higher" event is the "output" of the gate. and the system is then analyzed in the context of its environment and operation to find all credible ways in which the undesired event can occur. and the fault tree thus includes only those faults that contribute to this top event. A fault tree is tailored to its top event which corresponds to some particular system failure mode. these faults are not exhaustivethey cover only the most credible faults as assessed by the analyst. The fault tree itself is a graphic model of the various parallel and sequential combinations of faults that will result in the occurrence of the predefined undesired event. A fault tree thus depicts the logical interrelationships of basic events that lead to the undesired eventwhich is the top event of the fault tree. gates are somewhat analogous to switches in an electrical circuit or two valves in a piping layout. of course. This qualitative aspect.
IV2 FAULT TREE HANDBOOK TEST SIGNAL REMAINS ON KJCOll FOR t"60SEC Figure IVI. There are four types of primary events. the ci. . These are: The Basic Event o The circle describes a basic initiating fault event that requires no further development. Typical Fault Tree probabilities will have to be provided if the fault tree is to be used for computing the probability of the top event. In other words.rc1e signifies that the appropriate limit of resolution has been reached.
.Output fault occurs if at least one of the input faults occurs EXCLUSIVE OR .An event which is not further developed either because it is of insufficient consequence or because information is unavailable EXTERNAL EVENT .Output fault occurs if all of the input faults occur OR .BASIC ELEMENTS OF A FAULT TREE Table IVI.A fault event that occurs because of one or more antecedent causes acting through logic gates GATE SYMBOLS AND .Output fault occurs if all of the input faults occur in a specific sequence {the sequence is represented by a CONDITIONING EVENT drawn to the right of the gate} o INHIBIT .An event which is normally expected to occur <> o INTERMEDIATE EVENT SYMBOLS INTERMEDIATE EVENT . Fault Tree Symbols IV3 PRIMARY EVENT SYMBOLS o o BASIC EVENT . on another page) TRANSFER OUT .Output fault occurs if the (single) input fault occurs in the presence of an enabling condition {the enabling condition is represented by a CONDITIONING EVENT drawn to the right of the gate} TRANSFER SYMBOLS TRANSFER IN .g.Indicates that the tree is developed further at the occurrence of the corresponding TRANSFER OUT (e.A basic initiating fault requiring no further development CONDITIONING EVENT .Output fault occurs if exactly one of the input faults occurs PRIORITY AND .Specific conditions or restrictions that apply to any logic gate {used primarily with PR lOR lTV AND and INHIBIT gates} UNDEVELOPED EVENT .Indicates that this portion of the tree must be attached at the corresponding TRANSF E R IN .
gates are symbolized by a shield with a flat or curved base. the house symbol displays events that are not. The ORGate The ORgate is used to show that the output event occurs only if one or more of the input events occur. It is used primarily with the INHIBIT and PRIORITY ANDgates. There may be any number of input events to an ORgate..IV4 The Undeveloped Event FAULT TREE HANDBOOK The diamond describes a specific fault event that is not further developed. faults. . GATES There are two basic types of fault tree gates: the ORgate and the ANDgate. The External Event The house is used to signify an event that is normally expected to occur: e. of themselves. either because the event is of insufficient consequence or because information relevant to the event is unavailable.g. INTERMEDIATE EVENTS c An intermediate event is a fault event which occurs because of one or more o antecedent causes acting through logic gates. a phase change in a dynamic system. All other gates are really special cases of these two basic types. Thus. All intermediate events are symbolized by rectangles. With one exception. The Conditioning Event 0 The ellipse is used to record any conditions or restrictions that apply to any logic gate.
for an ORgate. . see Figure N4. Specific Example of the ORGate Note that the subevents in Figure N3 can be further developed. for instance.BASIC ELEMENTS OF A FAULT TREE NS Figure IV2 shows a typical twoinput ORgate with input events A and B and output event Q. OUTPUT a INPUT A INPUT B Figure IV2. the input faults are never the causes of the output fault. The ORGate It is important to understand that causality never passes through an ORgate. Event Q occurs if A occurs. VALVE IS FAILED CLOSED VALVE IS CLOSED DUE TO HARDWARE FAILURE VALVE IS CLOSED DUE TO HUMAN ERROR VALVE IS CLOSED DUE TO TESTING Figure IV3. Figure IV3 helps to clarify this point. Inputs to an ORgate are identical to the output but are more specifically defined as to cause. or both A and B occur. That is. B occurs.
ORGate for Human Error However. Figure IVS shows a typical twoinput ANDgate with input events A and B. the event VALVE IS INADVERTENTLY CLOSED DURING MAINTENANCE is still a restatement of the output event of the first ORgate VALVE IS FAILED CLOSED with regard to a specific cause. and output event Q. One way to detect improperly drawn fault trees is to look for cases in which causality passes through an ORgate.IV6 FAULT TREE HANDBOOK VALVE IS CLOSED DUE TO HUMAN ERROR VALVE IS NOT OPENED FROM LAST TEST VALVE IS INADVERTENTl Y CLOSED DURING MAINTENANCE Figure IV4. This is an indication of a missing ANDgate (see following definition) and is a sign of the use of improper logic in the conduct of the analysis. Event Q occurs only if events A and B both occur. . There may be any number of input faults to an ANDgate. The ANDGate The ANDgate is used to show that the output fault occurs only if all the input faults occur.
ALL ONSITE DC POWER IS FAILED I DIESEL GENERATOR 1 IS FAILED 1 ~ DIESEL GENERATOR 2 IS FAILED I BATTERY IS FAILED Figure IV6.g. Specific Example of an ANDGate When describing the events input to an ANDgate. The ANDGate In contrast to the ORgate the ANDgate does specify a causal relationship between the inputs and the output. when the first failure occurs (e. For example. the system may automatically switch in a standby unit.. An example of an ANDgate is shown in Figure IV6. input A of Figure IVS). is .. The second failure.BASIC ELEMENTS OF A FAULT TREE IV? OUTPUT Q INPUT A INPUT B Figure IVS. any dependencies must be incorporated in the event definitions if the dependencies affect the system logic. The ANDgate implies nothing whatsoever about the antecedents of the input faults. Dependencies generally exist when the failure "changes" the system. input B of Figure IVS. Le. the input faults collectively represent the cause of the output fault. A failure of both diesel generators and of the battery will result in a failure of all onsite DC power.
" . In this case.IV8 FAULT TREE HANDBOOK now analyzed with the standby unit assumed to be in place. Q OCCURS A OCCURS 8 OCCURS GIVEN THE OCCURENCE B OCCURS A OCCURS GIVEN THE OCCURENCE OF A OF B Figure IV? ANDGate Relationship with Dependency Explicitly Shown That is. the subtree describing the mechanisms or antecedent causes of the event I A OCCURS I will be different from the subtree describing the mechanisms for the event. input B of Figure IV5 would be more precisely defined as "input B given the occurrence of A. The INHIBITGate . I GIVEN THE OCCURRENCE OF B I A OCCURS For multiple inputs to an ANDgate with dependencies affecting system logic among the input events. The variant of the ANDgate shown in Figure IV? explicitly shows the dependencies and is useful for those situations when the occurrence of one of the faults alters the operating modes and/or stress levels in the system in a manner affecting the occurrence mechanism of the other fault. the "givens" must incorporate all preceding events.
" and the conditional input would be "T < T critical'" CHEMICAL REACTION GOES TO COMPLETION FROZEN GASOLINE LINE ALL REAGENTS PRESENT EXISTENCE OFLOW TEMPERATURE T Figure IV9." the input event would be "existence of low temperature. but some qualifying condition must be satisfied before the input can produce the output. Examples of the INHIBITGate . but its presence is necessary. (b) If a frozen gasoline line constitutes an event of interest. conditional input B and output Q. The INHIBITGate To clarify this concept. represented by the hexagon. Event Q occurs only if input A occurs under the condition specified by input B. A description of this conditional input is spelled out within an ellipse drawn to the right of the gate. In this case the output event would be "frozen gasoline line. a OUTPUT INPUT A Figure IV8. Figure IV8 shows a typical INHIBITgate with input A. The output is caused by a single input.BASIC ELEMENTS OF A FAULT TREE IV9 The INHIBITgate. two examples are given below and are illustrated in Figure IV9. The condition that must exist is the conditional input. The catalyst does not participate in the reaction. such an event can occur only when the temperature T is less than T critical' the temperature at which the gasoline freezes. (a) Many chemical reactions go to completion only in the presence of a catalyst. is a special case of the ANDgate.
. but just because A occurs it does not mean that Q follows inevitably.The EXCLUSIVE ORGate The EXCLUSIVE ORgate is a special case of the ORgate in which the output event occurs only if exactly one of the input events occur. OUTPUT Q CONDITION A Figure IVIO. another type of INHIBITgate depicted in Figure IVlO is used. but not always sufficient. However. An Alternative Type of INHIBITGate In Figure IVlO. The portion of time Q occurs when A occurs is given in the conditional input ellipse.IVIO FAULT TREE HANDBOOK Occasionally. especially in the investigation of secondary failures (see Chapter V). Figure IVII shows two alternative ways of depicting a typical EXCLUSIVE ORgate with two inputs. condition A is the necessary. a few other special purpose gates are sometimes encountered. o o OR A A Figure IVII. single cause of output Q. i.e. The gates we have described above are the ones most commonly used and are now standard in the field of fault tree analysis. The EXCLUSIVE ORGate . for Q to occur we must have A. .
In practice. The PRIORITY ANDGate The PRIORITY ANDgate is a special case of the ANDgate in which the output event occurs only if all input events occur in a specified ordered sequence. The PRIORITY ANDGate In Figure IVII. As we will see later in Chapter VII. the output event Q occurs if A occurs or B occurs. Figure IV·12 shows two alternative ways of depicting a typical PRIORITY ANDgate. A line from the apex of the triangle denotes a "transfer in." and a line from the side." This "transfer out. it can be accounted for in the quantification phase. but not if both A and B occur. Thus. TRANSFER SYMBOLS TRAN".. the necessity of having a specific sequence is not usually encountered. the quantitative difference between the inclusive and exclusive ORgates is generally so insignificant that the distinction is not usually necessary. the output event Q occurs only if both input events A and B occur with A occurring before B. In those special instances where the distinction is significant." perhaps on another sheet of paper.BASIC ELEMENTS OF A FAULT TREE IVII The exclusive OR differs from the usual or inclusive OR in that the situation where both input events occur is precluded. a "transfer out. Q Q OR A A Figure IVI2. . The sequence is usually shown inside an ellipse drawn to the right of the gate. will contain a further portion of the tree describing input to the gate." A "transfer in" attached to a gate will link to its corresponding "transfer out."N L ~ TRANSFER OUT The triangles are introduced as transfer symbols and are used as a matter of convenience to avoid extensive duplication in a fault tree.
Some time later. depending on the nature of the system. however. Failures are basic abnormal occurrences. for the construction of the fault tree. Suddenly. Consider next a bridge that is supposed to open occasionally to allow the passage of marine traffIc. the event is a fault because the bridge mechanism responded to an untimely command issued by the bridge attendant. Under conditions of no repair." and it was his untimely action that caused the fault." Consider a relay. however. but such an event could well have a deleterious effect on the progress of the battleand. he sent out an amended message via mounted messenger #2. we call this a relay "success. From the standpoint of constructing a fault tree VI . whereas faults are "higher order" events. The proper defInition of a fault requires a specification of not only what the undesirable component state is but also when it occurs. General Beauregard sent a message to one of his offIcers via mounted messenger #1. Thus. No failure occurred. This is clearly not a relay failure. Fault Occurrence vs." If. If the relay closes properly when a voltage is impressed across its terminals. We shall call an occurrence like this a "fault" so that. the attendant is part of this "system." Another possibility is that the relay closes at the wrong time due to the improper functioning of some upstream component. A fault may be repairable or not. generally speaking. one leaf of the bridge flips up a few feet. Still later. a fault that occurs will continue to exist. In one of the earliest battles of the American Civil War. without warning. In a repairable system a distinction must be made between the occurrence of a fault and its existence. Actually this distinction is of importance only in fault tree quantification (to be discussed in a later chapter). 1. and thus. untimely relay operation may well cause the entire circuit to enter an unsatisfactory state.CHAPTER V .FAULT TREE CONSTRUCTION FUNDAMENTALS In Chapter IV we defmed and discussed the symbols which are the building blocks of a fault tree. In this chapter we cover the concepts which are needed for the proper selection and defmition of the fault tree events. This is not a bridge failure because it is supposed to open on command and it does. Fault Existence In our discussion of the several varieties of fault tree gates. Faults VS. it did. we have a fault event but not a failure event. we have spoken of the occurrence of one or more of a set of faults or the occurrence of all of a set of faults. These "what" and "when" specifications should be part of the event descriptions which are entered into the fault tree. a further amended message was sent out via mounted messenger #3. in this case. the overall situation having changed. the relay fails to close under these circumstances. All messengers arrived but not in the proper sequence. Failures We fIrst make a distinction between the rather specific word "failure" and the more general word "fault. 2. we call this a relay "failure. all failures are faults but not all faults are failures. However. Again.
. such a component requires an input signal or trigger for its output signal. 3. the important difference between failures of active components and failures of passive components is the difference in failure rate values. we perform parametric studies of operating characteristics and studies of functional interrelationships.. "passive" failure modes of active components. To assess the operation of a passive component. If an active component fails. Further examples of passive components are: pipes. it may be a current or force. two to three orders of magnitude. The receiver will then respond in some way (provide an output) as a result of the message (input) that he has received.. for example. e. a wire or busbar carrying current or a steam line transmitting heat energy). etc. This is tantamount to considering all systems as nonrepairable. if we attempted to classify specific failure modes according to the "active" or "passive" defmition. bearings. a structural member). Active Components In most cases it is convenient to separate components into two types: passive and active (also called quasistatic and dynamic)." a term widely used in electrical and mathematical studies. The failure of a passive component will result in the nontransmission (or. An active component contributes in a more dynamic manner to the functioning of its parent system by modifying system behavior in some way. To assess the operation of an active component. active component failures in general have failure rates above 1x 104 per demand (or above 3 x 107 per hour) and passive component failures in general have failure rates below these values. A passive component contributes in a more or less static manner to the functioning of the system. (We could have. In such cases the active component acts as a "transfer function. Examples of active components are: relays. heat transfer studies.) . and so forth. consider a postman (passive component) who transmits a signal (letter) from one active component (sender) to another (receiver).. and so forth. the defmitions of active components and passive components apply to the main function performed by the component.g. we perform such tests as stress analysis. From a numerical reliability standpoint.g. for example. perhaps. Generally." The physical nature of this "signal" may exhibit considerable variety. journals. pumps. welds. modifies the system's fluid flow. Passive vs. there may be no output signal or there may be an incorrect output signal. for example. partial transmission) of its "signal. As an example. A passive component may also be thought of as the "mechanism" (e. As shown in WASH1400 [39]. a wire) whereby the output of one active component becomes the input to a second active component. quite commonly.g. In the above.g. A valve which opens and closes. and a switch has a similar effect on the current in an electrical circuit. an active component originates or modifies a signal. resistors. Such a component may act as a transmitter of energy from place to place (e. the difference in reliability between the two types of components is. and failures of the active component (or failures of the passive component) apply to the failure of that main function. or it may act as a transmitter of loads (e. In fact. valve rupture.V2 FAULT TREE HANDBOOK we need concern ourselves only with the phenomenon of occurrence." In contrast. A passive component can be considered as the transmitter of a "signal.
what are its effects (if any) on the system. e. which in turn consists of subordinate structures called "subsystems. A command fault in contrast. A primary fault is any fault of a component that occurs in an environment for which the component is qualified. . ruptures under a pressure p > PO' Because primary and secondary faults are generally component failures. and Failure Effect The definitions of system. the basic concepts of failure effects. In a particular analysis. 5. which are the components. subsystem. failure modes.g. a mode of valve failure.FAULT TREE CONSTRUCTION FUNDAMENTALS V3 4.. defmitions of system. Component Fault Categories: Primary. We may say that a "system" is the overall structure being considered. and failure mechanisms are important in determining the proper interrelationships among the events. .. secondary and command. what are the corresponding likelihoods of occurrence.. failure mechanisms produce failure modes which. See Table VI. the Spray Injection System may consist of two redundant refueling spray subsystems which deliver water from the refueling water storage tank to the containment.. For example. In constructing a fault tree. have certain effects on system operation. and depend upon the context of the analysis. they are usually called primary and secondary failures. etc. A secondary fault is any fault of a component that occurs in an environment for which it has not been qualified. we are specifying exactly what aspects of component failure are of concern. or component level. we are considering how a particular failure mode can occur and also. a pressure tank. Each of these subsystems in turn may consist of arrangements of valves. The subsystem of interest consists of a valve and a valve actuator. perhaps. in turn. e. "valve unable to open" is a mechanism of subsystem failure. pumps. . Le. Thus. and Command It is also useful for the fault tree analyst to classify faults into three categories: primary.g.. ruptures at some pressure p . and an effect of actuator failure." For example. designed to withstand pressures up to and including a pressure Po. and component are generally made for convenience in order to give hierarchy and boundaries to the problem. When we speak of failure effects. designed to withstand pressure up to and including a pressure Po. in a pressurized water reactor (PWR)." which in turn are made up of basic building blocks called "components. involves the proper operation of a component but at the wrong time or in the wrong place. When we detail the failure modes. a pressure tank. subsystem. subsystem.g. the component fails in a situation which exceeds the conditions for which it was designed. an arming device in a warhead train closes too soon because of a premature or otherwise erroneous signal origination from some upstream device. Failure Mode. we are concerned about why the particular failure is of interest. and component are relative. We can classify various events which can occur as viewed from the system. Secondary. Some of the events are given in the lefthand column of the table below. Failure Mechanism. In other words. Po because of a defective weld. When we list failure mechanisms. To illustrate these concepts consider a system that controls the flow of fuel to an engine. e.
. T 0"APPE . he would generate a list that corresponded to the failure modes of the subystems man who actually procures the switch. consider a simple system of a doorbell and its associated circuitry from the standpoints of the systems man. These are: (l) Switch . The system is shown schematically in Figure VI. Doorbell and Associated Circuitry From the viewpoint of the systems man. the system failure modes are: (1) Doorbell fails to ring when thumb pushes button.~ SWITCH BATTERY • T .(a) fails to make contact (including an inadvertent open) (b) fails to break contact (c) inadvertently closes (2) Bellsolenoid unitfails to ring when power is applied (includes failure to continue ringing with power applied) (3) Batterylow voltage condition (4) Wireopen circuit or short circuit. and the component designer. R •• SOLENOID Figure VI. BELL . (3) Doorbell fails to stop ringing when push button is released.. (2) Doorbell inadvertently rings when button is not pushed. the subsystems man. . and wires.V4 FAULT TREE HANDBOOK Table VI. bellsolenoid unit. battery.PUSHBUTTON I I . If the systems man now sat down and made a list of the failure mechanisms causing his failure modes. Fuel Flow System Failure Analysis Description of Event No flow from subsystem when required System Subsystem Valve Actuator Mechanism Mode Effect Valve unable to open Binding of actuator stem Corrosion of actuator stem Mechanism Mode Effect Mechanism Mode Mechanism To make the mechanismmodeeffect distinction clearer.
Table V2. and will constitute failures of certain subsystems. The lower levels of the designer's fault tree would consist of the mechanisms or causes for the component failure. The component designer's failure modes would be the component failures themselves. If the component designer were to construct a fault tree. In fault tree terminology these are the "top events" that the system analyst can consider. anyone of these component failures could constitute a suitable top event. These latter failures will be failure modes for the subsystems man and will make up the second level of our fault tree. He will select one of these top events and investigate the immediate causes for its occurrence. Doorbell Failure Analysis Failure Effect Switch fails to make contact Failure Mode Mechanism • Contacts broken High contact resistance Clapper broken or not attached Clapper stuck Solenoid link broken or stuck Insufficient magnetomotive force No electrolyte Positive pole broken • • Mechanical shock Corrosion • Bellsolenoid unit fails to ring • • • • Shock • Corrosion Open circuit in solenoid • • Low voltage from battery • Short circuit in solenoid • • Leak in casing Shock • • From the table we see that the system failure modes constitute the various types of system failure. These immediate causes will be the immediate failure mechanisms for the particular system failure chosen. they represent the results of particular component failures. and in many cases would be beneath the limit of resolution of the system man's fault tree. all of the subsystem and system failures higher in the tree represent failure effectsthat is.. It is also a list of failure effects from the standpoint of the component designer. in this "immediate cause" manner until we reach the component failures. We proceed. In other words.FAULT TREE CONSTRUCTION FUNDAMENTALS vs It is emphasized again that this last list constitutes failure mechanisms for the systems man and failure modes for the subsystems man. These would include quality control effects. the designer's "system" is the component itself. See Table V2. Let us try to imagine what sort of list the component designer would make. step by step. These components are the basic causes dermed by the limit of resolution of our tree. . environmental effects. If we consider things from the component designer's point of view. etc.
we reach the limit of resolution of our tree. comparators. or it could represent a portion of the "chain of command" in a corporation. B. necessary. necessary.V~ FAULT TREE HANDBOOK The "Immediate Cause" Concept 6.). The latter constitutes the top event of the system analyst's fault tree. In this way we proceed down the tree continually transferring our point of view from mechanism to mode. B. and sufficient causes of the top event are now treated as subtop events and we proceed to determine their immediate. System Illustrating "Immediate Cause" Concept This system is supposed to operate in the following way: a signal to A triggers an output from A which provides inputs to B and C. and continually approaching fmer resolution in our mechanisms and modes. it \ could represent an electrical system in which the subsystems are analog modules (e. Band C then pass a signal to 0 which finally passes a signal to E. until ultimately. he first. that is. It should be noted that these are not the basic causes of the event but the immediate causes or immediate mechanisms for the event. Now getting back to the system analyst.. our subtop events correspond to the top events in the subsystem man's fault tree. Our tree is now complete. The system of Figure V2 can be interpreted quite generally. it could be a piping system in which A. C and o are valves. subsystem 0 needs an input signal from either B or C or both to trigger its output to E. He next determines the immediate. we place ourselves in the position of the subsystems man for whom our failure mechanisms are the failure modes. This is an extremely important point which will be clarified and illustrated in later examples. A. The immediate.e. as we have seen. and sufficient causes for the occurrence of this top event. For example. As an example of the application of the "immediate cause" concept. Furthermore. its boundary) and then selects a particular system failure mode for further analysis. C and 0 are dynamic subsystems. determines. consider the simple system in Figure V2. defmes his system (i. This limit consists of basic component failures of one sort or another. amplifiers. B A o E c Figure V2..g. necessary. In so doing. . and sufficient causes. etc. We thus have redundancy in this portion of our system.
This is tantamount to assigning a zero failure probability to the wires." The analyst should strongly resist the temptation to list the event. "no input to D" as the immediate cause of "no signal to E. There are two possibilities: (1) "There is an input to D but no output from D. "no output from D." is "no output from D. then event 1 (which can be rephrased. Our analysis of the top event ("no input to E") consequently produced a linkage of fault events connected by "and" and "or" logic. I or 2." can arise from the union of the two events. In this connection." As a matter of terminology. onestepatatime approach. We must now continue the analysis by focusing our attention on events 3 and 4. We readily identify 5 as a failure (basic tree input)." then event 1 above would have been missed. its immediate. As far as 3 is concerned.. However." and 4 = "no output from C. we have 3 = 5 or 6 where 5 = "input to B but no output from B. We deal with event 4 in an analogous way. "D fails to perform its proper function due to some fault internal to D") is not analyzed further and constitutes a basic input to our tree." . the motivation for considering immediate causes is now clear: it provides assurance that no fault event in the sequence is overlooked.FAULT TREE CONSTRUCTION FUNDAMENTALS Y7 Let us choose for our top event the possible outcome "no signal to E." and it is next necessary to determine its immediate cause or causes." and 6 = "no input to B." This terminology is also fairly consistent with the mechanistic definitions of "fault" and "failure" given previously. events 1 and 2.. We have now identified the subtop event. 1 = 3 and 4 where 3 = "no output from B. the event "no input to A" is also considered to be a basic tree input.. If our limit of resolution is the subsystem level.. With respect to event 2. "no signal to E. or command links. om subtop event." In the determination of immediate causes. The immediate cause of the event." (2) "There is no input to D. pipes. We now proceed to a stepbystep analysis of the top event. The reader should note that if we had taken more than one step and had identified (improperly) the cause of "no input to D. The analysis will be terminated when all the relevant basic tree inputs have been identified. In fact. it is convenient to refer to events as "faults" if they are analyzed further (e. necessary and sufficient cause is "no output from B and no output from C." and let us agree that in our analysis we shall neglect the transmitting devices (passive components) which pass the signals from one subsystem to another. The further steps in the analysis of this system can now be easily supplied by the reader." which appears as an intersection of two events. event 2). The framework (or system .g. The "immediate cause" concept is sometimes called the "Think Small" Rule because of the methodical. Event 6 is a fault which can be analyzed further. an event such as 1 which represents a basic tree input and is not analyzed further is referred to as a "failure. "no output from D. one step should be taken at a time." Therefore. Le. We are now ready to seek out the immediate causes for our new mode failures.
Note that none of the failure events have been ''written in". We shall now examine the basic rules for successful fault tree analysis. Consider Figure V3. A. In the beginning it was thought of as an art. A suitable top event would be "doorbell fails to ring when finger pushes button.V8 FAULT TREE HANDBOOK model) on which this linkage is "hung" is the fault tree. Basic Rules for Fault Tree Construction The construction of fault trees is a process that has evolved gradually over a period of about 15 years. C. Observance of these rules helps to ensure successful fault trees so that the process is now less of an art and more of a science. At this juncture the reader may be interested in conducting a short analysis of his own with reference to the doorbell circuit previously given in Figure VI. It is a simple fault tree or perhaps a part of a larger fault tree. Q A B c D Figure V3. D. but is was soon realized that successful trees were all drawn in accordance with a set of basic rules. 7. The next section provides the necessary details for connecting the fault event linkage to its framework (fault tree). they have been designated just Q. A Simple Fault Tree . B." The analysis would commence with the statement: DOORBELL FAILS TO RING BATTERY DISCHARGED OR SOLENOID NOT ACTIVATED in which the event "battery discharged" represents a failure or basic tree input.
As a general rule." look for the minimum necessary and sufficient immediate cause or causes. secondary and command modes." add an ORgate below the event and look for primary. when energy originates from a point outside the component. SWITCH I. an INHIBITgate. If the fault event is classified as "stateofsystem. (2) Motor fails to start when power is applied. The "whencondition" describes the condition of the systemwith respect to the component of interestwhich makes that particular state of existence of the component a fault. B." classify the event as a "stateofcomponent fault. an ORgate." To illustrate Ground Rule II. The "whatcondition" describes the relevant railed (or operating) state of the component. A. it becomes necessary to describe exactly what such events such as Q. Examples of fault statements are: (l) Normally closed relay contacts fail to open when EMF is applied to coil. and the proper procedure for doing this constitutes Ground Rule I: Write the statements that are entered in the event boxes as faults. D actually are. So be it.BATTERY L 0 MOTOR LOAD Figure V4. Simple MotorSwitchBattery System . Do not tailor the length of your statement to the size of the box that you have drawn. The analyst is cautioned not to be afraid of wordy statements." classify the event as a "stateofsystem fault." If the answer is "No. or possibly no gate at all. C. "Can this fault consist of a component failure?" is "Yes. A "stateofsystem" fault event may require an ANDgate. the event may be classified as "stateofsystem. consider the simple motorswitchbattery circuit depicted in Figure V4. when we have a specific problem in hand. make the box bigger! It is permissible to abbreviate words but resist the temptation to abbreviate ideas. state precisely what the fault is and when it occurs. The next step in the procedure is to examine each boxed statement and ask the question: "Can this fault consist of a component failure?" This question and its answer leads us to Ground Rule II: If the answer to the question. Note that Ground Rule I may frequently require a fairly verbose statement. If necessary.FAULT TREE CONSTRUCTION FUNDAMENTALS V9 Now." If the fault event is classified as "stateofcomponent.
then that normal functioning must be defeated by faults if the fault sequence is to continue up the tree. CLASSIFICATION State of component CLASSIFICATION State of component State of component State of component State of component State of system In addition to the above ground rules. Switch inadvertently opens when thumb pressure is applied. However.vtO FAULT TREE HANDBOOK The system can exist in two states: operating and standby. Motor fails to start when power is applied to its terminals. The correct assumption to make is that the component functions normally. Another way of stating this is to say that. Motor inadvertently starts. We might find. thus allowing the passage of the fault sequence in question. that the propagation of a particular fault sequence could be blocked by the miraculous and totally unexpected failure of some component. there are a number of other procedural statements that have been developed over the years. then it is assumed that the component functions nonnally. The following faults can be identified and classified using Ground Rule II: OPERATING STATE SYSTEM FAULT Switch fails to close when thumb pressure is applied. The first of these is the No Miracles Rule: If the nonnal functioning of a component propagates a fault sequence. STANDBY STATE SYSTEM FAULT Switch inadvertently closes with no thumb pressure applied. . the model must take it into account. The first is the CompletetheGate Rule: All inputs to a particular gate should be completely defined before further analysis of anyone of them is undertaken. Two other procedural statements address the dangers of not being methodical and attempting to shortcut the analysis process. if an AND situation exists in the system. in the course of a system analysis. Motor ceases to run with power applied to terminals. if the normal functioning of a component acts to block the propagation of a fault sequence.
. the gatetogate shortcuts may lead to confusion and may demonstrate that the analyst has an incomplete understanding of the system. and gates should not be directly connected to other gates. and each level should be completed before any consideration is given to a lower level. The "gatetogate" shortcutting may be all right if a quantitative evaluation is being performed and the fault tree is being summarized. A fault tree can be successful only if the analyst has a clear and complete understanding of the system to be modeled. Q A B Figure V5. The CompletetheGate Rule states that the fault tree should be developed in levels. With regard to the NoGatetoGate Rule. when the tree is actually being constructed.FAULT TREE CONSTRUCTION FUNDAMENTALS Yll The second is the No GatetoGate Rule: Gate inputs should be properly defined fault events. A ShortCut Fault Tree The "gatetogate" connection is indicative of sloppy analysis. a "shortcut" fault tree is shown in Figure V5. However.
This is easily done in simple cases when the number of outcomes is not great. We begin with the concept of outcome collections which can be conveniently described in terms of a random experiment and its outcomes.. Tails. Chapter VI addresses the basic mathematical technique involved in the quantitative assessment of fault trees: probability theory. The notation {E 1 . .. . two Tails. En} will be used to denote the outcome collection of possible events E 1 . if we toss an ordinary coin and our purpose is to determine whether the coin will land Heads or Tails. E2 . Probability theory is basic to fault tree analysis because it provides an analytical treatment of events. because the examples we will be discussing in Chapters VIII and IX include not only construction but evaluation of the trees.PROBABILITY THEORY: THE MATHEMATICAL DESCRIPTION OF'EVENTS I. Similarly. . we are almost ready to begin some actual fault tree construction examples. The itemization of the outcomes of a random experiment is known. which is the flipping of the coin. we are not performing a random experiment in the sense of the definition because the outcome. Introduction Having completed our discussion of fault tree fundamentals. Random Experiments and Outcomes of Random Experiments A random experiment IS defined to be any observation or series of observations in which the possible result or results are nondeterministic. as an outcome space but we believe that the term outcome collection is somewhat more descriptive. a result is nondeterministic ifit is only one of a number of possibilities that may occur. when the number of outcomes is very great. perhaps infinite. and possesses. say.En' The concept of an outcome collection will now be illustrated by means of several simple examples. E2 . at least in theory. we are performing a random experiment. mathematically." however. but can also be done. the throw of a die constitutes a random experiment unless the die is loaded to give exactly the same outcome on every trial. combinatorial analysis.CHAPTER VI . The topics of probability theory which we shall consider include the concepts of outcome collections and relative frequencies.. we must first digress in Chapters VI and VII to cover some basic mathematical concepts which underlie that evaluation. etc. VII . the time to failure of a motor. . It is clear that many nontrivial observations of our environment constitute random experiments. the measurement of the amount of iron in a meteorite. and some set theory. However. Thus. The term "random experiment" is a very general one and the reader will be able to supply innumerable examples. may be expected on every toss.. A random experiment may be characterized by itemizing all its possible outcomes. 2. such as: the measurement of stiffness of a certain spring. A result is deterministic if it always occurs as the outcome of an observation. If our coin is "phony. and events are the fundamental components of fault trees. the algebra of probabilities.
OB)' (OA. If the two systems are designated A and B. F }. Note that the outcome collection does not include the time of failure because we are observing only whether the system fails or succeeds in some given time. Objectto determine the coordinates of the coin when it has come to rest. (g) Random experimentthe operation of a system for some prescribed length of time 1.F B) constitutes overall system failure. Outcome collection: {(xl' YI)' (x2' Y2).) ObjectTo determine if (and how) the system fails by time t.6}. System fails due to joint failure of A and B }. Objectto determine the state of the two systems after a given time interval. (d) Random experimentthe throw of a die. System fails due to failure of B alone. }. the outcome collection would contain such further events as "Valve closes less than halfway. Note that this outcome collection has an infinite number of elements.OB)' (FA. O}. . System fails due to failure of A alone. Outcome collection: {T. in addition.. we can define Fi = the event that system i fails and 0i =the event that system i does not fail (i = A.4. If. (c) Random experimentstart of a diesel. . A and B. H}. Objectto determine the time of failure (t) to the nearest hour. Outcome collection: {S. Note that if there is present danger of the coin's disappearing into the depths of a nearby crevasse.F B)}· In this example only the event (FA . then opens. Objectto determine what number is uppermost when the die comes to rest.. respectively. Outcome collection: {C. B). Outcome collection: {t l .F B)." etc. (x3' Y3)""} where (Xl' YI)are the Cartesian coordinates which locate the coin on the ruled surface. (A and B are thus single failures. we were to consider partial failure modes. Objectto determine whether the diesel starts (S) or fails to start (F). (b) Random experimenttoss of a coin onto a ruled surface. t 2 ." "Valve closes. then this latter outcome (or event) should be added to the outcome collection. The system includes two critical components. Outcome collection: {(FA. (f) Random experimentan attempt to close a valve.VI2 FAULT TREE HANDBOOK (a) Random experimentone toss ofa fair coin. 2.3. Objectto determine if valve closes (C) or remains open (0). Objectto determine whether coin lands Heads (H) or Tails (T). (e) Random experimentthe operation of a diesel which has successfully started.5. Outcome collection: {System does not fail. OlitCOme collection: {I. and will fail if either or both of these fail. (OA. (k) Random experimentoperation of two parallel systems.
IfP(E 1 ) = 0. E l' After N repetitions or trials we find.E 2 . We then set up the ratio.. C. B. This formula can be readily extended to any number of mutually exclusive events A. En' Suppose we repeat the experiment N times and watch for the occurrence of some specific outcome. The question now arises: does this ratio approach some definite limit as N becomes very large (N~)? If it does. say. that outcome E 1 has happened N 1 times. however. the "empirical" definition given by Equation (VIl) is adequate here. . . . we can write down an expression for the probability that either A or B occurs: peA or B) = peA) + PCB) (VI2) This relation is sometimes referred to as the "addition rule for probabilities" and is applicable to events that are mutually exclusive. 4. Suppose that A and Bare mutually exclusive. E. E 1 is certain to occur. we expect to get either Heads or Tails as a result of a coin toss. which represents the relative frequency of occurrence of E 1 in exactly N repetitions of this particular random experiment. Thus peE ) = lim 1 N~oo (N 1) N (VII) Some obvious properties ofP(E 1 ) arise from this definition: 0< P(E 1 ) < 1 If P(E 1 ) = I. This simply means that A and B cannot both happen on a single trial of the experiment..E 3. D. A more formal definition of probability involves set theory. P(A or B or C or D or E) = peA) + PCB) + P(e) + P(D) + P(E) (VI3) . P(E 1). The Relative Frequency Definition of Probability Consider some general random experiment with the outcome collection. E 1 is impossible. in symbols. We could not possibly get both Heads and Tails on a single toss. we call the limit the probability associated with event E 1 . by actual count. For instance. If events A and B are mutually exclusive. Algebraic Operations with Probabilities Consider a random experiment and designate two of its possible outcomes as A and B.. E 1.PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI3 3...
. The reader should also note that (VI2) always gives an upper bound to the true probability (VI4) when the events are not mutually exclusive.. For example. En' the general formula can be expressed as: (VI5) peEl or E 2 or .peA and B) . C peA or B or C) = peA) + PCB) + P(C) .1/6 = 1/2 Equation (IV4) can be extended to any number of events. E 2 . or En) = L: P(E L: L i)  n nl n P(E i and Ej ) i=1 n2 nl i=1 j=i+l n + L L: L: i= 1 j=i+1 k=j+1 P(E i and Ej and Ek ) ... . suppose that the random experiment is the toss of a single die and let us define two events as follows: A"the number 2 turns up" B"an even number turns up" Clearly these events are not mutually exclusive because if the result of the toss is 2. both A and B have occurred..peA and C) . .. if we return to our single die problem and define A and B as above. Now. If we ignore the possibility of any two or more of the events E i occurring simultaneously. or En) = L: P(E i=1 n i) (VI?) ..peA and B) (VI4) If A and B are mutually exclusive.PCB and C) + peA and B and C) For n events E l .. B. For example.VI4 FAULT TREE HANDBOOK For events which are not mutually exclusive a more general formula must be used. equation (VI6) reduces to peEl or E 2 or . we can calculate peA or B) numerically: peA or B) = 1/6 + 1/2 . peA and B) = 0 and (VI4) reduces to (VI2). for 3 events A. (VI6) where "~" is the summation sign. The general expression for peA or B) is now peA or B) = peA) + PCB) .
This means that in the course of several repetitions of the experiment. Consider now two events A and B that are mutually independent. the overheating of a resistor in an electronic circuit may very well change the failure probability of a nearby transistor or of related circuitry.• En' peEl and E 2 and . If A and B are two mutually independent events. peA and Band C and D) = peA) PCB) P(e) P(D) (VI·9) Very often.1. we have peA and B and C) == peA) P(B/A) P(CIA and B) (VI·II) where P(CIA and B) is the probability of C occurring given A and B have already occurred.. we encounter events that are not mutually independent. if two components are operating in parallel and are isolated from one another. (VIlO) If A and B are mutually independent. The rare event approximation plays an important role in fault tree quantification and is discussed further in Chapter XI. in that the true probability is slightly lower than that given by equation (VI7). and EnI) . For n events E l ' E 2 . we introduce the concept of conditional probability. the occurrence of Heads on the first toss should not cause the probability of Tails on the second toss to be any different from 1/2. peA and B) = peA) PCB) (VI8) This is often called the "multiplication rule for probabilities" and its extension to more than two events is obvious. then we can write.PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI5 Equation (VI7) is the socalled "rare event approximation" and is accurate to within about ten percent of the true probability when P(E j ) < 0. Furthermore. then the failure of one does not affect the failure of the other. So the results of successive tosses of a coin are considered to be mutually independent outcomes. that is... For three events A. and we need a new symbol: P(BIA) which is the probability of B. then P(A/B) = peA) and PCBIA) = PCB) and (VI·IO) reduces to (VI8). B. any error made is on the conservative side... PeEn lEI and E 2 . the occurrence (or nonoccurrence) of A has no influence on the subsequent occurrence (or non· occurrence) of B and vice versa. they are mutually interdependent. Thus Equation (VI·IO) constitutes a general expression for the probability of the joint occurrence of two events A and B. . In this case the failures of the components are independent events. If a wellbalanced coin is tossed randomly. Also. The probability of rain on Tuesday will most likely be influenced by the weather conditions prevailing on Monday. En) = peEl) P(E 2 IE l ) P(E 3 IE I and E 2) (VI·12) .. In order to treat events of this nature.. C. The probability of A and B both occurring then becomes peA and B) = peA) P(BIA) = PCB) P(AIB). given A has already occurred. For instance.
022. etc. We note its face value (e.149 Let us now calculate the probability of getting an Ace and a King in two draws. peA and K) = peAl) P(K 2 IA l ) + P(K l ) P(A2IKl) = Gi)(54 + (5~)(541) =(52~~51) =6:3 =0. Therefore. At this point the reader is invited to present an argument that the events in question would have been independent if the first card drawn had been replaced in the deck and the deck had been shuffled before the second draw." We can evaluate this expression numerically as follows: 4)(48) peA) = ( 52 51 + (48)(4) + (4 )(3) 52 51 52 51 396 33 =(52)(51)::: 221 ::: 0.022." "Queen. We then choose a second card at random from the deck of 51 and note its face value. There are three mutually exclusive possibilities: (Ace on first draw then nonAce on second draw) or (nonAce on first draw then Ace on second draw) or (Ace on first draw then Ace again on second draw).e. but we have just shown that peA and K) = 0. Expressed in mathematical language. the event E2 as the nonoccurrence of E 2 ..149)2 = 0. Consider the set of mutually independent events.012 1) We now note an important point. the events in question are not independent. Consider the following random experiment: we select one card at random from a wellshuffled regulation deck of 52. we do not put it back in the deckthis is known as "sampling without replacement"). we have .the events "getting an Ace" and "getting a King" were independent. A stands for "Ace" and A stands for "nonAce. we would have peA and K) = peA) P(K) = (0.012 =1= 0." etc. "seven. and define the event E l as the nonoccurrence of E l ..g.VI6 FAULT TREE HANDBOOK An example of the use of the formulae in this section may be helpful. If . Because a particular event must either occur or not occur. this becomes much more succinct: P (at least one Ace in two draws) = peA) = peAl and A 2) + peAl and A 2) + peAl and A2) = peAl) P(A 2 IA I ) + peAl) P(A 2 IA I ) + peAl) P(A 2 IA I ) in which the subscripts refer to the order of the draw. A result with important applications to fault tree analysis is the calculation of the probability of occurrence of at least one of a set of mutually independent events. We now calculate the probability of getting at least one Ace in the two draws.) and lay it aside (i.
then Equation (VII?) applies. En occurs. The elements of an outcome collection possess several important characteristics: (a) The elements of an outcome collection are all mutually exclusive.. = P(E n ) =p. . En: at least one Ei occurs or none of the Ei occur. the events E 1 ..{[I. etc. P (at least one Ei occurs) =1 =1 P (no Ei occurs) P(E l and E 2 .P(E n)]} (VII?) In the especially simple case where P(E 1 ) = P(E 2 ) = . B) (VII4) =1  P(E i ) Now there are two possibilities with regard to the events E l' E 2 . it being assumed that these events are mutually independent.. Le. the righthand side of (VII?) reduced to 1 . for this reason.. In the general case. it is perhaps intuitively obvious that E's must also be mutually independent. As a matter of fact.. . Therefore. Our general result (VII?) finds application in fault tree evaluation. no minimal cut sets have common component failures.p)n . .... E2 . the events E 1 . this result can be easily proven. . These modes are termed the minimal cut sets of the fault tree and if they are independent. (VIIS) But from Equation (VII4) this is the same as (VII 6) so that our final result is P(E 1 or E 2 or E 3 or .. Therefore. represent the modes by which system failure (top event of the fault tree) can occur... To conclude this section we return to the concept of an outcome collection because we are now in a position to define it in more detail.P(E 2 )] [1 . We shall discuss minimal cut sets in considerable detail in later chapters. or En) = 1 .. [1. .PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS P(E i ) + P(Ei ) P(E i ) VI7 =1 (VI.. E 2 . .(1.P(E 3 )] . consider a system in which system failure occurs if anyone of the events E 1 . If the component failures are independent.. For example. where each component failure will cause system failure. The probability of system failure is then given by (VI17).P(E 1 )] [1 . and En) We know that the E's are mutually independent and. For example... . then Equation (VI17) is applicable. En may be failures of critical components ·of the system.
The reader will. For a certain redundant system to fail we may need only some particular number of component failures without regard to the order in which they occur. and so on until a sample of size r is completed. B. e. For example. (nl) choices for the second item. If we sample without replacement.VI8 FAULT TREE HANDBOOK (b) The elements of an outcome collection are collectively exhaustive. under a replacement policy there are n r possible samples of size r. we are not concerned with order. we put each element drawn back into the population after recording its characteristics of interest. injection. there are six rearrangements or permutations corresponding to the single combination ABD. ADB. As an introduction. and (b) without replacement. Thus. This process can be conducted in two ways: (a) with replacement." Consider a collection or set of four entities: {A. Le. In this case we are concerned with combinations of events. then recirculation. n choices for the second item. For example. we are concerned with order. BDA. we lay each item aside after "measuring" it. when a particular sequence of events must occur. Combinatorial Analysis We now discuss combinatorial analysis because it allows us to easily evaluate the probabilities of various combinations of events. BAD. This means that we have included in the outcome collection every conceivable result of the experiment.r+l) == (n)r (VII 8) As shown in Equation (VI18). However.g. how many different samples of size r are possible? We have n choices for the first item. when we talk about permutations. n choices for the third item. The . If we sample with replacement. D}. Briefly. we are generally concerned with permutations. when we talk about combinations. DAB. (nr+ 1) choices for the rth item. Observe that under a replacement policy duplication of items in the sample is possible. If we sample without replacement.. Thus. failure of containment spray injection necessarily implies failure of containment spray recirculation and hence here we are concerned with order. Whether order is important to us or not will depend on the specific nature of the problem..recall that this concept was mentioned briefly in our example involving the draw of two cards from a desk. such as failures of redundant systems. in a twooutofthree logic system we may need any twooutofthree component failures to induce system failure. (c) The elements of the outcome space may be characterized as being continuous or discrete. DBA. This group of three may be rearranged or permuted in six different ways: ABD. Thus. Consider the problem of choosing (at random) a sample of size r from a population of size n. (n. 5. For example. the timestofailure for some systems are continuous events and the 52 possible cards when a card is drawn are discrete events... we may end up with the elemen ts ABD. (n2) choices for the third item. we review the difference between "combination" and "permutation. the special symbol (n)r is used for this product. If we sample with replacement. We now seek a few simple rules that will enable us to count numbers of combinations or permutations as desired. Suppose we make a random choice of three items from the four available. the total number of different samples of size r that can be drawn from a population of size n without replacement is (n) (nl) (n2) . we have n choices for the first item. C.
. 3·2·1 (nr)(nrl) .. but resistors from the same manufacturer are treated as being indistinguishable. For example. Thus the number of ways of rearranging n items among themselves is just n!. there will be some permutations among the n! possible rearrangements that cannot be differentiated. Consider now a population of n items. and r from a third. we might have the quantities p resistors from one manufacturer. q of which are all alike. . We can distinguish among manufacturers. and BCD.. n! When r =n.. B. (nr+l) (nr) (nrl) . q from the second manufacturer. Each of these can be permuted in 6 ways so that there should be a total of 24 permutations and this agrees with _ )1 = 1' = 4 x 3 x 2 x 1 = 24 . C.PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI9 symbol (n)r may be written in a more familiar way by using the following algebraic device: (n) (nl) (n2) . 3. the number of distinct arrangements is given by . the total number of distinct arrangements is given by n! p! q! r! n! 4! (VI21 ) In general. we have 1~3! = 4 possible combinations and they are ABC.. p of which are all alike. rearrangements of the Manufacturer 1 resistors among themselves will lead to no new distinguishable arrangements and similarly for Manufacturer 2 and Manufacturer 3. D}. If we are interested in combinations and not permutations. Equation (VI19) constitutes one of the fundamental relationships that we have been seeking. It gives us the number of permutations of n things taken r at a time. + nk = n). we do not desire to count the r! ways in which the r items of the sample can be rearranged among themselves. we get (n)n = O! = n! because O! == 1. and r of which are all alike. n2 of which are alike. for n items. Thus. ACD.. Thus. Thus the number of combinations of n things taken r at a time is given by n! _ (nr)! r! = r (n) (VI·20) where (~) is the symbol used to denote this quantity.2. ... n 1 of which are alike. Specifically.1 (VI19) (n)r= using the definition of the factorial numbers. and nk of which are alike (n 1 + n2 + . If we choose at random three elements from the population {A. ABD. with (p+q+r) = n... (n r.
~ (52) . K.4.. where X and Y stand for any two of the 13card categories 2. For example. The number of ways the system can fail if m components fail is (~). I. In how many ways can the categories X. The number of . If any component has a probability p of failing.. The number of ways of accomplishing this is = 4. the probability of anyone combination leading to system failure is Thus the probability of system failure from m components failing is In addition to m components failing. . Now there are 4 suits and the triplet XXX must be constituted in some way from the 4 available X's. A. then the ratio N will give us the probability of PH being dealt a full house. The total number of possible 5card poker hands is simply the number of combinations of 52 things taken 5 at a time or 52) (5 = 47! 5!= NpH 52! where the subscript PH stands for "poker hands.VItO n! FAULT TREE HANDBOOK (VI22) An example will show how these expressions are used in the solution of a problem. Q.4168 or approxImately 700 PH 5 For an example more pertinent to reliability analysis. First let us determine the number N F H." If we can find out how many of N FH these contain a full house NFH . the system fails if m out of n components fail. the system can also fail if m+ I components fail or if m+2 components fail. or the number of combinations of n items taken m at a time. Similarly. the pair YY must be constituted in some way from the 4 available Y's = 6. Consequently NFH = and the number of ways of doing this is (13)(12)(4)(6) and (j) (i) . the system may consist of 3 sensors where 2 or more sensor failures are required for system failure (2outof3 logic).et us try to calculate the probability of being dealt a full house (3ofakind and a pair) in ordinary 5card poker. 1 _ NFH _ (13)(12)(4)(6) _ 6 P(Full House) .. etc. We can write a full house symbolically as XXXYY. . up to all n components failing. consider a system consisting of n similar components. J. Y be chosen? Clearly we have 13 possible choices for X and then 12 possible choices for Y so that the product (13)(12) represents the total number of ways of selecting the categories X and Y. 3.
The items of immediate interest to us are the outcome events of random experiments and our development of the elementary notions of set theory will be restricted to the event concept. solutions of Bessel's equation. etc. Consider. the following possible events of interest associated with the toss of a die: Athe number 2 turns up Bthe result is an even number Cthe result is less than 4 Dsome number turns up Ethe result is divisible by 7. Our application of set theory involves a considerable particularization. etc. To obtain the total system failure probability we add up the probabilities for m components failing. the probability of system failure from k components failing is k=m+l.... . . and hence the total system failure probability is which can also be written as L k=m n (~) pk (l_p)nk The probabilities are examples of the binomial distribution which we shall discuss later in more detail.. for example. Examples are: prime numbers. . n.. . . k. The probability for a particular combination of k failures is k = m+ 1. relays. scram systems. combinatorial analysis allows us to determine the number of combinations pertinent to an event of interest.. . m+2. m+2. m+ 1 components failing. We can think of an event as a collection of elements. Set theory is a more general approach which allows us to "organize" the outcome events of an experiment to determine the appropriate probabilities. Hence. . a set is a collection of items having some recognizable features in common so that they may be distinguished from other things of different species. Set Theory: Application to the Mathematical Treatment of Events As seen in the previous section. In the most general sense.n.m+2. 6.PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VIII ways the system can fail from k component failures is (~) where k = m+ I..
void.2..e. where braces. C are contained in set D. "zero failure time. 0 < t :s. then X and Yare equal (i. i." are used to denote a particular set and the quantities within the braces are the elements of that set. Any such set that contains all the outcomes of an experiment is referred to as the universal set and is generally denoted by the symbol n or I (also sometimes by the number 1 when the notation is informal). C subsets of D and write ACD. B. = {2} . where the symbol "E" means "is an element of' and the symbol tf.e. We have: A B = {2. the socalled null set symbolized by ¢. lED. It thus coincides with the outcome collection. C represents time of failure greater than 1 hour. 3. As another example.4.. could be associated with different consequences if an abnormal situation existed (i.5. CCD." We also note that the elements of A. There exists a graphical procedure that permits a simple visualization of the set theoretic concept which is known as a Venn diagram. consider the time of failure t of a diesel (in hours) and consider the sets A = {t=O} B = {ti .2. The universal set is represented by some geometrical shape (usually a rectangle) and any subsets (events) of interest are shown inside. events. Both B and C can be represented as 3element sets. BCD.. If X and Yare two sets such that X is a subset of Y. ItfB. they are the same set). We call A. 2.e.6}.t>I} The failure to start of the diesel is represented by A.3.e..e. Figure VII represents a Venn diagram for the previous toss of a die example. 6} C={I... "{ }. Each of these sets. 4.. Returning to our diethrowing example.6} E = ¢ (the null. i.5. or empty set). Observe also that A is a subset of both B and C. Event D contains all the possible results of the experiment. Event A can be represented as a set having a single elementthe number 2.means "is not an element of.VI12 FAULT TREE HANDBOOK Each of these events can be considered as a particular set whose elements are drawn from the basic outcome collection of the experiment: {I.3} D= {I. 4.." B represents times of failure which are greater than zero hours (i. XCV and Y is a subset of X. the diesel started) and less than or equal to one hour. loss of offsite power)..e. 1 } C= {ti. E is an impossible event and can be represented by a set containing no elements at all. YCX. i. This fact is symbolized as follows: IEC. i.e. B. ItfA. we note that the element "1" belongs to C and D but not to A or B.
. and is indicated by the shaded area in Figure VI2. " I . Note that the word "and" translates into the symbol "n... .. and is indicated by the shaded area in Figure VI4...y ~ . . The intersection of two sets X.PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI·13 D Figure VII.3.. and is indicated by the shaded area in Figure VI3...4." The operation of intersection is portrayed in Figure VI3. is written as xny. Returning to the die example. I . is written X (or X').. is written as XUY.. I I "'.2. The Operation of Union The union of two sets of X. Venn Diagram Representation of Sets Operations on sets (events) can be defined with the help of Venn diagrams. the union of B. C is written.. I Figure VI2." The operation of complementation is portrayed in Figure VI4.... .6}. The operation of union is portrayed in Figure VI2. BUC= {1. fx ." . For the die example. In the die Example EnC = {2 } = A. '"  "' . the complement of the set BUC is (BUC)' = (BUC) = {5 }. .. Note that the word "or" translates into the symbol "U.Y is the set that contains all elements that are common to X and Y. Y is the set that contains all elements that are either in X or in Y or in both. The complement of a set X is the set that contains all elements that are not in X.
The Operation of Complementation There is another operation (unnamed) that is sometimes defined. but it is not independent of the operations we have already given. The operation is illustrated in Figure VI5. we are left with the shaded area indicated in Figure VI5. This process is occasionally written (YX) but the reader can readily see that (YX) = ynx' Q ~YI Figure VI5.VI14 FAULT TREE HANDBOOK Q Figure VI3. If we remove from set Yall elements that are common to both X and Y. The Operation (YX) . The Operation of Intersection x Figure VI4.
C not only to designate the components themselves but also the events "component success" for A. ABC}. Suppose that we have determined (perhaps by fault tree analysis) that our system will fail if any two or more of its components fail. ABC. Using the set theory concepts we have developed.PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VIIS so that the symbol (VX) is not needed and hence will not be further used. ABC. we can now translate probability equations into set theoretic terms." Because there are 3 components and 2 modes of operation for each component (success/failure).peA and B) becomes P(AUB) = peA) + PCB) . can be represented by the subset U 52 U 53 U 54 = {ABC. ABC. subsystems). we shall understand the events "component failure" for A. We have enumerated. these inductive approaches can be efficient." S. C. This latter event can then be more fully analyzed by means of fault tree analysis. As an example of the use of the settheoretic approach.g. (VI23) and 5 = 51 . The socalled "matrix approaches" are of this nature. Thus. Then the events corresponding to system failure are: 51 = ABC 52 = ABC 53 = ABC 54 = ABC the event. we may take intersections and unions of any of the basic elements (simple events) of the universal set to generate still other events which can be represented as sets consisting of particular elements. 13. ABC ABC}. by A. This approach of enumerating the outcome events is sometimes used in the inductive methods of system analysis (according to which we enumerate all possible combinations of component operation or nonoperation and determine the effects of each possibility on system behavior). B. B fails. we can calculate the probability of system failure. ABC. B. This information can be used in various applications.P(A(lB).. knowing the probability of component failures. in an exhaustive manner. Let us use the symbols A. ABC. ABC. C. "system failure. peA or B) = peA) + PCB) . "A operates successfully. Thus. For example. For example. ABC. The symbol ABC will consequently be used to represent the event. consider a simple system consisting of three components A. B. respectively. C. We can also use an inductive approach to determine which combination has the most serious consequences. If our system is relatively simple or if the "components" are relatively gross (e. all the ways the system can fail. the universal set (or ou tcome collection) is given by: n = {ABC. In the same way we have done above. C. B. in that the number of combinations in the universal set will be fairly small. respectively. combinatorial analysis tells us that there are 2 3 = 8 combinations which give all system modes of failure or operation. and C operates successfully.
and complementation is called a Boolean algebra. An algebra based on the defined operations of union. the set (or. The logical symbols are the older. and complementation. Boolean algebra allows us to express events in terms of other basic events. If "+" is written for "u" on the lefthand side.e. the engineering symbology is now quite widespread in the engineering literature and any expectation of a return to mathematical or logic symbols seems futile." It is unfortunate that engineering has adopted "+" for "u" and an implied multiplication for "n. we have an equation with "+" meaning one thing on the left and a totally different thing on the right. In our fault tree applications. more particularly. intersection. If.shall use it later in this book. Despite these difficulties and confusing elements in the symbology. the minimal cut sets) and we calculate the probability of system failure in terms of the probabilities of component failures. in fact the symbol "v" is an abbreviation for the Latin word "vel" which means "or. the symbology differs among the fields of mathematics. We can manipulate these equations to obtain the combinations of component failures that will cause system failure (i. In fault tree analysis. Symbolism Boolean algebra. We shall further pursue these topics in later sections. we . set theoretic symbolism is not uniform. intersection. We have introduced a new mathematical entity. and engineering as follows: Operation Union of A and B Intersection of AandB Complement of A Probability AorB Mathematics AUB Logic AvB Engineering A+B A and B not A AnB AI\B A·B or AB A' or A' or A A A The symbols used in mathematics and in logic are very similar.p(AnB). and. the event). deals with event operations which are represented by various symbols. use of the engineering notation is widespread. Unfortunately." This procedure "overworks" the symbols "+" and ". the reader is unacquainted with event . however. 7.". which is the algebra of events. As an example of the confusion that might arise when "+" is used for U. By using the basic operations of union. as a matter of fact. logic. system failure can be expressed in terms of the basic component failures by translating the fault tree to equivalent Boolean equations. consider the expression P(AUB) = peA) + PCB) ..VI16 The equation peA and B) = P(AIB) PCB) FAULT TREE HANDBOOK = P(BIA) P(A) becomes (VI24) p(AnB) = P(AIB) PCB) = P(BIA) peA).
therefore. per se." If we count the number of elements in the class §. 5. 7 }. 3 } are compound events. They do not constitute. (6. The event A = {2} is a simple event. {1. 7 }." Consider again the throw of a single die.3}. S will have 6 elements and S. In the throw of 2 dice. will have 2 6 = 64 elements. 3. This will be useful groundwork for some of the fault tree concepts to follow and will also lead to a more rigorous definition of "probability. TItis will serve as a reminder that set algebraic operations are not to be confused with the operations of ordinary algebra where numbers. {5. {3. A class is a set whose elements are themselves sets and these elements are generated by enumerating every possible combination of the members of the original outcome collection. to define a mathematical entity that does include them. If we list every possible combination of these 4elements we shall generate the class ~ defined on the original set S as follows: ~= {1}. 3. two of which will be B = {2. their intersection is nonempty (i.3 }. even though they are made up of elements of the outcome collection. 7}. every conceivable result (both simple and compound) of the experiment. in the die toss experiments. 5. consider the 4element outcome collection S = {I.. 7}. in more advanced texts. they are not mutually exclusive). and not events. {l. it is strongly recommended that he use the proper mathematical symbols until attaining familiarity with this type of algebra. 3). 7 }.6 } and C = {l. 6). elements of the outcome collection.5). 5. {I. (3. if the original set has n elements. {1.5. The utility of the class concept is simply that the class will contain as elements. (5. 4. by definition. in probability language. *When certain appropriate mathematical restrictions are imposed. because they are not included in the outcome collection. Embedded somewhere in this enormous number of subsets we find the compound event "sum = 7" which can be represented in the following way: E = {(1. in fact it constitutes an element of the outcome collection. 7 }. Thus. 3. B and C have an element in common. all mutually exclusive and. {¢}. As an example. as a Borel Field. In contrast. 7}. {3. 2. 5}. we fmd 16 which is 2 4 where 4 is the number of elements in the original set S. a class is often referred to. {1.§ will have 2 36 (a number in excess of 10 10) elements.PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI17 algebra. {7}. all mutually disjoint.2). Such a mathematical entity is called a class.5}. {I. are manipulated. {3. the corresponding class* will have 2n elements. 5}.. Now compound events (like B and C) are generally the ones that are of predominant interest in the real world and it is necessary. (4. S will contain 36 elements and . {l.e. 8. {5}. 6} and C = {1. thus. Additional Set Concepts We shall now proceed further with certain other set concepts which will illustrate the difference between simple events and compound events. 4). {3}. 7}. 4. (2. 2. In general. 1)}. Notice that the null set ¢ is considered an element of the class to provide a mathematical description of the "impossible event. they are not "disjoint" subsets or. The elements of outcome collections are. 3. the events B = {2. .
etc. ABC} "Both Band C fail": {ABC. Y = +1. the event ABC is an element of the original universal set. Set Theoretic Definition of Probability In Figure VI6. ABC} The reader should bear in mind that the elements of classes are sets. ABC. Next the axis of real numbers between 0 and 1 is drawn. y = +4. A function can now be defined that "maps" event E into some position on this axis.). and these eight elements gave all modes of system failure or operation. ABC. it maps . For our purpose a mapping may be considered simply as a functional relationship.VII 8 FAULT TREE HANDBOOK Consider again the simple threecomponent system A. The set theoretic definition of the probability function is shown schematically in Figure VI6. the box labeled "s" represents the outcome collection for some random experiment. The function peE) is somewhat more complicated. where system failure consists of any two or more components failing. It could. ABC. x = ±2.§. Specifically. For instance the relation y =x 2 maps all numbers x into a parabola (x = ±1. The relation y = x maps all numbers x into a straight line making a 45° angle with the yaxis. B. Thus. In that example. represent the totality of the 36 possible outcomes in the twodice experiment. ABC} "Five components fail": </> "System fails": {ABC. Perhaps the most useful feature of the class concept is that it provides us with a basis for establishing a proper mathematical definition of the probability function. s o PIE) 1 Figure VI6. In these examples one range of numbers is mapped into another range of numbers and we speak of pointfunctions. The class based on this set would contain 2 8 = 256 elements and would include such events as: "A operates properly": {ABC. ABC. It is shown schematically in Figure VI6. In the twodice example the class § possesses an enormous number « 10 1 0) of elements. for instance. C. The concept of mapping may be unfamiliar. These elements represent every conceivable outcome (simple and compound) of the experiment. This function is the probability function peE). the event E = "sum of 7" is a member of . The utility of the class concept is that it enables us to treat compound events in a formal manner simply because all possible compound events are included in the class. The circle labeled "~" represents the class generated from set S by enumerating all combinations of the elements of S. whereas {ABC} is a set containing the single element ABC and is an element of the class generated from the universal set. the universal set comprised eight elements.
however. One is that probability has now been defined without making use of the limit of a ratio. rather. The second thing is that this definition doesn't tell us how to calculate probability. In unsophisticated terms. to each event.PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI19 a set onto a range of numbers. In other cases we may have to investigate the physical nature of the problem in order to develop the probabilities of events of interest. it delineates the mathematical nature of the probability functions. Bayes' Theorem The formula of Bayes plays an important and interesting role in the structure of probability theory. A4 . Two things should be noted. A 3 . a probability function simply assigns one unique number." we already know how to calculate its probability assuming that all 36 outcomes in the outcome collection are equally likely: 6 I P(E)==36 6 Of course this is a particularly simple example. We shall first develop the formula using a set theoretic approach. Figure VI·? portrays the "partitioning" of the universal set on into subsets Ai' A 2 . If E is the event "sum = ? . a probability. Partition of the Universal Set The A's have the following characteristics: Ai UA 2 UA 3 UA 4 UA5 = U ~ =on i=1 i=5 (VI25) . 9. and we speak of a set function instead of a point function although the underlying concept is the same. and it is of particular significance for us because it illustrates a way of thinking that is characteristic of fault tree analysis. A 5 · B Figure VI7. and then discuss the meaning of the result.
the large intersection symbol implies a succession of intersections just as the symbol n implies a succession of products. where X is any set.~ P(BiA. This is true for any arbitrary events A. 1 1 1 (VI·29) .) P(A. We shall return to (VI·26) in a moment. Consider now the probability equation for an intersection. In particular it will be true for B and any one of the A's in Figure VI? Thus..) The expression for B above can be written in a more mathematically succinct form: B= U BnA i=1 i=5 i (VI26) in which the large union symbol implies a succession of unions just as the symbol ~ implies a succession of sums... p(AnB) = P(AIB) P(B) = P(BIA) P(A).k .VJ20 FAULT TREE HANDBOOK Any set of A's having the properties of Equation (VI·2S) is said to constitute a partition of the universal set. we obtain P(BIAk ) P(Ak ) P(A IB) . we can write (VI27) or (VI·28) We can now write P(B) in a different way by using Equation (VI26). The reader can show (by shading the appropriate regions in the Venn diagram) that (Actually BnA 1 = ¢ but ¢UX = X. If we substitute this expression for P(B) into (VI·28).) . B.. Similarly. Also shown in Figure VI? is another subset B. P(B) = P { U iS 1=1 } BnAi i=5 P(BIA i ) P(A i ) i=1 = L i=S p(BnAi ) = i=1 L which can be done because the events (BnAi ) are mutually exclusive.
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS
VI·2!
This is Bayes' theorem. The equation is valid in general for any number of events ~, A 2 , •.. , An which are exhaustive and are mutually exclusive (see Equation (VI2S)). The summation then extends from i = 1 to i = n instead of i = 1 to i = S. We now discuss some of the meanings of Equation (VI29). Suppose that some event B has been observed and that we can make a complete list of the mutually exclusive causes of the event B. These causes are just the A's. Notice, however, the restrictions on the A's given by the relations (VI2S). * Now, having observed B, we may be interested in seeking the probability that B was caused by the event Ak . This is what (VI29) allows us to compute, if we can evaluate all the terms of the righthand side. The Bayseian approach is deductive: given a system event, what is the probability of one of its causative factors? This is to be contrasted with the inductive approach: given some particular malfunction, how will the system as a whole behave? The use of Bayes' formula will now be illustrated by a simple example for which it is particularly easy to enumerate the A's. Suppose that we have three shipping cartons labeled I, II, III. The cartons are all alike in size, shape, and general appearance and they contain various numbers of resistors from companies X, Y, Z as shown in Figure VJ8.
9 ITEMS
6 ITEMS
9 ITEMS
3X
4Y
2Z
1X
2Y
3Z
2X
3Y
4Z
II
III
Figure VI8. Illustration of the Use of Bayes' Formula A random experiment is conducted as follows: First, one of the cartons is chosen at random. Then two resistors are chosen from the selected box. When they are examined, it is found that both items are from Company Z. This latter event we identify with event B in our general development of Bayes' rule. The "causes" of B are readily identified: either carton I was chosen or carton II was chosen or carton III was chosen. Thus, Al = choice of carton I A 2 = choice of carton II A 3 = choice of carton III Now we should be able to work out the probability that, given event B, it was carton I that was originally chosen.
*The restrictions on the Aj given by (VI2S) are equivalent to the restrictions: p(AjnAj ) = O;i *j.
~
P(A j ) = I and
VI22
FAULT TREE HANDBOOK
It would appear natural enough to set
because the boxes are all alike and a random selection among them is made. We now need to evaluate the terms P(BIA I ), P(BIA 2) and P(BIA 3). This is easily done from Figure VIB. P(BIAI)
=(~) (~) = =(~) (;) =
1 36
P(BIA2)
+
t·
P(BIA 3 ) = (~) (~) =
When these numbers are substituted into Bayes' formula we obtain
In a similar way we can calculate
36 30 P(A 2 IB) = 7i and P(A 3 IB) =71'
Thus, if event B is actually observed, the chances are about 5050 that carton II was originally chosen. As another example, refer once more to our simple system made up of three components. We have already determined that the system can fail in anyone of the four modes, Sl' S2' S3' S4' If the system fails and we wish to know the probability that its failure mode was S3' we can compute: P(SIS3) P(S3) _ _ IS P(S3 ) = P(SIS I ) P(SI) + P(SIS2) P(S2) + P(SIS 3) P(S3) + P(SIS 4) P(S4) . This can be written in a simpler form because the system will surely fail if anyone of the events Sl' S2' S3' or S4 occurs.
PROBABILITY THEORYMATHEMATICAL DESCRIPTION OF EVENTS VI23
 P(S3) IS P(S3 ) = peSt) + P(S2) + P(S3) + P(S4)
Pr~u..!!1ably the quantities P(Si) can be estimated from reliability data. Th~quantity
P(SiIS) is sometimes called the "importance" of system failure cause Si. Bayes' Theorem is sometimes applied to obtain optimal repair schemes and to determine the most likely contributors to the system failure (i.e., which of the P(SiIS) is the largest).
CHAPTER VII  BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS
I.
Rules of Boolean Algebra
In the previous chapter we developed the elementary theory of sets as applied, specifically, to events (outcomes of random experiments). In this chapter we are going to further develop the algebra of events, called Boolean algebra, with particular application to fault trees. Boolean algebra is especially important in situations involving a dichotomy: switches are either open or closed, valves are either open or closed, events either occur or they do not occur. The Boolean techniques discussed in this chapter have immediate practical importance in relation to fault trees. A fault tree can be thought of as a pictorial representation of those Boolean relationships among fault events that cause the top event to occur. In fact, a fault tree can always be translated into an entirely equivalent set of Boolean equations. Thus an understanding of the rules of Boolean algebra contributes materially toward the construction and simplification of fault trees. Once a fault tree has been drawn, it can be evaluated to yield its qualitative and quantitative characteristics. These characteristics cannot be obtained from the fault tree per se, but they can be obtained from the equivalent Boolean equations. In this evaluation pr0cess we use the algebraic reduction techniques discussed in this chapter. We present the rules of Boolean algebra in Table VIII along with a short discussion of each rule. The reader is urged to check the validity of each rule by recourse to Venn diagrams. Those readers who are mathematically inclined will detect that the rules, as stated, do not constitute a minimal necessary and sufficient set. Here and elsewhere, the authors have sometimes sacrificed mathematical elegance in favor of presenting things in a form that is more useful and understandable for the practical system analyst. According to (la) and (lb), the union and intersection operations are commutative. In other words, the commutative laws permit us to interchange the events X, Y with regard to an "AND" operation. It is important to remember that there are mathematical entities that do not commute; e.g. the vector cross product and matrices in general. Relations (2a) and (2b) are similar to the associative laws of ordinary algebra: a(bc) = (ab)c and a + (b+c) = (a+b) + c. If we have a series of "OR" operations or a series of "AND" operations, the associative laws permit us to group the events any way we like. The distributive laws (3a) and (3b) provide the valid manipulatory procedure whenever we have a combination of an "AND" operation with an "OR" operation. If we go from left to right in the equations, we are simply reducing the lefthand expression to an unfactored form. In (3a), for example, we operate with X on Y and on Z to obtain the righthand expression. If we go from right to left in the equations, we are simply factoring the expression. For instance, in (3b) we factor out X to obtain the lefthand side. Although (3a) is analogous to the distributive law in ordinary algebra, (3b) has no such analog.
VIII
n (X u V') = X' n V' = (X u V)' x' • (X + V') = X' . V' = (X + V)' These relationships are unnamed but are frequently useful in the reduction process. if X occurs then (X+Y) has also occurred and XC(X+Y). Whenever the occurrence of X automatically implies the occurrence ofY. We can symbolize this situation as XCY or X+Y. We can develop a similar argument in the case of (Sb). and n (I. In engineering notation S2 is often replaced by 1 and ¢by O. The idempotent laws (4a) and (4b) allow us to "cancel out" any redundancies of the same event. Then X' . Rules of Boolean Algebra Mathematical Symbolism Engineering Symbolism X'V = V'X X+V = V+X Designation Commutative Law (1a) xnv = vnx (1b) X u V = V U X (2a) X n (V n z) = (X n V) n Z x. In this case X+¥= Y and XY= X. The laws of absorption (Sa) and (Sb) can easily be validated by reference to an appropriate Venn diagram. De Morgan's theorems (7a) and (7b) provide the general rules for removing primes on brackets. In (Sa). then X is said to be a subset ofY. therefore X(X+Y) = X.) (9a) X U (X' n V) x+ x'· V = X+V (9b) X. (V • Z) = (X • V) • Z X(VZ) = (XV)Z X + (V + Z) " (X + V) + Z X • (V + Z) = X • V + X • XIV + Z) = XV + XZ Associative Law (2b) X u (V u z) + (X u V) u Z (3a) xn(vuz) = (xnv)u(xnz) z Distributive Law (3b) X U(V nz) = (X UV) niX UZ) X + V • Z = (X + V) • (X + Z) (4a) X nx = X (4b) X ux = X X· X X+X = X Idempotent Law =X = X (5a) X n (X u VI = X (5b) X u (X n V) = X (6a) X nx' = </J x .VD2 FAULT TREE HANDBOOK Table VII2. Suppose that X represents the failure of some component. *The symbol I is often used instead of n to designate the Universal Set. (X + V) Law of Absorption x+x.v=x x· X' = ¢ X + X' " n = I (X')' = X (X • V)' = Complementation (6b) X u X' = H = 1* (6c) (X')' =X (7a) (X n V)' = X' u V' (1b) (X x' + V' de Morgan's Theorem u V)' = X' n V' (X+V)· = X"V' (Ia) ¢nx (Ib) <. we can also argue in the following way. With respect to (Sa).)' " (Bf) n n' = <.e) s~nx = X n·x " x n+x = n t/J' = 52 n' "¢ = Xu V (Id) nUx (Ie) <.)UX = 1> =X =H 9' x "¢ rp+ x " x Operations with q.
The original expression has been substantially simplified for purposes of evaluation. (D+B)o(D+C) = D+(BoC). In this light (7a) simply states that for the double failure of X and Y not to occur. We can apply (3b) to (A+B) (A+C) obtaining (A+B) (A+C) = A+(BC). are being used. (a) [(AoB) + (AoB') + (A'oB')]' = (AoB)' (AoB')' (A'oB')' = (A'+B') (A'+B) (A+B) = A' + (B'oB) (A+B) = (A'+¢) (A+B) = A' (A+B) = (A' A) + (A' B) = ¢ + (A'oB) = A' B. either X must not fail or Y must not fail. There are more general rules for simplifying Boolean functions and we will discuss these later in this chapter. A few examples of this procedure are now given. have as our final result (A+B)o(A+C)o(D+B)o(D+C) = BoC + AoD. In either case the removal of primes is accomplished by using (7a) or (7b). Ifwe now let E represent the event BoC.(9b). 0 0 0 0 0 0 0 0 0 0 . For the moment we are concerned with what can be accomplished by more or less unsystematic manipulation of the algebra. The reader should work carefully through these illustrations and ascertain at each step which of the rules. we have (A+BC)o(D+BC) = (A+E) (D+E) = (E+A)o(E+D). As an application of the use of these rules. (1 a) . This example can be worked either by (a) removing the outermost prime as a first step or by (b) manipulating the terms inside the large brackets and removing the outermost prime as a last step.BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII3 represents the nonfailure or successful operation of that component. let us try to simplify the expression (A+B) (A+C) (D+B) (D+C). 0 0 0 0 0 0 0 0 Example IShow that [(Ao B) + (Ao B') + (A' B')]' 0 = A' 0 B. therefore. We thus have as an intermediate result (A+B) (A+C) (D+B)o (D+C) = (A+BoC)o(D+BoC). Another application of (3b) yields (E+A)o(E+D) = E+AoD = BoC + AoD. Likewise. We.
(A B' ·C')' = C + [(A' B') + (Bo A)] (A' BC')' (Ao B' c')' = (A+B'+C) (A'+B+C) = C + [(A+B') (A'+B)] = C + [(A+B') A' + (A+B') B] = C + [(A'· A) + (A'· B') + (Bo B') + (Bo A)] = C + [p + (A' B') + ¢ + (Bo A)] = C + [(A'oB'. we proceed from the "higher" faults to the more basic faults (i. Application to Fault Tree Analysis In this section we shall relate the Boolean methodology to fault trees. 0 Now let L= XoY.) + (BA)]. Notice in this example that A. certain .. 0 0 0 0 0 0 0 0 0 0 Example 3Show that [(Xo Y) + (Ao BoC)] 0 [(Xo Y) + (A'+B'+C')] = Xo Y Applying de Morgan's theorem to the second tenn inside the second bracket.VII4 (b) = = = = = = [(AoB)+(AoB')+(A'oB')]' [(Ao(B+B') + (A' B')]' [A·n + (A'oB'»)'. The gate output is the "higher" fault event under consideration and the gate inputs are the more basic ("lower") fault (or failure) events which relate to the output. 2.B.C are completely redundant. we have [(X Y) + (A B C)] 0 0 0 0 [(X Y) + (A· B· C)']. When we draw a fault tree. The basic symbol is the "gate" and each gate has inputs and an output as shown in Figure VIII. is a logic diagram depicting certain events that must occur in order for other events to occur. as we now know. and we have (L+M)· (L+M') = L 0 (M+M') = L + n = L= X·Y and the original statement is proved.e. 0 FAULT TREE HANDBOOK [A + (A' B')]' 0 [(A+A') (A+B')]' [n • (A+B')]' [A+B']' = A'· B. from output to inputs). 0 Example 2Show that (A' BoC')' . The events are termed "faults" if they are initiated by other events and are tenned "failures" if they are the basic initiating events. The fault tree interrelates events (faults to faults or faults to failures) and certain symbols are used to depict the various relationships (see Chapter IV). M= (ABoC). In this process (Chapter V). A fault tree.
there is a onetoone correspondence between the Boolean algebraic representation and the fault tree representation.. as shown in Figure VII2.. from Equation (VI4): P(Q) = P(A) + P(B). Either of the events A or B." the ORgate is sometimes drawn with a "+" inside the gate symbol as in Figure VII2. is equivalent to the Boolean expression. Anyone or more of the input events must occur to cause the event above the gate to occur. For n input events attached to the ORgate. the ORgate with two input events. The Gate Function in a Fault Tree techniques are used to determine which category of gate is appropriate. The two basic gate categories are the ORgate and the ANDgate. the equivalent Boolean expression is Q = Al + A2 + A 3 + . The ORGate The fault tree symbol CJ is an ORgate which represents the union of the events attached to the gate.. or both must occur in order for Q to occur. Because of its equivalence to the Boolean union operation denoted by the symbol "+. The ORgate is equivalent to the Boolean symbol "+. + An' In the terms of probability. INPUT Figure VIII. Q=A+B. Because these gates relate events in exactly the same way as the Boolean operations that we have just discussed...." For example.p(AnB) or = P(A) + P(B)P(A)P(BIA) (VIII) . I OUTPUT GATE I .INPUT 1 11INPUT .BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII5.
The probability expression for the output event Q of an EXCLUSIVE ORgate is: P(Q)EXCLUSIVE OR= peA) + PCB) . but not both. In Chapter IV. B. These observations. then P(BIA) = PCB) and P(Q) = peA) + PCB) .peA) =PCB).VII6 FAULT TREE HANDBOOK Q A B Figure VII2.peA) PCB). peA) + PCB) ~ peA) + PCB) . then P(BIA) = 1 and P(Q) = peA) + PCB) ..p(AnB) for all A. in all cases.e. we briefly discussed the EXCLUSIVE ORgate.2P(AnB) (VII2) . • If event B is completely dependent on event A. then p(AnB) is small compared with peA) + PCB) so that peA) + PCB) is a very accurate approximation of P(Q). The reader will remember peA) + PCB) as the "rare event approximation" discussed in Chapter VI. B also occurs. i. the output event Q of an EXCLUSIVE ORgate with two input events A and B occurs if event A occurs or event B occurs. a conservative estimate for the probability of the output event Q. As the reader should remember. • The approximation P(Q) =:: peA) + PCB) is. • If A and B are independent events. whenever A occurs. that is. low probability events (say peA). PCB) < 10 1). we can make the following observations about Equation (VIII): • If A and B are mutually exclusive events. especially the last two. will become very important when we discuss quantification in Chapter XI. then p(AnB) = a and P(Q) = peA) + PCB). • If A and B are independent. A TwoInput ORGate From our discussion of probabilities in Chapter VI.
A Specific TwoInput ORGate Tllis ORgate is equivalent to the Boolean expression RELAY CONTACTS FAIL TO OPEN RELAY COIL NOT DE·ENERGIZED + CONTACTS FAIL CLOSED Instead of explicitly describing the events. all references to the ORgate should be interpreted as the inclusive variety. It may sometimes. A2 . unless otherwise noted. it should be observed that in any case. the error which is made by using the inclusive rather than the exclusive ORgate biases the answer on the conservative side because the inclusive OR has the higher probability. Figure VII3 shows a realistic example of an ORgate for a fault condition of a set of normally closed contacts. low probability component failures. In conclusion. if the event "relay contacts fail to open" is labeled "Q. be useful to make the distinction in the special case where the exclusive OR logic is truly required. we observe that if A and Bare independent low probability component failures. however. a unique symbol (Q." "relay coil not deenergized" is labeled "A. This is why the distinction between the inclusive and exclusive ORgates is generally not necessary in Fault Tree Analysis where we are often dealing with independent. the intersection term may be large enough to significantly effect the result. In this latter case." . In the remainder of this text. the difference in probability between the two expressions is negligible.BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII7 Comparing Equations (VIII) and (VII2). and in addition where there is a strong dependency between the input events and the failure probabilities are high. Therefore. NORMALLY CLOSED RELAY CONTACTS FAIL TO OPEN RELAY COIL NOT DE·ENERGIZED RELAY CONTACTS FAIL CLOSED Figure VII3. etc.) is usually associated with each event as shown in Figure VII2.
VII'8 FAULT TREE HANDBOOK and "contacts fail closed" is labeled "B. If wire failures are ignored then the fault tree representation of the event. A Specific ThreeInput ORGate ." we can represent the ORgate of Figure VII3 by the Boolean equation Q=A+B. "No Current to Point B" is shown in Figure VII5. The points A and Bare points on the wire. if anyone or more of these elementary events occurs. The input events to an ORgate do not cause the event above the gate. Consider two switches in series as shown in Figure VII4. they are simply reexpressions of the event above the gate. An ORgate is merely a reexpression of the event above the gate in terms of the more elementary input events. Two Switches in Series NO CURRENT TO POINT B SWITCH 1 IS OPEN SWITCH 2 IS OPEN NO CURRENT TO POINT A Figure VII5. The event above the gate encompasses all of these more elementary events. This interpretation is quite important in that it characterizes an ORgate and differentiates it from an ANDgate. but we feel it is so important to fault tree analysis that we cover it again. We have discussed this topic before. having now gone through the Boolean algebra concepts. SWITCH 1 SWITCH 2 _I"'~ e~ + • Figure VII4. then B occurs.
that is. Event B is merely a reexpression of events AI' A2 . the equivalent Boolean expression is In this case. For n input events to an ANDgate. For two events attached to the ANDgate. All of the input events attached to the ANDgate must exist in order for the event above the gate to occur. the equivalent Boolean expression is Q=A· B. (VII3) From our discussion of probability in Chapter VI. from Equation (VIlO): P(Q) = P(A)P(BIA) = P(B)P(AIB). that symbol is sometimes included inside the ANDgate as in Figure VII6. then the Boolean representation is B = Al +A 2 +A 3 .BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII9 If the events are denoted by the symbols given below. then P(Q) may be significantly greater than P(A)P(B). A 3 . in the extreme case where B depends completely on A. P(AIB) = peA). as shown in Figure VII6. whenever A occurs.". B also occurs. We have classified the particular events AI' A2 . NO CURRENT TO POINT B = EVENT B SWITCH 1 IS OPEN = EVENT A1 SWITCH 2 IS OPEN = EVENT A2 NO CURRENT TO POINT A = EVENT A3 The event B occurs if Al or A 2 or A 3 occurs. For example. event Q will occur if and only if all the Ai occur. In terms of probability. tbn P(BIA) = 1 and P(Q) = peA). • If A and B are not independent events. . The ANDgate is equivalent to the Boolean symbol ". The ANDGate The fault tree symbol 0 is an ANDgate which represents the intersection of the events attached to the gate. we can make the folloWing observations about Equation (VII3): • If A and B are independent events.". then P(BIA) = PCB). Because of its equivalence to the Boolean intersection operation denoted by the symbol ". A 3 as belonging to the general event B. and P(Q) = peA) PCB).
. We conclude this section with an example showing how Boolean algebra can be used to restructure a fault tree. A TwoInput ANDGate Again. then the gate is an ORgate and the event is merely a restatement of the input events. The rules of Boolean algebra can thus be applied to restructure the tree to a simpler.(B+C). which allows quantitative and qualitative evaluations to be performed in a straightforward manner. called the minimal cut set form.VIItO FAULT TREE HANDBOOK Q A • I I I A I B Figure VII6.B) + (AC). Thus. then the gate is an ANDgate and the inputs constitute the cause of the event above the gate. Event Q is caused only if every one of the input events occurs. If the event above the gate occurs when anyone of the input events occurs. the reader should bear these observations in mind for future reference when we discuss fault tree quantification in later chapters. there is not one "correct" fault tree for a problem but many correct forms which are equivalent to one another. equivalent form for ease of understanding or for simplifying the evaluation of the tree. we shall apply the rules of Boolean algebra to obtain one form of the fault tree. This causal relationship is what differentiates an ANDgate from an ORgate. Consider the equation D = A . The fault tree structure for this equivalent expression for D is shown in Figure VII8. The two fault tree structures in Figures VII? and VII8 may appear to be different. Later. If the ev~nt above the gate occurs only when combinations of more elementary events occur. they are equivalent. The corresponding fault tree structure is shown in Figure VII? Now according to Rule 3a. however. event D can also be expressed as D = (A. The events attached to the ANDgate are the causes of the event above the gate.
BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VIIII o A BORC B C Figure VII7. Fault Tree Structure for D = A • (B+C) o AANDB AANDC A B A c Figure VII8. Equivalent Form for the Fault Tree of Figure VII7 .
the result is referred to as an "expansion in minterrns. a Boolean function of n variables can be expanded about one. X 3' . For instance if E designates some event of interest. 1. For this reason. .. First some preliminaries need to be reviewed. . In . we must remember that the dots (.• .. .VII12 3. or all n of the variables... X2 . Xn) using the rules of Boolean algebra.. X3 . FAULT TREE HANDBOOK Shannon's Method for Expressing Boolean Functions in Standardized Fonns In the previous sections we have discussed how Boolean functions may be expressed in a number of ways by applying the algebraic rules presented earlier in this chapter. . . The reader can readily show that equation (VII4) is correct by considering the only two possibilities. X 3 . For instance. X 2 ) = Xl • X 2 . X 3 .. Le. indeed. then E=l indicates that the event has occurred and E=O indicates that the event has not occurred. . ~) = [X I • X 2 • f(1.· .. all of its arguments.· . According to Shannon's method. Xn : f(X I . when it is taken about all n of the variables). . Xn)' The Boolean function can take on only two values: 1 (occurs) and 0 (does not occur). X 3' .. X 3 . . 0. Xn )] + Xl • X' 2 • f(1. ... The symbolism f(1. Xn) about X2 in the manner of equation (VII4). X 2 .. Xn) and f(O.. Xn' the Boolean expressions lying outside the functional representations are called minterms which consist of combinations of certain X's occurring and others not occurring. the extension of equation (VII4) to the expansion about the two variables Xl and X2 is: f(X I ' X2 . A Boolean variable is a twovalued variable.. In this section we shall discuss Shannon's method for expanding Boolean functions. . For example if f(X I .. When the expansion is "complete" (Le. ~)] + X' I • X' 2 • f(O. then f(1.. Xl occurs or does not occur. or. three. This function may be expanded about one of its arguments (say Xl) in the following way: (VII4) In equation (VII4). .. theorems in Boolean algebra are much more easily proved than theorems in ordinary algebra in which the variable may take on an infinity of values.) and pluses (+) represent the intersection and union operations. Xn )] (VII5) The reader should note that equation (VII5) can be obtained by simply expanding f{1.. X2 . . Consider a function of the n Boolean variables Xl' X2 . 1.· . 0. because we are dealing with events. X2 ." The latter constitutes a standard or canonical form consisting of all combinations of occurrences and nonoccurrences of the events of interest. Equation (VII4) can be extended to the expansion of a Boolean function about two. X3' .. two. X 2 . Xn)] + X'I· X 2 • f(O. This method is a general expansion technique applicable to any Boolean function. When the expression is carried out about all the variables Xl' X2 .. X 3 .. Xl = 1 and Xl = 0. . ... . . X2 ) = 1 • X 2 = X2 . . . Xn) indicates that 1 has been substituted for Xl' We may simplify f(1.· .
0)] + [X'. Z) = [X·Y·Z·f(I. 1)] + [X·Y·Z I·f(1. then. Such an expansion. Thus from equation (VII4).. Xn ) X'I •X' 2 • f(O. 1. Thus in equation (VIIS) the four terms X I 'X 2 • f(I.. Another reason for representing a Boolean function in either minterms or maxterms is that these expressions are unique for any given function. provides a general technique for determining whether two Boolean expressions are equal.. *Two events are disjoint if their intersection is the null set. X 3 ' .0)] + [X·Y'·Z·f(1. the maxterm expansion in one variable is f(X I ' X 2 ' . Instead of obtaining a sum (union) of combinations (intersections) as in the minterm expansion. two events are disjoint if the probability of both events is zero. The maxterm expansion is simply obtained from the minterm expansion by interchanging' for + and a for 1.. We shall start with (a) and expand it using the minterm expansion. and vice versa. 1)] + [XI·Y·Z'·f(O. X3 ' . X I . 0)]. Xn ) are disjoint from one another. 1. or equivalently. because if they are. in a few simple algebraic steps. . each combination term is disjoint* from all the others.1)] + [X'·Y'·Z'·f(O.1)] (VII7) + [X·Y·Z'·f(1. Shannon's expansion will now be illustrated for the two 3variable functions: (a) (b) f(X. 1. 1. the complementary maxterm expansion can be obtained for any Boolean function. (X· Y) + (X' .Y'. they will have identical minterm (or maxterm) forms. ~) = [X I + f( 0. .Z'f(O. Each minterm expression will have as a "coefficient" the function f evaluate(f at the appropriate 1's and D's corresponding to the occurrences and nonoccurrences of X. 0. . we obtain a combination (intersection) of sums (unions) for the maxterm expansion. ~)] . X3""'~) X I 'X'2 • f(l.Y. This characteristic is generally true for any order expansion and it can be exploited in quantifying fault trees whenever Shannon's expansion is used. In the minterm (or maxterm) expansion.. 0. 0.Z) = (X·y) + (X'·Z) + (Y·Z) f(X. 0. that the functions in (a) and (b) are indeed equal but we wish to use (a) and (b) to exemplify the general process involved in using Shannon's expansion. X3""'~) X'I •X2 • f(O.. Xx' . .Y. In addition to the minterm expansion. 0)] + [X"Y'Z'f(O. 1.BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII13 the complete expansion there will be 2 n minterms consisting of all combinations of the possible occurrences and nonoccurrences of the X variables.Z) = (X·y) + (X'·Z). 1.Z) + (y. 0. [X' I + f( 1. ~)] (VII6) Analogous forms are obtained for higher order expansions. 0. The reader can show.
O. 1.O)] .I)=(11)+(OI)+(11)= 1+0+1 = I f(1.0) + (1'0) = 0+0 = ° ° ° ° It is now apparent that (b) has exactly the same minterm expansion as (a) and therefore we have proved that (X· Y) + (X' •Z) + (Y Z) = (X.0) + (0·1) + (0·1) = 0+0+0 = f(l .I) = (00) + (11) + (01) = 0+1 +0 = 1 f(O.l) = (10) + (01) = 0+0 = f(1 . f(1. The reader should note from Figure VII9 two important properties of the minterms: They are all mutually disjoint and their collective union yields the universal set.Y) + (X' Z).I. for expression (b). etc.0. f(1.VII14 FAULT TREE HANDBOOK Note that this expression is valid for any 3variable Boolean function.O.O.1)] .O) = (00) + (10) + (00) = 0+0+0 = ° ° ° ° When these values are substituted into the expanded form of (a) above.0)] .I) = (0.l). can be readily evaluated by making the appropriate substitutions in the original functional form as follows: f(1.0) = (01) + (10) + (10) = 0+0+0 = f(O.I. [X'+Y+Z+f{1. 1).0) = 0+0 = f(O.O)] • [X+Y'+Z'+f(O.[X'+Y+Z' +f(1. I. f(1.[X'+Y'+Z'+f(1.I) = (01) + (11) + (11) = 0+ 1+1 = 1 f(O. (YZ) = (XYZ) + (XYZ') + (X'YZ) + (X'Y'Z).0)=(11)+(O0)+(10)= 1+0+0= 1 f(1 .O. It is of interest to view the minterms of equation (VII?) with the help of Venn diagrams.O.1)] • [X'+Y'+Z+f{l . .O. the result is the unique minterm expansion of (a): (XY) + (X' Z) +. 0). The maxtermexpansion of a 3variable Boolean function can be obtained from an extension of equation (VII6) as follows: f(X. Z) = [X+Y+Z+f(O.1 .O.1 .O) = (0.0). Now f(1.. It now remains to evaluate f(I. This is shown in Figure VII9.1.O)] .I.1)] (VII8) The reader should now be able to show that the maxterm expansions of (a) and (b) are identical.I)=(O'O)+(II)=O+1 = 1 f(O. in which all of the terms in parentheses are disjoint.1 .0) = (10) + (0.I.I.1 .I)] • [X+Y'+Z+f(O.1) = (1..O) = (0.O)={lI)+(OO)= 1+0= 1 f(1. Y.I.0) = (10) + (00) + (00) = 0+0+0 = f(O.O.1) + (lI) = 0+ 1 = 1 f(O.1.0.[X+Y+Z'+f(O.O.I)={lI)+(OI)= 1+0= 1 f{l.1) + (10) = 0+0 = f(O. etc.I.I.0.
Any fault tree will consist of a finite number of minimal cut sets. a minimal cut set is thus a combination (intersection) of primary events sufficient for the top event. By the definition." The minimal cut sets define the "failure modes" of the top event and are usually obtained when a fault tree is evaluated. The combination is a "smallest" combination in that all the failures are needed for the top event to occur. The twocomponent minimal cut sets represent the double failures which together will . if one of the failures in the cut set does not occur. The onecomponent minimal cut sets. Once the minimal cut sets are obtained. The minimal path sets are essentially the complements of the minimal cut sets and define the "success modes" by which the top event will not occur. will cause the top event to occur.BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VIIIS X·V·z X • V • Z' X • V' • Z X • V' • Z' X' • V • Z X' • V • Z' X' • V' • Z X' • V' • Z' Figure VII9. represent those single failures which will cause the top event to occur. however. Determining the Minimal Cut Sets or Minimal Path Sets of a Fault Tree One of the main purposes of representing a fault tree in terms of Boolean equations is that these equations can then be used to determine the fault tree's associated "minimal cut sets" and "minimal path sets. The minimal path sets are often not obtained in a fault tree evaluation. then the top event will not occur (by this combination). they can be useful in particular problems. Venn Diagram Representation of the Minterms of a 3Variable Boolean Function 4. the quantification of the fault tree is more or less straightforward. Minimal Cut Sets We can formally define a minimal cut set as follows: a minimal cut set is a smallest combination of component failures which. if they all occur. if there are any. which are unique for that top event.
the tree is first translated to its equivalent Boolean equations and then either the "topdown" or "bottomup" substitution method is used. An example of a top event expression is T=A+BC where A. are used to remove the redundancies. This top event has a onecomponent minimal cut set (A) and a twocomponent minimal cut set (BC). Example Fault Tree . the equivalent Boolean equations are shown below the tree. Figure VIIto.. the distributive law and the law of absorption. etc.VIII 6 FAULT TREE HANDBOOK cause the top event to occur. Consider the simple fault tree shown in Figure VIItO. are basic component failures on the tree. The methods are straightforward and they involve substituting and expanding Boolean expressions. To determine the minimal cut sets of a fault tree. and hence the general ncomponent minimal cut can be expressed as where Xl' X2 . B. Two Boolean laws. The minimal cut set expression for the top event can be written in the general form. Each minimal cut set consists of a combination of specific component failures. where T is the top event and M1 are the minimal cut sets. and C are component failures. The minimal cut sets are unique for a top event and are independent of the different equivalent forms the same fault tree may have. For an ncomponent minimal cut set. all n components in the cut set must fail in order for the top event to occur.
C'C =C. Therefore. Finally. The fault tree can thus be represented as shown in Figure VII·II which is equivalent to the original tree (both trees have the same minimal cut sets). substituting for E4 and applying the law of absorption twice T = C + (A· B)' A + (A' B)' B =C + A'B. Figure VIIII. Substituting for E 1 and E2 and expanding we have: T =(A+E 3 ) • (C+E4 ) =(A' C) + (E 3 'C) + (E4 ' A) + (E 3 • £4) Substituting for E 3 : T = A· C + (B+C) • C + E4 •A + (B+C) • E4 = A·C + B·C + C'C + E4 ·A + E4 ·B + E4 ·C.BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VIII 7 T =E 1 ·E 2 E 1 = A+E 3 E 3 = B+C E 2 = C+£4 E4 = A'B We will first perform the topdown substitution. so we have: But A'C + B'C + C + E4 • C = C by the law of absorption. T = C + E4 • A + E4 • B. We start with the top event equation and substitute and expand until the minimal cut set expression for the top event is obtained. one single component minimal cut set and one double component minimal cut set. By the idempotent law. The minimal cut sets of the top event are thus C and A· B. Fault Tree Equivalent of Figure VIII 0 .
B. except that now the operation begins at the bottom of the tree and proceeds upward. suppose we have the pumping system shown in Figure VIII 2.C + A' B + C + A.B. Finally. we substitute into E2 to obtain E 2 =C+AB.B. The bottomup approach can be more laborious and timeconsuming. Equations containing only basic failures are successively substituted for higher faults. we can model this system by fault tree of Figure VIIII where: T = "no flow of water to reactor" C = "valve V fails closed" A = "pump 1 fails to run" B ="pump 2 fails to run" .C =C + AB. Water Pumping System Assume that out undesired event is "no flow of water to reactor. As a very simple example. and C. expanding and applying the absorption law. T = E 1 E 2 E 1 = A+E 3 E 3 = B+C E 2 = C+E4 E4 = AB Because E4 has only basic failures. The minimal cut sets of the top event are thus again C and A. however. we have T=(A+B+C)· (C+AB) = A'C + AA'B + B·C + B'A'B + C'C + C'A'B = A. Substituting into £1' we obtain E 1 = A+B+C so E 1 has three minimal cut sets A. substituting the expressions for E 1 and E2 into the equation for T. PUMP 1 VA~EV INEXHAUSTABLE WATER SOURCE REACTOR PUMP 2 Figure VII12. E 3 is already in reduced form having minimal cut sets Band C.C + A· B + B." Ignoring the contribution of the pipes. the minimal cut sets are now obtained for every intermediate fault as well as the top event. Consider again our example tree (we repeat the equivalent Boolean equations for the reader's convenience). B.VIII 8 FAULT TREE HANDBOOK The bottomup method uses the same substitution and expansion techniques. The minimal cut sets of £2 are thus C and A.
may be obtained directly from the original tree by complementing all the events and substituting ORgates for ANDgates and vice versa. The minimal path set expression for the top event T can be written as T' = PI + P2 + . This event is of great interest from the point of view of system safety. we can take the complement of the minimal cu t set equation and obtain the minimal . The minimal cut sets of the dual tree are the socalled "minimal path sets" of the original tree. the cut sets do not really provide any insights that are not already quite obvious from the system diagram. Now we know that the top event of a fault tree may be represented by a Boolean equation. whether we complement the top event equation or the tree itself. In more complex systems. there is also a Boolean equation for the complement (i. The terms PI' P2' .. In either case. we are applying de Morgan's theorem given in our Table of Boolean Algebra Laws.. These are discussed in Chapter XII. using either the topdown or bottomup method. This complemented tree. nonoccurrence) of the top event. We can find the minimal path sets of a given tree by forming its dual and then using either the topdown or bottomup method to find its minimal cut sets. The combination is a smallest combination in that all the primary event nonoccurrences are needed for the top event to not occur. called the dual of the original fault tree. . These cut sets are the minimal path sets of the original tree which we desire. the determination of the minimal cut sets. where the system failure modes are not so obvious. For smaller fault trees. the minimal cut set computation provides the analyst with a thorough and systematic method for identifying the basic combinations of component failures which can cause an undesired event. Alternatively.e. + Pk where T' denotes the complement (nonoccurrence) of T. In this simple case.BOOLEAN ALGEBRA AND APPLICATION TO FAULT TREE ANALYSIS VII19 We have just shown that the minimal cut sets of this tree are C and A· B. Px are the minimal path sets of the fault tree. in tum. . if anyone of the events occurs then the top event can occur. and because this equation may be complemented. can be done by hand. if the minimal cut sets of the tree have already been determined. From the point of view of reliability we would be more concerned with the prevention of the top event. however. • X' m where the Xi are the basic events in the fault tree and the X'i are the complements. This complemented equation.. various computer algorithms and codes for fault tree evaluation are available. Minimal Path Sets and Dual Fault Trees The top event of a fault tree represents system failure. . For larger trees. where a minimal path set is a smallest combination (intersection) of primary events whose nonoccurrence assures the nonoccurrence of the top event. corresponds to a tree which is the complement of the original tree. This tells us that our undesired event "no flow of water to tank" will occur if either valve V fails closed or both pumps fail to run... Each path set can be written as Pi = X' 1 • X' 2 • .
e. this tells us that we can prevent the undesired event and assure system success if either (1) Valve V is open and pump 1 is running. . we obtained the following minimal cut expression in the previous section. Applying de Morgan's theorem to the term (AB)' T' ="C'(A'+B' ) and using the distributive law (i. In terms of our pumping system of Figure VII12. or (2) Valve V is open and pump 2 is running. Taking the complement T' =C'(AB)' = (C + AB)' using de Morgan's theorem. the minimal path sets of the tree are C' A' and C' B' . For our sample tree. T= C + AB.. expanding) T' ="C ' A' + c ' B' Therefore.VU·lO FAULT TREE HANDBOOK path sets directly.
Pressure Tank System The function of the control system is to regulate the operation of the pump. When the tank is empty. First we present details of system operation. The tank is fitted with an outlet valve that drains the entire tank in an essentially negligible time. the pressure switch contacts open. and the cycle is repeated. The pressure switch has contacts which are closed when the tank is empty. From the trees so constructed. i. Consider now Figure VIIII which shows a pressure tank . is not a pressure relief valve.THE PRESSURE TANK EXAMPLE 1. SWITCH Sl PRESSURE TANK FROM RESERVOIR Figure VIIII. relay Kl contacts open.. System Defmition and Fault Tree Construction In this and the next chapter. some obvious conclusions will be drawn. and relay K2 contacts open. The latter pumps fluid from an infmitely large reservoir into the tank. however. but detailed evaluation procedures will be postponed until Chapter XI. the pressure switch contacts close. Initially the system is considered to be in its dormant mode: switch 81 contacts open. The reader will then be shown how the corresponding fault trees are developed stepbystep with the help of the rules described in Chapter V. The operational modes are given in Figure VIII2.e. When the threshold pressure has been reached. the control system is VIlll . removing power from the pump.CHAPTER VIII . the outlet valve. causing the pump motor to cease operation. we are going to defme undesired events for two simple systems.pumpmotor device and its associated control system. deenergizing the coil of relay K2 so that relay K2 contacts open. We shall assume that it takes 60 seconds to pressurize the tank.
STARTS TIMING TIMER REl PRESSURE SW .ENERGIZED IC10SED) .RESETS TO ZERO TIME PRESSURE SW .CONTACTS OPEN .FAILED CLOSED PUMP STOPS EMERGENCY SHUTDOWN S1 RELAY "1 RELAY'2 TIMER REL PRESSURE SW CONTACTS CONTACTS CONTACTS CONTACTS CONTACTS OPEN OPEN OPEN CLOSED CLOSED Figure VIII2.STARTS TIMING PRESSURE SW .MONITORING PRESSURE PUMP STARTS S1 RELAY '1 RElAY .CONTACTS CLOSED & MONITORING TRANSITION TO READY .MOMENTARILY CLOSED RELAY '1 .CONTACTS OPEN TIMER REL PUMP STOPS RELAY '"2 READY MODE S1 RELAY "1 RELAY ''2 TIMER REL PRESSURE SW CONTACTS OPEN CONTACTS CLOSED CONTACTS OPEN CONTACTS CLOSED CONTACTS OPEN & MONITORING RELAY '1 RELAY ''2 TIMER REL EMERGENCY SHUTDOWN TRANSITION (ASSUME PRESSURE SWITCHINGUP) . Pressure Tank Example .~ TIMER REt CONTACTS OPEN CONTACTS CLOSED CONTACTS CLOSEO CONTACTS CLOSED & TIMING PRESSURE SW .CONTACTS CLOSE RELAY ''2 PUMP STARTS PUMPING MODE DORMANT MODE SN SI RELAY 1 RELAY .ENERGIZED & LATCHED RELAY ''2 .~ TIMER REL PRESSURE SW CONTACTS CONTACTS CONTACTS CONTACTS CONTACTS OPEN OPEN OPEN CLOSEO CLOSED START UP TRANSITION S1 .TIMES OUT & MOMENTARIL Y OPENS PRESSURE SW .ENERGIZED (CLOSED) TIMER REL .TRANSITION TO PUMPING .DEENERGIZED (OPEN) .CONTACTS OPEN .
secondary.g. the timer resets to 0 seconds. For our undesired event let us take: RUPTURE OF PRESSURE TANK AFTER THE START OF PUMPTNG It will simplify things considerably if we agree to neglect plumbing and wiring failures and also all secondary failures except. Whether or not we include the statement in the diamond is moot. In this problem we shall establish our limit of resolution at the "component failure level. the one of principal interest: "tank rupture after the start of pumping. First we check to make sure that our top event is written as a fault and that it specifies a "what" and a "when." By "component" we shall mean those items specifically named in Figure VIIII. and command modes. This applies power to the coil of relay Kl. .PRESSURE TANK EXAMPLE VIII3 deenergized. The timer relay has been provided to allow emergency shutdown in the event that the pressure switch fails closed. The closure of relay Kl contacts allows power to be applied to the coil of relay K2. At any rate. We could just assume at the beginning that the tank was an appropriate one for the operating pressures involved. when the pressure switch contacts open (and consequently relay K2 contacts open). of course. Relay Kl is now electrically selflatched. This starts a clock in the timer. *In this case. In this deenergized state the contacts of the timer relay are closed. If the clock registers 60 seconds of continuous power application to the timer relay coil. we choose not to trace this fault any further. In normal operation. the timer relay contacts open (and latch in that position). there is no command mode." The reader may object that a system that includes an infinitely large reservoir and an outlet valve that drains the tank in a negligible time is unrealisticand now we are suggesting the neglect of plumbing and wiring faults that might contribute to the occurrence of the top event. Power is applied to the timer coil as soon as relay Kl contacts are closed." we immediately add an ORgate beneath the top event and consider primary. The point is that with this simplified system we can illustrate most of the important steps in fault tree construction. Thus the primary failure of the tank (e." Next we apply our test question: "Can this fault consist of a component failure?" Because the answer is "Yes. a fatigue failure of the tank wall) is already at the limit of resolution and is shown in a circle.* Our tree has now developed to the point shown in Figure VIII3.. however. We will also assume that the tank is empty and the pressure switch contacts are therefore closed. Initially the timer relay contacts are closed and the timer relay coil is deenergized. breaking the circuit to the Kl relay coil (previously latched closed) and thus producing system shutdown. thus closing relay KI contacts. In a more complex system the reader might tend to lose sight of the overall system and become too involved in the details. whose contacts close to start up the pump motor. System operation is started by momentarily depressing switch S1.
Fault Tree Construction . Its immediate cause is "power applied to motor for t > 60 seconds. the pump is simply operating and pump operation for any length of time cannot consist of a component failure. we introduce another ORgate and our tree assumes the form shown in Figure VIII4. the opportunity for error is lessened thereby. a secondary failure is the failure of a component in an environment for which it is not qualified.for which it is qualified. however. nothing is lost by jumping from "pump operates continuously for t > 60 seconds" directly to "K. necessary and sufficient cause or causes." There is. an ANDgate. no harm done in detailing the intermediate causes and. which is the failure of a component in an environment . we look for the minimum. Therefore this fault event must be classified "stateofsystem." We have now added the string of events shown in Figw~ VIII6. Notice that the fault spelled out in the rectangle is a specific case of the top event with a more detailed description as to cause." We now recall the rules of Chapter V. Here again we indicate.2 relay contacts remain closed for t > 60 seconds." a stateofsystem fault. . The reader will remember from Chapter V." a stateofsystem fault.Step 1 Thus our attention is now directed to the secondary failure of the tank. Can the input event to the Inhibit gate consist of a component failure? No. The immediate cause of the latter event is "K2 relay contacts remain closed for t> 60 seconds. Now it might happen that our tank could miraculously withstand continuous pumping for t > 60 seconds but an application of our "No Miracles" rule constrains us to the statement that the tank will always rupture under these conditions. in a diamond. or no gate at all. a set of conditions whose causes we choose not to seek.VIII4 FAULT TREE HANDBOOK RUPTURE OF PRESSURE TANK AFTER THE START OF PUMPING TANK RUPTURE (SECONDARY FAILURE) Figure VIII3. In this case. Because the secondary failure of the tank can consist of a component failure. Below a stateofsystem fault we can have an ORgate. that in contrast to a primary failure. In this case the immediate cause is "motor runs for t > 60 seconds. We can indicate this on the fault tree by using an Inhibit gate whose input is "continuous pump operation for t > 60 seconds" (see Figure VIII5). immediate. Fwthermore. as a matter of fact.
Step 3 . Fault Tree Construction .PRESSURE TANK EXAMPLE VIIIS TANK RUPTURE (SECONDARY FAILURE) TANK RUPTURES DUE TO INTERNAL OVER·PRES· SURE CAUSED BY CON· TlNUOUS PUMP OPERA· TlON FOR 60 SEC t> Figure VIII4. Fault Tree Construction .Step 2 TANK RUPTURES DUE TO INTERNAL OVER· PRESSURE CAUSED BY CONTINUOUS PUMP OPERATION FOR t> 60 SEC PUMP OPERATES CONTINUOUSLY FORt>60SEC Figure VIII5.
In this case. weld. The event of interest here is the conunand mode event described in the rectan'gIe. secondary. We thus draw an ORgate and add primary.shown in Figure VIII8.Step 4 We have now to consider the fault event." Can this consist of a component failure? Yes. all events that are linked together on a fault tree should . Notice that both input events to the ANDgate in Figure VIII8 are written as faults. "K2 relay contacts closed for t > 60 seconds.VIII6 FAULT TREE HANDBOOK PUMP OPERATES CONTINUOUSL Y FOR t> 60 SEC MOTOR RUNS FOR t > 60 SEC POWER APPlIEO TO MOTOR FOR t> 60 SEC K2 RElAY CONTACTS REMAIN CLOSEO FORt>60SEC Figure VIII6. and command modes as shown in Figure VIII7. the erroneous signal is the application of EMF to the relay coil for more than 60 seconds. but in the wrong place or at the wrong time because of an erroneous command of signal from another component. This stateofsystem fault can be analyzed as. as we know. the contacts could jam. or corrode shut. In fact. Fault Tree Construction . Recall that a command fault involves the proper operation of a component.
PRESSURE TANK EXAMPLE VIII7 K2 RELAY CONTACTS REMAIN CLOSED FOR t> 60SEC EMF APPLIED TO K2 RELAY COIL FORt>60SEC Figure VIII7. . statements in ellipses). perhaps. but when they are closed for greater than 60 seconds.g. Notice that the condition that makes this event a fault is framed in terms of the other input event to the ANDgate. The fault event. so both input events in Figure VIII8 are followed by ORgates. "pressure switch contacts closed for t > 60 seconds.. Fault Tree Construction .Step 5 EMF APPLIED TO K2 RELAY COIL FORt>60SEC A • I PRESSURE SWITCH CONTACTS CLOSED FORt>60SEC I EMF REMAINS ON PRESSURE SWITCH CONTACTS WHEN PRESSURE SWITCH CLOSED FOR t> 60 SEC Figure VIII8. those statements that are added as simply remarks (e. starting with the lefthand event (see Figure VIII9. Fault Tree Construction ." can consist of a component failure. that is a fault. Likewise the fact that an EMF is applied to the pressure switch contacts is not itself a fault. The pressure switch contacts being closed is not a fault per se. We analyze them separately.Step 6 be written as faults except.
Fault Tree Construction .g.VIII8 FAULT TREE HANDBOOK PRESSURE SWITCH CONTACTS CLOSEO FOR 60SEC t> Figure VIII9. we wish to pursue the event in the lefthand diamond somewhat further (e.Step 7 We see that this. EMF REMAINS ON PRES· SURE SWITCH CONTACTS WHEN PRESSURE SWITCH CONTACTS CLOSED FOR 60SEC t> EMF THRU K1 RelAY CONTACTS WHEN PRES· SURE SWITCH CONTACTS CLOSED FOR 60 SEC t> EMF THRU S1 SWITCH CONTACTS WHEN PRES· SURE SWITCH CONTACTS CLOSED FOR 60 SEC t> Figure VIIIIO. etc. The lefthand one is the more easily analyzed as shown in Figure VIIIII. Both the input events in Figure VIIIlO are stateofcomponent faults.. Fault Tree Construction . leg of the tree has reached its terminus (all input events are either circles or diamonds) unless. ruptured diaphragm.). We now analyze the righthand event in Figure VIII8 as shown in Figure VIIIlO.Step 8 . for some reason.
El E2 E3 E4 E5 Pressure tank rupture (top event). Finally. EMF on K2 relay coil for t > 60 seconds. Pressure tank rupture due to internal overpressure from pump operation for t > 60 seconds which is equivalent to K2 relay contacts closed for t > 60 seconds. we show the complete fault tree for the pressure tank example in Figure VIIII 3.Step 9 Here we have reached another tree terminus. Actually. the fault tree of Figure VIII13 (fould be considered too complete. are defined as follows: The E's are fault events. . where the circles represent primary failures as shown in the legend and the fault events EI.PRESSURE TANK EXAMPLE VIII9 EMF THRU S1 SWITCH CONTACTS WHEN PRESSURE SWITCH CONTACTS CLOSED FOR 60 SEC t> Figure VIIIII. other secondary failures (the dotted diamonds) could simply be omitted from the diagram. Further simplifications can also be made. E2 etc. EMF through KI relay contacts when pressure switch contacts have been closed for t > 60 seconds which is equivalent to timer relay contacts failing to open when pressure switch contacts have been closed for t > 60 seconds. Because the only secondary failure that we developed was the rupture of the pressure tank due to overpumping. leading to the basic fault tree of Figure VIII14. Fault Tree Construction . The analysis of the remaining input event in Figure VIIIIO is shown in Figure VIII12. EMF remains on pressure switch contacts when pressure switch contacts have been closed for t > 60 seconds. The reader will note from this latter figure that we have fma1ly worked our wayin this stepbystep fashiondown to timer relay faults.
VIIItO FAULT TREE HANDBOOK EMF THRU Kl RELAY CONTACTS WHEN PRESSURE SWITCH CONTACTS CLOSED FOR t> 60 SEC EMF NOT REMOVED FROM Kl RELAY COIL WHEN PRESSURE SWITCH CONTACTS CLOSED FOR t> 60SEC TIMER RELAY CONTACTS FAIL TO OPEN WHEN PRESSURE SWITCH CONTACTS CLOSED FORt>60SEC Figure VIII12. Fault Tree Construction .Final Step .
. (SECONDARY FAILURE) / " '> . ......... .. . ./ ""/" <...... .... .............. ....TANK RUPTURES ~UE TO IMPROPER SELECTION OF INSTALLATION (WRONG TANK) /" .." /" EXCESS PRESSURE NOT SENSED BY PRESSURE ACTUATED SWITCH ~ ...... /" PRESSURE SWITCH .
the safety of our system would be considerably enhanced by substituting a pair of relays in parallel for the single relay K2. The calculations involved in getting these numbers will be discussed later. Qualitatively. although we do note that the failure of the pressure switch contributes to each of them. EI appears as the union of various combinations (intersections) of basic events and is the minimal cut expression for the top event. It should be just the other way around! Thus. first. at this time we simply assume the values as being "data.VIII12 FAULT TREE HANDBOOK 2. to evaluate the fault tree so that we can assess. our system contains a much more serious design error: We are monitoring the controls instead of the parameter of interest (pressure in this case). Fault Tree Evaluation (Minimal Cut Sets) Fault tree evaluation in general will be taken up in Chapter XI. Actually. however. recall from Chapter V that the probability of the event T should be less (by an order of magnitude or so) than the probability of event K2. It is useful at this point. The next basic event. the primary failure of the pressure tank itself. From the basic fault tree of Figure VIII14 we can express the top event as a Boolean function of the primary input events using the method explained in Chapter VII. Because the tank is a passive component. the most obvious way to improve the system would be to install a pressure relief valve on the tank and remove the timer. Table VIIII gives values for the failure probabilities for the components in the system. is T. To make a quantitative assessment of our results we need estimates of failure probabilities for our components. We are now in a position to make. KI) + (S • R) This expression of the top event in terms of the basic inputs to the tree is the Boolean algebraic equivalent of the tree itself. armed with some data. and SoK2. the leading contributor to the top event is the single relay K2 because it represents a primary failure of an active component." . we have found five minimal cut setstwo singles and three doubles: K2 T S Sl So KI So R 0 Each of these defines an event or series of events whose existence or joint existence will initiate the top event of the tree. Of less importance are the three double component cut sets SoSI. In our example. This is accomplis)led by starting at the top of the tree and working down: EI = T+E2 = T + (K2 + E3) = T + K2 + (S • E4) = T + K2 + S • (Sl + E5) = T + K2 + (S • Sl) + (S • E5) = T + K2 + (S. a qualitative assessment of our results and then. however. a gross quantitative assessment. in a gross way.' Sl) + S • (KI + R) = T + K2 + (S • Sl) + (S . Therefore. SoKI. the strengths and particularly the weaknesses of the pressure tank control circuit. in order of importance.
E4.PRESSURE TANK EXAMPLE VBI13 E1 E2 E3 E4 E5 LEGEND: FAULTS E1 E2. Basic (Reduced) Fault Tree for Pressure Tank Example .E5 TOP EVENT INTERMEDIATE FAULT EVENTS PRIMARY FAILURE OF TIMER RELAY PRIMARY FAILURE OF PRESSURE SWITCH PRIMARY FAILURE OF SWITCH S1 PRIMARY FAILURE OF RELAY K1 PRIMARY FAILURE OF RELAY K2 PRIMARY FAILURE OF PRESSURE TANK R S S1 K1 K2 T Figure VIII14.E3.
This illustrates the importance of applying the rules described in Chapter V. .. One designrelated principle that arises out of this example is that a system should be designed so that ANDgates appear as close to the treetop as possible in an effort to eliminate single event cut sets. KI] P[S • R] P[S • Sl] = = 5 x 106 = = = 3 x 1O~5 (1 x 104 ) (3 x 105 ) =3 x 109 (1 x 104 ) (1 x 104 ) = I x 108 (1 x 104 ) (3 x 105 ) = 3 x 109 We now wisht<> estimate the probability of the top event..e.4 x 105 The relative quantitative importance of the various cut sets can be obtained by taking the ratio of the minimal cut set probability to the total system probability: Cut Set T Importance K2 S • KI S·R S • Sl I 14% 86% Less than 0. The most frequent analytical error made by these students is the tendency to leap directly from the event of rupture to the pressure switch. we conclude that the rare event approximation of equation (VI7) is applicable. The family automobile is a good example of a system which is not of this type.1 % The pressure tank example has been provided on numerous occasions as a workshop exercise for students learning the basics of fault tree construction. the primary failure pf K2. We therefore simply sum the minimal cut set probabilities and obtain: P(E 1) ~ 3. Observing that the top event probability is given by the probability of the union of the minimal cut sets. i. is missed completely. and the reader can amuse himself by making a lengthy list of single events that will immobilize his car. P[T] P[K2] P[S . Failure Probabilities for Pressure Tank Example COMPONENT Pressure Tank Relay K2 Pressure Switch Relay KI Timer Relay Switch Sl SYMBOL T F AlLURE PROBABILITY K2 S KI R Sl 5 3 I 3 I 3 x 106 x 105 X 104 x 105 x 104 x 105 Because a minimal cut is an intersection of events. E1 . and that the probability of each individual minimal cut set is low.vm14 FAULT TREE HANDBOOK Table VIIII. the probabilities associated with our five minimal cut sets are obtained by multiplying the appropriate component failure probabilities (assuming independence of failures). When this is done the single failure minimal cut set K2.
 . and. Finally. Once K3 has closed." WASH1400 (NUREG75/014). I O". the closure of K6 applies power from Battery I to the coil of K7. After an interval of 60 seconds.~ KTl ~ . a momentary depression of pushbutton SI applies power from Bi. System Definition and Fault Tree Construction In this chapter we discuss a somewhat more complicated example of fault tree construction and evaluation..CHAPTER IX ..S. TEST SIGNAL L. the purpose being to check the proper operation of Motors 1. .~ .THE THREE MOTOR EXAMPLE 1. KT2. In that case KTI opens to deenergize KI and Motor I stops. KT2. * Figure IXI displays a power distribution box and Figure IX2 gives the system modus operandi.." IXl .. The closure of KS applies power from Battery 2 to the coil of K6 and also starts Motor 2. K3 is supposed to open...0'1.2. a 60second test signal is impressed through K3. shutting down the operation of all three motors." Section 2.. and 3.&. Should K3 fail closed after the expiration of 60 seconds... but K4 fails closed. I I I I I ~II BATTERY 1 ~~ . KT3) open.. KT2 KT3 I I I .. deenergizing the coil of KI.lttery I to the coils of cutthroat relays KI and K2. I • I I ' " rll~r4OL: OiOR ~ _ BATTERY 2 KT 3 Figure IXI. Appendix II. I I A. Thereupon KI and K2 close and remain electrically latched. KT3 normally closed. With contacts KTI..@J. thus shutting down system operation.. all three timers (KTI. The closure of K4 starts Motor I. Power Distribution Box Fault Tree Example Next. SlL.S. power from Battery I is applied to the coils of relays K4 and KS.. *For an actual example of a fault tree analysis applied to a major nuclear safety system see U. Commercial Nuclear Power Plants. "Fault Tree Analysis. • I I~ ..~ • I ."". "Fault Trees.. . Nuclear Regulatory Commission.. "Reactor Safety StudyAn Assessment of Accident Risks in U. Suppose K3 opens properly at the end of 60 seconds. KT2 and KT3 act Similarly to stop Motor 2 or Motor 3 should either KS or K7 fail closed. Closure of K7 starts Motor 3. .
TRANSITION TO STANDBY MODE K3 K4 K5 KG K7 I CONTACT TRANSFER OPEN KT1 KT2 KT3 } TIMER RESET ~ >. OPERATE MODE K3 K4 K5 KG K7 KT1 KT2 KT3 ) t J I } CONTACTS CLOSED TRANSFER CLOSED START TIMING CONTACTS K1 K2 K3 K4 K5 KG K7 KT1 KT2 KT3 I } CONTACTS CLOSED CONTACTS CLOSED & TIMING 1. GIVEN A PRIMARY FAILURE OF K4. STANDBY MOOt"· Sl .MOMENTARIL Y CLOSED j CONTACTS CLOSED K1 } CLOSED AND K2 ELECTRICAllY LATCHED K1 K2 K3 K4 K5 KG K7 KT1 KT2 KT3 } CONTACTS CLOSED CONTACTS OPEN TRANSITION TO. GIVEN A PRIMARY FAILURE OF K3. ALL TIMERS TRANSFER OPEN TO DEENERGIZE K1 COIL. SHUTDOWN MODE S1 K1 K2 K3 K4 K5 K6 K7 KT1 KT2 KT3 CONTACTS OPEN I } TRANSITION TO EMERGENCY SHUTDOWN CONTACTS OPEN CONTACTS OF ONE OR MORE OPEN Figure IX2. Power Distribution Box Component Status Diagram . K5. OR K7. THE SINGLE TIMER TRANSFERS OPEN AND DEENERGIZES Kl COIL. 2.< DORMANT MODE Sl K1 K2 K'3 K4 K5 KG K7 KT1 KT2 KT3 Sl STANDBY MODE CONTACTS OPEN Sl OPERATE MODE CONTACTS OPEN CONTACTS OPEN TRANSITION TO· .
Each of the three inputs to the ORgate must be analyzed separately. Thus our "tree top" appears as in Figure IX3. We see from Figure IXI that two fault events have to exist to cause the occurrence of the event of interest: (I) K5 relay contacts remain closed for t > 60 seconds.THREE MOTOR EXAMPLE IX3 OVERRUN OF ANY MOTOR AFTER TEST IS INITIATED EMF APPLIED TO MOTOR 1 FOR t> 60 SEC EMF APPLIED TO MOTOR 2 FORt>60SEC EMF APPLIED TO MOTOR 3 FOR t> 60 SEC Figure IX3. Limiting the analysis to the former problem for the moment. we are faced with the top event: EMF APPLIED TO MOTOR 2 FORt>60SEC Because this is a stateofsystem fault. "transfer out" symbols. as shown. are applied to the fault events relevant to Motor I and Motor 3. we have to decide how exhaustive our analysis is going to be. A fuller analysis (which we shall hold in abeyance for awhile) would also include possible wiring defects. we look for immediate necessary and sufficient causes. . (2) K2 relay contacts fail to open when K5 relay contacts have been closed for t > 60 seconds. The simplest approach would be to restrict ourselves to the failures of relays and switches. We propose to treat the overrun of Motor 2 first. and we shall do this first. "Treetop" for Motor Overrun Problem Our main concern in this problem is the "overrun" (t > 60 seconds) of anyone of the motors after test initiation. At this point. To avoid a clutter of trees on a single sheet.
however. To Simplify the problem. Notice that the failures of KS or K2 contacts to open are not. Fault Tree for Overrun of Motor 2 (Relay Logic Only) Because this is a stateofcomponent fault. when the time restriction is specified. and command inputs. to the command input which is . it demands an ORgate with primary. Figure IX4. faults. let us not pursue secondary failures.IX4 FAULT TREE HANDBOOK These two fault events are shown as inputs to the 2nd level ANDgate in Figure IX4 which displays the completed tree for overrun of Motor 2. secondary. We now direct our attention to the fault event: K5 RELAY CONTACTS REMAIN CLOSED FOR t>60 SECS. Figure IX4 shows only primary and command inputs. we tum our attention therefore. they become faults. The primary failure of KS relay is already a basic tree input (a limit of resolution). of themselves. Thus.
this strategy proves to be particularly useful inasmuch as this same fault event recurs in the analysis of the overruns of Motor I and Motor 3. we now have to consider the righthand input to the 2nd level ANDgate. The details of the rest of the right leg of the tree are left to the reader. we realize that it will be the "top event" of a fairly substantial subsidiary tree of its own. This latter event appears in a diamond in Figure IX4. . The fault event relevant to relay K3 is a stateofcomponent fault which demands an ORgate followed by primary and command inputs. This fault event is K2 RELAY CONTACTS FAil TO OPEN WHEN K5 RELAY COIHACTS CLOSED FOR t> 60 SEC This is a stateofcomponent fault whose command input is EMF NOT REMOVED FROM K2 RELAY COIL WHEN K5 RELAY CONTACTS CLOSED FORt>60SEC a state~fsystem fault. In this case. Returning now to the fault event relevant to Kl relay. This is really a single input event except that we have also chosen to consider the possibility (highly improbable) of an EMF on the K2 coil through the S1. The basic inputs are primary failures of Kl relay. We therefore "transfer out" to another sheet of paper (see Figure IX5). KT 1. This leg of the tree is now complete. The task of studying the details of the subsidiary fault tree of Figure IX5 is left to the reader. KT2 and KT3 contacts. Referring once more to Figure IX4.THREE MOTOR EXAMPLE IXS EMF REMAINS ON K5 COIL FOR t> 60 SEC We note that this event is a stateofsystem fault. the command input is shown in a diamond because its cause (or causes) lie "outside" of our defined system. These two events are shown in Figure IX4 as inputs to the 4th level ANDgate on the left. From Figure IXl we fmd that this fault event will occur when both K3 contacts and K 1 contacts fail to open with the time restriction t > 60 seconds. S1 switch and KT2 timer contacts. As a matter of fact.
. KT3 TIME R CONTACTS FAIL TO OPEN WtiEI\I K3 CONTACTS CLOSED FOR t '> 60 SEC Figure IX5. Analysis of Fault Event Relevant to Kl Relay .IX6 FAULT TREE HANDBOOK Kl RELAY CONTACTS FAIL TO OPEN WHEN KJ CONTACTS CLOSED FORt>60SEC EMF NOT REMOVED FROM Kl RELAY COIL WHEN K3 CONTACTS CLOSED FOR t / 60 SEC EMF TO Kl COIL THRU TIMER CIRCUIT WHEN K3 CONTACTS CLOSED FOR 60 SEC t> EMF TO K1 COIL THRU S1 CONTACTS WHEN K3 CONTACTS CLOSED FOR 60 SEC t> KT1 TIMER CONTACTS FAIL TO OPEN WHEN K3 CONTACTS CLOSED FOR 60 SEC t> KT2 TIMER CONTACTS FAIL TO OPEN WHEN K3 CONTACTS CLOSED FOR 60 SEC t> .
although similar. E 1 = E2 • E 3 E2 = K5 + E4 E3 = K2 + Es E4 = K3 • E6 Es = KI + KT2 + S I E6 = KI + E7 + Sl E 7 = KTI • KT2 • KT3 Next. starting from the bottom of the tree. each fault event is expressed in terms of the basic tree inputs. EMF not removed from KI relay coil when K3 contacts have been closed for t > 60 seconds. The reduced tree is shown in Figure IXII. and the fault events E 1 . KI relay contacts fail to open when KS contacts have been closed for t > 60 seconds.THREE MOTOR EXAMPLE IX' The fault trees relevant to overruns of Motor I and Motor 3 are shown in Figures IX6 and IX7. the tendency to transfer the fault event Es to the subsidiary tree of Figure IXS must be strongly resisted. KS relay contacts remain closed for t > 60 seconds. In the equations below each fault event is expressed in terms of its equivalent Boolean equation. The reader should note that events Es and E 6 . EMF remains on KS coil for t > 60 seconds. are not the same because the "when" is different. For ease of presentation we will deal with a reduced version of the fault tree of Figure IX4 where the diamonds and houses are removed. The reader may now care to try his hand at the analysis of a different top event: Motor I fails to start in presence of test signal. KI relay contacts fail to open when K3 contacts have been closed for t > 60 seconds. etc. are defined as follows: E1 E2 E3 E4 Es E6 E7 EMF applied to Motor 2 for t > 60 seconds. K2 relay contacts fail to open when KS contacts have been closed to t > 60 seconds. 2. In Figure IXII the circles represent primary failures of the components denoted. Fault Tree Evaluation (Minimal Cut Sets) We now turn to the qualitative evaluation of the fault tree of Figure IX4 and determine the minimal cut sets for the overrun of Motor 2. Thus. and IX9. A solution for Motor I is displayed in Figure IXlO. E 7 = KT 1 • KT2' KT3 E6 = Kl + (KTI • KT2 • KT3) + SI Es = Kl + KT2 + Sl . E 2 . A more exhaustive study of the overrun problem for Motor 2 (including wiring defects) is shown in Figures IX8. Engineering notation is used.
" If all three timers. Following the procedure detailed for Motor 2 he should be able to check the following solutions: MINIMAL CUT SETS (Overrun of Motor 1) KI • K4 K4·51 K4' KTI K1 • K3 K3 • 51 K3 . that because (K3' K1) is minimal from the relationship (K3 • K 1 • K 1) = (K3 • K I). they might be likely candidates for a socalled "common case" failure. then (K3 • K I • K2). The reader should now try his hand at identifying the minimal cut sets in the other two problems: overrun of Motor 1 and overrun of Motor 3. our ostensible quadruple minimal cut set becomes essentially a "double. is not minimal. In this case. we will further discuss common cause failures.IX8 FAULT TREE HANDBOOK E4 = K3 • (Kl + KTI • KT2 • KT3 + SI) =K3' Kl +K3· (KTI' KT2' KT3)+K3' SI E 3 = K2 + Kl + KT2 + SI E 2 = K5 + K3 • Kl + K5 + K3 • (KTI • KT2 • KT3) + K5 + K3 • SI E 1 = [K5 + K3 • Kl + K5 + K3 • (KTI • KT2 • KT3) + K5 + K3 • S1] • [K2+Kl+KT2+S1] The Boolean expression for E 1 simplifies to: E 1 = {K5 + (K3 • Kl) + [K3 • (KTI • KT2 • KT3)] + (K3 'SI)} • {K2 + K1 + KT2 + S l} Expanding this expression using the distributive law." In later sections. are identical. for instance. for example. All the minimal cut sets above are "doubles" except the last one which appears to be a "quadruple. however. the minimal cut sets (all redundancies eliminated) are: K5' K2 K5 • Kl K5 • KT2 K5 • SI K3' Kl K3 • SI K3 • KTI • KT2 • KT3 Note. KT 1 • KT2 • KT3 MINIMAL CUT SETS (Overrun of Motor 3) Kl • K7 K2 • K7 K7' KT3 K7' Sl Kl • K3 K3 • 51 K3 • KT1 • KT2 • KT3 .
THREE MOTOR EXAMPLE EMF APPLIED TO MOTOR 1 FOR t> 60 SEC IX·9 K4 RELAY CONTACTS REMAIN CLOSED FOR t > 60SEC Kl RELAY CONTACTS FAil TO OPEN WHEN K4 CONTACTS CLOSED FOR t> 60 SEC EMF REMAINS ON K4 COIL FOR t > 60 SEC EMF NOT REMOVED FROM Kl RELAY COIL WHEN K4 CONTACTS CLOSED FORt>60SEC K3 RELAY CONTACTS REMAIN CLOSED FOR t > 60SfC Kl RELAY CONTACTS FAIL TO OPEN WHEN K3 CONTACTS CLOSED FOR t> 60 SEC EMF TO K1 COIL THRU Sl CONTACTS WHEN Kl CONTACTS CLOSED FOR t> 60 SEC EMF TO K1 COIL THRU TIMER CIRCUIT WHEN K4 CONTACTS CLOSED FORt>60SEC KT2 TIMER RESET KTl TIMER CONTACTS fAIL TO OPEN WHEN K4 CONTACTS CLOSED FOR t > 60 SEC K3 TIMER RESET KTl TIMER DOES NOT "TIME OUT" DUE TO IMPROPER INSTALLATION OR SETTING Figure IX6. Fault Tree for Overrun of Motor 1 (Relay Logic Only) .
Fault Tree for Overrun of Motor 3 (Relay Logic Only) .IXtO FAULT TREE HANDBOOK k3 RELAV CONTACTS FAIL TO OPEN TEST SIGNAL REMAINS ON K3 COIL FOR t 60SEC KT1 TIMER RESET Figure IX7.
KT2 & KTJ CIRCUITS + EMF FROM BATTERY 1 APPLIED TO K1 CONTACTS _EMF FROM BATTERY 1 APPLIED TO S1 CONTACTS KTl AND K13 TIMER RESET RESET SIGNAL INADVERTENTl Y APPLIED OR NOT REMOVED FROM SWITCHSI SISWITCH INAOVERTENTl Y CLOSES OR FAILS TO OPEN KT2TIMER DOES NOT "TIME OUT" DUE TO IMPROPER INSTALLATION OR SETTING KT2 TIMER CONTACTS FAIL TO OPEN . K5& K7 WIRE CIRCUITS K3 RELAY CONTACTS FAil TO OPEN TEST SIGNAL REMAINS ON K3COIl FORI>6DSEC +EMF TO K2COIL THRU 51.GROUID AVAILAILE TOIIOTOR 1 TERilIIAL FAULTS TO+ EMF IN K2. KT1.
. EMF FROM BATTERY 1 AVAILABLE TOS1 CONTACTS EMF TO K1 COIL THRU TIMER CIRCUIT WHEN K3 CONTACTS CLOSED FOR t > &OSEC FAULTS TO .JX12 FAULT TREE HANDBOOK .. EMF FROM BATTERY 1 AVAILABLE TO K1 CONTACTS KT1 TIMER DOES NOT TIME OUT DUE TO IMPROPER INSTALLATION DR SETTING KT2T1MER DOES NOT TIME OUT DUE TO IMPROPER INSTALLATION OR SETTING KT3TIMER DOES NOT TIME OUT DUE TO IMPROPER INSTAllATION OR SETTING RESET SIGNAL INADVERTENTLY APPLIED OR NOT REMOVED FROM SWITCH S1 Figure IX9.. SI TIMER CIRCUITRY EMF TO KI COIL THRU SI CONTACTS WHEN K3 CONTACTS CLOSED FOR &0 SEC t> . EMF IN K1 8. Subtree for Transfer "T" in Motor Overrun Problem .
WIRE FAULTS IN K1 RELAY LATCHING CIRCUIT WIRE FAULTS IN K3 RELAY & COMP CIRCUITRY .
IX14 FAULT TREE HANDBOOK E1 E2 E3 E7 Figure IXII. Basic Fault Tree for Overrun of Motor 2 .
what is the probability of x =0. Note that in 4 trials." The probability of this particular outcome is . At the end of each test let us further suppose that the test results allow us to categorize each run unambiguously as "success" or "failure. The Binomial Distribution Suppose that we have 4 similar systems that are all tested for a specific operating time. fourth run a failure. "first run successful." If the probability of success on each run is designated by p (and thus the probability of failure is (1p)). third run successful. 1. with emphasis on some specific distributions that are of special interest to the systems analyst. 3. To introduce this discussion we shall first look at the binomial distribution and shall then proceed to distribution theory in general.4 successes out of the four runs or trials? The outcome collection for this experiment can be easily written down (subscripts 'denote run number. XI. has been developed by one of the authors in the course of teaching statistics to engineering students and engineers in the field. He may fmd it convenient at a later time to review this chapter as the need arises. We shall start with a discussion of probability distribution theory.PROBABILISTIC AND STATISTICAl ANALYSES 1.CHAPTER X . second run a failure. p • (lp) • p • (lp) = p2 • (l_p)2. We shall then attack the basics of statistical estimation. it is possible to have 4 XI . This material will also serve as the basis for future chapters on the quantification of fault trees. although perhaps not in the best tradition of mathematical statistics. The method of presentation. 2. The reader who feels wellversed in statistics and probability may prefer to skip this chapter and proceed directly to Chapter. 2. In this process mathematical rigor has sometimes been sacrificed to expedite the explication of basic ideas. Introduction In this chapter we shall attempt to present those basic elements that are necessary for understanding the probabilistic and statistical concepts associated with the fault tree. and "S'~ and "F" stand for success and failure): SlS2 S 3 F 4 SlS2 F 3S4 SlF 2S 3 S4 SlS2 F 3 F 4 SlF 2S 3 F 4 F 1 S2 S 3 F 4 F1S2 F3 S4 F 1 F 2 F 3 S4 F 1 F 2 S 3 F4 F 1 S2 F 3 F 4 SlF 2F 3 S4 F 1 F 2 S3 S4 A symbol such as Sl F 2S3 F 4 means.
and no successes (x = 0) in only 1 way. If there were n trials and each trial could be classified as "success" or "failure. of Successes x=O x=I x=2 x=3 x=4 No. In general for n trials the number of ways of achieving x successes is n) (x x! (nx)! n' which is the number of combinations of n items taken x at a time. then P[exactly x successes in n trials] = (~) pX (l_p)nx (XI) =b(x. (The probability density form is discussed further in succeeding sections. Three successes in 4 trials can occur in 4 ways (thus the coefficient 4) and if we have had 3 successes. Note also that there are a total of 16 = 24 outcomes.three successes are obtained the total probability is 4p3(l_p). he will see that the individual terms in the probability column of our previous example are generated by equation (XI). we must have had also 1 failure.X2 FAULT TREE HANDBOOK successes (x =4) in only 1 way. n. These expressions in the last column represent individual terms in the binomial distribution. The probability of a particular outcome of three successes and one failure is p3(lp) and because there are 4 possible outcomes in which . of Ways Probability 1 4 6 4 1 1pO (lp)4 4 p l(lp)3 6p 2(1_p)2 4 p 3(l_p)1 I p4(l_p)O Consider the terms in the last column. the probability density is also sometimes called the probability mass function." there would be 2n outcomes. 2 success (x = 2) in 6 ways. Now let us classify these results in a somewhat different way as follows: No.p) in which b (x. 1 success (x = 1) in 4 ways. the general form of which is written as follows: If the probability of success on any trial is p. For example. *For a discrete variable such as x.n. . p) represents the binomial distribution in what is termed probability density form. 4p3(lp) represents the probability of having exactly 3 successes in 4 trials.)* If the reader will set n = 4 and let x range through the values of 0 to 4. 3 successes (x =3) in 4 ways.
The statistical average of the binomial distribution is np and its variance is np(1p).g. It is important for the reader to realize that if a problem comes up and one or more of these assumptions is violated. the probability of having at least 2 successes in 4 trials is. Thus. as a matter of fact. . (3) All n trials are mutually independent. or spread. (2) There are exactly n random trials and n is numerically specified. Thus: Plat most x successes in n trials] = L (~) pS(1_p)ns == B(x. 'defective/nondefective). All distributions and. all mathematical formulae are characterized by underlying assumptions and restrictions. These terms are discussed further in subsequent sections. It is of the utmost importance to list these assumptions explicitly: (1) Each trial has only one of two possible outcomes. n. We have called these "Success"/"Failure" but they could just as well be called anything else (e.. and [41]. often in the form of equation (X2) but also occasionally in the form of equation (X3) and sometimes in the form of equation (XI). The average is a measure of the location of the distribution and the variance is a measure of its dispersion. We have made a number of assumptions (tacitly so far) in our own use of the binomial.p) is the binomial distribution in cumulative distribution form (the cumulative distribution form is discussed further in upcoming sections). The binomial distribution is extensively tabulated. At this stage we may simply interpret the cumulative binomial as giving the probability that the number of successes is less than or equal to some value. and p remains constant throughout the sequence of trials.PROBABILISTIC AND STATISTICAL ANALYSES X3 Questions regarding the probabilities of having at most x successes in n trials and at least x successes in n trials can be answered by summing the appropriate individual terms. [33]. n.p) s=O x (X2) Plat least x successes in n trials] = L (~) pS(1p)ns s=x (X3) n where B (x. then use of the binomial distribution will be questionable unless the effect of the violation is investigated. returning to our example. and a valid use of these distributions or formulae involves a knowledge of what the associated assumptions and restrictions are. See for example References [1]. (4) The probability of "success" (or whatever you want to call it) on any trial is designated by some letter such as p.
X4
FAULT TREE HANDBOOK
Let us now return to our list of assumptions and consider them in more detail. What can be done in case one or more of these restrictions happens not to be true? We will look at only a few cases to indicate the possible violations to the assumptions. (1) What if each trial has more than 2 possible outcomes? For instance, in certain testing methods, 3 decisions are possible after each trial: accept the lot, reject the lot, continue testing. If we are drawing single chips from a container holding a mixture of white, green, red, blue, and yellow chips, then each trial has 5 possible outcomes. This case does not pose a serious problem, for, instead of using the binomial distribution, we simply use its generalization, which is known as the multinomial distribution and which is adequately discussed in many statistics texts (see, e.g., reference [52]). When applicable, we might also lump the outcomes into "success" and "failure" and use the binomial on this more gross categorization. (The probability of "success" would be the sum of the probabilities of the outcomes categorized as "success.") (2) Now suppose that the number of trials, n, is not known, but only the number of successes. For example, we may agree to throw a single die until the number "5" comes up. It is not known, a priori, how many tosses will be necessary. Or we may agree to test similar relays until one fails. Again the number of relays tested is not known. Under such conditions (i.e., whenever a number of trials are performed until a preassigned number of successes are obtained) we cannot use the binomial distribution, but another distribution closely related to the binomial and called the negative binomial distribution is available (see reference [13]). The negative binomial 1)(x; k,p) gives the probability that the k th success occurs on the x th trial and is written,
~(x; b,p)= (~=D pk (l_p)xk
(X4)
(3) If the outcomes of the n trials are mutually interdependent (i.e., the (x+l)th outcome depends in some way on the xth outcome and possibly on preceding outcomes as well), a number of difficulties arise. Some sort of conditional probability representation is required. The probability of a specific sequence of outcomes would depend on the order of occurrence ("serial outcome space") and each different sequence would have a different probability. For example, it would generally be invalid to use the binomial distribution to estimate the probability that the daily weather be rainy or not rainy because weather patterns often persist for days or even weeks at a time and what happens on Wednesday is dependent on what has happened on the preceding Tuesday. If independence is in doubt, there are statistical tests to check independence before the binomial is used (see reference [11]). (4) If the probability of "success" (or whatever) changes throughout the series of trials, where samples are chosen from a fIXed population without replacement, the socalled hypergeometric distribution can be employed (see reference [52]). This distribution is:
(X5)
PROBABILISTIC AND STATISTICAL ANALYSES
xs
where
a = number of items exhibiting characteristic A in the population b = number of items exhibiting characteristic B in the population a+b = N = population size or lot size n = size of sample drawn from the population x = number of items exhibiting characteristic A in the sample.
For example, characteristic A may be "defective" and characteristic B, "nondefective." Here h (x; n, a, b) gives the probability that exactly x of the n sample items exhibit characteristic A. Use of the hypergeometric distribution is necessary when sampling of a small population is carried out without replacement. ("Small" means that N and n in equation (X5) are comparable in size.) For instance, if we receive a shipment of 50 transistors, 10 of which are defective, the initial proportion of inoperative ones is 1/5, but this proportion will change as we withdraw a sample of, say, 20 without replacement. As the reader will observe from equation (X5), use of the hypergeometric frequently entails complicated arithmetical calculations involving factorial numbers, and for this reason, the binomial is often used in these cases to give approximate results. The binomial distribution provides a good approximation whenever N ;;;. IOn where N is the population (or lot or batch) size and n is the sample size.* In this case a/n is taken as the approximate value of p. As a specific example of the use of the binomial distribution, consider the following problem: ABC Corporation mass produces resistors of a certain type. Past experience indicates that one out of a hundred of these resistors is defective. Therefore, the probability of obtaining a defective resistor in a sample is p = 0.01. If a sample of 10 resistors is selected randomly from the production line, what is the probability of its containing exactly one defective resistor? We have: x= l,n= 1O,p=0.01 and b(x = 1; n = 10, p = 0.01) = (\0) (0.01) (0.99)9 If tables of the cumulative binomial distribution are available, this can be most readily evaluated by fmding B(I)  B(O) because B(1) = prO or 1 defective resistors], and B(O) = P [exactly 0 defective resistors] . B(I)  B(O) = 0.9957  0.9044 = 0.0913 By making calculations similar to the foregoing we can now draw a distribution function giving the probability of finding x = 0, 1,2,3, ... , 10 defective resistors in the sample of 10. This distribution is shown in Figure XI. The continuous curve in Figure XI is not really legitimate because the binomial distribution is inherently discrete, but this kind of continuous interpolation serves to make the general shape of the distribution visible.
*Some authors recommend a somewhat less conservative rule of thumb of N ;;;. 8n.
X6
FAULT TREE HANDBOOK
1.0 0.9
U)
:i:_
WCl
1
u.. wo :::w 1...1 uCl. w:lE u..e:t wu) °2 x~
0.5
P(O) = 0.9044 P(1) = 0.0913 P(2) = 0.0042 P(3 OR MORE) ;; 0
0.1 0 1
2
3
4
5
NUMBER OF DEFECTIVE ITEMS
Figure XI. Distribution of x Defective Items in a Sample of 10 when p = 0.01 In safety and reliability evaluations, the binorriial distribution is applicable to systems featuring redundancy if each redundant component operates independently and if each redundant component has the same (or approximately the same) probability of failure. For example, suppose we have n redundant components and, if more than x of these fail, the system fails. Let p be the probability of component failure in some designated operating period. Tbe probability that the system will not fail is then the probability that x or fewer components fail and this is just the binomial cumulative probability B(x; n, p). Alternatively, consider a situation in which n events are possible and, if more than x of these events occur, we have an accident. If the n events are independent and if they all have approximately the same probability of occurring, the binomial distribution is again applicable. In general, the binomial distribution may be used when we have n repetitions of some event ("n trials") and we desire information about the probability of x occurrences of an outcome, fewer than x occurrences of an outcome, or more than x occurrences of an outcome, always presuming that the assumptions previously listed are satisfied. The "n trials" can be n components, n years, n systems, n human actions, or any other appropriate quantity. We shall return to the binomial distribution shortly because two of its limiting forms are of special interest to us. For the moment, however, we will discuss distributions and distribution parameters in general.
PROBABILISTIC AND STATISTICAL ANALYSES
X7
3.
The Cumulative Distribution Function
Let us use the symbol X to designate the possible results of a random experiment.
X is usually referred to as a random variable* and it may take on values that are
either discrete (e.g., the number of defective items in a lot) or continuous (e.g., the heights or weights of a population of men). Actually, the latter category of values is also strictly discrete because the measuring apparatus employed will have some limit of resolution. It is convenient mathematically, however, to consider such values as representing a continuous variable. It will be convenient to use the corresponding lower case letter x to designate a specific value of the random variable. The fundamental formula that we are about to present will be given for the continuous case with detours here and there to point out differences between the continuous and discrete cases whenever it seems important to do so. In general, it is operationally a matter of replacing integral signs with summation signs. The cumulative distribution function F(x) is defined as the probability that the random variable X assumes values less than or equal to the specific value x. F(x) = P[X ~ xl (X6)
According to equation (X6), F(x) is a probability and thus must assume values only between (and including) zero and one:
o ~ F(x) ~ 1
If X ranges from 00 to +00, then
F(oo) = 0 F(+oo) = I
If X has a more restricted range, Xl < X < xu' then
F(XI) = 0 F(x u ) = 1
An important property of the cumulative distribution function is that F(x) never decreases in the direction of increasing x. F(x) is a nondecreasing function although not necessarily monotonically so, in the strict mathematical sense. This can be stated more succinctly as follows:
One further important property of F(x) is stated in equation (X?) below. ** (X?)
*Random variables will be denoted in this text by boldface type. ** For discrete variables the formula is P[ x 1 < X " x2] = F(x 2 )  F(x l )·
1 1 o CONTINUOUS VARIABLE CASE o DISCRETE VARIABLE CASE Figure X2. T.... The cumulative distribution F(x) gives the probability that the measurement value is less than or equal to x. As another example consider a random experiment in which we are performing repeated measurements on some item. Typical Shapes for F(x) The binomial cumulative distribution B(x. consider a random experiment in which we observe times to failure of a single component. The random variable X represents the "measurement outcome" in general and Xi represents some specific measurement value.. As an example of a random variable and its corresponding cumulative distribution. Typical shapes for F(x) for a continuous variable and for a discrete variable are shown in Figure X2. we repair it. in every instance its repair state coincides with its initial operating state.. The properties of the cumulative distribution function that we have presented in the above formulae are generally valid for continuous and discrete random variables. The cumulative distribution F(t) for any t thus gives the probability that the time to failure will be less than or equal to t. set t = 0 and note the new time to failure (see sketch below). In application.. n.. Whenever the component fails. the cumulative distribution function must either be determined from theoretical considerations or be estimated by statistical methods. . We could estimate values of F(x) from Fest (~) where in which n gives the total number of measurements and ni is the number of measurements in which X assumes values less than or equal to Xi' As n gets larger and larger Fest (Xi) will approach more and more closely the true value F(~). FAILURE 1 FAILURE 2 FAILURE 3 FAILURE 4 FAI LURE 5 Let us assume that repair ("renewal") does not alter the component. The random variable of interest. p) encountered in section 2 is one specific example of F(x). We represent a specific value of T by the symbol t i . is the time to failure from renewal or repair.X8 FAULT TREE HANDBOOK .. Le.
. (In the figures x increases as we move to the right). the result is unity. The quantity f(x) itself is therefore the probability per unit interval. probabilities must be expressed in terms of intervals. f: f(x)dx= I (XII) This property of f(x) permits us to treat areas under f(x) as probabilities. especially useful. (b) a distribution skewed to the right. In the case of a continuous variable.PROBABILISTIC AND STATISTICAL ANALYSES X9 4. Thus f(x)dx is the probability that the quantity of interest will lie in the interval between x and x + dx. Of course. f(x)dx = P [x < X < x + dx] (X12) Our previous equation (X?) can now be stated in another. the interval length dx may be made as small as we please. is obtained from F(x) by a process of differentiation: * f(x) = dx F(x) An equivalent statement is. The fundamental meaning ofthe pdf is stated in equation (XI 2). d (X8) F(x) = LX ~ f(y)dy (X9) Because f(x) is defined as the slope of a nondecreasing function. and (c) a distribution skewed to the left. In the case of *We assume F(x) is wellbehaved and allows differentiation. form: (Xl3) Typical shapes for f(x) are illustrated in Figure X3 in which (a) represents a symmetrical distribution. The Probability Density Function For a continuous random variable. the probability density function (pdf). f(x). This is because the probability associated with some specific value x is always zero because there are an infinite number of values of X in any range. we must have f(x) a (XI 0)" When the pdf is integrated over the entire range of its argument.
for example.5 0' and 50% of the time it will be greater than x. ~ni is the number of 5. Equation (Xl3). For this reason such a parameter is called a location parameter.90' . Distribution Parameters and Moments The characteristics of particular probability density functions are described by distribution parameters.50..50) = .s. to the right). The most common location parameter is the statistical average. In (a) the median is indicated by x. the 90% percentile is such that F(x. if we consider a large number of measurements.50 and. The median is a particular case of the general apercentile. Typical Shapes for f(x) discrete variables we replace integral signs by summation signs and sum over the individual probabilities of the xvalues in the range of interest. F(x. f(x)dx could be estimated by f(x)~= ~ni n where n is the total number of measurements and measurements for which X lies between x and x + ~x. xo:' defined such that F(xo:) =a. the midrange.50' Therefore p(X. 50% of the time the outcome will be less than or equal to x.50' From the definition of the median. With regard to our previous time to failure example. In the measurement example f(x)dx gives the probability that the measurement outcome will lie between x and x + dx. in terms of the cumulative distribution. the other 50%. then becomes applicable for a discrete variable X.90 and 90% of the time the outcome value x will be less than or equal to x. One variety of parameter serves to locate the distribution along the horizontal axis. x. which locates the "peak" or maximum of the probability curve (there may be no "peak" at all or there may be more than one as in bimodal or trimodal distributions). For an illustration of these concepts refer to Figure X4. and others of lesser importance.. Other location parameters commonly employed are: the median (50% of the area under the. the mode.XtO FAULT TREE HANDBOOK (a) (b) (e) Figure X3. f(t)dt gives the probability that the component will fail in the time interval between t and t + dt. Empirically. For example.90) = .50) =. probability density curve lies to the left of the median. which is simply the average of the minimum and maximum values when the variable has a limited range.
They are.50 (AI MEDIAN (SI MODE (C) MIO·RANGE·. Two Symmetrical Distributions with Different Dispersion Parameters . Mode. In Figure XS we see two symmetrical distributions with the same values of mean. less frequently employed. many other types of distribution parameters. and mode all coincide. the mean. the median will fall between the mode and the mean. and mode. We will calculate that variance of the distribution in a later section. Of these. In (c). this empirical average will approximate the true average and will approach the true average more and more closely as the number of repetitions is increased. the mode is indicated by x m and gives the most probable outcome value. median. Other dispersion parameters. If we repeat a random experiment many times and average our outcome values. Parameters used to describe this aspect of a distribution are known as dispersion parameters. and MidRange In (b) of Figure X4. we see how the midrange is calculated from the two extreme values xl and x2' The average is also termed the mean. X mr = x1 + X 2 2 Figure X4. It is essential that we become familiar with the general methods for calculating distribution parameters once the functional Figure X5. but those we have mentioned are some of the most basic ones. There are. strikingly different from the standpoint of how the values are clustered about the central position. The Median. For skewed distributions. median. the most familiar are the variance and the square root of the variance or standard deviation. (We assume the distribution has an average and is such that the empirical average converges to the population average. are the median absolute deviation and the range between a lower bound value and an upper bound value.) In the case of a symmetrical distribution as shown in (a) of Figure X3. as in Figure X3 (c). or the expected value.PROBABILISTIC AND STATISTICAL ANALYSES Xll ><0. in fact. however.
l.l~ =foo xn f(x)dx (X16) and gives us the expected value of xn. and (b) moments about the mean. (a) Moments About the Origin The first moment about the origin is defmed as J. We will use the symbol J. The second moment about the origin is defined as J. written E[X].ll = I: x f(x)dx (XI 4) This gives us the mean or expected value of X. In general.ll is always and invariably equal to 0.X12 FAULT TREE HANDBOOK form of the pdf is known.l2 = J: x 2 f(x)dx (XIS) and gives us the expected value of X2. it is not of great utility. .l for the mean for the sake of brevity although actually E [X] = J. Distribution moments may be calculated about any specified point but we shall restrict ourselves to the discussion of (a) moments about the origin. the expected value of g(X) may be obtained from E[Y] =E[g(X)] = I: g(x) f(x)dx (XI?) (b) Moments About the Mean The first moment about the mean is defined as (XI 8) Because J. If Y =g(X) is any function of X. E [xn] . and X is distributed according to the pdf f(x). Some of these general methods entail calculating the moments of the distribution and are of great importance in theoretical statistics. the nth moment about the origin is defmed as oo J. E [X2 ] .
n = (xp. and J." namely lIn.=1 x n n L. . the nth moment about the mean is defined as P.l2.)2 f(x)dx.: x· i=l l' is a special case of equation (X22) which can. (X21) Equation (X21) permits us to calculate the variance by evaluating the integral in (XIS) rather than the integral in (X19) which is more complicated algebraically.2 = I: [00 (xp.ll ' namely. in the case of a single die we have .PROBABILISTIC AND STATISTICAL ANALYSES XI3 The second moment about the mean is defined as P.)2J.2. the first moment about the origin is written p.l2 =1: (x _J. = p. In general. (X19) This gives us the variance a2 or E[(Xp. J. Thus.i = L.: i=l n xi P(xi) (X22) where P(xi) is the probability associated with the value xi" The commonly used formula for fmding the average of n values.)nJ. Equation (X21) is easily proved as follows: J. There is a useful relationship between P.l)2 f(x)dx In the case of a discrete random variable.)n f(x)dx (X20) and gives us E [(Xp. be used whenever all the individual values are treated as having the same probability or "weight.
X·14 FAULT TREE HANDBOOK Il = Il' = I + 2 + 3 ~ 4 + 5 + 6 = 2i = 3.. Also... The question of bias is discussed later in this chapter. The bias is eliminated by multiplying (X23) by the factor n/(nI).. Because any value is equally likely..1l)2 P(Xi) (X23) In case all the xi have the same "weight".5 despite the fact that such an outcome is impossible in practice. .. . x a b Figure X6. L... f(x) f o ~." For large n the difference is slight.. a constant. Consider the rectangular pdf shown in Figure X6 in which any value between a and b is equally likely...!. if the random variable is discrete. f(x) = f o. the second moment about the mean (the variance) assumes the form: 112 = L i=l n (xi .... The area under a pdf must integrate to I and hence we have 1 Area = f o (ba) = 1 so that fO = ba . * (X24) We conclude this section with a simple example of the use of distribution moments.l.. "Bessel's correction. equation (X23) reduces to a sampling n formula for computing the variance of a sample of n readings..5 so that the expected value is 3............ The Rectangular Distribution *(X23) provides a biased estimate of the population variance a 2 ..
For example.CIlJ)2 =[b x2 (b~a) dx = (b~a) [x:J~ = (b~a) (b 3 (b~a r 2 (b. ba L2 a 2(b .a) 2(b .a)(b + a) = b + a . Limiting Forms of the Binomial: Normal. Poisson Several important distributions can be obtained as limiting forms of the binomial distribution.2ab + a2 = (ba)2 12 12 6. We read the above as the limit when p is ftxed and n goes to infmity.ll. Omitting the mathematical details. I b x . a)= ~ a exp [HX :IlY] (X25) *We suspect that the normal distribution is sometimes called Gaussian in commemoration of the fact that it was one of the few things Gauss didn't discover! Actually it was first investigated by de Moivre.a) 2 The variance of the distribution is given by Var = a2 = 112 = 112 .1 dx ba a =_1_ rx2Jb = b 2 .a2 = (b .a) 3 . consider lim p fixed n + ex> [(~) pX (1  p)nx] . = J.I = E(X) = .I. this process leads to the famous normal or Gaussian* distribution f(x.PROBABILISTIC AND STATISTICAL ANALYSES The mean of this distribution is given by XIS J. . a ) _ (b:a) 2 = b 2 .l.
to fInd a transformation which has the effect of standardizing IJ. which neatly cancels the a in the original coeffIcient 1/(.l) a z (x = 1 = J. The normal distribution is extensively tabulated but not in the form (X25).l + a) Figure X7. It is possible.u) t z (x z (x I = z z. It is.l. A graph of fez) is shown in Figure X7. The transformation from one distribution to another requires. f(z) a . and (J and would be impossibly unwieldly.. a factor known as the Jacobian of the transformation (see reference [25]). to 0 and a to I.. A direct tabulation of (X25) would require an extensive coverage of values of the parameters IJ.. .X16 FAULT TREE HANDBOOK where IJ. Suppose that an arbitrary point zl is a point of interest (see Figure X7). =0 = J. however. = +1 J. of course. Some tabulations record the area from zl to 00 and still others. This transformation is z=(J xIJ. Of primary interest to us is the area under the curve included between two points on the horizontal axis. the area from zl to the origin. (X26) and the corresponding distribution in terms of z is fez) =_I_ e z2/2 V2i (X27) which is known as the standardized normal distribution and which forms the basis for all tabulations. The reader should remember that such an area can be treated as a probability because the total area under the curve is unity.fI/i u). In the present case the Jacobian is just a. The Standardized Normal Distribution Several characteristics of the standardized normal distribution are tabulated. The reader should note that the passage from equation (X25) to equation (X27) via equation (X26) is not a matter of simple substitution. among other things. and a are the mean and standard deviation respectively.. . Some tabulations record the area under the curve from zl to +00 (shaded in the fIgure).
9050 is z= 0. which can be skipped by the experienced.0030 1.9000 inch ± 0. This is the probability that X ~ 0.68 and the area from /l. Then the area of both tails is 2(0.0475.0050. The widths of slots in forgings are normally distributed with mean /l = 0. This is the probability of fmding a slot width that is outside the specifications. Therefore. a simple numerical example will be given. that is consistent with such a rejection rate? . 0 measures the distance from the mean /l to the inflection point of the curve.95. What is the maximum allowable 0.67] is 0. in terms of the original variable X. the area under the pdf curve from /l.9050 . For the normal distribution.9050. we could lower the rejection rate by controlling production more carefully so that 0 is reduced. the righthand tail area for P[Z ~ 1.0950. f(x) 0. say 0'. If no change in the specifications is possible.9000 0. 9.67 From standard normal tables. This rejection rate is fairly high. Nonetheless.001.0. It is assumed that the reader already has some familiarity with the normal distribution and its tabulations.0475.PROBABILISTIC AND STATISTICAL ANALYSES X17 essential to ascertain what area is being tabulated before the tables can be used intelligently.0475) = 0.8950 0.0 to /l + 0 is approximately 0.0030 inch.5% of the total production will be rejected.9000 inch and standard deviation 0 = 0. what percentage of the total output will be rejected? The rejected forgings will be those widths that fall into either one of the shaded regions indicated in the sketch. For X. By symmetry the lefthand tail area is also 0.9050 x The value of z corresponding to x = 0.900 0. If the specification limits (acceptance limits) are 0. Suppose that our goal is a rejection rate of 1/1000 = 0.20 to /l + 20 is approximately 0.
e. From z =(xJ. There are several reasons for us to be interested in the normal distribution.3O.005 30·00152'mch .3. The expected number of occurrences of .90503. = 3. The outcome of the limiting process is mX ( np)X _ _ e.001/2 =0.001 the area in each tail (see sketch) must be 0. the value of z that cuts off a tail area of 0.g. 4546. (The mathematical details of the limiting process can be found in many texts.L)/z. Another is that the normal distribution provides a fairly good statistical model of many wearout processes. is obtained as the following limiting form of the binomial distribution: lim n + n+O 00 {(~) pX (I_p)nx} (X28) In equation (X28) the limit.np = _ em x! m! (X29) where m = np.0005 is 3. Another critically important distribution.L)/a we find a' =(xJ.) Equation (X29) gives the probability of exactly x occurrences of a rare event (p ~ 0) in a large number of trials (n ~ (0).001.00152 inch.~OOO =0. One is that the mean of a large number of measurements tends to be normally distributed regardless of the underlying distribution of each measurement (from the Central Limit Theorem). the maximum allowable value of a is 0. Therefore.is taken in such a way that the product np remains finite.X18 FAULT TREE HANDBOOK If the rejection rate is 0. th us a ' =0. From tables. for a rejection rate of .: reference [32]. pp.0005. from the standpoint of systems and reliability analysis.
" Now. the parameter of interest is time.PROBABILISTIC AND STATISTICAL ANALYSES X19 the event is np = m. on the average. Application of the Poisson Distribution to System FailuresThe SoCalled Exponential Distribution Assume we have a given system which is in steadystate. What is the probability of finding exactly one defective unit in a random lot of 10 (n = 10)? The exact result can be found by using the binomial distribution: b(1. For example. but because it describes the behavior of many rare event occurrences. 10. The result is: ° P [0 occurrences of system failure] =em where m = expected number of system failures in a large number of "trials. when it is not in the process of being burnt in and is it not wearing out.1).3874 The Poisson approximation yields the result 0. We are specifically interested in the probability of exactly occurrences of system failure. Let the event of interest be system failure. We will now further discuss these system applications. We thus seek a way of somehow expressing m in terms of time. If we operate this system for .0. we will assume that when the system fails. Thus.1 (p = 0. We say the mean time to failure (0) is 50 hours. It turns out that this can be readily done. 7. If we increase the random lot size to 20 (n = 20). it is restored to a condition that is essentially as good as new and that the repair time is negligible. The Poisson distribution provides a fairly good approximation to the binomial even though p is not particularly small and n is not especially large. As can be shown by the method of moments (although there are easier ways) that the mean and variance of the Poisson distribution are both numerically equal to m. regardless of their underlying physical process.2707 The Poisson distribution is important not only because it approximates the binomial.1)=0. as far as failures are concerned. suppose that in a mass production process the probability of encountering a defective unit is 0. a failure occurs every 50 hours. Further.0.2702 Poisson . the agreement is even closer: binomial . Assume we have data for this system and. The distribution given in (X29) is known as the Poisson distribution. applying the Poisson distribution we are led to set x = 0 in equation (X29).0.3679 which does not differ greatly from the exact value. The Poisson distribution also has numerous applications in describing the occurrence of system (or component) failures under steadystate conditions.
are widely used in system analysis and reliability.! (l dt dt eXt ) (X32) f(t) = AeXt The pdf in equation (X32) is generally referred to as the "exponential distribution of timetofailure. Hence we have (X3D) The probability of system failure prior to time t is given by the cumulative distribution function F(t).F(t) and F(t) = 1 ." . The use of the binomial distribution is restricted by a number of assumptions that we took care to list earlier in this chapter. cumulative distribution. The reliability. given by equations (X3D). be extremely cautious in applying equation (X3D) to the calculation of system reliability for the follOWing reasons. R(t) = eXt = 1 . Using the symbol t for operating time we have in general: Expected number of failures in time t =t/O =At if A = I/O =em =et / 8 =eXt. The exponential is an especially simple distribution. we expect to experience 2 failures.X20 FAULT TREE HANDBOOK 100 hours. But m = expected number of failures. The reason is clear. a trial is an "opportunity to fail in some interval 0 f time. the failure rate or 0.! F(t) =. but one of them remains untouched: the assumption that all "trials" are mutually independent. We must. R(t). the mean time to failure) must be determined empirically. Only one parameter (A. We saw that equation (X3D) arises from the Poisson distribution. f(t) (X. (X31).ext .31) =. Translated into failure language.. is by defmition the probability of continuous successful operation for a time t. and pdf. The pdf corresponding to equation (X31) can now easily be found. Therefore. therefore. 100/50 = 2. and (X32).however. we have. The system either fails prior to time to or it does not. The latter is obtained as a limiting form of the binomial distribution. that is. P[D occurrences of system failure] Now the reliability of a system." Also. respectively. Some of these assumptions have been modified in the limiting process. the expression (X3D) is sometimes referred to as simply the exponential distribution.
by definition. t + At) is the same as the probability of failure in any interval of the same length (given no failure has occurred up to that time). If the system is not able to be repaired. Let us define PI (t) = probability that system is in state E I at time t poet) = probability that system is in state EO at time t and let us assume that at the start of time (t = 0) the system is in its operating state EI · Now PI (t + At) represents the probability that the system is in state E 1 at time t + At. the limiting form of the lefthand side of the equation is just.PROBABILISTIC AND STATISTICAL ANALYSES X21 When system failures are repairable. according to our initial assumption (probability of failure in a given interval is a function only of interval length).. given no failure up to time t.the length of that interval. Thus ~~ . At = dtPI(t)=PI(t)=API(t) . Specifically. the derivative of PI (t) with respect to t. because we start operation at t = 0. Consequently (1 . the probability of failure in the interval (t. For the exponential distribution. a nonrepairable system that can exist in one of two states: E I system operating. At). Consider. Another way of characterizing the failure processes is as follows. our interpretation of the probability behavior must be modified to the following: given no prior failure. it is the same as the probability of failure in the interval (0. Eosystem failed. does not fail) in time interval At. then our interpretation of the independent trial assumption is as follows: the probability of failure in some future interval is a function of only the length of the interval and is independent of the number of past failures. Thus. if we have had a failure at an earlier time. which is another description of the exponential. we are "as good as new" at time t. the quantity ALit gives the probability that the system makes a transition from state E 1 to state Eo in the time interval At and that A is some constant (the failure rate).AAt) gives the probability that the system does not make a transition from E I to EO (i. [PI(t+ At)PI(t)] d . the probability of failure in some future time interval is a function only of. we can derive the exponential distribution from that assumption alone. for example. We can write where. Algebraic rearrangement yields the following difference equation: If we allow the length of the time interval to approach zero (At*O).e. then the probability of the failure existing at a later time is 1 and the probability of the failure occurring at a later time is 0 because it has already occurred. If we start with the assumption that the probability of failure in a certain interval is a function only of interval length. We need the restriction "given no prior failure" because. for a nonrepairable component.
We now have the differential equation. because Po (t) + PI (t) = 1. f(t). called the failure rate function: A(t)dt = P [failure occurs between t and t + dt I no prior failure]. a convention originally proposed by Newton.I . (X33) For any general distribution. 8. The Failure Rate Function Recall from previous sections that F(t) = P[failure occurs at some time prior to t] and that f(t)dt = P[failure occurs between t and t + dt].R(t) = F(t). A(t). _ f(t) A(t) . Also. and" F(t).X22 FAULT TREE HANDBOOK in which the prime stands for differentiation with respect to time. t t [In PI (t)] 0 = [. This is easily integrated if we remember our boundary condition PI (t=O) = 1.At] 0 which is just the reliability of the system. ei\t = I .F(t) . We now define a conditional probability. there is an important relationship between the three functions A(t). (X34) . we have poet) = 1 The corresponding pdf is f(t) = Aei\t and we recognize the exponential distribution. Pi(t)=API(t).
T is a random variable according to the defmition given in equation (X. We remember from basic probability theory that in general. which is just equation (X. _ P [(t < T < t + dt)n(t < T)] ( t )dt X pet < T) Now event A is just a special case of event B. X(t)dt = P [t < T < t + dt It < T]. P (A I B) =peA n B) PCB) . Let (t < T < t + dt) be denoted as event A and (t < T) be denoted as event B._ t \I III Figure X8. the curve shown in Figure X8 results. Le. In set theoretic notation. AnB = A. ACB (A is a subset of B) and. It follows that X(t)dt= P [t<T<t+dt]=P[A] = f(t)dt." A(t) """""'.PROBABILISTIC AND STATISTICAL ANALYSES X23 The validity of equation (X34) is easily demonstrated as follows. when A occurs then B automatically occurs. t for a General System . Let us designate the time at which failure occurs as T. If A(t) is plotted against time for a general system.34).33).. Plot of A(t) vs. P(t<T) P[B] IF(t) Finally.F(t) . A(t) f(t) 1 . For obvious reasons this relationship between A(t) and t has become known as the "bathtub curve. Therefore. under these circumstances.
the exponential Region II may be entirely missing or the burnin region may be negligible. For an actual system.34) explicitly for both F(t) and f(t). the X(t)vs.F(t) = exp~t X(X)dX} (X36) and therefore F(t) =1  exp~t X(X)dX]. Region I is termed the region of "infant mortality" where sometimes an underlying distribution is difficult to determine. The distribution appropriate to this part of the curve may depend quite critically on the nature of the system itself. it is convenient to solve (X. This is accomplished by writing equation (X34) in the form X(t)dt where F(t) F' ( t ) = d dt. Manufacturers will frequently subject their product to a burnin period attempting to eliminate the early failures before lots are shipped to the consumer.t curve may be quite different from that depicted in Figure X8. which is equivalent to 1. II. If we differentiate equation (X36). we obtain f(t) = X(t) exp~t X(X)dx} =X =constant in (X3?) (X36) The reader should convince himself that if we put X(t) and (X3?)." and is the region of chance failures to which the exponential distribution applies. Integrating both sides of equation (X35) we have =_ [F'(t) dt] 1 . For example. Region II corresponds to "a constant failure rate function.F(t) (X35) L o t X(x) dx =In [1 . Returning to the failure rate function equations. we obtain .X24 FAULT TREE HANDBOOK This curve can be divided into three parts which are labeled I. III in Figure X8. Region III corresponds to a wearout process for which the normal distribution often provides an adequate model.F(t)].
the exponential distribution is often referred to as the "random failure distribution. 9.e. This relation is true for t ~ 8.1) whence f(t) = kt m exp (and R(t)=IF(t)=exp ( ktm+l) m+1 . Because of the constancy of the failure rate function in this case. Thus. the failure rate function is a constant (independent of t) and A(t)dt = P [failure occurs between t and t + dt I no prior failure] = Adt. we are assuming that we are on the constant. p.. If we choose to describe a component failure distribution with the exponential distribution. When m is less than 0 but greater than 1. or III of the bathtub curve.). For the exponential distribution. It is also valuable to note that if we use et / e to represent reliability. i. if A(t) = kt (linearly increasing failure rate) we find that R(t) = 1 .xt and f(t) = ~£Xt.. the burnin portion of certain bathtub curves may be modeled.PROBABILISTIC AND STATISTICAL ANALYSES F(t) = 1 . Appendix D. Equations (X36) and (X37) may be used to investigate a wide variety of failure rate models. where k is known as the scale parameter and m as the shape parameter. the Weibull distribution. and [36] . is obtained by putting ACt) = Kt m (m> . [32].. the probability of future failure is independent of previous successful operating time. 137138. X2S which is just the exponential distribution. An important distribution of timestofailure.e. II. by changing the value of m. then. we are being conservative even if wearout is occurring (but no bumin). we consider the follOWing example. f( t) approaches normality.F(t) = exp(kt 2 /2) which is known as the Rayleigh distribution. R(t) . et/e where R(t) is the actual reliability and 0 is the actual mean time to failure.. we can use the Weibull distribution to embrace regions I.. 190 et seq. An Application Involving TimeToFailure Distribution The concept of a distribution of timestofailure is an extremely important one and in order to impress this more thoroughly on the reader's mind. For m = 0 we have the exponential distribution and as m increases a wearout behavior is modeled. steady state portion of the bathtub curve with no burnin or wearout occurring. (For a proof of this see Reference [15]). When m increases to 2. For instance. ktm+l) m+ 1 The Weibull distribution is a twoparameter distribution.e. . The reader can find a more detailed description of the Weibull distribution in the literature of reliability (see reference [23] pp." i.
5 correspond to negative values of time. RA(t)=e tie A RA (10) = elO/lOO = eO.X26 FAULT TREE HANDBOOK Similar components are being purchased from two manufacturers. Now we consider the components from manufacturer B. the area in the tail corresponding to z =2.988 Thus for Manufacturer B the reliability is 98. of 100 hours (OB = 100 hrs. we need to find a value of the transformed variate z corresponding to t = 10 hours. From the above example we note the 'differ~nce between R A and R B despite the fact that 0 A = 0B' This difference arises because of the difference in the failure distributions. By way of compromise.) and states that the distribution of timestofailure for his components is exponential.0% but for t = 200 hours. the sample variance.62%. R A = 36. The importance of the sample's being random will be discussed shortly. R A = 13.5% and R B = 0. 10. and so forth.8%.01222 = 0. we take a random sample from the population. such as the sample mean. A and B.905 Thus for Manufacturer A there is a 90. eventually the exponential will give higher values of reliability than the normal. First we consider the component from manufacturer A. the sample median. This is a very large population.) but states that the appropriate distribution of timestofailure in his case is normal with a mean of 100 hours and a standard deviation of 40 hours.5 is extremely small and we ha. Manufacturer A claims a mean life of 100 hours (0 A = 100 hrs. From the sample we can estimate any sample parameters that may be of interest. B also claims a mean life. 1 = 0.0. * Thus RB(t = 10) = 1 . z== uB tO B 10100 40 =2. The distribution here is normal. Statistical Estimation Suppose that we were engaged in a study of the heights of men between the ages of 20 and 30 in the greater Los Angeles area. For both types of components let us attempt to compute the reliability for 10 hours of operation. this would assuredly prove to be physically impossible. However.01222 (from standard normal tables) and represents the probability of failure prior to t = 10 hours. The reader may care to calculate the value oft that gives RA = R B .25 This value of z cuts off a tail area of 0. The problem i~ simply *This is not exactly correct because values of z less than 2.ve not considered it necessary to make the small correction involved.5% reliability. As t increases. . and although we might be eager and willing to measure every member of the population. for t = 100 hours.8% and R B = 50. For instance. Somewhere between t = 100 hours and t = 200 hours both distributions give the same value for reliability.
In this case you might remain blissfully unaware that the crate had been dropped in transit and that every component at the botton of the crate had been smashed. polling was carried out largely by means of telephone. in fact. Roosevelt or Mr. 12. Sampling Distributions Suppose we take a random sample of size n from some population and compute a sample mean xl' where Xl =~ L:: i=1 n Xi' We could now take a second sample of size n *The symbol (j is added to designate an estimator of some population parameter (). In the depressed economic state of the country in those days. we have to know precisely what is meant statistically by statements like: "1i'a is a good estimator of population parameter 0" "1J'b "9b is a best estimator of population parameter 8" is a better estimator of e than is '8'c"* These matters will be taken up in Section 13. specifically the sampling distribution of the mean. The poll predicted a Landon victory whereas. most of the telephones were owned by wealthy to moderately wealthy Republicans who intended to vote for Landon. A classic case in which the assumption of randomness was invalid involved the Literary Digest Poll of 1936. Other random sample approaches are described in Reference [10]. conclusions drawn from a sample which is thought to be random but which actually does not accurately reflect the characteristics of the population. Most statistical calculations are based on the assumption of randomness. a sample median or a sample midrange? To answer questions such as these. The conclusions drawn from this nonrandom sample were spectacularly in error. for instance. Roosevelt won with a popular plurality of 11. care must be taken to ensure a random sample because usual estimation techniques are based on this assumption. Random Samples A random sample can be defined as a sample in which every member of the population has the same chance of being included.PROBABILISTIC AND STATISTICAL ANALYSES X27 this: how good are these sample statistics as estimates of corresponding population quantities? As a matter of fact. Shortly thereafter the Literary Digest became defunct. be seriously in error. therefore. can. is to use random number tables to select the specific samples. it would be incorrect to select only from those components at the top of the crate. In this case. One simple method. First we must discuss the importance of choosing random samples. 11. Whenever sampling is performed.785 votes and an electoral vote of 523 to 8. If you wish to select a random sample of electronic components from a large crate that has just been delivered. Landon would win election to the presidency of the United States.069. have we any assurance that a sample mean is a better estimate of the population mean than is. and then we must establish the concept of a sampling distribution. The poll attempted to sample public opinion on the subject of whether Mr. for example. .
these sample means are values of a random variable.1 and variance a 2 .1 and variance a 2 In. n ~ 50).1 and variance a 2 but with distribution otherwise unknown.g. how is X distributed? A partial answer is provided by socalled restricted central limit theorem which states: If X (the random variable of interest) is distributed normally with mean X is distributed normally with mean J. The variance of the sampling distribution of the mean decreases as the sample size increases and this provides a justification for taking as large a sample as possible. More important is the general central limit theorem which states that if X is distributed with mean J. In fact. In a like manner we could generate other sample means.1X =J.1 alyn Other estimators (e. we are concerned with normal distributions. is a function of a value X2 from the chi square distribution through the relation The chi square distribution has been extensively tabulated and is widely used in decision criterion. etc.. 13. range. for finite populations of size N and for samples of size n.X28 FAULT TREE HANDBOOK and compute its mean x2.. the variance estimate. variance. e. median. variance (ax)2 = a J.) are characterized by their corresponding sampling distributions. at least for large n (e. The question arises. The associated random variable representing the collection of values is called the sample estimator. Note that the z transformation corresponding to an x value is z =' xJ. To facilitate the discussion . We do not expect these sample means to all be the same. Consequently. Point EstimatesGeneral A single numerical value (such as i) calculated from a sample constitutes a point estimate of some corresponding parameter. then This theorem is exactly true only for populations of infinite size. For instance. s2. in goodness of fit tests. whenever we deal with sample means of large samples. most of which can be found in advanced texts on statistics.. x3' x3' x5' etc. and in hypothesis testing. Let the sample means in general be represented by the symbol X which denotes the random variable.1 and 2/n where n is the sample size.g.g. for a normal distribution. the distribution of X is still very closely approximated by the normal distribution with mean J. Reference [30].
b) Minimum Mean Square Error Estimators and Minimum Variance Estimators The mean square error (MSE) of an estimator is defmed as (X38) The MSE thus is a measure of the amount an estimator ft deviates from the true value (J. if 'U'a is an unbiased estimator of the population mean Ji. we can always rewrite the MSE as The first term on the righthand side is the variance of the estimator and the second term is called the square of the bias of the estimator. By adding and subtracting E(ft) inside the parenthesis in equation (X38). 'U'c' the sample midrange estimator. then 'U'a could be the sample mean estimator. nl L... On the other hand X is an is a biased estimator of 0 2 .J i=1 X)2 ' 1 which is an unbiased estimator of 0 2 . we know that the sample mean unbiased estimator of Ji because E(X) = Ji. then From the properties of the expectation. The reader should note that the following characteristics of estimators relate to these sampling distributions.(X. and so forth. Thus. and manipulating the result.PROBABILISTIC AND STATISTICAL ANALYSES X29 let us use the symbol (J to designate the population parameter being estimated and the symbols 'U'a' 'U'b' 'U'c' . If we multiply by n/(nl )Bessel's correctionwe have n S2 =_1_ . if (J were the population mean.. to designate various sample estimators for (J. will all have sampling distributions. 'U'b' the sample median estimator.. a) Unbiased Estimators An estimator is said to be unbiased if its sampling distribution has a mean that equals the population parameter being estimated. For instance. The estimators 'U'a' '8'b' 'U'c' . If the estimator is unbiased then E(11') = (J and ...
we usually choose the one with minimum variance. 1fc ' . we use the ratio var var ('8'2) (1fl ) as one measure of the relative efficiency of estimator 1fl with respect to 1f2 . one of them has the smallest MSE. one has minimum variance. This method is widely used. for example. In this case we are more interested in minimizing the deviation from the true value than we are in the longrun unbiased property. because on the average we want the estimator to equal the true value. (b). Point EstimatorsMaximum Likelihood There is one very important technique for calculating estimators that is called the method of maximum likelihood.E(8)] 2. Further discussion of estimator considerations is given in reference [24] . c) Consistent Estimators If1fa is a consistent estimator of 0.. When n becomes very large then the probability that the estimator deviates from the true value goes to zero. then a (biased) MMSE estimator may be more efficient. In comparing the variance of two unbiased estimators 1fl and 1f2 . 1fb . . In this case we say that "1fa converges in probability to 8." Properties (a). we are going to use an estimator only once or a few times. the pdf of the estimator concentrates about the true value of the parameter. however. and (c) are the principal characteristics that determine whether an estimator is good or not. If we choose between two or more unbiased estimators. If. then that estimator is called the minimum variance unbiased estimator and abbreviated MVUE.. then that estimator is called the minimum mean square error (MMSE) estimator. If among several estimators. all of which are unbiased. 14. If we are going to use an estimator in numerous applications then we generally want the estimator to be unbiased. we say that the one with the smaller variance is relatively more efficient and in fact. The choice of estimator depends on the situation. If among several estimators 1fa . then P[I1faOI<€J > (1<5) for all n>n' where € and <5 are arbitrarily small positive numbers and n' is some integer. in . We can interpret the above equation as saying that as the sample size n increases.X30 FAULT TREE HANDBOOK MSE = E [ 8'. Hence the MSE for an unbiased estimator is simply the variance of the estimator.
or is near the most probable one. As a more physical illustration. Bridge players do not expect to pick up a hand containing all 13 spades. 0) where 0 is some unknown population parameter that we desire to estimate.. not only do we see the actual. The kind of hand you usually get may look something like this: 4 Spades 2 Hearts 4 Diamonds 3 Clubs This hand can arise in an enormous number of waysto be precise. assume that we had a special camera that photographed the molecules in a box full of gas. the maximum likelihood technique yields estimates which are generally usable. Even so. we write down an expression that gives us the probability associated with this particular sample and then apply the condition that it be a maximum. Even for moderate or small samples. In general. the maximum likelihood technique yields estimators which are consistent and which are both MMSE and MVUE. there is a positive (although miniscule) probability that we shall fmd a picture shOWing all the particles crowded into one corner of the box and all heading due north. For most of the rest of the time you will be getting hands that are very similar to this. The technique is based on the supposition that the particular random sample drawn from the population is the most likely one that could have been chosen. the technique of maximum likelihood is founded on the assumption that our sample is representative of the most likely one we could have withdrawn from the populationalways with the proviso that we have gone to some pains to assure that it is a random one. In fact. For large sample sizes (n+oo) under rather general conditions. Using the pdf. . consider two examples. When the photographs are developed.PROBABILISTIC AND STATISTICAL ANALYSES X31 computing parameter estimates in life testing. The probability of a 13spade hand. because that hand can also arise in only one way. you will get the distribution 4432 (regardless of suit) about 20% of the time. . then we would be making the assumption that the sample was a probable one and not the very unlikely one. instantaneous locations of the molecules but also their individual vector velocities. Assume that our sample (size n) is xl' x2" . The probability of such a hand is extremely small because it can arise in only one way. We could take photographs like this for thousands of years and they would all look pretty much the same: a homogeneous distribution of molecules in space going all directions. (11) (1i) (11) (1i) = (13)4 (11)3 (0)2 (3) ways (over 10 billion). is the same as that associated with any hand that is preassigned card by card. Suppose that we are taking random samples from a population that is distributed' according to the pdf f(x. To make this supposition reasonable. If we were to apply the maximum likelihood technique to one sample (a given photograph). x n and that the sample variables are independent. however. The maximum likelihood technique is based on the premise that the sample we have is a most probable one.
We shall now consider some specific examples to indicate how the maximum likelihood technique is applied in practice. *If the sample variables were not independent.. 0) = II f(xi. f(Xn.. In (Likelihood Function) = L(O) = In II f(xI. 8) i=1 i=1 n n (X41) Notice that this is a function only of 0 because all the Xi's (our sample values) are known. we can now write down an expression that gives us the probability of our sample: If we drop off the differentials we get an expression that is known as the likelihood function: Likelihood Function = f(xI. . The likelihood function is no longer equal to the probability of the sample but it represents a quantity that is proportional to that probability. d (X42) d(} L(O) = O. 0) f(x2. 0) . We are now in a position to determine 0 for which L(O) is a maximum. 8) i=1 n (X40) where the symbol n stands for a continued product.* The next step is to take the natural logarithm ("In") of the likelihood function.X32 FAULT TREE HANDBOOK The probability that out fust reading will be xl in the interval dXI is simply f(xI. This is simply a matter of convenience because most of the pdfs we encounter are exponential in form and it is easier to work with the natural logarithm than with the function itself. Assuming a solution is obtainable (which it generally is). O)dxI' The probability that our first reading will be xl in dX 1 and that our second reading will be x2 in dX2 is Following this line of reasoning. and known variance a2 = 1. the result is written (}ML' the maximum likelihood estimate of the unknown population parameter O. and we would attempt to maximize that (see Reference [24]). 0) = Lin f(Xi. then the likelihood would consist of a multivariate distribution. Suppose that we are taking random samples from a population whose distribution is normal with unknown mean p. We do this by setting the derivative of L(O) with respect to () equal to zero and solving for O.
" = "2 This yields ~x· 1 (1) (2) ~ (xCp)(l) =~ (xCp) = O. maximum likelihood estimate of p.. p.... 1 J Taking natural logarithms we have L(p) =  ~ In (21r) . n i=l 1~ n Thus.p)21.PROBABILISTIC AND STATISTICAL ANALYSES X33 f(x. exp (21r)n/2 [_1 ~ 2 ~ (x. the maximum likelihood estimate of p is just the ordinary arithmetic mean. Xi = x... Then our fundamental pdf is f(x. p... . If more than one population parameter is to be estimated. Suppose that in our previous example we knew neither p nor a2 .. L. so PML =  _ L. The likelihood function is = 1 . p We desire to make :1. i=l n Applying the maximization condition. L. a = 1) = 1 V2i exp [(X2 )2] . The likelihood function becomes . dL(p) <i.. a 2 ) = 1 v0/ffa exp  [ (Xp 2a2 )2] . i=l i=l nil = 0 .. the process is similar.~ ~(xCp)2.
L.~ In (21T) . namely. the mean life. Then 2 _ _ ~( _)2 1 aunbiased . again the simple arithmetic mean. 0) =0. i=l This is a biased estimate of a2 .nl L. i=1 . The estimated value of the variance is often denoted by the symbol s2. ~ML n I I : Xi =n i=l n =x.~ In a2  ~ I : (xC~)2 . is large (say n ~ 30). The bias may be removed by multiplying a~L by the quantity n/(nl). the sample size. 2a i=1 n We now fmd oL{o~ and 3L{[3(a2 )] and set the results equal to zero.. f(t.1 et / 8 (t> 0).X34 Taking natural logarithms we have FAULT TREE HANDBOOK L(J. and so that 0ML = ~ n I : t i . The second operation yields a 2 ML = ~ I : (xcx)2 . Assume we have observed n times of failure for n components tested. The first operation yields the same result as before.. . As a final example let us return to the exponential distribution and find the ML estimate for 0.J Xi X • n i=1 There is little difference between a~L and a~nbiased if n. a2 ) =. Then and Therefore..
0. Suppose we draw repeated samples (Xl' x2) from a population. . in fact. say. The interval defined by 8u and 8 L is plotted on the figure. 71 =0. Interval Estimators We have seen in the previous sections how we can make point estimates of population parameters on the basis of one or more random samples drawn from the population. Now we take a random sample (Xl' x2) from which we calculate the values 8 u and 8 L at. In this case we are willing to accept a 5% probability (risk) that our assertion is not. This involves making an assertion such as: P [(1J'lower < 8 <~pper)] = 71 where 8 is some unknown population parameter. the 95% confidence level.95. If. we desire to bracket with a confidence interval. for instance.0. one of whose parameters. adopt a different approach. £u' . if we desire.90. '8tower and 1rupper are estimators associated with a random sample and 71 is a probability value such as 0. Figure X9.:. The actual value of the population parameter 8 is marked on the vertical axis and a horizontal plane is passed through this point.99.PROBABILISTIC AND STATISTICAL ANALYSES X3S 15. true.95 we refer to the interval for particular values of'O'lower and lfupper as a 95% confidence interval. etc. We construct a threedimensional space with the vertical axis corresponding to 8 and with the two horizontal axes corresponding to values of X I and x2 (see Figure X9). Geometrical Interpretation of the Concept of a Confidence Interval . 'e' L e I:: X 2 x. 8. We can. To help clarify the concept of a confidence interval we can look at the situation in a geometrical way.
Now we shall give an example shOWing how the quantities 0L and 0u are computed from a sample drawn from a normally distributed population with mean fJ. our chance of choosing one that does indeed include e will be 95%. For example. The process of taking a random sample and computing from it a confidence interval is equivalent to the process of reaching into a bag containing thousands of confidence intervals and grabbing one at random.96 for w. we could substitute the values 0. values wand w such that P[w<Z~w] =17. x2) yields the values and 0L' etc. If each sample is drawn from a normal distribution then the sample mean X has a normal distribution with mean fJ. We are now going to concentrate on the inequality on the lefthand side of the last equation and attempt to convert it into the form .95 for 17 and 1. For this reason 100% confidence intervals are of little interest. Even if each sample value is not drawn from a normal distribution. then 95% of them will cut the horizontal plane through (and thus include 0) and 5% of them will not. 5% of the time we will be unlucky and select one that does not include e (like the interval (Ou..96. we will assume that we wish to bracket fJ. and that we know a (based on previous data and knowledge). In contrast. (xi. X will still be approximately normal with mean fJ. we can go to 99% intervals. As we go to higher confidence levels (and lower risks) the lengths of the intervals increase until for 100% confidence the interval includes every conceivable value of (I am 100% confident that the number of defective items in a population of 10.95. etc. for any assigned probability 17.X36 FAULT TREE HANDBOOK Next we take a second sample (xi. Z=a/yn and where the distribution of Z is tabulated. and standard deviation a/Vii.) . If they are all 95% intervals. A third sample (xi'. In this way we can generate a large family of confidence intervals. and standard deviation a. This interval is plotted on the figure. w = 1. for which the risk is only 1%. For the example. and hence we can calculate these intervals without knowledge of the true value of 0. xi). The confidence intervals depend only on the sample values (xl' x2). If a risk of 5% is judged too high. then by the Central Limit Theorem. The quantity Z will then be the standardized normal random variable. where 0i 0u 0u ° 0D ° XfJ. for 17 thus have = 0. Because the distribution of Z is tabulated we can find.000 is somewhere between 0 and 10.000.). and standard deviation a/yn. where n is the sample size. for sufficiently large n. Substituting for Z in the above equation we where again w is known for any given 17. X2) from which we calculate the values and at the 95% level. (For example. in Figure X9). If the confidence intervals are all calculated on the basis of 95% confidence and if we have a very large family of these intervals.
l < +w vn~r a] Now we subtract X from every term of the inequality Next we multiply through by 1 remembering.. In the case of the population mean. to reverse the directions of the inequalities.n Substituting a particular value interval.PROBABILISTIC AND STATISTICAL ANALYSES X37 Let us first multiply through by the factor a/yn thus converting the inequality to [ w vIi a < XJ. and ware known quantities.n vn .. . a a [ w+ X>J.. where t=.n ... estimate a from our sample thus obtaining the quantity s. The value of a is also assumed to be known.l>w+ X vn vn J J which we can write in the form [ a a Xw<p<X+w. then 0L =x  _ w a a vn and 0u =_ + w ..l Proceeding as we did with the Z variable. If we do not know the value of a then we can.. X. x for the random variable X we thus have a particular The above inequality yields our desired confidence interval for p. in the process. x If a confidence coefficient 1] has been assigned. as described earlier for the normal distribution. s/vn xJ... Now we can form the standardized value t. we can form the inequality _ ts _ ts x<p<x + . vn . n.
depend on the way the data are collected. Confidence intervals for the failure rate A. equivalently.5.95 < R < 0. however. then R>0.S. = 110 can be easily obtained by inverting the above inequalities for (J. 2n) 2 <0 < X2(2.. normal tables may be used. we have for a twosided 95% confidence interval nOML X2(2. Gossett and is known as the t distribution. 2n)< 0.5% and 2.. as well as the maximum likelihood point estimates. nOML Other intervals for different levels of confidence can be obtained from the tabulated values of the chi square distribution. the mean time between failures. e. was nOML where t i are the observed times of failure. For a type 2 test.98 at the 95% confidence level. The number of components failing in this time interval is then random. In what is called a type 1 test.0 . n components are operated until a . its properties are now extensively tabulated: So if a is unknown.X38· FAULT TREE HANDBtJv." The value t.5. estimated by the point estimate (J'.follows a chi square distribution with 2n degrees of freedom.5.5. 2n) be the chi square values corresponding to cumulative distribution values of 97.<X 2(97.5.5% confidence level.g. n components having the same failure rate are operated for a prearranged time interval T. if 0. onesided confidence intervals are more commonly seen than twosided intervals.5.95 at the 97.5%. The values of the t distribution depend on the sample size (degrees of freedom). is not a value from the standard normal distribution.2n) . 2n) and X2 (2. If the sampling distribution is symmetric (equal tail areas on the ends). It can be shown that Xl = . for example. For the exponential distribution. For more involved experiments. when the sample size is greater than 25 or 30 the t table value is indistinguishable from the normal table value so that in that case. then a twosided interval may easily be converted into a corresponding onesided interval. In terms of estimating the reliability or mean time to failure. nOML X (97. we find the value of t for given 'Y/ from t tables and not from normal tables. As it turns out. 2n) or. The distribution was investigated in the early part of this century by W. Letting X2 (97. the confidence intervals.
. the mean time to failure (J is thus treated as itself being associated with a probability distribution. t n ) of component failure times is collected. t n )). . In the Bayesian approach. the parameters of the sampling distribution are treated not as fixed. we have f(x) = Ae. . Testing with replacement is also discussed and confidence intervals and point estimates are given for the Weibull and gamma distributions as well as the exponential distribution under a variety of circumstances. cited earlier. Expressing the exponential in terms of the failure rate A = l/e.. We then talk about the posterior distribution of Awhich represents our updated knowledge of the distribution of A with the additional sample data incorporated.AX • The failure rate is thus also associated with a probability di~tribution (because of the relation A = l/e. Schafer. giving associated point estimates and confidence intervals. . but as random variables. Now Bayes theorem says P(BIA) = ~P(A/B) PCB) B P(A/B) PCB) Letting "A" denote the data sample "D. e. the distribution of Ais given by the distribution for (J and vice versa). and Singpurwalla (see reference [24]). t 2 . The pdf peA) is known as the prior distribution and it represents our prior knowledge of A before a given sample is taken. . Let the pdf for A be denoted by peA). we have been treating the parameters of the sampling distributions as being fixed. Section 9. Assume now that a given sample (t 1 .g. The posterior distribution of A.. In many applications. .. With regard to the exponential distribution f(x) =t ex/e. The text by Mann. 16. Bayesian Analyses In the past discussions..PROBABILISTIC AND STATISTICAL ANALYSES X39 preassigned number of components fail and this preassigned number may be less than n. whose pdf we shall denote by p(AID)." and "B" the event that the failure rate lies between A and A + dA. this is a questionable assumption. further discusses tests of type 1 and 2. is easily obtained by applying Bayes theorem* (the symbol "D" denotes the data sample. we have exp [p(AID)= ~ exp [ t t 1=1 Ati ] An peA) 1=1 At i ]' An peA) dA *See Chapter VI. From here on A and () will denote the random variables associated with the parameters Aand (). (t l' t 2 .
X40
FAULT TREE HANDBOOK
Here we have replaced the summation sign by the integral sign. Because the denominator does not involve A(it is integrated out) we can write the above equation as p(AID) = K exp [
t
Ati ] An peA)
1=1
where K is treated as a normalizing constant. The distribution p(AID) is the posterior distribution of A now incorporating now both our prior knowledge and the observed data sample. Bayes theorem thus gives a formal way of updating information about the failure rate A (i.e., going from peA) to p(AID». If a second sample D' were collected (say ti, t 2 ... , t~) then the distribution of X could be updated to incorporate both sets of , data. If p(XID,D') represents the posterior distribution of X based on both sets of data D and D' then we simply use the above equation with p(AID) now as our prior giving
p(XID, D') = K exp [
t
Ati ] An p(AID)
1=1
Choices for the initial prior peA) as well as techniques for handling various kinds of data are described in detail in various texts (see references [24] and [30]). In Bayesian approaches, the pdfs obtained for the parameters (such as p(A/D» give detailed information about the possible variability and uncertainty in these parameters. We can obtain point values, such as the most likely value of X or the mean value of A. We can also obtain interval values, which are probability intervals and are sometimes called Bayesian confidence intervals. For example, having determined p(XID) we can then determine lower and upper 95% values of AL and AU' such that there is a 95% probability that the failure rate lies between these values, Le.,
f
AU p(A/D) dA = 0.95.
XL
Other bounds and other point values can be obtained in the Bayesian approach because the parameter distribution (e.g., p(XID» is entirely known and this distribution represents our knowledge about the parameter. The Bayesian approach has the advantage that engineering experience and general knowledge, as well as "pure" statistical data, can be factored into the prior distribution (and hence posterior distribution). Once distributions are obtained for each relevant component parameter, such as the component failure rate, then the distribution is straightforwardly obtained for any system parameter quantified in the fault tree, such as the system unavailability, reliability, or mean time to failure. One must be very careful in determining the priors which truly represent the analyst's knowledge and in ascertaining the impact of different priors if they are all potentially applicable. Bayesian approaches are further discussed in reference [24].
CHAPTER XI  FAULT TREE EVALUATION TECHNIQUES
1.
Introduction
This chapter describes the techniques which form the bases for manual and automated fault tree evaluations and discusses basic results obtained from these evaluations. Once a fault tree is constructed it can be evaluated to obtain qualitative and/or quantitative results. For simpler trees the evaluations can be performed manually; for complex trees computer codes will be required. Chapter XII discusses computer codes which are available for fault tree evaluations. Two types of results are obtainable in a fault tree evaluation: qualitative results and quantitative results. Qualitative results include: (a) the minimal cut sets of the fault tree, (b) qualitative component importances, and (c) minimal cut sets potentially susceptible to common cause (common mode) failures. As previously discussed, the minimal cut sets give all the unique combinations of component failures that cause system failure. The qualitative importances give a "qualitative ranking" on each component with regard to its contribution to system failure. The common cause/common mode evaluations identify those minimal cut sets consisting of multiple components which, because of a common susceptibility, can all potentially fail due to a single failure cause. The quantitative results obtained from the evaluation include: (a) absolute probabilities, (b) quantitative importances of components and minimal cut sets, and (c) sensitivity and relative probability evaluations. The quantitative importances give the percentage of time that system failure is caused by a particular minimal" cut set or a particular component failure. The sensitivity and relative probability evaluations determine the effects of changing maintenance and checking times, implementing design modifications, and changing component reliabilities. Also included in the sensitivity evaluations are error analyses to determine the effects of uncertainties in failure rate data. Listed below is a summary of the type of results obtained from a fault tree evaluation. In the follOWing sections we discuss fault tree evaluation in more detail. Classes of Results Obtained From a Fault Tree Evaluation: Qualitative Results a. b. c. Minimal cut sets: Qualitative importances: Common cause potentials: Combinations of component failures causing system failure Qualitative rankings of contributions to system failure Minimal cut sets potentially susceptible to a single failure cause
Quantitative Results a. b. Numerical probabilities: Quantitative importances: Probabilities of system and cut set failures Quantitative rankings of contributions to system failure
XII
XI2
FAULT TREE HANDBOOK
Sensitivity evaluations: Effects of changes in models and data, error determinations
c.
2.
Qualitative Evaluations
For the qualitative evaluations, the minimal cut sets are obtained by Boolean reductIon of the fault tree as previously described in Chapter VII, section 4. Additional examples of minimal cut set determinations are given in this section to further familiarize the reader with Boolean reduction techniques. The minimal cut sets obtained are used not only in the subsequent qualitative evaluations but in all the quantitative evaluations as well.
(a) Minimal Cut Set Determinations
Because the minimal cut sets form the bases for all types of evaluations considered here, we shall briefly review the determination of minimal cut sets from the tree. In general, as stated in Section 4 of Chapter VII, our goal is to obtain the top event in the minimal cut set form
The minimal cut sets Mj consists of combinations of primary failures (component failures), e.g., Mj = C1 C2 C3 , which are the smallest combinations of primary failures that cause system failure. The substitution can be either a topdown substitution or a bottomup substitution as previously described. Most computer algorithms for determining minimal cut sets are based on these principles. (Computer codes are discussed in Chapter XII.) Let us now consider the pressure tank fault tree of Figure XI·I. Figure XI} is similar to the detailed pressure tank fault tree constructed in Chapter VIII and shown in Figure VIII13. In Chapter VIII, minimal cut sets were determined for a reduced version of the tree. We will determine here the minimal cut sets of the detailed tree as a first type of qualitative and/or quantitative evaluation. In the fault tree in Figure XI} we designate primary failures by PI' P2 , ... in circles; secondary failures by SI' S2 ... in diamonds; undeveloped events by E1> E2 ... in diamonds; and all higher fault events by G I , G2 ... except for the top event which we designate T. The set of Boolean equations equivalent to the fault tree of Figure XI} is:
T = PI + Sl G1 = E 1 + G2 G2 =P Z +Sz G 3 = G4 • Gs G4 = G6 + G7 Gs = P3 + S3 G6 = P4 + S4 G7 = P5 + Ss Gg = P6 + S6
+ GI
+03
+ E3 + E4 + Gg + E6 .
Note that we have one equation for each gate in the tree.
FAULT TREE EVALUATION TECHNIQUES
XI3
Figure IXI. Pressure Tank Fault Tree
and thus becomes Substituting for G4 and GS in the equation for G3 we have Using the distributive law we now express G 3 in its expanded form: G3 = (P4 • P 3) + (P4 • 8 3 ) + (P4 • E 3 ) + (8 4 • P 3 ) + (S4 '8 3 ) + (8 4 • E 3 ) +(E 4 ' P 3 )+(E4 ' 8 3 )+(E4 • E 3 ) +(Ps' P3 )+(ps' 8 3 )+(ps • E 3) + (Ss • P3) + (8 s • S3) + (8 s • E 3) +(P6· P 3)+(P6' S3)+(P6 • E 3) + (S6 • P 3) + (8 6 • 8 3) + (S6 • E 3) + (E 6 • P 3) + (E 6 • 8 3) + (E6 • E 3)· ~w~=~+~+~'and~se~~~ G 1 = E 1 + P2 + S2 + G3 sothatT=P 1 +SI +E 1 +P2 +S2 +G 3 · Substituting the expanded form of G3 into this last equation. which are alieady in minimal cut set form. thus contains: 5 single component minimum cut sets. and 24 double component minimum cut sets. The top event. Using the distributive law and the law of absorption we will then transform each gate equation to minimal cut set form. or system model. G7 contains only the higher fault Gs and so substituting it becomes which is now in minimal cut set form. Both G6 and G s are already in minimal cut set form. G4 contains G6 and G7 .XI4 FAULT TREE HANDBOOK We will use the bottomup procedure and write each gate equation in terms of primary events (the P's. E's. we fmally have T in correct minimal cut set form. GS is already in minimal cut set form. and S's) by substituting for the G's. .
Component failure probabilities are in general different and depend on testing intervals.) can also be obtained on a selective basis if they show potential susceptibility to common cause failures (discussed further in the next section). For example. if we wish. we can indicate the susceptibility that compone!!t failures may . and 10 double component minimal cut sets. or one steam line break may cause all instrumentation to fail in a control panel. Similar checks can be done on criteria that restrict specific types of failures which may not individually fail a system (e. we do not know which failures will be common cause failures. active components or human errors). some idea of failure importances can be obtained by ordering the minimal cut sets according to their size. more basic cause may result in multiple failures which fail the system. double. Because required computer times can increase dramatically as the size of the minimal cut sets increase. downtimes. As an additional calculation. (c) Common Cause Susceptibilities The primary failures (component failures) on a fault tree do not necessarily have to be independent. the ranking according to size gives a gross indication of the importance of the minimal cut set. the savings in number of minimal cut sets will be even greater if we do not explicitly show the secondary failures. however.. The singlecomponent minimal cut sets (if any) are listed first. In evaluating a fault tree. a singlecomponent cut set probability will be o(the order of 10 3. For example. then the triple. A single. then the doublecomponent minimal cut sets. etc. Computer codes usually list the minimal cut sets in this order. The minimal cut sets can be checked to see if this criterion is satisfied. etc. it is often the practice to obtain only the single. if individual component failure probabilities are of the order of 10 3. etc. Multiple failures which can fail the system and which can originate from a common cause are termed common cause failures. then we do not really need to explicitly show them. and perhaps triplecomponent minimal cut sets. The minimal cut set information can sometimes be used directly to check design criteria. With larger trees. separate the causes in the quantitative analyses. an operator may have miscalibrated all the sensors. By deleting the S's in the fault tree. a triple 10 9. and a double cut set 10 6. we delete all the cut sets containing S. higher order minimal cut sets (quadruples. (b) Qualitative Importances After obtaining the minimal cut sets..no single component shall fail the system.FAULT TREE EVALUATION TECHNIQUES XIS We note that if the S's denote secondary failures. then this is equivalent to stating that the system shall contain no single component minimal cut sets. We now have 3 single component minimal cut sets. etc. Because the failure probabilities associated with the minimal cut sets often decrease by orders of magnitude as the size of the cut set increases. For example. therefore the ranking of minimal cut sets according to size gives only a general indication of importance. if a design criterion states that. All we need to do is to redefine the primary failures P to denote any type of cause and we can.g.
we then defme specific "elements. Having performed this coding. The list below gives some example categories which might be considered in a common cause susceptibility evaluation. Our next task in the common cause susceptibility evaluations involves component coding. for the category "Manufacturer" the elements would be the particular manufacturers involved which we might code as "Manufacturer I.XI6 FAULT TREE HANDBOOK have to a common initiating cause.e. List of Common Cause Categories to be Evaluated Manufacturer Location Seismic Susceptibility Flood Susceptibility Temperature Humidity Radiation Wearout Susceptibility Test Degradation Maintenance Degradation Operator Interactions Energy Sources Dirt or Contamination For each common cause category. The categories and elements can be indexed or keyed according to any convenient coding system." For example. i. For example. we are interested only in those common causes which can trigger all the primary failures in a minimal cut set. environment. For the "Location" category." "Manufacturer 2. the top event ocCurs. Examples of common cause categories include manufacturer. having element 8 associated with category 2. and for more specificity we would defme acceleration ranges where failure would likely occur.. Therefore. and element 3 associated with category 3 ("183"). "MV2183" may denote manual valve 2 ("MV2") having element 1 associated with category 1. energy sources (not explicitly shown in the tree). system failure occurs. For the category "Seismic Susceptibility. This kind of naming can easily be coded on the fault tree and in any subsequent computer input." we might defme several sensitivity levels ranging from no sensitivity to extreme sensitivity. for each component failure we denote the element of each category associated with the component. Now by defmition." etc. A cause which does not trigger all the primary failures in a minimal cut set will not by itself cause system failure. we might divide the plant into a given number of physical locations which would be the elements. The minimal cut sets which are potentially susceptible to common cause failures . which are general areas that can cause component dependence. To identify minimal cut sets which are susceptible to common cause failures we can first define common cause categories. As part of the component name code or in associated component description fields. if all the primary failures in a minimal cut set occur. and humans. we can then identify the potentially susceptible minimal cut sets among the collection of minimal cut sets determined for the fault tree.
This final screening may be based on past histories of common cause occurrences. we consider only failure probability models for which either a constant failure rate per hour applies or a constant failure rate per cycle applies. and fmally the system. a factor of lOis required. Quantitative measures of the importance of each cut set and of each component can also be obtained in this process. i. Afterwards. the reader is referred to [12] and [17] for discussions of these other models. Let the constant failure rate per hour be denoted by A. If the failure rates are treated as random variables. which we shall simply call the Amodel. probability. (b) Constant Failure Rate Per Hour Model: Probability Distributions Consider first a component whose failures are modeled as having a constant failure rate per hour. diamond. We first discuss the usual "point estimations" where one value is assigned to each failure rate and where one value is obtained for each minimal cut set probability and for the system probability. and/or engineering judgement.e. Where timedependent effects such as bumin or wearout are important or where precision better than. say. etc. for example. Chapter XII. we then need to finally screen these cut sets to determine those which. Having identified the potentially susceptible minimal cut sets. we will discuss random variable analyses. For any component. then random variable propagation techniques can be used to estimate the variabilities in system results which result from the failure rate variations. etc. some sort of quantification analysis. then we are assuming that failure probabilities are directly related to component exposure times. The Amodel is the model most often used in fault tree quantifications. For the Amodel. test and maintenance. we ignore any timedependent effects such as component burnin and component wearout. The longer the exposure time period. or environmental.may require further action. The quantitative evaluations are most easily performed in a sequential manner.FAULT TREE EVALUATION TECHNIQUES XI7 are those whose primary failures all have the same element of a given category. discusses computer codes which can perform the initial searches. top event. 3. section 6. the resulting first failure probability distributions are exponential distributions. When we use the constant failure rate per hour model. the higher the probability of failure. such as contamination. We shall review the distribution properties to refresh the reader's .). The failure causes can be human error. In using these constant failure rate models. (a) Component Failure Probability Models By "component" we mean any basic primary event shown on the fault tree (circle. This last step is the most difficult and most time consuming. Quantitative Evaluations Once the minimal cut sets are obtained. first determining the component failure probabilities. The constant failure rate models which we discuss are usually used for order of magnitude results. These more sophisticated models include. then the minimal cut set probabilities. Weibull and Gamma failure distribution modeling. then more sophisticated models are required. corrosion. probability evaluations can be performed if quantitative results are desired..
the quantity I . .F(t) is called the complementary cumulative probability. and we will not explicitly state it in the forthcoming discussions. The probability F(t) that the component suffers its first failure within time period t. is: F(t) = I . which is the probability of no failure in time period t (given the component is initially working): IF(t) = eAt . where the interval ~t approaches zero. For the exponential distribution. F(t) is called the component unreliability.XI8 FAULT TREE HANDBOOK memory. This "initially working" assumption is applied to all the calculations. Estimates of the constant failure rate A are given for a variety of components in various data sources.eAt . In reliability terminology. In reliability terminology. is the probability that the component does not fail in time period t but then does fail In some small interval ~t about 1. we again assume the component is initially working at the beginning of the period.F(t). Table XII gives some representative failure rates for various component failures.F(t) is the component reliability and is denoted by R(t): R(t) =1. the quantity f(t)~t. (XI2) In statistical terminology. As part of the defmition of f(t)~t.F(t). (XII) The quantity F(t) is the cumulative probability distribution as discussed in section 3 of Chapter X. I . The analyst needs to obtain a value of A for each component failure on his fault tree for which the constant failure rate model is applied. (XI3) The density function denoted by f(t) is the derivative of F(t). The complementary quantity to F(t) is the quantity I . the density function f(t) is: f(t) = Ae At. the data are taken from WASHI 400 (reference [38]). given it is initially working. (XI4) The constant failure rate per hour model is so named because a formal calculation of the timedependent failure rate A(t) simply gives the constant value A.
3.. 10 4 3>10'· 3"...10 7 3.3 /0 3.0 9 1x10 7 lxl0 9 ..10· 6 'HR ".0 '/0 " 1O· 5 .10.10 8 "10...hne NO Contact to Close RELAYS 5hort AcrolS NO/NC Contact Open NC Contact 3x:l0 7 HR 1.3"0 6 3. 2 ]. To Run Normal Faitur..10 ' 110.3.10.08_ lx 10.04 ". 10 6 Ext.8 . .. Rupture Val_ (Manual) Failure To Remain Open (Plug) Fall To Open/D Va'" (Ralief) Premature Open/H R PipoV 3" HI Quahly Rupture (SectIon) Rupture L.J .7 lx 10...'0 7 3.3 "108. 2 .4 "10.108·3.9 /HR 3. c. ..5 /0 3. D lx 10'/HP. ~..10· lO lxl0 7 "'0..0. 1O /HR "1O· 9 /HR 3'l06/HR 3.2 /0 3.0.4 .10... Supplies NO/Output 1x 10. 6 HR 3. 10 "103/HR To Operata " 103/D 1x 10..lO· 7/HR 3xl0. .D "10.FAULT TREE EVALUATION TECHNIQUES Table XII.4 3 . '"Manual.6 /HR " 1O· 6'HR "1O· 7/HR .10.0.1()"7/HR 3.05/HR 3..04 3.. Shift T.0· 8 /HR 4 " 10 /0 3..10.HR lx10 4 ·0 J Ext.04/HR 3.10 4 "'0.106 /HR "108IHR 3."10 5 1...librltion.4 D 11( 10.t t: Pressure Fill to OPEA Ruptur. 5 ". 6 3~1O·~ 1.1()"4/l'J lx 10.HR lx 1O.10.n11llukRupture Failure To Operate Va'" (Vacuum) Torque: Fill to OPER 3. 104 "104.3.10.10. 9 3. 4 1.0 3 3.10... 4 lx 10 3 3..'0 6 3x 10· 7311 iO· 6 Pr."'0.lx '0.10.10 4 3. 6 lx 10.3 3x 106·3.0.Joe~l:D Failura To Start Pumps Failur.0 4 ".::>5l a:C wZ Failure to Operate CLUTCH ELEC Premature Open Failure to Open CLUTCH MECH Failure to Operate 3..10.10 3 ::> '" « . E.04 .. Fill to TRANS Contacts Short Failure to Operate w '" cl 0 1x 10" 'D hl0 8 1(R ~ 11110 9 lx 10.kRupture 1)( 10.2 3x 10. Annuciaton.10 ' FI.3..5 "10.3 /HA 3x.0 7 3. 5 3...7 1x1O. ~tow Met.0· 6 /HR lx 1O· 6 'HR .: Wakll  Laak/Rupturo Laak Short to PI'!R Open CKT TRANSFORMERS Short Falls to Function HI PWR Application Shorts SOLID STATE DEVICES Fails To Function low PWR Application Shorts 1.::> Cl a: Z w ~~ ~z a: w .3.'0 5 ". ..'0 8 3.0 5 3..0.k CIRCUIT BREAKERS Prlm.6 3. 6 " FaUura to ~o Comb_) InstrurMntatton O~" (Amplific8tion.10.. "..4 /0 SCRAM Failure to Insert (Single Rod I RODS Failure to Start ELECTRIC MOTORS Failure to Run Failure to Run (Extreme ENVIR 1 Failure to EnergIze Val_: MOV_ IPlug) Failure To Remain Open 3.10.t wcl <t:li lx 1()"4 .10 5 /D " 10·8 /HR " 1O· 8 'HR " 10. lx10 3 .6 .10 5 Pipas< 3" hl0 6 . 10 1 3.10 4 1.1 Leek Or Rupture Val_ (SOV) Fills To Operata Fails To Operate Val_ (AOV) (Plug) Failure·To Rlmlin Open "loB/HR ".. 3.5 3. 4 /0 1x10· 4 . 10 6 J.2 3.._ F8ilurlto Loechl Run ~o 0_1 (Engi.104.a Plant) (E.10.10..06/HR 3.0.'0 . Open FUSES Filiure to Open Open 3. 1' 3. 2 3."10..6 /HR "10.9 1x 10 7 '. hlO· 5 Gaskltl WIRES Short to GND 3.0 6 ~=. Representative Failure Rates from WASH1400 XI9 ..s....10. 5 3. (ChIct<) Rev«se leek 1xl0 7 HA Limit: Failure to Operate ~ 3.lves: Onfices.10' .7 . 9 1."0·9_ 1x '0 7 3"0 7 3111O· 6 3"0 7 ..4 :D 1x 10.10 4 1xl0 9 1xl0 7 3. 10' 11(10 7 '" 0 ::E w Cl :I: U w a: ::> .8 HR 1. 4 /0 ".3.ies Pow.3 D ". HR 3..6 3.10. 5 ."10 3>10 8 3"0 6 lx 10. 6 3. 3 ".10 5 3.7/HR ]J( 10 4 _ 3.4 "10.5 .10.10 5 3"0 4 3.. 3 lit 10 7 lx 10.. Z w .9 . 10" hlO· 7 h10 5 Failure to Start Od' 0 _ (Com.. 3 3. Z ::Ez lll~ w. 3.06/HR 3..ture Transfer 3"0 7 3.10 . To Run Extrlml ENV ~ail. Only) Failure to Run ~o ' 3.104 Batt.104 10 lxl0 41 D "...mlture.10 4 1x'07 1x 10..J IE w (Tilt) V.4 0 Ext... 8 .10..10.10..10.10 7 hlQ 4 1l( 10 3 Failure To Open Val.0."10. .HR 3.8 M'R 1. '0.t a: wCl lllw o "'0 . 5 ~~ t::> g"..ll.10' '/HR "10· 8 ..4 .3 /D 3.10..
XItO
FAULT TREE HANDBOOK
Extreme precision is not required (and is not believed!) in a fault tree evaluation; it is the order of magnitude size of the failure rate that is of concern, i.e., is the failure rate 10 6 per hour or 10 5 per hour. To this "order of magnitude" precision, detailed environments and detailed component specifications are often not important in obtaining gross estimates of the failure rate. The analyst, however, should of course use all the available information in obtaining as precise an estimate as he can for Afor each component or basic event on his tree. Because extreme precision is not required in a fault tree evaluation, the exponential distribution can be approximated by its fust order term to simplify the calculations. The cumulative exponential distribution, i.e., the component unreliability, is thus approximated as: F(t)
== At.
(XIS)
The above approximation is accurate to within 5% for failure probabilities (F(t» less than 0.1 and the slight error made is on the conservative side. Furthermore, the approximation error is small compared to uncertainties in A. The failure rate A used in the Amodel can be either a standby failure rate or an operating failure rate; data sources give both types of rates. If A is a standby failure rate, then the time period t used in Equation (XIS) should be the standby time t, Le., the time period during which the component is "ready" for actual operation. For these standby situations, F(t) is the probability that the failure will occur in standby. If A is an operating failure rate, then t is the actual operating time period and F(t) is the probability that the failure will occur in operation. Many components will have both a standby failure rate and an operating failure rate; for example, a pump will have a standby failure rate when it is not operating and will have an operating failure rate when it is operating. The analyst must ensure that the proper failure rate is used with the proper time period. For a standby and an operating phase the total failure probability is
where the subscript "s" denotes the standby phase and the subscript "0" the operating phase. For small probabilities (less than, say, 0.1) we can thus simply sum the failure probabilities.
(c) The Constant Failure Rate Per Hour Model: Reliability Characteristics
As stated in the previous section, the quantity R(t) = I  F(t) is the probability of no failure in time t and is termed the component reliability. The component unreliability F(t) is the probability of at least one failure in time period t. It is also the probability of first failure in time t. These definitions of F(t) incorporate the pOSSibility of more than one failure occurring if the component failures are repairable (the reason for the phrase "at least one failure"). If the component failure is not repairable, then at most one failure will occur.
FAULT TREE EVALUATION TECHNIQUES
XIll
When we say failures are repairable, we mean that the component is repaired or replaced when it fails. The repair or replacement need not begin immediately after failure, and when begun will require some time to complete. The repair or replacement operation can be characterized by the downtime of the component denoted by d, which is the total period of time the component is down and unavailable for operation. For a standby component d is the downtime during which a demand on the component may occur. If the plant is shutdown sometime after the failure occurs then d is only that period of online downtime in which the component could still be "required to operate (to respond to, say, an accident). The cumulative distribution for the downtime denoted by G(d) is defined as follows: G(d) = the probability that the downtime period will be less than d. (XI6)
The cumulative distribution G(d) is obtained from experience data on repairs/ replacements and completely defines the repair/replacement process for quantitative evaluations. Let q(t) be the component unavailability and be dermed as: q(t) = the probability that the component is down at time t and unable to operate if called on.
(XI7)
1  q(t) is the component availability and is the probability that the component is up and able to operate were it called on. If component failures are not repairable, then the component will be down at time t if and only if it has failed in time 1. Consequently, for nonrepairable failures, where the component is up at t = 0, the unavailability q(t) is equal to the unreliability F(t):
q(t) = F(t); nonrepairable failures. (XI8)
For the exponential distribution, the unavailability q(t) is thus simply given by the approximation: q(t) ~ At. (XI·9)
For nonrepairable failures, the constant failure rate A is thus all that is needed to calculate the basic component characteristics F(t) and q(t) which are used in a fault tree evaluation. For repairable failures, the component unavailability q(t) is not equal to the unreliability and we need information on the repair process* to calculate q(t). We will assume that the repair restores the component to a state where it is essentially as good as new. This assumption is optimistic but is usually made. The effects of test inefficiencies can be investigated by more sophisticated analyses. (For other treatments see references [37] and [41].)
*From here on, we will speak only of "repair" but we will mean repair or replacement.
XI12
FAULT TREE H~BOOK
For repairable failures, we consider two cases: (1) when failures are monitored, and (2) when failures are not detectable until a periodic surveillance test is performed. For the monitored case, when a failure occurs an alarm, annunciator, light, or some other signal alerts the operator. In this case, the unavailability q(t) quickly reaches a constant asymptotic value qM which is given by: q
= XTD
M

(XIlO)
1+ XT D
=:: XT D ·
(XIll)
The failure rate X is the standby failure rate and the quantity T D is the average online downtime obtained by statistically averaging the downtime distribution (described by the cumulative distribution G(d». The downtime which is evaluated is again the online downtime during which the system is up and the component may be called on (for example, to respond to an accident situation). For simplified evaluations, the downtimes can often be broken into several discrete values with associated probabilities and a statistical average taken over the discrete values. The approximation given by Equation (XI· 11) is conservative and is within 10% accuracy for XTD < 0.1. For components which are not monitored but which are periodically tested, any failures occurring are not detectable until the test is performed. This is the situation, for example, when surveillance tests are performed monthly; any failure which occurred during the past month would be detected only when the test is performed. (We assume perfect testing here, in that essentially 100% of the failure modes are detected.) For periodic tests performed at intervals of T, the unavailability rises from a low of q(t = 0) = 0 immediately after a test is performed to a high value of q(t = T) = I  e ?\ T :: XT immediately before the next test is performed. Because the exponential can be approximated by a linear function (for XT < 0.1 say) the average unavailability between tests is approximately XT/2. The average value is applicable for fault tree evaluations if we assume that a demand on the component may occur uniformly at any time in the interval. If the component is found failed at a surveillance test, then it will remain down during the necessary repair time. Considering this additional repair contribution, we have the following expression for the total average unavailability qT for periodically tested components: (XI12) In the above formula, X is again the standby failure rate per hour and T R is the average repair time obtained from downtime considerations. The repair time evaluated is again the online repair time during which the component may be called on to function. The subscript "R" is used on TR to denote that it is the average repair time and not the total downtime which is the sum of the repair time plus the undetected downtime from time of failure to time of detection.
FAULT TREE EVALUATION TECHNIQUES
XII 3
In general, T R is small compared to T and the second term on the right hand side of Equation (XII 2) is negligible; we thus have: (XI13)
For repairable failures, the unavailability is thus given by qM or qT depending on whether monitoring exists or periodic testing is performed with no monitoring between tests. (If monitoring is performed, qM applies regardless of whether any additional periodic testing is performed.) For each repairable component of the tree, X and T D (monitored) or X, T R' and T (periodically tested) are required as data inputs. Failure rate data sources supply X and operating specifications for the component are sources for T R , T, and T D . In addition to the component unavailability, there is one component reliability characteristic which is of interest when an operational system is being evaluated. This component characteristic is the component failure occurrence rate wet) and is defined such that: w(t),M = the probability that the component fails in time t to t + .11.
(XII4)
In the definition of wet), we are not given that the component has operated without failure to time t as was the case for the failure rate definition X(t) (see Section 8 of Chapter X). In fact, if the component is repairable, then it can have failed many times previously; the quantity w(t).1t is the probability that it fails in time t to t + .1t irrespective of history. The occurrence rate w(t) is applicable for both nonrepairable and repairable components. For both repairable and nonrepairable components, the expected number of failures in some time interval t 1 to t 2 , denoted by n(t 1 , t 2 ), is given by the integral of wet) from t 1 to t 2 : (XIIS)
For nonrepairable component failures, the component can only fail once. Therefore, wet) is equal to the probability density function for first failure: wet) = f(t) (XII 6) (XI·I?)
= Xe Xt
where Equation (XII?) is for the constant failure rate model (the Xmodel). For time t small compared to I/X (such that Xt < 0.1), e Xt is approximately 1, and hence Equation (XII?) becomes, w(t):::: X, Xt < .1. (XII8)
the constant failure rate per hour model. and component unavailability. the constant failure rate per hour model. For n demands in time t and assuming independent failures. (Some of the computer codes discussed in the next chapter can compute the time dependent values ofw(t). etc. wet) can be a complex function of time. This probability of failing per cycle. Failure rate data sources sometimes indicate the appropriate models../n (The subscript c in R c and qe denotes "cyclic" values. is applied when failures are inherent to the component and are not caused by "external" mechanisms which are associated with exposure time.e. the component is assumed to have a constant probability of failing when it is called on (i.XI·14 FAULt TREE HANDBOOK For repairable failures.1. wet) = A is generally a reasonable approximation. or per demand.. For cyclic failures.) . (XI19) Hence for both nonrepairable and repairable failures. is most applicable for his analyses. we can use a constant failure per cycle model. the reliability (Re) and the unavailability (qc) are given by: (XI20) or 1. when it is "cycled"). and this asymptotic value of X is generally precise enough for applications: wet) ~ A. The constant failure rate per cycle model.). is independent of any exposure time interval. otherwise. which is the probability of failure per cycle. which we shall simply call the pmodel. has been applied to the majority of components.) (d) Reliability Characteristics for the Constant Failure Rate Per Cycle Model Instead of modeling component failures as having a constant failure rate per hour. the pmodel or the Amodel. Le. In past practice. however. For example. F(t). it approaches A as time progresses. We again use the definition of component unreliability. as given by Equations (XI8) through (XIll). whereas the Xmodel. many of the inherent component failures would be detected and the failures might then be best modeled by the Amodel. the pmodel has been applied to relatively few components. such as the time interval between the test or the time that the component has existed in standby. burnin testing). q(t). np < 0. In the constant failure rate per cycle model. After preoperational testing (Le. The reliability characteristics for the pmodel are straightforward and are all based on the one characterizing value p. The analyst must decide which component model. the analyst must decide based on knowledge of the failure causes and mechanisms.. (YI. which we denote by p.R~ = q" ~ np. a component which is obtained from a manufacturer and immediately placed in the field may be modeled as having a certain failure probability p due to manufacturing defects. the cycling of the component may actually cause the failure (because of stress.
).g. we note that 1 . (Equation (XIlO) or (XIH)). For one demand (n = 1). e. having respective unavailabilities of 1 x 10. in which we ignore any timedependent transient behaviors. For each component on the fault tree modeled by the pmodel. We can index each minimal cut set of the fault tree in any way we choose. For a fault tree of a standby system. As an example in applying Equation (XI24).= qc = p. q2(t). such as a nuclear safety system. the reliability characteristics for the minimal cut sets can be evaluated...FAULT TREE EVALUATION TECHNIQUES XIIS As noted in the above equations.. the minimal cut set is an intersection of the associated component failures..2 and 1 x 10 3. If the components in the minimal cut set are all repairable or cyclic. if a minimal cut set has two components. one component can be periodically tested and the other can have a cyclic failure rate. by its definition. any combinations of component unavailabilities can be used (e. then constant values can be used for the component unavailabilities. and then Qi(t) would be the unavailability for minimal cut set L To determine Qi(t) we note that.cut set. Hence: (XI24) where ql (t). etc. (XI22) Because a minimal cut set can be viewed as being a particular failure mode of the system we can also define Q as: Q(t) = the probability that the system is down at time t due to the particular minimal. (XI23) We can thus also call Q(t) the system unavailability due to a minimal cut set. the minimal cut set occurs if and only if all the component failures occur.R c . recall from Chapter VII (Equation VII3) that the probability of an intersection (Le. the reliability and unavailability do not depend explicitly on time but on the number of cycles (demands) occurring in that time. the user must obtain the appropriate value of p and the number of demands (usually one). For this completely . (e) Minimal Cut Set Reliability Characteristics Once the component reliability characteristics are obtained.g. Assuming the component failures are independent. are the unavailabilities of the component in the particular minimal cut set and ni is the number of components in the cut set. then the cut set unavailability is The component unavailabilities are those given in the previous sections. the characteristic of principal concern is the minimal cut set unavailability denoted by Q: Q(t) = the probability that all the components in the minimal cut set are down at time t and unable to operate. an AND gate) is simply the product of the component probabilities. etc.
The occurrence rate itself. Assuming independence of the component failures Wj(t) is thus given by: (XI27) where the q(t)'s are the component unavailabilities and the w(t)'s are the component occurrence rates (Equation (XI. The contribution from each of the componen~s failing in time t to t + ~t is summed over the nj components in the cut set to obtain the total . A minimal cut set characteristic giving information on these reijabilityrelated concerns and one which is easily calculable is the minimal cut set occurrence rate denoted by Wet)." A minimal cut set failure occurs at time t to t + ~t if all the components except one are down at time t and the other component then fails at time t to t + ~t. then instead of the unavailability.XI16 FAULT TREE HANDBOOK repairable or cyclic case and within our approximations. The first term on the right hand side of Equation (XI27) is the probability that all components except component 1 are down at time t and then component I fails. etc. Wet). the minimal cut set unavailability is then simply a constant and independent of time. is thus a probability per unit time of the minimal cut set failure occurring. (XI25) The quantity ~t is a small increment of time. The second term is the probability that component 2 is the component that fails in time t to t + ~t. The minimal cut set occurrence rate Wet) is defmed such that: W(t)~t = the probability that the minimal cut set failure occurs in time t to time t + ~t. all other components being already down. To calculate Wi(t). then Wlt) refers to the occurrence rate of minimal cut set i. the number of system failures and the probability of no system failure are of primary interest. 14». If the fault tree is of an operational system. (XI26) If we index all the minimal cut sets of the tree. Because a minimal cut set can be considered a system failure mode we can equivalently define W(t)~t to be: W(t)~t = the probability that a system failure occurs in time t to t + ~t by the particular minimal cut set. we use the basic definition of a minimal cut set and the concept of an "occurrence.
less than 0.t 2) is the expected number of times the system fails in time period t 1 to t 2 . then Ni(t 1 . and k(t) is the probability of a demand occurring per unit time at time t.t2) of failures of minimal cut set i occurring during some time period from t 1 to t 2 is: (XI29) If the components in the minimal cut set are all repairable. where p is the cyclic component failure probability. The above equation can be applied to minimal cut sets having cyclic components. The expected number Ni(t 1 . Even when components are repairable.FAULT TREE EVALUATION TECHNIQUES XII' occurrence rate given by Equation (XI27). and constant values are used for the component unavailabilities and occurrence rates (ignoring any transients). net) is the (expected) number of demands in time t. Wi(t) = Wi' In this constant value case Ni(t 1 . The At's cancel in Equation (XI27). if we use for the cyclic components qc(t) ~ n(t)p and wc(t) ~ pk(t). as previously stated. then Wi(t) is a constant. giving: (XI28) The minimal cut set occurrence rate Wlt) is strictly applicable when all the components have a per hour failure rate (the Xmodel).) The quantities net) and k(t) must be obtained from operational considerations for the component. do not have any explicit timeassociated behaviors.t2) is less than 1. due to minimal cut set i. When all components are nonrepairable then Ni(t 1 .t2) is also the probability that the minimal cut set fails in the time period t 1 to t2 (the expected number being equal to the probability in this case).t 2) is simply equal to the time interval multiplied by the constant minimal cut set occurrence rate: (XI30) Because the system fails every time the minimal cut set fails (by the minimal cut set definition). Even though Nlt 1 .1). Ni(t 1 . Cyclic components (the pmodel). (The quantity k(t)At is thus the probability that a demand occurs in time t to t + At.t2) is strictly the expected number of failures. it is also a reasonably good approximation for the probability of the minimal cut set failing in time period t 1 to t2' This approximation in the repairable case is conservative (the true probability is . if the system failure probability over the time period of interest is small (say.
(XI32) For a standby system.1: NiCt1 . then Ni (t 1 . the actual achieved values of Qi(t) and Wi(t) may be quite higher and will be much more difficult to estimate.. is being evaluated. may be quite higher than the calculated values of Qi(t) and Wi(t) because of dependencies among the component failures.t2) ~ minimal cut set unreliability in time t 1 to t2' (XI31) In terms of system failure.. the minimal cut set characteristics are generally only computed for the lower order cut sets. the determination of the system characteristics is quite straightforward. The minimal cut set characteristics need to be calculated for all the dominant minimal cut sets of the tree. the minimal cut set unavailabilities. Qi(t). We can thus say. Wi(t). If the top event of the fault tree is not a system failure but . Qs(t) is the most critical system charaO{eristic. generally agreeing to within 10% of the true probability for Ni(tl . singlecomponent and doublecomponent cut sets. (See for example reference [12] for further considerations. when Ni(t 1 .t2) is the system unreliability due to minimal cut set i.g.t2)) and fairly accurate. when Ni (t 1 h) < 0. then only the cut set unavailability Qi(t) is usually calculated. the exact minimal cut set and system unreliabilities are generally difficult to' calculate and Ni(t1.) (D System (Top Event) Reliability Characteristics Once the minimal cut set characteristics are obtained. For small numbers of minimal cut sets.t2) consequently provides a useful and generally accurate approximation which is simple to calculate and good enough for most applications. For fault trees having very large numbers of minimal cut sets.1. the characteristics can be calculated for all the cut sets of the fault tree.1. the values of Qi(t) and Wi(t) for higher order cut sets (e.XII 8 FAULT TREE HANDBOOK less than Ni(t 1 . For a fault tree evaluation. provide comprehensive information on th. Because component failures are assumed to be independent. such as a nuclear safety system. and the minimal cut set occurrence rates.t2) < 0. triples and higher) are generally negligible compared to those for the lower order cut sets (singles and doubles). using the above reasoning. If a standby system. The system unavailability denoted by Qit) (the subscript "s" denoting "system") is defmed as: Qs(t) =the probability that the system is down at time t and unable to operate if called on.e probabilistic behavior of the minimal cut sets. If the fault tree has only double cut sets and higher. The independence assumption represents an optimal condition and the true cut set characteristics. Using our previous nomenclature. When some components are repairable. then the calculated values for Qi(t) and Wi(t) represent design capability numbers useful principally for relative evaluations. for doubles and up.t 2) is thus also approximately the probability of system failure in time t 1 to t2 due to minimal cut set i. such as a nuclear safety system.g. that Ni(t 1 .t 2) < 0. e. within our approximation. the probability of minimal cut set failure is the minimal cut set unreliability and hence.
The system failure occurrence rate then.) The system failure occurs if and only if anyone or more of the minimal cut sets occurs. then Qs(t) is independent of time and is simply a constant value Qs' For online operational systems. (XI34) The occurrence rate itself. Equation (XI33) is usually used in fault tree evaluations. can be expressed as the sum of the minimal cut set occurrence rates Wi(t): Ws(t) = I: Wi(t). If we ignore the possibility of two or more minimal cut sets being simultaneously down. Now. i=l N (XI3S) Equation (XI3S) is another application of the "rare event approximation. any error made is on the conservative side in that the true unavailability is slightly lower than that computed by Equation (XI33). It generally gives results agreeing within 10% of the true unavailability for Qs(t) < 0. it is simple to calculate.1. . and it can be truncated at any value of N to consider only those cut sets contributing most to Qs(t). the socalled "rare event approximation. then Qs(t) is the probability that the top event exists at time t (having occurred earlier and persisting to time t). the system unavailability Qs(t) can thus be approximated as the sum of the minimal cut set unavailabilities Qi(t): Qs(t) =::: I: Qi(t) i=l N (XI33) where ~ denotes a summation of Qi(t) over the N minimal cut sets considered in the tree.33)." It is quite accurate for low probability events because for such events. the probability of two or more minimal cut sets occurring simultaneously is negligible. Equation (XI." was first introduced in Chapter VI. Equation (XI·3S) is again simple to evaluate and it can be truncated so that only the N dominant minimal cut contributors are considered. is the probability per unit time of system failure at time t. Ws(t).FAULT TREE EVALUATION TECHNIQUES XI19 some general event. Ws(t). Wit) is the probability per unit time that the top event occurs at time 1. If the component failures are all repairable or cyclic and constant values used for the component unavailabilities. Furthermore. the system is down if and only if anyone or more of the minimal cut sets is down. (For any general top event. the system failure occurrence rate Ws(t) is of interest and is defined such that: WS<t)~t = the probability that the system fails in time t to t + ~t.
then Qs(t). the reader must keep in mind the assumptions and limitations of the calculations.t2) is also a reasonably accurate approximation for the probability of system failure in time t 1 to t 2 . and the expected number of system failures Ns(t 1 . or the system failure occurrence rate.tl.t) = o f t Wit')dt'. We derme the component importance to be the fraction of system failure probability contributed by .the particular component failure. The system unavailability Qs(t).t 2 ) give comprehensive information on the probabilistic description of system failure.t2) in time tl to t2 (XI36) As a particular use of the above formula.t 2) less than 0. In using these results. is: Ns(O. When these dependencies exist to the extent that they significantly increase failure probabilities. then Ws(t} is siinpiy a constant WS ' and Ns(t 1 t2} is simply Ws times the interval t2 . (XI·3?) If the component failures are all repairable or cyclic and constant values used for the component unavailabilities. if the fault tree has only double cut sets and higher. We define the minimal cut set importance to be the fraction of system failure probability contributed by a particular minimal cut set. the expected number of system failures in time t. Ns(t). for our discussion here we will use one of the simplest methods of calculating the importances. the system failure occurrence rate Ws(t). the expected number of system failures Ns(tl . Using the same rationale as for the minimal cut set Ni (t 1 . the system results as calculated here may be very much below the true values due to dependencies among the component failures. which is the system unreliability: (XI38) The quantity Ns(O. Qit).1. Different formulas can be used to calculate the importances (see reference [21] for a discussion of the different approaches).XI20 FAULT TREE HANDBOOK is: If we use Wset).t) is thus a reasonably accurate approximation of the system unreliability in time period t. .t 2). (g) Minimal Cut Set and Component Importances As an additional evaluation we describe a quantitative technique for determining the "importance" of each minimal cut set and each component failure. Ws(t) and Ns(t 1 . particularly the assumption of independence of component failure occurrences. The minimal cut set importance and the component importance can be calculated with regard to the system unavailability. for Ns(tl. Nitt .t s) represent optimal designbased numbers which are useful for relative evaluations but are not useful for absolute evaluations. As discussed for the minimal cut set characteristics (section e).
i. The rule in either case is the same: to calculate the minimal cut set importance. the rare event approximation is used. and let ek (t) be the importance of component k at time t (we have simply indexed the cut sets and components for ease of identification). given the system is down. The quantity ek(t) is approximately the probability that the system is down due to component k being one of the causes. and constant values are used for the component unavailabilities then the importances Ej(t) and ek(t) are constant and independent of time: Ej(t) = E j and ek(t) = ek' The minimal cut set and component importances thus can be ranked from largest to smallest without regard to the time considered. Elt) is approximately the probability that the system is down due to minimal cut set i. With regard to the system failure occurrence rate. In terms of conditional probabilities. For the component importance.e. we take the ratio of the minimal cut set characteristic over the system characteristic. (XI42) The symbol ~ in equation (XI4I) denotes a sum of Qj(t) over all those minimal cut sets containing component k as one of its components. Because the system can be down if and only if one or more of the cut sets is down..FAULT TREE EVALUATION TECHNIQUES XI21 Ws(t). then: (XI39) = fraction of system unavailability contributed by minimal cut set i and: (XI40) (XI4I) = fraction of system unavailability contributed by failure of component k. Let Ej(t) be the importance of minimal cut set i at time t. we sum the characteristic of all minimal cut sets containing the component and divide by the system characteristic. given the system is down.) When all components of the fault tree are repairable or cyclic. (The quantities are approximate because intersections of minimal cut sets are ignored. With regard to system unavailability. Ej(t) and the component importance ek(t) are: ~ (XI43) = fraction of system failure occurrences at time t contributed by minimal cut set i (XI44) . the minimal cut set importance /. the sum of Qj(t) in Equation (XI4I) is the probability that the system is down due to component failure k being one of the causes.
For the convenience of the reader. With regard to ~(t). If constant values are used for all the component characteristics then ti(t) and edt) are again simple constants which can be ranked from highest to lowest without regard to time. .XI22 and: FAULT TREE HANDBOOK (XI45) = Fraction of system failure occurrences at time t in which component k is one of the contributors. Tables XI2 and XI3 summarize all the formulas which have been presented for the evaluation of a fault tree. component k is defined to be one of the contributors to system failure at time t if it is either down at time t or if it fails at time t. (XI46) The reasoning for the above two formulas is the same as used previously.
At '" At..At '" A. Ns(t.1 T wIt) = Ae. t2) where "j = the number of components in the ith minimal cut set < 0. tZ) < 0.1 ATO < 0. TR 2 pmodel Cyclic Component Minimal Cut Sets p. At < 0. monitored Repairable.fable XI2.L: W·(t) 1=1 I FIt) '" N s(tl.1 where A = component failure rate per hour (operating or standbY. qnilt) W31t) F(t) = same as q(t) F(t) '" Njltl. nll).p)n(t) '" n(tlp. Summary of Equations for Reliability Characteristics ReqUired Data Components Amodel Nonrepairable Repairable..At '" At. t2)..1 wit) = pklt) Wilt) = qZ(t) q3(t) .1 + ATR '" AI. qni l(t) wnilt) System 0slt) = j= .e.t2).. T.. nlt)p n' OJ(t) = III k=l qk1t) < 0.1 + ql It) q2(t) . kIt) qlt) = 1 ..1 < 0. NI (t. qnjlt) wzlt) + ql(t) qZll) .. At same as above same as above < 0. At q(t) = 1 + ATO q(t) = AT 2 ATO '" ATO' < 0.as applicable) TO = average downtime per failure in hours T test interval In hours TR = average repair time per failure in hours = probability of cyclic component failure per demand p nIt) = expected number of demands in time t kIt) = cyclic component demand rete per hour at time t = . TR q(t) = 1 .{t) + qilt) q3lt) . qnj(t) W. TO A.(1 .. periodically tested Unavailability Occu rrence Rate Unreliability A A..1 wIt) = A (asymptotic) wIt) = A (asymptotic) F(t) = 1 .. L: N N Ojlt) where N = the number of minimal cut sets WsW =.e..
if T is a periodic testing interval. The error spread obtained for the results gives the uncertainty or variability associated with the result. In judging the significance of an effect. depending on the needs of the engineer. It is particularly convenient to assess effects of component data variations using the formulas presented previously in this chapter because they explicitly contain component failure rates. As a type of sensitivity evaluation. If the system unavailability does not change significantly. then the effects on system unavailability with regard to different testing intervals can be studied by varying the T's for the components. the computation of point estimates for the unavailability and failure occurrence rate of the top event of a fault tree were described. will be short. the discussion therefore. likewise. which are independent of the fault tree evaluation techniques per se. it is important that the analyst take into account the precision of his data. scopingtype evaluations can also be performed by using a high failure rate and a low failure rate for a particular event on the tree. This can entail as simple a calculation as redoing the computations with different T's or as complex as employing dynamic programming. In this section we briefly take up the question of how to evaluate the sensitivity of these estimates to variations or uncertainty in the component data or models. . As a type of sensitivity study. For example. although a factor of 2 variation in the system unavailability might be very significant when failure rates are known to 3 significant figures. Summary of Equations for Quanitative Importance ith Cut Set Importance kth Component Importance System Unavailability ek(tl = L OJ(t) °s(t) kin i System Failure Occurrence Rate A Wj(t) Ei(tJ = . the same factor of 2 variation would probably not be significant when failure rates are known only to an order of magnitude. A wide spectrum of sensitivity analyses can be performed. then the event is not important and no more attention need be paid it. failure rates (A) can be changed to determine the effects of upgrading or downgrading component reliabilities. formal error analyses can be performed to determine the error spread in any fmal result due to possible data uncertainties or variabilities.Ws(t) ~k(tl = L kin i Wilt)  (h) Sensitivity Evaluations and Uncertainty Analyses In the previous sections. If the system unavailability does change significantly.XI24 FAULT TREE HANDBOOK Table XI3. Sensitivity studies are performed to assess the impact of variations or changes to the component data or to the fault tree model. The error analyses employ statistical or probabilistic techniques. In sensitivity analyses different values may be assigned these variables to determine the differences in any result. For example. test intervals. and repair times as variables. then more precise data must be obtained or the event must be further developed to more basic causes.
The reader is referred to Mann. * For the random variable approach. The variation in the data values is "simulated" by randomly sampling from probability distribution functions which describe the variability in the data.FAULT TREE EVALUAnON TECHNIQUES XJ25 A variety of error analysis methods can be used... In the Monte Carlo method. the fault tree evaluations are repeated a number of times (each repetition is called a "trial") using different data values (e. The final error spread on the result is the estimate of the result variability arising from the variability in the failure rates and other data treated as random variables. and dependencies. The above method is completely analogous to repeating an experiment many times to determine the error in the experimental value. The probability distributions can be Bayesian prior distributions on the parameters A. picking the 5% largest value and 95% largest value to represent the 90% range for the result). the method most adaptable to a fault tree evaluation is the Monte Carlo simulation technique. or can be distributions representing planttoplant variation in the failure rates and other data.g. (Section 2b of Chapter XII describes several fault treeoriented computer codes for performing Monte Carlo simulation. . general sizes of the errors. The whole set of repeated calculations will give a set of system results from which an error spread is determined (e.) *When failure rates and other data are treated as constants. The Monte Carlo method can accommodate general distributions. Each trial calculation will give one value for the system result of interest such as the system unavailability or occurrence rate. A's and TR 's) for each calculation. then classical confidence bound approaches should be used. with uncertainties arising from the variability of the estimators.. and Singpurwalla (reference (30]) for further details.g. Schafer. and we will briefly explain the approach when data are treated as random variables. T R' etc.
FAULT TREE EVALUATION COMPUTER CODES 1. Overview of Available Codes This chapter covers the computer code methodology currently available for fault tree analysis. Group 1. codes that compute minimal cut and/or path sets).e. GO "Fault Finder" 1977 NOTED [45] 1971 PATREC [19] 1974. A DualPurpose Code PLMOD [28] 1977 Group 5. Direct Evaluation Codes ARMM [26] 1965 SAFTE [13] 1968 GO [14] 1968. The codes discussed are divided into five groups. MOCARS [25] 1977.. Common Cause Failure Analysis Codes COMCAN [3] 1976 BACKFIRE [5] 1977 SETS [47] 1977 Group 1 consists of codes that perform the qualitative evaluation of a fault tree (i. et al. FRANTIC J41] 1977 Group 3.CHAPTER XII . Codes for Qualitative Analysis PREP [42] 1970 ELRAFT [35] 1971 MOCUS [11] 1972 TREEL and MICSUP [29] 1975 ALLCUTS [39] 1975 SETS [46] 1974 FTAP [43] 1978 Group 2. PATRECMC [20] 1977 BAM [34] 1975. Codes for Quantitative Analysis KlTTI [42] 1969. The codes in Group 2 perform quantitative (probabilistic) analysis based on the structural information XIIl . WAMBAM [22] 1976. The numbers in brackets refer to references in the bibliography. KITT2 [40] 1970 SAMPLE [38] 1975..WAMCUT [9] 1978 Group 4.
The division between qualitative and quantitative aspects develops naturally because the probabilistic analysis often involves repeated evaluation of the tree (i. it is often most efficient to perform the time<onsuming structural analysis once. PLMOD. Other advantages afforded by the computation of the minimal cut sets are (1) the minimal cut sets themselves provide much useful information to the analyst. only the single and double event cut sets were retained for the independent failure computations. cut sets whose size (number of events) is greater than some prespecified number n.. thus increasing computational efficiency and reducing data requirements. Several methods can be used to overcome or at least alleviate the problems associated with obtaining the minimal cut sets. This is often very effective for trees having low order cut sets which dominate the high order cut sets. finally. (2) non<ontributing cut sets (usually based on cut set size) can be discarded prior to quantification. even in the absence of any quantitative data. Group 5 contains the codes developed for use in common cause analysis. because they indicate the minimal sets of components whose failure will cause the system to fail. and. and (4) cut sets are required as part of the input to the common cause analysis codes. 2. using a distribution of failure or repair rates to perform sensitivity or error analysis). In WASH·1400 [38] for example. save the results in some convenient form (usually minimal cut sets) and then use these results to quantify the tree using different sets of data. as required. had in excess of 64 million cut sets).e. Computer Codes for Qualitative Analysis of Fault Trees This section deals with codes that compute the minimal cut (and/or path) sets of a fault tree.XII2 FAULT TREE HANDBOOK embodied in the cut sets. This is because the number of cut sets can increase exponentially with the number of gates and can easily reach the millions or even billions of terms (e. the higher order cut sets were analyzed only for common mode and . during the processing. One disadvantage of the minimal cut set codes is that the storage and computer time required to process even mediumsize trees can become quite prohibitive. and even the number of minimal cut sets may not be a good prediction of reqUired processing time. many of them will generate cut sets on request as a nonintegral part of the analysis. The five groups of codes are described in the succeeding sections of this chapter. at different time points. one example tree with 299 basic events and 324 gates. The codes in Group 3 are designed to perform direct numerical evaluation of a fault tree without computing cut sets as a necessary intermediate step. (3) the ability to compare the minimal cut sets with the original tree provides a valuable error check. The problem is complicated further because a simple count of events and gates is a very poor indicator of the expected number of minimal cut sets. Thus. The most common is to eliminate. In contrast. the probabilistic assessment is called the quantitative evaluation of the fault tree. a dual purpose code that can be used both for the qualitative and quantitative analysis of a fault tree constitutes Group 4.. however.g. The computation of the minimal cut sets is often referred to as the qualitative evaluation of the fault tree because the results are based solely on the structure of the tree and are independent of the probabilities associated with the basic events. Thus it is often difficult to predict the storage requirements and run time for a given tree.
both explicit and implicit (e. and special gates. so NOT gates. The basic events are assumed to be independent. ALLCUTS. FTAP. were the first computerized fault tree evaluation codes. pairs of basic events. unlimited replicated events are allowed. exclusive OR gates). its algorithm has been superseded by improved methods. The latter seems to be a promising method. the input to PREP is limited to AND and OR gates. are prohibited. TREBIL (for "tree build") takes the user's input description of a fault tree and builds a FORTRAN subroutine of the Boolean equations for the tree. groups of three basic events. Also. minimal cut sets found by COMBO are limited to maximum length of 10 components. there is no way to generate cut sets for intermediate gates. section 4. we discuss the individual qualitative analysis codes. must be input in terms of their basic AND and OR gate structure. PREPMINSET has two options for cut set generation: COMBO and FATE. MINSET then uses the TREE subroutine produced by TREBIL to find the tree's minimal cut and/or path sets. PREP. etc. ELRAFT. The user determines the maximal size of the cut sets to be computed (for low probability events. written in FORTRAN IV for the IBM 360 computer and released in 1970. (a) PREP The PREP and KITI codes [40] [42]. The main disadvantage of PREP is that COMBO requires a prohibitive amount of computer time for large order cut sets of large fault trees. such as those found in nuclear power plant fault trees. and PLMOD). such as koutofn gates. SETS is somewhat different from the other codes in that it provides a very general and flexible tool for manipulating the fault tree in the form of its corresponding Boolean equations. discussed in section lea). and KITTI and KITT2 perform time dependent fault tree analysis in the context of Kinetic Tree Theory. to determine which combinations cause the top event of the fault tree to occur. and is discussed further in later sections (see sections on SETS. was the first cut set code. doubles or triples usually suffice). In the remainder of this section. and FTAP discussed in sections l(b)(g). however. and automatic tree decomposition schemes.. MICSUP. It is included mainly for background. Disadvantages of using a tree reduction process are that (1) it is impossible to determine the total failure probability discarded and (2) analysis of dependencies such as events dependent on common causes requires separate evaluation of the higher order cut sets. The KIIT codes will be discussed in the quantification section. whereas FATE is not guaranteed to find all the minimal sets. COMBO systematically fails all single basic events. PREP consists of two parts: PREPTREBIL and PREPMINSET. Another similar approach is to reduce the tree based directly on cut set probability instead of order.FAULT TREE EVALUATION COMPUTER CODES X1I3 common cause failure potentials.. PREP allows a maximum of 2000 components and 2000 gates. use of auxiliary storage media during cut set processing. FATE incorporates quantitative data on the component's reliability to fmd minimal cut sets which are "most likely" to occur. Other techniques used in some codes are efficient "packed" and/or bitlevel storage schemes. all use variations on the "topdown" and "bottomup" methods described earlier in Chapter vn. SETS. . using the results from PREP. This requires the input of component failure probabilities at the outset.g. PREP is a minimal cut set (or path set) generator. and there is no easy method to input replicated portions of the tree. It does this by performing Monte Carlo simulation. MOCUS.
 TREEL and MICSUP [29] are based on an idea similar to that used in MOCUS except that instead of working from the top event down. The user may place an upper limit on the length of cut sets found if desired. the BICS will be minimal. otherwise. Coded in FORTRAN IV for the CDC 6600. the nonminimal BICS must be discarded. MOCUS is written in FORTRAN IV for the IBM 360 series computer. the product of the prime factors can soon exceed the capacity of the machine to represent the number. ALLCUTS uses a topdown algorithm. The major drawback of ELRAFT is that for large trees. ALLCUTS optionally allows input of basic event probability data. thus reducing the computer time and storage requirements. and a plot program KILMER can be used to produce a Calcomp plot of the fault tree based on the fault tree input description and conversational plotting instructions. ALLCUTS can compute the top event probability. If this data is input. In the ELRAFT code. (c) MOCUS The MOCUS code [11] was written in 1972 to replace PREP as a minimal cut set generator for the KITT codes. The tree is processed from the bottom up. * If the tree contains no replicated events.XII4 (b) ELRAFT FAULT TREE HANDBOOK The ELRAFT (Efficient Logic Reduction of Fault Trees) code [35] uses the unique factorization property of the natural numbers to find the minimal cut sets of a fault tree. most other aspects of the code are similar to PREP. Nonminimal BICS and BICS of length greater than a userspecified limit can be discarded as they appear. ALLCUTS handles up *The basic topdown and bottomup algorithms for minimal cut set generation are explained in chapter VII. (d) TREEL AND MICSUP _ . MICSUP has the advantage of generating the BICS for each intermediate gate of the tree. Other aspects of the MOCUS code are identical to PREP. ELRAFT is capable of finding minimal cut sets of up to six basic events for the top event and other specified intermediate events. . and select cut sets in specified probability ranges. and cut sets for the gates at successively higher levels are represented by the product of the numbers associated with their input events.. As a result of processing the tree from bottom to top. (e) ALLCUTS Another code for finding minimal cut sets is ALLCUTS [39] developed by the Atlantic Richfield Company. The MOCUS algorithm may be used to find the minimal cut or path sets for up to 20 gates in a given tree. MICSUP (Minimal Cut Set UPward) starts with the lowest level gate basic inputs and works upward to the top tree event. sort and print up to 1000 minimal cut sets in descending order of probability. Every integer greater than 1 can be expressed as a unique (except for order) product of prime factors. The socalled "Boolean indicated cut sets" (BICS) are generated by successive substitution into the gate equations beginning with the top event and working down the tree until all gates have been replaced by basic events. * TREEL is a preprocessor that checks the tree for errors and determines in advance the maximum number and size of the Boolean indicated cut and path sets. As with MOCUS. each basic event is assigned a unique prime number. section 4. similar to that of MOCUS. An auxiliary program BRANCH can be used to check the input and cross reference the gates and input events.
and special gates represented by any valid Boolean expression defined by the user. octal. is a general program for the manipulation of Boolean equations which can be applied to fault trees and used to find minimal cut or path sets. FTAP uses two basic techniques to reduce the number of nonminimal cut sets produced and thereby increase the code's efficiency. FTAP is unique in offering the user a choice of three processing methods: topdown. developed by Sandia Laboratories. ELRAFT. The Nelson method is a prime implicant algorithm which is applied to trees containing complement events and uses a combination of topdown and bottomup techniques. Other useful features are free field input. MOCUS. bottomup and the "Nelson" method. The advantages of the SETS code are its generality and flexibility. exclusive or gates. The topdown and bottomup approaches are basically akin to the methods used in MOCUS and MICSUP. SETS allows tree reduction based on both cut set order and cut set probability. FTAP is the only fault tree code. SETS can handle complemented events. which can compute the prime implicants. the option of saving the cut sets or factored equation for any event on a file for future use. For example. The factored equation is a compact form of the minimal cut set equation from which cut sets of any order can be generated for enumeration purposes. the current version of the code uses 110 K. one example of which is the ability to dynamically manipulate the tree via SETS user programs.FAULT TREE EVALUATION COMPUTER CODES XII·! to 175 basic events and 425 gate events. respectively. bitlevel storage scheme and use of auxiliary storage are other SETS features aimed at efficient processing of large trees. and MICSUP. SETS is written in FORTRAN for the CDC 6600. (f) SETS The Set Equation Transfonnation System [46]. This capability gives the user a great deal of control over the processing. Unlike PREP. The first technique. It will also rank and print the minimal cut sets in order of descending probability (assuming basic event probabilities have been input). other than SETS. ALLCUTS. It can be used to find the "prime implicants" (a more general term than minimal cut sets which includes the possibility of having both an event and its complement in a Boolean equation) of any intermediate gate. A recently added feature enables SETS to automatically identify the independent subtrees and select stages for efficient processing of large trees. a feature which can be especially helpful when analyzing large trees. (g) FTAP The Fault Tree Analysis Program [43] is a cut set generation code developed at the University of California Berkeley Operation Research Center. ALLCUTS is written in FORTRAN IV and COMPASS (assembly language) for the CDC 6600 computer. A packed. the ability to easily input replicated subtrees. is modular decomposition. The . a SETS user program may be written to decompose the original tree and process it in stages without requiring any changes to the original fault tree input description. used in the bottomup and Nelson methods. This approach is quite similar to that used in PL·MOD (see section 4) and somewhat similar to the SETS algorithm for identifying and processing independent subtrees (see section 2(f)).
usually many times less. KITT also computes quantitative importances. or may be nonrepairable. respectively. direct input of symmetric (koutofn) gates. or variation in the component failure characteristics. PREP and MOCUS generate the cut sets in a form which is directly usable as input to the KlTT codes. error bounding. The input to these codes consists of two parts: (1) the equation for the top event unavailability or unreliability (usually from the minimal cut sets but can be obtained from other nonfault tree models such as block diagrams or schematics) (2) failure rate. SAMPLE and MOCARS. described in sections 2(a) and 2(c). The codes can thus be used in conjunction with any qualitative analysis code which generates the minimal cut sets in terms of components (basic events) such as PREP. (a) The KITT codes KITTI and KlTT2 [40] [42] perform time dependent fault tree quantification based on the minimal cut or path set description of the tree. MOCUS. . the ability to find either path sets or cut sets. is called the "dual algorithm" in the FTAP documentation [43] . The algorithm involves transformation of a product of sums into a sum of products whose dual is then taken using a special method. an exponential repair distribution. KlIT2 allows each component to have its own unique time phases whereby its failure and repair data may vary from phase to phase. Computer Codes for Quantitative Analysis of Fault Trees This section deals with codes which perform quantitative evaluations of fault trees. Given the above inputs. and repair data for the components appearing in the equation. error. described in section 2(b). The author claims that the nonminimal sets appearing during construction of the dual "will always be less than the number of such sets in [the original product of sums] . The KlTT and FRANTIC codes. FTAP is written in FORTRAN and assembly language. In addition. Other required inputs are the component failure rates and repair characteristics. Quantitative importances: quantitative rankings of contributions to system failure. several types of quantitative results may be computed including: Numerical probabilities: probabilities of system and component failures. test. SETS.XII6 FAULT TREE HANDBOOK second technique. used in the topdown and Nelson methods. and considerable flexibility and user control over processing and output." Other features of FTAP are the ability to reduce the tree based on cut set order or probability. Each component may have a constant repair time. and is available in versions for the CDC 6600/7600 and IBM 360370 model computers. compute a distribution and error bounds for system failure probability based on uncertainty. Components are assumed to have exponential failure distributions. 3. Sensitivity evaluations: effects of changes in models and data. compute timeaveraged and timedependent point estimates for the system failure probability. etc.
Typically. The occurrence rate per hour. binomial. The sample values for the failure rates are then combined by means of a system function given in a usersupplied FORTRAN subroutine to determine the sample system results. gamma. lognormal. In addition to the above. and empirical distributions. or exponential function. MOCARS [25] is similar to principle and operation to SAMPLE. beta. See Chapter XI for a discussion of importance measurements. MOCARS. The output also includes the estimated mean and standard deviation of the distribution and a tabular histogram of the system density function. the effect on the system unavailability of uncertainties or variations in the component failure rates can be assessed. (b) SAMPLE. the usersupplied system unavailability function might be the minimal cut set equation obtained from one of the qualitative analysis codes. Sample is written in FORTRAN IV. and each minimal cut or path set at arbitrary time points specified by the user: The probability of the failure existing at time t (the unavailability). each component. SAMPLE allows normal." the different system values can be tabulated and the resulting empirical distribution can be characterized. The failure rate per hour. lognormal. et al. . Several codes have been written to compute the probability distribution of a calculated system result (such as unavailability) when probability distributions are assigned to the component failure rates to account for data variability. SAMPLE [38] was the Monte Carlo code used in WASH1400. but they are all quite similar. The probability of the failure not occurring to time t (the reliability). Weibull. Other expanded versions of SAMPLE [4] have been written. normal.FAULT TREE EVALUATION COMPUTER CODES XII' The KITT codes compute the following five probability characteristics for the system failure (top event). or logUniform distributions to be specified for the component failure rates. lognormal. MOCARS was written in FORTRAN to run on the INEL CDC 761973 operating system. and so we will not discuss them here. but allows a larger variety of sampling distributions including exponential. After a number of these "trials. By this method. Poisson. the KITT codes rank the events in single and double component cut sets by qualitative and quantitative importance. It allows the system unavailability function to be specified either as FORTRAN statements or in terms of cut sets. The expected number of failures occurring to time t. The output distribution is presented in terms of estimated empirical probability percentiles from which the estimated median and upper and lower bounds can easily be read. Other options include microfJlm plotting using the Integrated Graphics System (IGS) and the ability to perform a KolmogorovSmirnov goodnessoffit test on the output distribution to see if it resembles a normal. These codes use Monte Carlo simulation in which the component failure rates are sampled from input probability distributions.
Disadvantages are mainly connected with the inability to produce cut sets (many of the direct evaluation codes have added the option to compute cut sets. the direct evaluation codes quantify the system model in a single step. GO also allows switches and time delays and models all system states instead of a single fault event. the failure rate. and test and repair characteristics must be provided. described in sections 3(c) and 3(0 respectively. A Monte Carlo version of the FRANTIC code [16] is available in which sampling distributions may be input for the component failure rates. Calcomp plots of the time dependent system unavailability function may be produced. In addition to periodically tested components. and different test staggerings. testeaused failures. Thus they do not produce cut sets as an integral part of the analysis and they require probabilistic input for each component from the outset of the processing. and print and plot options. Exponential failure distributions are assumed. a change in probabilities often requires a complete rerun. Also. and the necessity of inputting probabilities for all of the components even though many may be insignificant contributors to system failure. Other input includes the time period for the calculations. developed by North American Aviation for the U. Both GO and WAMBAM reduce storage requirements by eliminating low probability paths at an intermediate stage of the processing and at the same time keep track of the total of the discarded path probabilities. ARMM models a reliability block diagram using a success path approach. The program is capable of handling Weibull . repair times. The output from these codes is generally in the form of point estimates for the system unavailability or failure probability. The component failure probabilities are determined using failure density functions supplied for each component. As in the SAMPLE code. test bypass capabilities. (a) ARMM The ARMM (Automatic Reliability Mathematical Model) code [26]. but not as an integral part of the analysis).S. incorporating in detail the effects of different periodic testing schemes. the system model function is input in the form of FORTRAN subroutine. nonrepairable and monitored components as well as human error and common cause contributions can also be modeled. test efficiency. The program can be used to assess the effects on system unavailability of test downtimes. Inspection and Checking) code [41] computes the average and time dependent unavailability of any general system model such as a fault tree or event tree. For each component. 4. FRANTIC is written in FORTRAN IV for the IBM 360370 series computers. The GO and WAM·BAM codes. Air Force and modified by Holmes and Narver for application to nuclear power plant systems. offer the advantage of allowing complement events and some modeling of dependencies. was the first direct evaluation code.XII8 FAULT TREE HANDBOOK (c) FRANTIC The FRANTIC (Formal Reliability Analysis including Normal Testing. Direct Evaluation Codes As their name implies.
(c) GO The GO methodology [14]. SAFTE1. and then resume operation with a new random time to failure and time to repair. It is written in FORTRAN IV for the IBM 360 computer. probabilities are given for states other than simple success or failure (e. The numerical evaluations are performed in a onestep process as the signals (event probabilities) are traced through the model using a Markov chain (event tree) approach to propagate the values.FAULT TREE EVALUATION COMPUTER CODES XII9 (timedependent failure rate) density functions. The effects of component repair cannot be modeled. This process is analogous to supplying failure probabilities for components in a fault tree. or the probabilities of response over a series of time points are supplied for some operators). Some of the 16 GO operators are similar to fault tree gates. and SAFTE3. (b) SAFTE The SAFTE (Systems Analysis by Fault Tree Evaluation) codes [13]. time delays and switches can be modeled as well as complementary event logic and mutually exclusive states. and various degrees of failure such as spurious or premature operation. and mutually exclusive failure modes. such as for sensitivity studies. However. The output events can include system success. the user also specifies the probabilities associated with the possible operational modes of each component. This means that a change in component probabilities. the probability of premature operation. The input model used is called a GO chart and it resembles a schematic or flowchart made up of a set of standardized operators which describe the logical operation and interconnection of the system components. delayed or partial operation. the SAFTE2 code also includes the ability to sample from normal repair distributions to get a time to repair for each component. In both SAFTEI and SAFTE2. instead of computing the cut sets. The outputs from GO are the probabilities of occurrence of individual output events or the joint probability density of degrees of performance for several output events. GO also provides a simplified method for modeling repeated portions of the tree through the use of "supertypes. but in addition to logic functions. SAFTE2. dependent components." In addition to specifying the types of operators and their interconnections. and complete failure to operate. are Monte Carlo simulation programs using techniques similar to the FATE option of PREP to generate random times to failure of components in a fault tree.g. requires a complete reevaluation even . The SAFTE codes are written in FORTRAN IV for the IBM 360 computer.. the SAFTE codes directly generate a distribution of times to failure for the system. In this version. developed in the mid1960's by Kaman Sciences Corporation. the random times to failure are generated from exponential failure distributions. SAFTE3 computes the probability of system failure based on steady state repair. a failed component may be repaired (made as good as new) before system failure. however. using either direct or importance sampling techniques. in the GO approach. The SAFTEI code works as described above. differs from the fault tree approach in that the normal operating sequence is modeled and all possible system states are considered.
PATREC's greatest limitation is in its handling of replicated events. PATREC replaces a single fault tree having r different replicated events by 2 r fault trees with no replicated events. However. If the analyst wishes to use the fault tree model instead of the GO chart. normal. it may be argued that because GO charts are similar to the familar system schematics. Cut sets can be generated.XII1O FAULT TREE HANDBOOK though the system structure remains unchanged. (e) PATREC PATREC [19] was the first computer code to apply list processing techniques to fault tree evaluation. Similarly. A set of subtree patterns along with their corresponding probability equations are stored in the computer code's library. and the necessity of inchiding all system components. instead of analyZing the system at a series of discrete time points. GO is written in FORTRAN for the CDC 7600. GO has options for pruning the tree of branches with probabilities lower than a selected value. using a pattern recognition algorithm implemented in the PL/l programming language. using an algorithm similar to that of MOCUS. Weibull and forms including repair times. In this case only the subset of the GO operators analogous to the fault tree gates would be used and the output would be a point estimate of the top event failure probability. PATREC cannot efficiently evaluate fault trees with more than about 20 replicated events. developed by the United Kingdom Atomic Energy Authority in 1971. Even with approximations which allow some of the 2 r fault trees to be discarded. Rather than generating cut sets. NOTED produces a graph of the cumulative failure probability as a continuous function of time at any of several points in the system. while keeping track of the total probability of the discarded paths. The whole tree is thus eventually reduced to a single leaf which corresponds to the probability of failure of the total system. The fault tree is then searched for occurrences of the library patterns. However. (d) NOTED NOTED [45]. and of deleting signals which are no longer needed. PATREC evaluates the tree directly. Because of the diversity and detail of the GO operators. the modeling process for GO is somewhat more complex than that for fault trees. Therefore. . it is still possible to evaluate the tree using GO. GO also includes a "Fault Finder" option to compute the 4th order cut sets for a selected output event. the modeling process is easily learned by designers and engineers. direct input of koutofn gates is also supported. Each recognized pattern is replaced by a supercomponent with probability of occurrence equal to that stored in the library. The pattern recognition scheme yields the correct probabilities only when no events are replicated. but they are not used in the evaluation of the fault tree. PATREC can evaluate trees containing both an event and its complement. if desired. the behavior of the input components is described by continuous failure distributions including exponential lognormal. is similar in concept to GO. Because the probability tree can easily become very large.
the P terms are equivalent to paths in the GO event tree. each line of which represents a product term (P term) event disjoint from all the other P terms. SPASM. WAMTAP can be used to modity the input for BAM sensitivity studies.g. uses a combination of concepts from the GO methodology and fault tree analysis. This distinction will come up again in our discussion ofPLMOD (see section 4 of this chapter). Eight possible logical combinations of 2 events and their complements are included as allowable gates. PLMOD: A Dual Purpose Code PLMOD [28] is described separately in this section because. The output from BAM is a point probability of the top event. and the mean and variance of the probability of any gate. The evaluation code. WAMBAM is written in FORTRAN for the CDC 6600. Note that storage of the patterns for subsequent reevaluation means that PATRECMC is not really a direct evaluation code because the stored patterns are actually the result of an intermediate qualitative analysis independent of the component probabilities. BAM. At the user's option. the probability of the top event is computed by forming a truth table. As mentioned earlier. the user can include a constant "failureondemand" (e. like PREP. The operation of the code is similar to that of SAMPLE (see section 2(b) of this chapter) except for the representation of the system function. failure to start) probability which is independent of time. (f) WAMBAM The WAMBAM codes [9] [22] [34] were developed at Science Applications Inc. the input to BAM can be saved and subsequently modified by WAMTAP. In PATRECMC. It can also generate the input to a Monte Carlo code. Weibull. In terms of the GO methology. The WAMBAM package actually consists of four codes: WAM. For the exponential failure case. It generates the numeric input for BAM from the input description of the fault tree and the event probabilities. WAMCUT can be used to compute minimal cut sets. PATRECMC [20] is a Monte Carlo version of PATREC which can be used to assess the effects of uncertainty in the component reliability parameters. normal. WAM and WAMTAP are input preprocessors for the evaluation code BAM (Boolean Arithmetic Model). The WAM preprocessor. BAM. In addition. The GO computational scheme is used but the operations are modeled as gates on a fault tree.. which computes a distribution on gate probability. WAMTAP allows the probability of single components or groups of components to be changed in order to run sensitivity studies or to include common cause contributions. a calculation is first made to identify the patterns in the tree using the list processing methods described earlier. 5. and WAMCUT. is designed to ease the input preparation process. WAMTAP.FAULT TREE EVALUATION COMPUTER CODES XIIII PATREC is capable of performing time dependent system unavailability analysis where each basic event may have a failure distribution which is exponential. for EPRI beginning in 1975. In BAM. or log normal. The patterns are then stored in memory so that they can be repeatedly evaluated during the Monte Carlo trials. it depends neither on . though it can perform both a qualitative and quantitative fault tree analysis. the component can optionally be assumed repairable with an exponential repair distribution.
Defmed in terms of a reliability network diagram. The modularization process used by PLMOD is somewhat complex. free field input. modules or subsystems) are identified. PLMOD also has a Monte Carlo option for computing uncertainties and timedependent unavailability evaluation option which can handle nonrepairable. Briefly. as mentioned earlier.. The quantitative capabilities of PLMOD include the computation of the occurrence probability and importance (see Chapter XI) for the top event and all other modules. and not the state of the components which comprise it. modularization implies that all the independent subtrees (i. more basic common cause. Features of PL·MOD are the ability to handle complemented events. 6. all of which are susceptible to a single common cause failure mechanism. and will therefore not be discussed here. Like the PATRECMC code.. The modularization process used by PLMOD is. all branches below the gate) appear elsewhere in the fault tree. to put this is a slightly different way. and the minimal cut sets are defined recursively in terms of these modules.. a module is a group of components which behaves as a supercomponent (i. direct input of symmetric (koutofn) gates.e. minimal cut sets) which have the potential of being triggered by a single. unique in that it is applied not to the cut sets.e. but which can be used repeatedly for quantification. The output from PLMOD includes the standard and modular minimal cut sets for the top event and specified intermediate gates of the tree. repairable (revealed fault). a modularized tree is one which is equivalent to the original tree but in some sense "maximizes" the decomposition of the tree into independent subtrees. The PLMOD computer code works by '.'modularizing" the fault tree directly from a description of its component and gate diagram. the minimal cut sets which need to be identified are those with two or more events. to determine the state of the system). but directly to a description of the fault tree diagram. (A complete description appears in reference [28]). Or. In terms of a fault tree diagram. using the list processing features of the PL/l programming language. an intermediate gate is a module to the top event tree if none of the basic events contained in the gate domain (i. and dynamic storage allocation.e. Common cause failure analysis attempts to identify the modes of system failure (Le. it performs a qualitative analysis which does not rely on standard cut set generation techniques. and periodically tested components. Some disadvantages of PLMOD are its machine dependence (PL/l is not available in many computer systems) and the lack of familiarity with PL/l among scientific users.XIIl 2 FAULT TREE HANDBOOK the standard cut set generation nor the direct evaluation techniques.. The concept and advantages of modularization have been known for some time [2] and an algorithm for finding the finest modular decomposition of a fault tree given its cut sets was described by Chatterjee [6] in 1975. . Common Cause Failure Analysis Codes Common cause failure analysis is becoming increasingly important in system reliability and safety studies because it has been recognized that common cause failures can often dominate the random hardware failures. it is completely sufficient to know the state of the supercomponent.
vibration. and (2) the common cause susceptibility data for each basic event. all components implied by the basic events in the minimal cut set must share a common physical location with respect to the common cause susceptibility. The more input provided. The required and optional inputs are almost the same except that BACKFIRE permits more than one location to be specifIed for a component.A minimal cut set may be identifIed as a common cause candidate by either of two criteria. Optional inputs are the component manufacturers. location domain definitions for generic causes. This is useful for piping and wiring which may pass through domain barriers. pressure. The frrst criterion requires that all the events in the cut set be potentially affected by the same cause or condition. The output options include the ability to print only the common cause candidates with ranks ~ N.a manner similar to that of COMCAN . published in May 1977. grit. Some typical common causes include: impact. the location of each component implied by the basic events. COMCAN is written in FORTRAN N for the IBM 360 computer. A variable transformation incorporates the common cause susceptibilities into the Boolean equation for the top or any intermediate gate of the fault tree. is an offshoot of COMCAN. The output from the program is a list of the minimal cut sets which are common cause candidates. and temperature. described in section 1(t). BACKFIRE is written in FORTRAN IV for the IBM 360 computer. and to include similar type components as one of criteria for common cause candidates. .ptibility data constitute the required inputs. can also be used for common cause analysis [47]. was the fIrst program to perform common cause failure analysis. and ran)(s of component susceptibility to each common cause. developed at INEL. The analysis is conducted ~. The minimal cut sets and common cause susct. (b) BACKFIRE The BACKFIRE code [5]. and a few simple manipulations allow the user to display the cut sets which are the common cause candidates. Like COMCAN. The input to the program consists of two parts: (1) the minimal cut sets of the fault tree to be analyzed. stress. by inputting generic cause susceptibilities for each basic event. The second criterion requires that all the events in the cut set share susceptibility to a common cause or condition and in addition.FAULT TREE EVALUATION COMPUTER CODES (a) COMCAN XII13 COMCAN f31. . the more refined will be the search for the common cause candidates. (c) SETS The SETS code.
11. Erdmann. ed." General Atomic Co. Tennessee. Inc. C. Colorado. and Design. 3.C. Knoxville. 10.J. 12. 245321. KN67704(R). "Modularization of Fault Trees: A Method to Reduce the Cost of Analysis.R.R. B. 1977. Sampling Techniques. 262. 1975. p. Marshall and J. W. A Computer Code for Fault Tree Evaluation." Trans. "On the Importance of Different Components in a Multicomponent System in Multivariate AnalysisII. April 1968. 4. May 1976. 2." NucL Engr. 1971. and H. Army Ordnance Corps. Vol. SIAM. "BACKFIREA Computer Code for Common Cause Failure Analysis. Garrick. Krishnaiah. pp. Inc. GAAI4055.C. pp. F.B.L.. ANS. "WAMCUT. Cairns and K. Wilson. Academic Press. 7. 1969. Practical Nonparametric Statistics. ANCR1314. New York. John Wiley and Sons. Fussell and W.J. 6. P. EPRI NP·803.E. W. June 1978.2. Colorado Springs. G. Fussell." Kaman Sciences Corporation.J. Chatterjee. D. A Computer Program for the Reliability Analysis of Complex Systems." Science Applications. J.W. 14. 9. 15. 8.R. Kirch.B. 1952. Vol." Reliability and Fault Tree Analysis. No.N. J.B. Fussell and G. New York. New York. John Wiley and Sons. N..Y. Inc. BIBl . August 1970." University of Tennessee.BIBLIOGRAPHY 1. Z. "GO. 1968. "Principles of Unified Systems Safety Analysis. 1957. W. March 1977. Cate and J. Leverenz. SIAM.1972. Proceedings of the International Conference on Nuclear Systems Reliability and Risk Assessment. May 1977. Stoddard and R. I. 5. W. Birnbaum. Tables of the Cumulative Binomial Probabilities. Inc. Cochran." P. Conover.. Burdick.L. Tennessee. John Wiley and Sons.. Williams.. Editor. R. 101126. An Introduction to Probability Theory and Its Applications. "A New Methodology for Obtaining Cut Sets for Fault Trees. New York.R. "STADIC: A Computer Code for Combining Probability Distributions. 13. 13. 1975. "COMCANA Computer Program for Common Cause Analysis. Feller. Vol..H. Gatlinburg. Fleming. ORDP 2011. Gateley.L." Aerojet Nuclear Company. J. Vesely. Burdick.
Reliability: Management. lioyd. Tables of the Binomial Probability Distribution. 17. Lambert. J. 1949. 1966. 1972. Belyayev.' B.D. 20. N.V. B.E. S. Y. No." North American Aviation. Guide for General Principles of Reliability Analysis of Nuclear Power Generating Protection System. Singpurwalla." TREE1138." Lawrence Livermore Laboratories.BIB2 FAULT TREE HANDBOOK 15.J. 22. Green and A..E. Camino. Massachusetts Institute of Technology. "A Modular Approach to Fault Tree and Reliability Analysis. Solovyev. Schafer and N. "Reliability Calculations with a List Processing Technique. New York 1969. Englewood Cliffs. National Bureau of Standards. Camino. 6. 26.D. F. Inc. April 1974. A. August 1977. Vol. DCRL51829.. Olmos and L. Mathematical Methods for Statistical Analysis of Reliability and Life Data. 1962. . Inc." Department of Nuclear Engineering. Translation edited by R. M. A. Vesely. Kirch. IEEE Std. Inc.. Matthews. Mathematical Methods of Reliability Theory. F.L. 18. et al. "Fault Trees for Decision Making in Systems Analysis." presented at the National Conference on Reliability. "User's Guide for the WAMBAM Computer Code. 16. 19.E. "Automatic Reliability Mathematical Model. 28. Goldberg and W. C.1974.V.E. Leverenz and H." Science Applications. September 1977. Inc. 25. B23. R. Methods.D. Series.F. 3521975.W. Gnedenko. 21. Lipow. Downey. "The State of the Art of PATREC: A Computer Code for the Evaluation of Reliability and Availability of Complex Systems. McKnight. July 1977. Nottingham. Nottingham. "Time Dependent Unavailability Analysis of Nuclear Safety Systems. 27. California. Math. Koen.K.. PrenticeHall. B. January 1976. Vol. New Jersey. EPRI 21725. and A. App. 24. England..V. and Mathematics. H. NA 66838.E. et al. John Wiley and Sons." presented at the National Reliability Conference. WileyInterscience. Mann. September 1977. London. England. IEEE. Koen and A. 23..K. Bourne. 1975. Academic Press. "MOCARS: A Monte Carlo Simulation Code for Determining the Distribution and Simulation Limits. Reliability Technology.R. I. MITNE209. New York. D." IEEE Transactions on Reliability. Wolf. Barlow. 1975.
October 1969.. N. Goldberg. ed.R. Linear Statistical Inference and Its Applications.. Probabilistic Reliability: An Engineering Approach. 1970. 42. Griffing. 1964. University of California. Idaho.E. and R.A Computer Code for Time· Dependent Unavailability Analysis. 1953.E. McGrawHill. 133151." Operational Research Center. Commercial Nuclear Power Plants. . Palo Alto.G. A Computer Program for the Efficient Logic Reduction Analysis of Fault Trees. ORC 753.L. c. Tsokos and LN. 31. S. ARH·ST·112. New York. Nuclear Regulatory Commission. U.J. 40. California.T.. C. 1975.. Idaho Falls. February 1971. Leverenz.E. Interim Report. Richland. "ELRAFT. No. P. M. W. 32. Romig. W. Rosenthal. A Fast. 35.S. New York. Rao." IEEE Trans. Narum. 38.BIBLIOGRAPHY BIB3 29. June 1975. et a1.C. New York. Erdmann." Electric Power Research Institute. Jr. IN1349. 33." Idaho Nuclear Corporation. John Wiley and Sons." Atlantic Richfield Hanford Company. 41. "Analysis of Fault Trees by Kinetic Tree Theory.H. Wash· ington. Vol. NS18. "Reactor Safety StudyAn Assessment of Accident Risks in U. Van Slyke and D.P. IN1330. Binomial Tables. 37. "FRANTIC . 36.E.. Vesely and F. October 1977. Roberts. Idaho Falls." U. F. W. "Computerized Fault Tree Analysis: TREEL and MICSUP. 1. New York." WASH1400 (NUREG75/014). "A Computer Scientist Looks at Reliability Computations." SIAM. Berkeley." Idaho Nuclear Corporation. 34. New York.S. 1977.S. Pande.N. Inc. "ALLCUTS. Apri11975. Semanderes. E.L. "PREP and KITT Computer Codes for the Automatic Evaluation of a Fault Tree.K. NUREG0193. 1968. Mathematical Methods in Reliability Engineering. Nuclear Regulatory Commission. Idaho. A. 481487. John Wiley and Sons. Sci. H. Academic Press. July 1975. Vesely and R. on Nucl. 39. W. October 1975. 30. pp. Shimi. McGrawHill. Vesely. EPRI 21722. The Theory and Applications of Reliability with Emphasis on Bayesian and Nonparametic Methods.F. pp.E. "Generalized Fault Tree Analysis for Reactor ~afety . Inc. 1977. Shooman. Comprehensive Fault Tree Analysis Code. Rumble.
Berkeley.. '((U. 1978. Worrell and D.:Z. E." UKAEA Authority. "A SETS Users' Manual for the Fault Tree Analyst. New Mexico. NUREG/CR0465. Stack. Statistics for Scientists and Engineers. 45. University of California. "Computer Aided Fault Tree Analysis. Englewood Cliffs.B. R.S. RL. Worrell. Warrington." Sandia Laboratories. SAND 772051. PrenticeHall. RR Willie. 44. "Common Cause Analysis Using SETS. August 1978. "The Calculation of Reliability of Systems: The Program NOTED. Wine. 1971.. GOVERNMENT PRINTING OFFICE: 1992318277/60129 . AHSB(S) R 153. New Mexico. Alburquerque. R." Operations Research Center. 46. SAND 771832: 191. New Jersey. Health and Safety Branch. Lancashire. 47.R Woodcock.W.BIB4 FAULT TREE HANDBOOK 43. Risley.B. England.. 1964." Sandia Laboratories. Alburquerque.
DATE REPORT COMPLETED MONTH David F. 13_ TYPE OF REPORT I PE RIOD COVE RED (Inclusive daresl Handbook 15. and computer codes for fault tree evaluation. This handbook describes a methodology for reliability analysis of complex systems such as those which comprise the engineered safety features of nuclear power generating stations.NRC FORM 336 1777) u. PROJECT/TASK/WORK UNIT NO. Systems Reliability Analysis. ABSTRACT (200 wonii IN _nl 14. fault tree construction. 17. " 5. DC 20555 12. Goldberg March DATE REPORT ISSUED MONT" I YEAR 1980 I YEAR 1981 9. DESCRIPTORS Fault Trees. the handbook focuses on a description of the deductive method known as fault tree analysis. Roberts. statistics. Al$o discussed are several example problems illustrating the basic concepts of fault tree construction and evaluation. (Lewe blankl 10. Haasl. SPONSORING ORGANIZATION NAME AND MAILING ADDRESS ((ncludt! Zip Codt!J January 6.i_d bY DOCI NUREG0492 2. and Boolean algebra for the fault tree analyst. PRICE S Unrestricted NRC FORM 335 (7771 20. Francine F. NO. RECIPIENT'S ACCESSION NO.AUTHOR~1 3. SUPPLEMENTARY NOTES 16. 11. TITLE AND SUBTITLE (Add V. OF PAGES 22.S. CONTRACT NO. AVAILABILITY STATEMENT ·lIft~' 19.s._ No.. NUCLEAR REGULATORY ~"ON BI8LIOGRAPHIC DATA SHEET 1. The following aspects of fault tree analysis are covered: basic concepts for fault tree analysis. After an initial overview of the available system analysis approaches.s reportl • 21. (Leave blankl . KEY WORDS AND DOCUMENT ANALYSIS 17a.. PERFORMING ORGANIZATION NAME AND MAILING ADDRESS (Includt! Zip Codt!l U. qualitative and quantitative fault tree evaluation techniques. . Nuclear Regulatory Commission Office of Nuclear Regulatory Research Division of Systems & Reliability Research WashinQton. (Le""" blank} 4. Systems Analysis. IDENTIFIERS/OPENENDED TERMS lB. probability. } Fault Tree Handbook 7. REPORT NUMBER (A. SECURITY CLASS (This pagel . Minimal Cut Sets 17b. Norman H. basic elements of a fault tree. William E.*. SECURITY CLASS (7h.. Vesely. if _ " .