Professional Documents
Culture Documents
Fault Tree Analysis: P.L. Clemens
Fault Tree Analysis: P.L. Clemens
P.L. Clemens
February 2002
4th Edition
Topics Covered
2
8671
All fault trees appearing in this training module have been drawn, analyzed,
and printed using FaultrEaseTM, a computer application available from: Arthur
D. Little, Inc./Acorn Park/ Cambridge, MA., 02140-2390 Phone (617) 8645770.
3
8671
Origins
4
8671
5
8671
7
8671
Some Definitions
FAULT
An abnormal undesirable state of a system or a system
element* induced 1) by presence of an improper command
or absence of a proper one, or 2) by a failure (see below). All
failures cause faults; not all faults are caused by failures. A
system which has been shut down by safety features has not
faulted.
FAILURE
Loss, by a system or system element*, of functional integrity
to perform as intended, e.g., relay contacts corrode and will
not pass rated current closed, or the relay coil has burned
out and will not close the contacts when commanded the
relay has failed; a pressure vessel bursts the vessel fails.
A protective device which functions as intended has not
failed, e.g, a blown fuse.
8
8671
Definitions
9
8671
10
8671
Non-repairable system.
No sabotage.
Markov
Fault rates are constant = 1/MTBF = K
The future is independent of the past i.e., future
states available to the system depend only upon
its present state and pathways now available to it,
not upon how it got where it is.
Bernoulli
Each system element analyzed has two, mutually
exclusive states.
OR
AND
Events and Gates are not component parts of the system being analyzed. They are
symbols representing the logic of the analysis. They are bi-modal. They function flawlessly.
11
8671
1
3
Repeat/continue
NO
YES
Dont let gates feed
gates.
13
8671
14
8671
Flat Tire
?
Air
Escapes
From
Casing
15
8671
Tire
Pressure
Drops
Tire
Deflates
MAYBE
A gust of wind will come
along and correct the
skid.
A sudden cloudburst will
extinguish the ignition
source.
Therell be a power
outage when the workers
hand contacts the highvoltage conductor.
No miracles!
16
8671
Wheels-up landing
Mid-air collision
Sting failure
Subway derailment
Uncommanded ignition
Irretrievable loss of
primary test data
Inability to dewater
buoyancy tanks
Improved
Computer Outage
Exposed Conductor
Electrical
Low-temp.
Solar
EFFECT
Relay
Transducer
CAUSE
Proc.
case ruptures
Step 42 omitted
NOTE: As a group under an AND gate, and individually under an OR gate, contributing elements must
be both necessary and sufficient to serve as immediate cause for the output event.
19
8671
the logic
Spotting/correcting
some
common errors
Adding
20
8671
quantitative data
Sequence
Initiation
Failures
Oversleep
21
8671
Transport
Failures
Life
Support
Failures
Undesirable
Event
Process and
Misc.
System
Malfunctions
Causative
Modalities*
No Start
Pulse
Biorhythm
Fails
22
8671
Natural
Apathy
Artificial
Wakeup Fails
Verifying Logic
Oversleep
Does this
look
correct?
Should the
gate
be OR?
No Start
Pulse
Biorhythm
Fails
Natural
Apathy
Artificial
Wakeup Fails
?
23
8671
Wakeup
Succeeds
No Start
Pulse
BioRhythm
Fails
Failure
Domain
Natural
Apathy
Artificial
Wakeup Fails
Start
Pulse
Works
motivation
Success
Domain
BioRhythm
Fails
?
24
8671
Natural
High
Torque
Artificial
Wakeup Works
Alarm
Clocks
Fail
Nocturnal
Deafness
Main
Plug-in
Clock Fails
Power
Outage
25
8671
Faulty
Innards
Electrical
Fault
Mechanical
Fault
Hour
Hand
Falls
Off
Hour
Hand
Jams
Works
Backup
(Windup)
Clock Fails
Forget
to
Set
Faulty
Mechanism
Forget
to
Set
Forget
to
Wind
26
8671
Relating PF to R
PF Sources
S = Successes
F = Failures
S
Reliability R =(S+F)
Failure Probability PF = F
(S+F)
S + F 1
R + PF = (S+F)
(S+F)
= Fault Rate =
27
8671
1
MTBF
Random
Failure
BU
O RN
UT
(In B
fa UR
nt N
M IN
or
ta
lity
)
= 1 / MTBF
Significance of PF
T
0
0
0.63
PF = 1 T
0.5
= T
0
0
28
8671
1 MTBF
T = A B
T = A + B A B
R + PF 1
PF = 1 T
PF = PA + PB PA PB
for PA,B 0.2
PF PA + PB
with error 11%
[Union / ]
PF = PA PB
Rare Event
Approximation
For 3 Inputs
PF = PA + PB + PC
PA PB PA PC PB PC
+ PA PBPC
8671
PF = 1 T
PF = 1 ( A + B A B)
PF = 1 (A B)
29
AND Gate
For 2 Inputs
PF = PA PB PC
Omit for
approximation
[Intersection / ]
OR Gate
TOP
PT = Pe
PT = P1 P2
TOP
PT Pe
PT P1+ P2
[Intersection / ]
2
P1
[Union / ]
1
P2
2
P1
1&2
are
INDEPENDENT
events.
PT = P1 P2
PT = P1 + P2 P1 P2
Usually negligible
30
8671
P2
Success
TOP
2
P1
3
P2
TOP
PT = (1 Pe)
PT = ?
Failure
PT =
2
P1
P3
P1 = (1 P1)
Failure
Pe
3
P2
P3
P3 = (1 P3)
31
8671
32
8671
33
8671
External Event
An event normally
expected to occur.
Conditioning Event
Applies conditions or
restrictions to other
symbols.
34
8671
Manufacturers Data
MIL Standards
Simulation/testing
Delphi Estimates
0.0 2
0.03
0.04 0.05
0.07
0.0316+
PL
Lower
Probability
Bound 102
0.1
PU
Upper
Log PL + Log PU
Log Average = Antilog
= Antilog (2) + (1) = 101.5 = 0.0316228 Probability
2
2
Bound 101
Note that, for the example shown, the arithmetic average would be
0.01 + 0.1 = 0.055
2
i.e., 5.5 times the lower bound and 0.55 times the upper bound
* Reference: Briscoe, Glen J.; Risk Management Guide; System Safety Development Center; SSDC-11; DOE 76-45/11; September 1982.
35
8671
36
8671
Minimum
Average
Maximum
Semiconductor Diodes
0.10
1.0
10.0
Transistors
0.10
3.0
12.0
Microwave Diodes
3.0
10.0
22.0
MIL-R-11 Resistors
0.0035
0.0048
0.016
MIL-R-22097 Resistors
29.0
41.0
80.0
0.60
5.0
500.0
Connectors
0.01
0.10
10.0
Source: Willie Hammer, Handbook of System and Product Safety, Prentice Hall
37
8671
Error Rate
3 x 103
3 x 102
101
0.2-0.3
0.1-0.09 (0.5 avg.)
0.0001-0.005 (0.001 avg.)
39
8671
Experience
Stress
Training
Individual self discipline/conscientiousness
Fatigue
Perception of error consequences (to self/others)
Use of guides and checklists
Realization of failure on prior attempt
Character of Task Complexity/Repetitiveness
3.34 x 104
approx. 0.1 / yr
Alarm
Clocks
Fail
Nocturnal
Deafness
3.34 x 104
Negligible
Main
Plug-in
Clock Fails
1.82 x 102
Faulty
Innards
Power
Outage
3. x 104
1. x 102
3/1
Mechanical
Fault
Electrical
Fault
3. x 104
1/15
Hour
Hand
Falls
Off
40
8671
4. x 104
1/10
8. x 108
Hour
Hand
Jams
Works 2. x 104
1/20
Backup
(Windup)
Clock Fails
1.83 x 102
Forget
to
Set
8. x 103
2/1
Faulty
Mechanism
4. x 104
1/10
Forget
to
Set
8. x 103
2/1
Forget
to
Wind
1. x 102
3/1
41
8671
102- 103/exp MH
103/exp hr
104/exp hr
105/exp hr
106/exp MH
106/exp MH
107-108/exp MH
109/exp MH*
1010/exp hr
1014/exp hr
Apply Scoping
What power outages are of concern?
Power
Outage
1 X 102
3/1
Are undetected/uncompensated
Occur during the hours of sleep
Have sufficient duration to fault the system
This probability must reflect these conditions!
42
8671
Single-Point Failure
A failure of one independent element
of a system which causes an
immediate hazard to occur and/or
causes the whole system to fail.
Professional Safety March 1980
43
8671
Cost:
Assume two identical elements having P = 0.1.
PT = 0.01
Two elements having P = 0.1 may cost much
less than one element having P = 0.01.
Do
Faulty
Innards
Independent
Hand
Jams
Works
Hand
Falls Off
Hand
Falls/
Jams
Works
Elect.
Fault
Alarm
Failure
Gearing
Fails
Alarm
Failure
True Contributors
Alarm
Clock
Fails
45
8671
Toast
Burns
Backup
Clock
Fails
Alarm
Clock
Fails
Backup
Clock
Fails
Other
Mech.
Fault
Microwave
ElectroOptical
Seismic
Footfall
Acoustic
DETECTOR/ALARM FAILURES
47
8671
Detector/Alarm
Failure
Microwave
Electro-Optical
Seismic Footfall
Acoustic
Detector/Alarm
Power Failure
Here, power source failure has been recognized as an event which, if it occurs,
will disable all four alarm systems. Power failure has been accounted for as a
common cause event, leading to the TOP event through an OR gate. OTHER
COMMON CAUSES SHOULD ALSO BE SEARCHED FOR.
48
8671
49
8671
50
8671
Separation/Isolation/Insulation/Sealing/
Shielding of System Elements.
Separately powering/servicing/maintaining
redundant elements.
Missing Elements?
Contributing elements
must combine to
satisfy all conditions
essential to the TOP
event. The logic
criteria of necessity
and sufficiency must
be satisfied.
Unannunciated
Intrusion by
Burglar
Detector/Alarm
Failure
Detector/Alarm
System Failure
Microwave
Electro-Optical
Seismic Footfall
Acoustic
51
8671
SYSTEM
CHALLENGE
Intrusion By
Burglar
Detector/Alarm
Power Failure
Burglar
Present
Barriers
Fail
52
8671
Malpractice
0.019
Fail to Treat
Infection
(Disease)
0.001
False
Negative
Test
Infected
Astronaut
0.01
0.1
Treat
Needlessly
(Side Effects)
0.018
Healthy
Astronaut
0.9
False
Positive
Test
0.02
Cut Sets
AIDS TO
54
8671
System Diagnosis
Reducing Vulnerability
Cut Sets
55
8671
56
8671
57
8671
PROCEDURE:
Assign letters to gates. (TOP
gate is A.) Do not repeat
letters.
Assign numbers to basic
initiators. If a basic initiator
appears more than once,
represent it by the same
number at each appearance.
Construct a matrix, starting
with the TOP A gate.
TOP
A
B
1
8671
2
C
58
TOP event
gate is A, the
initial matrix
entry.
1 2
2 D 3
1 4
D (top row), is an
OR gate; 2 & 4, its
inputs, replace it
vertically. Each
requires a new
row.
59
8671
1 D
C D
B D
A is an AND
gate; B & D, its
inputs, replace it
horizontally.
1 2
2 2 3
1 4
2 4 3
B is an OR gate; 1
& C, its inputs,
replace it vertically.
Each requires a new
row.
D (second
row), is an OR
gate. Replace
as before.
1 D
2 D 3
reduce to these
minimal cut sets.
1 2
2 3
1 4
C is an AND
gate; 2 & 3, its
inputs, replace it
horizontally.
An Equivalent Fault
Tree can be constructed
from Minimal Cut Sets.
For example, these
Minimal Cut Sets
1
Boolean
Equivalent
Fault Tree
6
TOP
1 3
1 3 6
1 4 5
1 4
2 3 5
2 3 6
2 4
2 4
62
8671
TOP
A
6
F
5
G
1 D
F 6
B
C
1 2
F D
I E
1 2
3 5 G 6
1 E
1
3
1
1
3
2
5 G
3
4
5 1
1
1
1
3
2
3
4
4
4
4
TOP
1
64
8671
TOP
A
6
F
65
8671
5
4
1
6
TOP
The tree models a system fault, in failure
domain. Let that fault be System Fails to Function
as Intended. Its opposite, System Succeeds to
function as intended, can be represented by a
Reliability Block Diagram in which success flows
through system element functions from left to right.
Any path through the block diagram, not interrupted
by a fault of an element, results in system success.
3
2
B
1
F
66
8671
5
G
1
3
4
4
Note that
3/5/1/6 is a Cut
Set, but not a
Minimal Cut Set.
(It contains 1/3,
a true Minimal
Cut Set.)
67
8671
Evaluating PT
TOP
A
PT
6
F
5
G
68
8671
1
3
4
4
Pt P k =
P 1 x P2 +
P1 x P3 +
P1 x P4 +
P3 x P4 x P5 x P6
Note that propagating
probabilities through an
unpruned tree, i .e.,
using Boolean-Indicated
Cut Sets rather than
minimal Cut Sets, would
produce a falsely high PT.
1 2
3
1 3
1 4
3 5
TOP
A
1v
F
3m
2h
1v 4 m
3m 4m 5m 6m
5m
G
8671
4m
4m
1v
Analyzing Common
Cause Probability
TOP
PT
System
Fault
Analyze as
usual
70
8671
These
must be
OR
Common-Cause
Induced Fault
others
Moisture
Vibration
Human
Operator
Heat
TOP
A
6
F
5
G
TOP
A
PT
6
3
3
4
1
3
where Pk = Pe = P3 x P4 x P5 x P6
Item Importance
The quantitative Importance of an item (Ie) is the numerical
probability that, given that TOP has occurred, that item has
contributed to it.
Ne = Number of Minimal Cut Sets
containing Item e
Ne
Ie Ike
Ike = Importance of the Minimal
Cuts Sets containing Item e
73
8671
1
3
4
4
I1
Path Sets
Aids to
74
8671
Trade/Cost Studies
Path Sets
75
8671
TOP
A
1
F
5
G
76
8671
1
3
4
4
3
B
3
1
1
5
G
8671
77
6
D
2 3 4
Path Sets
5
4
1
6
78
8671
Path Sets
Pp
PPa
$a
PPb
$b
5
6
PPc
$c
1
1
PPd
$d
PPe
$e
Pp Pe
Evaluate Cut Set Importance. Rank items using Ik.} Ik= Pk/ PT
Identify items amenable to improvement.
N
For all new countermeasures, THINK COST EFFECTIVENESS FEASIBILITY (incl. schedule)
AND
Does the new countermeasure Introduce new HAZARDS? Cripple the system?
79
8671
80
8671
Sensitivity Testing
State-of-Component Method
10
23
12
17
16
22
11
24
8671
14
15
20
19
18
26
25
30
81
13
31
27
28
21
29
32
33
34
10 11
17
16
22
23
24
P22 = 3 x 103
1,000 peg
spaces
997 white
3 red
82
8671
21
25
30
31
32
33
34
10
11
12
P10 = ?
16
22
23
17
24
25
30
83
8671
32
33
34
10
23
24
84
8671
20
19
18
26
25
30
15
14
13
17
16
22
11 12
31
27
28
21
29
32
33
34
Severity
PT
Probability
Where do you stop the analysis? The analysis is a Risk Management enterprise.
The TOP statement gives severity. The tree analysis provides probability. ANALYZE
4
3
5
2
NO FURTHER DOWN THAN IS NECESSARY TO ENTER PROBABILITY
DATA
WITH CONFIDENCE. Is risk acceptable? If YES, stop. If NO, use the tree to guide
risk
reduction.
SOME EXCEPTIONS 8
6
9
7
1.) An event within the tree has alarmingly high probability. Dig deeper beneath it
to find the source(s) of the high probability.
10 11 12 must sometimes
13
14 analyze down
15 to the cotter-pin level to
2.) Mishap autopsies
produce a credible cause list.
16
17
18
20
19
85
8671
21
State-of-Component Method
WHEN Analysis has proceeded to
the device level i.e., valves,
pumps, switches, relays, etc.
Relay K-28
Contacts Fail
Closed
Basic
Failure/
Relay
K-28
Relay
K-28
Command
Fault
Install an OR gate.
Place these three events beneath
the OR.
This represents faults from
environmental and service
stresses for which the device is
not qualified e.g., component
struck by foreign object, wrong
component
selection/installation. (Omit, if
negligible.)
The Analysis
Findings
Preferred
FTA FMECA
*Adapted from Fault Tree Analysis Application Guide, Reliability Analysis Center, Rome Air Development Center.
88
8671
89
8671
90
8671
Closing Caveats
92
8671
Caveats
Do you REALLY have enough data to justify QUANTITATIVE ANALYSIS?
For 95% confidence
to give PF
and
1,000 tests
3 x 103
0.997
300 tests
102
0.99
100 tests
3 x 102
0.97
30 tests
101
0.9
10 tests
3 x 101
0.7
System Behavior
I Constant
System
Properties
I Constant
Service
Stresses
I Constant
Environmental
Stresses
94
8671