You are on page 1of 34

Using Root Cause Analysis

To Understand Failures & Accidents

7th Military and Aerospace Programmable Logic


Devices (MAPLD)
September 8, 2004

Faith Chandler
Office of Safety & Mission Assurance
2004 MAPLD/Mishaps Seminar 1 Chandler
Agenda

• What’s a NASA Mishap?


• Types of Mishaps
• Purpose of Mishap Investigation
• Investigating Causes of Mishaps
and Failures
• Definitions
• Steps in Root Cause Analysis
• Generating Recommendations
Stop and ask yourself…
• For More Information
Did you really find the causes
of the failure or accident?

2004 MAPLD/Mishaps Seminar 2 Chandler


What’s A Mishap?

NASA Mishap. An unplanned event that results in at least one of


the following:

• Injury to non-NASA personnel, caused by NASA operations.

• Damage to public or private property (including foreign


property), caused by NASA operations or NASA-funded
development or research projects.

• Occupational injury or occupational illness to NASA personnel.

• NASA mission failure.

• Destruction of, or damage to, NASA property.

2004 MAPLD/Mishaps Seminar 3 Chandler


Types of Mishaps

“What can go wrong?”


• Equipment will fail
• Software will contain errors
• Humans will make mistakes
• Humans will deviate from Challenger
accepted policy and practices Mars Climate
Orbiter

There is a lot at stake!


• Human life
• One-of-a-kind hardware Columbia
• Government equipment &
facilities NOAA N Prime
• Scientific knowledge
• Public confidence

Payload Canister
2004 MAPLD/Mishaps Seminar 4 Helios Chandler
Purpose of NASA Mishap Investigation

• The purpose of NASA mishap investigation process is solely to


determine cause and develop recommendations to prevent recurrence.

• This purpose is completely distinct from any proceedings the agency


may undertake to determine civil, criminal, or administrative culpability
or liability, including those that can be used to support the need for
disciplinary action.

2004 MAPLD/Mishaps Seminar 5 Chandler


Investigating Causes of Failures & Mishaps

Often investigators: Satellite


• Identify the part or individual System Failure
that failed.

Navigation
• Identify the type of failure. Equipment Failure

• Identify the immediate cause


of the failure. Circuit Shorted

• Stop the investigation. Bent Pin

Problem with this approach:


Incorrect
The underlying causes may
Installation
continue to produce similar
problems or mishaps in the
same or related areas.

2004 MAPLD/Mishaps Seminar 6 Chandler


Investigating Causes of Failures & Mishaps

Like icebergs, most of the problem is


usually below the surface!
2004 MAPLD/Mishaps Seminar 7 Chandler
Investigating Causes of Failures & Mishaps

When performing an investigation, it is necessary to look at more than


just the immediately visible cause, which is often the proximate cause.

There are underlying organizational causes that are more difficult to


see, however, they may contribute significantly to the undesired
outcome and, if not corrected, they will continue to create similar types
of problems. These are root causes.

Per NPR 8621.1: NASA Procedural Requirements for Mishap reporting,


Investigating, and Recordkeeping, all NASA mishap and close call
investigations must identify the proximate causes(s), root causes(s)
and contributing factor(s).

2004 MAPLD/Mishaps Seminar 8 Chandler


Definitions

Proximate Cause(s) (Direct Cause)


• The event(s) that occurred, including any condition(s) that existed
immediately before the undesired outcome, directly resulted in its
occurrence and, if eliminated or modified, would have prevented
the undesired outcome.

• Examples of proximate causes:


Equipment Human
• Arched • Pushed incorrect button
• Leaked • Fell
• Over-loaded • Dropped tool
• Over-heated • Connected wires

2004 MAPLD/Mishaps Seminar 9 Chandler


Definitions

Root Cause(s)
• One of multiple factors (events, conditions or organizational factors)
that contributed to or created the proximate cause and subsequent
undesired outcome and, if eliminated, or modified would have
prevented the undesired outcome. Typically multiple root causes
contribute to an undesired outcome.

Organizational factors
• Any operational or management structural entity that exerts control
over the system at any stage in its life cycle, including but not
limited to the system’s concept development, design, fabrication,
test, maintenance, operation, and disposal.

• Examples: resource management (budget, staff, training); policy


(content, implementation, verification); and management decisions.

2004 MAPLD/Mishaps Seminar 10 Chandler


Definitions

Root Cause Analysis (RCA)


• A structured evaluation method that identifies the root causes for
an undesired outcome and the actions adequate to prevent
recurrence. Root cause analysis should continue until
organizational factors have been identified, or until data are
exhausted.

• RCA is a method that helps professionals determine:


• What happened.
• How it happened.
• Why it happened.

• Allows learning from past problems, failures, and accidents.

2004 MAPLD/Mishaps Seminar 11 Chandler


Root Cause Analysis - Steps
1. Identify and clearly define the undesired outcome.
2. Gather data.
3. Create a timeline.
4. Place events & conditions on an event and causal factor tree.
5. Use a fault tree or other method/tool to identify all potential causes.
6. Decompose system failures down to a basic events or conditions
(Further describe what happened)
7. Identify specific failure modes (Immediate Causes)
8. Continue asking “WHY” to identify root causes.
9. Check your logic and your facts. Eliminate items that are not causes or
contributing factors.
10. Generate solutions that address both proximate causes and root causes.
2004 MAPLD/Mishaps Seminar 12 Chandler
Root Cause Analysis - Steps

Clearly define the undesirable outcome.


• Describe the undesired outcome.
• For example: “satellite failed to deploy,” “employee broke his
arm,” or “XYZ project schedule significantly slipped.”

Gather data.
Identify facts surrounding the undesired outcome.
• When did the undesired outcome occur?
• Where did it occur?
• What conditions were present prior to its occurrence?
• What controls or barriers could have prevented its occurrence
but did not?
• What are all the potential causes?
• What actions can prevent recurrence?
• What amelioration occurred? Did it prevent further damage or
injury?

2004 MAPLD/Mishaps Seminar 13 Chandler


Root Cause Analysis - Steps
Create a timeline (sequence diagram)
• Illustrate the sequence of events in chronological order
horizontally across the page.

• Depict relationships between conditions, events, and exceeded or


failed barriers/controls.

Condition

Exceeded- Undesired
Event Event Failed Barrier Event Outcome
Or Control

2004 MAPLD/Mishaps Seminar 14 Chandler


Root Cause Analysis - Steps
Create a timeline (sequence diagram)
• If amelioration occurred (e.g., put out the fire, cared for the injured),
this should be included in the evaluation to ensure that it did not
contribute to the undesired outcome.

Example: In the case of a death, the investigation should ensure that


the death was the result of the mishap and not a delay in medical care
or inappropriate medical care at the scene.

Condition

Exceeded- Undesired
Exceeded-
Outcome
Event Event Failed Barrier Event Failed
Or Control Amelioration

2004 MAPLD/Mishaps Seminar 15 Chandler


Root Cause Analysis - Steps

Example: simple timeline.

Lost High
Speed Data
Satellite Tech. Used Stream
Thrusters Oriented Satellite Failed to
Wrong Method (Mission
Powered Up Space Craft Deploy Antenna To Correct
Failure)

Poor
Line of
Sight

2004 MAPLD/Mishaps Seminar 16 Chandler


Root Cause Analysis - Steps
Create an event and causal factor tree.
(A visual representation of the causes that led to the failure or mishap.)

• Place the undesired outcome at the top of the tree.

• Add all events, conditions, and exceeded/failed barriers that occurred


immediately before the undesired outcome and might have caused it.

Lost High Speed Data Stream From Satellite


(Mission Failure)

Satellite Thrusters Poor Satellite Failed Technician Used


Power Up Oriented Line of Sight To Deploy Wrong
Space Craft Antenna Method to Correct

2004 MAPLD/Mishaps Seminar 17 Chandler


Root Cause Analysis - Steps
Create an event and causal factor tree.
• Brainstorm to ensure that all possible causes are included, NOT just
those that you are sure are involved.

• Be sure to consider people, hardware, software, policy, procedures, and


the environment.
Lost High Speed Data Stream From Satellite
(Mission Failure)

Satellite Thrusters Oriented Poor Satellite Failed Technician Used Wrong


Power Up Space Craft Line of Sight To Deploy Antenna Method to Correct

Meteoroid Logic Control Power Supply Arming Relay Firing Relay Sabotage
Impacted Satellite Failed Failed Failed Failed

2004 MAPLD/Mishaps Seminar 18 Chandler


Root Cause Analysis - Steps
Create an event and causal factor tree continued...
• If you have solid data indicating that one of the possible causes is not
applicable, it can be eliminated from the tree.
Caution: Do not be too eager to eliminate early on. If there is a possibility that it
is a causal factor, leave it and eliminate it later when more information is
available.

Lost High Speed Data Stream From Satellite


(Mission Failure)

X Satellite
Power Up
Thrusters Oriented
Space Craft
Poor
Line of Sight
Satellite Failed
To Deploy Antenna
Technician Used Wrong
Method to Correct

Meteoroid Logic Control Power Supply Arming Relay Firing Relay Sabotage
Impacted Satellite Failed Failed Failed Failed

2004 MAPLD/Mishaps Seminar 19 Chandler


Root Cause Analysis - Steps
Create an event and causal factor tree continued…
• You may use a fault tree to determine all potential causes and to
decompose the failure down to the “basic event” (e.g., system component
level).
Lost High Speed Data Stream From Satellite
(Mission Failure)

Thrusters Oriented Poor Satellite Failed Technician Used Wrong


Space Craft Line of Sight To Deploy Antenna Method to Correct

Meteoroid Logic Control Power Supply Arming Relay Firing Relay Sabotage
Impacted Satellite Failed Failed Failed Failed

Battery Failed Converter Current Detection


Failed Circuit Failed

2004 MAPLD/Mishaps Seminar 20 Chandler


Root Cause Analysis - Steps
Create an event and causal factor tree continued…
• A fault tree can also be used to identify all possible types of human
failures.
Lost High Speed Data Stream From Satellite
(Mission Failure)

Thrusters Oriented Poor Satellite Failed Technician Used Wrong


Space Craft Line of Sight To Deploy Antenna Method to Correct

Didn’t Perceive Didn’t Understand Correct Interpretation Correct Decision But


System Feedback System Feedback Incorrect Decision Incorrect Action
Perception Error Interpretation Error Decision-Making Error Action-Execution Error

Knowledge-Based Rule-Based Skill-Based


Error Error Error

2004 MAPLD/Mishaps Seminar 21 Chandler


Root Cause Analysis - Steps
Create an event and causal factor tree continued…
• After you have identified all the possible causes, ask yourself “WHY” each
may have occurred.

• Be sure to keep your questions focused on the original issue. For example
“Why was the condition present?”; “Why did the event occur?”; “Why was
the barrier exceeded?” or “Why did the barrier fail?”

Undesired Outcome

Event #1 Condition Event #2 Failed or Exceeded


Barrier or Control

WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY
Event #1 Event #1 Event #1 Condition Failed Failed Failed
Condition Condition Event #2 Event #2 Event #2
Occurred Occurred Occurred Existed or Exceeded Exceeded Exceeded
Existed or Existed or Occurred Occurred Occurred Barrier or Barrier or Barrier or
Changed Changed Changed Control Control Control

2004 MAPLD/Mishaps Seminar 22 Chandler


Root Cause Analysis – Steps

Continue to ask “why” until you have reached:

1. Root cause(s) - including all organizational factors that exert


control over the design, fabrication, development, maintenance,
operation, and disposal of the system.

2. A problem that is not correctable by NASA or NASA contractor.

3. Insufficient data to continue.

2004 MAPLD/Mishaps Seminar 23 Chandler


Root Cause Analysis- Steps
The resultant tree of questions and answers should lead to a
comprehensive picture of POTENTIAL causes for the undesired
outcome

Undesired Outcome

Event #1 Condition Event #2 Failed or Exceeded


Barrier or Control

WHY WHY WHY WHY WHY WHY WHY WHY WHY


WHY WHY WHY
Event #1
Occurred
Event #1
Occurred
Event #1
Occurred
Condition
Existed or
Changed
Condition
Existed or
Changed
Condition
Existed or
Changed
Event #2
Occurred
Event #2
Occurred
Event #2
Occurred
Failed
Exceeded
Barrier or
Control
Failed
Exceeded
Barrier or
Control
XFailed
Exceeded
Barrier or
Control

WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY

WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY

2004 MAPLD/Mishaps Seminar 24 Chandler


Root Cause Analysis- Steps
Check your logic with a detailed review of each potential cause.
• Verify it is a contributor or cause.
• If the action, deficiency, or decision in question were corrected, eliminated
or avoided, would the undesired outcome be prevented or avoided?
> If no, then eliminate it from the tree.

Undesired Outcome

Event #1 Condition Event #2 Failed or Exceeded


Barrier or Control

WHY

X
WHY WHY WHY WHY WHY WHY WHY WHY

X
WHY

X
WHY WHY

X
Condition Condition Condition Event #2 Event #2 Event #2 Failed Failed Failed
Event #1 Event #1 Event #1 Existed or Existed or Existed or Occurred Occurred Occurred Exceeded Exceeded Exceeded
Occurred Occurred Occurred Changed Changed Changed Barrier or Barrier or Barrier or
Control Control Control

XX
WHY WHY WHY WHY WHY
XWHY WHY WHY WHY
XX
WHY WHY WHY WHY WHY WHY WHY

XX
WHY WHY WHY

XXWHY WHY WHY WHY WHY


XX WHY WHY WHY WHY
XX
WHY WHY WHY WHY WHY WHY WHY
XX X
WHY WHY WHY

2004 MAPLD/Mishaps Seminar 25 Chandler


Root Cause Analysis - Steps
Create an event and causal factor tree continued…
• The remaining items on the tree are the causes (or probable causes).
necessary to produce the undesired outcome.
• Proximate causes are those immediately before the undesired outcome.
• Intermediate causes are those between the proximate and root causes.
• Root causes are organizational factors or systemic problems located at the
bottom of the tree.

Undesired Outcome

Event #2 Failed or Exceeded


PROXIMATE
Event #1 Condition
Barrier or Control CAUSES

WHY WHY WHY WHY


Event #1 WHY WHY WHY WHY
Event #1 Condition Condition
Occurred Event #2 Event #2 Failed/Exceeded Failed/Exceeded
Occurred Existed or Existed or
Occurred Occurred Barrier or Control Barrier or Control
Changed Changed
INTERMEDIATE
CAUSES
WHY WHY WHY WHY WHY
WHY WHY WHY WHY WHY WHY WHY

WHY
WHY WHY WHY WHY WHY WHY WHY WHY WHY ROOT CAUSES
2004 MAPLD/Mishaps Seminar 26 Chandler
Root Cause Analysis- Steps
Some people choose to leave contributing factors on the tree to show
all factors that influenced the event.
Contributing factor: An event or condition that may have contributed to
the occurrence of an undesired outcome but, if eliminated or modified,
would not by itself have prevented the occurrence.
If this is done, illustrate them differently (e.g., dotted line boxes and arrows) so
that it is clear that they are not causes.

Undesired Outcome

Event #1 Condition Event #2 Failed or Exceeded


Barrier or Control

WHY WHY WHY WHY


Event #1 WHY WHY WHY WHY
Event #1 Condition Condition
Occurred Event #2 Event #2 Failed/Exceeded Failed/Exceeded
Occurred Existed or Existed or
Occurred Occurred Barrier or Control Barrier or Control
Changed Changed

Contributing
WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY WHY
WHY Factors
WHY WHY WHY
WHY WHY WHY WHY WHY WHY WHY

2004 MAPLD/Mishaps Seminar 27 Chandler


Investigating Causes of Failures & Mishaps
Lost High Speed Data Stream From Satellite
(Mission Failure)

Thrusters Oriented Poor Satellite Failed Technician Used Wrong


Space Craft Line of Sight To Deploy Antenna Method to Correct

Power Supply
Failed

Battery Dead
Tells What Failed
Tells Types of Failures
Installed Beyond Shelf
Improperly Limit

Tells Why It Faile


Root Cause is Much Deeper
Keep Asking Why
2004 MAPLD/Mishaps Seminar 28 Chandler
Investigating Causes of Failures & Mishaps
Lost High Speed Data Stream From Satellite
(Mission Failure)

Thrusters Oriented Poor Satellite Failed Technician Used Wrong


Space Craft Line of Sight To Deploy Antenna Method to Correct

MMOD Hit Correct Interpretation


Power Supply
Space Craft Failed Incorrect Decision
After Oriented Decision-Making Error
Battery Failed

New Task Insufficient


Anomaly Training
Installed Beyond Shelf
Improperly Limit Training Does
Not Exist
Procedure No Quality
Incorrect Inspection Insufficient
Not Updated Training Budget
Insufficient
Not Under Quality Staff Organization Under
Configuration Mgmt Estimates Importance of
Insufficient Anomaly Training
2004 MAPLD/Mishaps Seminar 29
Budget Chandler
Root Cause Analysis- Steps
Generating Recommendations:
At a minimum corrective actions should be generated to eliminate proximate
causes and eliminate or mitigate the negative effects of root causes.

When multiple causes exist, there is limited budget, or it is difficult to


determine what should be corrected:

• Quantitative analysis can be used to determine the total contribution of


each cause to the undesirable outcome (see NASA Fault Tree
Handbook, Version 1.1, for more information).

• Fishbone diagrams (or other methods) can be used to arrange causes


in order of their importance.

• Those causes which contribute most to the undesirable outcome


should be eliminated or the negative effects should be mitigated to
minimize risk.

2004 MAPLD/Mishaps Seminar 30 Chandler


For More Information

• NASA PBMA Mishap Investigation Website


(http://ai-pbma-kms.intranets.com/login.asp?link=)
– Includes:
• Links (e.g, to Root Cause Analysis Software, a RCA
Library).
• Documents (e.g., Methods, Techniques, Tools,
Publications and Presentations).
• Threaded Discussions and Polls.

• HQ Office of Safety & Mission Assurance


– Faith.T.Chandler@nasa.gov

2004 MAPLD/Mishaps Seminar 31 Chandler


BACK-UP SLIDES

2004 MAPLD/Mishaps Seminar 32 Chandler


Definitions of RCA & Related Terms
Cause (Causal Factor) An event or condition that results in an effect. Anything that shapes or influences the outcome.

Proximate Cause(s) The event(s) that occurred, including any condition(s) that existed immediately before the undesired
outcome, directly resulted in its occurrence and, if eliminated or modified, would have prevented the
undesired outcome. Also known as the direct cause(s).

Root Cause(s) One of multiple factors (events, conditions or organizational factors) that contributed to or created the
proximate cause and subsequent undesired outcome and, if eliminated, or modified would have
prevented the undesired outcome. Typically multiple root causes contribute to an undesired outcome.
Root Cause Analysis (RCA) A structured evaluation method that identifies the root causes for an undesired outcome and the
actions adequate to prevent recurrence. Root cause analysis should continue until organizational
factors have been identified, or until data are exhausted.
Event A real-time occurrence describing one discrete action, typically an error, failure, or malfunction.
Examples: pipe broke, power lost, lightning struck, person opened valve, etc…

Condition Any as-found state, whether or not resulting from an event, that may have safety, health, quality,
security, operational, or environmental implications.
Organizational Factors Any operational or management structural entity that exerts control over the system at any stage in its
life cycle, including but not limited to the system’s concept development, design, fabrication, test,
maintenance, operation, and disposal.
Examples: resource management (budget, staff, training); policy (content, implementation, verification);
and management decisions.
Contributing Factor An event or condition that may have contributed to the occurrence of an undesired outcome but, if
eliminated or modified, would not by itself have prevented the occurrence.

Barrier A physical device or an administrative control used to reduce risk of the undesired outcome to an
acceptable level. Barriers can provide physical intervention (e.g., a guardrail) or procedural separation
in time and space (e.g., lock-out-tag-out procedure).
2004 MAPLD/Mishaps Seminar 33 Chandler
NASA Mishap Classification Levels
Classification Level Property Damage Injury

Type A Total direct cost of mission failure and property damage is $1,000,000 or more, Occupational injury and/or illness that resulted in:
or A fatality,
Crewed aircraft hull loss has occurred, or
or A permanent total disability,
Occurrence of an unexpected aircraft departure from controlled flight (except or
high performance jet/test aircraft such as F-15, F-16, F/A-18, T-38, and T-34, The hospitalization for inpatient care of 3 or more people
when engaged in flight test activities). within 30 workdays of the mishap.

Type B Total direct cost of mission failure and property damage of at least $250,000 but Occupational injury and/or illness has resulted in
less than $1,000,000. permanent partial disability.
or
The hospitalization for inpatient care of 1-2 people within
30 workdays of the mishap.

Type C Total direct cost of mission failure and property damage of at least $25,000 but Nonfatal occupational injury or illness that caused any
less than $250,000. workdays away from work, restricted duty, or transfer to
another job beyond the workday or shift on which it
occurred.

Type D Total direct cost of mission failure and property damage of at least $1,000 but Any nonfatal OSHA recordable occupational injury and/or
less than $25,000. illness that does not meet the definition of a Type C
mishap.

Close Call Total direct cost of mission failure and property damage is less than $1,000 Minor injury requiring first aid which possesses the
or potential to cause a mishap
An occurrence or condition of employee concern in which there is no property or
damage but possesses the potential to cause a mishap. An occurrence or condition with no injury but possesses
the potential to cause a mishap.

2004 MAPLD/Mishaps Seminar 34 Chandler

You might also like