
Failure Is Trying to Tell Us Something

Root Cause Analysis


Root Cause Analysis is a method of problem solving that identifies the sources of failures or
problems. A root cause is the source of a problem and its resulting symptoms; once it is
removed, the undesirable outcome is corrected or prevented from recurring.

It Is Not the Root Cause We Seek, It Is an Effective Solution

Our Path to Better Root Cause Analysis
 Principles of Root Cause Analysis

 Understanding the weaknesses in our current method

 Introduction to the Apollo Method

 Steps to applying Apollo

 Transition from the current method to Apollo


Beyond Conventional Wisdom of Problem Solving
The common approach to problem solving is to categorize
causes or identify causal factors and look for root causes
within the categories.
Categorization schemes do not reveal the cause and effect
relationships needed to find effective solutions.
It is an effective solution we are after.
Before we get started

“We cannot solve problems by using the same kind of
thinking we used when we created them.” (Albert Einstein)
The Persistent Problem …

The development and use of larger and more complex
systems results in a greater number of problems,
evidenced by the symptoms of unstable software,
latent defects, and unexpected performance issues.
Many of these problems, and their symptoms, have a
more serious impact on business operations than ever
before.
In many cases these problems are also more difficult
to solve, recur more often, and remain unresolved.
(A Management System for the Information
Business, Edward Van Schaik)
… and our Persistent Unsuccessful Solutions to recurring problems

In every human endeavor, a critical component to
success is the ability to solve problems.
Unfortunately, we often set ourselves up to fail with
our problem–solving strategies and our inherent
prejudices.
We typically rely on what we believe to be common
sense, storytelling, and categorizing to resolve our
problems.
Conventional wisdom has us believe that problem
solving is inherent to the subject at hand.
— Dean L. Gano, The Apollo Method
Service Operation Processes Involve Root Cause Analysis†

§2.4.5.1 Event Management – is the process that monitors
all events that occur through the IT infrastructure to allow for
normal operation and also to detect and escalate exception
conditions.

§2.4.5.2 Incident Management – concentrates on restoring
the service to users as quickly as possible, in order to minimize
business impact.

§2.4.5.3 Problem Management – involves root–cause
analysis to determine and resolve the cause of events and
incidents, proactive activities to detect and prevent future
problems/incidents, and a Known Error subprocess to allow
quicker diagnosis and resolution if further incidents do occur.
† §2.4.5. Processes within Service Operation, ITIL V3 Service Operation
What is Root Cause Analysis?

Root Cause Analysis is a structured process designed
to help understand the causes of problems for the
purpose of preventing recurrence.
It is step–wise and structured so that it can be
consistently applied to different problems at different
times by different people.
Solutions will only be effective if they act on the
specific known causes of a defined problem.
Steps in Root Cause Analysis

Any structured process used to understand the causes
of past events for the purpose of preventing recurrence should:
 Define the problem
 Include the significance or consequences to the stakeholders
 Define the causal relationships that combined to cause the
defined problem
 Provide a graphical representation of the causal relationships
 Define how the causes are interrelated
 Provide evidence to support each cause
 Describe how the solutions will prevent recurrence of the
defined problem
 Provide a report that clearly presents all of the above

The Notion of Root Cause Analysis

Symptom:
 We have late additions to the release that break the
software.
 We have “core” defects that should have been caught long
before production release.
 We make changes to software, stored procedures, or
the database only to discover it was a mistake.
 We make promises to the customer before assessing
the impact on our resources or the technical difficulty.

Problem:
 Test coverage insufficient to detect latent bugs in
software.
 We commit before understanding the consequences.
Root Cause:
 No software structure to determine test coverage or
change impacts on baseline.
 No detailed understanding of our capacity for work
and productivity of our technical staff.
Understanding the Weakness of our Current Approach to Root Cause Analysis
The overriding theme of traditional Root Cause Analysis is
the focus on the Root Cause. The belief is that we can
eliminate the problem if we eliminate the Root Cause.
This assumes the causal relationships are linear and that
problems come from a single source.
Root Cause Analysis is Not about Story Telling

Stories seldom identify causes because they are busy setting the
stage for who was where and when some action occurred.
A story is a sequence of events starting in the past, leading to the
consequences disguised as a root cause.
Core Failure of Story Telling and the Filling Out of Forms

 Stories rely on the experience and judgment of the authors to
connect the causes of the problem. The mapping between
Event, Cause, and Effect is not provided in the story narrative.

 Story telling can be used to document the investigation and
describe the corrective actions. But stories are poor at
providing the analytical connections between cause and
effect.

 Measures of the effectiveness of corrective actions cannot be
provided by narratives. Traceability between Effect, Action,
and Condition cannot be provided by the narrative.

 It is a false premise that analysis of a problem, its causes,
effects, and solutions can be reduced to filling out a form and
checking boxes.
Story Telling Is Not a Good Root Cause Analysis Approach
 Story telling describes an event by relating people (who),
places (where), and things (what) in a linear time frame
(when).

 When using storytelling to analyze an event (a system outage,
for example), the causes identified in the report are difficult
to follow. This hinders our ability to understand the
relationships between all the causes and to critique
the analysis.

 The investigators may well understand all the causal
relationships, but because they are not presented causally it
is difficult for others to know these relationships.

 Peer reviews will result in more questions because of the
missing connections between Primary Effects, Intermediate Effects,
and their Causes.
Problem with the Story Telling Approach to Root Cause Analysis

 Stories start with the past – we saw this happen and
something else happened after that, and then something
else happened…
 Causal relationships leading to the Root Cause start with
the present and work backwards to the causes – both
Activities and Conditions of this cause.

 Stories are linear – they come from the minds of the story
tellers, usually as a linear time line.
 The linear understanding of an event in a time sequence
from past to present, ignores the cause–and–effect
principle.
 Since we do not understand the branched causes, we use
our own understanding of cause rather than the actual
causal connections.

 Stories use inference to communicate causes.


Root Cause Analysis is the Event, the Cause, and the Resulting Effect
 We need a structured approach to investigating and
analyzing significant adverse events or system deficiencies
and their required improvement – not based on Story Telling.

 We need an approach that provides information and tools to
be incorporated into risk management, quality management,
independent verification and validation, and improvement
procedures in order to:
 PREVENT future occurrence of adverse events that cause or can
cause undesired performance of our systems.
 CORRECT practices that have led to identified deficiencies.

 This approach separates story telling from the Primary Effect,
and the cause–effect chain leading to the Primary Effect.
A Better Approach to Root Cause Analysis for Primary Effect, Cause, and Effect
 Direct causes often result from another set of causes – the
intermediate causes – and these may be the result of still
other causes.

 This chain of cause and effect needs to be revealed in a way
that clearly points to the corrective actions.

 When a chain of causes and their effects is followed from a
known end–state (time now), back to an origin or starting
point, root causes are revealed and corrective actions can be
applied.

 A root cause is an initiating cause of the causal chain which
leads to an outcome or effect of interest.
Why Root Cause Analysis is Hard

 The problem is poorly defined.
 A systematic approach is not used to classify problem and cause.
 Investigations are stopped prematurely – moving on to the next problem.
 Decisions are based on guesses, hunches, or assumptions.
 An inadequate level of detail is used to get to the Primary Effect.
 Interim containment fixes are sometimes allowed to become
“permanent.”
 The skills, knowledge, and experience needed to uncover the root
cause are not available.
 Lack of organizational will to address bigger issues.
 Fear of being blamed.
 I really don’t have time for this, we have bigger problems to solve.
The Problems with Categorical Thinking
 We need to put order to the things we perceive.

 This is a natural process, but it creates laziness in our thinking
processes.

 The notion of good and bad is categorical thinking at its base
level.

 Categorical thinking creates the belief that once things are
categorized, we can establish relationships and act on them
according to other perceived solutions.

 Filling out root cause forms or assigning elements to Fishbone
charts reinforces the perception that we can put the root causes in
categories (boxes) and assign solutions.
The Real Problem with Categorical Thinking
 When interacting with others, we assume there is a single
reality and therefore their categories are like ours.

 They are not.

 We assign value that establishes our basis of understanding
and prejudices.

 If this is not recognized, there is a danger that these prejudices set
us up for failure when trying to produce an effective solution
to the root cause.
Testing Answers from the Five Whys Question Stream
 What evidence is there that this cause exists?
 Is it concrete?
 Is it measurable?

 What evidence is there that this cause could lead to the observed effects?
 Are we merely asserting causation without evidence?

 What evidence is there that this cause actually contributed to the Primary
Effect?
 Even given that it exists and could lead to this problem, how do we know it wasn't
actually something else?

 Is anything else needed, along with this cause, for the stated effect to occur?
 Is it self–sufficient?
 Is something needed to help it along?

 Can anything else, besides this cause, lead to the stated effect?
 Are there alternative explanations that fit better?
 What other risks are there?

To navigate the path to the actual Root Cause, we need
to connect the Action and Condition causes to the
Primary Effect and all the Intermediate Effects in a
single picture revealing the corrective actions that
prevent the Primary Effect.
The Apollo Method
Nothing happens without a cause. Every time we ask WHY
we must find at least two causes – the Action and the
Condition in which that Action causes the effect.
The Apollo Method breaks the dependence on story telling
and the linear past to present approach. It replaces it with a
present to past chain of cause and effect to discover the
original root causes of the Effect.
Principles of the Cause and Effect Map

For each Primary Effect we need to ask why that
Effect occurred.
 Causes are never part of a Linear Chain as found in the standard
Fishbone diagram or narrative approach.

 Look for causes to create the effect. Two causes are needed
for each Effect.
 Conditions – may exist prior to the Effect. Or conditions may be
in motion or active during the Effect. Conditions are the causes
often ignored or beyond our knowledge.
 Actions – momentary causes that bring conditions together to
cause an effect. Actions are the causes most easily recognized.

 Connect all causes (Actions and Conditions) with a Caused
By phrase to either an action or a condition.

 Support each Cause with evidence or an answered question.
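
The map rules above can be checked mechanically. The following is only a minimal sketch, not part of the Apollo Method itself: the `Kind`, `Cause`, and `check_cause_set` names and the sample causes are hypothetical, chosen for illustration. It flags an effect that lacks an Action or a Condition cause, and any cause that has neither evidence nor a question attached.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ACTION = "action"
    CONDITION = "condition"


@dataclass
class Cause:
    name: str
    kind: Kind
    evidence: str = ""       # supporting evidence, if any
    open_question: str = ""  # question attached to the cause when evidence is missing


def check_cause_set(effect: str, causes: list[Cause]) -> list[str]:
    """Return the rule violations for one effect and its immediate causes."""
    problems = []
    kinds = {cause.kind for cause in causes}
    if Kind.ACTION not in kinds:
        problems.append(f"{effect}: no Action cause")
    if Kind.CONDITION not in kinds:
        problems.append(f"{effect}: no Condition cause")
    for cause in causes:
        if not cause.evidence and not cause.open_question:
            problems.append(f"{cause.name}: no evidence or open question")
    return problems


# Hypothetical example data, not taken from a real investigation.
causes = [
    Cause("Late change merged", Kind.ACTION, evidence="commit log"),
    Cause("No test coverage map", Kind.CONDITION),
]
print(check_cause_set("Release broke the software", causes))
```

Here the effect has both kinds of cause, so the only reported problem is the Condition that still lacks evidence or a question.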


Five Steps to using the Five Whys

 Invite all affected parties to contribute to the map.

 Select the leader (someone trained in RCA).

 Ask Why 5 times for each topic area in the Cause
and Effect map.

 Assign responsibilities for collecting actual factual
data.

 Publish the map.


Start this process using Apollo now.
Each Effect Has Two or More Causes in the form of an Action and a Condition

 Primary Effect – is the effect we want to correct or prevent.
 Cause – the answer to Why stated as a verb or noun. Two forms are
Action and Condition.
 Action – momentary causes that bring conditions together to
cause an effect.
 Condition – causes that exist over time before the Action brings
them together to cause an effect.
Four Phases of the Apollo Method
These four phases are the basis of discovering the corrective
actions for the undesirable Effects we see in our
development, testing, and deployment efforts.

1. Define the problem
• What is the problem?
• When did it happen?
• Where did it happen?
• What is the significance of the problem?

2. Create the Cause and Effect chart
• For the primary Effect, ask Why did this happen
• Look for causes in Actions and Conditions
• Connect all the causes with Caused By for the next cause and its effect
• Support causes with evidence or an open Question

3. Identify effective solutions; they must
• Prevent recurrence
• Be within our control
• Meet our goals and objectives

4. Implement the best solutions
• Measure the effectiveness of these solutions in units defined in the
Action and Condition causes
No Fishbone Charts or Narratives Allowed in the Apollo Method


The Flaw of our Linear Thinking Process
 Like a string of dominos, asking why in the conventional Five
Whys method assumes A caused B, B caused C, and C
caused D.

 At the end of this chain we believe the Root Cause of the
undesirable outcome can be found.

 In the traditional Fishbone approach we are looking for the
event that caused the Effect.

 Instead we need to find the Actions and Conditions that
ALLOWED the event to happen.

 These Actions and Conditions are the actual Root Cause.


The Principles of the Apollo Method

 Cause and Effect are the same thing.
 If we look closely at cause and effect, we see that a “cause” and an
“effect” are the same thing.
 A single thing may be both a cause and an effect.
 They differ only by how we perceive them in time.

 Each Effect has at least two causes in the form of Actions and
Conditions.
 This is the most important and overlooked principle of causation.
 Unlike the storytelling used to capture Fishbone style charts, which
focuses on linear action causes, reality demands that each effect have
at least one action cause and one or more conditional causes.

 Causes and Effects are part of a continuum of causes.

 An Effect exists only if its causes exist in the same space and
time frame.
Cause and Effect are the Same Thing

Effect         Caused by   Action or Condition
Injury         Caused by   Fall
Fall           Caused by   Slip
Slip           Caused by   Wet Surface
Wet Surface    Caused by   Leaky Faucet
Leaky Faucet   Caused by   Seal Failure
Seal Failure   Caused by   Seal Not

 The cause of one thing becomes the effect when you connect
caused by.
 The cause of the “Injury” was a “Fall”, and when you ask why
“Fall”, it changes to an effect and the cause is “Slip.”
 This relationship continues as long as we continue to ask why.
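
The Injury example can be walked in code to show how each cause becomes the next effect. This is only an illustrative sketch: the `CAUSED_BY` mapping and the `ask_why` function are hypothetical names, the chain stops at "Seal Failure" because that is the last fully legible entry, and a single path is followed even though the method expects every effect to branch into Actions and Conditions.

```python
# Each effect maps to the cause named for it in the Injury example.
CAUSED_BY = {
    "Injury": "Fall",
    "Fall": "Slip",
    "Slip": "Wet Surface",
    "Wet Surface": "Leaky Faucet",
    "Leaky Faucet": "Seal Failure",
}


def ask_why(primary_effect: str, caused_by: dict[str, str]) -> list[str]:
    """Walk from the primary effect back through its causes (present to past)."""
    chain = [primary_effect]
    # Stop when no further cause is recorded, or if the data loops back on itself.
    while chain[-1] in caused_by and caused_by[chain[-1]] not in chain:
        chain.append(caused_by[chain[-1]])
    return chain


chain = ask_why("Injury", CAUSED_BY)
for effect, cause in zip(chain, chain[1:]):
    print(f"{effect} caused by {cause}")
```

Every name except the first and last appears once as a cause and once as an effect, which is the point of the slide.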
Cause and Effect are Part of a Continuum of Causes
 Causes are not linear.

 They branch out into at least two causes each
time we ask why of an effect, and if we ask
why of each of those causes we find an ever
expanding set of causes as shown in the
example to the right.

 Causes and Effects are Part of an Infinite
Continuum of Causes.
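
A quick calculation shows how fast the set of causes expands. Assuming every effect has exactly the minimum of two causes (one Action and one Condition), the number of causes at each level of asking why is a lower bound of 2 raised to the level. The `minimum_causes` function is a hypothetical helper written for this illustration.

```python
def minimum_causes(levels: int, causes_per_effect: int = 2) -> list[int]:
    """Lower bound on the number of causes found at each level of asking why."""
    return [causes_per_effect ** level for level in range(1, levels + 1)]


per_level = minimum_causes(5)
print(per_level)       # [2, 4, 8, 16, 32]
print(sum(per_level))  # 62 causes in total after asking why five times
```

Even at the minimum branching, five rounds of why produce 62 causes, which is why a linear narrative cannot hold them.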
An Effect Exists Only if Its Causes Exist in the same Space and Time Frame

 Cause–and–effect relationships exist with or
without human understanding.
 We perceive them relative to time and space.
 Every causal relationship is made up of Conditional
causes with a history of existence over time,
combining with an Action cause in some defined
time frame and existing in the same space to
create an effect.
The Apollo Method structures our current information collection method to identify solutions to the Root Cause

Every time-series entry in our current narrative method is an
ACTION cause.
By focusing on ACTIONs and not the associated CONDITION
causes, we leave out important causes that can be acted on to
provide an effective SOLUTION.
Example of a Current RCA

Missing Elements of Success

 Linear thought process
 A caused B, B caused C, C caused D.

 No causal chain from Primary Effect to related Cause and
Effect

 No Actions and Conditions to connect to the Effect
 Under what conditions was the Effect observed?
 What actions triggered the Effect?

 No evidence for each Action and Condition

 Stopping too soon, before actual Root Cause found

Every Effect is caused by a momentary Action
cause coming together with an existing Condition
cause in the same relative time and space.
Seven Steps to Discovery
Effective problem solving and strategies for business
success move away from blame finding and the linear
thinking of Fishbone diagrams, and move toward finding
the interconnected factors where Cause and Effect are
intertwined.
These Seven Steps expand on the Four Phases to further
detail the process of arriving at the Root Cause.
Seven Steps of the Apollo Method

1. Define the Problem.

2. Determine the Causal Relationships.

3. Provide a Graphical Representation of Cause and Effect that is
not linear thinking.

4. Provide Evidence for each Cause and Effect.

5. Determine if each Cause is Sufficient and Necessary.

6. Identify Effective Solutions ⬅ THIS IS WHAT WE’RE AFTER.


 Finding the cause is needed.
 Preventing the effect is needed.
 But installing an effective solution is the desired business outcome.

7. Implement And Track the Effective Solutions.
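
Step 5 can be illustrated with a small check against recorded observations. The slides do not define the test, so this sketch assumes the usual logical meanings: a cause is necessary if the effect never occurred without it, and a cause set is sufficient if the effect occurred every time the whole set was present. The function names and the observation data are hypothetical.

```python
Observation = tuple[set[str], bool]  # (causes present, did the effect occur?)


def is_necessary(cause: str, observations: list[Observation]) -> bool:
    """True if the effect was never observed without the cause present."""
    return all(cause in present for present, effect in observations if effect)


def is_sufficient(cause_set: set[str], observations: list[Observation]) -> bool:
    """True if the effect occurred every time the whole cause set was present."""
    return all(effect for present, effect in observations if cause_set <= present)


# Hypothetical observations for illustration only.
observations = [
    ({"late change", "no coverage map"}, True),
    ({"late change"}, False),
    ({"no coverage map"}, False),
]
print(is_necessary("late change", observations))                        # True
print(is_sufficient({"late change"}, observations))                     # False
print(is_sufficient({"late change", "no coverage map"}, observations))  # True
```

In this data neither cause is sufficient alone, but the Action and the Condition together are, which matches the principle that each effect needs both. A cause set that was never observed passes the sufficiency check trivially, so the result is only as good as the evidence collected.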


An Effective Solution …

 Prevents recurrence of the Primary Effect.

 Assures corrective and preventive actions are within our control.

 Meets our goals and objectives, including a solution that …
 Does not cause unacceptable problems.
 Prevents similar occurrences.
 Provides reasonable value for the cost.
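
The three criteria above can be applied as a simple filter over candidate solutions. This is a sketch under assumptions: the `Solution` record, the `effective_solutions` function, and the candidate list are hypothetical, and in practice each yes/no judgment would come from the evidence in the cause and effect chart.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    name: str
    prevents_recurrence: bool
    within_our_control: bool
    meets_goals: bool  # no unacceptable problems, reasonable value for the cost


def effective_solutions(candidates: list[Solution]) -> list[str]:
    """Keep only the candidates that satisfy all three criteria."""
    return [
        s.name
        for s in candidates
        if s.prevents_recurrence and s.within_our_control and s.meets_goals
    ]


# Hypothetical candidates for illustration only.
candidates = [
    Solution("Map test coverage to the code baseline", True, True, True),
    Solution("Ask customers to stop requesting changes", True, False, False),
    Solution("Restart the service after each failure", False, True, True),
]
print(effective_solutions(candidates))
```

Only the first candidate passes; the other two fail on control and on recurrence respectively.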
A Recent Example of Cause and Effect Analysis
An Ugly Truth About Root Causes
The truth?
You can’t handle the Truth!

In the end, it usually comes down to People. People are the Root Cause that creates the Primary Effect

This does NOT and CAN NOT mean a Blame Game.
If we ask properly, people will see their role in the failure of the
process.
And this is our core problem†

 Getting staff engaged in Preventing problems, not just
Correcting problems, means asking:
 If I do this what will happen?
 Is this the right thing to do at this time?
 Is this really what the customer wanted me to do?
 Did I consider the impact of my actions?
 If I don’t have time to be careful, then what damage will result
from my actions?
 Did I consult with someone who knows more than I do about what
the solution should look like?
 Am I being as careful as I should be, when I make a change?
 Am I being pressured to do something I know isn’t the right thing
to do?
If the people change, the culture will change
† IEEE 13th Annual Workshop on Human Performance / Root Cause / Trending / Operating Experience / Self Assessment,
August 26–31, 2007, Monterey Marriott, Monterey, CA
Basis of Successful Root Cause Analysis using the Apollo Method
The following guidelines are provided to help us create an
effective problem solving culture.

Critical Elements

To be effective, everyone …
 Must be exposed to the principles of causation to understand that
“stuff” does not just happen. It always has a cause connected to
the effect, in a long chain of cause and effect.

 Should know that we can find effective solutions to event–based
problems using RealityCharting® and RC Simplified™.

 Must understand that different perspectives are a key to effective
solutions and are easily accommodated when you use the
RealityCharting® process, avoiding the blame game and the he said,
she said exchange.

 Must know their role in defining problems and finding effective
solutions to prevent recurrence.

 Must know that management is behind this initiative.


Infrastructure

Elements required for success

 Top–level management support.

 A Program Champion. The title of this person may take
many forms depending on the existing infrastructure.
Examples include: Management Representative, Continuous
Improvement Champion, and Problem–Solving Champion.

 Dedicated Incident Investigation Facilitators.

 Incorporation of the RealityCharting® process into existing
procedures and protocols.

 Involvement of every employee in this effective problem–solving
initiative.
Review of Needed Elements of any Credible Root Cause Analysis

 Actions are causes that interact with the conditions to cause an
Effect.

 Conditions are causes that exist in time prior to an action bringing
them together to cause an Effect.

 Cause Set is the fundamental causal element of all that
happens. It is made up of an Effect and its immediate causes
that represent a single causal relationship. As a minimum, the
causes consist of an Action and one or more Conditions. Cause
sets, like causes, cannot exist alone. They are part of a continuum
of causes with no beginning or end.
When we skip story telling and start connecting the cause and effect, we’ll find the root cause of the Event

 Our story telling approach is not the basis of corrective
actions.
 The Apollo method prevents the story telling and focuses on
Event, Activities and Conditions in a causal chain to the
Primary Effect.
