Incident Investigation and Reporting

Key acronyms

RCA = root cause analysis
SVA = security vulnerability analysis

SAND No. 2011-1036C

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Incident investigation resources

CCPS 2003. Center for Chemical Process Safety, Guidelines for Investigating Chemical Process Incidents, 2nd Edition, NY: American Institute of Chemical Engineers.

Chapter
1 Introduction
2 Designing an incident investigation management system
3 An overview of incident causation theories
4 An overview of investigation methodologies
5 Reporting and investigating near misses
6 The impact of human factors
7 Building and leading an incident investigation team
8 Gathering and analyzing evidence
9 Determining root causes – structured approaches
10 Developing effective recommendations
11 Communication issues and preparing the final report
...

Incident investigation resources

D.A. Crowl and J.F. Louvar 2001. Chemical Process Safety: Fundamentals with Applications, 2nd Ed., Upper Saddle River, NJ: Prentice Hall.

Chapter 12 • Accident Investigations
12.1 Learning from accidents
12.2 Layered investigations
12.3 Investigation process
12.4 Investigation summary
12.5 Aids for diagnosis
12.6 Aids for recommendations
Incident investigation resources

CCPS 2007a. Center for Chemical Process Safety, Guidelines for Risk Based Process Safety, NY: American Institute of Chemical Engineers.

Chapter 19 • Incident Investigation
19.1 Element Overview
19.2 Key Principles and Essential Features
19.3 Possible Work Activities
19.4 Examples of Ways to Improve Effectiveness
19.5 Element Metrics
19.6 Management Review

Incident Investigation and Reporting

1. What is an incident investigation?
2. How does incident investigation fit into PSM?
3. What kinds of incidents are investigated?
4. When is the incident investigation conducted?
5. Who performs the investigations?
6. What are some ways to investigate incidents?
7. How are incident investigations documented?
8. What is done with findings & recommendations?
9. How can incidents be counted and tracked?

Photo credit: U.S. Chemical Safety & Hazard Investigation Board

What is an incident investigation?

An incident investigation is the management process by which underlying causes of undesirable events are uncovered and steps are taken to prevent similar occurrences.
- CCPS 2003

[Photo: Results of explosion and fire at a waste flammable solvent processing facility (U.S. CSB Case Study 2009-10-I-OH)]
Incident investigation functions

[Diagram: cycle of incident investigation functions]
• Have system in place before incident occurs
• Train team members before incident
• Incident
• Activate investigation team
• Conduct incident investigation
  – Develop investigation plan
  – Gather, analyze evidence
  – Determine root causes
  – Develop recommendations
• Generate incident report
• Management and legal review; decide actions, restart criteria
• Implement actions
• Communicate learnings
• Critique investigation; improve system

Learning from incidents

Investigations that will enhance learning
• are fact-finding, not fault-finding
• must get to the root causes
• must be reported, shared and retained.

Definition - Root cause

Root Cause: A fundamental, underlying, system-related reason why an incident occurred that identifies a correctable failure or failures in management systems.

There is typically more than one root cause for every process safety incident.
- CCPS 2003
How does incident investigation fit into PSM?

Risk-Based Process Safety (CCPS 2007a)

Commit to Process Safety
• Process safety culture
• Compliance with standards
• Process safety competency
• Workforce involvement
• Stakeholder outreach

Understand Hazards and Risks
• Process knowledge management
• Hazard identification and risk analysis

Manage Risk
• Operating procedures
• Safe work practices
• Asset integrity and reliability
• Contractor management
• Training and performance assurance
• Management of change
• Operational readiness
• Conduct of operations
• Emergency management

Learn from Experience
• Incident investigation
• Measurement and metrics
• Auditing
• Management review and continuous improvement

Four perspectives for designing, building and operating a safe, secure and profitable facility

[Diagram: four quadrants labeled Historical, Potential, Hypothetical, Actual]

Historical
Codes, Standards, RAGAGEPs

• The historical perspective tells us what to do based on codes, standards and best practices that represent our accumulated experience and lessons learned from previous industry incidents.

Potential
Hazards, Consequences

• The potentials are what could happen if containment or control of a process hazard was lost or if a security incident occurred.
Hypothetical
What-If, HAZOP, SVA

• The hypothetical, or predictive, perspective looks at what could go wrong, even if it has never happened before. This is a probabilistic perspective, based on hypothetical loss event scenarios.

Actual
Incidents, Inspections, Tests

• The actual or real-time perspective can inform us of previously unrecognized or uncorrected problems, as they are manifested in actual incidents and near misses, as well as by ongoing inspections and tests that can detect incipient problems.

What kinds of incidents are investigated?

• The first step in an incident investigation is recognizing that an “incident” has occurred!

Definitions

Incident: An unplanned event or sequence of events that either resulted in or had the potential to result in adverse impacts.

Incident sequence: A series of events composed of an initiating cause and intermediate events leading to an undesirable outcome.

Source: CCPS 2008a

Incident types

Three categories of incidents, based on outcomes:
• Loss event
  - Actual loss or harm occurs (also termed accident when not related to security)
• Near miss
• Operational interruption
  - Actual impact on production or product quality occurs

Near miss: An occurrence in which an accident (i.e., property damage, environmental impact, or human loss) or an operational interruption could have plausibly resulted if circumstances had been slightly different. - CCPS 2003
(The same concept applies to security incidents.)
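The three outcome-based categories above can be sketched as a small classifier. This is only an illustration: the names `IncidentType` and `classify_incident` are hypothetical, and treating a loss event as taking precedence over an operational interruption when both apply is an assumption, not something the slides state.

```python
from enum import Enum


class IncidentType(Enum):
    LOSS_EVENT = "loss event"
    NEAR_MISS = "near miss"
    OPERATIONAL_INTERRUPTION = "operational interruption"


def classify_incident(actual_loss_or_harm: bool,
                      production_or_quality_impact: bool) -> IncidentType:
    """Classify an already-recognized incident by its outcome.

    Assumption: if actual loss or harm occurred, the incident is a loss
    event even when production was also affected.
    """
    if actual_loss_or_harm:
        return IncidentType.LOSS_EVENT
    if production_or_quality_impact:
        return IncidentType.OPERATIONAL_INTERRUPTION
    # No actual loss and no production impact: it could plausibly have
    # been worse if circumstances had been slightly different.
    return IncidentType.NEAR_MISS


if __name__ == "__main__":
    print(classify_incident(False, False).value)
```

A reporting form could call such a function so that near misses are logged with the same rigor as loss events.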

One type of near miss

[Diagram: Hazards (Contain & Control) → Deviation → Preventive safeguards: Regain control or shut down (NEAR MISS) → Loss Event → Mitigative safeguards → Impacts (Mitigated / Unmitigated)]

DISCUSSION

Give three or four examples of simple near-miss scenarios that would fit the graphic on the previous slide.
Include at least one related to facility security.
__
__
__
__
Preventive safeguards revisited

Operational Mode: Abnormal operation
Objective: Regain control or shut down; keep loss events from happening

Examples of Preventive Safeguards:
– Operator response to alarm
– Safety Instrumented System
– Hardwired interlock
– Last-resort dump, quench, blowdown
– Emergency relief system

REVIEW

What is the equivalent of preventive safeguards in facility security physical protection systems?
__
__
__
__

When is the incident investigation conducted?

• Basic answer: As soon as possible.
• Reasons:
  – Evidence gets lost or modified
    • Computer control historical data overwritten
    • Outside scene exposed to rain, wind, sunlight
    • Chemical residues oxidize, etc.
  – Witness memories fade or change
  – Other incidents may be avoided
  – Restart may depend on completing actions to prevent recurrence
  – Regulators or others may require it
    • E.g., U.S. OSHA PSM: Start within 48 h
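As a small worked example of the 48 h start requirement mentioned above, the latest allowable start time can be computed from the incident time. The function names are hypothetical, and a site's own procedure may set a shorter limit.

```python
from datetime import datetime, timedelta

# From the slide: "U.S. OSHA PSM: Start within 48 h".
START_LIMIT = timedelta(hours=48)


def latest_start(incident_time: datetime) -> datetime:
    """Latest time the investigation may begin under the 48 h limit."""
    return incident_time + START_LIMIT


def started_on_time(incident_time: datetime,
                    investigation_start: datetime) -> bool:
    """True if the investigation began within the limit."""
    return investigation_start <= latest_start(incident_time)


if __name__ == "__main__":
    incident = datetime(2024, 3, 1, 14, 30)
    print(latest_start(incident).isoformat())
```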
When is the incident investigation conducted?

Challenges to starting as soon as possible:
– Team must be selected and assembled
– Team may need to be trained
– Team may need to be equipped
– Team members may need to travel to site
– Authorities or others may block access
– Site may be unsafe to approach / enter

DISCUSSION

What might be done to overcome some of the challenges to starting an investigation sooner?
–
–

Who performs the investigations?

Options:
• Single investigator
• Team approach

Advantages of team approach: (CCPS 2003)
- Multiple technical perspectives help analyze findings
- Diverse personal viewpoints enhance objectivity
- Internal peer reviews can enhance quality
- More resources are available to do required tasks
- Regulatory authority may require it

Who performs the investigations?

The “best team” will vary depending on the nature, severity and complexity of the incident.

Some possible team members:
• Team leader / investigation method facilitator
• Area operator
• Process engineer
• Safety / security specialist
• I&E / process control or computer systems support
• Union safety representative
• Contractor representative
• Other specialists (e.g., metallurgist, chemist)

Train team members before incident

Training site management, potential team members and support personnel ahead of time will speed up the start of the investigation.
– Larger companies may have one or more specially trained persons available for major incident investigations
– All personnel need to be familiar with the basic incident recognition and reporting requirements

What are some ways to investigate incidents?
Older investigations

• Only identified obvious causes; e.g.,
  – “The line plugged up”
  – “The operator screwed up”
  – “The whole thing just blew up”
• Recommendations were superficial
  – “Clean out the plugged line”
  – “Re-train the operator”
  – “Build a new one”

Layered investigations

• Deeper analysis
• Additional layers of recommendations:
  1 Immediate technical recommendations
    • e.g., replace the carbon steel with stainless steel
  2 Recommendations to avoid the hazards
    • e.g., use a noncorrosive process material
  3 Recommendations to improve the management system
    • e.g., keep a materials expert on staff

Investigation process

1 Choose investigation team
2 Make brief overview survey
3 Set objectives, delegate responsibilities
4 Gather, organize pre-incident facts
5 Investigate, record incident facts
6 Research, analyze unknowns
7 Discuss, conclude, recommend
8 Write clear, concise, accurate report

Discovery phase

• Develop a plan
• Gather evidence
  – Take safety precautions; use PPE
  – Preserve the physical scene and process data
  – Gather physical evidence, samples
  – Take photographs, videos
  – Interview witnesses
  – Obtain control or computer system charts and data
Analysis of facts

• Develop a timeline
• Analyze physical and/or electronic evidence
  – Chemical analysis
  – Mechanical testing
  – Computer modeling
  – Data logs
  – etc.
• Conduct multiple-root-cause analysis

Some analysis methods

• Five Why’s
• Causal Tree
• RCA (Root Cause Analysis)
• FTA (Fault Tree Analysis)
• MORT (Management Oversight and Risk Tree)
• MCSOII (Multiple Cause, Systems Oriented Incident Investigation)
• TapRooT®

Some analysis methods

General analysis approach:
• Develop, by brainstorming or a more structured approach, possible incident sequences
• Eliminate as many incident sequences as possible based on the available evidence
• Take a closer look at those that remain until the actual incident sequence is discovered (if possible)
• Determine the underlying root causes of the actual incident sequence

Incident sequence questions

Determine, for the incident being investigated:
• What was the cause or attack that changed the situation from “normal” to “abnormal”?
• What was the actual (or potential, if a near miss) loss event?
• What safeguards failed? What did not fail?

[Diagram repeated: Hazards → Deviation → Regain control or shut down → Loss Event → Impacts (Mitigated / Unmitigated)]
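The "develop, then eliminate" part of the general analysis approach can be sketched as filtering candidate incident sequences against the evidence. This is a minimal illustration; the class `CandidateSequence`, the function `eliminate`, and the example facts are all hypothetical and not taken from any cited method.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateSequence:
    name: str
    # Facts that would have to be true if this sequence is what happened.
    required_facts: set = field(default_factory=set)


def eliminate(candidates, disproved_facts):
    """Keep only sequences that do not depend on a fact the evidence disproved."""
    return [c for c in candidates
            if not (c.required_facts & disproved_facts)]


if __name__ == "__main__":
    candidates = [
        CandidateSequence("relief valve stuck closed", {"valve_seized"}),
        CandidateSequence("operator overfilled vessel", {"level_high"}),
        CandidateSequence("external fire", {"fire_outside_vessel"}),
    ]
    # Hypothetical findings: the valve moved freely, and there were no
    # burn marks outside the vessel.
    disproved = {"valve_seized", "fire_outside_vessel"}
    for c in eliminate(candidates, disproved):
        print(c.name)
```

The sequences that survive are the ones that deserve the "closer look" before any root causes are assigned.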
“Swiss cheese model” revisited

REMEMBER: No protective barrier is 100% reliable.

EXERCISE

Conduct “Five Why’s” on the most recent loss event that has happened to you personally.

Why did the loss event happen? Because _______________________________________
Why? Because _______________________________________
Why? Because _______________________________________
Why? Because _______________________________________
Why? Because _______________________________________
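To make the point that no barrier is 100% reliable concrete, here is a rough sketch of how often every barrier would fail on the same demand. It assumes the barriers fail independently, which is an assumption of this example, not a claim from the slides; common-cause failures make the real figure higher. The failure probabilities are invented for illustration.

```python
from math import prod


def probability_all_barriers_fail(failure_probabilities):
    """Probability that every barrier fails on one demand.

    Assumes independent failures (a simplification).
    """
    for p in failure_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(failure_probabilities)


if __name__ == "__main__":
    # Hypothetical values: operator response, interlock, relief system.
    barriers = [0.1, 0.05, 0.01]
    print(f"{probability_all_barriers_fail(barriers):.1e}")
```

Even with three barriers the result is small but not zero, which is why near misses that defeat one or two barriers are worth investigating.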

Discuss, conclude, recommend

• Find the most likely scenario that fits the facts
• Determine the underlying management system failures
• Develop layered recommendations

Aids for diagnosis

• Location of fire ignition?
• Deflagration or detonation?
• Hydraulic or pneumatic failure?
• Pressure required to rupture containment?
• Medical evidence?

See Crowl and Louvar 2001 Section 12.5 for details
How are incident investigations documented?

A written report documents, as a minimum:
• Date of the incident
• When the investigation began
• Who conducted the investigation
• A description of the incident
• The factors that contributed to the incident
• Any recommendations resulting from the investigation
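The minimum report contents listed above map naturally onto a record with a completeness check. The sketch below is illustrative only: `IncidentReport`, its field names, and `missing_items` are hypothetical, and treating an empty recommendations list as acceptable is an assumption based on the word "any" in the last bullet.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class IncidentReport:
    incident_date: date
    investigation_start: date
    investigators: list
    description: str
    contributing_factors: list
    recommendations: list


# Assumption: an investigation may legitimately produce no recommendations.
OPTIONAL_FIELDS = {"recommendations"}


def missing_items(report: IncidentReport) -> list:
    """Names of minimum items that are empty or unset."""
    return [f.name for f in fields(report)
            if f.name not in OPTIONAL_FIELDS and not getattr(report, f.name)]


if __name__ == "__main__":
    draft = IncidentReport(
        incident_date=date(2024, 1, 2),
        investigation_start=date(2024, 1, 3),
        investigators=["A. Investigator"],
        description="",
        contributing_factors=["corroded transfer line"],
        recommendations=[],
    )
    print(missing_items(draft))
```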

Typical report format

1 Introduction
2 System description
3 Incident description
4 Investigation results
5 Discussion
6 Conclusions
7 Layered recommendations

Investigation summary

• The investigation report is generally too detailed for sharing the learnings with most interested persons
• An Investigation Summary can be used for broader dissemination, such as to:
  – Communicate to management
  – Use in safety or security meetings
  – Train new personnel
  – Share lessons learned with sister plants

(See also: Crowl & Louvar 2001, Figure 12-1 and Example 12-2)
Investigation summary example

[Figure: example investigation summary]

Source: S2S - A Gateway for Plant and Process Safety, www.safety-s2s.eu

What is done with findings & recommendations?
Findings and recommendations

What is the most important product of an incident investigation?
1. The incident report
2. Knowing who to blame for the incident
3. Findings and recommendations from the study
4. The actions taken in response to the findings and recommendations from the study

Findings and recommendations

Example form to document recommendations:

ORIGINAL STUDY FINDING / RECOMMENDATION
Source: ☐ PHA ☐ Incident Investigation ☐ Compliance Audit ☐ Self-Assessment ☐ Other
Source Name
Finding No.
Risk-Based Priority (A, B, C or N/A)
Finding / Recommendation
Date of Study or Date Finding / Recommendation Made

Aids for recommendations

Overriding principles (Crowl and Louvar 2001, p. 528):
• Make safety [and security] investments on cost and performance basis
• Improve management systems
• Improve management and staff support
• Develop layered recommendations, especially to eliminate underlying causes and hazards
Implementation

As for PHA action items, a system must be in place to ensure all incident investigation action items are completed on time and as intended.

– Same system can be used for both
– Include regular status reports to management
– Communicate actions to affected employees
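One way to picture a single tracking system for both PHA and incident investigation action items is a shared record with a due date and a simple status summary for management. This is a sketch under assumptions: `ActionItem`, `overdue`, and `status_report` are hypothetical names, and a real system would also verify that each action was completed as intended, not just closed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    item_id: str
    source: str  # e.g. "PHA" or "Incident Investigation"
    description: str
    due: date
    completed: Optional[date] = None


def overdue(items, today: date):
    """Open items whose due date has passed."""
    return [i for i in items if i.completed is None and i.due < today]


def status_report(items, today: date) -> dict:
    """Counts suitable for a regular status report to management."""
    done = sum(1 for i in items if i.completed is not None)
    return {
        "total": len(items),
        "completed": done,
        "open": len(items) - done,
        "overdue": len(overdue(items, today)),
    }


if __name__ == "__main__":
    items = [
        ActionItem("II-1", "Incident Investigation",
                   "Replace corroded line", date(2024, 2, 1)),
        ActionItem("PHA-7", "PHA", "Add high-level alarm",
                   date(2024, 6, 1), completed=date(2024, 5, 20)),
    ]
    print(status_report(items, today=date(2024, 3, 1)))
```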
How can incidents be counted and tracked?

• “Lagging indicators”: actual loss events
  – Major incident counts and monetary losses
  – Injury/illness rates
  – Process safety incident rates
• “Leading indicators”: precursor events
  – Near misses
  – Abnormal situations
    • E.g., Overpressure relief events
    • Safety alarm or shutdown system actuations
    • Flammable gas detector trips
  – Unsafe acts and conditions
  – Other PSM element metrics

Pyramid Principle revisited

Reducing the frequency of precursor events and near misses...
Pyramid Principle revisited

… will reduce the likelihood of a major loss event

Additional resources

• AIChE Loss Prevention Symposium, Case Histories session (every year)
• www.csb.gov reports and videos
• CCPS 2008b, Center for Chemical Process Safety, Incidents that Define Process Safety, NY: American Institute of Chemical Engineers
• CCPS, “Process safety leading and lagging metrics – You don’t improve what you don’t measure,” available at www.aiche.org/uploadedFiles/CCPS/Publications/CCPS_ProcessSafety2011_2-24.pdf
