The ChE as Sherlock Holmes: Investigating Process Incidents
James A. Klein
ABS Group

Investigating process incidents to understand what happened and why provides vital information that will help prevent future incidents.
Sherlock Holmes is a fictional consulting detective created by Sir Arthur Conan Doyle in 1887 (1), ultimately appearing in over 50 novels and short stories, many of which have been popularized in movies and on television. He is renowned for his powers of observation and his ability to develop significant insights into the causes and perpetrators of difficult criminal cases that others miss, often figuring out what happened when no one else can. In process safety, we similarly need to analyze what happened when process incidents occur so we can determine what went wrong and how future incidents can be prevented.

Sometimes after a process safety incident, what happened may seem obvious. Usually, though, many things have to go wrong for a serious incident to occur, and one or more of the causes may not be obvious. Even if some of them do seem obvious, our view is often incomplete or simply wrong. As a result, incident investigations must rely on appropriate investigation techniques to ensure that the lessons learned from an incident are valid and effective.

The processes of solving criminal cases and investigating process incidents are not very different at a high level. Both [...] an incident does not happen again. While we may lack the personal skills, insights, and confidence of Holmes, we can learn from him, learn from process incidents, and, ultimately, contribute to improved process safety performance.

An incident is an unplanned sequence of events with the potential for undesirable consequences (2). Process safety incidents involve process hazards, such as toxicity, flammability, and reactivity, that can lead to hazardous events, such as toxic releases, fires, and runaway reactions. A catastrophic incident might involve a large explosion or toxic release with multiple consequences, such as injuries, environmental harm, and/or business disruption. For example, an explosion and fire at a polyvinyl chloride (PVC) manufacturing facility killed five workers and seriously injured three others (Figure 1) (3).

A near-miss incident is one with the potential for significant consequences if it were to progress, but did not due to the actions of safeguards, emergency response, or other factors.

You can learn from all types of process safety incidents. When something unplanned or unexpected occurs, the consequences should be identified to determine whether an investigation is appropriate. Incident investigations must identify and document important facts about what [...]

*This article is based on a paper presented at the AIChE 2015 Spring Meeting and 11th Global Congress on Process Safety, April 2015.
28 www.aiche.org/cep October 2016 CEP Copyright © 2016 American Institute of Chemical Engineers (AIChE)
Figure 1. An explosion at a PVC manufacturing facility resulted in five fatalities and ignited PVC resins stored in an adjacent warehouse. Concerns about the ensuing smoke forced a two-day community evacuation. Source: (3).
Back to Basics
[...]ate for the type of incident, as determined by the incident classification.

3. Form a qualified incident investigation team. The team should include members with relevant operational, maintenance, and engineering experience and skill in using effective investigation methods. Additional participants, such as contractors or technical experts, should be involved in the investigation as needed.

4. Collect and preserve evidence to support the investigation immediately following an incident. For example:
• collect samples
• isolate equipment that has failed
• take pictures of the operating area and equipment positions (e.g., valve and switch positions)
• collect data from control systems
• interview personnel
• collect documents (e.g., management of change [MOC] forms, inspection records, process hazard analyses [PHAs]).
This information is used to establish a chronology of events and to provide facts for use in the investigation.

5. Analyze the evidence that has been collected using appropriate investigation methods to determine the causes of the incident, as discussed in the following sections.

6. Develop recommendations that address the causes of the incident. Recommendations should be specific for all identified causes and should clearly document the actions to be taken. Develop interim actions, if needed, to provide temporary improvements while recommendations are being implemented.

7. Document and share, as appropriate, the investigation results.

8. Track all recommendations to make sure they are implemented within a reasonable time.

9. Maintain data on incident types, severities, and causes using appropriate metrics to help measure process safety performance and to identify additional improvement opportunities.

Why trees: Identifying root causes

“When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”
- Sherlock Holmes

[...] such as equipment failure and human error are readily observable. However, underlying management system root causes and safety culture and leadership issues are often at the heart of process incidents (Figure 3). An incident may be caused by (2, 5):
• equipment causal factors (P): physical equipment failures, such as pump, valve, mixer, or relief valve failures
• human causal factors (H): human errors, such as an employee not following procedures, using incorrect tools, or incorrectly using personal protective equipment
• management system root causes (S): management system failures that allowed the equipment and/or human causal factors to occur or exist through failures to prevent, detect, and/or correct them; many of these are related to gaps in process safety system elements, such as incomplete procedures, poor preventive maintenance, ineffective training, or incorrect risk analysis
• safety culture and leadership issues. Often the underlying cause of management system failures, an organization’s safety culture can be influenced by factors such as provision or allocation of resources, production priorities, and major organizational changes. Many investigations will not focus on developing recommendations at this level due to the potential difficulties in identifying issues and/or implementing improvements; however, it is beneficial to consider these issues during the incident investigation.

[Figure 3 labels: more observable causes (equipment and human causal factors); underlying root causes (management system root causes).]

Investigation techniques can range from simpler methods, such as 5-whys, to more-detailed root-cause methods, such as cause-and-effect tree analysis (why tree). The method selected should be appropriate for the complexity and severity of the incident.

The 5-why method typically starts by questioning the primary consequences of the incident that occurred, such
as “Why did a release of chlorine occur?” Based on the answer, such as a transfer hose failed, an additional why question is asked, such as “Why did the transfer hose fail?” Typically, at least five why questions are asked until root causes of the incident are identified. In this example, failure to properly specify which hose should have been purchased, due to miscommunication with a vendor, might be one of the causes. Additional questions, such as “Why was the incorrect hose not identified when received or before use?” would also be appropriate to evaluate why the specification error was not detected. Multiple 5-why question sets can also be used if needed for different incident factors, such as “Why was an operator near the chlorine cylinders?” or “Why were the cylinders stored at that location?”

For complex incidents, the more-rigorous why tree method is often used to help ensure that the incident investigation is comprehensive and unbiased. The why tree method employs the following steps:

1. Identify the top event of interest (e.g., a serious injury due to exposure to a hazardous chemical).

2. Determine what could have caused the top event of interest by asking why and using cause-and-effect logic. The investigation team should brainstorm to identify possible causes (hypotheses) of the event.

3. For each hypothesis, the team should collect and document confirming and contradicting data. Confirming data support the conclusion that the hypothesis was a cause of the effect (or why question); contradicting data refute the conclusion.

4. Based on the confirming and contradicting data, the team confirms or eliminates the hypothesis. Remaining hypotheses are possible causes until they can either be eliminated or confirmed based on additional investigation. In some cases, it may be difficult to confirm or eliminate a hypothesis; if needed, the team can rely on their relevant experience to decide or to assign probabilities for the likelihood of the hypothesis.

5. Continue to ask why and develop and eliminate hypotheses for each question until management system root causes are identified, based on hypotheses that are confirmed by the data available.

6. Ask why at least one more time to determine whether other possible management system root causes or safety culture or leadership issues can be identified.

Figure 4 shows a simple why tree for an operator fatality. Note that the why tree for this example is not fully developed, and “other observations” or “other causes” are used for simplicity. The top event is the operator fatality, and previous investigation has already found that the fatality was due to exposure to a toxic chemical.

Employees observed a toxic chemical being vented from a cylinder in the storage area, so the first hypothesis (i.e., the fatality occurred from a toxic release from a cylinder) appears to be correct. From this starting point, ask the question “Why was the toxic chemical released from the cyl- [...]
Cause-and-Effect Logic

Cause-and-effect logic is used in a why tree to help ensure that appropriate hypotheses explaining incident causes are developed. Here is an example of cause-and-effect logic:

Effect: The car stopped.

Potential causes (hypotheses): (1) It ran out of gas. (2) The driver braked. (3) The engine failed. (4) The oil had just been changed. (5) The driver had been shopping.

Logic test: Each of the first three hypotheses makes sense as an individual cause. However, Hypothesis 4 involves a large step in logic, and Hypothesis 5 has no apparent connection to the car stopping. It’s possible that insufficient oil could lead to the car stopping, but only if it causes the engine failure of Hypothesis 3. An error during the oil change, such as failure to replace the oil or secure the drain plug, could cause insufficient oil.

At each level in the why tree, all potential causes should be listed and the cause-and-effect logic should be tested. Hypotheses are listed using OR logic (the hypothesis can produce the effect by itself) or AND logic (multiple hypotheses must exist simultaneously to produce the effect). OR logic is assumed in the figures.

Figure 4. After an operator fatality, a why tree can be used to investigate the incident and help determine the reasons it occurred. Note that this why tree is not fully developed, and “other causes” are noted for simplicity.
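
For readers who keep investigation records electronically, the why tree steps and the OR/AND logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a tool from the article: the names `Node`, `is_explained`, `confirmed_chains`, and `still_open` are hypothetical, and the confirmed/eliminated statuses assigned to the car example are invented for illustration (the article does not say which hypothesis turned out to be true).

```python
from dataclasses import dataclass, field
from typing import List

POSSIBLE, CONFIRMED, ELIMINATED = "possible", "confirmed", "eliminated"


@dataclass
class Node:
    """One effect in the why tree, with the hypotheses that could cause it."""
    text: str
    status: str = POSSIBLE   # set by the team from confirming/contradicting data
    gate: str = "OR"         # how the causes below combine: "OR" or "AND"
    causes: List["Node"] = field(default_factory=list)


def is_explained(node: Node) -> bool:
    """A confirmed node is explained if it is a leaf or its causes satisfy its gate."""
    if node.status != CONFIRMED:
        return False
    if not node.causes:
        return True
    results = [is_explained(c) for c in node.causes]
    return all(results) if node.gate == "AND" else any(results)


def confirmed_chains(node: Node, path=()):
    """Yield each chain of confirmed causes from the top event downward."""
    if node.status != CONFIRMED:
        return
    path = path + (node.text,)
    confirmed = [c for c in node.causes if c.status == CONFIRMED]
    if not confirmed:
        yield path  # the chain stops here; asking why again may extend it
    for cause in confirmed:
        yield from confirmed_chains(cause, path)


def still_open(node: Node):
    """List hypotheses that are neither confirmed nor eliminated yet."""
    found = [node.text] if node.status == POSSIBLE else []
    for cause in node.causes:
        found.extend(still_open(cause))
    return found


# Car example from the sidebar; every status below is assumed for illustration.
tree = Node("The car stopped", CONFIRMED, causes=[
    Node("It ran out of gas", ELIMINATED),
    Node("The driver braked", ELIMINATED),
    Node("The engine failed", CONFIRMED, causes=[
        Node("Insufficient oil", CONFIRMED, causes=[
            Node("Oil was not replaced after the oil change", POSSIBLE),
            Node("Drain plug was not secured", CONFIRMED),
        ]),
    ]),
])

chains = list(confirmed_chains(tree))
open_items = still_open(tree)
```

With these assumed statuses, `chains` holds one confirmed path from the top event down to the unsecured drain plug, and `open_items` lists the one hypothesis that still needs confirming or contradicting data. A real investigation would keep asking why below the last confirmed cause until management system root causes are reached.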
The use of a rigorous methodology, such as a why tree, also helps eliminate the influence of bias or politics on the results because it uses cause-and-effect logic with factual evidence, rather than opinion, to identify incident causes. Cause-and-effect tree methods involving root cause maps (5) and similar approaches have been developed to help make incident investigations comprehensive, consistent, and effective.

Bow ties: Evaluating protective layers

“To the curious incident of the dog in the nighttime. The dog did nothing in the nighttime. That was the curious incident.”
- Sherlock Holmes

A possible drawback of the why tree method is that the sequence of events, such as the actions of protective layers that have been provided, is often not evaluated in detail beyond the event chronology. Protective layers help prevent [...] consequences of the top event are shown on the right side of the figure, depending on the availability and effectiveness of mitigative barriers and specific circumstances at the time of the event. The different pathways give the diagram the shape of a bow tie.

[Figure 6 diagram labels: Hazard; Threats; Preventive Barriers; Top Event; Mitigative Barriers; Consequences. Process hazards (toxicity, flammability, reactivity) lead to hazardous events (toxic releases, fires and explosions, out-of-control reactions) and consequences (fatalities/injuries, environmental harm, property damage).]
Figure 6. A bow tie diagram can be used to investigate the sequence and the success or failure of protective layers to prevent and mitigate process incidents. Source: (8).

For example, a release of a flammable and toxic material could result from high pressure in a column and failure of instrumentation to detect increasing pressure and other safeguards protecting against high pressure. Depending on the location, release conditions, meteorological conditions at the time of the release, and the availability and effectiveness of mitigative barriers, consequences such as a fire or explosion could occur, which could cause injuries, fatalities, environmental harm, and/or property damage.

The bow tie method allows analysis of:
• the sequence of expected protective (preventive and mitigative) layer actions
• which protective layers worked, which failed, and why
• what protective layers could have been provided, but were not, and why.

The bow tie method provides a visual of the incident that is useful for understanding what happened, for comparison to process hazard analysis studies, and for training related to the incident investigation. Bow tie analysis allows the incident investigation team to establish whether preventive and mitigative protection layers worked as designed (Figure 7). If not, the failure can be investigated, using the 5-why or why tree method to question why the barrier failed. For example, the team may ask “Why did the pressure switch fail?” or “Why was the relief valve designed incorrectly?”

[Figure 7 labels: Effective Barrier; Available Barrier; Missing Barrier; Failed Barrier.]

Similarly, the team should consider whether an additional protective layer could have been provided as part of the PHA study. If an additional layer could have been implemented, the team should question why it was not
provided. For example, the team might ask, “Why wasn’t a high-pressure alarm provided?” or “Why did the PHA fail to identify this scenario?”

As with the why tree method, trained experts and specialized software can help ensure appropriate application of the bow tie method. Use of both methods can ensure a comprehensive incident investigation and help maximize the lessons learned to prevent future incidents.

Ensuring effective investigations

“You have been very remiss in not coming to me sooner. You start me on my investigation with a very serious handicap.”
- Sherlock Holmes

Even with the use of appropriate techniques as discussed, various problems can reduce the effectiveness of incident investigations. Some of the problems include:
• Potential incidents are not recognized or reported for possible investigation.
• The incident investigation team does not include people with the right experience and skills, including specialists who may be needed part-time during the investigation.
• The investigation is started too late and/or is rushed to meet reporting deadlines.
• Potential evidence has not been collected or may have been discarded (e.g., interviews, samples, equipment).
• Incident investigation methods have not been applied correctly, stopping at easily observable causes (e.g., equipment failure, human error) rather than searching deeper for management system root causes and other safety culture and leadership issues.
• The investigation has been excessively influenced by individuals, organizational politics, or other circumstances.
• Recommendations are not provided for all incident causes (including interim actions if needed before recommendations can be completed).
• Recommendations are not clearly written in terms of what needs to be done and why, so the person responsible for follow-up is unsure of the proper action.
• Recommendations are not tracked to closure, and the action taken is not clearly documented.
• Completed recommendations, such as system or procedural changes, are not sustained.

Instead of succumbing to these problems, follow the correct procedures in Table 1 to complete an effective incident investigation.

Table 1. To complete a successful incident investigation, follow these steps.
1. Begin the investigation as soon as possible.
2. Collect and preserve evidence immediately.
3. Form an investigation team with appropriate experience.
4. Provide sufficient time for the investigation.
5. Apply appropriate methods to identify incident causes.
6. Apply appropriate methods to evaluate active, failed, and missing protective layers.
7. Make specific, actionable recommendations for all findings.
8. Document and communicate incident findings.
9. Ensure recommendations are completed in a timely manner.
10. Sustain improvements.

Proper application of incident investigation steps and techniques, like the approaches used by Sherlock Holmes, can enable you to successfully identify incident causes, which will help maximize the knowledge you gain from incidents and help prevent future incidents. We never want serious incidents to occur. Conducting effective incident investigations helps ensure that they do not occur again.

Literature Cited
1. Doyle, A. C., “Sherlock Holmes: The Ultimate Collection,” 2nd ed., Maplewood Books (2014).
2. Center for Chemical Process Safety, “Guidelines for Risk Based Process Safety,” American Institute of Chemical Engineers, New York, NY, and Wiley, Hoboken, NJ (2007).
3. U.S. Chemical Safety and Hazard Investigation Board, “Investigation Report: Vinyl Chloride Monomer Explosion,” Report No. 2004-10-I-IL, Formosa Plastics Corp., Illiopolis, IL (Mar. 2007).
4. Ness, A., “Lessons Learned from Recent Process Safety Incidents,” Chemical Engineering Progress, 111 (3), pp. 23–29 (Mar. 2015).
5. Vanden Heuvel, L. N., et al., “Root Cause Analysis Handbook,” 3rd ed., Rothstein Associates, Inc., Brookfield, CT (2008).
6. Hollnagel, E., “Barriers and Accident Prevention,” Ashgate Publishing, Aldershot, UK (2004).
7. ABS Consulting, “Thesis Bowtie Software Solution,” www.abs-group.com/content/documents/Software/thesis-bowtie-risk-management-software.pdf (2016).
8. Klein, J. A., and B. K. Vaughen, “Process Safety: Key Concepts and Practical Approaches,” CRC Press, Boca Raton, FL (2017).

JAMES A. KLEIN (Email: jklein@absconsulting.com) joined ABS Consulting in 2013 as a senior process safety consultant conducting risk and safety culture assessments, compliance audits, and other process safety services. Prior to that, he was a senior process safety competency consultant and PSM co-lead for North America Operations at DuPont and had 33 years of experience in process safety, engineering, and research. He has over 50 publications, conference presentations, and university talks and has participated in several CCPS book projects, including as the committee leader for Conduct of Operations and Operational Discipline and as a committee member for Risk Based Process Safety. He has a BS in chemical engineering from MIT, an MS in chemical engineering from Drexel Univ., and an MS in management of technology from the Univ. of Minnesota. He has been a member of AIChE since 1984.