Back to Basics

The ChE as Sherlock Holmes: Investigating Process Incidents

James A. Klein
ABS Group

Investigating process incidents to understand what happened and why provides vital information that will help prevent future incidents.

*This article is based on a paper presented at the AIChE 2015 Spring Meeting and 11th Global Congress on Process Safety, April 2015.

www.aiche.org/cep  October 2016  CEP  Copyright © 2016 American Institute of Chemical Engineers (AIChE)

“There is nothing more deceptive than an obvious fact.”
-- Sherlock Holmes

Sherlock Holmes is a fictional consulting detective created by Sir Arthur Conan Doyle in 1887 (1), ultimately appearing in over 50 novels and short stories, many of which have been popularized in movies and on television. He is renowned for his powers of observation and his ability to develop significant insights into the causes and perpetrators of difficult criminal cases that others miss, often figuring out what happened when no one else can. In process safety, we similarly need to analyze what happened when process incidents occur so we can determine what went wrong and how future incidents can be prevented.

Sometimes after a process safety incident, what happened may seem obvious. Usually, though, many things have to go wrong for a serious incident to occur, and one or more of the causes may not be obvious. Even if some of them do seem obvious, our view is often incomplete or simply wrong. As a result, incident investigations must rely on appropriate investigation techniques to ensure that the lessons learned from an incident are valid and effective.

The processes of solving criminal cases and investigating process incidents are not very different at a high level. Both involve looking for clues and evidence. Both involve identifying and analyzing various possibilities to determine what most likely happened and what didn’t.

Process investigations can serve as a tool to ensure such an incident does not happen again. While we may lack the personal skills, insights, and confidence of Holmes, we can learn from him, learn from process incidents, and, ultimately, contribute to improved process safety performance.

An incident is an unplanned sequence of events with the potential for undesirable consequences (2). Process safety incidents involve process hazards, such as toxicity, flammability, and reactivity, that can lead to hazardous events, such as toxic releases, fires, and runaway reactions. A catastrophic incident might involve a large explosion or toxic release with multiple consequences, such as injuries, environmental harm, and/or business disruption. For example, an explosion and fire at a polyvinyl chloride (PVC) manufacturing facility killed five workers and seriously injured three others (Figure 1) (3).

A near-miss incident is one with the potential for significant consequences if it were to progress, but did not due to the actions of safeguards, emergency response, or other factors.

You can learn from all types of process safety incidents. When something unplanned or unexpected occurs, the consequences should be identified to determine whether an investigation is appropriate. Incident investigations must identify and document important facts about what
happened; determine what might have caused the incident; and make recommendations to prevent such incidents from occurring in the future. Investigations are a mechanism for learning from actual operating experience, providing continuing feedback on process safety program effectiveness, and identifying where improvements are needed. Keeping track of the types, severities, and causes of incidents can help you gauge how well process safety programs are working, how they can be improved, and the lessons that were learned from the incidents (4).

Figure 1. An explosion at a PVC manufacturing facility resulted in five fatalities and ignited PVC resins stored in an adjacent warehouse. Concerns about the ensuing smoke forced a two-day community evacuation. Source: (3).

Investigating process incidents

“The tragedy has been so uncommon, so complete and of such personal importance, to so many people, that we are suffering from a plethora of surmise, conjecture, and hypothesis. The difficulty is to detach the framework of fact -- of absolute undeniable fact -- from the embellishments of theorists and reporters. Then, having established ourselves upon this sound basis, it is our duty to see what inferences may be drawn …”
-- Sherlock Holmes

After a significant process incident, especially if there have been fatalities, severe injuries, and major equipment damage, site employees may be highly emotional and under stress. Some may jump to conclusions about what happened. Some may have a vested interest in assigning blame for what happened. Threats of job loss, regulatory involvement, and legal action may exist. Under these circumstances, it is difficult to ascertain the framework of fact about what happened and how it can be prevented in the future.

An effective incident investigation process, as shown in Figure 2, can help overcome these difficulties (2). The steps for investigating process incidents are:

1. Report unusual occurrences to determine whether an incident has occurred. Companies should establish a simple process for prompt reporting and train all employees on what should be reported. If potential incidents are not reported, then potential learnings that could help prevent future incidents will be lost.

2. Determine the extent of investigation required, based on management review of the initial incident report and classification criteria in incident investigation program guidance. Some small events without significant potential consequences, such as low-hazard liquid spills, may not warrant an extensive investigation, although metrics should be kept that track the number and types of such occurrences. Events with either significant consequences or the potential for significant consequences may be investigated using different techniques appropriate for the type of incident, as determined by the incident classification.

3. Form a qualified incident investigation team. The team should include members with relevant operational, maintenance, and engineering experience and skill in using effective investigation methods. Additional participants, such as contractors or technical experts, should be involved in the investigation as needed.

4. Collect and preserve evidence to support the investigation immediately following an incident. For example:
• collect samples
• isolate equipment that has failed
• take pictures of the operating area and equipment positions (e.g., valve and switch positions)
• collect data from control systems
• interview personnel
• collect documents (e.g., management of change [MOC] forms, inspection records, process hazard analyses [PHAs]).
This information is used to establish a chronology of events and to provide facts for use in the investigation.

5. Analyze the evidence that has been collected using appropriate investigation methods to determine the causes of the incident, as discussed in the following sections.

6. Develop recommendations that address the causes of the incident. Recommendations should be specific for all identified causes and should clearly document the actions to be taken. Develop interim actions, if needed, to provide temporary improvements while recommendations are being implemented.

7. Document and share, as appropriate, the investigation results.

8. Track all recommendations to make sure they are implemented within a reasonable time.

9. Maintain data on incident types, severities, and causes using appropriate metrics to help measure process safety performance and to identify additional improvement opportunities.

Figure 2. Follow these steps to conduct an incident investigation. Source: Adapted from (2).

Why trees: Identifying root causes

“When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”
-- Sherlock Holmes

Once evidence related to an incident has been collected and preserved, the investigation team must analyze the evidence to determine what happened and why. This involves developing a chronology of events, or timeline, from available instrumentation records, personnel interviews, and other sources. Various incident investigation methodologies may be used to determine the causes of the incident. Causes such as equipment failure and human error are readily observable. However, underlying management system root causes and safety culture and leadership issues are often at the heart of process incidents (Figure 3). An incident may be caused by (2, 5):
• equipment causal factors (P): physical equipment failures, such as pump, valve, mixer, or relief valve failures
• human causal factors (H): human errors, such as an employee not following procedures, using incorrect tools, or incorrectly using personal protective equipment
• management system root causes (S): management system failures that allowed the equipment and/or human causal factors to occur or exist through failures to prevent, detect, and/or correct them; many of these are related to gaps in process safety system elements, such as incomplete procedures, poor preventive maintenance, ineffective training, or incorrect risk analysis
• safety culture and leadership issues. Often the underlying cause of management system failures, an organization’s safety culture can be influenced by factors such as provision or allocation of resources, production priorities, and major organizational changes. Many investigations will not focus on developing recommendations at this level due to the potential difficulties in identifying issues and/or implementing improvements; however, it is beneficial to consider these issues during the incident investigation.

[Figure 3 labels: More Observable Causes; Underlying Root Causes; Equipment and Human Causal Factors; Management System Root Causes; Safety Culture and Leadership Issues.]

Figure 3. Effective incident investigations identify the underlying root causes of the incident, in addition to the more readily observable equipment and human causal factors. Source: Adapted from (2).

Investigation techniques can range from simpler methods, such as 5-whys, to more-detailed root-cause methods, such as cause-and-effect tree analysis (why tree). The method selected should be appropriate for the complexity and severity of the incident.

The 5-why method typically starts by questioning the primary consequences of the incident that occurred, such
as “Why did a release of chlorine occur?” Based on the answer (for example, a transfer hose failed), an additional why question is asked, such as “Why did the transfer hose fail?” Typically, at least five why questions are asked until root causes of the incident are identified. In this example, failure to properly specify which hose should have been purchased, due to miscommunication with a vendor, might be one of the causes. Additional questions, such as “Why was the incorrect hose not identified when received or before use?” would also be appropriate to evaluate why the specification error was not detected. Multiple 5-why question sets can also be used if needed for different incident factors, such as “Why was an operator near the chlorine cylinders?” or “Why were the cylinders stored at that location?”

For complex incidents, the more-rigorous why tree method is often used to help ensure that the incident investigation is comprehensive and unbiased. The why tree method employs the following steps:

1. Identify the top event of interest (e.g., a serious injury due to exposure to a hazardous chemical).

2. Determine what could have caused the top event of interest by asking why and using cause-and-effect logic. The investigation team should brainstorm to identify possible causes (hypotheses) of the event.

3. For each hypothesis, the team should collect and document confirming and contradicting data. Confirming data support the conclusion that the hypothesis was a cause of the effect (or why question); contradicting data refute the conclusion.

4. Based on the confirming and contradicting data, the team confirms or eliminates the hypothesis. Remaining hypotheses are possible causes until they can either be eliminated or confirmed based on additional investigation. In some cases, it may be difficult to confirm or eliminate a hypothesis; if needed, the team can rely on their relevant experience to decide or to assign probabilities for the likelihood of the hypothesis.

5. Continue to ask why and develop and eliminate hypotheses for each question until management system root causes are identified, based on hypotheses that are confirmed by the data available.

6. Ask why at least one more time to determine whether other possible management system root causes or safety culture or leadership issues can be identified.

Cause-and-Effect Logic

Cause-and-effect logic is used in a why tree to help ensure that appropriate hypotheses explaining incident causes are developed. Here is an example of cause-and-effect logic:

Effect: The car stopped.

Potential causes (hypotheses): (1) It ran out of gas. (2) The driver braked. (3) The engine failed. (4) The oil had just been changed. (5) The driver had been shopping.

Logic test: Each of the first three hypotheses makes sense as an individual cause. However, Hypothesis 4 involves a large step in logic, and Hypothesis 5 has no apparent connection to the car stopping. It’s possible that insufficient oil could lead to the car stopping, but only if it causes the engine failure of Hypothesis 3. An error during the oil change, such as failure to replace the oil or secure the drain plug, could cause insufficient oil.

At each level in the why tree, all potential causes should be listed and the cause-and-effect logic should be tested. Hypotheses are listed using OR logic (the hypothesis can produce the effect by itself) or AND logic (multiple hypotheses must exist simultaneously to produce the effect). OR logic is assumed in the figures.

Figure 4 shows a simple why tree for an operator fatality. Note that the why tree for this example is not fully developed, and “other observations” or “other causes” are used for simplicity. The top event is the operator fatality, and previous investigation has already found that the fatality was due to exposure to a toxic chemical.

Figure 4. After an operator fatality, a why tree can be used to investigate the incident and help determine the reasons it occurred. Note that this why tree is not fully developed, and “other causes” are noted for simplicity.

Employees observed a toxic chemical being vented from a cylinder in the storage area, so the first hypothesis (i.e., the fatality occurred from a toxic release from a cylinder) appears to be correct. From this starting point, ask the question “Why was the toxic chemical released from the cylinder?” The investigation team must brainstorm the answer to this question with credible causes (hypotheses) based on cause-and-effect logic. The team assesses each hypothesis based on the confirming and contradicting data that were collected as evidence. Evidence is then documented as supporting or disputing each hypothesis.
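
The OR and AND logic described in the Cause-and-Effect Logic sidebar can be illustrated with a short Python sketch. The names `Gate`, `Cause`, and `explains_effect` are hypothetical and chosen only for this illustration, and the confirmed/not-confirmed flags are made-up inputs rather than findings from a real case.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    OR = "or"    # any one confirmed cause can produce the effect by itself
    AND = "and"  # all listed causes must exist at the same time


@dataclass
class Cause:
    description: str
    confirmed: bool  # assumed outcome of the team's evidence review


def explains_effect(causes: list[Cause], gate: Gate) -> bool:
    """Return True if the confirmed causes are sufficient to produce the effect."""
    flags = [c.confirmed for c in causes]
    if not flags:
        return False
    return any(flags) if gate is Gate.OR else all(flags)


# Sidebar example: "The car stopped." Hypotheses 1-3 are listed with OR logic.
car_stopped = [
    Cause("It ran out of gas", confirmed=False),
    Cause("The driver braked", confirmed=True),  # illustrative finding only
    Cause("The engine failed", confirmed=False),
]

print(explains_effect(car_stopped, Gate.OR))   # True
print(explains_effect(car_stopped, Gate.AND))  # False
```

Under OR logic one confirmed hypothesis is enough to explain the effect, while AND logic would require every listed hypothesis to be confirmed.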
Hypotheses that are contradicted or proven false by the
data are not considered further (Figure 5). For example, the
team hypothesized that the toxic release from the cylinder
was caused by a cylinder failure, a transfer hose failure,
or possibly other causes that are not listed in this example.
Examination and testing of the cylinder and the transfer
hose confirmed that the hose failed, so the team was able to
eliminate the cylinder failure hypothesis.
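
The confirm-or-eliminate step for the cylinder and transfer hose hypotheses can be sketched in Python. This is a minimal illustration that assumes a simplified rule: any contradicting evidence eliminates a hypothesis, confirming evidence with no contradiction confirms it, and a hypothesis with no evidence stays open. A real team would weigh the evidence rather than apply a fixed rule. The `Hypothesis` class and `status` function are hypothetical names, and the evidence strings paraphrase the example.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    description: str
    confirming: list = field(default_factory=list)     # evidence supporting it
    contradicting: list = field(default_factory=list)  # evidence refuting it


def status(h: Hypothesis) -> str:
    """Classify a hypothesis using the simplified rule assumed in this sketch."""
    if h.contradicting:
        return "eliminated"
    if h.confirming:
        return "confirmed"
    return "open"  # still a possible cause until more evidence is collected


# Why was the toxic chemical released from the cylinder?
hypotheses = [
    Hypothesis("Cylinder failure",
               contradicting=["Testing pointed to the hose, not the cylinder"]),
    Hypothesis("Transfer hose failure",
               confirming=["Examination and testing confirmed the hose failed"]),
    Hypothesis("Other causes"),  # not developed in the example
]

for h in hypotheses:
    print(f"{h.description}: {status(h)}")

# Hypotheses that are not eliminated remain on the why tree.
remaining = [h.description for h in hypotheses if status(h) != "eliminated"]
```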
At each level in the why tree, additional why questions
are asked, additional hypotheses are developed using cause-
and-effect logic, and hypotheses are supported or eliminated
based on confirming or contradicting evidence. If evidence
confirming or eliminating a hypothesis is not available,
the team can use their knowledge (or that of consultants,
if involved) to decide whether to confirm or eliminate the
hypothesis or to assign probabilities for its likelihood.
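
Repeating the why question level by level amounts to walking down a tree through the confirmed hypotheses. The Python sketch below shows one way to do that, using node descriptions taken from the article's operator fatality example. The `Node` class, its `state` values, and the `confirmed_paths` function are hypothetical names for illustration, and open hypotheses are simply skipped here, which is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    description: str
    state: str = "confirmed"  # "confirmed", "eliminated", or "open"
    children: list = field(default_factory=list)  # answers to the next "why?"


def confirmed_paths(node, path=()):
    """Yield each chain of confirmed causes from the top event to its deepest cause."""
    if node.state != "confirmed":
        return
    path = path + (node.description,)
    confirmed_children = [c for c in node.children if c.state == "confirmed"]
    if not confirmed_children:
        yield path  # deepest confirmed cause reached on this branch
    for child in confirmed_children:
        yield from confirmed_paths(child, path)


tree = Node("Operator fatality", children=[
    Node("Toxic release from a cylinder", children=[
        Node("Cylinder failure", state="eliminated"),
        Node("Transfer hose failure", children=[
            Node("Old hose", state="eliminated"),
            Node("Hose not correctly connected", state="eliminated"),
            Node("Wrong hose used"),
        ]),
    ]),
])

for chain in confirmed_paths(tree):
    print(" -> ".join(chain))
```

The walk stops at the deepest confirmed cause found so far; in an investigation, the team would keep asking why until management system root causes are reached.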
In Figure 5, hypotheses for the transfer hose failure include an old hose, a hose that was not correctly connected to the cylinder, the wrong hose, or other possible causes. The investigation team reviewed the available data and determined that the wrong transfer hose was used, which allowed them to eliminate the other hypotheses. This process continues until the management system root causes have been identified.

Figure 5. After a hypothesis is disproven based on factual evidence and investigation, it can be eliminated from the why tree.

Identifying the management system root causes is the most beneficial way to prevent future incidents, because they have a broader impact on process safety at an entire site or within an organization. For example, if training is ineffective, asking why it is ineffective can lead to significant training improvements. If only obvious equipment failures or human causal factors are considered, one pump might be fixed or one operator might be retrained, but the system-level improvements that could reduce overall pump failures or operator errors will likely be overlooked.

In Figure 5, the transfer hose specification was determined to have been incorrectly changed by an engineer as part of an MOC request. Further investigation of this simple example would evaluate why this system failure occurred, why the incorrect hose specification was not corrected as part of the MOC reviews, and why it was not detected when the hose was received. Whenever the management system root cause is determined, it is useful to ask why at least one more time to determine whether more can be learned about why the system failure occurred, such as another management system root cause or issues related to safety culture or leadership.

If the team uncovers additional observations, such as a similar chemical release that occurred the previous year, then the process should be repeated until the root causes have been found for all relevant observations. Was the previous incident investigated effectively, and were the recommendations appropriate and resolved? Ultimately, the need to identify the root causes requires a thorough incident investigation that looks deeply into what went wrong rather than a superficial focus on what is apparent or obvious about the incident.

Trained why tree leaders, either internal staff or external consultants, are often used because the method can seem complex if it is not practiced often (which hopefully is the case for most team members because there are few incidents needing to be investigated).

Brainstorming with cause-and-effect logic to identify possible hypotheses is a powerful method for identifying incident causes that may not be obvious. This method goes beyond the simpler approach of the 5-why method. Constructing a why tree for all relevant observations provides a comprehensive investigation that does not focus solely on the obvious top event of interest. For example, a why tree could be developed for the observation that a similar incident occurred previously (as noted above), a chemical storage location had been changed recently, required personal protective equipment (PPE) had been modified, or an operator was working alone for the first time.
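
The earlier warning about stopping at obvious equipment or human causal factors can be expressed as a simple completeness check. The Python sketch below reuses the P, H, and S labels from the article's list of cause categories. The `Finding` class and `stops_at_observable_causes` function are hypothetical names, and the findings are illustrative entries based on the transfer hose example, not a real investigation record.

```python
from dataclasses import dataclass

EQUIPMENT, HUMAN, SYSTEM = "P", "H", "S"  # category labels used in the article


@dataclass
class Finding:
    description: str
    category: str  # "P", "H", or "S"


def stops_at_observable_causes(findings: list) -> bool:
    """Return True if no management system root cause (S) has been identified."""
    return not any(f.category == SYSTEM for f in findings)


shallow = [Finding("Transfer hose failed", EQUIPMENT)]
deeper = shallow + [
    Finding("Hose specification incorrectly changed under an MOC request", SYSTEM),
]

print(stops_at_observable_causes(shallow))  # True
print(stops_at_observable_causes(deeper))   # False
```

A True result suggests the investigation should keep asking why before recommendations are written.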

The use of a rigorous methodology, such as a why tree, also helps eliminate the influence of bias or politics on the results because it uses cause-and-effect logic with factual evidence, rather than opinion, to identify incident causes. Cause-and-effect tree methods involving root cause maps (5) and similar approaches have been developed to help make incident investigations comprehensive, consistent, and effective.

Bow ties: Evaluating protective layers

“To the curious incident of the dog in the nighttime. The dog did nothing in the nighttime. That was the curious incident.”
-- Sherlock Holmes

A possible drawback of the why tree method is that the sequence of events, such as the actions of protective layers that have been provided, is often not evaluated in detail beyond the event chronology. Protective layers help prevent and mitigate potential hazardous events, and are implemented based on process hazard and risk analysis. Bow ties are often used, separately or to supplement the why tree technique, for investigating incidents (6, 7). The bow tie method allows a more direct analysis of the protective layers that existed and whether they worked as designed in the sequence expected.

Figure 6 is an example of a bow tie diagram (8). A process hazard in the top middle of the figure causes the top event, which could be a loss of containment, runaway reaction, or other event. Various threats, based on physical, human, or system failures, can cause the top event if the preventive barriers or protective layers fail. The potential consequences of the top event are shown on the right side of the figure, depending on the availability and effectiveness of mitigative barriers and specific circumstances at the time of the event. The different pathways give the diagram the shape of a bow tie.

For example, a release of a flammable and toxic material could result from high pressure in a column, combined with the failure of instrumentation to detect the increasing pressure and the failure of other safeguards protecting against high pressure. Depending on the location, release conditions, meteorological conditions at the time of the release, and the availability and effectiveness of mitigative barriers, consequences such as a fire or explosion could occur, which could cause injuries, fatalities, environmental harm, and/or property damage.

[Figure 6 labels: Hazard; Top Event; Threats; Consequences; Preventive Barriers; Mitigative Barriers.]

Process Hazards    Hazardous Events            Consequences
Toxicity           Toxic Releases              Fatalities/Injuries
Flammability       Fires and Explosions        Environmental Harm
Reactivity         Out-of-Control Reactions    Property Damage

Figure 6. A bow tie diagram can be used to investigate the sequence and the success or failure of protective layers to prevent and mitigate process incidents. Source: (8).

The bow tie method allows analysis of:
• the sequence of expected protective (preventive and mitigative) layer actions
• which protective layers worked, which failed, and why
• what protective layers could have been provided, but were not, and why.

The bow tie method provides a visual of the incident that is useful for understanding what happened, for comparison to process hazard analysis studies, and for training related to the incident investigation. Bow tie analysis allows the incident investigation team to establish whether preventive and mitigative protection layers worked as designed (Figure 7). If not, the failure can be investigated, using the 5-why or why tree method to question why the barrier failed. For example, the team may ask “Why did the pressure switch fail?” or “Why was the relief valve designed incorrectly?”

[Figure 7 labels: Effective Barrier; Available Barrier; Failed Barrier; Missing Barrier; “Why did these barriers fail?”; “Was a barrier missing? Why?”]

Figure 7. A bow tie diagram helps an incident investigation team consider the sequence and effectiveness of protective layers (what safeguards worked or failed and why), as well as determine why additional protective layers were not provided (what safeguards could have been provided but were not and why).

Similarly, the team should consider whether an additional protective layer could have been provided as part of the PHA study. If an additional layer could have been implemented, the team should question why it was not
provided. For example, the team might ask, “Why wasn’t a high-pressure alarm provided?” or “Why did the PHA fail to identify this scenario?”

As with the why tree method, trained experts and specialized software can help ensure appropriate application of the bow tie method. Use of both methods can ensure a comprehensive incident investigation and help maximize the lessons learned to prevent future incidents.

Ensuring effective investigations

“You have been very remiss in not coming to me sooner. You start me on my investigation with a very serious handicap.”
-- Sherlock Holmes

Even with the use of appropriate techniques as discussed, various problems can reduce the effectiveness of incident investigations. Some of the problems include:
• Potential incidents are not recognized or reported for possible investigation.
• The incident investigation team does not include people with the right experience and skills, including specialists who may be needed part-time during the investigation.
• The investigation is started too late and/or is rushed to meet reporting deadlines.
• Potential evidence has not been collected or may have been discarded (e.g., interviews, samples, equipment).
• Incident investigation methods have not been applied correctly, stopping at easily observable causes (e.g., equipment failure, human error) rather than searching deeper for management system root causes and other safety culture and leadership issues.
• The investigation has been excessively influenced by individuals, organizational politics, or other circumstances.
• Recommendations are not provided for all incident causes (including interim actions if needed before recommendations can be completed).
• Recommendations are not clearly written in terms of what needs to be done and why, so the person responsible for follow-up is unsure of the proper action.
• Recommendations are not tracked to closure, and the action taken is not clearly documented.
• Completed recommendations, such as system or procedural changes, are not sustained.

Instead of succumbing to these problems, follow the correct procedures in Table 1 to complete an effective incident investigation.

Table 1. To complete a successful incident investigation, follow these steps.
1. Begin the investigation as soon as possible.
2. Collect and preserve evidence immediately.
3. Form an investigation team with appropriate experience.
4. Provide sufficient time for the investigation.
5. Apply appropriate methods to identify incident causes.
6. Apply appropriate methods to evaluate active, failed, and missing protective layers.
7. Make specific, actionable recommendations for all findings.
8. Document and communicate incident findings.
9. Ensure recommendations are completed in a timely manner.
10. Sustain improvements.

Proper application of incident investigation steps and techniques, like the approaches used by Sherlock Holmes, can enable you to successfully identify incident causes, which will help maximize the knowledge you gain from incidents and help prevent future incidents. We never want serious incidents to occur. Conducting effective incident investigations helps ensure that they do not occur again.

Literature Cited
1. Doyle, A. C., “Sherlock Holmes: The Ultimate Collection,” 2nd ed., Maplewood Books (2014).
2. Center for Chemical Process Safety, “Guidelines for Risk Based Process Safety,” American Institute of Chemical Engineers, New York, NY, and Wiley, Hoboken, NJ (2007).
3. U.S. Chemical Safety and Hazard Investigation Board, “Investigation Report: Vinyl Chloride Monomer Explosion,” Report No. 2004-10-I-IL, Formosa Plastics Corp., Illiopolis, IL (Mar. 2007).
4. Ness, A., “Lessons Learned from Recent Process Safety Incidents,” Chemical Engineering Progress, 111 (3), pp. 23–29 (Mar. 2015).
5. Vanden Heuvel, L. N., et al., “Root Cause Analysis Handbook,” 3rd ed., Rothstein Associates, Inc., Brookfield, CT (2008).
6. Hollnagel, E., “Barriers and Accident Prevention,” Ashgate Publishing, Aldershot, UK (2004).
7. ABS Consulting, “Thesis Bowtie Software Solution,” www.abs-group.com/content/documents/Software/thesis-bowtie-risk-management-software.pdf (2016).
8. Klein, J. A., and B. K. Vaughen, “Process Safety: Key Concepts and Practical Approaches,” CRC Press, Boca Raton, FL (2017).

JAMES A. KLEIN (Email: jklein@absconsulting.com) joined ABS Consulting in 2013 as a senior process safety consultant conducting risk and safety culture assessments, compliance audits, and other process safety services. Prior to that, he was a senior process safety competency consultant and PSM co-lead for North America Operations at DuPont and had 33 years of experience in process safety, engineering, and research. He has over 50 publications, conference presentations, and university talks and has participated in several CCPS book projects, including as the committee leader for Conduct of Operations and Operational Discipline and as a committee member for Risk Based Process Safety. He has a BS in chemical engineering from MIT, an MS in chemical engineering from Drexel Univ., and an MS in management of technology from the Univ. of Minnesota. He has been a member of AIChE since 1984.
