Root Cause Analysis For Beginners
by James J. Rooney and Lee N. Vanden Heuvel
Root cause analysis (RCA) is a process designed for use in investigating and categorizing the root causes of events with safety, health, environmental, quality, reliability and production impacts. The term “event” is used to
In 50 Words Or Less
• Root cause analysis helps identify what, how and why something happened, thus preventing recurrence.
• Root causes are underlying, are reasonably identifiable, can be controlled by management and allow for generation of recommendations.
• The process involves data collection, cause charting, root cause identification and recommendation generation and implementation.
generically identify occurrences that produce or have the potential to produce these types of consequences. Simply stated, RCA is a tool designed to help identify not only what and how an event occurred, but also why it happened. Only when investigators are able to determine why an event or failure occurred will they be able to specify workable corrective measures that prevent future events of the type observed. Understanding why an event occurred is the key to developing effective recommendations. Imagine an occurrence during which an operator is instructed to close valve A; instead, the operator closes valve B. The typical investigation would probably conclude operator error was the cause. This is an accurate description of what happened and how it happened. However, if the analysts stop here, they have not probed deeply enough to understand the reasons for the mistake. Therefore, they do not know what to do to prevent it from occurring again. In the case of the operator who turned the wrong valve, we are likely to see recommendations such as retrain the operator on the procedure, remind all operators to be alert when
QUALITY PROGRESS | JULY 2004 | 45
QUALITY BASICS
manipulating valves or emphasize to all personnel that careful attention to the job should be maintained at all times. Such recommendations do little to prevent future occurrences.

Generally, mistakes do not just happen but can be traced to some well-defined causes. In the case of the valve error, we might ask, “Was the procedure confusing? Were the valves clearly labeled? Was the operator familiar with this particular task?”

The answers to these and other questions will help determine why the error took place and what the organization can do to prevent recurrence. In the case of the valve error, example recommendations might include revising the procedure or performing procedure validation to ensure references to valves match the valve labels found in the field.

Identifying root causes is the key to preventing similar recurrences. An added benefit of an effective RCA is that, over time, the root causes identified across the population of occurrences can be used to target major opportunities for improvement. If, for example, a significant number of analyses point to procurement inadequacies, then resources can be focused on improvement of this management system. Trending of root causes allows development of systematic improvements and assessment of the impact of corrective programs.

Definition
Although there is substantial debate on the definition of root cause, we use the following:
1. Root causes are specific underlying causes.
2. Root causes are those that can reasonably be identified.
3. Root causes are those management has control to fix.
4. Root causes are those for which effective recommendations for preventing recurrences can be generated.

Root causes are underlying causes. The investigator’s goal should be to identify specific underlying causes. The more specific the investigator can be about why an event occurred, the easier it will be to arrive at recommendations that will prevent recurrence.

Root causes are those that can reasonably be identified. Occurrence investigations must be cost beneficial. It is not practical to keep valuable manpower occupied indefinitely searching for the root causes of occurrences. Structured RCA helps analysts get the most out of the time they have invested in the investigation.

Root causes are those over which management has control. Analysts should avoid using general cause classifications such as operator error, equipment failure or external factor. Such causes are not specific enough to allow management to make effective changes. Management needs to know exactly why a failure occurred before action can be taken to prevent recurrence. We must also identify a root cause that management can influence. Identifying “severe weather” as the root cause of parts not being delivered on time to customers is not appropriate. Severe weather is not controlled by management.

Root causes are those for which effective recommendations can be generated. Recommendations should directly address the root causes identified during the investigation. If the analysts arrive at vague recommendations such as, “Improve adherence to written policies and procedures,” then they probably have not found a basic and specific enough cause and need to expend more effort in the analysis process.

Four Major Steps
The RCA is a four-step process involving the following:
1. Data collection.
2. Causal factor charting.
3. Root cause identification.
4. Recommendation generation and implementation.

Step one: data collection. The first step in the analysis is to gather data. Without complete information and an understanding of the event, the causal factors and root causes associated with the event cannot be identified. The majority of time spent analyzing an event is spent in gathering data.

Step two: causal factor charting. Causal factor charting provides a structure for investigators to organize and analyze the information gathered during the investigation and identify gaps and deficiencies in knowledge as the investigation progresses. The causal factor chart is simply a sequence diagram with logic tests that describes the events leading up to an occurrence, plus the conditions surrounding these events (see Figure 1).

Preparation of the causal factor chart should begin as soon as investigators start to collect information about the occurrence. They begin with a skeleton chart that is modified as more relevant facts are uncovered. The causal factor chart should drive the data collection process by identifying data needs.

Data collection continues until the investigators are satisfied with the thoroughness of the chart (and hence are satisfied with the thoroughness of the investigation). When the entire occurrence has been charted out, the investigators are in a good position to identify the major contributors to the incident, called causal factors. Causal factors are those contributors (human errors and component failures) that, if eliminated, would have either prevented the occurrence or reduced its severity.

In many traditional analyses, the most visible causal factor is given all the attention. Rarely, however, is there just one causal factor; events are usually the result of a combination of contributors. When only one obvious causal factor is addressed, the list of recommendations will likely not be complete. Consequently, the occurrence may repeat itself because the organization did not learn all that it could from the event.

FIGURE 1 Causal Factor Chart (parts one and two)
[Chart not reproduced. It runs from “Mary begins frying chicken” (5:00 pm) to “Kitchen destroyed by fire” and “Other losses from smoke and water damage,” with data sources and open questions attached to the events. Four events are marked CF (causal factor): “Mary leaves the frying chicken unattended,” “Electric burner shorts out,” “Fire extinguisher does not operate when Mary tries to use it” and “Mary throws water on the fire.”]

Step three: root cause identification. After all the causal factors have been identified, the investigators begin root cause identification. This step involves the use of a decision diagram called the Root Cause Map (see Figure 2) to identify the underlying reason or reasons for each causal factor. The map structures the reasoning process of the investigators by helping them answer questions about why particular causal factors exist or occurred. The identification of root causes helps the investigator determine the reasons the event occurred so the problems surrounding the occurrence can be addressed.

Step four: recommendation generation and implementation. The next step is the generation of recommendations. Following identification of the root causes for a particular causal factor, achievable recommendations for preventing its recurrence are then generated.

The root cause analyst is often not responsible for the implementation of recommendations generated by the analysis. However, if the recommendations are not implemented, the effort expended in performing the analysis is wasted. In addition, the events that triggered the analysis should be expected to recur. Organizations need to ensure that recommendations are tracked to completion.

Presentation of Results
Root cause summary tables (see Table 1) can organize the information compiled during data analysis, root cause identification and recommendation generation. Each column represents a major aspect of the RCA process.
• In the first column, a general description of the causal factor is presented along with sufficient background information for the reader to be able to understand the need to address this causal factor.
• The second column shows the Path or Paths through the Root Cause Map associated with the causal factor.
• The third column presents recommendations to address each of the root causes identified.

Use of this three-column format aids the investigator in ensuring root causes and recommendations are developed for each causal factor.

The end result of an RCA investigation is generally an investigation report. The format of the report is usually well defined by the administrative documents governing the particular reporting system, but the completed causal factor chart and causal factor summary tables provide most of the information required by most reporting systems.

Example Problem
The following example is nontechnical, allowing the reader to focus on the analysis process and not the technical aspects of the situation. The following narrative is the account of the event according to Mary:

It was 5 p.m. I was frying chicken. My friend Jane stopped by on her way home from the doctor, and she was very upset. I invited her into the living room so we could talk. After about 10 minutes, the smoke detector near the kitchen came on. I ran into the kitchen and found a fire on the stove. I reached for the fire extinguisher and pulled the plug. Nothing happened. The fire extinguisher was not charged. In desperation, I threw water on the fire. The fire spread throughout the kitchen. I called the fire department, but the kitchen was destroyed. The fire department arrived in time to save the rest of the house.

Data gathering began as soon as possible after the event to prevent loss or alteration of the data. The RCA team toured the area as soon as the fire department declared it safe. Because data from people are the most fragile, Mary, Jane and the firefighters were interviewed immediately after the fire. Photographs were taken to record physical and position data. The analysts then developed the causal factor chart (see Figure 1) to clearly define the sequence of events that led to the fire.

The causal factor chart begins with the event, Mary begins frying chicken at 5 p.m. As the chart develops from
FIGURE 2 Root Cause Map
Start here with each causal factor. 1
Note: Node numbers correspond to matching page in Appendix A of the Root Cause Analysis Handbook.
Legend (levels distinguished by shape in the original figure): primary difficulty source, problem category, root cause category, near root cause, root cause.

Primary difficulty sources
Equipment difficulty 2
Personnel difficulty 3
Other difficulty 4

Problem categories
Equipment design problem 5
Equipment reliability program problem 6
Installation/fabrication 7
Equipment misuse 8
Company employee 9
Contract employee 10
Natural phenomena 11
Sabotage/horseplay 12
External events 13
Other 14

Root cause categories (indented entries are near root causes; bulleted entries are root causes)

Design input/output 15
  Design input LTA 16
  Design output LTA 17

Equipment records 18
  Equipment design records LTA 19
  Equipment operating/maintenance history LTA 20

Equipment reliability program design less than adequate (LTA) 21
  No program 22
  Program LTA 23
    • Analysis/design procedure LTA 24
    • Inappropriate type of maintenance assigned 25
    • Risk acceptance criteria LTA 26
    • Allocation of resources LTA 27

Equipment reliability program implementation LTA 28
  Corrective maintenance LTA 29
    • Troubleshooting/corrective action LTA 30
    • Repair implementation LTA 31
  Preventive maintenance LTA 32
    • Frequency LTA 33
    • Scope LTA 34
    • Activity implementation LTA 35
  Predictive maintenance LTA 36
    • Detection LTA 37
    • Monitoring LTA 38
    • Troubleshooting/corrective action LTA 39
    • Activity implementation LTA 40
  Proactive maintenance LTA 41
    • Event specification LTA 42
    • Monitoring LTA 43
    • Scope LTA 44
    • Activity implementation LTA 45
  Failure finding maintenance LTA 46
    • Frequency LTA 47
    • Scope LTA 48
    • Troubleshooting/corrective action LTA 49
    • Repair implementation 50
  Routine equipment rounds LTA 51
    • Frequency LTA 52
    • Scope LTA 53
    • Activity implementation LTA 54

Administrative/management systems 55
  Standards, policies or administrative controls (SPACs) LTA 57
    • No SPACs 59
    • Not strict enough 60
    • Confusing, contradictory or incomplete 61
    • Technical error 62
    • Responsibility for item/activity not adequately defined 63
    • Planning, scheduling or tracking of work activities LTA 64
    • Rewards/incentives LTA 65
    • Employee screening/hiring LTA 66
  SPACs not used 67
    • Communication of SPACs LTA 69
    • Recently changed 70
    • Enforcement LTA 71
  Safety/hazard/risk review 72
    • Review LTA or not performed 74
    • Recommendations not yet implemented 75
    • Risk acceptance criteria LTA 76
    • Review procedure LTA 77
  Problem identification control 78
    • Problem reporting LTA 80
    • Problem analysis LTA 81
    • Audits LTA 82
    • Corrective action LTA 83
    • Corrective actions not yet implemented 84
  Product/material control 85
    • Handling LTA 87
    • Storage LTA 88
    • Packaging/shipping LTA 89
    • Unauthorized material substitution 90
    • Product acceptance criteria LTA 91
    • Product inspections LTA 92
  Procurement control 93
    • Purchasing specifications LTA 95
    • Control of changes to procurement specifications LTA 96
    • Material acceptance requirements LTA 97
    • Material inspections LTA 98
    • Contractor selection LTA 99
  Document and configuration control 100
    • Change not identified 102
    • Verification of design/field changes LTA (no PSSR*) 103
    • Documentation content not kept up to date 104
    • Control of official documents LTA 105
  Customer interface/services 106
    • Customer requirements not identified 108
    • Customer needs not addressed 109
    • Implementation LTA 110

Procedures 111
  Not used 112
    • Not available or inconvenient to obtain 113
    • Procedure difficult to use 114
    • Use not required but should be 115
    • No procedure for task 116
  Misleading/confusing 117
    • Format confusing or LTA 118
    • More than one action per step 120
    • No checkoff space provided but should be 121
    • Inadequate checklist 122
    • Graphics LTA 123
    • Ambiguous or confusing instructions/requirements 124
    • Data/computations wrong/incomplete 125
    • Insufficient or excessive references 126
    • Identification of revised steps LTA 127
    • Level of detail LTA 128
    • Difficult to identify 129
  Wrong/incomplete 130
    • Typographical error 131
    • Sequence wrong 132
    • Facts wrong/requirements not correct 133
    • Wrong revision or expired procedure revision used 134
    • Inconsistency between requirements 135
    • Incomplete/situation not covered 136
    • Overlap or gaps between procedures 137

Human factors engineering 138
  Workplace layout 140
    • Controls/displays LTA 141
    • Control/display integration/arrangement LTA 143
    • Location of controls/displays LTA 144
    • Conflicting layouts 145
    • Equipment location LTA 146
    • Labeling of equipment or locations LTA 147
  Work environment 148
    • Housekeeping LTA 149
    • Tools LTA 150
    • Protective clothing/equipment LTA 151
    • Ambient conditions LTA 152
    • Other environmental stresses excessive 154
  Workload 155
    • Excessive control action requirements 156
    • Unrealistic monitoring requirements 157
    • Knowledge based decision required 158
    • Excessive calculation or data manipulation required 159
  Intolerant system 160
    • Errors not detectable 161
    • Errors not correctable 162

Training 163
  No training 164
    • Decision not to train 165
    • Training requirements not identified 166
  Training records system LTA 167
    • Training records incorrect 168
    • Training records not up to date 169
  Training LTA 170
    • Job/task analysis LTA 171
    • Program design/objectives LTA 172
    • Lesson content LTA 174
    • On-the-job training LTA 175
    • Qualification testing LTA 176
    • Continuing training LTA 177
    • Training resources LTA 178
    • Abnormal events/emergency training LTA 179

Immediate supervision 180
  Preparation 181
    • No preparation 182
    • Job plan LTA 183
    • Instructions to workers LTA 184
    • Walkthrough LTA 185
    • Scheduling LTA 186
    • Worker selection/assignment LTA 187
  Supervision during work 188
    • Supervision LTA 189
    • Improper performance not corrected 190
    • Teamwork LTA 191

Communications 192
  No communication or not timely 194
    • Method unavailable or LTA 195
    • Communication between work groups LTA 196
    • Communication between shifts and management LTA 197
    • Communication with contractors LTA 198
    • Communication with customers LTA 199
  Misunderstood communication 200
    • Standard terminology not used 201
    • Verification/repeat back not used 202
    • Long message 203
  Wrong instructions 204
  Job turnover LTA 205
    • Communication within shifts LTA 206
    • Communication between shifts LTA 207

Personal performance 208
  Problem detection LTA 209
  *Sensory/perceptual capabilities LTA 210
  *Reasoning capabilities LTA 211
  *Motor/physical capabilities LTA 212
  *Attitude/attention LTA 213
  *Rest/sleep LTA (fatigue) 214
  *Personal/medication problems 215

*Note: These nodes are for descriptive purposes only.
*PSSR = Project scope summary report
© 1995, 1997, 1999, 2000 and 2001, ABSG Consulting Inc.
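A path through the Root Cause Map, such as the ones listed in Table 1, can be read as a walk down a tree from a primary difficulty source to a root cause. The following is a minimal sketch of that idea, not the authors' tool: the labels and node numbers are taken from Figure 2, but the dictionary layout, the parent/child nesting (inferred here from the paths in Table 1) and the function name `follows_map` are assumptions made for illustration.

```python
# Hypothetical fragment of the Root Cause Map as nested dicts.
# Labels and node numbers come from Figure 2; the nesting is an assumption
# inferred from the paths listed in Table 1.
ROOT_CAUSE_MAP = {
    "Equipment difficulty": {                                  # node 2
        "Equipment reliability program problem": {             # node 6
            "Equipment reliability program design LTA": {      # node 21
                "No program": {},                              # node 22
                "Program LTA": {},                             # node 23
            },
        },
    },
}


def follows_map(tree, path):
    """Return True if `path` can be walked from the top of `tree`, one level per label."""
    node = tree
    for label in path:
        if label not in node:
            return False
        node = node[label]
    return True


# Path recorded in Table 1 for causal factor 2 (electric burner element fails).
burner_path = [
    "Equipment difficulty",
    "Equipment reliability program problem",
    "Equipment reliability program design LTA",
    "No program",
]

print(follows_map(ROOT_CAUSE_MAP, burner_path))                              # True
print(follows_map(ROOT_CAUSE_MAP, ["Equipment difficulty", "No program"]))   # False
```

A check like this only confirms that a recorded path is consistent with the map; choosing the right branch at each level still depends on the investigator's answers to the questions the map raises.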
TABLE 1 Root Cause Summary Table
Event description: Kitchen is destroyed by fire and damaged by smoke and water.
Event #: 2003-1

Causal factor # 1
Description: Mary leaves the frying chicken unattended.
Paths Through Root Cause Map
• Personnel difficulty.
• Administrative/management systems.
• Standards, policies or administrative controls (SPACs) less than adequate (LTA).
• No SPACs.
Recommendations
• Implement a policy that hot oil is never left unattended on the stove.
• Determine whether policies should be developed for other types of hazards in the facility to ensure they are not left unattended.
• Modify the risk assessment process or procedure development process to address requirements for personnel attendance during process operations.

Causal factor # 2
Description: Electric burner element fails (shorts out).
Paths Through Root Cause Map
• Equipment difficulty.
• Equipment reliability program problem.
• Equipment reliability program design LTA.
• No program.
Recommendations
• Replace all burners on stove.
• Develop a preventive maintenance strategy to periodically replace the burner elements.
• Consider alternative methods for preparing chicken that may involve fewer hazards, such as baking the chicken or purchasing the finished product from a supplier.

Causal factor # 3
Description: Fire extinguisher does not operate when Mary tries to use it.
Paths Through Root Cause Map (first path)
• Equipment difficulty.
• Equipment reliability program problem.
• Equipment proactive maintenance LTA.
• Activity implementation LTA.
Recommendations
• Refill the fire extinguisher.
• Inspect other fire extinguishers in the facility to ensure they are full.
• Have incident reports describing the use of fire protection equipment routed to maintenance to trigger refilling of the fire extinguishers.
Paths Through Root Cause Map (second path)
• Equipment difficulty.
• Equipment reliability program problem.
• Administrative/management systems.
• Problem identification and control LTA.
Recommendations
• Add this fire extinguisher to the audit list.
• Verify that all fire extinguishers are on the quarterly fire extinguisher audit list.
• Have all maintenance work requests that involve fire protection equipment routed to the safety engineer so the quarterly checklists can be modified as required.

Causal factor # 4
Description: Mary throws water on fire.
Paths Through Root Cause Map
• Personnel difficulty.
• Company employee.
• Training.
• Training LTA.
• Abnormal events/emergency training LTA.
Recommendations
• Provide practical (hands-on) training on the use of fire extinguishers. Classroom training may be insufficient to adequately learn this skill.
• Review the training development process to ensure adequate guidance is provided for determining the proper training setting (for example, classroom, lab, simulator, on the job training, computer based training).
• Review other skill based activities to ensure appropriate level of hands-on training is provided.

Paths Through Root Cause Map is a trademark of ABSG Consulting.
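The three-column layout of Table 1 lends itself to a simple completeness check: every causal factor should carry at least one path through the Root Cause Map and at least one recommendation. The sketch below is one possible way to record rows and flag unfinished ones. The class name `CausalFactorRow`, the function `rows_needing_work` and the unfinished third row are hypothetical and exist only for illustration; the two complete rows abbreviate entries from Table 1.

```python
from dataclasses import dataclass, field


@dataclass
class CausalFactorRow:
    """One row of a root cause summary table (illustrative structure only)."""
    number: int
    description: str                                      # column 1
    paths: list = field(default_factory=list)             # column 2: paths through the map
    recommendations: list = field(default_factory=list)   # column 3


def rows_needing_work(rows):
    """Return the numbers of causal factors that lack a path or a recommendation."""
    return [r.number for r in rows if not r.paths or not r.recommendations]


summary_table = [
    CausalFactorRow(
        number=1,
        description="Mary leaves the frying chicken unattended.",
        paths=[["Personnel difficulty", "Administrative/management systems",
                "SPACs LTA", "No SPACs"]],
        recommendations=[
            "Implement a policy that hot oil is never left unattended on the stove.",
        ],
    ),
    CausalFactorRow(
        number=2,
        description="Electric burner element fails (shorts out).",
        paths=[["Equipment difficulty", "Equipment reliability program problem",
                "Equipment reliability program design LTA", "No program"]],
        recommendations=["Replace all burners on stove."],
    ),
    # Hypothetical draft state: the factor is described but not yet analyzed.
    CausalFactorRow(number=4, description="Mary throws water on fire."),
]

print(rows_needing_work(summary_table))  # [4]
```

Keeping paths as a list of lists allows a causal factor to carry more than one path, as causal factor 3 does in Table 1.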
left to right, the sequences begin to unfold. The loss events (kitchen destroyed by fire and other losses from smoke and water damage) are the shaded rectangles in the causal factor chart.

Although we read the chart from left to right, it is developed from right to left (backwards). Development always starts at the end because that is always a known fact. Logic and time tests are used to build the chart back to the beginning of the event. Numerous questions are usually generated that identify additional necessary data.

After the causal factor chart was complete (additional data were gathered to answer the questions shown in Figure 1), the analysts identified the factors that influenced the course of events. There are four causal factors for this event (see Table 1). Elimination of these causal factors would have either prevented the occurrence or reduced its severity.

Notice that causal factor two may be unexpected. It wasn’t overheating of the oil or splattering of the oil that ignited the fire. The application of the technique identified that the electric burner element failed by shorting out. The short melted Mary’s aluminum pan, releasing the oil onto the hot burner, starting the fire. If the wrong causal factor is identified, the wrong corrective actions will be developed.

The analyst must be willing to probe the data first to determine what happened during the occurrence, second to describe how it happened, and third to understand why.

Note the recommendations in Table 1 are written as if Mary’s house were an industrial facility.

BIBLIOGRAPHY
Accident/Incident Investigation Manual, second edition, DOE/SSDC 76-45/27, Department of Energy.
Events and Causal Factors Charting, DOE/SSDC 76-45/14, Department of Energy, 1985.
Ferry, Ted S., Modern Accident Investigation and Analysis, second edition, John Wiley and Sons, 1988.
Guidelines for Investigating Chemical Process Incidents, Center for Chemical Process Safety, American Institute of Chemical Engineers, 1992.
Occupational Safety and Health Administration Accident Investigation Course, Office of Training and Education, 1993.
Root Cause Analysis Handbook, WSRC-IM-91-3, Department of Energy, 1991 (and earlier versions).
Root Cause Analysis Handbook: A Guide to Effective Investigation, ABSG Consulting Inc., 1999.
User’s Guide for Reactor Incident Root Cause Coding Tree, revision five, DPST-87-209, E.I. duPont de Nemours, Savannah River Laboratory, 1986.

JAMES J. ROONEY is a senior risk and reliability engineer with ABSG Consulting Inc.’s Risk Consulting Division in Knoxville, TN. He earned a master’s degree in nuclear engineering from the University of Tennessee. Rooney is a Fellow of ASQ and an ASQ certified quality auditor, quality auditor-hazard analysis and critical control points, quality engineer, quality improvement associate, quality manager and reliability engineer.

LEE N. VANDEN HEUVEL is a senior risk and reliability engineer with ABSG Consulting Inc.’s Risk Consulting Division in Knoxville, TN. He earned a master’s degree in nuclear engineering from the University of Wisconsin. Vanden Heuvel co-authored the Root Cause Analysis Handbook: A Guide to Effective Incident Investigation, co-developed the RootCause Leader software and was a co-author of the Center for Chemical Process Safety’s Guidelines for Investigating Chemical Process Incidents. He develops and teaches courses on the subject.

Please comment
If you would like to comment on this article, please post your remarks on the Quality Progress Discussion Board at www.asq.org, or e-mail them to editor@asq.org.