Professional Documents
Culture Documents
Equipment Troubleshooting: Equipment Can't Talk, But People Can and Should
Equipment Troubleshooting: Equipment Can't Talk, But People Can and Should
Summary
When the real reasons for equipment problems are known the
professional troubleshooter can recommend and implement
corrective actions that will prevent further similar problems and
allow increased mean time between failures.
SI04003
System Improvements Inc.
13 pages
February 2004
group decide that it is too difficult for them to current problem is the same as the last time
fix and the Maintenance group decide that it is whether or not it in fact is or not.
too minor for them to get involved. The result
is often just replacing the item or addressing Systematic Troubleshooting
the symptoms only.
In order to effectively conduct troubleshooting
The second trap is the “Mr. Machinery.” This and avoid the above-mentioned traps, one
is where a facility may rely on a certain should follow a logical process. A logical
individual that has been at the facility for a process should be able to determine “what”
long time. This is not necessarily adverse, it happened, “why” it happened and then
can be a problem if this individual has not develop effective fixes.
kept up with current technology.
What Happened?
The third trap is “Telephone troubleshooting.” The “what’ happened can effectively be
This occurs when the troubleshooter attempts portrayed by creating a sequence of events
to solve the problem by interviewing people chart. This provides a graphical presentation
over the phone and not taking the opportunity of what happened to those reviewing the
to look at the failed equipment in person. incident. A common technique is to put the
incident or problem in a circle, the actions or
The fourth and final trap is “Familiarity.” This events into boxes and amplifying information
situation can arise if a person becomes very in ovals often called conditions. Let’s take a
familiar with a piece of equipment or look at an incident (Figure 1) that occurred to
machinery and begins to assume that the a cooling pump motor.
Motor and
Pump motor compressor and
catches fire process
are shut down
Motor was
refurbished 4 months Motor inboard and
ago outboard
bearings were greased
6 weeks ago
Motor shaft was struck
bent and subsequently
repaired
This pump has been in operation for over four Also symptoms can be collected by operator
years. Four months ago it was taken out of observations in the control room, such as by
service and refurbished. Six weeks ago it was direct observation: vapor, fume, fluid, leakage,
greased as per the manufacturers smoke. Indirect observations could include:
recommendations. One the day of the incident,
the pump motor began to smoke eventually Changes in indicator readings:
catching fire. After disassembly of the motor • Pressure
we find that the inboard bearing is burned and
melted. The outboard bearing appears to be in • Temperature
good condition. We can put this into a
• Flow
sequence of events chart and then begin to
troubleshoot using a combination of brain- • Position speed
storming and a cause and effect technique.
Changes in performance:
• Pressure ratios
• Temperature ratios
• Power demand
• Product loss efficiencies
Continuing with the bearing example, the bearing found that the shaft clearance was
Equifactor® table for bearings was used. excessive (0.007 negative). The housing
Various causes for overheated bearing are surrounding the bearing outer ring was made
rejected, based on investigation. Finally, too tight, increasing friction heat, thermal
measurements taken after the failure of the expansion, etc. (too tight interference fit).
After the failure mode we would proceed to • Time - Did equipment or a component fail
the failure agents (Figure 8). The failure agent due to reaching its end of expected life?
is the catalyst that allowed the failure mode to Did the equipment or component reach or
occur. The failure modes consist of: Force, exceed the working life of similar
Reactive Environment, Time, Temperature equipment or components?
("FRETT"). One of these will be the primary
often creating secondary and tertiary failure • Temperature - Did equipment or a
agents that are exhibited. It is similar to asking component fail due to excessive
why as in option 1 above, only now we have a temperature, either hot or cold?
more structured and systematic process that
anyone can use and document. After the failure agent we proceed to
determine the failure classification (Figure 9).
• Force - Did a component fail due to This is where we determine whether the issue
application of excessive force? (due to is strictly an "equipment difficulty" or whether
possible misalignment, thermal expansion there may be a "human performance
and contraction, overloading etc.) difficulty" associated with it. Too often, many
troubleshooters’ stop at this point. They are
• Reactive Environment - Did a component missing a valuable opportunity to take their
fail due an environment that is not troubleshooting to another level. This is where
conducive to the material of the equipment we now involve the people talking back part
or component such as brass and seawater? of the investigation.
questions from the front of the Root Cause An example might be where a person was
Tree. Under each of the basic cause performing maintenance and the procedure
categories there are root causes the was not specific enough causing the
troubleshooter should evaluate to determine if maintenance person to have to interpret the
they apply to the equipment issue (See Figure intent of the procedure and subsequently
11). making an incorrect interpretation causing the
equipment to fail or not work properly.
To adequately answer the 15 questions and not hold people accountable for their actions,
then identify root causes from the Root Cause only that we need to look at our systems first
Tree the troubleshooter will need to conduct to ensure that they are providing the people
interviews with the appropriate people. The 15 the tools they need to be successful in their
questions and the root causes on the Root jobs. Once we are confident that our systems
Cause Tree are designed to minimize if not are in order we can then have better
eliminate the “blame” syndrome of many justification in applying the appropriate
companies. This does not mean that we should discipline that is warranted for the situation.
Returning to the bearing example, four months • No records were found of measurements
ago this motor had been taken out of service taken before and after the fix at the repair
for refurbishment. During the refurbishment shop.
process the motor shaft was struck, bent and
repaired. The bearing housing was also • No mention was made of the difficulty
refurbished. The shaft was repaired using a experienced during installation of the
“weld-overlay method.” bearing (due to excessive interference fit).
The problems definitely needed further equipment's and incident's root causes or an
investigation into human errors. This to in-depth audit or observation to proactively
prevent these problems from happening again. improve performance. TapRooT® and
Following the Root Cause Tree in Figure 10, Equifactor® are an efficient, systematic set of
various human performance basic causes were tools that allow people to look at facts
identified where it went wrong. Further objectively, identify the problems and find the
investigation into the root causes showed non- root causes of the problems.
existing procedures, lack of training, lack of
standard and administration controls, and lack For further information contact Andrew
of supervision, see Figure 14. Marquardt of System Improvements, Inc.
Phone: +1 (865)-539-2139.
Conclusions
http://www.taproot.com
The TapRooT® System and Equifactor®
Equipment Troubleshooting processes provide
a methodology to lead an investigator through
the techniques/steps used to perform an in-
depth investigation and troubleshooting of an