Professional Documents
Culture Documents
The mechanical seal on pump CP4826 has been a constant headache for the
maintenance department. The seal has failed three times in the last 12 months and
had just been replaced 30 days ago. This chronic seal failure has prompted the
Maintenance Manager, Jim Phillips to proclaim “those darn Jack Crane seals are no
good ... I’ve always had better luck with Chesterfield seals!” The seal fluid, as
recommended by the manufacturer, is a 50/50 mixture of propylene glycol and water.
Incident Description
On Tuesday morning, at 5:15 am, Kevin Walters, the WWTP
Operator, was just completing his hourly rounds and had
returned to the WWTP control room. Kevin noticed a critical
alarm on the alarm management screen of his DCS monitor. A
high pressure indication was being registered by pump CP4826
within the tandem seal buffer cavity. Kevin went out to
physically inspect the pump and when he arrived at the pump
location he described the scene as such “there was seal fluid
everywhere, and the pump was making a high pitch squeal like
metal grinding on metal!” Kevin immediately shut down the
pump and alerted his supervisor.
Kevin mentioned that he thought the pump was making more noise and
vibrating more than usual on his hourly rounds but it was difficult to tell
because the pump room is so noisy. He also mentioned that he thought he
noticed that the pump housing was warm to the touch but he did not have a
thermometer to measure it.
The failed mechanical seal was sent to Jack Crane Industries who performed
an autopsy. They responded in their failure analysis report that the seal had
an unusual oval wear pattern indicative of eccentric loading on the seal. The
report did not elaborate further on possible causes of eccentric loading.
The failed bearing was sent to SKG Bearings for failure analysis. They found
debris within the bearing from the bearing material itself. The bearing
material was sent to their metallurgical laboratory which concluded that the
bearing material had failed due to fatigue. Fatigue failure could be caused by a
combination of high temperature and high vibration brought on by
misalignment of drive and pump shafts.
Date/Time Description
January 12 12:00 AM Round Check OK
January 12 1:00 AM Round Check OK
January 12 2:00 AM Plant-wide power outage for approx. 30 seconds.
January 12 3:00 AM Round Check OK Recovery from power outage OK.
January 12 4:00 AM Round Check OK CP4826 running rough (noisy and warm to touch).
January 12 5:00 AM Round Check OK
January 12 5:15 AM Low pressure alarm for CP4826 seal on DCS panel.
January 12 5:17 AM K. Walters inspected CP4826 condition.
January 12 5:18 AM K. Walters observed high pitch squeal and seal fluid leak around CP4826.
January 12 5:20 AM CP4826 manually shut down at local panel.
January 12 5:25 AM CP4826 locked and tagged-out
January 12 5:35 AM Seal fluid spill contained and cleaned-up
January 12 8:00 AM Root Cause Investigation Team formed.
Don't
Yes No Know # Root Cause Analysis Question Comments Root Cause Implication PM Strategy Implication
X 1 Was the failure preceded by a process interruption? Not really - but there was an electrical outage at Investigate details of process Modify startup procedure &
2:00 am interruption checklist
X 2 Was the failure preceded by a process spike (eg Investigate details of process spike Consider additional process controls
temperature, pressure, flow rate, concentration, etc)? and/or alarms
X 3 Did the component fail during the time frame of another Investigate which failure has
component failure? occurred first
X 4 Was the component operated outside of process Investigate reason for non-standard
specifications? operation
X 5 Was there a change in utilities (air, steam, water, electricity, Investigate details of utility change Consider additional utility controls
etc) prior to component failure? and/or alarms
X 6 Was there a dramatic change in the ambient environment Yes - temperature in the pump house fell below Investigate details of ambient Consider insulating component from
(eg temp fell below freezing, thunderstorm, high humidity, freezing during the night. environment change ambient environment
high temp) prior to component failure?
X 7 Is the component brand new? Infant mortality, warranty claim
X 8 Has this same component failed in the last 6 months? Yes - CP4826 pump seal has failed 3 times in the Component under-specified, start-up,
last 12 months. shut-down and/or operation
procedure inappropriate
X 9 Was the component recently rebuilt? Seal was replaced last year on December 14. Rebuild procedure, install procedure Depends on autopsy
X 10 Did the component fail suddenly? Seal should last longer than 30 days. Fatigue failure, contamination, Depends on autopsy
thermal overload
X 11 Did the component performance degrade over time? Dirt accumulation, component Depends on autopsy
deterioration, lubrication failure
X 12 Is the component performance requirement at or above its Component under-specified
design limit?
X 13 Did the component exhibit any external signs prior to failure Pump seemed to be more noisy and vibrating Use external sign to focus root cause
(eg vibration, temperature build-up, leaking, noise, odor, more than usual during hourly rounds. detection during autopsy
etc)?
X 14 Is there evidence of external damage to component? Component may have been Consider installing protective
accidentally damaged shielding
Don't
Yes No Know # Root Cause Analysis Question Comments Root Cause Implication PM Strategy Implication
X 15 Does autopsy indicate component corrosion as failure Material selection inappropriate for Increase frequency of wall thickness
mode? process fluid & corrosion check
X 16 Does autopsy indicate incorrect assembly or missing parts? Warranty claim or inadequate
rebuild/install procedure
X 17 Does autopsy indicate excessive wear for the service time of Seal had oval wear pattern. Bearing was totally Material selection inappropriate, Consider adding periodic lubricant
the component? seized. lubrication insufficient analysis
X 18 Does autopsy indicate external particulate contamination? Bearing has debris inside from the bearing Component requires particulate Consider installing protective
material itself. contamination protection shielding and/or filtration
X 19 Does autopsy indicate presence of foreign liquid? Identify source of foreign liquid and
eliminate
X 20 Does autopsy indicate seal failure? Primary seal has failed. Seal material/design, seal fluid Add seal fluid check to Operator
system Round Sheets
X 21 Does autopsy indicate electrical system failure? Identify failed component and Consider adding electric power
potential sources of failure supply controls/conditioning
X 22 Does autopsy indicate loose electrical connections or Connector and/or shielding design
shielding failure? and thickness
X 23 Does autopsy indicate wear parts have fallen below their Unable to tell because bearing and seal have Material selection inappropriate or
acceptable tolerance? been badly torn up. component under-specified
X 24 Does autopsy indicate damaged internal parts? Main pump bearing has seized causing scoring Identify source of internal damage
to the drive shaft. Metallurgical report from
SKG indicates that bearing has fatigue failure
which could be caused by high temperature
and high vibration conditions.
X 25 Does autopsy indicate jammed or slow-moving internal Bearing has debris inside which could have Identify source of internal friction or
parts? jammed balls. jam
X 26 Does autopsy indicate nothing wrong? Intermittent failure - conduct bench
top stress tests
X 27 When was last time drive motor or pump were replaced on Drive motor was replaced last year in December
CP4826? because it had burnt out.
X 28 Were any alignment tools used during motor installation to There are no tools available. There is no
ensure that the drive shaft and pump shaft were in procedure for drive/pump shaft alignment.
alignment?
2/18/2017 Ronald Morgan Shewchuk 8
Operational Excellence
Reliability Centered Maintenance
Case Study XVIII: Root Cause Analysis of Centrifugal Pump Failure
The RCA team next prepared the Reality Chart, a horizontal tree diagram linking effects to action causes and condition causes as
shown in Figure 11.15. The term reality chart is used to indicate that all action causes and condition causes are supported by
evidence such as Operator observations, DCS alarm records, data historians, laboratory analyses, failure analysis autopsies, etc.
Nothing on this chart can be conjecture; otherwise it does not reflect reality.
The team started from the right side of the Reality Chart working toward the left, proposing solutions to eliminate condition and
action causes. These proposed solutions are captured in Figure 11.16. The RCA team utilized a PICK Chart to prioritize solution
alternatives, shown in Figure 11.17. It was decided to purchase a laser alignment kit, write an alignment Standard Operating
Procedure (SOP) and train the relevant maintenance personnel on the alignment SOP. The Operations Group did not want to be held
waiting for an outside contractor to perform the work. This equipment purchase and training was viewed as a necessary skill set
development for the plant to increase reliability.
Incident Report
The Incident Report should concisely communicate the conclusions and recommendations of the Root Cause Analysis team. Required
elements of the incident report include date investigation started, problem definition, summary statement of causes, solutions, action
items and associated causes, responsible person and completion date, incident time line, laboratory/failure analysis reports, reality
chart, cost information, contact name and investigation team members, and report date. Thus far, we have utilized the software tools
of the Microsoft Office Suite to generate the graphics for root cause analysis. If we chose to use the wizard function of RealityCharting
software as shown in the screen shot of Figure 11.18, an incident report would be automatically generated provided that we have
properly completed each of the RealityCharting steps. The Apollo RCA Incident Report is captured in Figure 11.19.
Solutions 1, 2 & 3
or 4 are best alternatives
I. Problem Definition
What: CP4826 Centrifugal Pump Failure
When: Jan. 12 5:15 am
Where: WWTP Pump House
Significance: Risk of solids settling in digestion tank T3759. Risk of aerobic bacteria depletion in tank.
Safety: Slip hazard due to propylene glycol/water seal fluid spill on floor of WWTP
Environmental: Risk of shutting down WWTP
Revenue: Shutting down WWTP will result in production shut down.
Cost: Replacement pump, mechanical seal, drive shaft and installation labor $13,750
Frequency: 4 times in last 12 months
III. Solutions
Causes Solutions Owner Due Date
No Alignment Procedure Write alignment SOP Jim Phillips Feb. 5
No Alignment Training Train maintenance Jim Phillips Feb. 10
personnel on alignment SOP
No Alignment Tools Purchase laser alignment kit Jim Phillips Jan. 29
V. Notes
1. The Realitychart and Incident Report have been finalized.
2/18/2017 Ronald Morgan Shewchuk 14
Operational Excellence
Reliability Centered Maintenance
Reliability Analysis
• We have been introduced to the reliability metric of failure rate through the
Weibull distribution hazard function of Equation 11.1.
• The practical application of this function in a plant environment is to calculate
failure rate by Equation 11.2.
r
λ= Eqn 11.2
T
• Let’s say we have eight electric motors which have been in service for three years.
During this period four of the motors have failed. The failure rate is calculated as
follows.
4 failures
λ=
8 motors x 3 years
λ = 0.167 failures/yr
• Continuing our example from above, the mean time between failures is calculated
as follows.
8 motors x 3 years
θ=
4 failures
θ = 6 years
The facility start-up was difficult with many premature failures, exacerbated by an accelerated time line to open the warehouse.
After the first couple months, however, the number of air handling units failing in a given month has stabilized at the baseline level
shown in Figure 11.20.
Figure 11.20 Air Handling Unit Failure Record
Lately, the number of failures per month has been increasing and Ricky is concerned that the air handling units are headed for a
repeat performance of their start-up days. He decides to conduct a reliability analysis on the air handling unit population as shown
in Figure 11.21.
Open the Excel worksheet with the air handling unit failure records. Add a column for cumulative number of failures, failure rate as
calculated by Eqn 11.2 and mean time between failures as calculated by Eqn 11.3.
Open a new worksheet in Minitab. Copy and paste the air handling unit failure data into the worksheet.
Insert a column at the left hand side of the spreadsheet and label it as Start_Month. Rename the column Time in Service to End_Month.
Select the Scatterplot With Connect Line icon in the dialogue box. Click OK.
Select Failure Rate for the first Y variable and End_Month for the first X variable. Select MTBF for the second Y variable and End_Month for
the second X variable in the dialogue box. Click Multiple Graphs.
2/18/2017 Ronald Morgan Shewchuk 23
Operational Excellence
Reliability Centered Maintenance
Figure 11.21 Steps for Reliability Analysis
Click the radio toggle button to show pairs of graph variables in separate panels of the same graph in the dialogue box. Check the box to
select the same X scale for the graphs. Click OK. Then click OK one more time.
2/18/2017 Ronald Morgan Shewchuk 24
Operational Excellence
Reliability Centered Maintenance
Figure 11.21 Steps for Reliability Analysis
0 10 20 30 40
Failure Rate (months-1) MTBF (months)
0.14
50
0.12
40
0.10
0.08
30
0.06
20
0.04
10
0.02
0 10 20 30 40
End_Month
Scatter plots are generated for failure rate and mean time between failures versus time in service.
2/18/2017 Ronald Morgan Shewchuk 25
Operational Excellence
Reliability Centered Maintenance
Figure 11.21 Steps for Reliability Analysis
Next, we will determine which distribution best fits the data set. Return to the active worksheet. Click Stat Reliability/Survival
Distribution Analysis (Arbitrary Censoring) Distribution ID Plot on the top menu.
2/18/2017 Ronald Morgan Shewchuk 26
Operational Excellence
Reliability Centered Maintenance
Figure 11.21 Steps for Reliability Analysis
Select Start_Month for start variable, End_Month for end variable, and Number of Failures for frequency column in the dialogue box.
Click the radio toggle button to use all distributions. Click OK.
2/18/2017 Ronald Morgan Shewchuk 27
Operational Excellence
Reliability Centered Maintenance
Figure 11.21 Steps for Reliability Analysis
Probability Plot for Start_Month
LSXY Estimates-Arbitrary Censoring
Correlation Coefficient
Weibull Lognormal
Weibull
99 0.936
90 Lognormal
90 0.885
50 Exponential
*
Percent
Percent
50 Loglogistic
0.875
10
10
1 1
0.1 1 10 100 1 10 100
Start_Month Start_Month
Exponential Loglogistic
99
90
90
50
Percent
Percent
50
10
10
1 1
0.1 1 10 100 1 10 100 1000
Start_Month Start_Month
Percent
50 3-Parameter Loglogistic
0.965
10
10
1 1
270 300 330 360 950 975 1000 1025
Start_Month - Threshold Start_Month - Threshold
90
50
Percent
Percent
50
10
10
1 1
0.1 1 10 100 950 1000 1050
Start_Month - Threshold Start_Month - Threshold
Percent
50
10
10
1 1
-50 0 50 0 30 60
Start_Month Start_Month
Logistic
99
90
Percent
50
10
1
-30 0 30 60
Start_Month
The 3-Parameter Weibull distribution and the Smallest Extreme Value distribution have the highest Pearson Correlation
Coefficient implying the best fit to the data set. For the sake of example, we will focus on the Weibull distribution.
2/18/2017 Ronald Morgan Shewchuk 30
Operational Excellence
Reliability Centered Maintenance
Figure 11.21 Steps for Reliability Analysis
Return to the active worksheet. Click Stat Reliability/Survival Distribution Analysis (Arbitrary Censoring) Distribution
Overview Plot on the top menu.
2/18/2017 Ronald Morgan Shewchuk 31
Operational Excellence
Reliability Centered Maintenance
Figure 11.21 Steps for Reliability Analysis
Select Start_Month for start variable, End_Month for end variable, and Number of Failures for frequency column in the dialogue
box. Click the radio toggle button to perform parametric analysis using the Weibull distribution. Click OK.
2/18/2017 Ronald Morgan Shewchuk 32
Operational Excellence
Reliability Centered Maintenance
Figure 11.21 Steps for Reliability Analysis
Distribution Overview Plot for Start_Month
LSXY Estimates-Arbitrary Censoring
Table of Statistics
Probability Density Function Weibull
Shape 0.828398
Scale 21.0200
90
0.075 Mean 23.2506
StDev 28.2308
50 Median 13.5047
Percent
0.050 IQR 26.5083
PDF
AD* 3.174
10 Correlation 0.936
0.025
0.000
1
0 50 100 150 0.1 1 10 100
Start_Month Start_Month
Rate
0.06
50
0.04
0
0 50 100 150 0 50 100 150
Start_Month Start_Month