
Weibull Analysis of Pump Seal Life

Two pumps operate functionally in parallel. (Most pumps give distress signals before they fail, so the failing pump can be taken off line and the unfailed pump put into service, restoring the system and preventing system failure, assuming perfect switching.) Pump A is considered the primary device until it fails. Pump B is the secondary device that has been standing and waiting until pump A fails; it is then brought into service and runs until it fails. The pumps operate in a 1 out of 2 configuration. Failures of these pumps are paced by seal failures, and when a pump is down for seal replacement, other maintenance activities are performed, such as replacing bearings, housings, etc.

Here the seal ages-to-failure are recorded and used for the Weibull analysis (these are in-service hours; see http://www.barringer1.com/jul07prb.htm for reliability data demands). The age-to-failure data are shown below in rank order:

Pump A           Pump B
(Runs)           (Stands/waits)
14400            2200
19800            3800
21300            4600
23600            5000
24300            6700
28100            7500
29600            7800
29600            8000
34200            11000
38000            12900
Total: 262900    69500

Data was acquired over a very long time period, almost 38 years, and shows a system MTBF = 332400 hours/20 failures = 16620 hours/failure, or 1.9 years/failure, along with a pump A MTBF = 262900/10 = 26290 hrs/failure and a pump B MTBF = 69500/10 = 6950 hrs/failure.

Note the data in the table has been rounded to hundreds of hours; this is a minor problem and not a heart attack.
Seldom do you have a complete set of data, and seldom does it span such a long period of time. Often the data includes suspensions (censored data) and the quantity of data is smaller, so in this case we’re data rich.
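The MTBF figures are plain arithmetic averages of the in-service hours (not Weibull means). A short Python sketch makes each step explicit, using the rounded ages from the data table:

```python
# Seal ages-to-failure (in-service hours) from the data table.
pump_a = [14400, 19800, 21300, 23600, 24300, 28100, 29600, 29600, 34200, 38000]
pump_b = [2200, 3800, 4600, 5000, 6700, 7500, 7800, 8000, 11000, 12900]

HOURS_PER_YEAR = 8760

total_hours = sum(pump_a) + sum(pump_b)
failures = len(pump_a) + len(pump_b)

# Arithmetic MTBFs: total hours divided by number of failures.
mtbf_system = total_hours / failures
mtbf_a = sum(pump_a) / len(pump_a)
mtbf_b = sum(pump_b) / len(pump_b)

print(f"Total hours: {total_hours} ({total_hours / HOURS_PER_YEAR:.1f} years)")
print(f"System MTBF: {mtbf_system:.0f} h ({mtbf_system / HOURS_PER_YEAR:.2f} years)")
print(f"Pump A MTBF: {mtbf_a:.0f} h, Pump B MTBF: {mtbf_b:.0f} h")
```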

All data was erroneously combined (pooled data) into a single column of 20 data points
to make the Weibull plot in Figure 1.

This plot shows a good curve fit, with a PVE (p-value estimate, a goodness-of-fit criterion) of 64.87%. You need a minimum p-value estimate of 10% for a good curve fit. The characteristic life is 18259 hours (with a mean life of 16673 hours based on beta and eta), and the beta value (a shape factor) is greater than 1, which suggests a wear-out failure mode.

Just because you get a good Weibull curve fit DOES NOT mean you have a valid Weibull plot, and Figure 1 has fatal flaws. The flaws of the pooled data stand out in Figure 2. In short, Figure 1 and Figure 2 produce junk information or, if you’re interested, they tell you how the system is responding!
In Figure 2 the data has labels assigned to each data point to show that the data is stratified and is not homogeneous! If the data were homogeneous, we would have expected the A and B points to be scattered randomly up and down the trend line.

How did we label the data in Figure 2? The labeled inputs shown below were all put into a single column in WinSMITH Weibull, and under the magnifying glass icon the Point Symbol Type was toggled to “Point Label” to activate the symbols on the plot.

Pump A           Pump B
(Runs)           (Stands/Waits)
14400_A          2200_B
19800_A          3800_B
21300_A          4600_B
23600_A          5000_B
24300_A          6700_B
28100_A          7500_B
29600_A          7800_B
29600_A          8000_B
34200_A          11000_B
38000_A          12900_B

The stratified data in Figure 2 tells us the data should NOT have been pooled, as the B-data fails at a young age while the A-data fails at a much older age. Of course, that is also obvious when you look at the two columns of data.
The seal life data is correctly plotted in Figure 3. Each pump’s seal results must go into a different column to get two different trend lines.

Note that the ratio of characteristic lives, ηA/ηB = 28995/7928 ≈ 3.7, says that standing and waiting is roughly 4 times more severe service than running.
In Figure 3, the Y-axis shows the cumulative distribution function, which is a statement
of unreliability. The X-axis shows the age-to-failure. In short, Figure 3 tells you what
percent of each population is expected to be dead by a given number of running hours.
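Each trend line in Figure 3 is a two-parameter Weibull cumulative distribution function, F(t) = 1 - exp(-(t/η)^β). A minimal Python sketch, assuming the Figure 3 parameters quoted for the RAPTOR blocks (pump A: β = 3.943, η = 28995 hours; pump B: β = 2.145, η = 7928 hours) and t0 = 0, evaluates the expected percent of each population dead at a few ages. The name `weibull_unreliability` is illustrative only.

```python
import math

# Two-parameter Weibull (t0 = 0) parameters from the Figure 3 fits.
PUMPS = {"A": (3.943, 28995.0), "B": (2.145, 7928.0)}  # name: (beta, eta)

def weibull_unreliability(t, beta, eta):
    """Cumulative fraction failed by age t: F(t) = 1 - exp(-(t/eta)**beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

for hours in (5000, 10000, 20000, 43800):
    row = ", ".join(
        f"{name}: {100 * weibull_unreliability(hours, b, e):5.1f}% failed"
        for name, (b, e) in PUMPS.items()
    )
    print(f"{hours:>6} h -> {row}")
```

At t = η the unreliability is always 1 - e^-1 ≈ 63.2%, which is why η is called the characteristic life.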

Take the data from the separate lines shown in Figure 3 and run a formal test of significance to find whether the trend lines are significantly different, as inferred from the segregated data in Figure 2. Use the likelihood ratio test (see The New Weibull Handbook, 5th edition) in WinSMITH Weibull to aid the decision.

Figure 4 shows the likelihood ratio test results. Lack of overlap of the contours shows
significant differences at 90% confidence.

You cannot pool datasets with significant differences! The triangle symbol inside each set of contour lines in Figure 4 represents the top of the likelihood mountain, and this is the reported beta/eta for each dataset.
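The likelihood ratio test compares the log-likelihood of one Weibull fitted to the pooled data against the sum of the log-likelihoods of separate Weibulls fitted to A and B. Below is a minimal pure-Python sketch, assuming complete (uncensored) data, t0 = 0, maximum-likelihood fits, and the plain large-sample chi-square reference with 2 degrees of freedom (the separate fits use two extra parameters). It is not WinSMITH’s implementation: The New Weibull Handbook and WinSMITH may apply a small-sample correction, and the MLE beta/eta here may differ somewhat from the values quoted in the text. The function names are hypothetical.

```python
import math

PUMP_A = [14400, 19800, 21300, 23600, 24300, 28100, 29600, 29600, 34200, 38000]
PUMP_B = [2200, 3800, 4600, 5000, 6700, 7500, 7800, 8000, 11000, 12900]

def weibull_mle(ages):
    """Two-parameter Weibull MLE for complete data; returns (beta, eta, loglik)."""
    n = len(ages)
    t_max = max(ages)
    u = [t / t_max for t in ages]           # rescale so u**beta cannot overflow
    log_u = [math.log(x) for x in u]
    mean_log_u = sum(log_u) / n

    def score(beta):
        # Profile-likelihood equation for beta; it increases with beta,
        # so its single root can be found by bisection.
        w = [x ** beta for x in u]
        return sum(wi * li for wi, li in zip(w, log_u)) / sum(w) - 1.0 / beta - mean_log_u

    lo, hi = 0.01, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum(x ** beta for x in u) / n) ** (1.0 / beta)
    # At the MLE, sum((t/eta)**beta) == n, which simplifies the log-likelihood.
    sum_log_t = sum(math.log(t) for t in ages)
    loglik = n * math.log(beta) - n * beta * math.log(eta) + (beta - 1.0) * sum_log_t - n
    return beta, eta, loglik

def likelihood_ratio_test(a, b):
    """Return (statistic, p_value) for 'A and B share one Weibull' vs 'separate'."""
    ll_a = weibull_mle(a)[2]
    ll_b = weibull_mle(b)[2]
    ll_pooled = weibull_mle(a + b)[2]
    stat = 2.0 * (ll_a + ll_b - ll_pooled)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for 2 df
    return stat, p_value

stat, p = likelihood_ratio_test(PUMP_A, PUMP_B)
print(f"LR statistic = {stat:.1f}, p = {p:.2g}")
print("Significantly different at 90%" if p < 0.10 else "No significant difference")
```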

Often people want to see the probability density functions (PDFs) of the curves in Figure 3; they are shown in Figure 5.
Figure 5 shows, like the contours of a tally sheet, how many failures are expected to occur at any time. The area under each PDF curve is 1. The Y-axis shows the relative occurrence of failures and the X-axis shows ages-to-failure. The long-life seal has a rather symmetrical curve, but the short-life curve has a long tail to the right.
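A short Python sketch of the Weibull PDF, f(t) = (β/η)(t/η)^(β-1) exp(-(t/η)^β), using the same assumed pump parameters (t0 = 0). It checks numerically, with a trapezoid rule whose step size and upper limit are arbitrary illustrative choices, that each curve’s area is close to 1, and it reports the age at each curve’s peak.

```python
import math

PUMPS = {"A": (3.943, 28995.0), "B": (2.145, 7928.0)}  # name: (beta, eta)

def weibull_pdf(t, beta, eta):
    """Weibull probability density f(t); returns 0 for t <= 0 (beta > 1 here)."""
    if t <= 0.0:
        return 0.0
    z = t / eta
    return (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))

def area_under_pdf(beta, eta, upper=200000.0, step=10.0):
    """Trapezoid-rule integral of f(t) from 0 to `upper` hours."""
    n = int(upper / step)
    total = 0.5 * (weibull_pdf(0.0, beta, eta) + weibull_pdf(upper, beta, eta))
    total += sum(weibull_pdf(i * step, beta, eta) for i in range(1, n))
    return total * step

def mode(beta, eta):
    """Age at the peak of the PDF (valid for beta > 1)."""
    return eta * ((beta - 1.0) / beta) ** (1.0 / beta)

for name, (b, e) in PUMPS.items():
    print(f"Pump {name}: area = {area_under_pdf(b, e):.4f}, peak near {mode(b, e):.0f} h")
```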

So we have the Weibull analysis details; what are we going to do with them?
• We can quantify average times to failure for the existing system (which will be different from those determined by arithmetic).
• We can determine how many repairs we will make in a 5-year (43800-hour) turnaround interval with the existing system (which will be different from the number determined by arithmetic).
• We can determine the strategy for when to switch pumps into/out of service.
• We can build system models for determining the risk of system failure and the reliability of the system under different conditions, and how many replacements we will need to make in a 5-year turnaround interval.
• We can make a financial decision about making repairs on overtime or at regular time. For example, working repairs on overtime restores service in 40 hours (total repair costs = $10,000), while not working overtime restores service in 730 hours, which is one month (total repair costs = $5,000). Given an outage cost of $10,000 per hour of downtime, which course of action should we follow?

Weibull average times to failure for the existing system (which will be slightly different from those determined by arithmetic):
Pump A’s Mean Time = η*Γ(1+1/β) = 28995*0.90568 = 26260 hours/failure
Pump B’s Mean Time = η*Γ(1+1/β) = 7928*0.88562 = 7021 hours/failure

How many repairs to make in a 5-year turnaround interval (43800 hours) with the existing system:
Using the two Weibull curves in a deterministic fashion, pump A runs 26260 hours to failure, pump B then runs 7021 hours to failure, and the repaired pump A covers the remaining 43800 - 26260 - 7021 = 10519 hours, which is 10519/26260 = 0.401 of its mean life. The arithmetic average number of repairs = 1 + 1 + 0.401 = 2.401 failures.
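The mean-life formula and the deterministic repair count can be computed with a few lines of Python, using `math.gamma` for Γ and the Figure 3 parameters quoted for the RAPTOR blocks (pump A: β = 3.943, η = 28995 hours; pump B: β = 2.145, η = 7928 hours). The helper names are illustrative, and the count makes the same simplifying assumptions as the deterministic calculation: each pump lives exactly its mean life, A runs first, and repair time is ignored.

```python
import math

MISSION_HOURS = 5 * 8760  # 43800-hour turnaround interval

def weibull_mean(beta, eta):
    """Weibull mean life: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

mean_a = weibull_mean(3.943, 28995.0)
mean_b = weibull_mean(2.145, 7928.0)

def deterministic_repairs(mean_a, mean_b, mission=MISSION_HOURS):
    """Count failures if A and B alternate (A first) and each lives exactly
    its mean life; the last, partial life adds a fractional failure."""
    repairs, t, lives, i = 0.0, 0.0, (mean_a, mean_b), 0
    while True:
        life = lives[i % 2]
        if t + life > mission:
            return repairs + (mission - t) / life
        t += life
        repairs += 1
        i += 1

print(f"Mean life A = {mean_a:.0f} h, B = {mean_b:.0f} h")
print(f"Deterministic repairs in 5 years = {deterministic_repairs(mean_a, mean_b):.3f}")
```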

What strategy should we follow for when to switch pumps into/out of service?
The route to longer life (and thus fewer maintenance interventions) on this system is to rotate the pumps into and out of service on a regular basis, preventing the deterioration caused by standing and waiting in this ammonia system. You rotate equipment into and out of service primarily to maintain worker competence and secondarily for the equipment. (Think about this: why do you do frequent fire drills?) Successful rotation of equipment into/out of service requires a written procedure that is maintained and followed.

First quartile companies have written procedures for rotating equipment into/out of
service without failure based on a disciplined approach where employees are
carefully drilled in effective operation of equipment. They can tolerate longer periods
of time between swapping equipment—say every 3 to 4 months.

Second quartile companies need a little more drill and perhaps they will swap
equipment every 2 to 3 months.

Third quartile companies may need much more frequent drill for refreshers about
procedures and processes, so they may have a cycle of every 1 to 2 months.

Fourth quartile companies seldom have written procedures. If they have the written
procedures, they often can’t find them. They are lax about carefully following the
written details. Thus, they seldom have functional standby equipment, which means
they often run to system failure with the higher costs.

Often people are concerned about rotating equipment into/out of service because, if you have two old pieces of equipment, both may die at the same time. Remember, equipment dies in a probabilistic manner, not in a deterministic manner. If you truly believe equipment dies in a deterministic manner, then tell me precisely how much life remains in each of your pieces of equipment. Of course you can’t tell me that, as only the Old One knows the answer!

We can build system models for determining the risk of system failure and the reliability of the system under different conditions.
Consider using RAPTOR reliability block diagram modeling software, which is available at no cost for small systems. The RAPTOR model is shown below in Figure 6; it approaches the details in a probabilistic manner (like real life) rather than a deterministic manner (like idealistic life). You can download the models for Figure 6 and Figure 13 by clicking here for RAPTOR version 7.0.

Figure 6
The RAPTOR Reliability Block Diagram

If you double click on each block in the RAPTOR model, it opens to show more details obtained from the life curves in Figure 3, where
Pump A: life data = Weibull, shape factor (beta) = 3.943, scale factor (eta) = 28995 hours, location (t0)* = 0; repair data = Lognormal**, mean = 730 hours (MuAL)**, standard deviation = 2.0 (SigF)** (see Table 1 below). See Figure 7.
Figure 7
Block A Details For Reliability Block Diagram

Pump B: life data = Weibull, shape factor (beta) = 2.145, scale factor (eta) = 7928 hours, location (t0)* = 0; repair data = Lognormal**, mean = 730 hours (MuAL)**, standard deviation = 2.0 (SigF)**. See Figure 8.
Figure 8
Block B Details For Reliability Block Diagram

Set up the standby node so block A starts first (while B is idle), as shown in Figure 9.
Figure 9
Stand By Node For Reliability Block Diagram

The bias always starts block A first

Under the Run icon, choose a simulation mission time = 5 years*8760 hours/year =
43800 hours. Repeat the simulation 1000 times (without graphics, this requires ~3
minutes to complete). If you use graphics, you can see live equipment (green), failed
equipment (red), and standby equipment (blue)—the simulation requires ~7 minutes
with graphics.
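RAPTOR’s internals are not described here, so the following is only a minimal Python Monte Carlo sketch of the same 1-out-of-2 standby logic, under these assumptions: perfect switching; pump A runs first; a pump, once started, runs until it fails (no switch-back); lives are Weibull with the Figure 3 parameters (t0 = 0); each repair is lognormal with median 730 hours and SigF = 2.0 (σ = ln 2), returning the pump good as new; and the system is down only when the running pump fails while the other is still in repair. Because RAPTOR may interpret the repair parameters differently, the numbers will not match Figures 10 to 12 exactly. The function names are hypothetical; `random.weibullvariate` takes (scale, shape) and `random.lognormvariate` takes the mean and standard deviation of the underlying normal.

```python
import math
import random

MISSION_HOURS = 5 * 8760  # 43800 hours
REPAIR_MEDIAN, SIG_F = 730.0, 2.0  # assumed: median repair time and exp(sigma)

def simulate_mission(rng, life, mission=MISSION_HOURS):
    """One mission. `life` maps pump -> (eta, beta).
    Returns (system_failed, downtime_hours, repairs)."""
    t, running = 0.0, "A"
    ready_at = {"A": 0.0, "B": 0.0}  # time each pump is back from repair
    downtime, repairs, failed = 0.0, 0, False
    while t < mission:
        eta, beta = life[running]
        fail_time = t + rng.weibullvariate(eta, beta)  # scale, shape
        if fail_time >= mission:
            break
        repairs += 1
        ready_at[running] = fail_time + rng.lognormvariate(
            math.log(REPAIR_MEDIAN), math.log(SIG_F))
        other = "B" if running == "A" else "A"
        # Start whichever pump is available first (normally the standby).
        nxt = other if ready_at[other] <= ready_at[running] else running
        start = max(fail_time, ready_at[nxt])
        if start > fail_time:  # both pumps down: system outage
            failed = True
            downtime += min(start, mission) - fail_time
        t, running = start, nxt
    return failed, downtime, repairs

def run(life, n=1000, seed=1):
    """Return (reliability, availability, mean repairs) over n missions."""
    rng = random.Random(seed)
    results = [simulate_mission(rng, life) for _ in range(n)]
    reliability = sum(not f for f, _, _ in results) / n
    availability = 1.0 - sum(d for _, d, _ in results) / (n * MISSION_HOURS)
    mean_repairs = sum(r for _, _, r in results) / n
    return reliability, availability, mean_repairs

unequal = {"A": (28995.0, 3.943), "B": (7928.0, 2.145)}  # A != B (Figure 6 model)
equal = {"A": (28995.0, 3.943), "B": (28995.0, 3.943)}   # A = B (Figure 13 model)
for label, life in (("A != B", unequal), ("A = B", equal)):
    r, a, m = run(life)
    print(f"{label}: reliability {r:.3f}, availability {a:.6f}, repairs {m:.2f}")
```

Giving pump B pump A’s life parameters is what turns the A ≠ B model into the A = B model; comparing the two runs shows the effect of avoiding deterioration from standing and waiting.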

You will find the system availability is very high, at 99.9979%, and the reliability is also very high, at 0.996 (this means you have a 0.004 chance of failure on a 5-year mission, as shown in Figure 10).
Figure 10
Block A ≠ B Details For Reliability Block Diagram

Availability = 99.99798% for a 5-year mission
High reliability for a 5-year mission: four failures in 1000 5-year intervals

Figure 11 shows how the time is used. About 96.7% of the time dual equipment is available, while for ~1437 hours of the 43800-hour mission you are operating on solo equipment. On average you lose 0.883 hours each 5-year mission, based on 1000 iterations.

Figure 11
Details From A ≠ B Simulation About The Use Of Time

Lose ~0.00002*43800 hrs ≈ 0.883 hrs in 5 years
0.0328*43800 hrs = 1436.6 hrs solo in 5 years
96.7% of the time you have dual equipment
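The callout arithmetic converts simulated fractions of time into hours over the 43800-hour mission. A tiny Python sketch (the helper name is illustrative) applies it to the solo-equipment fractions from Figures 11 and 15 and to the unavailability implied by the Figure 10 availability:

```python
MISSION_HOURS = 5 * 8760  # 43800 hours

def fraction_to_hours(fraction, mission=MISSION_HOURS):
    """Convert a fraction of mission time into hours."""
    return fraction * mission

solo_unequal = fraction_to_hours(0.0328)   # A != B model, Figure 11
solo_equal = fraction_to_hours(0.01983)    # A = B model, Figure 15
lost = fraction_to_hours(1 - 0.9999798)    # unavailability, Figure 10

print(f"Solo hours, A != B: {solo_unequal:.1f}")
print(f"Solo hours, A = B:  {solo_equal:.1f}")
print(f"Lost hours, A != B: {lost:.2f}")
```

The lost-hours result, about 0.88, agrees with the simulated 0.883 hours to within the rounding of the 99.99798% availability.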
Figure 12 shows that, on average, ~2 maintenance interventions are expected during the 5-year mission, although some simulations required up to 4 maintenance interventions while others required none.

Figure 12
Details From A ≠ B Simulation About Replacements

1.988 maintenance events in 5 years

If we rotate equipment into and out of service, at very little extra cost, to prevent deterioration of idle devices, we can change the model as shown in Figure 13.

Figure 13
Both Blocks Have Equal Performance
The A = B results in Figure 14 show substantial improvement compared to Figure 10, simply by avoiding deterioration from standing and waiting.

Figure 14
Block A = B Details For Reliability Block Diagram

Availability = perfecto for a 5-year mission

No failures in 1000 5-year intervals

The details of how time is spent are shown in Figure 15, which can be contrasted to
Figure 11 where A ≠ B.
Figure 15
Details From A = B Simulation About The Use Of Time

No system lost time in 5 years
0.01983*43800 hrs = 868.6 hrs solo in 5 years
98.0% of the time you have dual equipment

While the system does not incur failures, individual pieces of equipment do require maintenance interventions, as shown in Figure 16, where the failure data on A and B are assumed to be the same. Notice the reduction in maintenance interventions in Figure 16 compared with Figure 12, where A ≠ B.

Figure 16
Details From A = B Simulation About Replacements

1.2 maintenance events in 5 years


We can make a financial decision: repairs on overtime or at regular time.
Figures 13, 14, 15, and 16 show excellent results when repair times average 730 hours and the pumps are maintained in superior condition; thus, no motivation exists for repairing them on overtime.

For the good/bad pump condition of Figures 10 and 11 (A ≠ B), the answer may look different, so use the Monte Carlo simulation risks to compare working repairs on overtime to restore service in 40 hours (total repair costs = $10,000) against not working overtime and restoring service in 730 hours (total repair costs = $5,000), given an outage cost of $10,000 per hour of downtime.

From Figure 11, the system is projected to be down 0.883 hours in 5 years, which calculates to a loss of 0.883 hr*$10,000/hr = $8830 for a 5-year period. Figure 12 says to expect 1.988 maintenance events in 5 years, with an extra cost of $5,000 per incident for overtime work, which calculates to 1.988 incidents*$5,000 = $9940.

Put another way: would you spend $9940 to save at most $8830 in a 5-year period? The answer is NO overtime repairs unless the system is down!
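The overtime decision reduces to comparing two expected 5-year costs. A minimal Python sketch, using the simulated values from Figures 11 and 12 and assuming (as the argument does) that overtime could at best eliminate all system downtime:

```python
OUTAGE_COST_PER_HOUR = 10_000.0
OVERTIME_REPAIR_COST = 10_000.0   # restores service in ~40 hours
REGULAR_REPAIR_COST = 5_000.0     # restores service in ~730 hours

downtime_hours = 0.883            # Figure 11, per 5-year mission
maintenance_events = 1.988        # Figure 12, per 5-year mission

# Upper bound on savings: overtime eliminates all downtime.
max_savings = downtime_hours * OUTAGE_COST_PER_HOUR
# Premium paid if every maintenance event is worked on overtime.
overtime_premium = maintenance_events * (OVERTIME_REPAIR_COST - REGULAR_REPAIR_COST)

print(f"Maximum downtime savings: ${max_savings:,.0f}")
print(f"Overtime premium:         ${overtime_premium:,.0f}")
print("Routine overtime pays" if overtime_premium < max_savings else "No routine overtime repairs")
```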

Once you have the statistics, the answers are rather obvious. Without the statistics, we
have many arguments and oftentimes we take the wrong actions that cost us money.
_________
*Please note:
The 3rd Weibull parameter, t0, is called the location parameter. This parameter has four strict requirements for its use (see The New Weibull Handbook, 5th edition), and all 4 restrictions must be met:
1. You must have a physical reason for use of a location offset. (Simply making a better curve fit to the data is not one of the reasons!!)
2. You must see curvature in the raw age-to-failure data on a Weibull plot.
3. You must have more than 21 failures, and perhaps more than 100 for subtle offsets.
4. You must get a better goodness-of-fit statistic after use of t0.

** Roughly 85% to 95% of life data is adequately represented by a Weibull distribution.

Similarly, 85% to 95% of all repair data is adequately represented by a Lognormal distribution. In a Lognormal distribution the MuAL value is located at 50% probability (it is the median repair time), and the slope of the trend line is set by the shape factor, SigF, which is a measure of how consistently the job can be repaired.

If the job is always finished in exactly the same amount of time, the shape factor would be a perfect 1.0. A well-controlled repair time would demonstrate a shape factor of 2. A less orderly repair time would demonstrate a shape factor of 3. A highly variable repair time would show a shape factor of 4 or, if the organization demonstrates a Keystone Cops disorder, greater than 4!
Table 1 shows some typical amounts of scatter in repair times.

Table 1: Lognormal Repair Data Has Long Tails To The Right

If The Repair Time (MuAL) Is 8 Hours, What’s The Repair Time Scatter?
        50% Completed    80% Completed    90% Completed    98% Completed
SigF    Within (hours)   Within (hours)   Within (hours)   Within (hours)
1.0     8.0 to 8.0       8.0 to 8.0       8.0 to 8.0       8.0 to 8.0
1.5     6.1 to 10.5      4.8 to 13.5      4.1 to 15.6      3.1 to 20.5
2       5.0 to 12.8      3.3 to 19.4      2.6 to 25.0      1.6 to 40.1
3       3.8 to 16.8      2.0 to 32.7      1.3 to 48.8      0.6 to 103
4       3.1 to 20.4      1.4 to 47.3      0.8 to 78.3      0.3 to 201
If The Repair Time (MuAL) Is 730 Hours and SigF Is 2 For The Model
2       457 to 1165      300 to 1775      233 to 2283      146 to 3661
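Reading MuAL as the 50% point (the median) and SigF as exp(σ) of the underlying normal distribution, each central interval in Table 1 runs from MuAL/SigF^z to MuAL*SigF^z, where z is the standard normal quantile for the chosen completion level. A small Python sketch using the standard library’s `statistics.NormalDist` reproduces the 8-hour rows; `repair_interval` is an illustrative name, and the SigF = exp(σ) reading is an assumption consistent with those rows.

```python
from statistics import NormalDist

def repair_interval(mu_al, sig_f, coverage):
    """Central interval containing `coverage` of lognormal repair times.

    Assumes mu_al is the median and sig_f = exp(sigma), so the bounds are
    mu_al / sig_f**z and mu_al * sig_f**z, with z the normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    spread = sig_f ** z
    return mu_al / spread, mu_al * spread

for sig_f in (1.0, 1.5, 2.0, 3.0, 4.0):
    cells = [repair_interval(8.0, sig_f, c) for c in (0.50, 0.80, 0.90, 0.98)]
    print(f"SigF {sig_f}: " + "  ".join(f"{lo:.1f} to {hi:.1f}" for lo, hi in cells))

lo, hi = repair_interval(730.0, 2.0, 0.90)
print(f"730 h median, SigF 2: 90% completed within {lo:.0f} to {hi:.0f} hours")
```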

Comments:
Refer to the caveats on the Problem of the Month Page about the limitations of this solution. Maybe you have a better idea on how to solve the problem. Maybe you’ll find where I've screwed up the solution and can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.


Last revised November 27, 2007


© Barringer & Associates, Inc. 2007
