
Probability Plotting

In last month's Reliability Basics, we examined the reliability function: what it is and how it can
be used. The concept of the lifetime distribution was introduced, as was the probability density
function (pdf), which mathematically defines that distribution. The pdf for a particular distribution
will contain a number of parameters, which can then be used in other functions derived from
the pdf.

In this issue, we will look at how we can begin to determine estimates of the parameters for each
lifetime distribution, based on test data. These estimates can then be used to construct reliability
functions and plots, as well as other life data statistics, such as the MTBF. The simplest and
longest-established method for parameter estimation is probability plotting. This method
involves plotting the failure times on specially constructed plotting paper to assess how well
the data fit a given distribution and, if the fit is adequate, to estimate the distribution's parameters.

Probability Plotting Paper

A probability plot allows the user to plot time-to-failure data on specially constructed plotting
paper, which differs from distribution to distribution. Based on the linearity of the data points on
the plot, the user can judge whether the chosen distribution is appropriate to the data. The user
can also estimate the distribution's parameters from scales on the plot.

A distribution's probability plotting paper is constructed by linearizing the cumulative distribution
function (cdf), or unreliability function, of the distribution. Once this has been done, the scales for
the x- and y-axis of the distribution's plotting paper can be constructed and the plotting can
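commence. As an example, we will use the well-known Weibull distribution. The cdf, or
unreliability function, of the two-parameter Weibull distribution is given by:

$$Q(T) = 1 - e^{-\left(\frac{T}{\eta}\right)^{\beta}}$$

where β and η are parameters. We now need to linearize this function into the form y = mx + b:

$$\ln\left(1 - Q(T)\right) = -\left(\frac{T}{\eta}\right)^{\beta}$$

$$\ln\left(-\ln\left(1 - Q(T)\right)\right) = \beta \ln\left(\frac{T}{\eta}\right)$$

$$\ln\left(\ln\frac{1}{1 - Q(T)}\right) = \beta \ln(T) - \beta \ln(\eta)$$

If we now set:

$$y = \ln\left(\ln\frac{1}{1 - Q(T)}\right)$$

and:

$$x = \ln(T)$$

the cdf equation can now be rewritten as:

$$y = \beta x - \beta \ln(\eta)$$

This is now a linear equation, with a slope of β and an intercept of -β ln(η). Now the x- and
y-axes of the Weibull probability plotting paper can be constructed. The x-axis is simply
logarithmic, since x = ln(T). The y-axis is slightly more complicated, since it must represent:

$$y = \ln\left(\ln\frac{1}{1 - Q(T)}\right)$$

where Q(T) is the unreliability. In a similar fashion, the cdfs for other lifetime distributions can
be linearized to construct the probability plotting paper. The final result for Weibull probability
plotting paper looks like the following:

[Figure: Weibull probability plotting paper]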

Note that since the mathematical expression for the cdf differs from distribution to distribution,
the structure of the plotting paper will differ from distribution to distribution as well. These
different types of plotting papers can be obtained through engineering supply stores or, more
commonly, generated with various software packages.
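
As a rough illustration of how software might construct the Weibull plotting scales, the sketch below applies the transforms x = ln(T) and y = ln(ln(1/(1 - Q(T)))) to points taken from a Weibull cdf and checks that they fall on a straight line. The function names and the parameter values (a shape of 2 and a scale of 100) are assumptions made only for this example; they do not come from the article's data.

```python
import math

def weibull_cdf(t, beta, eta):
    """Unreliability Q(T) of the two-parameter Weibull distribution."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def x_scale(t):
    """Horizontal plotting position for a time t > 0."""
    return math.log(t)

def y_scale(q):
    """Vertical plotting position for an unreliability 0 < q < 1."""
    return math.log(math.log(1.0 / (1.0 - q)))

# Hypothetical parameter values, chosen only to exercise the transform.
beta, eta = 2.0, 100.0
times = [10.0, 50.0, 100.0, 200.0]
points = [(x_scale(t), y_scale(weibull_cdf(t, beta, eta))) for t in times]

# Slope between consecutive points; on Weibull paper each should equal beta.
slopes = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]
print("slopes between consecutive points:", slopes)

# Positions of a few y-axis tick labels, given in percent unreliability.
ticks = {pct: y_scale(pct / 100.0) for pct in (1, 10, 50, 63.2, 90, 99)}
for pct, y in ticks.items():
    print(f"{pct:>5}% unreliability is drawn at y = {y:+.3f}")
```

The tick positions show why the y-axis of Weibull paper is unevenly spaced: equal steps in percent unreliability do not map to equal steps in y.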

Plotting Failure Points

With the probability plotting paper obtained, we can now begin to think about plotting our failure
data. For the sake of simplicity, we will only address complete data, that is, data from a life test
where all of the test units were tested to failure and their failure times were recorded. We will
use a simple set of failure times from a test group of six units that failed at 10, 20, 30, 40, 50, and
80 hours. We will assume that these failure times follow a two-parameter Weibull distribution
and we will use Weibull probability plotting paper to perform our analysis.

The question now arises of how to plot our failure times on the plotting paper. We can see that
the x-axis values will correspond to our failure times, since x = ln(T). Each x-axis value is
simply the natural logarithm of each time-to-failure. What about the corresponding y-coordinate
values to go with our x-coordinate failure times? Taking another look at the y-axis equation:

$$y = \ln\left(\ln\frac{1}{1 - Q(T)}\right)$$

we see that the y-coordinate is based on Q(T), or the unreliability. This means that we need to
come up with unreliability estimates for each of our failure times in order to plot the data on a
two-dimensional plot. These unreliability estimates are obtained using what are
called median ranks.

Median ranks are based on a solution for the cumulative binomial distribution, based on sample
size and failure number. The median ranks represent the 50% confidence level ("best guess")
estimate for the true unreliability for a failure, based on the total number of failures and the order
number (first, second, etc.) of the failure in question. There is also an approximation that can be
used to estimate median ranks, called Benard's approximation. It has the form:

$$\mathrm{MR} \approx \frac{j - 0.3}{N + 0.4}$$

where N is the total number of failures and j is the failure order number. We will not delve any
further into the derivation of the median ranks, other than to say that tables of median ranks can
be found in many statistics and life data texts.
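
For readers without a table at hand, the following sketch computes median ranks both ways: from Benard's approximation, and by numerically solving the cumulative binomial equation for the unreliability at which the chance of seeing at least j failures out of N is 50%. The bisection solver and the function names are assumptions made for this illustration, not a method prescribed by the article.

```python
import math

def benard(j, n):
    """Benard's approximation to the median rank of failure j out of n."""
    return (j - 0.3) / (n + 0.4)

def binomial_tail(p, j, n):
    """Probability that at least j of n units have failed, each with probability p."""
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(j, n + 1)
    )

def median_rank(j, n, tol=1e-12):
    """Solve binomial_tail(p, j, n) = 0.5 for p by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The tail probability increases with p, so move toward the 0.5 crossing.
        if binomial_tail(mid, j, n) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

n = 6  # sample size used in the article's example
for j in range(1, n + 1):
    print(f"j={j}: solved={median_rank(j, n):.4f}  Benard={benard(j, n):.4f}")
```

For a sample of six, the two columns should agree to within a fraction of a percentage point, which is why the approximation is usually adequate for plotting by hand.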

Based on Benard's approximation, we can now calculate unreliability estimates for each of our
failure times. These are shown in the following table:

Failure Time (hours)    Unreliability Estimate
10                      10.9%
20                      26.6%
30                      42.2%
40                      57.8%
50                      73.4%
80                      89.1%

Now that we have y-coordinate values to go with the x-coordinate failure times, we can plot
our failure data on a Weibull probability plot:

[Figure: the six failure points plotted on Weibull probability paper]
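
The positions of the plotted points can also be computed directly. The sketch below, which assumes the rounded unreliability estimates from the table and uses a hypothetical helper name, converts each (failure time, unreliability) pair into the linear coordinates that the Weibull paper represents.

```python
import math

failure_times = [10, 20, 30, 40, 50, 80]                     # hours
unreliability = [0.109, 0.266, 0.422, 0.578, 0.734, 0.891]   # rounded table values

def plot_coordinates(times, q_values):
    """Map (T, Q) pairs to x = ln(T), y = ln(ln(1 / (1 - Q)))."""
    coords = []
    for t, q in zip(times, q_values):
        x = math.log(t)
        y = math.log(math.log(1.0 / (1.0 - q)))
        coords.append((x, y))
    return coords

coords = plot_coordinates(failure_times, unreliability)
for t, (x, y) in zip(failure_times, coords):
    print(f"T = {t:>2} h  ->  x = {x:.3f}, y = {y:+.3f}")
```

Plotting these (x, y) pairs on ordinary linear graph paper gives the same picture as plotting the original values on Weibull paper.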

The failure times plotted on Weibull probability paper fall in a fairly linear fashion, indicating
that our choice of the two-parameter Weibull distribution was reasonable. If the points did not seem to
follow a straight line, we might want to consider using another lifetime distribution to analyze
the data. We can now draw a best-fit line through the points.

This line represents the model of the unreliability, as expressed by the linearized unreliability
function, or cdf.
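
The article draws the best-fit line by eye. As one possible numerical stand-in, the sketch below fits the line by ordinary least squares, regressing y on x; that choice of fitting method, along with the function names, is an assumption of this example, and a line drawn by hand need not match it exactly. The correlation coefficient is included as a rough measure of how linear the points are.

```python
import math

failure_times = [10, 20, 30, 40, 50, 80]  # hours, complete data

def benard(j, n):
    """Benard's approximation to the median rank of failure j out of n."""
    return (j - 0.3) / (n + 0.4)

def fit_weibull_line(times):
    """Least-squares line through the Weibull plot points (assumed method)."""
    times = sorted(times)
    n = len(times)
    xs = [math.log(t) for t in times]
    ys = [
        math.log(math.log(1.0 / (1.0 - benard(j, n))))
        for j in range(1, n + 1)
    ]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient of the points
    return slope, intercept, r

slope, intercept, r = fit_weibull_line(failure_times)
beta_hat = slope                         # slope of the line is beta
eta_hat = math.exp(-intercept / slope)   # intercept is -beta * ln(eta)
print(f"beta ~ {beta_hat:.2f}, eta ~ {eta_hat:.1f} hours, r = {r:.4f}")
```

Regressing x on y instead would give a slightly different line; which direction to use is a modeling choice that this sketch does not settle.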

Determining Parameter Estimates

Of course, we must do more with our data than to arrange them in a nice line on a complicated
piece of paper. We need to get useful reliability information from the probability plot. To that
end, we must be able to estimate the values of the Weibull distribution parameters, β and η.
With these estimates, we can determine the reliability function, the mean life function, and all of
the other reliability-related functions that can be derived from the pdf.

Determination of β, or the Weibull slope, is relatively easy. As we saw when we were
discussing the linearization of the two-parameter Weibull cdf, the slope of the linear equation is
simply β. In other words, the slope of the best-fit line on the Weibull probability plot is equal
to the Weibull slope (or shape parameter), β. Many types of Weibull plotting paper have scales
that allow one to read the slope of the line directly, rather than having to calculate it based on
"rise over run."

By drawing a line parallel to the best-fit model line through the slope scale, we can see that the
estimate of β for this data set is approximately 1.4.

Mathematical manipulation of the Weibull cdf, or unreliability, equation will be required to
determine the estimate of η, the Weibull scale parameter. The two-parameter Weibull
unreliability function is given by:

$$Q(T) = 1 - e^{-\left(\frac{T}{\eta}\right)^{\beta}}$$

We want to be able to read the value of η from the x-axis time scale, which can be expressed
mathematically as T = η. Substituting T = η into the Weibull unreliability function, we
get:

$$Q(\eta) = 1 - e^{-\left(\frac{\eta}{\eta}\right)^{\beta}} = 1 - e^{-1} \approx 0.632 = 63.2\%$$

Hence, η is where our best-fit unreliability model line intersects a horizontal line extended
from the 63.2% level of the unreliability, or y-axis, scale:

[Figure: best-fit line on the Weibull probability plot, with η read at the 63.2% unreliability level]

As the graphic shows, the best-fit model line intersects the 63.2% unreliability line at
approximately 44 hours. Therefore, the estimate of η for our data is 44 hours.
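
With both estimates in hand, the reliability-related functions mentioned earlier can be evaluated. The sketch below plugs the graphical estimates (a shape of 1.4 and a scale of 44 hours) into the two-parameter Weibull unreliability and reliability functions, and into the standard Weibull mean life expression, η Γ(1 + 1/β). The function names and the 20-hour evaluation point are assumptions chosen for illustration.

```python
import math

BETA = 1.4   # shape estimate read from the probability plot
ETA = 44.0   # scale estimate in hours, read from the probability plot

def unreliability(t, beta=BETA, eta=ETA):
    """Q(T) = 1 - exp(-(T / eta) ** beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def reliability(t, beta=BETA, eta=ETA):
    """R(T) = 1 - Q(T) = exp(-(T / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))

def mean_life(beta=BETA, eta=ETA):
    """Mean of the two-parameter Weibull distribution: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

t = 20.0  # hypothetical mission time in hours
print(f"R({t:.0f} h) = {reliability(t):.3f}")
print(f"Q(eta)    = {unreliability(ETA):.3f}")
print(f"mean life = {mean_life():.1f} hours")
```

Evaluating the unreliability at T = η is a quick consistency check, since it should return roughly 63.2% regardless of the shape parameter.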

This illustrates the basics of probability plotting for complete data using a two-parameter
Weibull example. The methodology can be more involved for other types of analysis. For
example, if the data set contained suspensions, we would have to account for them.
This is done by modifying the median rank values for the failure times, although that
particular methodology is beyond the scope of this article.
