

Appendix A
Statistics

Random Variables
Many processes have outcomes that cannot be predicted given our current
level of knowledge about the process. In such situations statistical analysis
can be used to gain knowledge about a process from a set of data. Data is
gathered by recording a specific random variable. Statistical analysis
provides specific information about that random variable. In reliability
engineering, the primary random variable is “time to failure,” the
successful operating time interval until a failure occurs.

Statistical analysis is quite useful because data, when gathered, is often
hard to understand. Consider the set of data shown in Table A-1. This set of
data is a record of failure times for thirty systems. Assume that all thirty
systems are installed, commissioned, and operating successfully. The units
are checked every hour and the total number of hours of successful
operating time is incremented. When a particular system fails, its
successful operating time is no longer incremented. For example, in this
data set, system one failed after 96 hours. System two failed after 3091
hours. System thirty failed after a successful operating time interval of 409
hours. A set of data exists, but the useful information often hides inside
the data.

One can study the data and gain insight regarding when a system might
fail. It is notable that system 12 failed after only 33 hours; it failed first
and had the shortest successful operating time interval. Several systems
had much longer successful operating times. System 17 had the longest
successful operating time, running for 13,990 hours. Still, it is hard to gain
an in-depth understanding from just looking at the raw data. Fortunately,
one can apply some statistical analysis to gain further insight.



Table A-1. Time To Failure Data Set


System    Hours      System    Hours
   1         96        16       1282
   2       3091        17      13990
   3       4862        18      12751
   4      13853        19       2106
   5       8339        20       5431
   6        614        21       2740
   7       1815        22      11460
   8      10305        23       6056
   9       7499        24       3471
  10       1540        25       2414
  11        831        26       4348
  12         33        27       3886
  13        240        28       9270
  14        196        29      13351
  15       1045        30        409

Statistical Analysis
Often, statistical analysis is done by grouping data into “bins.” For
example, the failure data may track the number of units failed in
one-thousand-hour increments (Table A-2). In many cases the data is gathered
this way because operation is checked only periodically. When successful
operation is checked only every one thousand hours, the data set would
naturally look like Table A-2.

An examination of the data in this format shows additional information.
One can observe that more systems fail in the first block of time than in
any other, and that the quantity of systems failing in each block initially
decreases with increasing time. After that, the quantity of systems that fail
in each time block remains rather constant until the last block. This form
of data is often presented graphically in a form called a histogram
(Figure A-1). Many consider the graphical format of the histogram to be a
very effective way to quickly represent the information.


Table A-2. Failure Data Grouped into One Thousand Hour Increments

Hours          Units Failed    Cumulative
0-1000               7               7
1001-2000            4              11
2001-3000            3              14
3001-4000            3              17
4001-5000            2              19
5001-6000            1              20
6001-7000            1              21
7001-8000            1              22
8001-9000            1              23
9001-10000           1              24
10001-11000          1              25
11001-12000          1              26
12001-13000          1              27
13001-14000          3              30


Figure A-1. Histogram of Failure Data (failed units versus operational hours, in thousands)

An assignment of probability can be made based on this data set. The
histogram in Figure A-1 is “normalized” by dividing each data quantity
by the total. In this way a histogram is converted into a “probability
density function (pdf).” A pdf is a plot of probability versus the statistical
variable. In this example, the statistical variable is operational failure time.


Figure A-2 shows the pdf for this set of data. The probability of a failure in
the first time period (0-1000 hours) is 7/30 ≈ 0.233.

Figure A-2. Probability Density Function of Failure Data (probability p(x) versus time period X)
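That division is all the normalization amounts to. Here is a minimal sketch, starting from the binned counts of Table A-2 (the quoted 0.233 is 7/30 rounded to three places):

```python
# Failures per one-thousand-hour block, from Table A-2.
counts = [7, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3]
total = sum(counts)  # 30 systems in all

# Normalizing the histogram turns it into a discrete pdf.
pdf = [n / total for n in counts]

print(round(pdf[0], 3))  # 0.233 -> probability of failure in 0-1000 hours
print(sum(pdf))          # 1.0 (within floating-point rounding); a pdf must sum to one
```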

Mean – Median – Standard Deviation


Statistical analysis can provide numbers that characterize the data set.
Two of the most common numbers provided are called the “mean” and
the “median.” The mean is an arithmetic average of the data. It provides
insight into the central tendency of the data set.

EXAMPLE A-1

Problem: A set of time to failure data is given in Table A-1. What is the mean
of this data?

Solution: The numbers are summed and divided by the total quantity of
systems (30).

Mean = 147,324 / 30 = 4,910.8 hours

The median is the “middle” value of the sorted data. For an even quantity of
data, the median is the average of the two middle numbers.

The difference between the mean and the median indicates the symmetry
of the probability density function (pdf). For a symmetrical distribution
like the normal distribution, the two numbers would be similar. For this
example, the difference between the two numbers indicates a non-symmetric
pdf. This can be seen in Figure A-2.
Goble05-AppA.fm Page 243 Thursday, March 31, 2005 11:14 PM

Appendix A: Statistics 243

EXAMPLE A-2

Problem: A set of time to failure data is given in Table A-1. What is the
median of this data?

Solution: The two middle values of the sorted data are 3091 and 3471, so
Median = (3091 + 3471)/2 = 3281 hours
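Both example calculations can be checked with Python’s standard statistics module; a minimal sketch using the Table A-1 data:

```python
import statistics

# Time-to-failure data from Table A-1, in hours.
ttf_hours = [
    96, 3091, 4862, 13853, 8339, 614, 1815, 10305, 7499, 1540,
    831, 33, 240, 196, 1045, 1282, 13990, 12751, 2106, 5431,
    2740, 11460, 6056, 3471, 2414, 4348, 3886, 9270, 13351, 409,
]

print(statistics.mean(ttf_hours))    # 4910.8  (Example A-1: 147,324 / 30)
print(statistics.median(ttf_hours))  # 3281.0  (Example A-2: (3091 + 3471) / 2)
```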

Another common number calculated for data sets is called the standard
deviation. This number indicates the spread of the pdf. In failure
data analysis this number is not commonly used, as the failure rate pdf is
characterized by other numbers.

Statistical Significance – Applicability


Two things should be remembered about all statistics, and especially
failure statistics: statistical significance and applicability. The more
data we have, the more certain we are that the statistics have meaning.
That measure has been formally developed and is called statistical
significance. With little data, one cannot be sure about the accuracy of the
result.

One must also be concerned about the applicability of the statistics. This is
especially relevant regarding failure data. If the statistical data set was
obtained under conditions that are completely different from current
conditions, one must ask if the data is relevant.

Many statisticians will take a “Bayesian” approach to the problem,
estimating the probability that each data set is valid. Given a number of data
sets, a weighted average of the data sets is effectively obtained.

EXAMPLE A-3

Problem: A set of time to failure data has a mean of 5521 hours. A second set
of time to failure data has a mean of 4911 hours. A third set of time to failure
data has a mean of 12,340 hours. It is estimated that the first data set has a
50% chance of being correct. It is estimated that the second set of data has a
40% chance of being correct. It is estimated that the third set of data has a
10% chance of being correct. What is the most likely value for the mean?

Solution:

Expected value of Mean = 5521 × 0.5 + 4911 × 0.4 + 12340 × 0.1 = 5958.9 ≈ 5959 hours
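The expected value in Example A-3 is a probability-weighted average of the three means; a minimal sketch:

```python
# (mean time to failure in hours, estimated probability of being correct)
data_sets = [(5521, 0.5), (4911, 0.4), (12340, 0.1)]

# Expected value of the mean: each mean weighted by its probability.
expected_mean = sum(mean * p for mean, p in data_sets)
print(round(expected_mean, 1))  # 5958.9, which rounds to 5959 hours
```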

This tells us that one must record not only the data under study but also
all suspected relevant conditions.
