

Appendix A
Statistics

Random Variables
Many processes have outcomes that cannot be predicted given our current
level of knowledge about the process. In such situations statistical analysis
can be used to gain knowledge about a process from a set of data. Data is
gathered by recording a specific random variable. Statistical analysis
provides specific information about that random variable. In reliability
engineering, the primary random variable is “time to failure,” the
successful operating time interval until a failure occurs.

Statistical analysis is quite useful because data, when gathered, is often
hard to understand. Consider the set of data shown in Table A-1. This set of
data is a record of failure times for thirty systems. Assume that all thirty
systems are installed, commissioned, and operating successfully. The units
are checked every hour and the total number of hours of successful
operating time is incremented. When a particular system fails, its
successful operating time is no longer incremented. For example, in this
data set, system one failed after 96 hours. System two failed after 3091
hours. System thirty failed after a successful operating time interval of 409
hours. A set of data exists, but the useful information often hides inside
the data.

One can study the data and gain insight regarding when a system might
fail. It is notable that system 12 failed after only 33 hours; it failed first
and had the shortest successful operating time interval. Several systems
had much longer successful operating times. System 17 had the longest
successful operating time, running for 13,990 hours. Still, it is hard to gain
an in-depth understanding from just looking at the raw data. Fortunately,
one can apply some statistical analysis to gain further insight.



Table A-1. Time To Failure Data Set


System    Hours      System    Hours
   1         96        16       1282
   2       3091        17      13990
   3       4862        18      12751
   4      13853        19       2106
   5       8339        20       5431
   6        614        21       2740
   7       1815        22      11460
   8      10305        23       6056
   9       7499        24       3471
  10       1540        25       2414
  11        831        26       4348
  12         33        27       3886
  13        240        28       9270
  14        196        29      13351
  15       1045        30        409

Statistical Analysis
Often, statistical analysis is done by grouping data into “bins.” For
example, the failure data may track the number of units failed in
one-thousand-hour increments (Table A-2). In many cases the data is gathered
this way because operation is checked only periodically. When successful
operation is checked only every one thousand hours, the data set would
naturally look like Table A-2.

An examination of the data in this format shows additional information.
One can observe that more systems fail in the first block of time than in
any other, and that the quantity of systems failing in each block initially
decreases with increasing time. After that, the quantity of systems that fail
in each time block remains rather constant until the last block. This form
of data is often presented graphically in a form called a histogram
(Figure A-1). Many consider the graphical format of the histogram to be a
very effective way to quickly represent the information.


Table A-2. Failure Data Grouped into One Thousand Hour Increments

Hours          Units Failed    Cumulative
0-1000               7               7
1001-2000            4              11
2001-3000            3              14
3001-4000            3              17
4001-5000            2              19
5001-6000            1              20
6001-7000            1              21
7001-8000            1              22
8001-9000            1              23
9001-10000           1              24
10001-11000          1              25
11001-12000          1              26
12001-13000          1              27
13001-14000          3              30


Figure A-1. Histogram of Failure Data (failed units versus operational hours, in thousands)

An assignment of probability can be made based on this data set. The
histogram in Figure A-1 is “normalized” by dividing each data quantity
by the total. In this way a histogram is converted into a “probability
density function (pdf).” A pdf is a plot of probability versus the statistical
variable. In this example, the statistical variable is operational failure time.


Figure A-2 shows the pdf for this set of data. The probability of a failure in
the first time period (0-1000 hours) is 7/30 ≈ 0.233.

Figure A-2. Probability Density Function of Failure Data (probability p(x) versus time period X)
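That division is all the normalization amounts to. Here is a minimal sketch, starting from the binned counts of Table A-2 (the quoted 0.233 is 7/30 rounded to three places):

```python
# Failures per one-thousand-hour block, from Table A-2.
counts = [7, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3]
total = sum(counts)  # 30 systems in all

# Normalizing the histogram turns it into a discrete pdf.
pdf = [n / total for n in counts]

print(round(pdf[0], 3))  # 0.233 -> probability of failure in 0-1000 hours
print(sum(pdf))          # 1.0 (within floating-point rounding); a pdf must sum to one
```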

Mean – Median – Standard Deviation


Statistical analysis can provide numbers that characterize the data set.
Two of the most common numbers provided are called the “mean” and
the “median.” The mean is an arithmetic average of the data. It provides
insight into the central tendency of the data set.

EXAMPLE A-1

Problem: A set of time to failure data is given in Table A-1. What is the mean
of this data?

Solution: The numbers are summed and divided by the total quantity of
systems (30).

Mean = 147,324 / 30 = 4,910.8 hours

The median is the “middle” value of the sorted data. For an even quantity of
data, the median is the average of the two middle numbers.

The difference between the mean and the median indicates the symmetry
of the probability density function (pdf). For a symmetrical distribution
like the normal distribution, the two numbers would be similar. For this
example, the difference between the two numbers indicates a non-symmetric
pdf. This can be seen in Figure A-2.
Goble05-AppA.fm Page 243 Thursday, March 31, 2005 11:14 PM

Appendix A: Statistics 243

EXAMPLE A-2

Problem: A set of time to failure data is given in Table A-1. What is the
median of this data?

Solution: The two middle values of the sorted data are 3091 and 3471, so
Median = (3091 + 3471)/2 = 3281 hours
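Both example calculations can be checked with Python’s standard statistics module; a minimal sketch using the Table A-1 data:

```python
import statistics

# Time-to-failure data from Table A-1, in hours.
ttf_hours = [
    96, 3091, 4862, 13853, 8339, 614, 1815, 10305, 7499, 1540,
    831, 33, 240, 196, 1045, 1282, 13990, 12751, 2106, 5431,
    2740, 11460, 6056, 3471, 2414, 4348, 3886, 9270, 13351, 409,
]

print(statistics.mean(ttf_hours))    # 4910.8  (Example A-1: 147,324 / 30)
print(statistics.median(ttf_hours))  # 3281.0  (Example A-2: (3091 + 3471) / 2)
```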

Another common number calculated for data sets is called the standard
deviation. This number indicates the spread of the pdf. In failure
data analysis this number is not commonly used, as the failure rate pdf is
characterized by other numbers.

Statistical Significance – Applicability


Two things should be remembered about all statistics, and especially
failure statistics: statistical significance and applicability. The more
data we have, the more certain we are that the statistics have meaning.
That measure has been formally developed and is called statistical
significance. With little data, one cannot be sure about the accuracy of the
result.

One must also be concerned about the applicability of the statistics. This is
especially relevant regarding failure data. If the statistical data set was
obtained under conditions that are completely different from current
conditions, one must ask if the data is relevant.

Many statisticians will take a “Bayesian” approach to the problem,
estimating the probability that each data set is valid. Given a number of data
sets, a weighted average of the data sets is effectively obtained.

EXAMPLE A-3

Problem: A set of time to failure data has a mean of 5521 hours. A second set
of time to failure data has a mean of 4911 hours. A third set of time to failure
data has a mean of 12,340 hours. It is estimated that the first data set has a
50% chance of being correct. It is estimated that the second set of data has a
40% chance of being correct. It is estimated that the third set of data has a
10% chance of being correct. What is the most likely value for the mean?

Solution:

Expected value of Mean = 5521 × 0.5 + 4911 × 0.4 + 12340 × 0.1 = 5958.9 ≈ 5959 hours
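The expected value in Example A-3 is a probability-weighted average of the three means; a minimal sketch:

```python
# (mean time to failure in hours, estimated probability of being correct)
data_sets = [(5521, 0.5), (4911, 0.4), (12340, 0.1)]

# Expected value of the mean: each mean weighted by its probability.
expected_mean = sum(mean * p for mean, p in data_sets)
print(round(expected_mean, 1))  # 5958.9, which rounds to 5959 hours
```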

This tells us that one must record not only the data under study but also
all suspected relevant conditions.
