Appendix A
Statistics
Random Variables
Many processes have outcomes that cannot be predicted given our current
level of knowledge about the process. In such situations, statistical analysis
can be used to gain knowledge about a process from a set of data. Data is
gathered by recording a specific random variable, and statistical analysis
provides specific information about that random variable. In reliability
engineering, the primary random variable is "time to failure," the
successful operating time interval until a failure occurs.
One can study the data and gain insight into when a system might
fail. It is notable that system 12 failed after only 33 hours; it
failed first and had the shortest successful operating time interval. Several
systems had much longer successful operating times. System 17 had the
longest successful operating time; it ran for 13,990 hours. It is hard to
gain an in-depth understanding just by looking at the raw data.
Fortunately, one can apply some statistical analysis to gain further insight.
Goble05-AppA.fm Page 240 Thursday, March 31, 2005 11:14 PM
Statistical Analysis
Often, statistical analysis is done by grouping data into "bins." For
example, the failure data may track the number of units failed in
one-thousand-hour increments (Table A-2). In many cases the data is gathered
this way because operation is checked only periodically. When successful
operation is checked only every one thousand hours, the data set naturally
looks like Table A-2.
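The grouping step can be sketched in Python. This is a minimal illustration, not the book's procedure: the function name `group_failures` is hypothetical, the bin width of 1000 hours matches Table A-2, and bin k is assumed to cover ((k - 1) × 1000, k × 1000] hours so that labels read 0-1000, 1001-2000, and so on. Apart from 33 h and 13,990 h (quoted in the text), the sample values are made up.

```python
import math
from collections import Counter


def group_failures(times_to_failure, width=1000):
    """Group time-to-failure values (hours) into fixed-width bins.

    Returns (label, failures_in_bin, cumulative_failures) rows for every bin
    from the first one up to the bin that holds the last failure.
    """
    # Bin k (1-based) covers ((k - 1) * width, k * width]; a value of 0 h goes in bin 1.
    counts = Counter(max(1, math.ceil(t / width)) for t in times_to_failure)
    if not counts:
        return []
    rows = []
    cumulative = 0
    for k in range(1, max(counts) + 1):
        cumulative += counts[k]  # a Counter returns 0 for bins with no failures
        low = 0 if k == 1 else (k - 1) * width + 1
        rows.append((f"{low}-{k * width}", counts[k], cumulative))
    return rows


# Hypothetical sample: 33 h and 13990 h are the extremes quoted in the text,
# the other three values are made up for illustration.
for row in group_failures([33, 870, 2450, 8600, 13990]):
    print(row)
```

The third column is a running total, which is how the last column of Table A-2 reaches 30 when every unit has failed.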
Table A-2. Failure Data Grouped into One Thousand Hour Increments
Operational Hours    Failures    Cumulative Failures
…
8001-9000            1           23
9001-10000           1           24
10001-11000          1           25
11001-12000          1           26
12001-13000          1           27
13001-14000          3           30
Censored Data
[Figure: histogram of Failed Units (0 to 8) per interval versus Operational Hours (×1000, intervals 1 to 14)]
Figure A-2 shows the pdf for this set of data. The probability of a failure in
the first time period (0-1000 hours) is 0.233, since 7 of the 30 units failed
in that interval.
[Figure A-2: pdf of the grouped failure data; x-axis: X (interval number, 1 to 14); y-axis: Probability p(x), 0 to 0.25]
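The pdf values can be estimated from the grouped counts by dividing each bin's failures by the total. A minimal sketch, assuming every unit eventually failed (as in Table A-2, where the cumulative count reaches 30), so the total failures equal the number of units. The name `grouped_pdf` is illustrative. The first-bin count of 7 follows from p = 0.233 with 30 units and bins 9 to 14 come from Table A-2; bins 2 to 8 are hypothetical placeholders chosen only so the totals match.

```python
def grouped_pdf(failures_per_bin):
    """Estimate p(x) for each bin as (failures in bin) / (total failures).

    Assumes every unit in the sample eventually failed, so the total
    number of failures equals the number of units tested.
    """
    total = sum(failures_per_bin)
    if total == 0:
        raise ValueError("no failures recorded")
    return [n / total for n in failures_per_bin]


# Bin 1 (7 failures) follows from p = 0.233 with 30 units; bins 9-14 come
# from Table A-2; bins 2-8 are hypothetical placeholders that sum to 15.
counts = [7, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3]
p = grouped_pdf(counts)
print(round(p[0], 3))  # 0.233
```

Strictly, each value is a probability per one-thousand-hour bin; dividing by the bin width would turn it into a density per hour.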
EXAMPLE A-1
Problem: A set of time to failure data is given in Table A-1. What is the mean
of this data?
Solution: The time-to-failure values are summed and the total is divided by
the number of systems (30).
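The calculation described in the solution can be sketched as follows. The values of Table A-1 are not reproduced here, so the sample is hypothetical, and `sample_mean` is an illustrative name; Python's `statistics.mean` gives the same result.

```python
def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


# Hypothetical times to failure in hours (not the Table A-1 data).
times = [33, 870, 2450, 8600, 13990]
print(sample_mean(times))
```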
The median is the "middle" value of the sorted data. For an even quantity of
data, the median is the average of the two middle values.
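The rule above translates directly into code. A minimal sketch with an illustrative function name; for a 30-value data set such as Table A-1, the even branch averages the 15th and 16th sorted values. Python's `statistics.median` implements the same rule.

```python
def sample_median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values (indices mid - 1 and mid).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([4, 1, 3, 2]))  # 2.5
```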
The difference between the mean and the median indicates the symmetry
of the probability density function (pdf). For a symmetrical distribution
such as the normal distribution, the two values are nearly equal. In this
example, the difference between the two values indicates a non-symmetric pdf.
This can be seen in Figure A-2.
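The comparison can be sketched with Python's standard `statistics` module. This is a rough indicator only, not a formal test of symmetry; the helper name and both data sets are hypothetical.

```python
import statistics


def mean_minus_median(values):
    """Difference between the mean and the median of a sample.

    Near zero suggests a roughly symmetrical distribution; a large positive
    value suggests a long right tail (a few very long times to failure).
    """
    return statistics.mean(values) - statistics.median(values)


symmetric = [1, 2, 3, 4, 5]            # hypothetical, symmetrical about 3
skewed = [33, 870, 2450, 8600, 13990]  # hypothetical, long right tail
print(mean_minus_median(symmetric))
print(mean_minus_median(skewed))
```

Time-to-failure data often behaves like the second set: a few long-lived units pull the mean well above the median.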
EXAMPLE A-2
Problem: A set of time to failure data is given in Table A-1. What is the
median of this data?
Another common number calculated for data sets is called the standard
deviation. This number indicates how widely the pdf is spread around the
mean. In failure data analysis this number is not commonly used, because
failure pdfs are usually characterized by other parameters.
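For completeness, a minimal sketch of the calculation. The text does not say whether the population form (divide by n) or the sample form (divide by n - 1) is meant; this sketch assumes the sample form, which matches Python's `statistics.stdev` (`statistics.pstdev` gives the population form). The function name and data are illustrative.

```python
import math


def sample_std_dev(values):
    """Sample standard deviation: root of the summed squared deviations over (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; mean = 5
print(sample_std_dev(data))
```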
EXAMPLE A-3
Problem: A set of time to failure data has a mean of 5521 hours. A second set
of time to failure data has a mean of 4911 hours. A third set of time to failure
data has a mean of 12,340 hours. It is estimated that the first data set has a
50% chance of being correct. It is estimated that the second set of data has a
40% chance of being correct. It is estimated that the third set of data has a
10% chance of being correct. What is the expected value of the mean?
Solution:
Expected value of mean = 5521 × 0.5 + 4911 × 0.4 + 12340 × 0.1 = 5958.9 ≈ 5959 hours
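The solution is a probability-weighted sum of the candidate values. A minimal sketch that reproduces the arithmetic; the helper name `expected_value` is illustrative, and it also checks that the probabilities sum to one.

```python
def expected_value(outcomes):
    """Probability-weighted sum of (value, probability) pairs."""
    total_p = sum(p for _, p in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)


# Candidate means (hours) and their estimated chances of being correct.
candidates = [(5521, 0.5), (4911, 0.4), (12340, 0.1)]
print(round(expected_value(candidates), 1))  # 5958.9
```

Note that the expected value differs from the single most probable candidate (5521 hours, at 50%): the unlikely but large third value pulls the weighted result upward.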
This tells us that one must record not only the data under study but all
suspected relevant conditions.