Practical Statistical Tools For The Reliability Engineer

Practical Statistical
Tools
for the Reliabilit y
Engineer
Reliability Analysis Center
RAC is a DoD Information Analysis Center sponsored by the Defense Technical

Information Center and operated by IIT Research Institute, Rome, NY
Ordering No.: STAT
Practical Statistical Tools for the

Reliability Engineer
3UHSDUHGE\
5HOLDELOLW\$ QDO\VLV&HQWHU
0LOO6WUHHW
5RPH1<
8QGHU&RQWUDFWWR
'HIHQVH6XSSO\&HQWHU&ROXPEXV
'6&&3/,
32%R[%XLOGLQJ
&ROXPEXV2+
Reliability Analysis Center

5$&LVD'R',QIRUPDWLRQ$QDO\VLV&HQWHUVSRQVRUHGE\WKH'HIHQVH7HFKQLFDO
,QIRUPDWLRQ&HQWHUDQGRSHUDWHGE\,,75HVHDUFK,QVWLWXWH5RPH1<
$SSURYHGIRU3XEOLF5HOHDVH'LVWULEXWLRQ8QOLPLWHG
The information and data contained herein have been compiled from government and
nongovernment technical reports and from material supplied by various manufacturers and are
intended to be used for reference purposes. Neither the United States Government nor IIT
Research Institute warrant the accuracy of this information and data. The user is further
cautioned that the data contained herein may not be used in lieu of other contractually cited
references and specifications.
Publication of this information is not an expression of the opinion of The United States
Government or of IIT Research Institute as to the quality or durability of any product mentioned
herein and any use for advertising or promotional purposes of this information in conjunction
with the name of The United States Government or IIT Research Institute without written
permission is expressly prohibited.
-ii-
Form Approved
REPORT DOCUMENTATION PAGE OMB No. 0704-0188
Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching, existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget. Paperwork Reduction Project (0704-0188), Washington, DC 20503.
1. AGENCY USE ONLY (Leave Blank) 2. REPORT DATE 3. REPORT TYPE AND DATES COVERED
September 1999
4. TITLE AND SUBTITLE 5. FUNDING NUMBERS
Practical Statistical Tools for the Reliability Engineer

65802S
6. AUTHOR(S)
Anthony Coppola
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION

Reliability Analysis Center REPORT NUMBER
201 Mill Street STAT
Rome, NY 13440-6916
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSORING/MONITORING
Defense Technical Information Center (DTIC-AI) AGENCY REPORT NUMBER
8725 John J. Kingman Road, Suite 0944 SP0700-97-D-4006
Ft. Belvoir, VA 22060-6218
11. SUPPLEMENTARY NOTES:
Hard copies available from the Reliability Analysis Center, 201 Mill Street, Rome, NY 13440-6916.
(Price: $75.00 U.S., $85.00 Non-U.S.).
12a. DISTRIBUTION/AVAILABILITY STATEMENT 12b. DISTRIBUTION CODE

Approved for public release; distribution unlimited. Unclassified
13. ABSTRACT (Maximum 200 words)
This report provides basic instruction in statistics and its applications to reliability engineering. General probability and
statistical concepts are explained and specific statistical tools are introduced by considering their applications to measuring
reliability, demonstrating reliability, reliability growth testing, sampling, statistical quality control, and process improvement.
14. SUBJECT TERMS 15. NUMBER OF PAGES

120
Probability
Statistics
16. PRICE CODE
Reliability
$75.00
17. SECURITY CLASSIFICATION 18. SECURITY CLASSIFICATION 19. SECURITY CLASSIFICATION 20. LIMITATION OF ABSTRACT
OF REPORT OF THIS PAGE OF ABSTRACT
NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)

Prescribed by ANSI Std. Z39-18
298-102
-iii-
The Reliability Analysis Center (RAC) is a Department of Defense Information Analysis Center sponsored by the Defense
Technical Information Center, managed by the Air Force Research Laboratory (formerly Rome Laboratory), and operated by
IIT Research Institute (IITRI). RAC is chartered to collect, analyze and disseminate reliability, maintainability and quality
information pertaining to systems and products, as well as the components used in them. The RAC addresses both military
and commercial perspectives and includes such reliability related topics as testability, Total Quality Management and lifetime
extension.
The data contained in the RAC databases is collected on a continuous basis from a broad range of sources, including testing
laboratories, device and equipment manufacturers, government laboratories and equipment users (government and industry).
Automatic distribution lists, voluntary data submittals and field failure reporting systems supplement an intensive data
solicitation program. Users of RAC are encouraged to submit their reliability, maintainability and quality data to enhance
these data collection efforts.
RAC publishes documents for its users in a variety of formats and subject areas. While most are intended to meet the needs
of reliability practitioners, many are also targeted to managers and designers. RAC also offers reliability consulting, training,
and responses to technical and bibliographic inquiries.
REQUESTS FOR TECHNICAL ASSISTANCE ALL OTHER REQUESTS SHOULD BE

AND INFORMATION ON AVAILABLE RAC DIRECTED TO:
SERVICES AND PUBLICATIONS MAY BE
DIRECTED TO:
Reliability Analysis Center Air Force Research Laboratory

201 Mill Street AFRL – Information Directorate
Rome, NY 13440 Attn: R. Hyle
525 Brooks Road
General Information: (888) RAC-USER Rome, NY 13441-4505
(888) 722-8737
Product Ordering: (800) 526-4802 Telephone: (315) 330-4857
Training Inquires: (800) 526-4803 DSN: 587-4857
TQM Inquiries: (800) 526-4804 TeleFax: (315) 330-7647
Technical Inquiries: (315) 337-9933 E-Mail: hyler@rl.af.mil
TeleFax: (315) 337-9932
DSN: 587-4151
E-Mail: rac@iitri.org
Internet: http://rac.iitri.org
© 1999, IIT Research Institute

This material may be reproduced by or for the US Government pursuant to
the copyright license under the clause at DFARS 252.227-7013 (Oct. 1988)
-iv-
STAT Preface v
PREFACE
Statistical tools are powerful aids to reliability engineering and related disciplines.
However, many people, including engineers, consider the process of mastering statistics to be
painful. This book is an attempt to provide the reliability practitioner a reasonable capability in
the use of statistical tools without the pain. For this reason, discussion of statistical theory will
be kept to a minimum, and useful tools will be demonstrated by showing their practical
application to various reliability engineering tasks.
Reliability Analysis Center (RAC) • 201 Mill Street, Rome, NY 13440-6916 • 1-888-RAC-USER
vi Practical Statistical Tools for the Reliability Engineer STAT
This page intentionally left blank.
STAT Table of Contents vii
TABLE OF CONTENTS
Page
1.0 WHAT YOU NEED TO KNOW ABOUT PROBABILITY...................................... 1
1.1 When Events are Independent .......................................................................... 1
1.2 When Events are Mutually Exclusive............................................................... 2
1.3 When Events are Not Independent ................................................................... 3
1.3.1 Bayes’ Theorem..................................................................................... 4
1.4 In Summary....................................................................................................... 6
2.0 INTRODUCTION TO STATISTICS ......................................................................... 7
2.1 Many Ways to be "Average"............................................................................. 7
2.2 Ways to Measure Spread .................................................................................. 8
2.3 Introduction to Distributions............................................................................. 9
2.4 Testing Hypotheses........................................................................................... 11
2.5 For Further Study.............................................................................................. 12
3.0 SOME DISTRIBUTIONS AND THEIR USES.......................................................... 13
3.1 Discrete Distributions ....................................................................................... 13
3.1.1 The Binomial Distribution.................................................................... 13
3.1.2 The Poisson Distribution ...................................................................... 15
3.1.3 The Hypergeometric Distribution ......................................................... 16
3.2 Continuous Distributions .................................................................................. 18
3.2.1 The Normal Distribution....................................................................... 19
3.2.1.1 The Standard Normal Distribution ........................................ 20
3.2.1.2 The Normal Distribution’s Role in Sampling ........................ 22
3.2.2 Various Other Useful Distributions, In Brief........................................ 22
3.2.2.1 The Lognormal ...................................................................... 22
3.2.2.2 The Exponential..................................................................... 23
3.2.2.3 The Weibull ........................................................................... 23
3.2.2.4 The Student t.......................................................................... 23
3.2.2.5 The F Distribution.................................................................. 25
3.2.2.6 The Chi-Square Distribution.................................................. 25
3.3 In Summary....................................................................................................... 25
4.0 MEASURING RELIABILITY.................................................................................... 27
4.1 General Principles............................................................................................. 27
4.2 The Versatile Weibull Distribution .................................................................. 28
4.2.1 Caveats.................................................................................................. 32
4.3 Measuring Reliability of Repairable Systems................................................... 33
4.3.1 Testing for Trends................................................................................. 33
4.3.2 Confidence Limits when the Failure Rate is Constant ......................... 36
4.4 Measuring Reliability of "One-Shot" Products................................................. 40
5.0 DEMONSTRATING RELIABILITY ......................................................................... 41
5.1 Zero Failure Tests ............................................................................................. 41
5.2 Tests Allowing Failures .................................................................................... 43
5.2.1 Controlling the Producer’s Risks .......................................................... 43
5.3 Testing Under the Exponential Distribution..................................................... 44
5.3.1 Sequential Tests: A Short Cut.............................................................. 46
5.4 Other Test Considerations ................................................................................ 48
viii Practical Statistical Tools for the Reliability Engineer STAT
TABLE OF CONTENTS (CONT’D)
Page
6.0 RELIABILITY GROWTH TESTING ........................................................................ 49
6.1 Duane Growth Analysis.................................................................................... 49
6.1.1 Least Square Regression ....................................................................... 50
6.2 AMSAA Growth Analysis................................................................................ 52
7.0 SAMPLING (POLLING) AND STATISTICAL QUALITY CONTROL.................. 55
7.1 Measuring Quality from Samples ..................................................................... 55
7.1.1 Caveats.................................................................................................. 58
7.2 Demonstrating Acceptability Through Sampling ............................................. 58
7.3 Statistical Quality Control ................................................................................ 61
7.3.1 Control Charts....................................................................................... 62
7.3.2 Control Charts for Variables................................................................. 63
7.3.3 Range Charts......................................................................................... 64
7.3.4 Interpreting Control Charts ................................................................... 66
7.3.5 Controlling Attributes........................................................................... 66
7.3.5.1 Proportions............................................................................. 67
7.3.5.2 Rates ...................................................................................... 68
7.3.6 Caveat: "In Control" May Not Be "In-Spec" ....................................... 68
7.3.6.1 Measuring Process Capability................................................ 68
7.3.6.2 Measuring Process Performance............................................ 70
8.0 USING STATISTICS TO IMPROVE PROCESSES ................................................. 71
8.1 Designing Experiments..................................................................................... 71
8.1.1 Saturated Arrays: Economical, but Risky............................................ 75
8.1.2 Testing for Robustness ......................................................................... 75
8.2 Is There Really a Difference? ........................................................................... 76
8.3 How Strong is the Correlation? ........................................................................ 79
9.0 CLOSING COMMENTS............................................................................................ 83
Appendix A: Poisson Probabilities ..................................................................................... 85
Appendix B: Cumulative Poisson Probabilities.................................................................. 89
Appendix C: The Standard Normal Distribution ................................................................ 93
Appendix D: The Chi-Square Distribution ......................................................................... 97
Appendix E: The Student t Distribution.............................................................................. 101
Appendix F: Critical Values of the F Distribution for Tests of Significance...................... 105
STAT Table of Contents ix
LIST OF FIGURES
Page
Figure 2-1: Distribution of Heads in Two Tosses of a Coin ........................................... 10
Figure 3-1: Distribution of Heights................................................................................. 18
Figure 3-2: More Detailed Distribution of Heights......................................................... 18
Figure 3-3: Continuous Distribution Curve for Height................................................... 19
Figure 3-4: Standard Normal Distribution ...................................................................... 21
Figure 4-1: Probability of Failure as Represented by the Area Under the
Probability Density Function........................................................................ 27
Figure 4-2: Weibull Plot ................................................................................................ 32
Figure 5-1: Devising a Reliability Test ........................................................................... 44
Figure 5-2: Typical Sequential Test ................................................................................ 47
Figure 6-1: Typical Duane Plot....................................................................................... 52
Figure 7-1: Ideal O-C Curve ........................................................................................... 59
Figure 7-2: Practical O-C Curve ..................................................................................... 59
Figure 7-3: Run Chart ..................................................................................................... 62
Figure 7-4: Control Chart................................................................................................ 63
Figure 7-5: X and R Chart Combination ....................................................................... 64
Figure 7-6: "p chart" for Different Sample Sizes ............................................................ 67
Figure 7-7: Process Capability (Cp) Chart ...................................................................... 69
Figure 7-8: Process Performance (Cpk) Chart ................................................................. 70
Figure 8-1: Scattergram................................................................................................... 80
Figure 8-2: Scattergram of Data in Table 8-10 ............................................................... 81
x Practical Statistical Tools for the Reliability Engineer STAT
LIST OF TABLES
Page
Table 1-1: Known Data.................................................................................................. 4
Table 1-2: Converted Data............................................................................................. 5
Table 1-3: Summary of Section 1 .................................................................................. 6
Table 2-1: Salary Data ................................................................................................... 7
Table 2-2: Spread Analysis............................................................................................ 9
Table 2-3: Experimental Data........................................................................................ 9
Table 2-4: Comparison of Results ................................................................................. 10
Table 3-1: Extracts from Appendix A ........................................................................... 16
Table 3-2: Standard Normal Distribution Data.............................................................. 21
Table 3-3: Critical Values of z....................................................................................... 22
Table 3-4: Summary of Distributions ............................................................................ 25
Table 4-1: Life Data....................................................................................................... 30
Table 4-2: Ordered Data ................................................................................................ 31
Table 4-3: Completed Data Table.................................................................................. 31
Table 4-4: Failure Data .................................................................................................. 33
Table 4-5: Critical Values for the Laplace Statistic....................................................... 35
Table 4-6: Chi-Square Values........................................................................................ 37
Table 4-7: Confidence Interval Formulas ...................................................................... 40
Table 5-1: Fixed-Time Reliability Tests........................................................................ 46
Table 5-2: Sequential Tests............................................................................................ 47
Table 5-3: Sequential Test Plan for 10% Risks, 2.0 Discrimination Ratio ................... 47
Table 6-1: Growth Data ................................................................................................. 51
Table 6-2: Growth Data Revisited ................................................................................. 53
Table 6-3: Comparison of Estimates ............................................................................. 54
Table 7-1: Critical Values of z....................................................................................... 56
Table 7-2: Statistical Constants ..................................................................................... 65
Table 8-1: Two Factor Orthogonal Array...................................................................... 72
Table 8-2: Expanded Test Matrix .................................................................................. 72
Table 8-3: Orthogonal Array.......................................................................................... 73
Table 8-4: Sample Test Results ..................................................................................... 74
Table 8-5: Three Factor Full-Factorial Array ................................................................ 75
Table 8-6: Saturated Array (Table 8-2 Modified).......................................................... 75
Table 8-7: Testing for Robustness ................................................................................. 76
Table 8-8: Defect Data................................................................................................... 76
Table 8-9: Critical Values for F at 0.05 Significance .................................................... 79
Table 8-10: Paired Data ................................................................................................... 81
Table 8-11: Data Analysis ............................................................................................... 82
STAT Table of Contents xi
INTRODUCTION
This book presents some basic material on probability and statistics and provides examples
of how they are used in reliability engineering. To keep the book short and uncomplicated, not
all subjects will be treated in detail and many more topics were ignored. Nevertheless, this text
should help the novice reliability engineer understand the utility of probability and statistics, and
can serve as a quick reference and refresher for the experienced engineer.
It is important to remember that reliability engineering is not just the application of

probability and statistics, and probability and statistics are not exclusively dedicated to reliability
engineering. Reliability engineering is the science of designing products and processes to be
reliable. Probability and statistics are simply tools that can help evaluate, predict, and measure
reliability, among other uses. However, it is important for every reliability engineer to be able to
use these tools effectively.
xii Practical Statistical Tools for the Reliability Engineer STAT
STAT Section 1: What You Need to Know About Probability 1
1.0 WHAT YOU NEED TO KNOW ABOUT PROBABILITY
Statistical inferences will often be expressed as probabilities. Probability can be defined as

our degree of belief that an event will occur (e.g., I think I have an even chance of finishing this
book on schedule). In statistical analysis, however, probability is usually defined as the expected
frequency that an outcome will occur (e.g., nine times out of ten the actual value of a product’s
failure rate will be within a range calculated by a particular method). This frequency may be
stated as a percentile (e.g., 90% of the time the failure rate will actually be within the calculated
range) or its decimal equivalent (e.g., there is a probability of 0.90 that the failure rate is really in
the calculated range). Probability may also be referred to as "confidence" (e.g., there is a 90%
confidence that the failure rate is within the calculated range).
In dealing with probabilities, some useful relationships can be applied, depending on

certain assumptions.
1.1 When Events are Independent
If we know the probabilities of two events happening, and can assume that the events are
independent (i.e., the occurrence of one does not increase or decrease the probability that the
other will occur), then the probability of both events happening is:
P(a and b) = P(a)P(b) (1-1)
where:
P(a and b) = probability of both event "a" and event "b" happening
P(a) = probability of event "a" happening
P(b) = probability of event "b" happening
For example, suppose an airplane uses a satellite receiver to track its position from the
Global Positioning System (GPS) and can also track its position from a radio direction finder
(RDF) receiver. If the GPS receiver fails once in 100 flights, the probability of losing the GPS
tracking capability is 0.01 per flight. If the RDF receiver fails once every fifty flights, the
probability of losing the RDF tracking capability is 0.02 per flight. If we can assume that failure
of one does not affect the other (this is not a trivial assumption: both could fail simultaneously
from some common cause, for example, lightning hitting the aircraft) then the probability of
losing both position tracking systems on the same flight is:
P = (0.01)(0.02) = 0.0002 (1-2)
Or, two times in 10,000 flights both position tracking systems will be out of service.
2 Practical Statistical Tools for the Reliability Engineer STAT
1.2 When Events are Mutually Exclusive
When probabilities are mutually exclusive (i.e., the occurrence of one event precludes the
other), the probability of either of two events happening is:
P(a or b) = P(a) + P(b) (1-3)
A useful fact is that the sum of the probabilities of all possible outcomes of an event must
equal unity. Further, the probability that an event will occur (P) plus the probability that it will
not occur (Q) must also equal one, since there are no other possibilities. Thus:
P + Q = 1 or P = (1 - Q) or Q = (1 - P) (1-4)
If the probability of a GPS receiver failure is 0.01, then the probability of no failure is
(1 - 0.01) or 0.99.
Often, it is much easier to calculate one of the parameters (P or Q) than the other. Since
P + Q = 1, one parameter can always be found from the other.
Continuing the position tracking example, and the assumption of independence, the
probability of a flight without total loss of position tracking capability would be:
P(s) = P(g)P(r) + P(g)Q(r) + Q(g)P(r) (1-5)
where:
P(s) = probability of success (no total loss of all position tracking systems)
P(g) = probability of no failure in GPS receiver = (1 - 0.01) = 0.99
Q(g) = probability of failure in GPS receiver = 0.01
P(r) = probability of no failure in RDF receiver = (1 - 0.02) = 0.98
Q(r) = probability of failure in RDF receiver = 0.02
Hence:
P(s) = (0.99)(0.98) + (0.99)(0.02) + (0.01)(0.98) = 0.9702 + 0.0198 + 0.0098 = 0.9998 (1-6)
Each of these events is mutually exclusive and they constitute all the "successful"
situations. The other possibility is Q(g)Q(r), the probability that both the GPS and RDF receivers
will fail, which was computed by Equation 1-2 as equal to 0.0002. From Equation 1-4:
P(s) = 1.0 - Q(g)Q(r) = 1.0 - 0.0002 = 0.9998 (1-7)
Note: in this example, an event can be defined either as the occurrence of a failure or as the
lack of a failure. Hence P(i) can be the probability of no failure or the probability of failure in the
component identified as (i). By convention, P(i) is usually the probability of success (no failure),
and Q(i) the probability of failure, when both notations are used in one formula. P(i) is generally
used when only one notation is needed, whether it refers to a failure event or a non-failure event.
We are following this convention, even though this reverses the meaning of P(i) from the
previous example. Q(g)Q(r) in Equation 1-7 is identical in meaning to P(a)P(b) in Equation
1-1.
Finally, consideration of mutually exclusive events leads to another solution for the case of
independent events, shown in Equation 1-8.
P(s) = P(g) + P(r) - P(g)P(r) (1-8)
where all terms are as defined for Equation 1-5.
The rationale for this is that P(g) includes all of the cases in which the GPS receiver is
operating, including both the times that the RDF receiver is operating and the times that it has
failed. Similarly, P(r) includes all of the cases in which the RDF receiver is operating, including
both the times the GPS receiver is operating and the times it fails. Thus, P(g) + P(r) twice counts
the times that both the GPS and RDF receivers are operating, and so these times must be
subtracted to yield P(s). This is easily proven by decomposing P(g) and P(r) into mutually
exclusive events:
P(g) = P(g)P(r) + P(g)Q(r) (1-9)
P(r) = P(r)P(g) + P(r)Q(g) = P(g)P(r) + Q(g)P(r) (1-10)
Substituting Equations 1-9 and 1-10 into Equation 1-8 we get:
P(s) = P(g)P(r) + P(g)Q(r) + P(g)P(r) + Q(g)P(r) - P(g)P(r) (1-11)

= P(g)P(r) + P(g)Q(r) + Q(g)P(r)
which is the same result as Equation 1-5.
1.3 When Events are Not Independent
In the examples given in Sections 1.1 and 1.2, a failure of the GPS receiver does not change
the probability that the RDF receiver will also fail. This is not always true. Suppose one-tenth of
all failures in the GPS are due to external events, like lightning strikes, which also take out the
RDF. Then, our calculation of the probability of both receivers failing becomes more
complicated. First, we need a new term: P(b|a), defined as the conditional probability that event
"b" will occur, given that event "a" has occurred. Then:
P(a and b) = P(a)P(b|a) (1-12)
where:
P(a and b) = the probability that both events "a" and "b" will occur
P(a) = the probability that event "a" will occur
P(b|a) = the probability that event "b" will occur, given that event "a" occurs
Since "a" and "b" are arbitrary labels, Equation 1-12 can also be written:
P(a and b) = P(b)P(a|b) (1-13)
If the events were independent, P(b|a) = P(b) and P(a|b) = P(a), and Equations 1-12 and
1-13 would be identical in form to Equation 1-1. Since the events are not independent, we must
do a little more work.
If P(a and b) is the probability of both the GPS and RDF receivers failing, and P(a) is the
probability of the GPS failing, then P(b|a) is the probability of the RDF failing on a flight when
the GPS failed. We know that one-tenth (10% or 0.10) of all GPS failures are caused by factors
that also kill the RDF (probability of RDF failure = 1.0). This means that for 90% of the GPS
failures, any RDF failures must be from other causes, which have some probability of occurrence
whether or not the GPS has failed. To determine this probability, we could search our records
using just those flights when there was no failure of the GPS, thus eliminating any effects of GPS
failure. Suppose we found that the failure rate for the RDF, using the censored data, was 19
failures in 1,000 flights = 0.019. Hence:
P(b|a) = 0.10(1) + 0.90(0.019) (1-14)
Using Equations 1-12 and 1-14, the probability of both receivers failing is:
P(a and b) = P(a) P(b|a) = P(a)[0.10 (1) + 0.90(0.019)] = 0.01 [0.10 + 0.90(0.019)]
= 0.001 [0.1 + 0.017] = 0.0001 + 0.00017 = 0.00027 (1-15)
This result is higher than the 0.0002 found by Equation 1-2, where we assumed
independence, even though the RDF failure rate, exclusive of simultaneous failures, is higher in
Equation 1-2 than it is in Equation 1-15. When some failure mechanisms take out both units
simultaneously, the overall probability of both units failing must go up.
1.3.1 Bayes’ Theorem
A noted derivation from conditional probabilities is Bayes’ Theorem. For our discussion,
let us consider a radar installed in an aircraft. Assume we have gathered some statistics as shown
in Table 1-1.
Table 1-1: Known Data

Mission Profile Percent of Sorties Using Probability of Radar
Mission Profile Failure During Mission
Combat 0.20 0.20
Training 0.20 0.10
Transport 0.60 0.05
Letting event ai represent the probability of a sortie using a specific mission profile, and
event b represent a radar failure, we can convert the data in Table 1-1 to terms of probabilities
and conditional probabilities. Table 1-2 shows the converted data.
Table 1-2: Converted Data

i Mission Profile P(ai) P(b|ai)
1 Combat 0.20 0.20
2 Training 0.20 0.10
3 Transport 0.60 0.05
Suppose we are given the information that an aircraft came back from a sortie with a failed
radar. We can use the information to calculate the probability that the sortie was a combat
mission. To do so, we must derive Bayes’ Theorem.
Since P(a and b) = P(a|b) P(b) = P(b|a) P(a), it follows that:
P(ai) P(b) = P(ai|b) P(b) = P(b|ai) P(ai) (1-16)
Therefore:
P(ai|b) = P(b|ai) P(ai)/P(b) (1-17)
Since P(a1) is the probability of a combat mission, the solution we seek is:
P(a1|b) = P(b|a1) P(a1)/P(b) (1-18)
Since event "a" is a set of mutually exclusive events, and there is a different conditional
probability of event "b" happening for each event in set "a", the total probability of event "b"
happening is:
P(b) = Σ P(b|ai) P(ai) (1-19)
Substituting Equation 1-19 into Equation 1-18:
P(b | a1 ) P(a1 )
P(a 1 | b) = (1-20)
∑ P(b | a i ) P(a i )
In the equation, P(a1|b) is the probability of the returned airplane having flown a combat
profile, given that it came back with a failed radar. The other terms are quantified in Table 1-2.
Equation 1-20 is Bayes’ Theorem.
Substituting the data from Table 1-2 into Equation 1-20:
(0.20)(0.20) 0.04
P(a 1 | b) = =
(0.20)(0.20) + (0.20)(0.10) + (0.60)(0.05) 0.04 + 0.02 + 0.03
0.04
= 0.44 (1-21)
0.09
Without the knowledge of the radar failure, we would have estimated the probability of the
aircraft having just flown a combat profile at 0.20, based on the data in the second column of
Table 1-1. The information that the radar failed raises our estimate to 0.44. The 0.20 figure is
called the "prior" estimate, because it comes before the gathering of additional data (i.e., that the
radar failed on the mission), and the 0.44 figure is called the "posterior" estimate, because it
comes after the new data is considered.
Bayes’ Theorem is the foundation of both useful and dubious analyses combining "prior"
data with information from statistical sampling to produce a "posterior" estimate. The theorem is
quite correct. Some applications that depend on "subjective priors" (i.e., the known information
is an opinion or assumption rather than a conclusion from a set of statistics) can be questionable.
1.4 In Summary
Table 1-3 summarizes the material presented in this section.
Table 1-3: Summary of Section 1

When events are: And the following apply: You can use:
Independent The occurrence of one event has no effect P(a and b) = P(a) P(b)
on the occurrence of the other P(a or b) = P(a) + P(b) - P(a and b)
Mutually exclusive The occurrence of one event precludes the P(a or b) = P(a) + P(b)
other
P(a) = probability event "a" occurs P(a) + Q(a) = 1
Q(a) = probability event "a" does not P(a) = 1 - Q(a)
occur Q(a) = 1- P(a)
Not independent The occurrence of one event may affect the P(a and b) = P(a) P(b|a)
other = P(b) P(a|b)
One event may have several different P(b | a 1 ) P(a 1 )
outcomes, each affecting the other event P(a 1 | b) =
differently ∑ P(b | a i ) P(a i )
STAT Section 2: Introduction to Statistics 7
2.0 INTRODUCTION TO STATISTICS
"Statistics" as a mathematical discipline is concerned with describing something in useful

numerical terms (e.g., the average salary for a reliability engineer) or with providing the means
for making useful inferences about something (e.g., the percent of a product that will be outside
specified limits). These products are based on measured values, such as the number of defects in
a sample of parts. One measured value is called a "statistic" and more than one are called
"statistics". The term “random variable” refers to the function that describes the values the
statistics can take. For example, the outcome of a toss of a coin is a random variable that can
take the value "heads" or "tails". The number of heads in two tosses of a coin is a random
variable that can take the values 0, 1, or 2. (A case could be made that the constraints on a
random variable make it neither truly random nor greatly variable, but we shall have to take the
term as given.)
A variety of statistical tools are used to convert statistics (i.e., measurements) into the
desired descriptions or inferences. These will be discussed in the following sections organized
by the use of the results. Common to all of these tools will be the terminology and concepts
discussed in this section.
2.1 Many Ways to be "Average"
The word "average" is used to describe a central point for the values taken by a random
variable. However, more specific measures are needed in statistical analysis. To illustrate,
consider the following (completely contrived) data shown in Table 2-1 which we will say
represents the salaries of reliability engineers in a mythical company, ordered from smallest to
largest.
Table 2-1: Salary Data

$20,000
20,000
20,000
40,000
50,000
75,000
125,000
What is the average salary for a reliability engineer in the company? (We suggest you form
your own opinion before going on.)
The company claims its reliability engineers earn an average of $50,000. The engineers
usually figure the average salary as $40,000, and the company union claims it's $20,000. And
they are all right! The differences arise from different definitions of "average", and the different
definitions are based on the different uses of the measure.
The company is interested in what it pays for reliability engineers and calculates the
"average" using the arithmetic mean (sum the values and divide by the number of data points).
This gives their cost per capita for reliability engineers.
The engineers are interested in their standing vis a vis each other and compute the
"average" based on the median (the value for which there are as many data points above as
below, i.e., the 50th percentile).
The union uses the mode (the value most frequently measured) as its "average" because it
represents the most people.
Thus the definition of "average" depends on its use. In most of the methods described
below, the arithmetic mean will be the definition of choice. We will call this simply "the
mean," ignoring the existence of the geometric mean, harmonic mean, and others, which do exist
but will not be needed for our purposes. Any "average" measure not using the arithmetic mean
will be noted when it occurs.
2.2 Ways to Measure Spread
Besides measuring central tendency, we often want to measure variation, or spread. For
example, if we were producing rods designed to be one inch long, we would want the mean
length of a production sample to be close to one inch. However, if the mean length were one
inch, but the individual rods varied from one-half inch to one and one-half inches in length, we
would probably not be happy. Hence, in specifying or measuring product parameters, we are
usually concerned with measures of spread.
The most obvious measure of spread is, perhaps, the range, defined as the difference
between the highest and lowest values of the parameter of interest. Specified values most often
include a stated range or "tolerance" around the design value (e.g., "rods shall be one inch long
plus or minus 0.001 inch"). This establishes a desired limit on spread, if a somewhat arbitrary
one (is 1.0009 inches always good and 1.0011 inches always bad?).
We seldom reject a group (or a "lot") of products because some units measure outside the
specified range. We do reject groups of products that we feel have too many units outside the
specified range. To determine what proportion of the product group is outside the specified
range, we need another measure of spread.
One way of measuring spread would be to take a sample of the product, measure the
parameter of interest in each unit of the sample and compare these measurements to the mean
value of the measurements. However, we could not merely subtract each measurement from the
mean (or the mean from each measurement), sum these results and divide by the number of units
measured. Because some measurements will be higher than the mean and others lower, our
results would tend to zero. For example, suppose we measured the length of a sample of rods,
and calculated the difference between each measurement and the mean, as shown in Table 2-2.
Table 2-2: Spread Analysis

Sample Rod Length in Inches Deviation from Mean Value
1 0.50 -0.50
2 0.90 -0.10
3 1.00 0
4 1.10 +0.10
5 1.50 +0.50
Sum of all Data 5.00 0
Mean Value 1.00 0
We therefore need a better way to measure spread. We could use the absolute values of the
differences between the individual measurements and the mean, but a more common alternative
is to use squared values. One measurement of spread, called the variance, is formulated by
subtracting the mean value from each measurement (x - x ), squaring the results (x - x )2,
summing the squares and dividing by the number of measurements, ∑ (x - x )2/n. A more
convenient measure is the standard deviation, which is merely the square root of the variance
(Equation 2-1). The larger the standard deviation, the more spread there is to the data.
2
∑ (x - x)
= (2-1)
n
2.3 Introduction to Distributions
We have discussed the random variable for the number of heads occurring in two tosses of
a coin. Suppose we are interested in wagering on the outcome of two tosses. We would want to
know the probabilities of all the possible outcomes, so we could establish appropriate odds.
(Note: the use of gambling examples is common in statistics, which was actually created to
analyze gaming odds. Gaming examples are quite analogous to some engineering applications,
where we are interested in the odds of making a wrong decision based on a set of statistics.)
Returning to our example, we could experiment by tossing a coin twice and counting the
number of heads. One such experiment would not be any help, but many replications, say 1,000,
would give a useful set of statistics. We would then have to organize these statistics into a useful
format. There are several ways to do this. One is simply as shown in Table 2-3.
Table 2-3: Experimental Data

Outcome (Number of Heads) Frequency (in 1000 Experiments)
0 238
1 509
2 253
A better way, because it graphically shows the relative frequencies, is shown in Figure 2-1,
called a histogram.
509
Frequency
In 1,000
Trials
253
238
0 1 2
Number of Heads
Figure 2-1: Distribution of Heads in Two Tosses of a Coin
The histogram of Figure 2-1 shows a frequency distribution. The next step towards
determining the odds on each outcome would be to convert the frequency distribution to a
probability distribution. This could be done by dividing the value of each column by the number
of experiments. This makes the total area enclosed by the data equal to one, and the area of each
column equal to the probability of the outcome it represents. The odds for a fair bet are then
simply the ratio of the probability of the outcome to one minus the probability. For example, the
probability of no heads is 0.238 so the odds are 0.238 to (1 - 0.238) or 0.762. One betting on the
outcome could put up $2.38 against $7.62 and should break even in the long run, if the computed
probability holds. Now, for comparison, we shall compute the same probabilities from a
theoretical approach, using probability theory.
Let us assume our coin is fair, meaning that it has no bias towards falling either heads or
tails, and that our method of tossing is also unbiased. We would therefore expect heads or tails
to be equally likely. This means the probability of a head is 0.5 on each toss, and the probability
of not getting a head is (1 - 0.5), which is the same as 0.5, but the longer expression shall be used
in the following formulas so that it will be clear which probability is meant. From probability
theory, the probability of no heads in two tosses is (1 - 0.5)(1 - 0.5) = 0.25. The probability of
one head in two tosses is the sum of the probability of getting heads on the first toss and not on
the second and the probability of getting heads on the second toss and not the first: (0.5)(1 - 0.5)
+ (1 - 0.5)(0.5) = 0.50. The probability of getting two heads is (0.5)(0.5) = 0.25. Let us now
compare these to our experimental results, as shown in Table 2-4.
Table 2-4: Comparison of Results

Outcome
Experimental Results Theoretical Results
(Number of Heads)
0 0.238 0.25
1 0.509 0.50
2 0.253 0.25
The experimental results show a little bias towards getting more heads than tails, but would
you bet on it? The difference could just be experimental error, which always exists. Many of the
methods described in this book will be concerned with separating experimental error from true
indications of bias. Instead of trying to decide whether or not we have a fair coin, we will try to
decide such things as how many failures it will take to convince us that a product is not as
reliable as we expected. Many different statistical distributions will be useful in making such
decisions. Some, like the one shown in our example, will be discrete distributions, having only
integers (e.g., number of failures) as possible outcomes. Others will be continuous distributions,
where an infinite delineation in outcomes is possible (for example, times between failures). We
will sort these out in Section 3.
2.4 Testing Hypotheses
One of the specialties of statistics is called hypothesis testing, which has some applications
to reliability engineering that we will cover later in Sections 4, 5 and 7. It involves using
statistical analysis to come to a conclusion to accept or reject a stated hypothesis with a known
risk of being in error. A hypothesis might be that a product has a given mean time between
failures (which can be one we want it to have or one we don’t want). This is called the null
hypothesis.
With each null hypothesis is an associated alternate hypothesis. This is usually merely the
negation of the null hypothesis. If the null hypothesis is that a product has a certain MTBF, the
alternate hypothesis is that it does not have that MTBF. However, there are other ways. In
sequential reliability testing (Section 5.3.1) the null hypothesis is that a product has a specific
MTBF that is considered desirable and the alternate hypothesis is that it has a specific MTBF that
is considered undesirable. In any event, each hypothesis test can have any of four results:
• Null hypothesis is true and is accepted. This is a correct result, which may be the
desired result or not. If the null hypothesis is that a product has a poor MTBF, we
would probably prefer it to be refuted, but whether we like it or not, we do want a
correct conclusion.
• Null hypothesis is true and is rejected. This is called a Type I error in hypothesis
testing. The probability of it occurring is conventionally called " α ". If the null
hypothesis is that the product has a parameter (e.g., MTBF, failure rate, number of
defects, et al) which is acceptable, " α " is the probability that we would conclude the
product is not acceptable. In this case, the "producer's risk" defined in Section 5 and
Section 7 would be equal to α since it is the probability that a customer will not accept
a good product.
• Alternate hypothesis is true and is accepted (null hypothesis rejected). This is another
correct result, and whether we prefer the null hypothesis or the alternate hypothesis, we
want a correct result.
• Alternate hypothesis is true and is rejected (null hypothesis accepted). This is called a
Type II error and its probability of occurrence is conventionally called " β ". The
"consumer's risk" discussed in Sections 5 and 7 represents the probability of accepting a
bad product, and is equal to β when the rejected alternate hypothesis is that the product
is bad (i.e., has an unacceptable MTBF, failure rate, number of defects, etc.).
Basically, the hypothesis test is performed by defining an acceptable value of " α " and
comparing sample data to a distribution representing the expected results when the null
hypothesis is true. If the distribution would produce the sample data with a probability of α or
less, the data is considered sufficiently unlikely to have come from the distribution and the null
hypothesis is rejected. Otherwise, the null hypothesis is accepted. Hypothesis tests can also be
defined based on " β " and on both risks, as we shall see in Sections 5 and 7.
2.5 For Further Study
Some recommended references in probability and statistics, ranked by increasing level of

difficulty, are:
Introductory:
• Probability and Statistics, Stephen S. Willoughby, Silver Burdett Company,

Morristown, NJ, 1968.
• The Cartoon Guide to Statistics, Larry Gonik & Woollcott Smith, HarperCollins
Publishers, New York, NY, 1993.
Intermediate:
• Basic Statistics, M.J. Kiemele & S.R. Schmidt, Air Academy Press, Colorado Springs,
CO, 1990.
• Statistical Methods in Engineering and Manufacturing, John E. Brown, Quality Press,
Milwaukee, WI, 1990.
Advanced:
• Methods for Statistical Analysis of Reliability & Life Data, N.R. Mann, R.E. Schaefer
& N.D. Singpurwalla, John Wiley & Sons, New York, NY, 1974.
• Military Handbook 338-1B, Electronic Reliability Design Handbook, U.S. Department
of Defense, Washington, DC, 1997.
STAT Section 3: Some Distributions and Their Uses 13
3.0 SOME DISTRIBUTIONS AND THEIR USES
Statistical analysis of practical problems often requires consideration of the distribution of

the data. There are non-parametric methods that do not assume any distribution for the data.
Analysis of a distribution is invariably advantageous, however, either because it is simpler or
because it gives more precise results. This, of course, requires the assumption of a distribution
for the data, and the usefulness of the results depends on the assumption being reasonable. No
one distribution fits all data, and more than one distribution may describe a set of data, depending
on the problem addressed. As an example, the distribution of the lengths of a lot of rods may
follow a normal distribution, but the number of rods with lengths out of specified limits is
described by a binomial distribution. So there are a variety of distributions we may find useful.
3.1 Discrete Distributions
Discrete distributions are concerned with random variables which are integers. We will use
these distributions to determine the probabilities that certain outputs will be experienced, such as
the probability of no failures, of less than "x" failures or more than "y" failures, etc. The most
useful discrete distributions are the binomial and the Poisson. The hypergeometric distribution is
also of interest to reliability engineers.
3.1.1 The Binomial Distribution
The binomial distribution, as the name implies, is concerned with "yes-no" outcomes.
Either a person is over six feet tall or he is not. Of more interest, either a product has failed or it
has not. It assumes that the probability of an event is the same in every trial. Then:
n!
f(x) = p x q (n - x) (3-1)
x! (n - x)!
where:
f(x) = probability of exactly "x" events occurring in "n" trials

p = probability of a successful trial (success = the event happened)
q = probability of an unsuccessful trial = (1 - p)
x (n - x)
The product p times q is simply the probability of "x" successes and "n - x" failures in
"n" trials. (Note: success of a trial could be the occurrence of a product failure and failure of a
trial the non-occurrence of a product failure, or vice-versa). However, there is more than one
way for this to happen. There could be "x" successful trials followed by "n - x" failures, or one
success, "n - x" failures, then "x - 1" successes, etc. Each of these is a mutually exclusive way of
getting the result of interest. The term
n!
(3-2)
x! (n - x)!
is a counting formula giving the total number of different ways one can have exactly "x"
successes in "n" trials. Since each way is presumably equally likely and its probability of
x (n - x)
occurrence equal to p times q , multiplying the two expressions gives the total probability of
exactly "x" successes in "n" trials.
For example, to use Equation 3-1 to determine the probability of an aircraft with two
engines getting through a flight with no engine failures, we can define p as the probability that a
randomly selected engine will fail during a flight (hence, q = probability the engine will not fail
during a flight). Let us assume that experience indicates that the frequency of engine failures is
10% per flight. Therefore, p = 0.10 and q = 0.90. Noting that n = number of engines = 2, and x
= number of failures we are interested in = 0, Equation 3-1 becomes:
2! 2 x1
P(0) = (0.10) 0 (0.90) (2 - 0) = (1)(0.90) 2 = 0.81 (3-3)
0! (2 - 0)! 1 (2 x 1)
Quite often we are interested in the probability of getting "x or less", or "x or more"
successes, rather than merely "x". For example, in testing one-shot devices, such as missiles, we
may want to know the probability of passing a test of 50 firings with no more than two failures
allowed. Or, we may have six radios in a cockpit and only need four for a successful mission.
We may then want to know the probability of having more than two failures during a mission.
To answer such questions, we note that we can have from zero to "n" successes in "n" trials, and
the probabilities of these events are mutually exclusive. Hence, the probability of "x or less"
events is:
x n!
P(x or less) = ∑ p x q (n − x) (3-4)
0 x! (n - x)!
And the probability of "x or more" events is:
n n!
P(x or more) = ∑ p x q (n − x) (3-5)
x x! (n - x)!
As always, it is useful to remember:
P(x or less) = 1 - P(x + 1 or more) (3-6)
To illustrate, suppose an airplane with two engines could still fly safely with one engine
failed. The probability of a successful flight would then be equal to the probability of one or less
failures occurring. Using Equation 3-4, and letting p = probability of engine failure = 0.10, as
before, we get:
2! 2!
P(1 or less) = (0.10) 0 (0.90) (2 - 0) + (0.10)1 (0.90) (2 - 1)
0! (2 - 0)! 1! (2 - 1)!
2 x1 2 x1
= (1)(0.90) 2 + (0.10)(0.90) = (0.81) + 2(0.09) = 0.81 + 0.18 = 0.98 (3-7)
1(2 x 1) 1(1)
The same result would have been obtained by finding the probability of two engine failures,
using Equation 3-5 and subtracting this from one, per Equation 3-6, as the reader may wish to
verify.
3.1.2 The Poisson Distribution
The Poisson distribution can be considered an extension of the binomial distribution when
"n" is infinite. It assumes events occur at a constant average rate. These events could be product
failures, or roses sold in a flower shop, etc. The number of events occurring in any interval is
independent of the number in any other interval (e.g., a recent failure does not make another
failure more or less likely). Under these assumptions, we can determine an expected number of
events in any given interval (e.g., the expected number of failures in a mission is the failure rate
" " times the mission length "t"). If there is any doubt that events are occurring at a constant
rate, the trend test in Section 4.3.1 may be used to test for constancy. Should a non-constant rate
apply, the methods in Section 6, Reliability Growth Testing, may be more appropriate. When the
assumptions of a constant failure rate and independent failures do hold, for an expected number
of events "a" the probability of getting exactly "x" events is:
a x e −a
f(x) = (3-8)
x!
When a = t, the probability of zero failures is:
P(0) = e -λt (3-9)
Equation 3-9 is a basic equation in reliability engineering. It calculates the probability of a

product completing a mission of time "t" without failure (under the assumptions that is
constant with time, and that failures are independent events).
We can use the Poisson distribution to determine the probability of passing a test run for a
fixed time with an allowable number of failures by the expression:
a x e −a
n
P(n or less) = ∑ (3-10)
0 x!
where:
n = the number of failures allowed
The probability of failing the test would be:
∞ a x e −a
P(n + 1 or more) = ∑ (3-11)
n +1 x!
or more practically:
n a x e −a
P(n + 1 or more) = 1 - ∑ (3-12)
0 x!
It is not necessary to compute Equations 3-8, 3-10 or 3-11. Tables of Poisson probabilities
(probability of "x" events when "a" are expected; i.e., solutions to Equation 3-8) have been
tabulated. One such table is presented in Appendix A. Also tabulated are cumulative Poisson
probabilities giving the probabilities of "x" or less events when "a" are expected (solutions to
Equation 3-10), and even cumulative probabilities of "x" or more events (solutions to Equation
3-11). Appendix B provides a table with solutions to Equation 3-10. The reader may use
Equation 3-12 to convert these to solutions of Equation 3-11.
As an example, suppose we tested repairable products and would consider the product
acceptable if no more than two failures occurred during the test. We assume the failure rate is
constant. If the product had a failure rate ( ) and was tested for the test time (t), the expected
number of failures for the product would be ( t). If we assume ( t) = 0.3, what is the
probability of the product passing the test?
The solution can be found from Equation 3-10, where (a) = 0.3, and (x) = 2. However it is
easier to use the tables. Table 3-1 is extracted from Appendix A.
Table 3-1: Extracts from Appendix A

Number of Failures Probability of (x) Failures
(x): When (a) = 0.3:
0 0.7408
1 0.2222
2 0.0333
Hence, the probability of passing the test (i.e., having 2 or less failures) = 0.7408 +
0.2222 + 0.0333 = 0.9963, which we could also have found directly using Appendix B.
It may be useful to note that the Poisson and binomial are related in that the expected value
"a" in the Poisson distribution can be expressed as the probability of an event "p" times the
number of opportunities or trials "n", where "p" and "n" are terms of the binomial. For any given
value of "a", as "p" approaches zero and "n" approaches infinity, the result of the binomial
formula (Equation 3-1) approaches the result of the Poisson formula (Equation 3-8).
3.1.3 The Hypergeometric Distribution
The binomial and Poisson distributions assume independent events. For example, the
occurrence of a defective unit in a sample does not increase or decrease the probability that the
next unit in the sample will be defective. Where the population being sampled is large in relation
to the sample size, or when sample units are replaced after selection and can be drawn again, this
assumption is reasonable. However, when the population is relatively small compared to the
sample size (say no larger than 10 times the sample size) and units sampled are not replaced, the
assumption may not be reasonable. For example, assume an office with six employees, two of
whom are women. Suppose we wish to select two at random for a survey of job satisfaction.
The probability that the first selected will be a woman is two in six, or one-third. If the first
selection is a woman, the probability of the second also being a woman is one in five, or one-
fifth. If the first selection were a man, the probability of the second being a woman would be
two in five or two-fifths. Hence, the samples are not independent. To handle such situations, we
can use the hypergeometric distribution.
The distribution is essentially a combination of three counting formulas like that of

Equation 3-2. The probability of "x" events happening in our sample (number of woman
selected, number of green apples in a bag, etc.) is:
D! (N - D)!
x! (D - x)! (n - x)![(N - D) - (n - x)]!
P(x) = (3-13)
N!
n! (N - n)!
where:
N = number of units in the population being sampled

D = number of units with the characteristic of interest (event happens) in the population
being sampled
n = number of units in the sample (sample size)
x = number of events in the sample (units with the characteristic of interest)
The numerator of Equation 3-13 is the number of ways we can select "x" units from the "D"
units in the population that have the characteristic of interest times the number of ways we can
select (n - x) of the (N - D) units without the characteristic of interest. The denominator is the
number of ways we can select a sample of "n" units from a population of "N" units.
For our office example, N = six people, D = two women, and n = two people selected for
the survey. From Equation 3-13 the probability of selecting no women for the survey would be:
2! (6 - 2)! 2! 4! 4! 4 * 3 * 2 *1
0! (2 - 0)! (2 - 0)![(6 - 2) - (2 - 0)]! 2! 2![4 - 2]!
P(0) = = = 2! 2! = 2 * 1 * 2 *1
6! 6! 6! 6 * 5 * 4 * 3 * 2 *1
2! (6 - 2)! 2! (4)! 2! 4! 2 *1 * 4 * 3 * 2 *1
24
6
= 4 = = 0.4 (3-14)
720 15
48
Four times out of ten we will conduct the survey with no women in the sample.
3.2 Continuous Distributions
Suppose we wanted to describe the height of male adults in a community. We could

measure the heights of a random sample of adult males (it would not do to measure only
basketball players, for example). After enough statistics had been collected, we would have to
organize the data in a meaningful format, and might elect to use a histogram as shown in Figure
3-1.
Figure 3-1: Distribution of Heights
If our sample size increases, and our measurements become more precise, our histogram
will become more like Figure 3-2.
Figure 3-2: More Detailed Distribution of Heights
As our sample size approaches infinity and our measurement intervals become infinitely
small, the histogram approaches a smooth curve, as shown in Figure 3-3.
Figure 3-3: Continuous Distribution Curve for Height
While many continuous distributions exist, the distribution of heights, and many other
parameters, usually follow a symmetrical bell-shaped curve called the Normal or Gaussian
distribution (to be discussed in Section 3.2.1). Other distributions of interest in reliability
engineering, such as distributions of times to failure, are usually not symmetrical. In either case,
we shall be dealing with probability distributions, meaning that the distribution is normalized so
that the total area under the curve is equal to one. The utility of this is that we can then calculate
the probability of a measurement being in any given range of values. For symmetrical
continuous distributions such as the normal, the mean and standard deviation (Equation 2-1) of
the distribution provide all the information we need. More generally, for example in considering
distributions of times to failure, we shall be concerned with three parameters:
• The probability density function, designated by f(x), is the height of the curve at the
value "x". This in itself is of little value, except as the foundation for the other two
parameters.
• The cumulative density function, designated by F(x), is the integral of f(x). This
provides the area under the curve from minus infinity to x. For example, let f(x) = f(t),
the distribution of times to failures. Then F(t) = proportion of products which will fail
before time = t (see Section 3.2.2.2 for an example).
• The reliability function, designated by R(t), is the proportion of products which have
not failed in time = t, when f(t) is a distribution of failure times. It is simply 1 - F(t).
This is equivalent to the probability that a randomly selected product will operate
without failure for time = t.
3.2.1 The Normal Distribution
While many continuous distributions are of interest for specific applications, the normal
distribution is undoubtedly the most useful overall. Many parameters of interest, such as the
physical dimensions of a product, can be described by it. In addition, the binomial distribution
can be approximated by a normal distribution when the number of binomial trials (n) is high (30
or more). Since the Poisson becomes approximately equal to the binomial when the number of
trials (n) is high and the probability of an event (p) is low, it can also be approximated by the
normal distribution under these conditions. A particularly important characteristic of the normal
distribution is that it applies to data from samples, even when the population sampled is not
normally distributed (see Section 3.2.1.2).
A normal distribution is symmetrical; the mean, median and mode (see Section 2.1) are
identical, and the distribution theoretically extends from minus infinity to plus infinity. The
latter obviously could not apply to a distribution of heights, but the probability of a value being
far in the tails of the curve is so small that it may be neglected, making the fit adequate.
The normal distribution is completely defined by its mean and standard deviation. The
percent of the population of interest under any portion of the curve can be determined from the
mean and standard deviation. For example, the percent of adult males who are over six feet tall.
Or, of more interest to reliability engineers, the percent of a product which falls outside specified
limits.
Since the mean and standard distribution of a population are not usually directly
measurable, they are estimated from sample data. The mean of the sampled data is the estimated
distribution mean. The standard deviation of the distribution can be estimated from the standard
deviation of the sample (Equation 2-1). However, when the number of samples is low, this gives
a biased estimate. In such cases, the population standard distribution is estimated from the
unbiased expression:
2
∑ (x - x)
S = (3-15)
n -1
Since both the mean and the standard distribution can take any value, there can be an
infinite number of normal distributions. However, analysis is facilitated by converting data of
interest to a standard normal curve, for which the relation of the area under the curve to the
distance from the mean (in standard deviations) is available in tables, such as the one presented
in Appendix C.
3.2.1.1 The Standard Normal Distribution
The standard normal distribution is one in which the mean is zero and the standard
deviation is 1.0. Data is converted from the actual distribution to the standard normal by the
formula:
x-
z = (3-16)
where:
= the mean of the distribution being converted

= the standard deviation of the distribution being converted
x = a data point from the distribution being converted
z = the corresponding data point on the standard normal distribution
An abbreviated tabulation of data (extracted from Appendix C) for a standard normal curve
is given in Table 3-2. Figure 3-4 shows the area between zero and z quantified in column two of
Table 3-2.
Table 3-2: Standard Normal Distribution Data

Area Between 0 and z
z Area Between -z and z
(or Between 0 and -z)
0.5 0.1915 0.3830
1.0 0.3413 0.6826
1.5 0.4332 0.8664
2.0 0.4772 0.9544
3.0 0.4987 0.9974
∞ 0.5000 1.000
0 z
Figure 3-4: Standard Normal Distribution
Suppose we estimate the mean length of a rod at one inch with a standard deviation of
0.001 inch, following a normal distribution, and want to find the proportion of rods that are more
than 1.002 inches. From Equation 3-16:
1.002 - 1.0
z = = 2 (3-17)
0.001
Where z is the point on the standard normal distribution equivalent to 1.002 inches.
The solution to our problem is simply the area under a standard normal distribution from
z = 2 to ∞ . Table 3-2 does not directly list this area. However, we can calculate it by
subtracting the areas we don’t want from 1.0, the total area under the curve. The areas we don’t
want are the area between - ∞ and zero (0.5000 from Table 3-2) and the area from zero to 2.0
(0.4772).
Area wanted = 1.0 - (0.5000 + 0.4772) = 1 - 0.9772 = 0.0228 (3-18)
Thus, 0.0228 or 2.28% of the rods we produce will be longer than 1.002 inches.
If we specified that the rods made should be one inch long plus or minus 0.002 inches, the
proportion of rods "in spec" would be given by the area under the standard normal from z = -2 to
+2. From the table, 0.9544 or 95.44% of the rods will meet the specified tolerance.
Table 3-3 is another set of values for the standard normal distribution, but one that solves
for z given areas of interest, rather than one which solves for areas given values of z, like Table
3-2 or Appendix C. This table defines "critical values" of (z) marking the ends of some specific
areas of the standard normal distribution that are often used in determining confidence intervals
in measuring and demonstrating values of parameters that conform to a normal distribution. We
shall use this table in Section 7.
Table 3-3: Critical Values of z

Critical Value of z Area Between z and -z Area From z to ∞
1.28 0.80 0.10
1.645 0.90 0.05
1.96 0.95 0.025
2.33 0.98 0.01
2.58 0.99 0.005
3.2.1.2 The Normal Distribution’s Role in Sampling
One of the reasons for the usefulness of the normal distribution is that means of samples
from any distribution with a finite mean and variance, if the sample is large enough, fit well to a
normal distribution whose mean is the mean of the samples and is equal to the mean of the parent
distribution. For example, the times to failure of a rod under stress may follow a Weibull
distribution (to be discussed later). The mean time to failure of the rods may be estimated from a
sample. If many samples are taken, the means of the samples will follow a normal distribution
with its mean the same value as the mean time to failure of the parent population of rods. In
addition, the standard deviation of the distribution of sample means is equal to the standard
deviation of the parent distribution divided by the square root of the sample size ( σ/ n ). This
phenomena, called the central limit theorem, means the normal distribution can be applied to
samples from almost any distribution to provide information on the parent distribution. We shall
see this in action later.
3.2.2 Various Other Useful Distributions, In Brief
3.2.2.1 The Lognormal
The lognormal distribution is one in which the logarithms of measurements from a

population are distributed normally. The distribution of the measurements themselves will be
skewed to the right. The lognormal is descriptive of such things as the time it takes to get to
work or the time to repair a failure. In these cases, there are often circumstances which lengthen
the time (accidents on the road, troubleshooting difficulties, etc.), but few opportunities to
shorten the time significantly.
The lognormal is handled by taking the logarithms of the measurements of interest, and
analyzing the resultant normal distribution as discussed in Section 3.2.1. Since there is a one-to-
one transformation of the points on the lognormal to the points on the normal, we can translate
the results back to the original data. For example, if a measured value from a lognormal
distribution of repair times "y" = 20 minutes, then the corresponding value on a normal
distribution "x" = ln 20 = 2.9957. If analysis (which probably involves further transformation of
the data to the standard normal and back, as discussed in Section 3.2.1.1) finds that 90% of the
area of the normal is below x = 2.9957, then 90% of the area on the lognormal (i.e., 90% of
x 2.9957
repair times) will be below y = e = e = 20 minutes.
A practical application is found in MIL-HDBK-470A, Designing and Developing

Maintainable Products and Systems, which includes maintainability demonstration tests based on
lognormal distributions of repair time.
3.2.2.2 The Exponential
This distribution describes time to failures of a repairable system, when the failure rate is
reasonably constant with time. Its probability density function is:
f(t) = λe -λt (3-19)
resulting in a cumulative density function (percent of units failed at time = t, or probability of a

unit failing by time = t) of:
F(t) = 1 - e -λt (3-20)
-λt
Since the probability of a unit not failing at time t = R(t) = 1 - F(t), R(t) = e , which is the
widely used reliability expression encountered earlier in Equation 3-9.
3.2.2.3 The Weibull
When a constant failure rate cannot be assumed, the Weibull is often the distribution of
choice, because it can accommodate increasing, decreasing and constant failure rates. Weibull
analysis assumes no repair of failed units. We will become more familiar with this distribution
in Section 4.
3.2.2.4 The Student t
Using the central limit theorem (see Section 3.2.1.2), one must assume a large sample size
and that , the standard deviation of the parent population, is known. When the sample size is
small, and is estimated from the sample using the unbiased estimator of Equation 3-15, the
normal distribution does not really apply. Instead of the standard normal random variable:
x-
z = (3-21)
/ n
where:
x = a sample measurement
= mean of the sample measurements (and of the parent population)
= standard deviation of the parent population
n = sample size
/ n = standard deviation of the sample measurements (the distribution being
converted to the standard normal)
we have:
x-
t = (3-22)
S/ n
where:
S = the estimated standard deviation of the parent population (per Equation 3-15)
S/ n = the standard deviation of the sample measurements
other terms as above
There is a family of "t" distributions, one for each value of "n". As "n" increases, values of
"t" become close to the values of "z". Appendix E provides a tabulation of Student t distribution
data.
To compare the two distributions, we will compute the values of "z" and "t" which cover
95% of the area under the curves starting from minus infinity (i.e., only 5% of the area is
excluded).
From Appendix C, we note that the area from the mean (z = 0) to "z" is 0.45 when "z" is
between 1.6 and 1.7. We will interpolate this to z = 1.65. Hence, as discussed in the text above
the table in Appendix C, the area in the tail of the curve from 1.65 to
∞ = 1 - 0.5 - 0.45 = 0.05. Hence, when z = 1.65, 95% of the area under the curve is between
- ∞ and z. Since "z" is measured in standard deviations, 95% of the curve is below +1.65
standard deviations from the mean.
The tabulated Student t in Appendix E gives the values of "t" for defined areas from - ∞ to
"t", directly. However, the values differ with "degrees of freedom", roughly representing the
amount of information available in a sample and equal to the sample size minus one. For a
sample of ten (nine degrees of freedom), the value of "t" marking the edge of 95% of the area
under the curve is 1.833. Since "t" is also measured in standard deviations, 90% of the curve is
below +1.833 standard deviations from the mean. This shows that the Student t distribution is
wider than the standard normal for relatively small samples. However, as sample size increases,
the value of "t" approaches the value of "z" for the same area under the curve, as the reader can
verify by comparing values derived from Appendix C (the Standard Normal distribution) to those
given in Appendix E (the Student t distribution). For our example, the value of "t" for 95% goes
to a limiting value of 1.645 as sample size increases, which agrees with our interpolated value of
z = 1.65.
3.2.2.5 The F Distribution
The F distribution describes the ratio of the variances of two independent samples. It is a
family of distributions, dependent on the sample sizes, and is used to test whether or not the
samples are from the same population. When the samples are from the same population, the
value of the F statistic will be distributed about 1.0. By measuring the value of F derived from
two samples and determining the probability that the value measured would occur if the samples
were from the same population, we can accept or reject the hypothesis that the samples are from
the same population within a specified risk of error. Appendix F is a tabulation of "critical
values" of the F distribution. These are the values of F which cannot be exceeded if we are to
conclude there is no difference in the variance of our samples, for stated risks of error. We will
discuss this in Section 8.
3.2.2.6 The Chi-Square Distribution
The Chi-square distribution describes the relation of the true mean of a population to the
mean of a sample. It is also a family of distributions, dependent on sample size. It may be used
to determine the confidence limits around a measured failure rate or to determine if a failure rate
is stationary (i.e., does not change with time). We will use the Chi-square distribution in Section
4.
3.3 In Summary
Table 3-4 presents a brief summary of the distributions discussed in this section, by type
and use in reliability engineering.
Table 3-4: Summary of Distributions

Distribution Type Main Uses in Reliability Engineering
Binomial Discrete Finding probability of "x" events in "n" trials
Poisson Discrete Finding probability of "x" events when "a" are expected
Hypergeometric Discrete Replaces binomial for samples from small populations, when samples are
not replaced
Normal Continuous Describes many parameters, including mean values of samples from any
distribution with a finite mean and variance
Standard Normal Continuous All normal distributions can be converted to the standard normal for ease
of analysis
Lognormal Continuous Describes some parameters of interest, such as repair times
Exponential Continuous Describes distribution of failures when failure rate is constant
Weibull Continuous Describes distribution of failures for constant or changing failure rate, no
repair
Student t Continuous Replaces standard normal for small samples
F distribution Continuous Testing for significance of differences in the variances of two samples
Chi-square Continuous Estimating confidence intervals, and testing for a constant failure rate
STAT Section 4: Measuring Reliability 27
4.0 MEASURING RELIABILITY
To select the statistical tools useful in measuring reliability, we have to be more specific
about what we are measuring. Are we concerned with the reliability of products which are
discarded on failure or of products repaired on failure? Do we know if the reliability is constant,
decreasing or increasing? Is reliability better expressed as a probability (e.g., of a successful
missile launch), as an expected life (mean time to failure), or as a frequency of failures (failure
rate or its reciprocal, mean time between failures)? The answers to these questions determine the
statistical tools of interest. We shall first consider some general principles.
4.1 General Principles
We can crudely measure reliability from test data by dividing the number of failures seen
by the total hours of operation of our test sample (for a failure rate) or the number of products
tested (for a probability of failure). However, these are rarely sufficient. Most often, we need
more information, and can get it by finding (or assuming) a failure distribution and determining
its parameters.
f(x)
F(x) = ∫0 f(x )dx
x Probability
Density
Function
0 x Time To
Failure
Figure 4-1: Probability of Failure as Represented by the Area Under the Probability Density
Function
Figure 4-1 represents a probability density function, showing the relative probability of a
random variable occurring, in this case a failure, plotted against time. The area under the curve is
unity. The curve may be defined by the parameter f(t) which describes its height against time.
From f(t) we can obtain three useful measurements:
t2
1. The integral ∫ f(t) = the percent of all failures occurring between t1 and t2.
t1
When t1 = 0, the integral is the cumulative density function, F(t), defined as the percent
of the population failed in the interval ending at t = t2. It is also the probability that any
given unit will fail in a mission of length t2. Note that the integral of f(t) from minus
infinity to plus infinity (or from zero to plus infinity) equals unity.
2. The reliability function, R(t), the probability of a unit not failing in a given period of
time (t), is simply 1 - F(t).
3. The failure rate in an interval from t1 to t2 is given by:
R(t 2 ) - R(t1 )
R(t1 )(t 2 - t1 )
where the numerator represents the proportion of a population failed in the interval, and the
denominator represents the proportion surviving at the start of the interval and the length of the
interval. Using the relationships between F(t), f(t) and R(t), it can be shown that as the interval
becomes infinitely small it becomes:
f(t) f(t)
or
R(t) 1 - F(t)
This defines an instantaneous failure rate at the instant "t". This is also occasionally called
the hazard rate, force of mortality, or ROCOF (Ascher’s acronym for the rate of occurrence of
failures for a repairable system).
With these concepts understood, we are ready to discuss measuring reliability. We shall
first consider a useful tool when repair is not a consideration: the Weibull distribution.
4.2 The Versatile Weibull Distribution
When we are interested in the reliability of a part, or a non-repairable assembly, or the first
failure of a repairable assembly (e.g., an automobile drive chain), we can often make use of a
versatile statistical distribution invented in 1937 by Waloddi Weibull, which can describe
constant, increasing or decreasing failure rates. It is often used to describe the reliability of
mechanical items that are subject to a wearout failure mechanism. The Weibull Probability
Density Function is described by the formula:
-1
t
f(t) =   e - (t/ ) (4-1)
 
where:
= characteristic life (> 0)

= shape parameter (> 0)
t = time
The reliability function (surviving portion of the population at time = t) would be:
R = e - (t/ ) (4-2)
Instead of time, "t" could represent cycles, miles, or any other parameter appropriate to the
failure mechanism of interest. Although it has been used to model mixed failure modes, such as
infant mortality due to defects, the Weibull is really a model for a single failure mechanism.
Equations 4-1 and 4-2 apply to the two-parameter Weibull. There is also a three-parameter
Weibull, obtained by replacing "t" in Equations 4-1 and 4-2 with "t - t0". This is used in the case
where the failure mechanism cannot result in a failure until a certain time "t0" is reached. An
example might be cracks in a crystal which do not grow large enough to cause a failure before t0
is reached. The two-parameter form of the Weibull is more common than the three-parameter
form.
When = 1.0, the Weibull formula reduces to the exponential formula, showing a constant
failure rate whose reciprocal, the MTBF, equals . A value of > 1 indicates an increasing
failure rate (i.e., wearout), and < 1 shows a decreasing failure rate (i.e., infant mortality). It is
possible for the failure distribution of a product to be described by three different Weibull
functions at different times in its life: first by a Weibull function with < 1 reflecting improved
failure rates as initial quality defects are eliminated; then by a Weibull function with = 1
reflecting a relatively constant failure rate during the product’s useful life; and finally by a
Weibull with > 1 as wearout mechanisms act to increase failure rate with time.
Test data following a Weibull distribution will plot as a straight line when the cumulative
percent failed is plotted against time to failure on special graph paper. On this paper the X-axis
is scaled as the natural logarithm of time. The Y-axis is scaled as:
Y = ln ln (1/1 - F(t)) (4-3)
where:
F(t) = the estimated percent of the population failed
Weibull analysis paper is designed to permit the values of and to be found by

graphical methods. Essentially, the value of is that point on the time axis marking the point
where the plot crosses the 63rd percentile on the Y-axis (actually, 63.2% cumulative failures).
The slope of the plot, , can be calculated by:
∆Y ln ln (1/1 - F2 ) - ln ln (1/1 - F1 )
= = (4-4)
∆X ln t 2 - ln t1
where:
F2 = the percentage failed at time = t2

F1 = the percent failed at t1
To determine the parameters of a Weibull distribution, therefore, one needs to plot

cumulative percent failures against time on Weibull analysis paper and either use the graphical
solutions or the calculations described above. One could even forgo the graph paper by knowing
the times corresponding to two percentiles. From this, one could calculate . With and the
known time for any cumulative percent failed, the Weibull reliability function (Equation 4-2)
could be used to solve for . The use of the graph is easier, and provides some verification that
the data is indeed Weibull. If the plot is not a straight line, something is amiss. However, there
are some subtleties involved in determining the cumulative percent failed.
The most obvious way to estimate the cumulative percent failed is to divide the number of
failures by the sample size. If we had ten samples and three failed, the time of the third failure
would represent 30% cumulative failures. This, however, is considered a biased estimator.
(Consider a sample of one: its time of failure would be counted as the 100th percentile, but,
intuitively, one would expect the failure of one sample to better represent the 50th percentile.)
Accordingly, there are various schemes to determine the cumulative percent failure represented
by each failure in a sample. One is to determine the "median rank" by Bernard’s formula:
I - 0.3
Median rank (cumulative percent failed) = x 100 (4-5)
N + 0.4
where:
I = rank order of a given failure (I = 1 for the shortest time to failure, etc.)
N = sample size
Also, it should be noted that the rank order is determined by the operating times to failure
on the individual units, not the sum of times among the samples, or any measure of calendar
time. The unit that has accrued the lowest operating time when it fails is the first failure in the
rank order, regardless of the operating time on the other units or when it went on test.
To illustrate these points, we will use an example from RAC Publication NPS, Mechanical
Applications in Reliability Engineering.
Let us assume we have tested six items to failure and have measured the life to failure in
terms of operating cycles, with the following results shown in Table 4-1.
Table 4-1: Life Data

5
Part Number Life (10 Cycles)
1 6.6
2 1.3
3 4.0
4 2.7
5 5.2
6 9.8
The first step would be to rank order the data as shown in Table 4-2.
Table 4-2: Ordered Data

5
Rank Part Number Life (10 Cycles)
1 2 1.3
2 4 2.7
3 3 4.0
4 5 5.2
5 1 6.6
6 6 9.8
Next, we need to determine the median rank of each failure by Bernard’s formula, to
estimate the cumulative percent failure it represents. For part number 2, the first in rank:
1 - 0.3 0.7
Median rank = x 100 = x 100 = 10.91% (4-6)
6 + 0.4 6.4
After computing the median rank for all six parts, we have the data shown in Table 4-3.
Table 4-3: Completed Data Table

5
Rank Part Number Median Rank (%) Life (10 Cycles)
1 2 10.91 1.3
2 4 26.55 2.7
3 3 42.18 4.0
4 5 57.82 5.2
5 1 73.45 6.6
6 6 89.09 9.8
We now have all the information we need to plot the data on Weibull analysis paper as
shown in Figure 4-2.
On this Weibull paper, a line parallel to the plot is drawn from the circle to the arc at the
left edge at the 60% failure line. The slope of the curve is read from the scale on the arc at the
point of intersection. This is equal to in the Weibull reliability function. In this case, is
about 1.5. The characteristic life, , is found by moving horizontally from the circle (across the
63.2 percentile) to the plot, and then down to the corresponding number of cycles, here about
5
5.8 x 10 .
A Weibull reliability function can be used for many purposes, such as to determine an
effective burn-in time, calculate the probability of success for a mission, determine the expected
number of spare parts needed for a set of products, help establish appropriate warranty terms, etc.
Weibull analysis paper is available commercially. Two sources are: Team Graph Papers,
Box 25, Tamworth, NH 03886 (Tel: 603 323-8843), and Chartwell Technical Papers, H.W. Peel
& Co., Jeymer Drive, Greenford, Middlesex, England (Tel: 01-578-6861).
There are also PC software packages for Weibull analysis. These include SuperSMITH
(TM) by Fulton Findings (on the world wide web at http://www.weibullnews.com) and
Weibull++ (TM) by Reliasoft (on the web at http://www.Weibull.com).
6 .0
4.0
3.0
0
2.
99.%
1.4
2
1.
0
1.
95.% WEIBULL
.8
SLOPE .7
90.%
.6
.5
80.%
70.%
60.%
50.%
40.%
PERCENT FAILURE
30.%
20.%
10.%
5.0%
4.0.%
3.0%
2.0%
1.0%
1 2 3 4 5 6 7 8 91 2 3 4 5 6 7 8 91
x 104 x 105
Figure 4-2: Weibull Plot
4.2.1 Caveats
Weibull analysis is not always as easy as the example shown. The data may not plot as a
straight line on Weibull paper, for many reasons. First of all, the data may not be distributed
according to the Weibull distribution. Some data are better described by the lognormal, for
example, a distribution which is not convertible into a Weibull (it is handled by converting it to a
normal, as discussed in Section 3.2.2.1). There may be suspensions, which are parts on test
which are removed from test before failing or which fail from mechanisms other than the primary
one. There may be more than one significant failure mode. Test data may not include all time on
the parts (i.e., time zero is some unknown time before the test starts). Ways to handle these
situations are discussed in the New Weibull Handbook, by Dr. Robert B. Abernethy. This
authoritative reference is available from the Reliability Analysis Center under RAC order code
WHDK.
4.3 Measuring Reliability of Repairable Systems
Our failure data may not represent the times to failure of a number of parts, but rather the
times between failures of a system which is repaired after each failure. For such a system, it is
often assumed that the failures occur at a reasonably constant rate, which has many advantages in
analysis. This assumption is reasonable in that a system is a collection of parts with differing
failure mechanisms, which, in aggregate, can appear to fail at a constant rate. This is referred to
as a stationary stochastic process ("stochastic" meaning involving a random variable, and
"stationary" meaning that the expected number of failures in a given interval will be the same
regardless of the age of the system). It is also called a homogeneous Poisson process (HPP)
because the number of failures in an interval follows a Poisson distribution that is
"homogeneous" or stationary (i.e., unchanging with time). However, there are also non-
homogeneous Poisson processes (NHPPs) which describe systems where the number of failures
in an interval may follow a Poisson distribution, but the distribution will change with time,
because the expected number of failures in a given interval depends on the system age at the start
of the interval. There are practical cases where reliability improves with time and also practical
cases where reliability degrades with time. We will discuss this further later, but it seems
reasonable to start this section with some statistical tools for determining what kind of process
we have.
4.3.1 Testing for Trends
To illustrate the tool we will be discussing, we shall use some contrived data, borrowed
from the text Repairable Systems Reliability, by Harold Ascher and Harry Feingold (Marcel
Dekker Inc., 1984). The data compares three systems, each having seven failures as shown in
Table 4-4.
Table 4-4: Failure Data

Time of Failure in Time of Failure in Time of Failure in
Failure Number
System A System B System C
1 15 177 51
2 42 242 94
3 74 293 121
4 117 336 298
5 168 368 313
6 233 395 378
7 410 410 410
The data represents seven failures, which in system A arrive at intervals increasing with
time, in system B arrive in intervals decreasing with time, and in system C arrive (pseudo)
randomly. In each case the seven intervals between failures are identical except for the order of
occurrence. In system A, for example, the first failure occurs 15 hours after turn-on, and the last
occurs 177 hours after the second to last, while in system B, the first failure occurs 177 hours
after turn-on, and the last occurs 15 hours after the penultimate.
It is obvious that system A has an increasing time between failures, or a decreasing rate of
occurrence of failure (ROCOF), and that system B has an increasing ROCOF. But real data is
not so obligingly obvious. Hence, we find useful a statistical measure invented by Laplace.
When our data is failure truncated (i.e., the records end at the occurrence of the last failure), as it
is in the data given in Table 4-4, the statistic is calculated by the formula:
n -1 
 ∑ (t i ) /(n - 1) - t n /2
U =  
i =1
(4-7)
t n 1/[12(n - 1)]
th
where "n" is the number of failures, "ti" is the time of the "i " failure in order of occurrence, and
"tn" is the last failure. The result, "U", will be zero for perfectly random data, positive for
increasing intervals between failures (ROCOF decreasing with time) and negative for decreasing
intervals (increasing ROCOF).
For time truncated data (i.e., the records end at a given time while the system is in a non-
failed state):
n 
 ∑ (t i ) /(n - 1) - T/2
U =  
i =1
(4-8)
T 1/[12(n)]
where:
T = the time at the end of the data
The statistic approximates the distance (in standard deviations) that the data differs from
the mean of a standard normal distribution. A system with no changes in ROCOF would have an
expected value of zero, the mean of the standard normal distribution. Statistical variation results
in it taking other values, with probability decreasing proportionally with the distance from zero.
Applying the principles of hypothesis testing, discussed in Section 2.4, we can reject the
hypothesis that there is no trend when the statistic calculates a value whose probability is
satisfactorily small.
"Satisfactorily small" can be defined quantitatively as an acceptable risk, i.e., a probability

of making a wrong decision that is small enough to satisfy the decision maker. This probability
can be directly translated into the distance from the mean of the standard normal distribution. As
discussed in Section 2, we simply determine the area of the distribution which falls outside a
given distance from the mean. This area is the risk of error (in this case the probability that a
system with no trend would produce a data set falling in the tail of the curve cut off by the value
calculated for the distance from the mean). We can define a "critical value" for the Laplace
statistic which represents the value the statistic must exceed for a risk of error satisfactory to us.
Some common critical values are shown in Table 4-5. Table 4-5 relates to Appendix C in that
"U" corresponds to "Z" and the probability of error corresponds to the tails of the standard
normal distribution (the area under the curve outside of the range -Z to +Z).
Table 4-5: Critical Values for the Laplace Statistic

Critical Value of U (Absolute Value) Probability of Error (%)
3.09 0.2
2.576 1.0
2.326 2.0
1.960 5.0
1.645 10.0
1.282 20.0
For example, there is only 0.2% of the standard normal distribution outside the limits of
±3.09, so if the absolute value of "U" exceeds 3.09, we can assume there is a trend with only a
0.002 probability of error. Further, if the value exceeds +3.09, we can assume a trend for
increasing ROCOF (reliability decreasing) with only 0.1% error, and a value less than -3.09
permits assuming an improving trend with only a 0.1% risk, because only one tail would be
outside the limit. The refuted hypothesis would be that there is no trend in the direction of our
conclusion.
Applying Equation 4-7 to the data in Table 4-4 for system A:
15 + 42 + 74 + 117 + 168 + 233 410

-
U = 6 2 = - 2.0 (4-9)
410 1 /(12 x 6)
The negative result indicates a decreasing ROCOF (improving reliability). The value of U
exceeds the critical value given in Table 4-5 for 5% risk (meaning that there is less than a 5%
probability that there is really no trend), and indicates we can accept that reliability is improving
with only a 2.5% risk.
Doing the analysis for the data of system B:
177 + 242 + 293 + 336 + 368 + 395 410

-
U = 6 2 = + 2.0 (4-10)
410 1 /(12 x 6)
Which is the same result as Equation 4-9, except for sign, which indicates a deteriorating
reliability to a 97.5% confidence (1 - 0.025 risk).
Repeating the analysis using the data for system C:
51 + 94 + 121 + 298 + 313 + 378 410

-
U = 6 2 = + 0.086 (4-11)
410 1 /(12 x 6)
Which does not permit the rejection of the hypothesis that there is no trend even at a 20%
risk of error (U = 1.282 for a 20% risk, as shown in Table 4-5). We can come up with an
estimate of the risk of error by using Appendix C. In the Appendix C, for z = 0.1, the lowest
figure listed, the area from 0 to z = 0.0398. Using a linear extrapolation, for z = 0.086, the area
from 0 to z would be (0.086/0.1) x 0.0398 = 0.034. Since the area in the upper tail is one minus
0.05 minus the area from 0 to z, as explained in the text of Appendix C, it is equal to
1 - 0.5 - 0.034 = 0.466. The area in both tails would be twice this or 2 x 0.466 = 0.932. Hence,
to reject the hypothesis that there is no trend, we would have to be willing to accept a 0.932
probability of error. A constant ROCOF is indicated.
4.3.2 Confidence Limits when the Failure Rate is Constant
If we assume a constant failure rate, the mean time between failures (MTBF) of a product
can be simply determined from the total operating time of all the samples (i.e., the sum of the
operating times accrued by each unit tested) divided by the total number of failures. For
example, 100 hours of operation during which one failure occurred would give us an MTBF of
100 hours. So would 1,000 hours of operating time with 10 failures. Which data set would you
prefer as a measure of your product?
It should be obvious that the risk in making conclusions based on one failure is greater than
the risk in making conclusions from many failures. Put another way, we have more confidence
in conclusions reached from extensive data than in conclusions reached from skimpy data.
Statistical methods quantify confidence by defining it as the probability of being correct when we
state the MTBF of a product is within a given range of values, called the confidence interval.
The values marking the limits of the interval are called confidence limits. The confidence
interval can be two-sided (e.g., between confidence limits of 100 and 200 hours), or one-sided
(e.g., extending from a confidence limit of 100 hours to infinity). As might be expected, the
wider the confidence interval, the greater the confidence provided by a given set of data. For
example, we have a greater probability of being correct (i.e., confidence) that a product’s true
MTBF has a one-sided confidence limit of 100 hours than we would have in a confidence limit
of 200 hours, for any given set of data. Risk is defined as the probability that the true MTBF of
the product is outside the confidence interval, and is equal to one minus the confidence. If we
have a 10% probability of error, we have a 10% risk and a 90% confidence.
To determine the width of the confidence interval for any given value of confidence, we
need to relate the confidence interval to a portion of a probability distribution curve. Because we
have assumed the constant failure rate, the times to failure are distributed in accordance with an
exponential distribution, and the probability of no failure in any given time is calculated by the
familiar reliability function:
R = e -(t/MTBF) (4-12)
The reliability function is the portion of the failure probability distribution function
extending from t to infinity, and represents the portion of a population still operating after time t
or the probability that a given unit will not fail between time zero and t. However, the
distribution we need for determining confidence intervals is not the distribution of failures itself,
but a dependent distribution describing the relation between the measured and the true MTBF of
the product.
When the exponential distribution of failures holds, the relation between the measured and
the true MTBF of the product is described by a distribution called the chi-square. In actuality,
the chi-square is a family of distributions, each member determined by a function of the number
of failures recorded, called the degrees of freedom (roughly, degrees of freedom represents the
amount of information at hand, which is a function of the number of failures, the exact function
dependent on circumstances, as will be explained). Table 4-6 is an abbreviated chi-square table;
a more extensive table is provided in Appendix D.
Table 4-6: Chi-Square Values

Degrees of Chi-Square Value at α
Freedom 95% 90% 10% 5%
2 0.103 0.211 4.605 5.991
3 0.352 0.584 6.251 7.815
4 0.711 1.064 7.779 9.448
5 1.145 1.610 9.236 11.070
10 3.940 4.865 15.987 18.307
20 10.851 12.443 28.412 31.410
22 12.338 14.041 30.813 33.924
30 18.493 20.599 40.256 43.773
In Table 4-6 and Appendix D, α = the selected risk (one minus the specified confidence),
or the area in the tail of the curve from the listed values to infinity. The reader may have noted
that we have used different ways of presenting statistical tables, when a more consistent format
could have been used. This lack of consistency is intended to help the reader understand and deal
with statistical tables in many different forms, since there is no standard format used by all
references.
To use Table 4-6, we select the chi-square values for the appropriate degrees of freedom
based on the number of failures accrued and whether the test was time-truncated or failure-
truncated (to be explained later). From the selected values, the total test time and the specified
risk, we can calculate the confidence, whether for a one-sided limit or for a two-sided interval.
A time-truncated test is one which ends with a certain amount of time accrued, either by
reaching a pre-designated limit or by simply ceasing to test. The most common scenario is that
of a group of equipment on test for a period determined by the willingness of management to tie
up the test units.
A failure-truncated test is one which ends at a given number of failures, the most common
manifestation being the test of a number of units until all fail. Another way is the test of one
unit, repaired after each failure, until a certain number of failures occur.
The one-sided confidence limit for a set of test data from a time-truncated test is calculated
from the formula:
2T
MTBF = (4-13)
(Chi - square value for 2n + 2 degrees of freedom at percentile α)
where:
T = total test time

n = total failures accrued
α = acceptable risk (1 - desired confidence)
For example, if we test for 100 hours and have one failure, we have 2(1) + 2 = 4 degrees of
freedom. For a 90% confidence, the risk (α) would be (1 - 0.90) or 0.10. The value listed in
Table 4-6 for ten percent risk and four degrees of freedom is 7.779. Plugging this value into the
formula we have:
2(100) 200
MTBF = = = 25 hours (4-14)
7.779 7.779
Hence, we would be 90% confident that the "true" MTBF is actually 25 hours or more.
If, instead, we had 1,000 hours of test time and 10 failures, we would have 2(10) + 2 = 22
degrees of freedom, and our 90% confidence limit is found by:
2(1,000) 2,000
MTBF = = = 65 hours (4-15)
30.813 30.813
Hence, we would be 90% confident that the "true" MTBF is 65 hours or more.
If our test data were failure-truncated, rather than time-truncated, we would use the
following formula to determine the one-sided confidence limit:
2T
MTBF = (4-16)
(Chi - square value for 2n degrees of freedom at percentile α)
where:
T = total test time

N = total failures accrued
α = acceptable risk (1 - desired confidence)
This formula is identical to the previous one, except that the degrees of freedom are
reduced, resulting in a slightly higher MTBF. The degrees of freedom is less because we have
less information; there is no operating time after the last failure as there was in the time-truncated
case. For the one failure in 100 hours scenario:
2(100) 200
MTBF = = = 46 hours at 90% confidence , (4-17)
4.605 4.605
as opposed to 25 hours for the time-truncated test.
These formulas essentially cut off a portion of a probability density function equal to the
desired risk. In our examples, we have found the value of MTBF below which there is only 10%
probability that the true MTBF would fall. To calculate a two-sided confidence interval, we
would cut off portions of a density function at both ends. The lower end is determined by the
same formulas as the one-sided limits, except that the chi-square value for α/2 is used. (If we
want a 90% confidence in a two-sided confidence interval, we want to lop off 5% of the
distribution at each end, rather than 10% of the distribution at one end.) The upper limit, for
either time-truncated or failure-truncated tests, is determined by the formula:
2T
MTBF = (4-18)
(Chi - square value for 2n degrees of freedom at percentile = 1 - α / 2)
For example, the upper 90% confidence limit would be determined by finding the chi-
square value for 2n failures at the 95th percentile (1 - 0.10/2).
Note that the degrees of freedom remain the same for both time-truncated and failure-
truncated tests. This is, roughly, because the additional information of time after the last failure
does not significantly affect the estimation of the upper confidence limit as it does for the lower
confidence limit.
Hence, for a time-truncated test of 100 hours with one failure, the two-sided confidence
limits are:
2(100)
Lower limit =
Chi - square value for 4 degrees of freedom at the 5 th percentile
200
= = 21 hours (4-19)
9.448
2(100)
Upper limit =
Chi - square value for 2 degrees of freedom at the 95 th percentile
200
= = 1,940 hours (4-20)
.103
Hence, the 90% confidence interval for the data is from 21 to 1,940 hours. We are 90%
confident (have a 10% risk of error) that the true MTBF is between these limits.
Table 4-7 summarizes the formulas discussed above.
Table 4-7: Confidence Interval Formulas

For time truncated tests For failure truncated tests
One-sided 2T 2T
Confidence interval
2 2
(MTBF lower limit) X (α, 2n + 2) X (α, 2n)
Two-sided 2T 2T 2T 2T
Confidence intervals , ,
2 2 2 2
X (α/2, 2n + 2) X (1 - α/2, 2n) X (α/2, 2n) X (1 - α/2, 2n)
MTBF limit: Lower Upper Lower Upper
These formulas apply only to the estimation of an MTBF based on a constant failure rate.
To calculate other parameters or under other assumptions, the analyst must use the appropriate
distribution, if he can identify or generate one. For example, MIL-HDBK-189, Reliability
Growth Management, provides tables for a distribution (origin not identified) used to determine
the confidence intervals of a failure rate that changes according to a nonhomogeneous Poisson
process. See Section 6 for more on reliability growth and MIL-HDBK-189.
4.4 Measuring Reliability of "One-Shot" Products
The reliability of the electronics modules in the space shuttle could be measured in terms of
mean-time-between-failures (MTBF) and this result used (in Equation 4-12, or even 4-2) to
determine the probability of these items operating satisfactorily for a particular mission.
However, the reliability of the shuttle’s booster rockets would not be appropriately measured in
MTBF or any other measure of "life". Even if recoverable and re-usable, the rockets are subject
to only one short demand per mission. This type of product is called a "one-shot" device. Either
the booster will work properly on demand, or it will fail with no possibility of repair during the
mission. Its proper figure of merit is the probability of success, and this must be found directly
by dividing the number of successful uses by the total number of attempts to use the product.
Naturally, the more data available, the better our confidence in the measured probability of
success, and a quantitative measure of describing confidence would be quite useful. Fortunately,
we can determine confidence limits on the reliability of one-shot devices by making use of the
fact that the successes and failures can be described by the binomial distribution. This is an
exercise in measuring quality from samples and is described in detail in Section 7.1.
STAT Section 5: Demonstrating Reliability 41
5.0 DEMONSTRATING RELIABILITY
Measuring reliability seeks to answer the question, "What is the true reliability of the
product?" In contrast, reliability demonstration seeks to answer, "How sure can I be that the
reliability is satisfactory?" Since these questions are different, the methods used to answer them
are also different. We have covered the measurement of reliability, and now will examine some
ways to demonstrate reliability. These are derived from a branch of statistics called hypothesis
testing, a set of methods designed to accept or reject hypotheses (e.g., the reliability meets a
specified number) within acceptable risks, as described in Section 2.4. Leaving the testing of
"one-shot" devices to Section 7.2, we will consider here the testing of life measures (i.e.,
characteristic life and mean time between failures). To begin, let us consider the simplest
reliability demonstration test, the zero failure test.
5.1 Zero Failure Tests
A zero failure test requires a given number of samples to be tested for a specified time. If
no failures occur the product is accepted as meeting reliability requirements. The determination
of sample size and test length is accomplished through consideration of the product’s reliability
function.
For example, the reliability function of a product following a Weibull distribution of

failures is:
R = e - (t/ ) (5-1)
where:
R = probability of failure free operation

t = operating time
= characteristic life ( > 0)
= shape parameter ( > 0)
When several products are tested together,
R = e - n(t/ ) (5-2)
where:
n = number of products tested

other terms as above
To demonstrate reliability, we would select a characteristic life which represents the

minimum acceptable value, (or perhaps better, the highest value that we would call "bad") and
determine the risk we are willing to take of accepting a product with that characteristic life. That
risk is the probability of the "bad" product having no failures during the test. Then, assuming we
have an estimate for and a known sample size:
Risk = e - n(t/ ) (5-3)
The equation is solved for "t", the only unknown.
Thus, if we test "n" samples for time "t" and have no failures, we can accept the product
with the predetermined risk that we may have accepted products with the defined "bad"
characteristic life. Products with higher characteristic lives will have a higher probability of
passing the test, and products with lower lives will have a lesser probability of passing. For
example, if we are willing to take a 10% risk of accepting products with a defined "bad" , know
that = 2, and can test 100 units:
2
0.10 = e -100(t/ ) (5-4)
Which gives:
ln(0.10) = - 100(t/ ) 2
- 2.3 = - 100(t/ ) 2
- 2.3/ - 100 = (t/ ) 2 = 0.023
(t/ ) = 0.023 = 0.15
t = 0.15( ) (5-5)
Thus the samples are tested for a time (t) equal to 0.15( ). If no failures occur, the product
is accepted.
The zero failure test procedure will work with any distribution for which a reliability
function can be defined and all parameters identified, though some distributions, such as the
Weibull, are easier to work with than others. The exponential distribution is the easiest of all.
Under the Weibull assumption, all samples are tested for the same operating time. When the
exponential distribution of failures is assumed (i.e., the failure rate is assumed constant), equal
test times are not necessary, as we will discuss in Section 5.3. However, we will first discuss the
derivation of tests in which some failures are allowed, in Section 5.2.
It should be noted that the only risk considered in our discussion so far has been the
"consumer’s risk" (i.e., the probability of a "bad" product passing the test). There was no
consideration for the probability of a "good" product failing the test (the "producer’s risk"). We
shall also discuss this in Section 5.2.
5.2 Tests Allowing Failures
If we test a number of units for the same test time on each unit, except for failed units, and
do not allow a failed unit to be repaired and returned to the test, we can derive test plans allowing
some failures by using the binomial distribution. As discussed in Section 3.1.1, Equation 3-4,
repeated here as Equation 5-6, yields the probability of getting "x or less" events in "n" trials.
x n!
P(x or less) = ∑ p x q (n - x) (5-6)
0 x! (n - x)!
where:
p = probability of an event happening in one trial

q = probability of event not happening in one trial
n = number of trials
If we let an event be a failure, then the probability of a failure, "p", is equal to one minus
the reliability function (1 - e − (t/ ) ) , and the probability of no failures, "q", is equal (by
definition) to the reliability function (e − (t/ ) ) . Letting "n" equal the number of units on test,
Equation 5-7 gives the probability of "x" or less failures when all units are operated for "t" time
units unless prevented by failure.
(x) n−x
P(x or less) = ∑
x n! 1 - e - (t/ ) β   e - (t/ ) β 
    (5-7)
0 x! (n - x)!    
We then set P(x or less) equal to the risk we are willing to take of accepting products with
an MTBF equal to , and solve for "t" given any desired values of "x". This is obviously not a
closed-form solution, and would ordinarily be done by trial and error iteration using a computer.
It becomes even more difficult when the producer’s risk is considered, as discussed in Section
5.2.1.
5.2.1 Controlling the Producer’s Risks
The producer’s risk is the probability of a test rejecting "good" products. A "good" product
may be one with an MTBF clearly acceptable for the mission, or equal to the state-of-the-art for
the product, or just something that is an arbitrary ratio higher than the "bad" MTBF that we want
to reject. It should always be possible to achieve the "good" MTBF with reasonable effort. It
warrants consideration because a test based solely on satisfactory consumer’s risks may
unintentionally provide a high risk of rejecting "good" products. Though this is called the
producer’s risk (for obvious reasons), it does the consumer no good to reject satisfactory
products.
Equation 5-7 yields the consumer’s risk when " " is equal to the "bad" MTBF. The
producer’s risk is calculated from Equation 5-8, when " " is equal to the "good" MTBF.
(x) n−x
P(x + 1, or more) = ∑
nn!  - (t/ )   - (t/ ) 
1 - e  e  (5-8)
x + 1 x! (n - x)!    
To formulate a test with satisfactory producer’s and consumer’s risks, one defines the values
of both risks, and solves for values of "n" and "t" that satisfy both risk equations. One way is the
procedure shown in Figure 5-1.
Calculate time needed to Calculate the probability that If the producer’s risk is too
satisfy the consumer’s risk in a value of life considered good high, assume a one-failure
Start a zero-failure test. would be rejected (producer’s test and re-compute the test
risk) using the zero-failure time for a satisfactory
test time. consumer’s risk.
If the producer’s risk is still Calculate the probability that a

too high, allow another failure, value of life considered good
End recompute test time and would be rejected (producer’s
recalculate probability until risk) using the one-failure
an acceptable producer’s risk test time.
is achieved.
Figure 5-1: Devising a Reliability Test
5.3 Testing Under the Exponential Distribution
When the exponential distribution of failures applies (i.e., the failure rate is constant), the
reliability function is:
R = e -(t/ ) (5-9)
where:
R = probability of zero failures in time (t)

= the mean time between failures
Since the failure rate does not change with time, it does not matter whether or not the test
units have equal operating times. The test time "t" can therefore be set to the sum of the test time
among all the units. In addition, unlike the tests based on the Weibull distribution, failures can
be repaired and the failed unit returned to test. Equation 5-6 can then be used to determine a zero
failure test as shown in Section 5.1. Since the assumptions we used in Section 5.2 do not hold,
the binomial distribution cannot be used to derive tests allowing one or more failures. However,
the assumption of a constant failure rate permits us to derive tests allowing failures by using the
Poisson formula.
(u) n e -(u)
P(n) = (5-10)
n!
where:
P(n) = the probability of exactly n events

u = the expected number of events
Since the number of failures expected in time (t) = t/ , the probability of exactly n failures
in a test with time equal to t is:
(t/ ) n e -(t/ )
P(n) = (5-11)
n!
If we establish a test where the sum of test time = t and up to n failures are allowed, the
probability of passing the test (i.e., the consumers risk, when = the "bad" MTBF) is:
n (t/ ) n e -(t/ )
P(n) = ∑ (5-12)
0 n!
Thus, we can establish a set of tests with different numbers of allowable failures, all of
which provide the same consumer’s risk. We shall need this capability in order to formulate tests
that have satisfactory producer’s risks as well as satisfactory consumer’s risks.
Based on the Poisson formula, the producer’s risk for a test time of "t" with "n" failures
allowed, is:
n (t/ ) n e -(t/ )
P(n) = 1 - ∑ (5-13)
0 n!
where:
= "good" MTBF
other terms as defined previously
We can use the procedure shown in Figure 5-1 to derive tests with satisfactory producer’s
and consumer’s risks. However, there is no need to go through this routine. Table 5-1 tabulates
test plans for the most reasonable combinations of risk and the ratio of the "good" MTBF to the
"bad" MTBF (called the "discrimination ratio"). In the table, 0 is the "good" MTBF and 1, the
"bad" MTBF. (Note: in some references 0 designates the "bad" MTBF and 1 is the "good"
MTBF.)
Table 5-1: Fixed Time Reliability Tests

Nominal decision Risks Discrimination Accept-Reject Criteria
Producer’s Consumer’s Ratio Test Duration Reject (Failures Accept (Failures
Risk (%) Risk (%) (θ0/θ1) (x θ1) or More) or Less)
10 10 1.5 45.0 37 36
10 20 1.5 29.9 26 25
20 20 1.5 21.5 18 17
10 10 2.0 18.8 14 13
10 20 2.0 12.4 10 9
20 20 2.0 7.8 6 5
10 10 3.0 9.3 6 5
10 20 3.0 5.4 4 3
20 20 3.0 4.3 3 2
30 30 1.5 8.0 7 6
30 30 2.0 3.7 3 2
30 30 3.0 1.1 1 0
Table 5-1 is taken from MIL-HDBK-781, Reliability Test Methods, Plans, and
Environments for Engineering Development, Qualification and Production, which is the most
comprehensive reference on reliability testing.
5.3.1 Sequential Tests: A Short Cut
As Table 5-1 shows, small values of risk and small discrimination ratios can result in long
tests. Where this is unsatisfactory, a probability ratio sequential test may be used. The
sequential test is based on the ratio of two probabilities: (1) the probability that a combination of
failures and test time will occur when the test units actually have the "bad" MTBF, and (2) the
probability of occurrence when the test units have the "good" MTBF. If the former is
satisfactorily higher than the latter, a reject decision can be made. Conversely, when a
combination of failures and test time is a predetermined times more likely to have occurred from
a test of "good" units than from a test of "bad" units, an accept decision can be made. Where the
ratio of the probabilities is not great enough to make a decision, the test continues to an arbitrary
truncation point (used to assure that the test ends in a reasonable time). Figure 5-2 shows the
form of the test, which will permit more rapid decisions than the fixed time test, when the true
MTBF of the test units is much closer to one of the defined "good" or "bad" values than it is to
the other.
A summary of sequential test plans for the most common risks and discrimination ratios is
presented in Table 5-2. More details (i.e., decision times for each failure) will be needed for
application, as provided in Table 5-3 for a test with both risks approximately 10% and a
discrimination ratio of 2.0. Details on other sequential tests are available in MIL-HDBK-781.
8
Reject
7
Cumulative Failures
6 Continue Test
3
End Accept
2 Test Data Test
100 200 300 400 500 600 700 800

Cumulative Test Time, Hrs.
Figure 5-2: Typical Sequential Test
Table 5-2: Sequential Tests

Nominal Decision Risks Time to Accept Decision (Multiples of θ1)
Producer’s Consumer’s Discrimination Ratio Minimum Expected1 Maximum2
Risk (%) Risk (%) (θ0/θ1)
10 10 1.5 6.6 25.95 49.5
20 20 1.5 4.19 11.4 21.9
10 10 2.0 4.40 10.2 20.6
20 20 2.0 2.80 4.8 9.74
10 10 3.0 3.75 6.0 10.35
20 20 3.0 2.67 3.42 4.5
30 30 1.5 3.15 5.1 6.8
30 30 2.0 1.72 2.6 4.5
Notes: 1. Expected time for a true MTBF equal to 0
2. Arbitrary truncation point
Table 5-3: Sequential Test Plan for 10% Risks, 2.0 Discrimination Ratio
Number of Failures Reject if t ≤ θ1 Times: Accept if t ≥ θ1 Times:
0 N/A 4.40
1 N/A 5.79
2 N/A 7.18
3 0.70 8.56
4 2.08 9.94
5 3.48 11.34
6 4.86 12.72
7 6.24 14.10
8 7.63 15.49
9 9.02 16.88
10 10.40 18.26
11 11.79 19.65
12 13.18 20.60
13 14.56 20.60
14 15.94 20.60
15 17.34 20.60
16 20.60 N/A
As an illustration, assume we put a number of products on test using Table 5-3 for our
accept-reject criteria, and 1 = 100 hours. The first opportunity to accept the product would
happen when we accrue 440 hours (4.4 x 1) among all the units on test. If there was no failures
at that time, we would accept the product. If we had one failure before 440 hours were
accumulated, we would have to wait until 579 hours to accept the product, presuming it did not
fail a second time. The first opportunity to reject the product would be when three failures
occurred, if we had accrued 70 hours (0.70 x 1) or less among the units on test. If we had more
time than 70 hours accumulated at the third failure, we could not make a reject decision until the
fourth failure, and then only if it occurred before we had more than 208 hours. When the failures
and times do not permit a decision to accept or to reject, the test continues. An arbitrary
truncation at 16 failures (reject) or 2,060 hours (accept), whichever comes first, prevents the test
from continuing indefinitely.
5.4 Other Test Considerations
There is more to reliability testing (or, for that matter, any testing) than the statistics
involved. Some other considerations are:
• Definition of Failure: What is a failure? Is a transient event a problem? How much

degradation is acceptable?
• Test Environment: Are the temperature, shock, vibration, thermal and power cycling,
and other test conditions a realistic representation of the environment the product will
face in its expected use?
• Product Configuration and Operation: Is the product on test truly representative of the
product of interest? Is it being exercised to the same degree?
• Monitoring: How often will it be checked for proper functioning, and how?
• Failure Analysis: Will failures be analyzed for root cause and corrective action
recommended?
• ESS and Preventive Maintenance: Do the products on test have the same benefits of
environmental stress screening (ESS) and preventive maintenance that the production
units will have (and no better)?
• Special Ground Rules: If the number of failures is acceptable, but a pattern of failures
exists, must the cause for the pattern be found and rectified before the product is
accepted?
These considerations are beyond the scope of this text. However, MIL-HDBK-781,
Reliability Test Methods, Plans and Environments for Engineering Development, Qualification,
and Production covers them in great detail.
STAT Section 6: Reliability Growth Testing 49
6.0 RELIABILITY GROWTH TESTING
Reliability growth is the improvement in reliability of a product as design defects are found
and eliminated. This can be done using data from all operational tests of the product and/or by
dedicated reliability growth testing. In either case, it is of interest to estimate when the reliability
will have grown to a satisfactory value. Various models have been proposed, and the two most
popular are based on fitting the data to a straight line on log-log scales. Both methods will be
explained. The first, the Duane model, because its solution uses least square regression, a
method of broad utility. The other, the AMSAA/Crow model, because it assumes an underlying
distribution to the data. Both of these are described, with others, in MIL-HDBK-189, Reliability
Growth Management.
6.1 Duane Growth Analysis
The first formal reliability growth model was created by James T. Duane, who noted that
failure rate data plotted as a straight line against operating time on log-log scales. Predicting how
much operating time would be required to achieve a desired failure rate was therefore a matter of
fitting an equation to the data and solving the equation, finding time for a given failure rate.
Duane’s equations for reliability growth are:
-
cum = KT (6-1)
where:
cum = cumulative failure rate

K = initial failure rate
T = test time
α = growth rate
By finding the rate of change in the number of failures against time,
-α
inst = K(1 - α) T (6-2)
where inst = instantaneous failure rate at time T (the failure rate expected if reliability growth
stops at time = T).
cum and inst plot as parallel straight lines on log-log graph paper. One can, of course, obtain
a solution by drawing a straight line through the data points, extending the instantaneous failure
rate curve until it intersects with the desired failure rate, and reading the corresponding point on
the time axis. However, given any spread in the data, placing the straight line becomes rather
arbitrary. The parameters of the equations are therefore determined by a statistical analysis
method called least square regression.
6.1.1 Least Square Regression
Least square regression is a way of finding a straight line fitting a set of data believed to be
linear, but showing scatter. Basically, it minimizes the sum of the squares of the distances
between the data points and the line.
The basic equation for any straight line plot is:
Y = mX + b (6-3)
To determine the equation from a set of paired data points using the method of least
squares:
ΣX i Yi - (ΣX i ΣYi )/n

m = (6-4)
ΣX i2 - (ΣX i )2 /n
ΣYi - mΣ X i
and b = (6-5)
n
Relating this general procedure to Duane’s equations:
Y = log of cum
b = log K
m = -α
X = log of cum time
As an example, we will perform the least squares regression on the set of data shown in
Table 6-1.
Table 6-1: Growth Data

Cum. Cum. Test Log Cum. 2
Xi λ Logλ(Yi) XiYi
Failures Time Time (Xi)
1 1 0 0 1.0 0 0
2 4 0.60206 0.36248 0.500 -0.30102 -0.1812
3 8 0.90309 0.81557 0.375 -0.42597 -0.3847
4 13 1.1139 1.2408 0.308 -0.51145 -0.5697
5 20 1.3010 1.6926 0.250 -0.60206 -0.7833
6 30 1.4771 2.1819 0.200 -0.69897 -1.032
7 42 1.6232 2.6348 0.167 -0.77728 -1.262
8 57 1.7559 3.0832 0.140 -0.85387 -1.499
9 78 1.8921 3.5797 0.115 -0.93930 -1.777
10 104 2.0170 4.0683 0.0962 -1.0168 -2.051
11 136 2.1335 4.5518 0.0809 -1.0921 -2.330
12 177 2.2480 5.0535 0.0678 -1.1688 -2.627
13 228 2.3579 5.5597 0.0570 -1.2441 -2.933
14 292 2.4654 6.0782 0.0479 -1.3197 -3.253
15 372 2.5705 6.6075 0.0403 -1.3947 -3.585
16 473 2.6749 7.1551 0.0338 -1.4711 -3.935
17 599 2.7774 7.7140 0.0284 -1.5467 -4.296
18 757 2.8791 8.2892 0.0238 -1.6234 -4.674
19 956 2.9805 8.8834 0.0199 -1.7011 -5.070
20 1205 3.0810 9.4926 0.0166 -1.7799 -5.484
21 1518 3.1813 10.121 0.0138 -1.8601 -5.918
22 1879 3.2739 10.718 0.0117 -1.9318 -6.325
23 2262 3.3545 11.253 0.0102 -1.9914 -6.680
24 2668 3.4262 11.739 0.00899 -2.0462 -7.011
25 3099 3.4912 12.188 0.00807 -2.0931 -7.307
2
ΣX i = 55.58 ΣX i = 145.1 ΣYi = -30.39 ΣX i Yi = -80.97
Using the values from Table 6-1:
80.97 - (55.58)(-30.39)/25
m = -α = = - 0.62, or α = 0.62 (6-6)
145.1 - (55.58) 2 /25
b = log(K) = [(-30.39) - (-0.62)(55.58)]/25 = 0.16, or K = 10 0.16 = 1.45 (6-7)
A typical Duane plot is shown in Figure 6-1. The cumulative failure rate is plotted as a
straight line on the log-log scale, and the instantaneous failure rate is plotted by another straight
line parallel to the cumulative failure rate.
Numerically, at the time of the last failure, the cumulative failure rate, calculated from the
data using Equation 6-1, is:
-α
cum = KT = 1.45(3,099) -0.62 = 1.45 x 0.0068 = 0.00986 (6-8)
Failure Rate
(Failures/
Hours)
Cumulative
Failure
Rate
Instantaneous
Failure
Rate
Cumulative Test Time

Figure 6-1: Typical Duane Plot
The instantaneous failure rate, using Equation 6-2, is:
-α
inst = K(1 - α) T = 1.45 (1 - 0.62)(3,099) -0.62 = 0.0037 (6-9)
6.2 AMSAA Growth Analysis
Dr. Larry Crow, then at the Army Material Systems Analysis Agency (AMSAA), devised
another approach to reliability growth analysis based on the assumption that reliability growth is
a non-homogenous Poisson process with a Weibull intensity function (Translation: a process in
which the number of failures follows a Poisson distribution, but the Poisson distribution changes
with time in such a manner that the instantaneous failure rate resembles a Weibull function).
Under this assumption:
cum = T ( -1) (6-10)
inst = βT ( -1) (6-11)
where:
= growth rate
= initial failure rate
T = test time
The parameters of the AMSAA/Crow model are determined by what is called a maximum
likelihood method:
N
= (6-12)
N
∑ ln (T/X i )
i =1
where:
N = no. of recorded failures
T = total test time (= X n when test ends at a failure)
Xi = time when an individual failure occurs
and:
N
= (6-13)
T
Using the AMSAA model on the same data that we applied the Duane model yields the
results shown in Table 6-2.
Table 6-2: Growth Data Revisited
Cumulative Failure Cumulative Test
Xn/Xi ln(Xn/Xi)
Count Time at Failure (Xi)
1 1 3099 8.0388
2 4 774.75 6.6525
3 8 387.38 5.9594
4 13 238.38 5.4739
5 20 154.95 5.0431
6 30 103.30 4.6376
7 42 73.786 4.3012
8 57 54.368 3.9958
9 78 39.731 3.6821
10 104 29.798 3.3944
11 136 22.787 3.1262
12 177 17.508 2.8627
13 228 13.592 2.6095
14 292 10.613 2.3621
15 372 8.3306 2.1199
16 473 6.5518 1.8797
17 599 5.1736 1.6436
18 757 4.0938 1.4095
19 956 3.2416 1.1761
20 1205 2.5718 0.94460
21 1518 2.0415 0.71369
22 1879 1.6493 0.50034
23 2262 1.3700 0.31483
24 2668 1.1615 0.14975
25 3099 1.0000 0
24
∑ ln (X n /X i ) = 72.99
i =1
25
= = 0.34 (6-14)
72.99
25
= = 1.625 (6-15)
(3,099) 0.34
At the end of test, using Equations 6-10 and 6-11:
cum = T ( - 1) = (1.625)(3,099) (0.34 - 1) = 0.008066 failures/hour (6-16)
inst = T ( - 1) = (0.34)(1.625)(3,099) (0.34 - 1) = 0.0027 failures/hour (6-17)
The AMSAA/Crow equations will yield the same type of straight line plots on log-log
paper as the Duane equations. However, though we used the same data for both, a comparison of
Equations 6-8 and 6-9 to 6-16 and 6-17 shows that the Duane and AMSAA models can lead to
different answers. Table 6-3 compares the estimates at the end of the test (3,099 hours.)
Table 6-3: Comparison of Estimates

Parameter estimated Duane Estimate at 3099 hrs. AMSAA Estimate at 3099 hrs.
0.00986 failures/hour 0.008066 failures/hour
cum
0.0037 failures/hour 0.0027 failures/hour
inst
One reason for the differences shown in Table 6-3 might be that the least squares solution
treats all the data points equally, and in growth data the early points are not as significant as the
later points, because they are based on less data. Or, the assumption of a non-homogenous
Poisson process with a Weibull intensity function may not quite fit. There is no preferred
solution. However, there are some advantages to the AMSAA/Crow model simply because a
distribution is assumed. For example, confidence intervals can be drawn around the plot by
finding the limits of specified areas of a distribution (as we did in Section 4.3.2, for example).
Tables of the appropriate distribution for analyzing data from a non-homogeneous Poisson
process are found in MIL-HDBK-189, Managing Reliability Growth.
STAT Section 7: Sampling (Polling) and Statistical Quality Control 55
7.0 SAMPLING (POLLING) AND STATISTICAL QUALITY CONTROL
Almost all statistical tools are designed to analyze the characteristics of a population of
something from data on a sample taken from that population.
In our discussion of measuring reliability, we used confidence limits to better understand

the trust we could put into a measured value of life, failure rate or mean-time-between-failures.
We also often want to measure quality, in terms of an attribute such as the proportion of products
in a distribution containing defects. An analogous problem is to determine the proportion of
voters in favor of some proposed government action, and, we shall see, the same methods of
solution apply.
In our discussion of demonstrations, we used tests of hypotheses to decide how confident

we were that an accept or reject decision was correct, for measures of life. We also often want to
demonstrate the acceptability of a product, in terms of the proportion defective, from sample
data. A problem analogous to this, and solved by the same methods, is the demonstration of
reliability for a "one-shot" product, as mentioned in Section 4.4.
A key assumption in these applications is that the attribute of interest follows the binomial
distribution (a product is defective or not, a voter is for or against, etc.).
7.1 Measuring Quality from Samples
The central limit theorem (discussed in Section 3.2.1.2) tells us that when we measure a
parameter such as percent defective in an infinite number of samples, the mean of these
measurements will be equal to the percent defective in the population from which we took the
samples, and the measurements will be distributed normally. (Caveat: for proportions, the
normal applies well only when p or (1 - p) > 0.1 and np > 5, where "p" is the proportion defective
and "n" is the sample size.)
Because we are sampling a parameter distributed binomially:
= p(1 - p) (7-1)
where:
= the standard deviation of the population

p = the proportion of interest (defects, yes votes, etc.)
The standard deviation of the measurements, "S", is a function of the standard deviation of
the parent population, " ", and the size of the sample, "n".
p(1 - p)
S = (7-2)
n
where:
S = standard deviation of the distribution of sample measurements taken

n = sample size (no. of parts checked, people polled, etc.)
Since the measurements follow a normal distribution, we can convert it to a standard

normal, as discussed in Section 3:
p̂ - p
z = (7-3)
S
where:
p̂ = a measurement taken on one sample (of size n)

p = the mean of the measurements on all samples (= the population mean)
S = standard deviation of the measurements
Our next step is to choose a critical value of "z" for the confidence we want. The
confidence is equal to the area under the standard normal defined as between "z" and "-z", when
"z" equals the critical value. For example, if we desire a 95% confidence, we need the critical
value of "z" marking the limits to 95% of the area under the normal curve around the mean.
Table 7-1 (a copy of Table 3-3) gives these values for some common areas of interest.
Table 7-1: Critical Values of z

Critical Value of z Area Between z and -z Area From z to Infinity
1.28 0.80 0.10
1.645 0.90 0.05
1.96 0.95 0.025
2.33 0.98 0.01
2.58 0.99 0.005
As Table 7-1 shows, in the standard normal distribution 95% of all data fall between -1.96
and + 1.96, or:
0.95 = probability that ( p̂ - p)/S is between -1.96 and + 1.96
where:
p̂ is the measured proportion in the sample.
Put another way, we are 95% sure the true proportion is between p̂ - 1.96S and p̂ + 1.96S.
However, since we don’t know p, we also don’t know S. Fortunately, we can approximate it
by substituting p̂ for p, yielding an expression called the standard error:
p̂(1 - p̂)
SE(p̂) = (7-4)
n
To illustrate this, let us discuss a political poll. Suppose someone polled 1,000 people on
some question and found 500 yes votes. This means p̂ = 500/1000 = 0.50.
Our 95% confidence limits are therefore:
Lower limit = p̂ - 1.96S (7-5)

Upper limit = p̂ + 1.96S (7-6)
Using the Standard Error in place of S:
1.96 [p̂(1 - p̂)] 1.96 [0.5(1 - 0.5)]

Lower limit = p̂ - = 0.5 - = 0.469 (7-7)
n 1,000
1.96 [p̂(1 - p̂)] 1.96 [0.5(1 - 0.5)]

Upper limit = p̂ + = 0.5 + = 0.5309 (7-8)
n 1,000
Thus we are 95% sure that the true value (p) is between 0.469 and 0.5309, which is the
measured value ( p̂ ) of 50%, plus or minus about 3%. Note that the latter figure is not 3% of the
measured value of p̂ (50%), but 3% of the total number of votes. In polling, this 3% is called the
margin of error.
The margin of error will change with the desired confidence. As an example, for a 99%
confidence interval, the area under the standard normal curve (see Table 7-1) is between ±2.58
standard deviations. Using 2.58 in place of 1.96 in Equations 7-7 and 7-8 yield a margin of error
of 4%, for 99% confidence. However, the selection of sample size can have a much more
significant effect. We will leave it to the reader to verify, if he (or she) wishes, that a poll of 100
people with the same measured value (0.5), for a 95% confidence interval, would have about a
10% margin of error (0.098 to be more precise), and a poll of 10,000 people with the same result
would have a margin of error of about 1%. The expense of polling 10,000 people instead of
1,000 or so is seldom considered worth the greater precision.
The margin of error is also dependent on the value of p̂ , but is highest when p̂ = 0.5. For
example, when p̂ = 0.1 (or = 0.9), the margin of error for 1,000 samples at 95% confidence is
0.0185 or less than 2%, as compared to 3% at p̂ = 0.5.
7.1.1 Caveats
These conclusions assume that the sample is truly representative of the population of
interest. Fortunately for us, making a random selection of products to measure defect rate is a lot
easier than assuring that people polled (and their responses) really represent the population of
interest.
This method presumes a large sample. If small samples are used (less than 30 or so), the
Student t distribution (see Section 3.2.2) should be used instead of the standard normal. This is
done by using Student t tables (see Appendix E) in lieu of Standard Normal tables in finding
critical values for the desired confidence. For example, we have determined from Table 7-1 that
the critical value for 95% confidence using the standard normal is 1.96. Using Appendix E, we
would find a value of "t" under the column for 0.975 (when 0.975 of the area under the curve is
between -∞ to "t", 1 - 0.975 = 0.025 is in one tail of the curve. Hence, both tails contain
2 x 0.025 = 0.05, or the area between "t" and "-t" is 0.95). For a sample size of 4 (three degrees
of freedom), this value (from Appendix E) would be 2.353. Using 2.353 instead of 1.96 in
Equations 7-7 and 7-8 yields wider confidence limits, reflecting the greater uncertainty caused by
small samples. As sample size increases, the differences between Table 7-1 and Appendix E
vanish.
Neither the normal nor the Student t distribution apply well when p < 0.1 or np < 5. It is
possible to estimate confidence limits using the Poisson distribution when p < 0.1 and the
binomial when p < 0.1 and np < 5, but since these are discrete distributions, they are awkward to
handle. The reliability engineer may not have enough interest to pursue this further, but quality
engineers have done so. The usual practice is to approximate the distributions by a smooth curve
(the higher the sample size, the better this works) and use graphical methods devised for finding
the confidence limits. Some simple examples of this are presented in Statistical Methods in
Engineering and Manufacturing, by John E. Brown, Quality Press, Milwaukee, 1990.
7.2 Demonstrating Acceptability Through Sampling
In Section 7.1, we were concerned with measuring an attribute of a population through a

sample. Quite often, we are more concerned with assuring that the value of an attribute is
acceptable than knowing the value itself. For this purpose, we are not concerned with measuring
to some confidence, but rather testing a hypothesis (the product is acceptable defined as having
better than a stated defect rate, or the product is not acceptable, defined as having worse than a
given defect rate) to some confidence (or equivalently, with some acceptable risk of error). Or,
we may be concerned with the reliability of a "one-shot" device, and seek to verify that its
probability of success when called upon is satisfactory to some desired confidence. The same
methods will apply.
A sampling test is defined by sample size and an allowable number of samples with the
undesired trait. For example, take a sample of 5 units and allow no defects. The probability of
passing this test will depend on the true defect rate of the product. Intuitively, we would expect a
product with a defect rate of 1% to pass this test most of the time and a product with a defect rate
of 50% to fail most of the time. Statistical analysis is needed to determine what "most of the
time" is in terms of probability of passing, especially for products with less extreme defect rates.
Plots of the probability of acceptance versus the actual defect rate of a product are called
operating characteristic (OC) curves. An ideal OC curve would look like Figure 7-1, where all
products below a specified value would fail and all above would pass. Unfortunately, the real
world is not so obliging and the typical shape of an operating characteristic curve is as shown in
Figure 7-2.
Probability
of Acceptance
P(θ)
0
0%
θ 100%
(High (Low
Qual) θ (% Defective)
Qual)
Figure 7-1: Ideal O-C Curve
P(θ)
Probability
of Acceptance
of a Batch of }α
Quality θ
0
} β
0% 100%
(High Qual) θ1 θ2 (Low Qual)
θ (% Defective)
Process
Average
α = risk of rejecting product with defect rate = 1
= risk of accepting product with defect rate = 2
Figure 7-2: Practical O-C Curve
Figure 7-2 illustrates two types of risks involved in sampling. A customer who considers
2 the worst defect rate that he can tolerate would like tests that keep to an acceptably low
value. For this reason is called the consumer’s risk. A manufacturer who considers 1 the best
defect rate he can practically achieve would like tests that keep α to an acceptably low level.
Hence, α is called the producer’s risk. These terms have been discussed in Section 5, Reliability
Tests, where we derived tests that considered both risks. Typically, however, sampling plans
consider only one risk.
A customer doing incoming inspections of suppliers products will use test plans based on
, while a supplier checking his quality control will use plans based on α . In either case the
risks are computed from the binomial formula, Equation 3-1 in Section 3.1.1, reprinted here as
Equation 7-9.
n!
P(r) = p r (1 - p) n - r (7-9)
r!(n - r)!
where:
p = the proportion defective

n = sample size
r = number defective
P(r) = probability of getting exactly r defective units in a sample of n units
If r or less failures are allowed in a test of n units, the probability of passing the test is:
r n!
P(accept) = ∑ p r (1 - p) n - r (7-10)
0 r!(n - r)!
The consumer can set P(accept) = for p = worst acceptable defect rate and generate a set
of plans by setting values of "r" and solving the equation to determine n for each value of "r".
These plans are referred to as LTPD Plans. LTPD stands for Lot Tolerance Percent Defective.
LTPD plans are designed to reject the designated LTPD value most of the time.
The producer can set P(accept) = 1 - α for p = the defect rate he considers realistic for his
product, and generate a set of test plans in a similar manner. These plans are referred to as AQL
plans. AQL stands for Acceptable Quality Level and AQL plans are designed to accept the
designated AQL most of the time.
In practice, rather than solve Equation 7-10, a value of "p" is selected and P(accept)
determined for various values of "n" and "r" until a plan giving an acceptable value of P(accept)
for a reasonable sample size is produced. This iterative process is an ideal job for a computer,
but quite tedious when done manually. However, when "r" = 0 we can solve Equation 7-10 in
closed form, and will use this fact to give an example.
When "r" = 0, Equation 7-10 simplifies to Equation 7-11:
n!
P(accept) = p 0 (1 - p) n - 0 = (1 - p) n (7-11)
0! (n - 0)!
Taking logarithms of both side, Equation 7-11 becomes:
log P(accept) = n log(1 - p) (7-12)
or:
log P(accept)
n = (7-13)
log (1 - p)
Let us assume a consumer wants to be 90% confident that products he receives have an
LTPD of 0.15, using a test with no failures allowed. For this case, P(accept) would be the
consumers risk or one minus the confidence or (1 - 0.90) = 0.10, and (1 - p) = 1 - 0.15 = 0.85.
Using Equation 7-13:
log (0.10) - 1.0

n = = = 14.16 (7-14)
log (0.85) - 0.0706
The consumer would therefore test 14 units and accept the product if no failures occurred.
Let us suppose the producer of the same product wanted to be 90% confident that the
products made had an AQL of 0.05, using a test with no failures allowed. In this case, P(accept)
would equal one minus the producers risk or the desired confidence (0.90) and (1 - p) = (1 - 0.05)
= 0.95. Again using Equation 7-13:
log (0.90) - 0.04576

n = = = 2.05 (7-15)
log (0.95) - 0.02228
Hence, the producer would test 2 units and allow no failures.
There are published tabulations of test plans for use by quality professionals. LTPD plans
are tabulated in MIL-PRF-19500K, Performance Specification, Semiconductor Devices, General
Specification For, in Appendix C, and also in MIL-PRF-38535E, Performance Specification,
Integrated Circuits (Microcircuits) Manufacturing, General Specification For, in Appendix D.
AQL plans are found in ANSI/ASQC Z1.4-1993, American National Standard, Sampling
Procedures and Tables for Inspection by Attributes, available from the American Society for
Quality (ASQ). The ANSI/ASQC standard replaces MIL-STD-105, Sampling Procedures, which
has been cancelled. The reader should note that these publications are not user-friendly to the
neophyte and take a procedural approach which obscures the correlation of the table entries to a
confidence limit.
7.3 Statistical Quality Control
A product is created by some process. There is always some variability among products
due to inherent variability in the process. When the process is satisfactory, the variability will
consistently be between acceptable limits about a target value. Special causes (e.g., an
inadequately trained process operator, a bad lot of parts, tool wear, etc.) can change the
variability among products in production so that it is no longer within satisfactory limits, is no
longer centered about the target value, or both. To help maintain stable processes during
production, the discipline of Statistical Quality Control was established. In this section we will
discuss these statistical tools and also some statistical tools that can help define an acceptable
process.
Statistical Quality Control (SQC), is also known as Statistical Process Control (SPC). The
basic assumption is that as long as sample measurements taken periodically during production
vary randomly within an expected variance, the process is in control and needs no adjustment.
Non-random measurements or indications of variance outside the expected distribution show
some special influence at work which can be found and corrected to restore the expected
performance. The sample data are plotted and analyzed on control charts. An authoritative
reference for SQC is Statistical Quality Control, by E.L. Grant and R.S. Leavenworth, McGraw-
Hill Book Co., NY, 1989 (6th Edition).
7.3.1 Control Charts
If samples of a process are taken over time, and a parameter of interest measured, the
results can be presented on a run chart, such as Figure 7-3.
1 2 3 4 5 6
Data Samples
Figure 7-3: Run Chart
To determine the significance of the data in Figure 7-3, we need to establish an expected
variance for the data. Taking advantage of the central limit theorem, we assume that the data will
be normally distributed about the process mean, and that 99.7% of the data points should be
within plus or minus three standard deviations from the mean (from Appendix C, at z = 3.0, the
area from 0 to z is 0.4987, or the area between "z" and "-z" is 2 x 0.4987 = 0.9974 = 99.7%). We
can therefore establish control limits at three standard deviations from the mean with little chance
that samples from a stable process will exceed them. Should the data plot exceed these limits,
we can feel we have good evidence that the process has gone out of control (see Figure 7-4).
With the limits in place, the run chart has become a control chart.
Note that in the automotive industry, where many thousands of control charts are used,
using control limits encompassing even 99.7% of expected variation would result in too many
false alarms. Hence, control limits are often set higher than three standard deviations in
automotive plants. Most other applications are satisfied using three standard deviations.
Indicates Process
Out of Control
Upper Control Limit

(UCL) = Mean + 3S
Centerline = Process
Mean
Lower Control Limit

(LCL) = Mean - 3S 1 2 3 4 5 6
Data Samples
Figure 7-4: Control Chart
The control limits are determined from the standard deviation of the distribution of the
sample mean measurements. This is obtained by dividing the standard deviation of the process
by the square root of the size of one sample. The standard deviation of the process, in turn, can
be estimated from the sample data. The procedure for doing this will vary dependent on the
distribution of the parameter of interest.
7.3.2 Control Charts for Variables
Controlling variables, such as part diameter, tensile strength, etc., which ordinarily follow a
normal distribution, can be done with the aid of a control chart called the X chart. This looks
like Figure 7-4. The center line is the process mean determined by the average measurement of a
(hopefully large) sample, or the mean of the means of many equal sized samples. The process
target can also be used as the centerline, for example a desired part diameter. The upper and
lower control limits would be:
3S′
UCL = X + (7-16)
n
3S′
LCL = X − (7-17)
n
where:
X = the centerline value

n = the number of units in a sample
S′ = the process standard deviation
The process standard distribution, S’, is determined from sample data. One way to do this
is to use the formula:
n
2
∑ (X i - X )
S’ = i =1 (7-18)
n -1
where:
Xi = mean of one sample

X = mean of sample means
n = number of samples taken
7.3.3 Range Charts
If we are interested in the mean of a variable such as product diameter, we will also usually
be interested in the variation about the mean. If the mean of a sample is on target, but the
variation is too great, we do not have a good situation. For this reason, the X chart is often
accompanied by a Range chart, as shown in Figure 7-5.
UCL
X Mean Variation
LCL
UCL
R Range Variation
LCL
Figure 7-5: X and R Chart Combination
Range is simply the difference between the highest and lowest measurements of the units in
the sample. It is a measure of variation, and, as such, can be used instead of the process standard
deviation in setting control limits for both the X and R charts. It is necessary to determine the
average range, preferably from the mean of many samples. This value becomes the centerline of
the R chart. The centerline of the X chart is determined by the mean of many sample means, as
before. However the control limits for both charts are determined from the following formulas:
UCL(X) = X + A 2 R (7-19)
LCL(X) = X − A 2 R (7-20)
UCL(R) = D 4 R (7-21)
LCL(R) = D 3 R (7-22)
where:
UCL(X) = upper control limit for X chart

LCL(X) = lower control limit for X chart
UCL(R) = upper control limit for Range chart
LCL(R) = lower control limit for Range chart
X = process mean = mean of sample means
A2, D4, D3 = constants (see Table 7-2)
The constants in the control limit formulas have been worked out by statisticians, assuming
the processes measured follow a normal distribution, and are presented in Table 7-2.
Table 7-2: Statistical Constants

Sample Size A2 D3 D4
2 1.88 0 3.27
3 1.02 0 2.57
4 0.73 0 2.28
5 0.58 0 2.11
6 0.48 0 2.00
7 0.42 0.08 1.92
8 0.37 0.14 1.86
9 0.34 0.18 1.82
10 0.31 0.22 1.78
12 0.27 0.28 1.72
15 0.22 0.35 1.65
20 0.18 0.41 1.59
For example, if we are producing rods with a mean length of one inch and find our mean
range in many samples is 0.002 inches, then X = 1.000, R = 0.002. For a sample size of three:
UCL(X) = X + A2 R = 1.000 + 1.02(0.002) = 1.000 + 0.00204 = 1.00204 (7-23)
LCL(X) = X - A2 R = 1.000 - 1.02(0.002) = 1.000 - 0.00204 = 0.99796 (7-24)
UCL(R) = D4 R = 2.57(0.002) = 0.00514 (7-25)
LCL(R) = D3 R = 0(0.002) = 0 (7-26)
7.3.4 Interpreting Control Charts
A plot running outside the control limits is not the only sign of unexpected variation. Any
plot that does not appear to be random is evidence of loss of control, and often provides clues
about the problem. For example, a plot hugging the centerline (all points falling within one
standard deviation of the centerline) shows some abnormal condition, probably the misrecording
of data because of a fear of reporting measurements off target. In contrast, a plot hugging the
control limits, going from close to the Upper Control Limit to Close to the Lower Control Limit
without any points in between, shows some factor switching the distribution from high to low.
One example could be a process done on either of two machines that are adjusted differently.
Some possible signs of non-random variation (i.e., trouble) are:
• One or more points outside the control limits

• 10 consecutive points above the center line or below the center line
• 7 consecutive points in a steadily increasing or steadily decreasing pattern
• 14 points alternating up and down
• 15 consecutive points within plus and minus one standard deviation of the center line
• 2 consecutive points in the band between plus two and plus three standard deviations or
in the band between minus two and minus three standard deviations.
In summary, if the plot does not look random, you have grounds for concern, even if the
control limits are not exceeded. However, if the plot is random and between control limits, the
process should not be adjusted. If, for example, a machine is making rods and every time a
sample measures below the center line an adjustment is made to produce longer rods, and every
time a sample measures above the center line an adjustment is made to produce shorter rods, the
end result would be to double the natural variance of the process. If the process is in control, do
not tinker with it. The only way to improve the results would be to change the process to a better
one.
7.3.5 Controlling Attributes
Variables such as dimensions are not always the parameter of interest. We may be
concerned with an attribute, which is something that a product either has or lacks, rather than has
in some range of value. A ball is red or it is not, for example. Of more interest, a product is
defective or it is not. This was discussed in Section 7.1, where we were concerned with
measuring the attribute of interest, and in Section 7.2, where we were concerned with
demonstrating the attribute of interest. Statistical Quality Control for attributes is most akin to
the latter, taking an AQL approach.
If defects are our concern, we could be interested in the proportion of defective product or
the defect rate. For the former, we would not care whether or not a product had more than one
defect, while for the latter we are interested in such ratios as defects per product, defects per
cubic foot, etc. The proportion of defective product is described by a binomial distribution (as
discussed in Section 7.2) and the defect rate by a Poisson distribution. Hence, we use different
formulas in determining the control limits. It also makes a difference whether or not a constant
sample size is used.
7.3.5.1 Proportions
The control chart for proportions (called a "p" chart) is derived from the value of the
proportion of interest, estimated by the number of defective parts divided by the number of parts
tested in a large sample. This serves as the centerline for the p chart. The process standard
deviation would be the square root of p(1 - p), and the standard deviation of samples containing
"n" units would be:
S = p(1 − p) / n (7-27)
Again setting control limits at plus and minus three standard deviations (of the distribution
of sample measurements) from the center line, we have:
UCL = p + 3 p(1 − p)/n (7-28)
LCL = p − 3 p(1 − p)/n (7-29)
It is possible for the calculated LCL to be a negative number. This has no meaning, so the
LCL in such cases is set to zero.
If the sample size is not constant, the control limits would be different for every sample, as
shown in Figure 7-6.
Effect of
Smaller Sample
Effect of
UCL Larger Sample
Centerline
LCL (0)
Min LCL = 0
Figure 7-6: "p chart" for Different Sample Sizes
7.3.5.2 Rates
If we are interested in the defect rate or defect density, the average defect rate, estimated
from a large sample, is the centerline. For rates, the standard deviation of the process would be
the square root of the average rate, "u", and the sample standard deviation would be "u" divided
by the square root of the sample size. Hence, setting the control limits as usual, we have:
UCL = u + 3 (u / n ) (7-30)
LCL = u − 3 (u / n ) (7-31)
For constant sample sizes, the average number of defects, "c", can be used for the
centerline. The standard distribution of both the process and the sample’s number of defects is
the square root of c. Hence:
UCL = c + 3 c (7-32)
LCL = c − 3 c (7-33)
These formulas assume a Poisson distribution for the process, and are most accurate when
the sample size is ≤ 10% of the size of the population sampled, to assure that the probability of
finding a defect remains reasonably constant as samples are selected and removed from the
population.
7.3.6 Caveat: "In Control" May Not Be "In-Spec"
It is important to note that a process that is "in control" is not necessarily one meeting
specified limits. If it is in control, it is doing as well as can be expected, and no amount of
tinkering will improve the product, short of changing the process itself. Necessary changes can
be determined by the statistical design of experiments, described in Section 8, and, once a
satisfactorily capable process is installed, SQC may be used to monitor its stability. We will now
discuss some measures that can be used to determine the capability of a process to create
products within specified limits. These measures assume the process is stable, but not
necessarily satisfactory.
As discussed in Section 3, when the product can be described by a normally distributed

parameter, the standard deviation can be estimated from a sample and then used to calculate the
proportion of the product outside any given limits. An extension to this procedure is the
calculation of figures of merit describing the relationship between the distribution of the
parameter of interest and the designated acceptable range for its values.
7.3.6.1 Measuring Process Capability
Process Capability (Cp) is one measure of the ability of a process to produce acceptable
products. To compute Cp, it is necessary to determine the standard deviation of the parameter of
interest. This is done by measuring the parameter in a sample of the product and using the
formula:
n
∑ (x i - x )
2
= i =1 (7-34)
n -1
where:
= standard deviation
xi = ith measurement
x = mean of all measurements
n = number of measurements
Cp is then calculated by:
USL - LSL
Cp = (7-35)
6
where:
USL = Upper Specification Limit (highest value considered acceptable)

LSL = Lower Specification Limit (lowest value considered acceptable)
= Standard Deviation
Where the difference between the USL and LSL is 6 , Cp = 1. This means that all the
product from the target value to plus and minus 3 are within the specified limits, as shown in
Figure 7-7.
LSL Target USL
Figure 7-7: Process Capability (Cp) Chart
As mentioned in Section 7.3.1, 99.7% of a product is distributed between plus and minus
3 . Therefore, a Cp of 1.0 means that 99.7% of the product is in the range defined as acceptable.
In current conditions, this is considered marginal. It is not until Cp equals or exceeds 1.3 that a
process is considered good. The "6 " quality program initiated by Motorola aims for a full 6
between the target value and either of the specification limits (a total spread of 12 between the
upper and lower specification limits), a Cp of 2.0. However, Motorola does not assume the target
value will be centered, since the mean of a process is also subject to variation. Where the
process mean is off target, Cp cannot be translated into percent "in-spec". For this reason,
Motorola also utilizes a process performance figure of merit, described in Section 7.3.6.2.
7.3.6.2 Measuring Process Performance
Process Performance (Cpk) measures the capability of a process when the mean value of the
parameter of interest is not necessarily on target, as shown in Figure 7-8.
Lower Target Process Upper

Spec Value Mean Spec
Limit Limit
Figure 7-8: Process Performance (Cpk) Chart
Cpk is calculated by the formula:
Min {(USL - ); ( - LSL)}

C pk = (7-36)
3
where:
Min = smaller of the two values

= process mean
other terms as defined above
Motorola’s "6 " goal is a Cp of 2.0 and a Cpk of 1.5. This means that when the process
mean is 1.5 off target, the shortest distance from the process mean to either specification limit
is 4.5 , which equates to 3.4 parts in a million "out of spec".
The "6 " philosophy can also be applied to processes parameters not described by a
normal distribution. For example, taxpayer questions to the IRS are answered either correctly or
not, a binomial process. Such cases can be related to the others through the error rate. If the IRS
answered incorrectly only 3.4 time in a million queries, their process would be equivalent to a
"6 " process (one having a Cpk of 1.5). Most industrial and commercial practices are, or are
equivalent to, about "4 " processes, having error rates around 6,200 per million opportunities.
STAT Section 8: Using Statistics to Improve Processes 71
8.0 USING STATISTICS TO IMPROVE PROCESSES
If a process is in control (see Section 7), the only way to improve it is to change it. A new
process may be created, or, more often, the parameters of the process are changed. For example,
to improve a wave solder process, we might change the temperature of the solder, the height of
the solder wave, or add flux to the mixture. We could test the effects of each possible change
individually, but this is inefficient and would not show any interactions between factors (e.g.,
perhaps a higher solder wave works better with colder solder than with hotter solder). To most
efficiently find desired improvements, and to examine interactions of factors, if necessary, we
can employ the statistical design of experiments (DOE). We can use DOE to find the optimum
process parameters for a defined use environment, or adapt it to find a robust design (i.e., one
well suited for a range of use environments). We can also evaluate the significance of our results
using tools from a family of methods called the analysis of variance (ANOVA).
8.1 Designing Experiments
The first consideration in designing an experiment is the identification of the factors to be

tested. These are the process parameters that we can control and that we believe will affect the
product. Examples given above were the temperature, wave height and use of flux in a wave
solder process. It is a good idea to have the factors selected by a team of people involved with
the process.
The next step is to set factor levels to be used as test settings. We will need a high and low
setting for each factor. It is possible to use more than two levels of a factor, but much simpler to
use only two. The two settings should be close enough together to assure that any difference in
the outcome caused by the factors is reasonably linear between the high and low test settings, and
far enough apart so any affects are noticeable.
"High" and "low" are arbitrary terms and it does not matter if the "low" value (of
temperature, for example) is actually greater than the "high" value. For parameters that are either
present or absent, like the use of flux, one setting, "high" or "low" will represent the presence of
the factor, and the other setting will represent the absence of the factor. For analysis purposes,
the values are coded, each high setting given the value plus one (indicated on test matrixes by
"+"), and each low setting given the value minus one ("-" on the test matrixes). Hence, if the
high value of temperature were 120 degrees and the low value were 80 degrees, a setting of 120
degrees would be valued as plus one and one of eighty degrees valued as minus one. A value of
zero would correspond to a setting of 100 degrees which is not a test setting, but may be a
solution after analysis (to be discussed later). We also need a measure of the process output.
This could be solder defects per card for the wave solder example, rod length for a rod
manufacturing process, or miles per gallon for an automobile. Notice we may want to minimize,
normalize or maximize the output. We will do so by setting process parameters to values
determined by the outcome of the experiment.
The key to test efficiency is the use of "orthogonal arrays". These are matrices describing
test settings that allow the effects of each factor to be separated from the others. An example for
a two parameter test is given by Table 8-1.
Table 8-1: Two Factor Orthogonal Array

Run A B
1 - -
2 + -
3 - +
4 + +
In Table 8-1, A and B represent test factors, such as temperature and wave height. A plus
in the matrix under a factor means the high setting is used during the corresponding test run and a
minus means that the low test setting is used. Each test run is a repetition of the established test
(e.g., processing 100 cards through the wave solder machine) with the settings as shown in the
matrix.
The orthogonal array of Table 8-1 is a full factorial array in that all combinations of high
and low settings for all factors are tested. Such an array will also permit the analysis of all
possible interactions between factors as a by-product. Expanding Table 8-1 to include the
interaction of the two factors, and providing a column for results, we get the matrix shown in
Table 8-2.
Table 8-2: Expanded Test Matrix

Factors Interactions Results
Run (test settings) (by-products) (measured outcomes)
A B A*B
1 - - +
2 + - -
3 - + -
4 + + +
Setting both factors A and B to "high" or both to "low" is equivalent to setting their
interaction to "high". For other combinations, the interaction is "low". The analysis will not
differentiate between factors that are set in the experiment and factors that are defined as by-
products.
The actual running of the tests should be done to minimize the probability of any bias from
factors that are not being tested. For example, if there were some concern that the workers
involved may make a difference, all the tests would be run with the same workers.
Environmental effects such as humidity could be compensated for by repeating the same test run
on different days and averaging the outcomes. The most ideal situation would use several
iterations of each run performed in a random order.
Once the outcome for each run is determined, a linear regression is performed to determine
the optimum settings for the parameters tested. The general form of this is shown in Equation
(8-1) for the array shown in Table 8-3.
Table 8-3: Orthogonal Array

Factors Interactions Results
Run (test settings) (by-products) (measured)
A B A*B
1 - - + y1
2 + - - y2
3 - + - y3
4 + + + y4
AVG- (y1 + y3)/2 (y1 + y2)/2 (y2 + y3)/2
AVG+ (y2 + y4)/2 (y3 + y4)/2 (y1 + y4)/2
∆ (AVG+) - (AVG-) for each column
∆A ∆B ∆(A * B)
Y = Y + A + B + (A * B) (8-1)
2 2 2
where:
Y =
expected output
Y =
average output = (y1 + y2 + y3 + y4)/4
∆A =
(AVG+) - (AVG-) values from column A in matrix
A =
coded value of A (high setting = +1, low setting = -1)
∆B , B, ∆C , C, similar to ∆A , A
AVG+ = Average outcome when the factor in a column is at a high setting
AVG- = Average outcome when the factor in a column is at its low setting
Equation 8-1 merely quantifies an assumption that the output will change from its mean
value linearly with changes in the factors. The difference between "AVG+" and "AVG-" ( ∆ ) for
each factor is the change in Y as the factor varies from -1 to +1 (its low and high values). These
values of " ∆ " are used in Equation 8-1 multiplied by the coded values of the factors. If A were
zero (the mid value between -1 and +1), it would have no impact changing the output from its
average value ( Y ). When A is -1, it reduces the output by ( ∆ A)/2, and when A is +1, it
increases the output by ( ∆ A)/2. Factors B and A*B have similar effects, based on the measured
differences in the output, ∆ B and ∆ (A*B), as the factors vary from low to high. Hence,
Equation 8-1 predicts the outcome of a process for any values of the factors between the high and
low values tested.
The regression equation is then used to find values of the factors (between plus and minus
one) which give the desired output (maximum, minimum or nominal). These values are then
translated to settings (e.g., temperature) to be used in the new process. To illustrate, let us use
some hypothetical values, as shown in Table 8-4.
Table 8-4: Sample Test Results

Run A B A*B Y
1 - - + 10
2 + - - 6
3 - + - 8
4 + + + 4
AVG- 9 8 7
AVG+ 5 6 7 Y=7
∆ -4 -2 0
Y = 7 - 2A - B + 0(A*B) = 7 - 2A - B
Using the equations given in Table 8-3, the outcomes listed in Table 8-4 result in the
regression equation Y = 7 - 2A - B. If Y represented some undesirable quantity (e.g., a defect
rate), we would want to minimize it. Hence, A and B would be set to their high values. In the
equation, the high settings are represented by plus one, so Y = 7 - 2 - 1 = 4. We would expect the
average defect rate to change from 7 to 4 by using the high settings of both factors.
If the outcome were something we want to maximize, such as miles per gallon, we would
maximize the regression equation by setting both A and B to their low settings (-1). In that case,
Y = 7 - (-2) - (-1) = 7 + 2 + 1 = 10. We would raise the average gas mileage from 7 to 10 mpg by
setting factors A and B to their low values.
Suppose the outcomes were a measurement that we wanted to be 5.0. Then we would set
the vales of A and B between plus and minus one so that Y = 5. One way would be to set A at
plus one and B at zero. Hence, Y = 7 - 2 - 0 = 5. A would be set at the high value represented by
plus one, and B would be set at a value mid way between the high value and the low value. This
may not be possible. If B is a parameter which can only be present or absent, it can only take the
values plus and minus one, and hence the above solution could not apply. In that case, B could
be set to plus one and A set to 0.5 (a setting three quarters of the difference between the low
setting and the high setting) for Y = 7 - (2 x 0.5) - 1 = 7 - 1 - 1 = 5.
The optimal setting should also consider the costs involved. It may, for example, be less
costly to change A than to change B.
Once the optimal settings are derived, it is good practice to perform a test at those settings
to confirm that the expected new output is indeed achieved. If it is not, it indicates there is
another factor at work that was not tested. This should be identified and made a factor in a new
set of experiments.
Once the improvements have been verified and initiated, new experiments can be devised
to see if settings outside the range tested can produce more improvements.
The simple array of our example can be expanded to handle any number of factors. There
will always be 2n test runs, where n = the number of factors. A three factor, full-factorial matrix
would be as shown in Table 8-5.
Table 8-5: Three Factor Full-Factorial Array

Factors (test settings) Interactions (by-products)
Run A B C A*B B*C A*C A*B*C Outcomes
1 - - - + + + -
2 + - - - + - +
3 - + - - - + +
4 + + - + - - -
5 - - + + - - +
6 + - + - - + -
7 - + + - + - -
8 + + + + + + +
8.1.1 Saturated Arrays: Economical, but Risky
In our numerical example (Table 8-4), the interaction of A and B had no effect. This is
often the case. When it is reasonably safe to assume there will be no interactions, a great
economy can be achieved by using saturated arrays. In these arrays, the interaction columns are
used to determine test settings for additional factors. For example, in the array of Table 8-2, a
factor "C" can be tested using the settings in the column representing the by-product A*B, as
shown in Table 8-6.
Table 8-6: Saturated Array (Table 8-2 Modified)

Factors
(test settings)
Run A B C Results
(replaces A*B) (outcomes)
1 - - +
2 + - -
3 - + -
4 + + +
Thus, three factors can be tested in the same number of tests used for two factors in a full-
factorial array. The matrix of Table 8-5 would permit the testing of seven factors in lieu of the
three it is designed for. The risk is, of course, that some interaction is significant and its effects
will be confounded with the new factor using the same settings. Saturated arrays are also called
Taguchi arrays after Genechi Taguchi, a leading proponent of their use. It is also possible to use
hybrid arrays in which some of the columns of expected interactions are not used for additional
factors, but other columns are. For example, triple interactions are quite rare, so the column in
Table 8-5 for A*B*C can often be commandeered for a new factor, providing the new factor is
not likely to interact with any of the others. The test will then show all the interactions of the
other factors and the effects of the new factor without any additional runs.
8.1.2 Testing for Robustness
We have considered only one outcome per experiment. It is also possible to use multiple
outcomes (dimensions, strength, defects) in order to observe the effects of the factors on all
important considerations. The best solution would then be the one that provided the best overall
results. Another variation is to perform the experiment under differing values of an
uncontrollable factor, such as atmospheric pressure. Desired "settings" of uncontrolled factors
can be obtained either by waiting for the factor to assume a desired test value or by using special
test equipment, such as environmental chambers, to simulate the factor. Again, the preferred
solution is a set of settings for the controllable factors which give the best overall results. For
example, Table 8-7 shows outcomes for three controlled and two uncontrolled variables. If the
desired result was the lowest output (it might represent defect rate, for example) and the
uncontrolled factors were equally likely to be at their low and high values, the settings for test
run 6 would be preferred, even though under some conditions other test settings produced better
outcomes.
Table 8-7: Testing for Robustness

Controlled Factors Settings Uncontrolled Factors "Settings"
Test A B C Uncontrolled D - + - +
Run Factors E - - + +
1 - - - 9.0 1.8 1.6 7.8
2 + - - 2.6 2.0 2.3 4.8
3 - + - 2.7 2.0 1.5 3.4
4 + + - 2.2 1.5 1.7 1.9
5 - - + 2.4 1.6 1.5 2.9
6 + - + 1.9 1.7 1.7 1.8
7 - + + 3.3 1.6 1.6 3.3
8 + + + 2.6 1.8 1.6 1.9
Finally, Taguchi uses outcomes expressed as signal to noise measures, which consider both
the mean and the variation in the output. Interested readers are invited to pursue these avenues
on their own. A comprehensive basic text on the subject is Understanding Industrial Designed
Experiments, by S.R. Schmidt and R.G. Launsby, Air Academy Press, Colorado Springs, CO,
1989.
8.2 Is There Really a Difference?
Recognizing the existence of variance in all data, one should always consider that a
difference in two measurements of a parameter, such as the measured outcome of an experiment,
might be due to chance and not to other factors. For example, consider the data shown in Table
8-8.
Table 8-8: Defect Data

Defects per 100 Units Produced
Day of the Week
Day Shift Night Shift
Monday 1 3
Tuesday 2 7
Wednesday 6 7
Thursday 4 6
Friday 7 2
Total 20 25
Average 4 5
There seems to be a difference between the defect rate of the day shift and that of the night
shift. However, there is a wide range in the data for each shift, and we should question the
validity of assuming there is really a difference between the shifts. To resolve the issue, we will
use a statistical technique from a family of techniques known as ANOVA.
ANOVA stands for Analysis of Variance, and is intended to provide means to separate the
influences of many different factors on a parameter of interest. The specific ANOVA application
which can determine the significance of the difference between two sets of data requires the
following assumptions:
1. The data points follow a normal distribution

2. The data points are independent from each other
3. Variability is about the same in each set of data
4. The data sets we are comparing have the same number of data points
The basic premise we shall use is that if the data sets were really different, there would be a
wider variation between the data sets than within the sets.
The variance within the data could be estimated from the variance in either data set, but
under our assumptions, we can use both data sets and calculate a quantity called the Mean Square
Error (MSE) from the formula:
k
2
∑ (n - 1) Si
MSE = 1 (8-2)
k(n - 1)
where:
k = number of data sets

n = number of measurements in each set
Si2 = Variance of one data set
and:
Si2 =∑
n ( y i - y i )2 (8-3)
1 n -1
where:
yi = value of one data point in a data set

yi = mean value of data in the set
The term k(n - 1) is called the degrees of freedom provided by the data. From the data in
Table 8-8, the degrees of freedom = 2(5 - 1) = 8.
The variance between groups, called the mean square error between groups (MSB) is
computed by:
∑ (y i - y )
n k 2
MSB = (8-4)
k -1 1
where:
y = mean of all data sets, all other terms as defined above

k - 1 = the associated degrees of freedom for MSB, = (2 - 1) = 1
If there is a real difference between data sets, the MSB should be greater than the MSE;
otherwise the ratio should be close to one. Under the assumptions listed, the ratio of MSB to
MSE follows an F distribution. Since this is so, we can use tables of the F distribution, such as
the one in Appendix F, to determine whether or not a ratio calculated from the measured data is
compatible with a ratio of 1.0 (i.e., the hypothesis that there is no real difference), within a
specified risk.
MSB
= F (8-5)
MSE
Using the data in Table 8-8,
5
MSB = [(4 - 4.5) 2 + (5 - 4.5) 2 ] = 2.5 (8-6)
2 -1
2
(5 - 1)Sday + (5 - 1)S 2night
MSE = (8-7)
2(5 - 1)
( )
5 2
∑ y i - y day
2
S day = 1 =
(1 - 4)2 + (2 - 4)2 + (6 - 4)2 + (4 − 4)2 + (7 - 4)2
5 -1 4
=
(- 3)2 + (- 2)2 + (2)2 + (0)2 + (3)2 =
9+4+4+0+9
=
26
= 6.5 (8-8)
4 4 4
( )
5 2
∑ y i - y night
S 2night = 1 =
(3 - 5)2 + (7 - 5)2 + (7 - 5)2 + (6 - 5)2 + (2 - 5)2
5 -1 4
=
(- 2)2 + (2)2 + (2)2 + (1)2 + (- 3)2 =
4 + 4 + 4 +1+ 9
=
22
= 5.5 (8-9)
4 4 4
Thus:
4(6.5) + 4(5.5) 26 + 22 48
MSE = = = = 6 (8-10)
8 8 8
and:
MSB 2.5
F = = = 0.417
MSE 6
Tables of the F distribution are organized by degrees of freedom and percentiles (see
Appendix F). The degrees of freedom are a function of the data, and the percentile is equivalent
to the risk we take (i.e., the probability of being wrong). As discussed many times earlier in this
text, it marks the border of an area of the distribution equal to our acceptable risk (i.e., the
probability of the statistic being in the area cut off is equal to our defined risk when the
hypothesis that there is no difference in the data is true). If we are willing to be wrong no more
than 5% of the time, we use the 0.05 percentile table. Excerpts from such a table are given in
Table 8-9. (To avoid extrapolation, we extracted Table 8-9 from a more extensive table than the
one in Appendix F, which does not have data points for eight degrees of freedom.)
Table 8-9: Critical Values for F at 0.05 Significance

Critical values of F for 0.05 risk
Degrees of Degrees of Freedom for MSB
Freedom for
MSE 1 2 3 4 5
1 161 200 216 225 230
2 18.5 19.0 19.2 19.2 19.3
4 7.71 6.94 6.59 6.39 6.26
8 5.32* 4.46 4.07 3.84 3.69
* Critical value for data used in the example
The table lists what are called critical values. If our calculated F statistic exceeds the value
in the table for the degrees of freedom provided by the data, we can reject the hypothesis that
there is no real difference with no more than a risk equal to the percentile (5% using the above
table). In this case, the table value for 1,8 degrees of freedom is 5.32 and our calculated F
statistic is 0.417. Hence, we must accept the hypothesis that there is no real difference between
the data sets.
This procedure can be used to test the significance of differences in outcomes of statistical
experiments to avoid making expensive process changes when the differences are caused by
statistical variation rather than the factors tested.
8.3 How Strong is the Correlation?
Often it is useful to know the relationship between two variables, such as the outcome of an
experiment and one of the factors. A simple way is to plot paired measurements on a scatter
diagram. For example, if we want to analyze the relationship between the defect rate of a wave
solder process and the solder temperature, we could measure defect rate at various temperatures
and plot the results on a chart with temperature as one axis and defect rate as the other. Figure
8-1 shows such a plot.
o o
o
DEFECT RATE .
o o
o
o
o o
o o
o o
o
o
SOLDER TEMPERATURE
Figure 8-1: Scattergram
In interpreting the scatter diagram, it is important to note that the slope of a line drawn
through the cloud of data points is an artifact of the scales of the axes. Hence, unlike most
charts, the slope of a scattergram is not an indicator of correlation. Rather, the width of the cloud
is the indicator. The narrower the cloud, the better the correlation.
Although an eyeball analysis may be sufficient for many uses, a quantitative evaluation of
correlation is often quite useful. This is easily, if somewhat tediously, accomplished using the
correlation coefficient.
∑ Dx Dy
r = (8-11)
(∑ Dx )(∑ Dy )
2 2
where:
Dx = the difference between a value of x and the mean of the values of x = ( x i - x)

Dy = the difference between a value of y and the mean of the values of y = ( y i - y)
Perfect correlation would be shown by a correlation coefficient equal to one. A negative

result shows an inverse correlation (as one factor increases, the other decreases) and a minus one
is a perfect inverse correlation. A figure of zero indicates no relationship between the variables.
To illustrate, let us use the following data shown in Table 8-10.
Table 8-10: Paired Data

Data Pair x (Temperature) y (Defect Rate)
1 250 0.030
2 255 0.025
3 260 0.040
4 265 0.030
5 270 0.020
6 275 0.010
7 280 0.035
8 285 0.020
9 290 0.010
10 295 0.015
The scattergram for this data is given in Figure 8-2.
.040
.030
Defect
Rate
.020
.010
250 255 260 270 280 290 295

Temperature
Figure 8-2: Scattergram of Data in Table 8-10
This appears to show some correlation, but the data cloud is wide, and the correlation may
or may not be significant. To resolve the issue, we will compute the correlation coefficient.
Computing the terms we need to solve for the correlation coefficient is made easier by
using a data table such as the following shown in Table 8-11.
Table 8-11: Data Analysis

Data Dx 2 Dy
x y Dx Dy2 DxDy
Pair (x - x ) (y - y )
1 250 0.030 22.5 506.25 -0.0065 0.00004225 -0.14625
2 255 0.025 17.5 306.25 -0.0015 0.00000225 -0.02625
3 260 0.040 12.5 156.25 -0.0165 0.00027255 -0.20625
4 265 0.030 7.5 56.25 -0.0065 0.00004225 -0.04875
5 270 0.020 2.5 6.25 +0.0035 0.00001225 +0.00875
6 275 0.010 -2.5 6.25 +0.0135 0.00018225 -0.03375
7 280 0.035 -7.5 56.25 -0.0115 0.00013225 +0.08625
8 285 0.020 -12.5 156.25 +0.0035 0.00001225 -0.04375
9 290 0.010 -17.5 306.25 +0.0135 0.00018225 -0.23625
10 295 0.015 -22.5 506.25 +0.0085 0.00007225 -0.19125
2 2
x y 0 ∑ Dx 0 ∑ Dy ∑ DxDy
272.5 0.0235 2062.5 0.0009528 -0.8375
Solving Equation 8-11:
- 0.8375
r = = - 0.5974 (8-12)
(2062.5)(0.0009528)
The result shows a fair negative correlation between temperature and defect rate. As one
goes down the other goes up. However, there is a lot of noise in the data and predicting one
factor from the other can be done only roughly.
Caveat: Note that correlation does not necessarily mean causation. In the example,
changes in defect rate may be caused by changes in temperature, but it is possible that both are
changing in response to some other factor. For example, the speed of the solder flow might be
the ultimate cause of both temperature changes and defects. Also, one can imagine scenarios
where the defect rate could cause temperature changes (suppose the operator reacted to defects by
adjusting the machine in a way that changes the operating temperature). The moral is that to
improve the process, isolation of the truly critical factors is required.
STAT Section 9: Closing Comments 83
9.0 CLOSING COMMENTS
While both statistics and reliability engineering encompass far more than this book
attempts to cover, the two disciplines intersect quite often in the problems encountered by the
reliability engineer. The Reliability Analysis Center hopes the reader has found this book a
relatively painless introduction to the world of statistics, and a useful reference for its practical
application in reliability engineering.
STAT Appendix A: Poisson Probabilities 85
Appendix A:
Poisson Probabilities
STAT Appendix A: Poisson Probabilities 87
Appendix A:
Poisson Probabilities
The probability of "x" events occurring, when "a" are expected.
Values Values of "a"

of "x" 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0 0.9048 0.8187 0.7408 0.6703 0.6065 0.5488 0.4966 0.4493 0.4066
1 0.0905 0.1637 0.2222 0.2681 0.3033 0.3293 0.3476 0.3595 0.3659
2 0.0045 0.0164 0.0333 0.0536 0.0758 0.0988 0.1217 0.1438 0.1647
3 0.0002 0.0011 0.0033 0.0072 0.0126 0.0198 0.0284 0.0383 0.0494
4 0.0000 0.0001 0.0003 0.0007 0.0016 0.0030 0.0050 0.0077 0.0111
5 0.0000 0.0000 0.0000 0.0001 0.0002 0.0004 0.0006 0.0012 0.0020
6 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0002 0.0003
Values of Values of "a"

"x" 1.0 2.0 3.0 4.0 5.0 10.0 15.0 20.0
0 0.3679 0.1353 0.0498 0.0183 0.0067 0.0000 0.0000 0.0000
1 0.3679 0.2707 0.1494 0.0733 0.0337 0.0005 0.0000 0.0000
2 0.1839 0.2707 0.2240 0.1465 0.0842 0.0023 0.0000 0.0000
3 0.0613 0.1804 0.2240 0.1954 0.1404 0.0076 0.0002 0.0000
4 0.0153 0.0902 0.1680 0.1954 0.1755 0.0189 0.0006 0.0000
5 0.0031 0.0361 0.1008 0.1563 0.1755 0.0378 0.0019 0.0001
6 0.0005 0.0120 0.0504 0.1042 0.1462 0.0631 0.0048 0.0002
7 0.0001 0.0034 0.0216 0.0595 0.1044 0.0901 0.0104 0.0005
8 0.0000 0.0009 0.0081 0.0298 0.0653 0.1126 0.0194 0.0013
9 0.0000 0.0002 0.0027 0.0132 0.0363 0.1251 0.0324 0.0029
10 0.0000 0.0001 0.0008 0.0053 0.0181 0.1251 0.0486 0.0058
11 0.0000 0.0002 0.0019 0.0082 0.1137 0.0663 0.0106
12 0.0000 0.0001 0.0006 0.0034 0.0948 0.0829 0.0176
13 0.0001 0.0002 0.0013 0.0729 0.0956 0.0271
14 0.0000 0.0001 0.0005 0.0521 0.1024 0.0387
15 0.0000 0.0000 0.0002 0.0347 0.1024 0.0516
16 0.0000 0.0001 0.0217 0.0960 0.0646
17 0.0000 0.0000 0.0128 0.0847 0.0760
18 0.0000 0.0071 0.0706 0.0844
19 0.0037 0.0557 0.0888
20 0.0019 0.0418 0.0888
21 0.0009 0.0299 0.0846
22 0.0004 0.0204 0.0769
23 0.0001 0.0133 0.0669
24 0.0001 0.0083 0.0557
25 0.0000 0.0050 0.0446
26 0.0000 0.0029 0.0343
27 0.0000 0.0016 0.0254
28 0.0009 0.0181
29 0.0004 0.0125
30 0.0003 0.0083
31 0.0001 0.0054
32 0.0001 0.0034
33 0.0000 0.0020
34 0.0000 0.0012
35 0.0000 0.0007
36 0.0004
37 0.0003
38 0.0002
39 0.0001
STAT Appendix B: Cumulative Poisson Probabilities 89
Appendix B:
Cumulative Poisson Probabilities
STAT Appendix B: Cumulative Poisson Probabilities 91
Appendix B:
Cumulative Poisson Probabilities
The probability of "x or less" events occurring, when "a" are expected.
Values Values of "a"

of "x" 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0 0.9084 0.8187 0.7408 0.6703 0.6065 0.5488 0.4966 0.4493 0.4066
1 0.9953 0.9824 0.9630 0.9384 0.9098 0.8781 0.8442 0.8088 0.7725
2 0.9998 0.9988 0.9963 0.9920 0.9856 0.9769 0.9659 0.9526 0.9372
3 1.0000 0.9999 0.9966 0.9992 0.9982 0.9967 0.9943 0.9909 0.9866
4 1.0000 1.0000 1.0000 0.9999 0.9998 0.9997 0.9993 0.9986 0.9977
5 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998 0.9990
6 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
Values of Values of "a"

"x" 1.0 2.0 3.0 4.0 5.0 10.0 15.0 20.0
0 0.3679 0.1353 0.0498 0.0183 0.0067 0.0000 0.0000 0.0000
1 0.7358 0.4060 0.1992 0.0916 0.0404 0.0005 0.0000 0.0000
2 0.9197 0.6767 0.4232 0.2381 0.1246 0.0028 0.0000 0.0000
3 0.9810 0.8571 0.6472 0.4335 0.2650 0.0104 0.0002 0.0000
4 0.9963 0.9473 0.8152 0.6289 0.4405 0.0293 0.0008 0.0000
5 0.9994 0.9834 0.9160 0.7852 0.6160 0.0671 0.0027 0.0001
6 0.9999 0.9954 0.9664 0.8894 0.7622 0.1302 0.0075 0.0003
7 1.000 0.9988 0.9880 0.9489 0.8666 0.2203 0.0179 0.0008
8 1.000 0.9997 0.9961 0.9787 0.9319 0.3329 0.0373 0.0021
9 1.000 0.9999 0.9988 0.9919 0.9682 0.4580 0.0697 0.0050
10 1.0000 0.9996 0.9972 0.9863 0.5831 0.1183 0.0108
11 1.0000 0.9998 0.9991 0.9945 0.6968 0.1846 0.0214
12 1.0000 0.9999 0.9997 0.9979 0.7916 0.2675 0.0390
13 1.000 0.9999 0.9992 0.8645 0.3631 0.0661
14 1.000 1.000 0.9997 0.9166 0.4655 0.1048
15 1.000 1.000 0.9999 0.9513 0.5679 0.1564
16 1.000 1.000 0.9730 0.6639 0.2210
17 0.9858 0.7486 0.2970
18 0.9929 0.8192 0.3814
19 0.9966 0.8749 0.4702
20 0.9985 0.9167 0.5590
21 0.9994 0.9466 0.6436
22 0.9998 0.9670 0.7205
23 0.9999 0.9803 0.7874
24 1.000 0.9886 0.8431
25 0.9936 0.8877
26 0.9965 0.9220
27 0.9981 0.9474
28 0.9990 0.9655
29 0.9994 0.9780
30 0.9997 0.9863
31 0.9999 0.9917
32 1.0000 0.9951
33 1.0000 0.9971
34 1.0000 0.9983
35 0.9990
36 0.9994
37 0.9997
38 0.9999
39 1.000
STAT Appendix C: The Standard Normal Distribution 93
Appendix C:
The Standard Normal Distribution
STAT Appendix C: The Standard Normal Distribution 95
Appendix C:
The Standard Normal Distribution
0 z
Figures are the areas under the curve between 0 (the mean value of Z) and Z. Note that the
same figures apply to the areas from 0 to -Z.
The area from -Z to Z is simply twice the values shown.
The areas in the tails from Z to plus infinity are one (the total area under the curve) minus
0.5 (the area from minus infinity to 0) minus the figures shown in the table (the areas between 0
and Z). The same formula computes the area under the tail of the curve between -Z and minus
infinity.
To find the area in both tails outside of the range -Z to Z, multiply the figures given in the
table (the areas from 0 to Z) by two (yielding the areas from -Z to Z), and subtract these from one
(the total area under the curve).
Z Area 0 to Z Z Area 0 to Z
0.1 0.0398 2.0 0.4772
0.2 0.0793 2.1 0.4821
0.3 0.1179 2.2 0.4861
0.4 0.1554 2.3 0.4893
0.5 0.1915 2.4 0.4918
0.6 0.2257 2.5 0.4938
0.7 0.2580 2.6 0.4953
0.8 0.2881 2.7 0.4965
0.9 0.3159 2.8 0.4974
1.0 0.3413 2.9 0.4981
1.1 0.3643 3.0 0.4987
1.2 0.3849 3.1 0.4990
1.3 0.4032 3.2 0.4993
1.4 0.9192 3.3 0.4995
1.5 0.4332 3.4 0.4997
1.6 0.4452
1.7 0.4554
1.8 0.4641
1.9 0.4613 ∞ 0.5000
STAT Appendix D: The Chi-Square Distribution 97
Appendix D:
The Chi-Square Distribution
STAT Appendix D: The Chi-Square Distribution 99
Appendix D:
The Chi-Square Distribution
Degrees of Chi-square value when the area under the curve from the value to infinity is:
Freedom 0.99 0.98 0.95 0.90 0.80 0.70 0.50
1 0.000157 0.000628 0.00393 0.0158 0.0642 0.148 0.455
2 0.0201 0.0404 0.103 0.211 0.446 0.713 1.386
3 0.115 0.185 0.352 0.584 1.005 1.424 2.366
4 0.297 0.429 0.711 1.064 1.649 2.195 3.357
5 0.554 0.752 1.145 1.610 2.343 3.000 4.351
6 0.872 1.134 1.635 2.204 3.070 3.828 5.348
7 1.239 1.564 2.167 2.833 3.822 4.671 6.346
8 1.646 2.032 2.733 3.490 4.594 5.527 7.344
9 2.088 2.532 3.325 4.168 5.380 6.393 8.343
10 2.558 3.059 3.940 4.865 6.179 7.267 9.342
11 3.053 3.609 4.575 5.578 6.989 8.148 10.341
12 3.571 4.178 5.226 6.304 7.807 9.034 11.340
13 4.107 4.765 5.892 7.042 8.634 9.926 12.340
14 4.660 5.368 6.571 7.790 9.467 10.821 13.339
15 5.229 5.985 7.261 8.547 10.307 11.721 14.339
16 5.812 6.614 7.962 9.312 11.152 12.624 15.338
17 6.408 7.255 8.672 10.085 12.002 13.531 16.338
18 7.015 7.906 9.390 10.865 12.857 14.440 17.338
19 7.633 8.567 10.117 11.651 13.716 15.352 18.338
20 8.260 9.237 10.851 12.443 14.578 16.266 19.337
21 8.897 9.915 11.591 13.240 15.445 17.182 20.337
22 9.542 10.600 12.338 14.041 16.314 18.101 21.337
23 10.196 11.293 13.091 14.848 17.187 19.021 22.337
24 10.856 11.992 13.848 15.659 18.062 19.943 23.337
25 11.524 12.697 14.611 16.473 18.940 20.867 24.337
26 12.198 13.409 15.379 17.292 19.820 21.792 25.336
27 12.879 14.125 16.151 18.114 20.703 22.719 26.336
28 13.565 14.847 16.928 18.939 21.588 23.647 27.336
29 14.256 15.574 17.708 19.768 22.475 24.577 28.336
30 14.953 16.306 18.493 20.599 23.364 25.508 29.336
Degrees of Chi-square value when the area under the curve from the value to infinity is:
Freedom 0.30 0.20 0.10 0.05 0.02 0.01
1 1.074 1.642 2.706 3.841 5.412 6.635
2 2.408 3.219 4.605 5.991 7.824 9.210
3 3.665 4.642 6.251 7.815 9.837 11.341
4 4.878 5.989 7.779 9.488 11.668 13.277
5 6.064 7.289 9.236 11.070 13.388 15.086
6 7.231 8.558 10.645 12.592 15.033 16.812
7 8.383 9.803 12.017 14.067 16.622 18.475
8 9.524 11.030 13.362 15.507 18.168 20.090
9 10.656 12.242 14.684 16.919 19.679 21.666
10 11.781 13.442 15.987 18.307 21.161 23.209
11 12.899 14.631 17.275 19.675 22.618 24.725
12 14.011 15.812 18.549 21.026 24.054 26.217
13 15.119 16.985 19.812 22.362 25.472 27.688
14 16.222 18.151 21.064 23.685 26.873 29.141
15 17.322 19.311 22.307 24.996 28.259 30.578
16 18.418 20.465 23.542 26.296 29.633 32.000
17 19.511 21.615 24.769 27.587 30.995 33.409
18 20.601 22.760 25.989 28.869 32.346 34.805
19 21.689 23.900 27.204 30.144 33.687 36.191
20 22.775 25.038 28.412 31.410 35.020 37.566
21 23.858 26.171 29.615 32.671 36.343 39.932
22 24.939 27.301 30.813 33.924 37.659 40.289
23 26.108 28.429 32.007 35.172 38.968 41.638
24 27.096 29.553 33.196 36.415 40.270 42.980
25 28.172 30.675 34.382 37.652 41.566 44.314
26 29.246 31.795 35.563 38.885 42.856 45.642
27 30.319 32.912 36.741 40.113 44.140 46.963
28 31.391 34.027 37.916 41.337 45.419 48.278
29 32.461 35.139 39.087 42.557 46.693 49.588
30 33.530 36.250 40.256 43.773 47.962 50.892
STAT Appendix E: The Student t Distribution 101
Appendix E:
The Student t Distribution
STAT Appendix E: The Student t Distribution 103
Appendix E:
The Student t Distribution
Degrees of Freedom = Sample Size - 1
Degrees of Value of t when area from - ∞ to t is:

Freedom 0.75 0.90 0.95 0.975 0.99 0.995 0.9995
1 1.000 3.078 6.314 12.706 31.821 63.657 636.619
2 0.816 1.886 2.920 4.303 6.965 9.925 31.599
3 0.765 1.638 2.353 3.182 4.541 5.841 12.924
4 0.741 1.533 2.132 2.776 3.747 4.604 8.610
5 0.727 1.476 2.015 2.571 3.365 4.032 6.869
6 0.718 1.440 1.943 2.447 3.143 3.707 5.959
7 0.711 1.415 1.895 2.365 2.998 3.499 5.408
8 0.706 1.397 1.860 2.306 2.896 3.355 5.041
9 0.703 1.383 1.833 2.262 2.821 3.250 4.781
10 0.700 1.372 1.812 2.228 2.764 3.169 4.587
11 0.697 1.363 1.796 2.201 2.718 3.106 4.437
12 0.695 1.356 1.782 2.179 2.681 3.055 4.318
13 0.694 1.350 1.771 2.160 2.650 3.012 4.221
14 0.692 1.345 1.761 2.145 2.624 2.977 4.140
15 0.691 1.341 1.753 2.131 2.602 2.947 4.073
16 0.690 1.337 1.746 2.120 2.583 2.921 4.015
17 0.689 1.333 1.740 2.110 2.567 2.898 3.965
18 0.688 1.330 1.734 2.101 2.552 2.878 3.922
19 0.688 1.328 1.729 2.093 2.539 2.861 3.883
20 0.687 1.325 1.725 2.086 2.528 2.845 3.850
21 0.686 1.323 1.721 2.080 2.518 2.831 3.819
22 0.686 1.321 1.717 2.074 2.508 2.819 3.792
23 0.685 1.319 1.714 2.069 2.500 2.807 3.768
24 0.685 1.318 1.711 2.064 2.492 2.797 3.745
25 0.684 1.316 1.708 2.060 2.485 2.787 3.725
26 0.684 1.315 1.706 2.056 2.479 2.779 3.707
27 0.684 1.314 1.703 2.052 2.473 2.771 3.690
28 0.683 1.313 1.701 2.048 2.467 2.763 3.674
29 0.683 1.311 1.699 2.045 2.462 2.756 3.659
30 0.683 1.310 1.697 2.042 2.457 2.750 3.646
40 0.681 1.310 1.684 2.021 2.423 2.704 3.551
60 0.679 1.296 1.671 2.000 2.390 2.660 3.460
120 0.677 1.289 1.658 1.980 2.358 2.617 3.373
Inf. 0.674 1.282 1.645 1.960 2.326 2.576 3.291
STAT Appendix F: Critical Values of the F Distribution for Tests of Significance 105
Appendix F:
Critical Values of the F Distribution for Tests of Significance
STAT Appendix F: Critical Values of the F Distribution for Tests of Significance 107
Appendix F:
Critical Values of the F Distribution for Tests of Significance
for 1% risk
d.o.f. Degrees of Freedom for MSB
MSE 1 2 3 4 5 10 20 40 inf.
1 4052 5000 5403 5625 5764 6056 6210 6290 6370
2 98.5 99.0 99.2 99.2 99.3 99.4 99.4 99.5 99.5
3 34.1 30.8 29.5 28.7 28.2 27.2 26.7 26.4 26.1
4 21.2 18.0 16.7 16.0 15.5 14.5 14.0 13.7 13.5
5 16.3 13.3 12.1 11.4 11.0 10.1 9.55 9.29 9.02
10 10.0 7.56 6.55 5.99 5.64 4.85 4.41 4.17 3.91
20 8.10 5.85 4.94 4.43 4.10 3.37 2.94 2.69 2.42
40 7.31 5.18 4.31 3.83 3.51 2.80 2.37 2.11 1.80
inf. 6.63 4.61 3.78 3.32 3.02 2.32 1.88 1.59 1.00
for 5% risk
MSE 1 2 3 4 5 10 20 40 inf.
1 161 200 216 225 230 242 248 251 254
2 18.5 19.0 19.2 19.2 19.3 19.4 19.4 19.5 19.5
3 10.1 9.55 9.28 9.12 9.01 8.79 8.66 8.59 8.53
4 7.71 6.94 6.59 6.39 6.26 5.96 5.80 5.72 5.63
5 6.61 5.79 5.41 5.19 5.05 4.74 4.56 4.46 4.36
10 4.96 4.10 3.71 3.48 3.33 2.98 2.77 2.66 2.54
20 4.35 3.49 3.10 2.87 2.71 2.35 2.12 1.99 1.84
40 4.08 3.23 2.84 2.61 2.45 2.08 1.84 1.69 1.51
inf. 3.84 3.00 2.60 2.37 2.21 1.83 1.57 1.39 1.00
for 10% risk

MSE 1 2 3 4 5 10 20 40 inf.
1 39.9 49.5 53.6 55.8 57.2 60.2 61.7 62.5 63.3
2 8.53 9.00 9.16 9.24 9.29 9.39 9.44 9.47 9.49
3 5.54 5.46 5.39 5.34 5.31 5.23 5.18 5.16 5.13
4 4.54 4.32 4.19 4.11 4.05 3.92 3.84 3.80 3.76
5 4.06 3.78 3.62 3.52 3.45 3.30 3.21 3.16 3.10
10 3.28 2.92 2.73 2.61 2.52 2.32 2.20 2.13 2.06
20 2.97 2.59 2.38 2.25 2.16 1.94 1.79 1.71 1.61
40 2.84 2.44 2.23 2.09 2.00 1.76 1.61 1.51 1.38
inf. 2.71 2.30 2.08 1.94 1.85 1.60 1.42 1.30 1.00

Practical Statistical Tools For The Reliability Engineer

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Practical Statistical Tools For The Reliability Engineer

Uploaded by

Copyright:

Available Formats

Practical Statistical

Reliability Analysis Center

RAC is a DoD Information Analysis Center sponsored by the Defense Technical

Practical Statistical Tools for the

Reliability Analysis Center

Practical Statistical Tools for the Reliability Engineer

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION

12a. DISTRIBUTION/AVAILABILITY STATEMENT 12b. DISTRIBUTION CODE

13. ABSTRACT (Maximum 200 words)

14. SUBJECT TERMS 15. NUMBER OF PAGES

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)

REQUESTS FOR TECHNICAL ASSISTANCE ALL OTHER REQUESTS SHOULD BE

Reliability Analysis Center Air Force Research Laboratory

© 1999, IIT Research Institute

This page intentionally left blank.

TABLE OF CONTENTS (CONT’D)

It is important to remember that reliability engineering is not just the application of

This page intentionally left blank.

1.0 WHAT YOU NEED TO KNOW ABOUT PROBABILITY

Statistical inferences will often be expressed as probabilities. Probability can be defined as

In dealing with probabilities, some useful relationships can be applied, depending on

1.1 When Events are Independent

P(a and b) = P(a)P(b) (1-1)

P = (0.01)(0.02) = 0.0002 (1-2)

1.2 When Events are Mutually Exclusive

P(a or b) = P(a) + P(b) (1-3)

P(s) = P(g)P(r) + P(g)Q(r) + Q(g)P(r) (1-5)

P(s) = (0.99)(0.98) + (0.99)(0.02) + (0.01)(0.98) = 0.9702 + 0.0198 + 0.0098 = 0.9998 (1-6)

P(s) = 1.0 - Q(g)Q(r) = 1.0 - 0.0002 = 0.9998 (1-7)

P(s) = P(g) + P(r) - P(g)P(r) (1-8)

where all terms are as defined for Equation 1-5.

P(g) = P(g)P(r) + P(g)Q(r) (1-9)

P(r) = P(r)P(g) + P(r)Q(g) = P(g)P(r) + Q(g)P(r) (1-10)

Substituting Equations 1-9 and 1-10 into Equation 1-8 we get:

P(s) = P(g)P(r) + P(g)Q(r) + P(g)P(r) + Q(g)P(r) - P(g)P(r) (1-11)

which is the same result as Equation 1-5.

1.3 When Events are Not Independent

P(a and b) = P(a)P(b|a) (1-12)

P(a and b) = P(b)P(a|b) (1-13)

P(b|a) = 0.10(1) + 0.90(0.019) (1-14)

= 0.001 [0.1 + 0.017] = 0.0001 + 0.00017 = 0.00027 (1-15)

1.3.1 Bayes’ Theorem

Table 1-1: Known Data

Table 1-2: Converted Data

Since P(a and b) = P(a|b) P(b) = P(b|a) P(a), it follows that:

P(ai) P(b) = P(ai|b) P(b) = P(b|ai) P(ai) (1-16)

P(ai|b) = P(b|ai) P(ai)/P(b) (1-17)

P(a1|b) = P(b|a1) P(a1)/P(b) (1-18)

P(b) = Σ P(b|ai) P(ai) (1-19)

Substituting Equation 1-19 into Equation 1-18:

Substituting the data from Table 1-2 into Equation 1-20:

Table 1-3 summarizes the material presented in this section.

Table 1-3: Summary of Section 1

2.0 INTRODUCTION TO STATISTICS

"Statistics" as a mathematical discipline is concerned with describing something in useful

2.1 Many Ways to be "Average"

Table 2-1: Salary Data

2.2 Ways to Measure Spread

Table 2-2: Spread Analysis

2.3 Introduction to Distributions

Table 2-3: Experimental Data

Table 2-4: Comparison of Results

2.4 Testing Hypotheses