A Brief Introduction to Design of Experiments
Jacqueline K. Telford
BACKGROUND
Would you like to be sure that you will be able to draw valid and definitive conclusions from your data with the minimum use of resources? If so, you should be using design of experiments. Design of experiments, also called experimental design, is a structured and organized way of conducting and analyzing controlled tests to evaluate the factors that are affecting a response variable. The design of experiments specifies the particular setting levels of the combinations of factors at which the individual runs in the experiment are to be conducted. This multivariable testing method varies the factors simultaneously. Because the factors are varied independently of each other, a causal predictive model can be determined. Data obtained from observational studies, or other data not collected in accordance with a design of experiments approach, can establish only correlation, not causality. There are also problems with the traditional experimental method of changing one factor at a time: it is inefficient, and it cannot determine effects that are caused by several factors acting in combination.

224 Johns Hopkins APL Technical Digest, Volume 27, Number 3 (2007)

BRIEF HISTORY

Design of experiments was invented by Ronald A. Fisher in the 1920s and 1930s at Rothamsted Experimental Station, an agricultural research station 25 miles north of London. In Fisher's first book on design of experiments [1] he showed how valid conclusions could be drawn efficiently from experiments with natural fluctuations such as temperature, soil conditions, and rainfall, that is, in the presence of nuisance variables. Known nuisance variables usually cause systematic biases in groups of results (e.g., batch-to-batch variation). Unknown nuisance variables usually cause random variability in the results and are called inherent variability or noise. Although the experimental design method was first used in an agricultural context, it has been applied successfully in the military and in industry since the 1940s. Besse Day, working at the U.S. Naval Experimentation Laboratory, used experimental design to solve problems such as finding the cause of bad welds at a naval shipyard during World War II. George Box, employed by Imperial Chemical Industries before coming to the United States, is a leading developer of experimental design procedures for optimizing chemical processes. W. Edwards Deming taught statistical methods, including experimental design, to Japanese scientists and engineers in the early 1950s [2], at a time when "Made in Japan" meant poor quality. Genichi Taguchi, the most well known of this group of Japanese scientists, is famous for his quality improvement methods. One of the companies where Taguchi first applied his methods was Toyota. Since the late 1970s, U.S. industry has again become interested in quality improvement initiatives, now known as "Total Quality" and "Six Sigma" programs. Design of experiments is considered an advanced method in Six Sigma programs, which were pioneered at Motorola and GE.

FUNDAMENTAL PRINCIPLES

The fundamental principles in design of experiments are solutions to the problems in experimentation posed by the two types of nuisance factors and serve to improve the efficiency of experiments. Those fundamental principles are

Randomization
Replication
Blocking
Orthogonality
Factorial experimentation

Randomization is a method that protects against an unknown bias distorting the results of the experiment. An example of a bias is instrument drift in an experiment comparing a baseline procedure to a new procedure. If all the tests using the baseline procedure are conducted first, and then all the tests using the new procedure are conducted, the observed difference between the procedures might be entirely due to instrument drift. To guard against erroneous conclusions, the testing sequence of the baseline (B) and new (N) procedures should be in random order, such as B, N, N, B, N, B, and so on. The instrument drift or any other unknown bias should then average out.

Replication increases the sample size and is a method for increasing the precision of the experiment. Replication increases the signal-to-noise ratio when the noise originates from uncontrollable nuisance variables. A replicate is a complete repetition of the same experimental conditions, beginning with the initial setup. A special design called a Split Plot can be used if some of the factors are hard to vary.

Blocking is a method for increasing precision by removing the effect of known nuisance factors. An example of a known nuisance factor is batch-to-batch variability. In a blocked design, both the baseline and new procedures are applied to samples of material from one batch, then to samples from another batch, and so on. The difference between the new and baseline procedures is then not influenced by the batch-to-batch differences. Blocking is a restriction of complete randomization, since both procedures are always applied to each batch. Blocking increases precision because the batch-to-batch variability is removed from the experimental error.

Orthogonality in an experiment results in the factor effects being uncorrelated and therefore more easily interpreted. The factors in an orthogonal experimental design are varied independently of each other. The main results of data collected using such a design can often be summarized by taking differences of averages and can be shown graphically with simple plots of suitably chosen sets of averages. In these days of powerful computers and software, orthogonality is no longer a necessity, but it is still a desirable property because of the ease of explaining results.

Factorial experimentation is a method in which the effects due to each factor and to combinations of factors are estimated. Factorial designs are geometrically constructed and vary all the factors simultaneously and orthogonally. Factorial designs collect data at the vertices of a cube in p dimensions (p is the number of factors being studied). If data are collected from all of the vertices, the design is a full factorial, requiring 2^p runs. Since the total number of combinations increases exponentially with the number of factors studied, fractions of the full factorial design can be constructed. As the number of factors increases, the fractions become smaller and smaller (1/2, 1/4, 1/8, 1/16, ...). Fractional factorial designs collect data from a specific subset of all possible vertices and require 2^(p-q) runs, with 2^(-q) being the fractional size of the design. If there are only three factors in the experiment, the geometry of the experimental design for a full factorial experiment requires eight runs, and a one-half fractional factorial experiment (an inscribed tetrahedron) requires four runs (Fig. 1).
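The factorial geometry just described is easy to generate programmatically. The sketch below (Python; the helper names are ours, not the article's) builds the 2^3 full factorial and the half fraction defined by the generator C = AB, then shuffles the run order in keeping with the randomization principle:

```python
from itertools import product
import random

# Full 2^p factorial: one run at every vertex of the p-dimensional cube,
# with each factor coded -1 (low) or +1 (high).
def full_factorial(p):
    return list(product([-1, +1], repeat=p))

# One-half fraction for p = 3 using the generator C = AB: keep the vertices
# where the product of all three coordinates is +1 -- geometrically, the
# inscribed tetrahedron of Fig. 1, requiring 2^(3-1) = 4 runs.
def half_fraction_3():
    return [run for run in full_factorial(3) if run[0] * run[1] * run[2] == +1]

runs = half_fraction_3()
random.shuffle(runs)  # randomize the run order to guard against unknown biases
print(len(full_factorial(3)), len(runs))  # 8 4
```
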
Factorial designs, including fractional factorials, have increased precision over other types of designs because they have built-in internal replication. Factor effects are essentially the difference between the average of all runs at the two levels for a factor, such as high and low. Replicates of the same points are not needed in a factorial design, which seems like a violation of the replication principle in design of experiments. However, half of all the data points are taken at the high level and the other half are taken at the low level of each factor, resulting in a very large number of replicates. Replication is also provided by the factors included in the design that turn out to have nonsignificant effects. Because each factor is varied with respect to all of the other factors, information on all factors is collected by each run. In fact, every data point is used in the analysis many times, in the estimation of every effect and interaction. Additional efficiency of the two-level factorial design comes from the fact that it spans the factor space, that is, it puts half of the design points at each end of the range, which is the most powerful way of determining whether a factor has a significant effect.

Figure 1. Full factorial and one-half factorial in three dimensions.

USES

The main uses of design of experiments are

Discovering interactions among factors
Screening many factors
Establishing and maintaining quality control
Optimizing a process, including evolutionary operations (EVOP)
Designing robust products

Interaction occurs when the effect on the response of a change in the level of one factor from low to high depends on the level of another factor. In other words, when an interaction is present between two factors, the combined effect of those two factors on the response variable cannot be predicted from their separate effects. The effect of two factors acting in combination can be either greater (synergy) or less (interference) than would be expected from each factor separately.

Frequently there is a need to evaluate a process with many input variables and with measured output variables. This process could be a complex computer simulation model or a manufacturing process with raw materials, temperature, and pressure as the inputs. A screening experiment tells us which input variables (factors) are causing the majority of the variability in the output (responses), i.e., which factors are the drivers. A screening experiment usually involves only two levels of each factor and can also be called characterization testing or sensitivity analysis.

A process is out of statistical control when either the mean or the variability is outside its specifications. When this happens, the cause must be found and corrected. The cause is found efficiently using an experimental design similar to the screening design, except that the number of levels need not be two for all the factors.

Optimizing a process involves determining the shape of the response variable. Usually a screening design is performed first to find the relatively few important factors. A response surface design has several (usually three or four) levels on each of the factors. This produces a more detailed picture of the surface, especially providing information on which factors have curvature and on areas in the response where peaks and plateaus occur. The EVOP method is an optimization procedure used when only small changes in the factors can be tolerated in order for normal operations to continue. Examples of EVOP are optimizing the cracking process on crude oil while still running the oil refinery, or tuning the welding power of a welding robot in a car manufacturing assembly line.

Product robustness, pioneered by Taguchi, uses experimental design to study the response surfaces associated with both the product means and variances in order to choose appropriate factor settings so that variance and bias are both small simultaneously. Designing a robust product means learning how to make the response variable insensitive to uncontrollable manufacturing process variability or to the use conditions of the product by the customer.

MATHEMATICAL FORMULATION AND TERMINOLOGY

The input variables in the experiment are called factors. The performance measures resulting from the experiment are called responses. Polynomial equations
are Taylor series approximations to the unknown true functional form of the response variable. An often quoted insight of George Box is, "All models are wrong. Some are useful." [3] The trick is to have the simplest model that captures the main features of the data or process. The polynomial equation, shown to the third order in Eq. 1, used to model the response variable Y as a function of the input factors X is

Y = β_0 + Σ_{i=1}^{p} β_i X_i + Σ_{i<j} β_ij X_i X_j + Σ_{i<j<k} β_ijk X_i X_j X_k + ... ,   (1)

where

β_0 = the overall mean response,
β_i = the main effect for factor i (i = 1, 2, ..., p),
β_ij = the two-way interaction between the ith and jth factors, and
β_ijk = the three-way interaction between the ith, jth, and kth factors.

Usually, two values (called levels) of the Xs are used in the experiment for each factor, denoted by high and low and coded as +1 and -1, respectively. A general recommendation for setting the factor ranges is to set the levels far enough apart that one would expect to see a difference in the response, but not so far apart as to be out of the likely operating range. The use of only two levels seems to imply that the effects must be linear, but the assumption of monotonicity (or nearly so) in the response variable is sufficient. At least three levels of the factors would be required to detect curvature.

Interaction is present when the effect of a factor on the response variable depends on the setting level of another factor. Graphically, this can be seen as two nonparallel lines when plotting the averages from the four combinations of high and low levels of the two factors. The β_ij terms in Eq. 1 account for the two-way interactions. Two-way interactions can be thought of as corrections to a model of simple additivity of the factor effects, the model with only the β_i terms in Eq. 1. The simple additive model assumes that the factors act separately and independently on the response variable, which is not a very reasonable assumption.

Experimental designs can be categorized by their resolution level. A design with a higher resolution level can fit higher-order terms in Eq. 1 than a design with a lower resolution level. If a design with a high enough resolution level is not used, only the linear combination of several terms can be estimated, not the terms separately. The word "resolution" was borrowed from the term used in optics. Resolution levels are usually denoted by Roman numerals, with III, IV, and V being the most commonly used. To resolve all of the two-way interactions, the resolution level must be at least V. Four resolution levels and their meanings are given in Table 1.

Table 1. Resolution levels and their meanings.

Resolution level   Meaning
II    Main effects are linearly combined with each other (β_i + β_j).
III   Main effects are linearly combined with two-way interactions (β_i + β_jk).
IV    Main effects are linearly combined with three-way interactions (β_i + β_jkl), and two-way interactions with each other (β_ij + β_kl).
V     Main effects and two-way interactions are not linearly combined, except with higher-order interactions (β_i + β_jklm and β_ij + β_klm).

IMPLEMENTATION

The main steps to implement an experimental design are as follows. Note that the subject matter experts are the main contributors to the most important steps, i.e., 1-4, 10, and 12.

1. State the objective of the study and the hypotheses to be tested.
2. Determine the response variable(s) of interest that can be measured.
3. Determine the controllable factors of interest that might affect the response variables and the levels of each factor to be used in the experiment. It is better to include more factors in the design than to exclude factors, that is, to prejudge them to be nonsignificant.
4. Determine the uncontrollable variables that might affect the response variables, blocking the known nuisance variables and randomizing the runs to protect against unknown nuisance variables.
5. Determine the total number of runs in the experiment, ideally using estimates of variability, precision required, size of effects expected, etc., but more likely based on available time and resources. Reserve some resources for unforeseen contingencies and follow-up runs. Some practitioners recommend using only 25% of the resources in the first experiment.
6. Design the experiment, remembering to randomize the runs.
7. Perform a pro forma analysis, with the response variables as random variables, to check for estimability of the factor effects and the precision of the experiment.
8. Perform the experiment strictly according to the experimental design, including the initial setup for each run in a physical experiment. Do not swap the run order to make the job easier.
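To make Eq. 1 concrete, the sketch below simulates a response over a 2^3 full factorial and recovers the main effects and two-way interactions by least squares. The "true" coefficients are invented for illustration; because the design is orthogonal, the columns of the model matrix are uncorrelated and so are the estimates:

```python
import numpy as np
from itertools import product, combinations

rng = np.random.default_rng(0)

# Full 2^3 factorial in coded units (-1/+1), one simulated run per vertex.
X = np.array(list(product([-1.0, 1.0], repeat=3)))

# Invented "true" model: two main effects and one two-way interaction,
# plus a little noise standing in for inherent variability.
y = 10 + 3*X[:, 0] - 2*X[:, 1] + 1.5*X[:, 0]*X[:, 1] + rng.normal(0, 0.1, len(X))

# Model matrix for Eq. 1 truncated after the two-way interaction terms:
# intercept, three main-effect columns, three interaction columns.
cols = [np.ones(len(X))] + [X[:, i] for i in range(3)]
cols += [X[:, i] * X[:, j] for i, j in combinations(range(3), 2)]
M = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(M, y, rcond=None)
# beta is close to [10, 3, -2, 0, 1.5, 0, 0] -- the invented coefficients.
```
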
decomposition of the engagement process for each defensive weapon system, that is, a radar must detect, track, discriminate, and assess the success of intercept attempts, and the accuracy, reliability, and timeline factors associated with each of those functions.

A fractional factorial experimental design and EADSIM were used to screen the 47 factors above for their relative importance in far-term Northeast Asia (NEA) and Southwest Asia (SWA) scenarios over the first 10 days of a war. A three-tiered defense system was employed for both scenarios, including an airborne laser (ABL), a ground-based (GB) upper tier, and a lower tier comprising both ground-based and sea-based (SB) systems.

We initially conducted 512 EADSIM runs to screen the sensitivities of the 47 factors in the NEA scenario. This is a Resolution IV design and resolves all 47 main factors but cannot identify which of the 1081 possible two-way interactions are significant. After analyzing results from the initial 512 runs, 17 additional, separate experimental designs were needed (for a total of 352 additional EADSIM runs) to identify the significant two-way interactions for protection effectiveness. We learned from the NEA screening study that more runs were warranted in the initial experiment to reduce the number of additional experiments needed to disentangle all the two-way interactions. For the SWA screening study, we conducted 4096 EADSIM runs to find the 47 main factors and all 1081 two-way interactions for the 47 factors. This was a Resolution V design. An added benefit of conducting more runs is that the SWA error estimates are approximately one-third the size of the NEA error estimates, i.e., the relative importance of the performance drivers can be identified with higher certainty in SWA than in NEA, as can be seen in Fig. 2. Note that only a very small fraction of the total number of possible combinations was run, 1 in 275 billion, since it is a 2^(47-38) fractional factorial, even for the Resolution V design.

Figure 2 illustrates the main factor sensitivities to the 47 factors for both the NEA and SWA scenarios, labeled F1 to F47. The colored dots represent the change in protection effectiveness for each factor, and the error bars are 95% confidence bounds. The y-axis is the difference in the average protection effectiveness for a factor between the good and bad values. Factors are determined to be performance drivers if the 95% confidence
Figure 3. Protection effectiveness: two-way interaction between Factors 6 and 9 from the screening experiment.

Table 3. Minimum number of runs for a three-level Resolution V design.

Factors   Runs
1         3
2         9 = 3^2
3         27 = 3^3
4-5       81 = 3^4
6-11      243 = 3^5
12-14     729 = 3^6
15-21     2187 = 3^7
22-32     6561 = 3^8
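As a rough sanity check on Table 3, one can count the coefficients a three-level Resolution V design must be able to estimate: an intercept, a linear and a quadratic term per factor, and all two-way interactions. This count is only a lower bound on the runs required (the table's exact values come from the design construction, not this formula), and the function name below is ours, not the article's:

```python
from math import comb

# Coefficients in a full quadratic model for p factors:
# 1 intercept + p linear + p quadratic + C(p, 2) two-way interactions.
def quadratic_model_terms(p):
    return 1 + 2 * p + comb(p, 2)

# For the 11 significant factors the model has 78 coefficients,
# comfortably below the 243 = 3^5 runs Table 3 lists for 6-11 factors.
print(quadratic_model_terms(11))  # 78
```
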
The minimum number of runs needed for a three-level Resolution V design as a function of the number of factors is shown in Table 3. From the two-level screening designs, 11 main effects were statistically significant and have at least a 1% effect on protection effectiveness. Table 3 shows that for 11 factors, a minimum of 243 runs is needed. Notice that 36 factors out of the original 47 have been deemed nonsignificant and will be dropped from further experimentation.

An example of a significant quadratic main effect (Factor 9) and a significant two-way interaction between Factors 6 and 9 for the three-level fractional factorial response surface experiment is shown in Fig. 4. There are different values of protection effectiveness when Factor 9 is at the low level (-1), depending on whether Factor 6 is at the low, medium, or high level, but very little difference when Factor 9 is at the high level (+1). The shape of the lines in Fig. 4 is curved, indicating that a quadratic term is needed for Factor 9 in the polynomial equation. (Factors 6 and 9 are not the sixth and ninth factors listed in the boxed insert.)

The polynomial equation for protection effectiveness with quadratic and cross-product terms resulting from the 3^(11-6) fractional factorial response surface experiment is shown in Eq. 3. The size of a factor's effect on protection effectiveness is actually twice as large as its coefficient on the X term, since the coefficients are slopes and X has a range of 2 (from -1 to +1).

The full study comprised not only an examination of two theaters (NEA and SWA) but also four force levels in each theater. All of the analyses shown previously were conducted at Force Level 4, which is comparable to a "Desert Storm" level of logistics support before the operation. Force Level 1 is a rapid response with no prior warning and limited weapons available. Force Levels 2 and 3 are intermediate between Levels 1 and 4. The response surfaces for the four force levels in the NEA scenario are shown in Fig. 5. The individual graphs are the response surfaces for Factors 9 and 11, the two largest main effects for Force Level 4. There is a very noticeable curvature for Factor 9, especially at Force Levels 1 and 2. As the force level increases, protection effectiveness increases. The different color bands are 5% increments in protection effectiveness: red is between 65% and 70%, and orange is between 90% and 95%. The response surfaces flatten out and rise as the force level increases; that is, protection effectiveness improves and is less sensitive to changes in the factors. As the force level increases, there are more assets available, so the reliance on the performance of any individual asset diminishes. (Factors 9 and 11 are not the ninth and eleventh values listed in the boxed insert.)
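The point that a factor's effect is twice its coded coefficient can be checked numerically. The sketch below uses made-up coefficients on two coded factors, not the actual Eq. 3 fitted to the EADSIM data:

```python
# Illustrative quadratic response surface in coded units (invented numbers,
# not the study's Eq. 3).
def protection(x9, x11):
    return 0.80 + 0.05 * x9 + 0.03 * x11 - 0.04 * x9**2 + 0.02 * x9 * x11

# A linear coefficient of 0.05 on x9 corresponds to a 0.10 change across the
# full coded range, because x9 spans 2 units (from -1 to +1); the pure
# quadratic term contributes equally at both ends and cancels.
effect = protection(+1, 0) - protection(-1, 0)
print(round(effect, 2))  # 0.1
```
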
experiments is to implement valid and efficient experiments that will produce quantitative results and support sound decision making.

…ing Study, Cambridge, MA (1982).
[3] Box, G. E. P., and Draper, N. R., Empirical Model Building and Response Surfaces, Wiley, Hoboken, NJ (1987).
The Author
Jacqueline K. Telford is a statistician and a Principal Professional Staff member at APL. She obtained a B.S. in mathematics from Miami University and master's and Ph.D. degrees in statistics from North Carolina State University. Dr. Telford was employed at the U.S. Nuclear Regulatory Commission in Bethesda, Maryland, from 1979 to 1983. Since joining APL in 1983, she has been in the Operational Systems Analysis Group of the Global Engagement Department, working primarily on reliability analysis and testing, test sizing, and planning for Trident programs and, more recently, for the Missile Defense Agency. She has successfully applied statistical analysis methods to numerous other projects, including evaluation of sensors, models of the hydrodynamics of radioisotopes, and lethality effects. She has taught statistics courses in the JHU graduate engineering program, most recently on the topic of design of experiments. Her e-mail address is jacqueline.telford@jhuapl.edu.